2023-05-19 14:14:04

by Breno Leitao

[permalink] [raw]
Subject: [PATCH 0/1] net: ioctl: Use kernel buffer on proto ioctl callbacks

With the implementation of network ioctl on io_uring[1], Willem
suggested[2] that the "struct proto" ioctls functions should be reused,
instead of duplicating the code.
For that, the ioctl callbacks need to be more flexible, and avoid
operating on userspace buffers (doing get/put_user()) directly on the
callbacks. This patch adds this flexibility, so, the io_uring plumbing
becomes more clean, avoiding duplicating code. This may also benefit
BPF.

For that, a wrapper is created, which will copy from/to userspace, and
the ioctl callback will rely on the wrapper to do userspace memory
copies.

I've tested this patch in three different ways:
1) Created a simple testcase for TCP/UDP [3]
2) Run relevant LTP tests, such as: sockioctl, setsockopt, bind, sendto,
fanout, ns-udpsender, etc
3) Run basics network selftests

PS: There are some `strcmp()` in the `sock_skprot_ioctl()`, that I was
not able to find a better way to deal with it. Any feedback is
appreciated.

[1] Link: https://lore.kernel.org/all/GV1P193MB200533CC9A694C4066F4807CEA6F9@GV1P193MB2005.EURP193.PROD.OUTLOOK.COM/
[2] Link: https://lore.kernel.org/all/[email protected]/
[3] Link: https://github.com/leitao/liburing/blob/master/test/ioctl.c

Breno Leitao (1):
net: ioctl: Use kernel memory on protocol ioctl callbacks

include/linux/mroute.h | 4 +-
include/linux/mroute6.h | 4 +-
include/net/sock.h | 4 +-
include/net/tcp.h | 2 +-
include/net/udp.h | 2 +-
net/core/sock.c | 107 ++++++++++++++++++++++++++++++++++++++++
net/dccp/dccp.h | 2 +-
net/dccp/proto.c | 12 ++---
net/ieee802154/socket.c | 15 +++---
net/ipv4/af_inet.c | 2 +-
net/ipv4/ipmr.c | 41 +++++++--------
net/ipv4/raw.c | 16 +++---
net/ipv4/tcp.c | 5 +-
net/ipv4/udp.c | 12 ++---
net/ipv6/af_inet6.c | 2 +-
net/ipv6/ip6mr.c | 43 +++++++---------
net/ipv6/raw.c | 16 +++---
net/l2tp/l2tp_core.h | 2 +-
net/l2tp/l2tp_ip.c | 9 ++--
net/mptcp/protocol.c | 11 ++---
net/phonet/datagram.c | 11 ++---
net/phonet/pep.c | 11 ++---
net/phonet/socket.c | 2 +-
net/sctp/socket.c | 8 +--
24 files changed, 214 insertions(+), 129 deletions(-)

--
2.34.1



2023-05-19 14:14:58

by Breno Leitao

[permalink] [raw]
Subject: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

Most of the ioctls to net protocols operates directly on userspace
argument (arg). Usually doing get_user()/put_user() directly in the
ioctl callback. This is not flexible, because it is hard to reuse these
functions without passing userspace buffers.

Change the "struct proto" ioctls to avoid touching userspace memory and
operate on kernel buffers, i.e., all protocol's ioctl callbacks is
adapted to operate on a kernel memory other than on userspace (so, no
more {put,get}_user() and friends being called in the ioctl callback).

This changes the "struct proto" ioctl format in the following way:

int (*ioctl)(struct sock *sk, int cmd,
- unsigned long arg);
+ int *karg);

So, the "karg" argument, which is passed to the ioctl callback, is a
pointer allocated to kernel space memory (inside a function wrapper -
sock_skprot_ioctl()). This buffer (karg) may contain input argument
(copied from userspace in a prep function) and it might return a
value/buffer, which is copied back to userspace if necessary. There is
not one-size-fits-all format (that is I am using 'may' above), but
basically, there are three type of ioctls:

1) Do not read from userspace, returns a result to userspace
2) Read an input parameter from userspace, and does not return anything
to userspace
3) Read an input from userspace, and return a buffer to userspace.

The default case (1) (where no input parameter is given, and an "int" is
returned to userspace) encompasses more than 90% of the cases, but there
are two other exceptions. Here is a list of exceptions:

* Protocol RAW:
* cmd = SIOCGETVIFCNT:
* input and output = struct sioc_vif_req
* cmd = SIOCGETSGCNT
* input and output = struct sioc_sg_req
* Explanation: for the SIOCGETVIFCNT case, userspace passes the input
argument, which is struct sioc_vif_req. Then the callback populates
the struct, which is copied back to userspace.

* Protocol RAW6:
* cmd = SIOCGETMIFCNT_IN6
* input and output = struct sioc_mif_req6
* cmd = SIOCGETSGCNT_IN6
* input and output = struct sioc_sg_req6

* Protocol PHONET:
* cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
* input int (4 bytes)
* Nothing is copied back to userspace.

For the exception cases, functions sock_skproto_ioctl_in{out}() will
copy the userspace input, and copy it back to kernel space.

The wrapper that prepares the buffer and puts the buffer back to user is
sock_skprot_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the
callee now calls sock_skprot_ioctl().

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/mroute.h | 4 +-
include/linux/mroute6.h | 4 +-
include/net/sock.h | 4 +-
include/net/tcp.h | 2 +-
include/net/udp.h | 2 +-
net/core/sock.c | 107 ++++++++++++++++++++++++++++++++++++++++
net/dccp/dccp.h | 2 +-
net/dccp/proto.c | 12 ++---
net/ieee802154/socket.c | 15 +++---
net/ipv4/af_inet.c | 2 +-
net/ipv4/ipmr.c | 41 +++++++--------
net/ipv4/raw.c | 16 +++---
net/ipv4/tcp.c | 5 +-
net/ipv4/udp.c | 12 ++---
net/ipv6/af_inet6.c | 2 +-
net/ipv6/ip6mr.c | 43 +++++++---------
net/ipv6/raw.c | 16 +++---
net/l2tp/l2tp_core.h | 2 +-
net/l2tp/l2tp_ip.c | 9 ++--
net/mptcp/protocol.c | 11 ++---
net/phonet/datagram.c | 11 ++---
net/phonet/pep.c | 11 ++---
net/phonet/socket.c | 2 +-
net/sctp/socket.c | 8 +--
24 files changed, 214 insertions(+), 129 deletions(-)

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 80b8400ab8b2..dec4815a85b2 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -18,7 +18,7 @@ static inline int ip_mroute_opt(int opt)

int ip_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int);
int ip_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t);
-int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg);
+int ipmr_ioctl(struct sock *sk, int cmd, void *arg);
int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
int ip_mr_init(void);
bool ipmr_rule_default(const struct fib_rule *rule);
@@ -35,7 +35,7 @@ static inline int ip_mroute_getsockopt(struct sock *sk, int optname,
return -ENOPROTOOPT;
}

-static inline int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
+static inline int ipmr_ioctl(struct sock *sk, int cmd, void *arg)
{
return -ENOIOCTLCMD;
}
diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h
index 8f2b307fb124..1dcbf15a2206 100644
--- a/include/linux/mroute6.h
+++ b/include/linux/mroute6.h
@@ -29,7 +29,7 @@ struct sock;
extern int ip6_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int);
extern int ip6_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t);
extern int ip6_mr_input(struct sk_buff *skb);
-extern int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg);
+extern int ip6mr_ioctl(struct sock *sk, int cmd, void *arg);
extern int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg);
extern int ip6_mr_init(void);
extern void ip6_mr_cleanup(void);
@@ -48,7 +48,7 @@ int ip6_mroute_getsockopt(struct sock *sock,
}

static inline
-int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg)
+int ip6mr_ioctl(struct sock *sk, int cmd, void *arg)
{
return -ENOIOCTLCMD;
}
diff --git a/include/net/sock.h b/include/net/sock.h
index 656ea89f60ff..4dc8e9ac7a0c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1246,7 +1246,7 @@ struct proto {
bool kern);

int (*ioctl)(struct sock *sk, int cmd,
- unsigned long arg);
+ int *karg);
int (*init)(struct sock *sk);
void (*destroy)(struct sock *sk);
void (*shutdown)(struct sock *sk, int how);
@@ -2961,6 +2961,8 @@ int sock_get_timeout(long timeo, void *optval, bool old_timeval);
int sock_copy_user_timeval(struct __kernel_sock_timeval *tv,
sockptr_t optval, int optlen, bool old_timeval);

+int sock_skprot_ioctl(struct sock *sk, unsigned int cmd,
+ void __user *arg);
static inline bool sk_is_readable(struct sock *sk)
{
if (sk->sk_prot->sock_is_readable)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 04a31643cda3..f784fa49d95d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -342,7 +342,7 @@ void tcp_release_cb(struct sock *sk);
void tcp_wfree(struct sk_buff *skb);
void tcp_write_timer_handler(struct sock *sk);
void tcp_delack_timer_handler(struct sock *sk);
-int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
+int tcp_ioctl(struct sock *sk, int cmd, int *karg);
int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
void tcp_rcv_established(struct sock *sk, struct sk_buff *skb);
void tcp_rcv_space_adjust(struct sock *sk);
diff --git a/include/net/udp.h b/include/net/udp.h
index de4b528522bb..9ff5bce33aa0 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -283,7 +283,7 @@ void udp_flush_pending_frames(struct sock *sk);
int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
void udp4_hwcsum(struct sk_buff *skb, __be32 src, __be32 dst);
int udp_rcv(struct sk_buff *skb);
-int udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
+int udp_ioctl(struct sock *sk, int cmd, int *karg);
int udp_init_sock(struct sock *sk);
int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
int __udp_disconnect(struct sock *sk, int flags);
diff --git a/net/core/sock.c b/net/core/sock.c
index 5440e67bcfe3..47567909baf2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -114,6 +114,8 @@
#include <linux/memcontrol.h>
#include <linux/prefetch.h>
#include <linux/compat.h>
+#include <linux/mroute.h>
+#include <linux/mroute6.h>

#include <linux/uaccess.h>

@@ -138,6 +140,7 @@

#include <net/tcp.h>
#include <net/busy_poll.h>
+#include <net/phonet/phonet.h>

#include <linux/ethtool.h>

@@ -4106,3 +4109,107 @@ int sock_bind_add(struct sock *sk, struct sockaddr *addr, int addr_len)
return sk->sk_prot->bind_add(sk, addr, addr_len);
}
EXPORT_SYMBOL(sock_bind_add);
+
+/* Copy 'size' bytes from userspace and do not copy anything back */
+int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
+ void __user *arg)
+{
+ int karg;
+
+ if (get_user(karg, (u32 __user *)arg))
+ return -EFAULT;
+
+ return sk->sk_prot->ioctl(sk, cmd, &karg);
+}
+
+/* Copy 'size' bytes from userspace and return `size` back to userspace */
+int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
+ void __user *arg, size_t size)
+{
+ void *ptr;
+ int ret;
+
+ ptr = kmalloc(size, GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
+
+ if (copy_from_user(ptr, arg, size)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ ret = sk->sk_prot->ioctl(sk, cmd, ptr);
+ if (ret)
+ goto out;
+
+ if (copy_to_user(arg, ptr, size))
+ ret = -EFAULT;
+
+out:
+ kfree(ptr);
+ return ret;
+}
+
+/* This is the most common ioctl prep function, where the result (4 bytes) is
+ * copied back to userspace if the ioctl() returns successfully. No input is
+ * copied from userspace as input argument.
+ */
+int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
+ void __user *arg)
+{
+ int ret, karg = 0;
+
+ ret = sk->sk_prot->ioctl(sk, cmd, &karg);
+ if (ret)
+ return ret;
+
+ return put_user(karg, (int __user *)arg);
+}
+
+/* A wrapper around sock ioctls, which copies the data from userspace
+ * (depending on the protocol/ioctl), and copies back the result to userspace.
+ * The main motivation for this function is to pass kernel memory to the
+ * protocol ioctl callsback, instead of userspace memory.
+ */
+int sock_skprot_ioctl(struct sock *sk, unsigned int cmd,
+ void __user *arg)
+{
+#ifdef CONFIG_IP_MROUTE
+ if (!strcmp(sk->sk_prot->name, "RAW")) {
+ switch (cmd) {
+ /* These userspace buffers will be consumed by ipmr_ioctl() */
+ case SIOCGETVIFCNT:
+ return sock_skproto_ioctl_inout(sk, cmd,
+ arg, sizeof(struct sioc_vif_req));
+ case SIOCGETSGCNT:
+ return sock_skproto_ioctl_inout(sk, cmd,
+ arg, sizeof(struct sioc_sg_req));
+ }
+ }
+#endif
+#ifdef CONFIG_IPV6_MROUTE
+ if (!strcmp(sk->sk_prot->name, "RAW6")) {
+ switch (cmd) {
+ /* These userspace buffers will be consumed by ip6mr_ioctl() */
+ case SIOCGETMIFCNT_IN6:
+ return sock_skproto_ioctl_inout(sk, cmd,
+ arg, sizeof(struct sioc_mif_req6));
+ case SIOCGETSGCNT_IN6:
+ return sock_skproto_ioctl_inout(sk, cmd,
+ arg, sizeof(struct sioc_sg_req6));
+ }
+ }
+#endif
+#ifdef CONFIG_PHONET
+ if (!strcmp(sk->sk_prot->name, "PHONET")) {
+ /* This userspace buffers will be consumed by pn_ioctl() */
+ switch (cmd) {
+ case SIOCPNADDRESOURCE:
+ case SIOCPNDELRESOURCE:
+ return sock_skproto_ioctl_in(sk, cmd, arg);
+ }
+ }
+#endif
+
+ return sock_skproto_ioctl_out(sk, cmd, arg);
+}
diff --git a/net/dccp/dccp.h b/net/dccp/dccp.h
index 9ddc3a9e89e4..1f748ed1279d 100644
--- a/net/dccp/dccp.h
+++ b/net/dccp/dccp.h
@@ -292,7 +292,7 @@ int dccp_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen);
int dccp_setsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, unsigned int optlen);
-int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg);
+int dccp_ioctl(struct sock *sk, int cmd, int *karg);
int dccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
int dccp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
int *addr_len);
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index a06b5641287a..9fc3ba4f62de 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -359,7 +359,7 @@ __poll_t dccp_poll(struct file *file, struct socket *sock,

EXPORT_SYMBOL_GPL(dccp_poll);

-int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+int dccp_ioctl(struct sock *sk, int cmd, int *karg)
{
int rc = -ENOTCONN;

@@ -370,17 +370,17 @@ int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)

switch (cmd) {
case SIOCOUTQ: {
- int amount = sk_wmem_alloc_get(sk);
+ *karg = sk_wmem_alloc_get(sk);
/* Using sk_wmem_alloc here because sk_wmem_queued is not used by DCCP and
* always 0, comparably to UDP.
*/

- rc = put_user(amount, (int __user *)arg);
+ rc = 0;
}
break;
case SIOCINQ: {
struct sk_buff *skb;
- unsigned long amount = 0;
+ *karg = 0;

skb = skb_peek(&sk->sk_receive_queue);
if (skb != NULL) {
@@ -388,9 +388,9 @@ int dccp_ioctl(struct sock *sk, int cmd, unsigned long arg)
* We will only return the amount of this packet since
* that is all that will be read.
*/
- amount = skb->len;
+ *karg = skb->len;
}
- rc = put_user(amount, (int __user *)arg);
+ rc = 0;
}
break;
default:
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index 1fa2fe041ec0..2948e9f38aec 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -162,7 +162,7 @@ static int ieee802154_sock_ioctl(struct socket *sock, unsigned int cmd,
default:
if (!sk->sk_prot->ioctl)
return -ENOIOCTLCMD;
- return sk->sk_prot->ioctl(sk, cmd, arg);
+ return sock_skprot_ioctl(sk, cmd, (void __user *)arg);
}
}

@@ -531,22 +531,21 @@ static int dgram_bind(struct sock *sk, struct sockaddr *uaddr, int len)
return err;
}

-static int dgram_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int dgram_ioctl(struct sock *sk, int cmd, int *karg)
{
switch (cmd) {
case SIOCOUTQ:
{
- int amount = sk_wmem_alloc_get(sk);
+ *karg = sk_wmem_alloc_get(sk);

- return put_user(amount, (int __user *)arg);
+ return 0;
}

case SIOCINQ:
{
struct sk_buff *skb;
- unsigned long amount;

- amount = 0;
+ *karg = 0;
spin_lock_bh(&sk->sk_receive_queue.lock);
skb = skb_peek(&sk->sk_receive_queue);
if (skb) {
@@ -554,10 +553,10 @@ static int dgram_ioctl(struct sock *sk, int cmd, unsigned long arg)
* of this packet since that is all
* that will be read.
*/
- amount = skb->len - ieee802154_hdr_length(skb);
+ *karg = skb->len - ieee802154_hdr_length(skb);
}
spin_unlock_bh(&sk->sk_receive_queue.lock);
- return put_user(amount, (int __user *)arg);
+ return 0;
}
}

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c4aab3aacbd8..fdee1273e101 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -978,7 +978,7 @@ int inet_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
break;
default:
if (sk->sk_prot->ioctl)
- err = sk->sk_prot->ioctl(sk, cmd, arg);
+ err = sock_skprot_ioctl(sk, cmd, (void __user *)arg);
else
err = -ENOIOCTLCMD;
break;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index eec1f6df80d8..5d6531f6a235 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1593,13 +1593,13 @@ int ip_mroute_getsockopt(struct sock *sk, int optname, sockptr_t optval,
}

/* The IP multicast ioctl support routines. */
-int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)
+int ipmr_ioctl(struct sock *sk, int cmd, void *arg)
{
- struct sioc_sg_req sr;
- struct sioc_vif_req vr;
struct vif_device *vif;
struct mfc_cache *c;
struct net *net = sock_net(sk);
+ struct sioc_vif_req *vr;
+ struct sioc_sg_req *sr;
struct mr_table *mrt;

mrt = ipmr_get_table(net, raw_sk(sk)->ipmr_table ? : RT_TABLE_DEFAULT);
@@ -1608,40 +1608,33 @@ int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg)

switch (cmd) {
case SIOCGETVIFCNT:
- if (copy_from_user(&vr, arg, sizeof(vr)))
- return -EFAULT;
- if (vr.vifi >= mrt->maxvif)
+ vr = (struct sioc_vif_req *)arg;
+ if (vr->vifi >= mrt->maxvif)
return -EINVAL;
- vr.vifi = array_index_nospec(vr.vifi, mrt->maxvif);
+ vr->vifi = array_index_nospec(vr->vifi, mrt->maxvif);
rcu_read_lock();
- vif = &mrt->vif_table[vr.vifi];
- if (VIF_EXISTS(mrt, vr.vifi)) {
- vr.icount = READ_ONCE(vif->pkt_in);
- vr.ocount = READ_ONCE(vif->pkt_out);
- vr.ibytes = READ_ONCE(vif->bytes_in);
- vr.obytes = READ_ONCE(vif->bytes_out);
+ vif = &mrt->vif_table[vr->vifi];
+ if (VIF_EXISTS(mrt, vr->vifi)) {
+ vr->icount = READ_ONCE(vif->pkt_in);
+ vr->ocount = READ_ONCE(vif->pkt_out);
+ vr->ibytes = READ_ONCE(vif->bytes_in);
+ vr->obytes = READ_ONCE(vif->bytes_out);
rcu_read_unlock();

- if (copy_to_user(arg, &vr, sizeof(vr)))
- return -EFAULT;
return 0;
}
rcu_read_unlock();
return -EADDRNOTAVAIL;
case SIOCGETSGCNT:
- if (copy_from_user(&sr, arg, sizeof(sr)))
- return -EFAULT;
+ sr = (struct sioc_sg_req *)arg;

rcu_read_lock();
- c = ipmr_cache_find(mrt, sr.src.s_addr, sr.grp.s_addr);
+ c = ipmr_cache_find(mrt, sr->src.s_addr, sr->grp.s_addr);
if (c) {
- sr.pktcnt = c->_c.mfc_un.res.pkt;
- sr.bytecnt = c->_c.mfc_un.res.bytes;
- sr.wrong_if = c->_c.mfc_un.res.wrong_if;
+ sr->pktcnt = c->_c.mfc_un.res.pkt;
+ sr->bytecnt = c->_c.mfc_un.res.bytes;
+ sr->wrong_if = c->_c.mfc_un.res.wrong_if;
rcu_read_unlock();
-
- if (copy_to_user(arg, &sr, sizeof(sr)))
- return -EFAULT;
return 0;
}
rcu_read_unlock();
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ff712bf2a98d..e394f02380b6 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -855,29 +855,29 @@ static int raw_getsockopt(struct sock *sk, int level, int optname,
return do_raw_getsockopt(sk, level, optname, optval, optlen);
}

-static int raw_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int raw_ioctl(struct sock *sk, int cmd, int *karg)
{
switch (cmd) {
case SIOCOUTQ: {
- int amount = sk_wmem_alloc_get(sk);
-
- return put_user(amount, (int __user *)arg);
+ *karg = sk_wmem_alloc_get(sk);
+ return 0;
}
case SIOCINQ: {
struct sk_buff *skb;
- int amount = 0;

spin_lock_bh(&sk->sk_receive_queue.lock);
skb = skb_peek(&sk->sk_receive_queue);
if (skb)
- amount = skb->len;
+ *karg = skb->len;
+ else
+ *karg = 0;
spin_unlock_bh(&sk->sk_receive_queue.lock);
- return put_user(amount, (int __user *)arg);
+ return 0;
}

default:
#ifdef CONFIG_IP_MROUTE
- return ipmr_ioctl(sk, cmd, (void __user *)arg);
+ return ipmr_ioctl(sk, cmd, karg);
#else
return -ENOIOCTLCMD;
#endif
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d6392c16b7a..8ff2c3784aab 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -599,7 +599,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
}
EXPORT_SYMBOL(tcp_poll);

-int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+int tcp_ioctl(struct sock *sk, int cmd, int *karg)
{
struct tcp_sock *tp = tcp_sk(sk);
int answ;
@@ -641,7 +641,8 @@ int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
return -ENOIOCTLCMD;
}

- return put_user(answ, (int __user *)arg);
+ *karg = answ;
+ return 0;
}
EXPORT_SYMBOL(tcp_ioctl);

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index aa32afd871ee..8b83a67cf852 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1720,21 +1720,19 @@ static int first_packet_length(struct sock *sk)
* IOCTL requests applicable to the UDP protocol
*/

-int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+int udp_ioctl(struct sock *sk, int cmd, int *karg)
{
switch (cmd) {
case SIOCOUTQ:
{
- int amount = sk_wmem_alloc_get(sk);
-
- return put_user(amount, (int __user *)arg);
+ *karg = sk_wmem_alloc_get(sk);
+ return 0;
}

case SIOCINQ:
{
- int amount = max_t(int, 0, first_packet_length(sk));
-
- return put_user(amount, (int __user *)arg);
+ *karg = max_t(int, 0, first_packet_length(sk));
+ return 0;
}

default:
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 2bbf13216a3d..930454d7eaca 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -579,7 +579,7 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
prot = READ_ONCE(sk->sk_prot);
if (!prot->ioctl)
return -ENOIOCTLCMD;
- return prot->ioctl(sk, cmd, arg);
+ return sock_skprot_ioctl(sk, cmd, (void __user *)arg);
}
/*NOTREACHED*/
return 0;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 51cf37abd142..0b6974daa666 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1879,11 +1879,10 @@ int ip6_mroute_getsockopt(struct sock *sk, int optname, sockptr_t optval,
/*
* The IP multicast ioctl support routines.
*/
-
-int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg)
+int ip6mr_ioctl(struct sock *sk, int cmd, void *arg)
{
- struct sioc_sg_req6 sr;
- struct sioc_mif_req6 vr;
+ struct sioc_sg_req6 *sr;
+ struct sioc_mif_req6 *vr;
struct vif_device *vif;
struct mfc6_cache *c;
struct net *net = sock_net(sk);
@@ -1895,40 +1894,32 @@ int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg)

switch (cmd) {
case SIOCGETMIFCNT_IN6:
- if (copy_from_user(&vr, arg, sizeof(vr)))
- return -EFAULT;
- if (vr.mifi >= mrt->maxvif)
+ vr = (struct sioc_mif_req6 *)arg;
+ if (vr->mifi >= mrt->maxvif)
return -EINVAL;
- vr.mifi = array_index_nospec(vr.mifi, mrt->maxvif);
+ vr->mifi = array_index_nospec(vr->mifi, mrt->maxvif);
rcu_read_lock();
- vif = &mrt->vif_table[vr.mifi];
- if (VIF_EXISTS(mrt, vr.mifi)) {
- vr.icount = READ_ONCE(vif->pkt_in);
- vr.ocount = READ_ONCE(vif->pkt_out);
- vr.ibytes = READ_ONCE(vif->bytes_in);
- vr.obytes = READ_ONCE(vif->bytes_out);
+ vif = &mrt->vif_table[vr->mifi];
+ if (VIF_EXISTS(mrt, vr->mifi)) {
+ vr->icount = READ_ONCE(vif->pkt_in);
+ vr->ocount = READ_ONCE(vif->pkt_out);
+ vr->ibytes = READ_ONCE(vif->bytes_in);
+ vr->obytes = READ_ONCE(vif->bytes_out);
rcu_read_unlock();
-
- if (copy_to_user(arg, &vr, sizeof(vr)))
- return -EFAULT;
return 0;
}
rcu_read_unlock();
return -EADDRNOTAVAIL;
case SIOCGETSGCNT_IN6:
- if (copy_from_user(&sr, arg, sizeof(sr)))
- return -EFAULT;
+ sr = (struct sioc_sg_req6 *)arg;

rcu_read_lock();
- c = ip6mr_cache_find(mrt, &sr.src.sin6_addr, &sr.grp.sin6_addr);
+ c = ip6mr_cache_find(mrt, &sr->src.sin6_addr, &sr->grp.sin6_addr);
if (c) {
- sr.pktcnt = c->_c.mfc_un.res.pkt;
- sr.bytecnt = c->_c.mfc_un.res.bytes;
- sr.wrong_if = c->_c.mfc_un.res.wrong_if;
+ sr->pktcnt = c->_c.mfc_un.res.pkt;
+ sr->bytecnt = c->_c.mfc_un.res.bytes;
+ sr->wrong_if = c->_c.mfc_un.res.wrong_if;
rcu_read_unlock();
-
- if (copy_to_user(arg, &sr, sizeof(sr)))
- return -EFAULT;
return 0;
}
rcu_read_unlock();
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 7d0adb612bdd..51d4f2d7c596 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1117,29 +1117,29 @@ static int rawv6_getsockopt(struct sock *sk, int level, int optname,
return do_rawv6_getsockopt(sk, level, optname, optval, optlen);
}

-static int rawv6_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int rawv6_ioctl(struct sock *sk, int cmd, int *karg)
{
switch (cmd) {
case SIOCOUTQ: {
- int amount = sk_wmem_alloc_get(sk);
-
- return put_user(amount, (int __user *)arg);
+ *karg = sk_wmem_alloc_get(sk);
+ return 0;
}
case SIOCINQ: {
struct sk_buff *skb;
- int amount = 0;

spin_lock_bh(&sk->sk_receive_queue.lock);
skb = skb_peek(&sk->sk_receive_queue);
if (skb)
- amount = skb->len;
+ *karg = skb->len;
+ else
+ *karg = 0;
spin_unlock_bh(&sk->sk_receive_queue.lock);
- return put_user(amount, (int __user *)arg);
+ return 0;
}

default:
#ifdef CONFIG_IPV6_MROUTE
- return ip6mr_ioctl(sk, cmd, (void __user *)arg);
+ return ip6mr_ioctl(sk, cmd, karg);
#else
return -ENOIOCTLCMD;
#endif
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index a88e070b431d..91ebf0a3f499 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -272,7 +272,7 @@ int l2tp_nl_register_ops(enum l2tp_pwtype pw_type, const struct l2tp_nl_cmd_ops
void l2tp_nl_unregister_ops(enum l2tp_pwtype pw_type);

/* IOCTL helper for IP encap modules. */
-int l2tp_ioctl(struct sock *sk, int cmd, unsigned long arg);
+int l2tp_ioctl(struct sock *sk, int cmd, int *karg);

/* Extract the tunnel structure from a socket's sk_user_data pointer,
* validating the tunnel magic feather.
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 41a74fc84ca1..2b795c1064f5 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -562,19 +562,18 @@ static int l2tp_ip_recvmsg(struct sock *sk, struct msghdr *msg,
return err ? err : copied;
}

-int l2tp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+int l2tp_ioctl(struct sock *sk, int cmd, int *karg)
{
struct sk_buff *skb;
- int amount;

switch (cmd) {
case SIOCOUTQ:
- amount = sk_wmem_alloc_get(sk);
+ *karg = sk_wmem_alloc_get(sk);
break;
case SIOCINQ:
spin_lock_bh(&sk->sk_receive_queue.lock);
skb = skb_peek(&sk->sk_receive_queue);
- amount = skb ? skb->len : 0;
+ *karg = skb ? skb->len : 0;
spin_unlock_bh(&sk->sk_receive_queue.lock);
break;

@@ -582,7 +581,7 @@ int l2tp_ioctl(struct sock *sk, int cmd, unsigned long arg)
return -ENOIOCTLCMD;
}

- return put_user(amount, (int __user *)arg);
+ return 0;
}
EXPORT_SYMBOL_GPL(l2tp_ioctl);

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 08dc53f56bc2..abcdd7cf54b3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3545,11 +3545,10 @@ static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
return (int)delta;
}

-static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int mptcp_ioctl(struct sock *sk, int cmd, int *karg)
{
struct mptcp_sock *msk = mptcp_sk(sk);
bool slow;
- int answ;

switch (cmd) {
case SIOCINQ:
@@ -3558,24 +3557,24 @@ static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)

lock_sock(sk);
__mptcp_move_skbs(msk);
- answ = mptcp_inq_hint(sk);
+ *karg = mptcp_inq_hint(sk);
release_sock(sk);
break;
case SIOCOUTQ:
slow = lock_sock_fast(sk);
- answ = mptcp_ioctl_outq(msk, READ_ONCE(msk->snd_una));
+ *karg = mptcp_ioctl_outq(msk, READ_ONCE(msk->snd_una));
unlock_sock_fast(sk, slow);
break;
case SIOCOUTQNSD:
slow = lock_sock_fast(sk);
- answ = mptcp_ioctl_outq(msk, msk->snd_nxt);
+ *karg = mptcp_ioctl_outq(msk, msk->snd_nxt);
unlock_sock_fast(sk, slow);
break;
default:
return -ENOIOCTLCMD;
}

- return put_user(answ, (int __user *)arg);
+ return 0;
}

static void mptcp_subflow_early_fallback(struct mptcp_sock *msk,
diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index ff5f49ab236e..3aa50dc7535b 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -28,24 +28,21 @@ static void pn_sock_close(struct sock *sk, long timeout)
sk_common_release(sk);
}

-static int pn_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int pn_ioctl(struct sock *sk, int cmd, int *karg)
{
struct sk_buff *skb;
- int answ;

switch (cmd) {
case SIOCINQ:
lock_sock(sk);
skb = skb_peek(&sk->sk_receive_queue);
- answ = skb ? skb->len : 0;
+ *karg = skb ? skb->len : 0;
release_sock(sk);
- return put_user(answ, (int __user *)arg);
+ return 0;

case SIOCPNADDRESOURCE:
case SIOCPNDELRESOURCE: {
- u32 res;
- if (get_user(res, (u32 __user *)arg))
- return -EFAULT;
+ u32 res = *karg;
if (res >= 256)
return -EINVAL;
if (cmd == SIOCPNADDRESOURCE)
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index 83ea13a50690..faba31f2eff2 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -917,10 +917,9 @@ static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
return 0;
}

-static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int pep_ioctl(struct sock *sk, int cmd, int *karg)
{
struct pep_sock *pn = pep_sk(sk);
- int answ;
int ret = -ENOIOCTLCMD;

switch (cmd) {
@@ -933,13 +932,13 @@ static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
lock_sock(sk);
if (sock_flag(sk, SOCK_URGINLINE) &&
!skb_queue_empty(&pn->ctrlreq_queue))
- answ = skb_peek(&pn->ctrlreq_queue)->len;
+ *karg = skb_peek(&pn->ctrlreq_queue)->len;
else if (!skb_queue_empty(&sk->sk_receive_queue))
- answ = skb_peek(&sk->sk_receive_queue)->len;
+ *karg = skb_peek(&sk->sk_receive_queue)->len;
else
- answ = 0;
+ *karg = 0;
release_sock(sk);
- ret = put_user(answ, (int __user *)arg);
+ ret = 0;
break;

case SIOCPNENABLEPIPE:
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 71e2caf6ab85..75ef3daa709e 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -387,7 +387,7 @@ static int pn_socket_ioctl(struct socket *sock, unsigned int cmd,
return put_user(handle, (__u16 __user *)arg);
}

- return sk->sk_prot->ioctl(sk, cmd, arg);
+ return sock_skprot_ioctl(sk, cmd, (void __user *)arg);
}

static int pn_socket_listen(struct socket *sock, int backlog)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index cda8c2874691..3acd6e223cd4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4895,7 +4895,7 @@ static struct sock *sctp_accept(struct sock *sk, int flags, int *err, bool kern)
}

/* The SCTP ioctl handler. */
-static int sctp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+static int sctp_ioctl(struct sock *sk, int cmd, int *karg)
{
int rc = -ENOTCONN;

@@ -4911,7 +4911,7 @@ static int sctp_ioctl(struct sock *sk, int cmd, unsigned long arg)
switch (cmd) {
case SIOCINQ: {
struct sk_buff *skb;
- unsigned int amount = 0;
+ *karg = 0;

skb = skb_peek(&sk->sk_receive_queue);
if (skb != NULL) {
@@ -4919,9 +4919,9 @@ static int sctp_ioctl(struct sock *sk, int cmd, unsigned long arg)
* We will only return the amount of this packet since
* that is all that will be read.
*/
- amount = skb->len;
+ *karg = skb->len;
}
- rc = put_user(amount, (int __user *)arg);
+ rc = 0;
break;
}
default:
--
2.34.1


2023-05-19 14:27:26

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

From: Breno Leitao
> Sent: 19 May 2023 14:58
>
> Most of the ioctls to net protocols operates directly on userspace
> argument (arg). Usually doing get_user()/put_user() directly in the
> ioctl callback. This is not flexible, because it is hard to reuse these
> functions without passing userspace buffers.
>
> Change the "struct proto" ioctls to avoid touching userspace memory and
> operate on kernel buffers, i.e., all protocol's ioctl callbacks is
> adapted to operate on a kernel memory other than on userspace (so, no
> more {put,get}_user() and friends being called in the ioctl callback).
>
> This changes the "struct proto" ioctl format in the following way:
>
> int (*ioctl)(struct sock *sk, int cmd,
> - unsigned long arg);
> + int *karg);

I think I'd add a karg_len field for the actual buffer length.
It will save embarrassment later on.

Do any of the ioctl functions return +ve values on success?
If not you can use the return value as the length for any
copy_to_user().

If all the current 'cmd' are 16bit, there is the option
of using 32bit IOR() etc commands to get automatic sizing.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2023-05-19 15:27:13

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH 0/1] net: ioctl: Use kernel buffer on proto ioctl callbacks

On Fri, 19 May 2023 06:58:20 -0700 Breno Leitao wrote:
> With the implementation of network ioctl on io_uring[1], Willem
> suggested[2] that the "struct proto" ioctls functions should be reused,
> instead of duplicating the code.
> For that, the ioctl callbacks need to be more flexible, and avoid
> operating on userspace buffers (doing get/put_user()) directly on the
> callbacks. This patch adds this flexibility, so, the io_uring plumbing
> becomes more clean, avoiding duplicating code. This may also benefit
> BPF.
>
> For that, a wrapper is created, which will copy from/to userspace, and
> the ioctl callback will rely on the wrapper to do userspace memory
> copies.
>
> I've tested this patch in three different ways:
> 1) Created a simple testcase for TCP/UDP [3]
> 2) Run relevant LTP tests, such as: sockioctl, setsockopt, bind, sendto,
> fanout, ns-udpsender, etc
> 3) Run basics network selftests
>
> PS: There are some `strcmp()` in the `sock_skprot_ioctl()`, that I was
> not able to find a better way to deal with it. Any feedback is
> appreciated.

Why not CC netdev@ on this?

2023-05-19 15:30:25

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

On Fri, May 19, 2023 at 9:59 AM Breno Leitao <[email protected]> wrote:
>
> Most of the ioctls to net protocols operates directly on userspace
> argument (arg). Usually doing get_user()/put_user() directly in the
> ioctl callback. This is not flexible, because it is hard to reuse these
> functions without passing userspace buffers.
>
> Change the "struct proto" ioctls to avoid touching userspace memory and
> operate on kernel buffers, i.e., all protocol's ioctl callbacks is
> adapted to operate on a kernel memory other than on userspace (so, no
> more {put,get}_user() and friends being called in the ioctl callback).
>
> This changes the "struct proto" ioctl format in the following way:
>
> int (*ioctl)(struct sock *sk, int cmd,
> - unsigned long arg);
> + int *karg);
>
> So, the "karg" argument, which is passed to the ioctl callback, is a
> pointer allocated to kernel space memory (inside a function wrapper -
> sock_skprot_ioctl()). This buffer (karg) may contain input argument
> (copied from userspace in a prep function) and it might return a
> value/buffer, which is copied back to userspace if necessary. There is
> not one-size-fits-all format (that is I am using 'may' above), but
> basically, there are three type of ioctls:
>
> 1) Do not read from userspace, returns a result to userspace
> 2) Read an input parameter from userspace, and does not return anything
> to userspace
> 3) Read an input from userspace, and return a buffer to userspace.
>
> The default case (1) (where no input parameter is given, and an "int" is
> returned to userspace) encompasses more than 90% of the cases, but there
> are two other exceptions. Here is a list of exceptions:
>
> * Protocol RAW:
> * cmd = SIOCGETVIFCNT:
> * input and output = struct sioc_vif_req
> * cmd = SIOCGETSGCNT
> * input and output = struct sioc_sg_req
> * Explanation: for the SIOCGETVIFCNT case, userspace passes the input
> argument, which is struct sioc_vif_req. Then the callback populates
> the struct, which is copied back to userspace.
>
> * Protocol RAW6:
> * cmd = SIOCGETMIFCNT_IN6
> * input and output = struct sioc_mif_req6
> * cmd = SIOCGETSGCNT_IN6
> * input and output = struct sioc_sg_req6
>
> * Protocol PHONET:
> * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
> * input int (4 bytes)
> * Nothing is copied back to userspace.
>
> For the exception cases, functions sock_skproto_ioctl_in{out}() will
> copy the userspace input, and copy it back to kernel space.
>
> The wrapper that prepares the buffer and puts the buffer back to user is
> sock_skprot_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the
> callee now calls sock_skprot_ioctl().
>
> Signed-off-by: Breno Leitao <[email protected]>

Overall this looks great to me.

Thanks for the detailed commit message that lists all exceptions, Bruno.

Since that is a limited well understood list, I'm not in favor of the
suggestion to add an explicit length argument that then needs to be
checked in each callee.

> +/* Copy 'size' bytes from userspace and return `size` back to userspace */
> +int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
> + void __user *arg, size_t size)
> +{
> + void *ptr;
> + int ret;
> +
> + ptr = kmalloc(size, GFP_KERNEL);
> + if (!ptr)
> + return -ENOMEM;

> +/* A wrapper around sock ioctls, which copies the data from userspace
> + * (depending on the protocol/ioctl), and copies back the result to userspace.
> + * The main motivation for this function is to pass kernel memory to the
> + * protocol ioctl callsback, instead of userspace memory.
> + */
> +int sock_skprot_ioctl(struct sock *sk, unsigned int cmd,
> + void __user *arg)
> +{
> +#ifdef CONFIG_IP_MROUTE
> + if (!strcmp(sk->sk_prot->name, "RAW")) {

This must check both sk_family and sk_protocol. That is preferable
over string match.

For these exception cases, instead of having sock_skproto_ioctl_inout
dynamically allocate the struct, how about stack allocating them here
and passing to the function?

Nit: the function name is quite a mouthful. Just sock_ioctl_inout?

> + switch (cmd) {
> + /* These userspace buffers will be consumed by ipmr_ioctl() */
> + case SIOCGETVIFCNT:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_vif_req));
> + case SIOCGETSGCNT:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_sg_req));
> + }
> + }
> +#endif
> +#ifdef CONFIG_IPV6_MROUTE
> + if (!strcmp(sk->sk_prot->name, "RAW6")) {
> + switch (cmd) {
> + /* These userspace buffers will be consumed by ip6mr_ioctl() */
> + case SIOCGETMIFCNT_IN6:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_mif_req6));
> + case SIOCGETSGCNT_IN6:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_sg_req6));
> + }
> + }
> +#endif
> +#ifdef CONFIG_PHONET
> + if (!strcmp(sk->sk_prot->name, "PHONET")) {
> + /* This userspace buffers will be consumed by pn_ioctl() */
> + switch (cmd) {
> + case SIOCPNADDRESOURCE:
> + case SIOCPNDELRESOURCE:
> + return sock_skproto_ioctl_in(sk, cmd, arg);
> + }
> + }
> +#endif
> +
> + return sock_skproto_ioctl_out(sk, cmd, arg);
> +}

2023-05-19 15:39:27

by Breno Leitao

[permalink] [raw]
Subject: Re: [PATCH 0/1] net: ioctl: Use kernel buffer on proto ioctl callbacks

On Fri, May 19, 2023 at 08:15:26AM -0700, Jakub Kicinski wrote:
> On Fri, 19 May 2023 06:58:20 -0700 Breno Leitao wrote:
> > With the implementation of network ioctl on io_uring[1], Willem
> > suggested[2] that the "struct proto" ioctls functions should be reused,
> > instead of duplicating the code.
> > For that, the ioctl callbacks need to be more flexible, and avoid
> > operating on userspace buffers (doing get/put_user()) directly on the
> > callbacks. This patch adds this flexibility, so, the io_uring plumbing
> > becomes more clean, avoiding duplicating code. This may also benefit
> > BPF.
> >
> > For that, a wrapper is created, which will copy from/to userspace, and
> > the ioctl callback will rely on the wrapper to do userspace memory
> > copies.
> >
> > I've tested this patch in three different ways:
> > 1) Created a simple testcase for TCP/UDP [3]
> > 2) Run relevant LTP tests, such as: sockioctl, setsockopt, bind, sendto,
> > fanout, ns-udpsender, etc
> > 3) Run basics network selftests
> >
> > PS: There are some `strcmp()` in the `sock_skprot_ioctl()`, that I was
> > not able to find a better way to deal with it. Any feedback is
> > appreciated.
>
> Why not CC netdev@ on this?

Oops, my mistake. I will do it on V2.

2023-05-19 16:07:48

by Breno Leitao

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

On Fri, May 19, 2023 at 11:09:29AM -0400, Willem de Bruijn wrote:
> On Fri, May 19, 2023 at 9:59 AM Breno Leitao <[email protected]> wrote:
> >
> > Most of the ioctls to net protocols operates directly on userspace
> > argument (arg). Usually doing get_user()/put_user() directly in the
> > ioctl callback. This is not flexible, because it is hard to reuse these
> > functions without passing userspace buffers.
> >
> > Change the "struct proto" ioctls to avoid touching userspace memory and
> > operate on kernel buffers, i.e., all protocol's ioctl callbacks is
> > adapted to operate on a kernel memory other than on userspace (so, no
> > more {put,get}_user() and friends being called in the ioctl callback).
> >
> > This changes the "struct proto" ioctl format in the following way:
> >
> > int (*ioctl)(struct sock *sk, int cmd,
> > - unsigned long arg);
> > + int *karg);
> >
> > So, the "karg" argument, which is passed to the ioctl callback, is a
> > pointer allocated to kernel space memory (inside a function wrapper -
> > sock_skprot_ioctl()). This buffer (karg) may contain input argument
> > (copied from userspace in a prep function) and it might return a
> > value/buffer, which is copied back to userspace if necessary. There is
> > not one-size-fits-all format (that is I am using 'may' above), but
> > basically, there are three type of ioctls:
> >
> > 1) Do not read from userspace, returns a result to userspace
> > 2) Read an input parameter from userspace, and does not return anything
> > to userspace
> > 3) Read an input from userspace, and return a buffer to userspace.
> >
> > The default case (1) (where no input parameter is given, and an "int" is
> > returned to userspace) encompasses more than 90% of the cases, but there
> > are two other exceptions. Here is a list of exceptions:
> >
> > * Protocol RAW:
> > * cmd = SIOCGETVIFCNT:
> > * input and output = struct sioc_vif_req
> > * cmd = SIOCGETSGCNT
> > * input and output = struct sioc_sg_req
> > * Explanation: for the SIOCGETVIFCNT case, userspace passes the input
> > argument, which is struct sioc_vif_req. Then the callback populates
> > the struct, which is copied back to userspace.
> >
> > * Protocol RAW6:
> > * cmd = SIOCGETMIFCNT_IN6
> > * input and output = struct sioc_mif_req6
> > * cmd = SIOCGETSGCNT_IN6
> > * input and output = struct sioc_sg_req6
> >
> > * Protocol PHONET:
> > * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
> > * input int (4 bytes)
> > * Nothing is copied back to userspace.
> >
> > For the exception cases, functions sock_skproto_ioctl_in{out}() will
> > copy the userspace input, and copy it back to kernel space.
> >
> > The wrapper that prepares the buffer and puts the buffer back to user is
> > sock_skprot_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the
> > callee now calls sock_skprot_ioctl().
> >
> > Signed-off-by: Breno Leitao <[email protected]>
>
> Overall this looks great to me.

Thanks for the guidance and quick review!

>
> Thanks for the detailed commit message that lists all exceptions, Bruno.
>
> Since that is a limited well understood list, I'm not in favor of the
> suggestion to add an explicit length argument that then needs to be
> checked in each callee.
>
> > +/* Copy 'size' bytes from userspace and return `size` back to userspace */
> > +int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
> > + void __user *arg, size_t size)
> > +{
> > + void *ptr;
> > + int ret;
> > +
> > + ptr = kmalloc(size, GFP_KERNEL);
> > + if (!ptr)
> > + return -ENOMEM;
>
> > +/* A wrapper around sock ioctls, which copies the data from userspace
> > + * (depending on the protocol/ioctl), and copies back the result to userspace.
> > + * The main motivation for this function is to pass kernel memory to the
> > + * protocol ioctl callsback, instead of userspace memory.
> > + */
> > +int sock_skprot_ioctl(struct sock *sk, unsigned int cmd,
> > + void __user *arg)
> > +{
> > +#ifdef CONFIG_IP_MROUTE
> > + if (!strcmp(sk->sk_prot->name, "RAW")) {
>
> This must check both sk_family and sk_protocol. That is preferable
> over string match.
>
> For these exception cases, instead of having sock_skproto_ioctl_inout
> dynamically allocate the struct, how about stack allocating them here
> and passing to the function?

Should I stack allocate all the 4 structures sock_skprot_ioctl and pass
them to sock_skproto_ioctl_inout() together with the size? (using the
original name to avoid confusion - will rename in V2)

I mean, writing something as:

int sock_skprot_ioctl(struct sock *sk, unsigned int cmd
void __user *arg`
{
struct sioc_vif_req sioc_vif_req_arg;
struct sioc_sg_req sioc_sg_req_arg;
struct sioc_mif_req6 sioc_mif_req6_arg;
struct sioc_sg_req6 sioc_sg_req6_arg;

..

if (!strcmp(sk->sk_prot->name, "RAW6")) {
switch (cmd) {
case SIOCGETMIFCNT_IN6:
return sock_skproto_ioctl_inout(sk, cmd,
arg, &sioc_mif_req6_arg, sizeof(sioc_mif_req6_arg);
case SIOCGETSGCNT_IN6:
return sock_skproto_ioctl_inout(sk, cmd,
arg, &sioc_sg_req6_arg, sizeof(sioc_sg_req6_arg));
}
}
...
}

2023-05-19 16:58:10

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

Hi Breno,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mptcp/export]
[also build test WARNING on mptcp/export-net net-next/main net/main linus/master v6.4-rc2 next-20230519]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Breno-Leitao/net-ioctl-Use-kernel-memory-on-protocol-ioctl-callbacks/20230519-223824
base: https://github.com/multipath-tcp/mptcp_net-next.git export
patch link: https://lore.kernel.org/r/20230519135821.922326-2-leitao%40debian.org
patch subject: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks
config: m68k-allyesconfig
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/95a4f4f84bf68692f5b42921c9f6067c0986aed8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Breno-Leitao/net-ioctl-Use-kernel-memory-on-protocol-ioctl-callbacks/20230519-223824
git checkout 95a4f4f84bf68692f5b42921c9f6067c0986aed8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash net/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> net/core/sock.c:4114:5: warning: no previous prototype for 'sock_skproto_ioctl_in' [-Wmissing-prototypes]
4114 | int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
| ^~~~~~~~~~~~~~~~~~~~~
>> net/core/sock.c:4126:5: warning: no previous prototype for 'sock_skproto_ioctl_inout' [-Wmissing-prototypes]
4126 | int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
| ^~~~~~~~~~~~~~~~~~~~~~~~
>> net/core/sock.c:4157:5: warning: no previous prototype for 'sock_skproto_ioctl_out' [-Wmissing-prototypes]
4157 | int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
| ^~~~~~~~~~~~~~~~~~~~~~


vim +/sock_skproto_ioctl_in +4114 net/core/sock.c

4112
4113 /* Copy 'size' bytes from userspace and do not copy anything back */
> 4114 int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
4115 void __user *arg)
4116 {
4117 int karg;
4118
4119 if (get_user(karg, (u32 __user *)arg))
4120 return -EFAULT;
4121
4122 return sk->sk_prot->ioctl(sk, cmd, &karg);
4123 }
4124
4125 /* Copy 'size' bytes from userspace and return `size` back to userspace */
> 4126 int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
4127 void __user *arg, size_t size)
4128 {
4129 void *ptr;
4130 int ret;
4131
4132 ptr = kmalloc(size, GFP_KERNEL);
4133 if (!ptr)
4134 return -ENOMEM;
4135
4136 if (copy_from_user(ptr, arg, size)) {
4137 ret = -EFAULT;
4138 goto out;
4139 }
4140
4141 ret = sk->sk_prot->ioctl(sk, cmd, ptr);
4142 if (ret)
4143 goto out;
4144
4145 if (copy_to_user(arg, ptr, size))
4146 ret = -EFAULT;
4147
4148 out:
4149 kfree(ptr);
4150 return ret;
4151 }
4152
4153 /* This is the most common ioctl prep function, where the result (4 bytes) is
4154 * copied back to userspace if the ioctl() returns successfully. No input is
4155 * copied from userspace as input argument.
4156 */
> 4157 int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
4158 void __user *arg)
4159 {
4160 int ret, karg = 0;
4161
4162 ret = sk->sk_prot->ioctl(sk, cmd, &karg);
4163 if (ret)
4164 return ret;
4165
4166 return put_user(karg, (int __user *)arg);
4167 }
4168

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Attachments:
(No filename) (4.34 kB)
config (295.63 kB)
Download all attachments

2023-05-19 17:09:27

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

On Fri, May 19, 2023 at 11:39 AM Breno Leitao <[email protected]> wrote:
>
> On Fri, May 19, 2023 at 11:09:29AM -0400, Willem de Bruijn wrote:
> > On Fri, May 19, 2023 at 9:59 AM Breno Leitao <[email protected]> wrote:
> > >
> > > Most of the ioctls to net protocols operates directly on userspace
> > > argument (arg). Usually doing get_user()/put_user() directly in the
> > > ioctl callback. This is not flexible, because it is hard to reuse these
> > > functions without passing userspace buffers.
> > >
> > > Change the "struct proto" ioctls to avoid touching userspace memory and
> > > operate on kernel buffers, i.e., all protocol's ioctl callbacks is
> > > adapted to operate on a kernel memory other than on userspace (so, no
> > > more {put,get}_user() and friends being called in the ioctl callback).
> > >
> > > This changes the "struct proto" ioctl format in the following way:
> > >
> > > int (*ioctl)(struct sock *sk, int cmd,
> > > - unsigned long arg);
> > > + int *karg);
> > >
> > > So, the "karg" argument, which is passed to the ioctl callback, is a
> > > pointer allocated to kernel space memory (inside a function wrapper -
> > > sock_skprot_ioctl()). This buffer (karg) may contain input argument
> > > (copied from userspace in a prep function) and it might return a
> > > value/buffer, which is copied back to userspace if necessary. There is
> > > not one-size-fits-all format (that is I am using 'may' above), but
> > > basically, there are three type of ioctls:
> > >
> > > 1) Do not read from userspace, returns a result to userspace
> > > 2) Read an input parameter from userspace, and does not return anything
> > > to userspace
> > > 3) Read an input from userspace, and return a buffer to userspace.
> > >
> > > The default case (1) (where no input parameter is given, and an "int" is
> > > returned to userspace) encompasses more than 90% of the cases, but there
> > > are two other exceptions. Here is a list of exceptions:
> > >
> > > * Protocol RAW:
> > > * cmd = SIOCGETVIFCNT:
> > > * input and output = struct sioc_vif_req
> > > * cmd = SIOCGETSGCNT
> > > * input and output = struct sioc_sg_req
> > > * Explanation: for the SIOCGETVIFCNT case, userspace passes the input
> > > argument, which is struct sioc_vif_req. Then the callback populates
> > > the struct, which is copied back to userspace.
> > >
> > > * Protocol RAW6:
> > > * cmd = SIOCGETMIFCNT_IN6
> > > * input and output = struct sioc_mif_req6
> > > * cmd = SIOCGETSGCNT_IN6
> > > * input and output = struct sioc_sg_req6
> > >
> > > * Protocol PHONET:
> > > * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
> > > * input int (4 bytes)
> > > * Nothing is copied back to userspace.
> > >
> > > For the exception cases, functions sock_skproto_ioctl_in{out}() will
> > > copy the userspace input, and copy it back to kernel space.
> > >
> > > The wrapper that prepares the buffer and puts the buffer back to user is
> > > sock_skprot_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the
> > > callee now calls sock_skprot_ioctl().
> > >
> > > Signed-off-by: Breno Leitao <[email protected]>
> >
> > Overall this looks great to me.
>
> Thanks for the guidance and quick review!
>
> >
> > Thanks for the detailed commit message that lists all exceptions, Bruno.
> >
> > Since that is a limited well understood list, I'm not in favor of the
> > suggestion to add an explicit length argument that then needs to be
> > checked in each callee.
> >
> > > +/* Copy 'size' bytes from userspace and return `size` back to userspace */
> > > +int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
> > > + void __user *arg, size_t size)
> > > +{
> > > + void *ptr;
> > > + int ret;
> > > +
> > > + ptr = kmalloc(size, GFP_KERNEL);
> > > + if (!ptr)
> > > + return -ENOMEM;
> >
> > > +/* A wrapper around sock ioctls, which copies the data from userspace
> > > + * (depending on the protocol/ioctl), and copies back the result to userspace.
> > > + * The main motivation for this function is to pass kernel memory to the
> > > + * protocol ioctl callsback, instead of userspace memory.
> > > + */
> > > +int sock_skprot_ioctl(struct sock *sk, unsigned int cmd,
> > > + void __user *arg)
> > > +{
> > > +#ifdef CONFIG_IP_MROUTE
> > > + if (!strcmp(sk->sk_prot->name, "RAW")) {
> >
> > This must check both sk_family and sk_protocol. That is preferable
> > over string match.
> >
> > For these exception cases, instead of having sock_skproto_ioctl_inout
> > dynamically allocate the struct, how about stack allocating them here
> > and passing to the function?
>
> Should I stack allocate all the 4 structures sock_skprot_ioctl and pass
> them to sock_skproto_ioctl_inout() together with the size? (using the
> original name to avoid confusion - will rename in V2)
>
> I mean, writing something as:
>
> int sock_skprot_ioctl(struct sock *sk, unsigned int cmd
> void __user *arg`
> {
> struct sioc_vif_req sioc_vif_req_arg;
> struct sioc_sg_req sioc_sg_req_arg;
> struct sioc_mif_req6 sioc_mif_req6_arg;
> struct sioc_sg_req6 sioc_sg_req6_arg;
>
> ..
>
> if (!strcmp(sk->sk_prot->name, "RAW6")) {
> switch (cmd) {
> case SIOCGETMIFCNT_IN6:
> return sock_skproto_ioctl_inout(sk, cmd,
> arg, &sioc_mif_req6_arg, sizeof(sioc_mif_req6_arg);
> case SIOCGETSGCNT_IN6:
> return sock_skproto_ioctl_inout(sk, cmd,
> arg, &sioc_sg_req6_arg, sizeof(sioc_sg_req6_arg));
> }
> }
> ...
> }

Slight preference for using braces in the individual case statements
and defining the variables in that block scope. See for instance
do_tcp_setsockopt.

Btw, no need for a cover letter for a single patch.

2023-05-19 20:06:34

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

Hi Breno,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mptcp/export]
[also build test WARNING on mptcp/export-net net-next/main net/main linus/master v6.4-rc2 next-20230519]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Breno-Leitao/net-ioctl-Use-kernel-memory-on-protocol-ioctl-callbacks/20230519-223824
base: https://github.com/multipath-tcp/mptcp_net-next.git export
patch link: https://lore.kernel.org/r/20230519135821.922326-2-leitao%40debian.org
patch subject: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks
config: hexagon-randconfig-r045-20230517
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project b0fb98227c90adf2536c9ad644a74d5e92961111)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/95a4f4f84bf68692f5b42921c9f6067c0986aed8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Breno-Leitao/net-ioctl-Use-kernel-memory-on-protocol-ioctl-callbacks/20230519-223824
git checkout 95a4f4f84bf68692f5b42921c9f6067c0986aed8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash net/core/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

In file included from net/core/sock.c:91:
In file included from include/linux/errqueue.h:6:
In file included from include/net/ip.h:22:
In file included from include/linux/ip.h:16:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from net/core/sock.c:91:
In file included from include/linux/errqueue.h:6:
In file included from include/net/ip.h:22:
In file included from include/linux/ip.h:16:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from net/core/sock.c:91:
In file included from include/linux/errqueue.h:6:
In file included from include/net/ip.h:22:
In file included from include/linux/ip.h:16:
In file included from include/linux/skbuff.h:17:
In file included from include/linux/bvec.h:10:
In file included from include/linux/highmem.h:12:
In file included from include/linux/hardirq.h:11:
In file included from ./arch/hexagon/include/generated/asm/hardirq.h:1:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:13:
In file included from arch/hexagon/include/asm/io.h:334:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
>> net/core/sock.c:4114:5: warning: no previous prototype for function 'sock_skproto_ioctl_in' [-Wmissing-prototypes]
int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
^
net/core/sock.c:4114:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
^
static
>> net/core/sock.c:4126:5: warning: no previous prototype for function 'sock_skproto_ioctl_inout' [-Wmissing-prototypes]
int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
^
net/core/sock.c:4126:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
^
static
>> net/core/sock.c:4157:5: warning: no previous prototype for function 'sock_skproto_ioctl_out' [-Wmissing-prototypes]
int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
^
net/core/sock.c:4157:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
^
static
9 warnings generated.


vim +/sock_skproto_ioctl_in +4114 net/core/sock.c

4112
4113 /* Copy 'size' bytes from userspace and do not copy anything back */
> 4114 int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
4115 void __user *arg)
4116 {
4117 int karg;
4118
4119 if (get_user(karg, (u32 __user *)arg))
4120 return -EFAULT;
4121
4122 return sk->sk_prot->ioctl(sk, cmd, &karg);
4123 }
4124
4125 /* Copy 'size' bytes from userspace and return `size` back to userspace */
> 4126 int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
4127 void __user *arg, size_t size)
4128 {
4129 void *ptr;
4130 int ret;
4131
4132 ptr = kmalloc(size, GFP_KERNEL);
4133 if (!ptr)
4134 return -ENOMEM;
4135
4136 if (copy_from_user(ptr, arg, size)) {
4137 ret = -EFAULT;
4138 goto out;
4139 }
4140
4141 ret = sk->sk_prot->ioctl(sk, cmd, ptr);
4142 if (ret)
4143 goto out;
4144
4145 if (copy_to_user(arg, ptr, size))
4146 ret = -EFAULT;
4147
4148 out:
4149 kfree(ptr);
4150 return ret;
4151 }
4152
4153 /* This is the most common ioctl prep function, where the result (4 bytes) is
4154 * copied back to userspace if the ioctl() returns successfully. No input is
4155 * copied from userspace as input argument.
4156 */
> 4157 int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
4158 void __user *arg)
4159 {
4160 int ret, karg = 0;
4161
4162 ret = sk->sk_prot->ioctl(sk, cmd, &karg);
4163 if (ret)
4164 return ret;
4165
4166 return put_user(karg, (int __user *)arg);
4167 }
4168

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Attachments:
(No filename) (9.09 kB)
config (119.09 kB)
Download all attachments

2023-05-19 23:54:39

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

Hi Breno,

kernel test robot noticed the following build errors:

[auto build test ERROR on mptcp/export]
[also build test ERROR on mptcp/export-net net-next/main net/main linus/master v6.4-rc2 next-20230519]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Breno-Leitao/net-ioctl-Use-kernel-memory-on-protocol-ioctl-callbacks/20230519-223824
base: https://github.com/multipath-tcp/mptcp_net-next.git export
patch link: https://lore.kernel.org/r/20230519135821.922326-2-leitao%40debian.org
patch subject: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks
config: powerpc-allmodconfig
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/95a4f4f84bf68692f5b42921c9f6067c0986aed8
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Breno-Leitao/net-ioctl-Use-kernel-memory-on-protocol-ioctl-callbacks/20230519-223824
git checkout 95a4f4f84bf68692f5b42921c9f6067c0986aed8
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "sock_skprot_ioctl" [net/ipv6/ipv6.ko] undefined!
>> ERROR: modpost: "sock_skprot_ioctl" [net/phonet/phonet.ko] undefined!
>> ERROR: modpost: "sock_skprot_ioctl" [net/ieee802154/ieee802154_socket.ko] undefined!

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Attachments:
(No filename) (2.27 kB)
config (337.98 kB)
Download all attachments

2023-05-20 04:11:38

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

On 5/19/23 7:58 AM, Breno Leitao wrote:
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5440e67bcfe3..47567909baf2 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -4106,3 +4109,107 @@ int sock_bind_add(struct sock *sk, struct sockaddr *addr, int addr_len)
> return sk->sk_prot->bind_add(sk, addr, addr_len);
> }
> EXPORT_SYMBOL(sock_bind_add);
> +
> +/* Copy 'size' bytes from userspace and do not copy anything back */
> +int sock_skproto_ioctl_in(struct sock *sk, unsigned int cmd,
> + void __user *arg)

Unless I missed a reference, this one, the _inout, and the _out below
can be marked static - they are not need outside of sock.c.


> +{
> + int karg;
> +
> + if (get_user(karg, (u32 __user *)arg))
> + return -EFAULT;
> +
> + return sk->sk_prot->ioctl(sk, cmd, &karg);
> +}
> +
> +/* Copy 'size' bytes from userspace and return `size` back to userspace */
> +int sock_skproto_ioctl_inout(struct sock *sk, unsigned int cmd,
> + void __user *arg, size_t size)
> +{
> + void *ptr;
> + int ret;
> +
> + ptr = kmalloc(size, GFP_KERNEL);
> + if (!ptr)
> + return -ENOMEM;
> +
> + if (copy_from_user(ptr, arg, size)) {
> + ret = -EFAULT;
> + goto out;
> + }
> +
> + ret = sk->sk_prot->ioctl(sk, cmd, ptr);
> + if (ret)
> + goto out;
> +
> + if (copy_to_user(arg, ptr, size))
> + ret = -EFAULT;
> +
> +out:
> + kfree(ptr);
> + return ret;
> +}
> +
> +/* This is the most common ioctl prep function, where the result (4 bytes) is
> + * copied back to userspace if the ioctl() returns successfully. No input is
> + * copied from userspace as input argument.
> + */
> +int sock_skproto_ioctl_out(struct sock *sk, unsigned int cmd,
> + void __user *arg)
> +{
> + int ret, karg = 0;
> +
> + ret = sk->sk_prot->ioctl(sk, cmd, &karg);
> + if (ret)
> + return ret;
> +
> + return put_user(karg, (int __user *)arg);
> +}
> +
> +/* A wrapper around sock ioctls, which copies the data from userspace
> + * (depending on the protocol/ioctl), and copies back the result to userspace.
> + * The main motivation for this function is to pass kernel memory to the
> + * protocol ioctl callsback, instead of userspace memory.
> + */
> +int sock_skprot_ioctl(struct sock *sk, unsigned int cmd,
> + void __user *arg)
> +{
> +#ifdef CONFIG_IP_MROUTE
> + if (!strcmp(sk->sk_prot->name, "RAW")) {
> + switch (cmd) {
> + /* These userspace buffers will be consumed by ipmr_ioctl() */
> + case SIOCGETVIFCNT:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_vif_req));
> + case SIOCGETSGCNT:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_sg_req));
> + }
> + }
> +#endif
> +#ifdef CONFIG_IPV6_MROUTE
> + if (!strcmp(sk->sk_prot->name, "RAW6")) {
> + switch (cmd) {
> + /* These userspace buffers will be consumed by ip6mr_ioctl() */
> + case SIOCGETMIFCNT_IN6:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_mif_req6));
> + case SIOCGETSGCNT_IN6:
> + return sock_skproto_ioctl_inout(sk, cmd,
> + arg, sizeof(struct sioc_sg_req6));
> + }
> + }
> +#endif
> +#ifdef CONFIG_PHONET
> + if (!strcmp(sk->sk_prot->name, "PHONET")) {
> + /* This userspace buffers will be consumed by pn_ioctl() */
> + switch (cmd) {
> + case SIOCPNADDRESOURCE:
> + case SIOCPNDELRESOURCE:
> + return sock_skproto_ioctl_in(sk, cmd, arg);
> + }
> + }
> +#endif
> +
> + return sock_skproto_ioctl_out(sk, cmd, arg);
> +}



2023-05-20 12:51:52

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

From: Willem de Bruijn
> Sent: 19 May 2023 18:05
...
> > Should I stack allocate all the 4 structures sock_skprot_ioctl and pass
> > them to sock_skproto_ioctl_inout() together with the size? (using the
> > original name to avoid confusion - will rename in V2)
> >
> > I mean, writing something as:
> >
> > int sock_skprot_ioctl(struct sock *sk, unsigned int cmd
> > void __user *arg`
> > {
> > struct sioc_vif_req sioc_vif_req_arg;
> > struct sioc_sg_req sioc_sg_req_arg;
> > struct sioc_mif_req6 sioc_mif_req6_arg;
> > struct sioc_sg_req6 sioc_sg_req6_arg;
> >
> > ..
> >
> > if (!strcmp(sk->sk_prot->name, "RAW6")) {
> > switch (cmd) {
> > case SIOCGETMIFCNT_IN6:
> > return sock_skproto_ioctl_inout(sk, cmd,
> > arg, &sioc_mif_req6_arg, sizeof(sioc_mif_req6_arg);
> > case SIOCGETSGCNT_IN6:
> > return sock_skproto_ioctl_inout(sk, cmd,
> > arg, &sioc_sg_req6_arg, sizeof(sioc_sg_req6_arg));
> > }
> > }
> > ...
> > }
>
> Slight preference for using braces in the individual case statements
> and defining the variables in that block scope. See for instance
> do_tcp_setsockopt.

Beware of stack bloat especially under KASAN (etc).
It might be better to use a union.
Then the switch statement only need determine the required length.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2023-05-20 13:20:14

by David Laight

[permalink] [raw]
Subject: RE: [PATCH 1/1] net: ioctl: Use kernel memory on protocol ioctl callbacks

From: Willem de Bruijn
> Sent: 19 May 2023 16:09
...
> Since that is a limited well understood list, I'm not in favor of the
> suggestion to add an explicit length argument that then needs to be
> checked in each callee.

While calls from userspace and direct calls from drivers can be
reasonably expected to have the required length buffer, I'm
not sure that is guaranteed for indirect calls via io_uring
and bpf.
In those cases the associated length is likely to come from
userspace and a suitably sized kernel buffer allocated.
So something needs to ensure the buffer is long enough
(and, indeed, not stupidly long).

Now you could require that the caller always supply a buffer
of at least (say) 64 bytes as well as the actual length.
Then only callee functions that have a long buffer need check.

An alternate option is to define a union of all the valid
argument types and require that any code making 'unknown'
requests supply a kernel buffer of that length.
(With due care taken to avoid overlong copies of uninitialised
kernel memory back to userspace.)

The same union would be useful as an upper bound for the
kernel buffer size - even if it is too large to always
allocate on stack.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)