2018-07-26 02:36:20

by Dmitry Safonov

Subject: [PATCH 00/18] xfrm: Add compat layer

Due to a historical mistake, the xfrm user ABI differs between native and
compat applications. The difference is in structure padding and, as a
result, in the size of netlink messages.
As the ABI is already visible to userspace, it cannot be fixed by packing
the structures.
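
To illustrate the kind of mismatch described above, here is a minimal
sketch. The structures are hypothetical (they are not taken from the xfrm
uapi header); they only show how a 64-bit member changes tail padding,
because a u64 inside a struct is 4-byte aligned on i386 and 8-byte
aligned on x86-64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration, not a real xfrm structure. */
struct demo_info {
	uint64_t bytes;		/* 8-byte aligned on x86-64, 4-byte on i386 */
	uint32_t flags;
};				/* 16 bytes on x86-64, 12 bytes on i386 */

struct demo_info_packed {
	uint64_t bytes;
	uint32_t flags;
} __attribute__((packed));	/* 12 bytes for every ABI */

/* Padding bytes the native ABI adds on top of the packed layout. */
size_t demo_tail_padding(void)
{
	return sizeof(struct demo_info) - sizeof(struct demo_info_packed);
}
```

Building this once natively and once with -m32 should show the native
structure changing size while the packed one stays the same.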

The ability of compat applications to manage xfrm tunnels was
disabled by commit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
userspace socket policies on 64 bit systems") and commit 74005991b78a
("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").

For some wonderful reasons and brilliant architecture decisions made when
creating the userspace, Arista switches still run a 32-bit userspace
on a 64-bit kernel. There is a slow movement towards a full 64-bit build,
but it's not there yet. As the switches need support for ipsec tunnels,
the local kernel reverts the mentioned patches that disable xfrm for
compat apps. On top of that there is a bunch of disgraceful hacks
in userspace to work around the size check for netlink messages
and all that jazz.

It looks like we're not the only ones who want compat xfrm;
there have been a couple of attempts to make it work:
https://lkml.org/lkml/2017/1/20/733
https://patchwork.ozlabs.org/patch/44600/
http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctly-parse-netlink-msg-from-32bits-ip-command-on-64bits-host

All the discussions end with the conclusion that xfrm should have a full
compat layer to work correctly with 32-bit applications on 64-bit
kernels:
https://lkml.org/lkml/2017/1/23/413
https://patchwork.ozlabs.org/patch/433279/

In a recent lkml discussion, Linus said that it's worth fixing this
problem rather than giving people an excuse to stay on a 32-bit kernel:
https://lkml.org/lkml/2018/2/13/752

So, here I add a compat layer to xfrm.
As xfrm uses netlink notifications, the kernel should send them in the
ABI format that the receiving application will parse. The proposed
solution is to remember the ABI of the bind() syscall. As an
implementation detail, this is done by creating kernel-hidden netlink
groups, not visible to userspace, for compat applications.

The first two patches simplify ifdeffery, and while I've already submitted
them a while ago, I'm resending them for completeness:
https://lore.kernel.org/lkml/[email protected]/T/#u

There is also an exhaustive selftest for ipsec tunnels, which also checks
that the kernel correctly parses the structures that differ in size.
It doesn't depend on any library, and the compat version can be easily
built with: make CFLAGS=-m32 net/ipsec

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: [email protected]

Dmitry Safonov (18):
x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
compat: Cleanup in_compat_syscall() callers
selftest/net/xfrm: Add test for ipsec tunnel
net/xfrm: Add _packed types for compat users
net/xfrm: Parse userspi_info{,_packed} depending on syscall
netlink: Do not subscribe to non-existent groups
netlink: Pass groups pointer to .bind()
xfrm: Add in-kernel groups for compat notifications
xfrm: Dump usersa_info in compat/native formats
xfrm: Send state notifications in compat format too
xfrm: Add compat support for xfrm_user_expire messages
xfrm: Add compat support for xfrm_userpolicy_info messages
xfrm: Add compat support for xfrm_user_acquire messages
xfrm: Add compat support for xfrm_user_polexpire messages
xfrm: Check compat acquire listeners in xfrm_is_alive()
xfrm: Notify compat listeners about policy flush
xfrm: Notify compat listeners about state flush
xfrm: Enable compat syscalls

MAINTAINERS | 1 +
arch/x86/include/asm/compat.h | 9 +-
arch/x86/include/asm/ftrace.h | 4 +-
arch/x86/kernel/process_64.c | 4 +-
arch/x86/kernel/sys_x86_64.c | 11 +-
arch/x86/mm/hugetlbpage.c | 4 +-
arch/x86/mm/mmap.c | 2 +-
drivers/firmware/efi/efivars.c | 16 +-
include/linux/compat.h | 4 +-
include/linux/netlink.h | 2 +-
include/net/xfrm.h | 14 -
kernel/audit.c | 2 +-
kernel/time/time.c | 2 +-
net/core/rtnetlink.c | 14 +-
net/core/sock_diag.c | 25 +-
net/netfilter/nfnetlink.c | 24 +-
net/netlink/af_netlink.c | 28 +-
net/netlink/af_netlink.h | 4 +-
net/netlink/genetlink.c | 26 +-
net/xfrm/xfrm_state.c | 5 -
net/xfrm/xfrm_user.c | 690 ++++++++---
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ipsec.c | 1987 ++++++++++++++++++++++++++++++++
24 files changed, 2612 insertions(+), 268 deletions(-)
create mode 100644 tools/testing/selftests/net/ipsec.c

--
2.13.6



2018-07-26 02:33:10

by Dmitry Safonov

Subject: [PATCH 06/18] netlink: Do not subscribe to non-existent groups

Make the ABI stricter about subscribing to group > ngroups.
The code doesn't check for that, which looks bogus
(one can subscribe to a non-existent group).
It's still possible to bind() to all possible groups with (-1).

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/netlink/af_netlink.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 393573a99a5a..ac805caed2e2 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1008,6 +1008,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
if (err)
return err;
}
+ groups &= (1UL << nlk->ngroups) - 1;

bound = nlk->bound;
if (bound) {
--
2.13.6
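
A note on the mask added above: in C, shifting an unsigned long by its
full width or more is undefined, so the expression is only well defined
while ngroups is smaller than the width of long. The sketch below is a
userspace model of the same masking with that case guarded. The helper
names valid_groups_mask() and clamp_groups() are hypothetical, not kernel
functions.

```c
#include <limits.h>

#define BITS_PER_ULONG	(sizeof(unsigned long) * CHAR_BIT)

/* Mask with the low 'ngroups' bits set; all bits if ngroups fills the word. */
unsigned long valid_groups_mask(unsigned int ngroups)
{
	if (ngroups >= BITS_PER_ULONG)
		return ~0UL;
	return (1UL << ngroups) - 1;
}

/* Drop subscription bits for groups that do not exist. */
unsigned long clamp_groups(unsigned long groups, unsigned int ngroups)
{
	return groups & valid_groups_mask(ngroups);
}
```

With this shape, a bind() with groups == (-1) still subscribes to every
existing group, and bits above ngroups are silently dropped.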


2018-07-26 02:33:10

by Dmitry Safonov

Subject: [PATCH 02/18] compat: Cleanup in_compat_syscall() callers

Now that in_compat_syscall() == false on native i686, it's possible to
remove some ifdeffery and helpers that are no longer needed.

Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
drivers/firmware/efi/efivars.c | 16 ++++------------
kernel/time/time.c | 2 +-
net/xfrm/xfrm_state.c | 2 --
net/xfrm/xfrm_user.c | 2 --
4 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c
index 3e626fd9bd4e..8061667a6765 100644
--- a/drivers/firmware/efi/efivars.c
+++ b/drivers/firmware/efi/efivars.c
@@ -229,14 +229,6 @@ sanity_check(struct efi_variable *var, efi_char16_t *name, efi_guid_t vendor,
return 0;
}

-static inline bool is_compat(void)
-{
- if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
- return true;
-
- return false;
-}
-
static void
copy_out_compat(struct efi_variable *dst, struct compat_efi_variable *src)
{
@@ -263,7 +255,7 @@ efivar_store_raw(struct efivar_entry *entry, const char *buf, size_t count)
u8 *data;
int err;

- if (is_compat()) {
+ if (in_compat_syscall()) {
struct compat_efi_variable *compat;

if (count != sizeof(*compat))
@@ -324,7 +316,7 @@ efivar_show_raw(struct efivar_entry *entry, char *buf)
&entry->var.DataSize, entry->var.Data))
return -EIO;

- if (is_compat()) {
+ if (in_compat_syscall()) {
compat = (struct compat_efi_variable *)buf;

size = sizeof(*compat);
@@ -418,7 +410,7 @@ static ssize_t efivar_create(struct file *filp, struct kobject *kobj,
struct compat_efi_variable *compat = (struct compat_efi_variable *)buf;
struct efi_variable *new_var = (struct efi_variable *)buf;
struct efivar_entry *new_entry;
- bool need_compat = is_compat();
+ bool need_compat = in_compat_syscall();
efi_char16_t *name;
unsigned long size;
u32 attributes;
@@ -495,7 +487,7 @@ static ssize_t efivar_delete(struct file *filp, struct kobject *kobj,
if (!capable(CAP_SYS_ADMIN))
return -EACCES;

- if (is_compat()) {
+ if (in_compat_syscall()) {
if (count != sizeof(*compat))
return -EINVAL;

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 2b41e8e2d31d..d59caa6d03e6 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -865,7 +865,7 @@ int get_timespec64(struct timespec64 *ts,
ts->tv_sec = kts.tv_sec;

/* Zero out the padding for 32 bit systems or in compat mode */
- if (IS_ENABLED(CONFIG_64BIT_TIME) && (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()))
+ if (IS_ENABLED(CONFIG_64BIT_TIME) && in_compat_syscall())
kts.tv_nsec &= 0xFFFFFFFFUL;

ts->tv_nsec = kts.tv_nsec;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8308281f3253..3f48a6925606 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2057,10 +2057,8 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
struct xfrm_mgr *km;
struct xfrm_policy *pol = NULL;

-#ifdef CONFIG_COMPAT
if (in_compat_syscall())
return -EOPNOTSUPP;
-#endif

if (!optval && !optlen) {
xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 080035f056d9..2677cb55b7a8 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2546,10 +2546,8 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
const struct xfrm_link *link;
int type, err;

-#ifdef CONFIG_COMPAT
if (in_compat_syscall())
return -EOPNOTSUPP;
-#endif

type = nlh->nlmsg_type;
if (type > XFRM_MSG_MAX)
--
2.13.6


2018-07-26 02:33:14

by Dmitry Safonov

Subject: [PATCH 08/18] xfrm: Add in-kernel groups for compat notifications

Introduce kernel-only groups, hidden from userspace.
An application that the kernel bind()s to such a group will receive
netlink messages in the compat ABI on 64-bit kernels.

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index bf2ca93edaf5..b123e788488f 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -67,6 +67,29 @@ struct xfrm_userspi_info_packed {
__u32 max;
} __packed;

+/* In-kernel, non-uapi compat groups.
+ * As compat/native messages differ, send notifications according
+ * to .bind() caller's ABI. There are *_COMPAT hidden from userspace
+ * groups for such task.
+ */
+enum xfrm_nlgroups_kernel {
+ XFRMNLGRP_COMPAT_MIN = XFRMNLGRP_MAX,
+ XFRMNLGRP_COMPAT_ACQUIRE,
+ XFRMNLGRP_COMPAT_EXPIRE,
+ XFRMNLGRP_COMPAT_SA,
+ XFRMNLGRP_COMPAT_POLICY,
+ /* Group messages for the following notifications do not differ
+ * in size between native and compat structures:
+ * XFRMNLGRP_AEVENTS,
+ * XFRMNLGRP_REPORT,
+ * XFRMNLGRP_MIGRATE,
+ * XFRMNLGRP_MAPPING,
+ */
+ __XFRMNLGRP_COMPAT_MAX
+};
+
+#define XFRMNLGRP_KERNEL_MAX (__XFRMNLGRP_COMPAT_MAX - 1)
+
static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
{
struct nlattr *rt = attrs[type];
@@ -2645,6 +2668,34 @@ static void xfrm_netlink_rcv(struct sk_buff *skb)
mutex_unlock(&net->xfrm.xfrm_cfg_mutex);
}

+static inline void xfrm_nlgrp_compat(unsigned long *groups,
+ int group, int group_compat)
+{
+ unsigned long group_bit = 1UL << (group - 1);
+
+ if (*groups & group_bit) {
+ *groups &= ~group_bit;
+ *groups |= 1UL << (group_compat - 1);
+ }
+}
+
+static int xfrm_netlink_bind(struct net *net, unsigned long *groups)
+{
+ unsigned long uapi_mask = (1UL << XFRMNLGRP_MAX) - 1;
+
+ *groups &= uapi_mask;
+
+ if (!in_compat_syscall())
+ return 0;
+
+ xfrm_nlgrp_compat(groups, XFRMNLGRP_ACQUIRE, XFRMNLGRP_COMPAT_ACQUIRE);
+ xfrm_nlgrp_compat(groups, XFRMNLGRP_EXPIRE, XFRMNLGRP_COMPAT_EXPIRE);
+ xfrm_nlgrp_compat(groups, XFRMNLGRP_SA, XFRMNLGRP_COMPAT_SA);
+ xfrm_nlgrp_compat(groups, XFRMNLGRP_POLICY, XFRMNLGRP_COMPAT_POLICY);
+
+ return 0;
+}
+
static inline unsigned int xfrm_expire_msgsize(void)
{
return NLMSG_ALIGN(sizeof(struct xfrm_user_expire))
@@ -3283,8 +3334,9 @@ static int __net_init xfrm_user_net_init(struct net *net)
{
struct sock *nlsk;
struct netlink_kernel_cfg cfg = {
- .groups = XFRMNLGRP_MAX,
+ .groups = XFRMNLGRP_KERNEL_MAX,
.input = xfrm_netlink_rcv,
+ .bind = xfrm_netlink_bind,
};

nlsk = netlink_kernel_create(net, NETLINK_XFRM, &cfg);
--
2.13.6
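
The remapping done at bind() time can be tried out in userspace with the
small model below. It mirrors the bit manipulation of xfrm_nlgrp_compat()
from the patch. The group numbers and the demo_* names are assumptions
made for the example; the real values come from the uapi enum and the
in-kernel enum above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical group numbers, chosen only for this model. */
enum { DEMO_GRP_SA = 3, DEMO_GRP_COMPAT_SA = 10 };

/* Move a subscription bit from a native group to its compat twin. */
void demo_nlgrp_compat(unsigned long *groups, int group, int group_compat)
{
	unsigned long group_bit;

	assert(group >= 1 && group_compat >= 1);
	group_bit = 1UL << (group - 1);

	if (*groups & group_bit) {
		*groups &= ~group_bit;
		*groups |= 1UL << (group_compat - 1);
	}
}

/* Model of the bind hook: only compat callers are remapped. */
void demo_bind(unsigned long *groups, bool compat_caller)
{
	if (!compat_caller)
		return;
	demo_nlgrp_compat(groups, DEMO_GRP_SA, DEMO_GRP_COMPAT_SA);
}
```

A compat caller asking for the SA group ends up subscribed to the hidden
compat group instead; a native caller keeps its mask unchanged.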


2018-07-26 02:33:21

by Dmitry Safonov

Subject: [PATCH 17/18] xfrm: Notify compat listeners about state flush

Notify two groups of listeners:
XFRMNLGRP_SA - applications that use the native UABI for messages;
XFRMNLGRP_COMPAT_SA - applications that use the compat UABI for messages.

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7bba0638c014..7e3a132b76fb 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2879,7 +2879,7 @@ static int xfrm_aevent_state_notify(struct xfrm_state *x, const struct km_event
return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_AEVENTS);
}

-static int xfrm_notify_sa_flush(const struct km_event *c)
+static int __xfrm_notify_sa_flush(const struct km_event *c, unsigned int group)
{
struct net *net = c->net;
struct xfrm_usersa_flush *p;
@@ -2902,7 +2902,16 @@ static int xfrm_notify_sa_flush(const struct km_event *c)

nlmsg_end(skb, nlh);

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_SA);
+ return xfrm_nlmsg_multicast(net, skb, 0, group);
+}
+
+static int xfrm_notify_sa_flush(const struct km_event *c)
+{
+ int ret = __xfrm_notify_sa_flush(c, XFRMNLGRP_SA);
+
+ if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return ret;
+ return __xfrm_notify_sa_flush(c, XFRMNLGRP_COMPAT_SA);
}

static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
--
2.13.6
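
The control flow of the wrapper above (send to the native group, treat
-ESRCH as "no listeners" rather than as a failure, then send to the
compat group) can be modelled in userspace as follows. The demo_* names
and the stub sender are hypothetical; only the error handling follows
the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sender type: returns 0 or a negative errno. */
typedef int (*demo_send_fn)(unsigned int group);

/* Stub sender: group 1 has no listeners, group 2 fails, others succeed. */
int demo_send(unsigned int group)
{
	if (group == 1)
		return -ESRCH;
	if (group == 2)
		return -ENOMEM;
	return 0;
}

int demo_notify_both(demo_send_fn send, unsigned int native_group,
		     unsigned int compat_group, bool compat_enabled)
{
	int ret = send(native_group);

	/* A real failure on the native group stops here; so does !COMPAT. */
	if ((ret && ret != -ESRCH) || !compat_enabled)
		return ret;
	return send(compat_group);
}
```

Note that with compat enabled the caller sees the result of the second
send, so -ESRCH is returned whenever the compat group has no listeners.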


2018-07-26 02:33:31

by Dmitry Safonov

Subject: [PATCH 16/18] xfrm: Notify compat listeners about policy flush

Notify two groups of listeners:
XFRMNLGRP_POLICY - applications that use the native UABI for messages;
XFRMNLGRP_COMPAT_POLICY - applications that use the compat UABI for messages.

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 752019963b1e..7bba0638c014 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3368,7 +3368,8 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir,
return __xfrm_notify_policy(xp, dir, c, true);
}

-static int xfrm_notify_policy_flush(const struct km_event *c)
+static int __xfrm_notify_policy_flush(const struct km_event *c,
+ unsigned int group)
{
struct net *net = c->net;
struct nlmsghdr *nlh;
@@ -3389,13 +3390,22 @@ static int xfrm_notify_policy_flush(const struct km_event *c)

nlmsg_end(skb, nlh);

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_POLICY);
+ return xfrm_nlmsg_multicast(net, skb, 0, group);

out_free_skb:
kfree_skb(skb);
return err;
}

+static int xfrm_notify_policy_flush(const struct km_event *c)
+{
+ int ret = __xfrm_notify_policy_flush(c, XFRMNLGRP_POLICY);
+
+ if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return ret;
+ return __xfrm_notify_policy_flush(c, XFRMNLGRP_COMPAT_POLICY);
+}
+
static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, const struct km_event *c)
{

--
2.13.6


2018-07-26 02:33:36

by Dmitry Safonov

Subject: [PATCH 18/18] xfrm: Enable compat syscalls

Compatible syscalls were disabled for xfrm with the following commits:
19d7df69fdb2 ("xfrm: Refuse to insert 32 bit userspace socket policies
on 64 bit systems") and 74005991b78a ("xfrm: Do not parse 32bits
compiled xfrm netlink msg on 64bits host").

As some structures in the xfrm uapi header were not packed by mistake,
they differ in size between 64-bit and 32-bit applications:

32-bit UABI | 64-bit UABI
--------------------------------------|--------------------------------------
sizeof(xfrm_usersa_info) = 220 | sizeof(xfrm_usersa_info) = 224
sizeof(xfrm_userpolicy_info) = 164 | sizeof(xfrm_userpolicy_info) = 168
sizeof(xfrm_userspi_info) = 228 | sizeof(xfrm_userspi_info) = 232
sizeof(xfrm_user_acquire) = 276 | sizeof(xfrm_user_acquire) = 280
sizeof(xfrm_user_expire) = 224 | sizeof(xfrm_user_expire) = 232
sizeof(xfrm_user_polexpire) = 168 | sizeof(xfrm_user_polexpire) = 176

With the previous patches a compat layer was added to xfrm, so now we
support users of both ABIs. A selftest that checks ipsec tunnels is
present in net/ipsec. It can be easily compiled as a compat application
and doesn't require any compat libraries.

Revert the mentioned commits and check the size of a received message
according to whether the syscall is native or compat.

Cc: "David S. Miller" <[email protected]>
Cc: Fan Du <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_state.c | 3 ---
net/xfrm/xfrm_user.c | 35 ++++++++++++++++++++++++++++++-----
2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3f48a6925606..515a565bfc37 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2057,9 +2057,6 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
struct xfrm_mgr *km;
struct xfrm_policy *pol = NULL;

- if (in_compat_syscall())
- return -EOPNOTSUPP;
-
if (!optval && !optlen) {
xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
xfrm_sk_policy_insert(sk, XFRM_POLICY_OUT, NULL);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7e3a132b76fb..f6da6ea65d37 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2634,6 +2634,30 @@ static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
};

+static const int xfrm_msg_min_compat[XFRM_NR_MSGTYPES] = {
+ [XFRM_MSG_NEWSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info_packed),
+ [XFRM_MSG_DELSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_id),
+ [XFRM_MSG_GETSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_id),
+ [XFRM_MSG_NEWPOLICY - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_info_packed),
+ [XFRM_MSG_DELPOLICY - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
+ [XFRM_MSG_GETPOLICY - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
+ [XFRM_MSG_ALLOCSPI - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userspi_info_packed),
+ [XFRM_MSG_ACQUIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_acquire_packed),
+ [XFRM_MSG_EXPIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_expire_packed),
+ [XFRM_MSG_UPDPOLICY - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_info_packed),
+ [XFRM_MSG_UPDSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info_packed),
+ [XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire_packed),
+ [XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush),
+ [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = 0,
+ [XFRM_MSG_NEWAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
+ [XFRM_MSG_GETAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
+ [XFRM_MSG_REPORT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
+ [XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
+ [XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = sizeof(u32),
+ [XFRM_MSG_NEWSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
+ [XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
+};
+
#undef XMSGSIZE

static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
@@ -2715,10 +2739,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct nlattr *attrs[XFRMA_MAX+1];
const struct xfrm_link *link;
- int type, err;
-
- if (in_compat_syscall())
- return -EOPNOTSUPP;
+ int type, err, hdrlen;

type = nlh->nlmsg_type;
if (type > XFRM_MSG_MAX)
@@ -2747,7 +2768,11 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}

- err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs,
+ hdrlen = xfrm_msg_min[type];
+ if (in_compat_syscall())
+ hdrlen = xfrm_msg_min_compat[type];
+
+ err = nlmsg_parse(nlh, hdrlen, attrs,
link->nla_max ? : XFRMA_MAX,
link->nla_pol ? : xfrma_policy, extack);
if (err < 0)
--
2.13.6


2018-07-26 02:33:56

by Dmitry Safonov

Subject: [PATCH 15/18] xfrm: Check compat acquire listeners in xfrm_is_alive()

There are now two groups of listeners:
XFRMNLGRP_ACQUIRE - applications that use the native UABI for messages;
XFRMNLGRP_COMPAT_ACQUIRE - applications that use the compat UABI for messages.

Both groups should therefore be checked for listeners of acquire
notifications.

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/net/xfrm.h | 14 --------------
net/xfrm/xfrm_user.c | 16 ++++++++++++++++
2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 557122846e0e..c9b713017ae8 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1784,20 +1784,6 @@ static inline int xfrm_aevent_is_on(struct net *net)
rcu_read_unlock();
return ret;
}
-
-static inline int xfrm_acquire_is_on(struct net *net)
-{
- struct sock *nlsk;
- int ret = 0;
-
- rcu_read_lock();
- nlsk = rcu_dereference(net->xfrm.nlsk);
- if (nlsk)
- ret = netlink_has_listeners(nlsk, XFRMNLGRP_ACQUIRE);
- rcu_read_unlock();
-
- return ret;
-}
#endif

static inline unsigned int aead_len(struct xfrm_algo_aead *alg)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2fe6174b8a18..752019963b1e 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3517,6 +3517,22 @@ static int xfrm_send_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_MAPPING);
}

+static inline int xfrm_acquire_is_on(struct net *net)
+{
+ struct sock *nlsk;
+ int ret = 0;
+
+ rcu_read_lock();
+ nlsk = rcu_dereference(net->xfrm.nlsk);
+ if (nlsk)
+ ret = netlink_has_listeners(nlsk, XFRMNLGRP_ACQUIRE);
+ if (nlsk && !ret && IS_ENABLED(CONFIG_COMPAT))
+ ret = netlink_has_listeners(nlsk, XFRMNLGRP_COMPAT_ACQUIRE);
+ rcu_read_unlock();
+
+ return ret;
+}
+
static bool xfrm_is_alive(const struct km_event *c)
{
return (bool)xfrm_acquire_is_on(c->net);
--
2.13.6
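
The rule stated in the commit message (acquire is "on" if either the
native or the compat group has listeners) can be modelled in userspace as
below. struct demo_sock and the demo_* helpers are hypothetical stand-ins
for the netlink socket and netlink_has_listeners(); the model also guards
against a missing socket before looking at either group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a netlink socket's listener bitmap. */
struct demo_sock {
	unsigned long listeners;	/* bit (group - 1) set: has listeners */
};

bool demo_has_listeners(const struct demo_sock *sk, unsigned int group)
{
	return (sk->listeners & (1UL << (group - 1))) != 0;
}

bool demo_acquire_is_on(const struct demo_sock *sk, unsigned int native_group,
			unsigned int compat_group, bool compat_enabled)
{
	bool ret;

	if (!sk)		/* the socket may not exist */
		return false;

	ret = demo_has_listeners(sk, native_group);
	/* Only fall back to the compat group if nobody listens natively. */
	if (!ret && compat_enabled)
		ret = demo_has_listeners(sk, compat_group);
	return ret;
}
```

The compat group is consulted only when the native group is empty and
compat support is enabled, so a native listener is never overridden.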


2018-07-26 02:34:09

by Dmitry Safonov

Subject: [PATCH 14/18] xfrm: Add compat support for xfrm_user_polexpire messages

Parse polexpire messages sent by userspace according to in_compat_syscall().
Applications that used the native bind() syscall are in XFRMNLGRP_EXPIRE, so
send them xfrm_user_polexpire messages (with the 64-bit ABI). Compat
applications are added to the kernel-hidden XFRMNLGRP_COMPAT_EXPIRE group, so
send them xfrm_user_polexpire_packed messages (with the 32-bit ABI).

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 89 +++++++++++++++++++++++++++++++++++-----------------
1 file changed, 61 insertions(+), 28 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 89f891a0a9a4..2fe6174b8a18 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -84,6 +84,12 @@ struct xfrm_user_acquire_packed {
__u32 seq;
} __packed;

+struct xfrm_user_polexpire_packed {
+ struct xfrm_userpolicy_info_packed pol;
+ __u8 hard;
+ __u8 __pad[3];
+} __packed;
+
/* In-kernel, non-uapi compat groups.
* As compat/native messages differ, send notifications according
* to .bind() caller's ABI. There are *_COMPAT hidden from userspace
@@ -2225,7 +2231,15 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
int err = -ENOENT;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);
+ u8 hard;

+ if (in_compat_syscall()) {
+ struct xfrm_user_polexpire_packed *_up = nlmsg_data(nlh);
+
+ hard = _up->hard;
+ } else {
+ hard = up->hard;
+ }
err = copy_from_user_policy_type(&type, attrs);
if (err)
return err;
@@ -2263,11 +2277,11 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
goto out;

err = 0;
- if (up->hard) {
+ if (hard) {
xfrm_policy_delete(xp, p->dir);
xfrm_audit_policy_delete(xp, 1, true);
}
- km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid);
+ km_policy_expired(xp, p->dir, hard, nlh->nlmsg_pid);

out:
xfrm_pol_put(xp);
@@ -3192,43 +3206,59 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
return xp;
}

-static inline unsigned int xfrm_polexpire_msgsize(struct xfrm_policy *xp)
+static int build_polexpire(struct sk_buff **skb, struct xfrm_policy *xp,
+ int dir, const struct km_event *c, bool compat)
{
- return NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire))
+ struct xfrm_user_polexpire_packed *_upe;
+ struct xfrm_user_polexpire *upe;
+ unsigned int upe_size, polexpire_msgsize;
+ int hard = c->data.hard;
+ struct nlmsghdr *nlh;
+ int err;
+
+ if (compat)
+ upe_size = NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire_packed));
+ else
+ upe_size = NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire));
+ polexpire_msgsize = upe_size
+ nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
+ nla_total_size(xfrm_user_sec_ctx_size(xp->security))
+ nla_total_size(sizeof(struct xfrm_mark))
+ userpolicy_type_attrsize();
-}

-static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
- int dir, const struct km_event *c)
-{
- struct xfrm_user_polexpire *upe;
- int hard = c->data.hard;
- struct nlmsghdr *nlh;
- int err;
+ *skb = nlmsg_new(polexpire_msgsize, GFP_ATOMIC);
+ if (*skb == NULL)
+ return -ENOMEM;

- nlh = nlmsg_put(skb, c->portid, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe), 0);
+ nlh = nlmsg_put(*skb, c->portid, 0, XFRM_MSG_POLEXPIRE, upe_size, 0);
if (nlh == NULL)
return -EMSGSIZE;

+ _upe = nlmsg_data(nlh);
upe = nlmsg_data(nlh);
- copy_to_user_policy(xp, &upe->pol, dir);
- err = copy_to_user_tmpl(xp, skb);
+ if (compat)
+ copy_to_user_policy_compat(xp, &_upe->pol, dir);
+ else
+ copy_to_user_policy(xp, &upe->pol, dir);
+
+ err = copy_to_user_tmpl(xp, *skb);
if (!err)
- err = copy_to_user_sec_ctx(xp, skb);
+ err = copy_to_user_sec_ctx(xp, *skb);
if (!err)
- err = copy_to_user_policy_type(xp->type, skb);
+ err = copy_to_user_policy_type(xp->type, *skb);
if (!err)
- err = xfrm_mark_put(skb, &xp->mark);
+ err = xfrm_mark_put(*skb, &xp->mark);
if (err) {
- nlmsg_cancel(skb, nlh);
+ nlmsg_cancel(*skb, nlh);
return err;
}
- upe->hard = !!hard;

- nlmsg_end(skb, nlh);
+ if (compat)
+ _upe->hard = !!hard;
+ else
+ upe->hard = !!hard;
+
+ nlmsg_end(*skb, nlh);
return 0;
}

@@ -3238,14 +3268,17 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
struct sk_buff *skb;
int err;

- skb = nlmsg_new(xfrm_polexpire_msgsize(xp), GFP_ATOMIC);
- if (skb == NULL)
- return -ENOMEM;
-
- err = build_polexpire(skb, xp, dir, c);
- BUG_ON(err < 0);
+ err = build_polexpire(&skb, xp, dir, c, false);
+ if (err)
+ return err;
+ err = xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+ if ((err && err != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return err;

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+ err = build_polexpire(&skb, xp, dir, c, true);
+ if (err)
+ return err;
+ return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_COMPAT_EXPIRE);
}

static int __xfrm_notify_policy(struct xfrm_policy *xp, int dir,
--
2.13.6


2018-07-26 02:34:18

by Dmitry Safonov

Subject: [PATCH 13/18] xfrm: Add compat support for xfrm_user_acquire messages

Parse acquire messages sent by userspace according to in_compat_syscall().
Applications that used the native bind() syscall are in XFRMNLGRP_ACQUIRE, so
send them xfrm_user_acquire messages (with the 64-bit ABI). Compat
applications are added to the kernel-hidden XFRMNLGRP_COMPAT_ACQUIRE group, so
send them xfrm_user_acquire_packed messages (with the 32-bit ABI).

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 113 +++++++++++++++++++++++++++++++++++----------------
1 file changed, 77 insertions(+), 36 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index df792a3be8f2..89f891a0a9a4 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -73,6 +73,17 @@ struct xfrm_user_expire_packed {
__u8 __pad[3];
} __packed;

+struct xfrm_user_acquire_packed {
+ struct xfrm_id id;
+ xfrm_address_t saddr;
+ struct xfrm_selector sel;
+ struct xfrm_userpolicy_info_packed policy;
+ __u32 aalgos;
+ __u32 ealgos;
+ __u32 calgos;
+ __u32 seq;
+} __packed;
+
/* In-kernel, non-uapi compat groups.
* As compat/native messages differ, send notifications according
* to .bind() caller's ABI. There are *_COMPAT hidden from userspace
@@ -2316,8 +2327,8 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr *rt = attrs[XFRMA_TMPL];
struct xfrm_mark mark;

- struct xfrm_user_acquire *ua = nlmsg_data(nlh);
- struct xfrm_userpolicy_info_packed *upi = (void *)&ua->policy;
+ struct xfrm_user_acquire_packed *ua = nlmsg_data(nlh);
+ struct xfrm_user_acquire *_ua = nlmsg_data(nlh);
struct xfrm_state *x = xfrm_state_alloc(net);
int err = -ENOMEM;

@@ -2326,12 +2337,12 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,

xfrm_mark_get(attrs, &mark);

- err = verify_newpolicy_info(upi);
+ err = verify_newpolicy_info(&ua->policy);
if (err)
goto free_state;

/* build an XP */
- xp = xfrm_policy_construct(net, upi, attrs, &err);
+ xp = xfrm_policy_construct(net, &ua->policy, attrs, &err);
if (!xp)
goto free_state;

@@ -2348,9 +2359,15 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
x->props.mode = t->mode;
x->props.reqid = t->reqid;
x->props.family = ut->family;
- t->aalgos = ua->aalgos;
- t->ealgos = ua->ealgos;
- t->calgos = ua->calgos;
+ if (in_compat_syscall()) {
+ t->aalgos = ua->aalgos;
+ t->ealgos = ua->ealgos;
+ t->calgos = ua->calgos;
+ } else {
+ t->aalgos = _ua->aalgos;
+ t->ealgos = _ua->ealgos;
+ t->calgos = _ua->calgos;
+ }
err = km_query(x, t, xp);

}
@@ -3017,25 +3034,32 @@ static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c

}

-static inline unsigned int xfrm_acquire_msgsize(struct xfrm_state *x,
- struct xfrm_policy *xp)
+static int build_acquire(struct sk_buff **skb, struct xfrm_state *x,
+ struct xfrm_tmpl *xt, struct xfrm_policy *xp,
+ bool compat)
{
- return NLMSG_ALIGN(sizeof(struct xfrm_user_acquire))
+ __u32 seq = xfrm_get_acqseq();
+ struct xfrm_user_acquire_packed *ua;
+ struct nlmsghdr *nlh;
+ unsigned int ua_size, ack_msgsize;
+ int err;
+
+ if (compat)
+ ua_size = NLMSG_ALIGN(sizeof(struct xfrm_user_acquire_packed));
+ else
+ ua_size = NLMSG_ALIGN(sizeof(struct xfrm_user_acquire));
+
+ ack_msgsize = ua_size
+ nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
+ nla_total_size(sizeof(struct xfrm_mark))
+ nla_total_size(xfrm_user_sec_ctx_size(x->security))
+ userpolicy_type_attrsize();
-}

-static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
- struct xfrm_tmpl *xt, struct xfrm_policy *xp)
-{
- __u32 seq = xfrm_get_acqseq();
- struct xfrm_user_acquire *ua;
- struct nlmsghdr *nlh;
- int err;
+ *skb = nlmsg_new(ack_msgsize, GFP_ATOMIC);
+ if (*skb == NULL)
+ return -ENOMEM;

- nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_ACQUIRE, sizeof(*ua), 0);
+ nlh = nlmsg_put(*skb, 0, 0, XFRM_MSG_ACQUIRE, ua_size, 0);
if (nlh == NULL)
return -EMSGSIZE;

@@ -3043,25 +3067,36 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
memcpy(&ua->id, &x->id, sizeof(ua->id));
memcpy(&ua->saddr, &x->props.saddr, sizeof(ua->saddr));
memcpy(&ua->sel, &x->sel, sizeof(ua->sel));
- copy_to_user_policy(xp, &ua->policy, XFRM_POLICY_OUT);
- ua->aalgos = xt->aalgos;
- ua->ealgos = xt->ealgos;
- ua->calgos = xt->calgos;
- ua->seq = x->km.seq = seq;

- err = copy_to_user_tmpl(xp, skb);
+ if (compat) {
+ copy_to_user_policy_compat(xp, &ua->policy, XFRM_POLICY_OUT);
+ ua->aalgos = xt->aalgos;
+ ua->ealgos = xt->ealgos;
+ ua->calgos = xt->calgos;
+ ua->seq = x->km.seq = seq;
+ } else {
+ struct xfrm_user_acquire *_ua = nlmsg_data(nlh);
+
+ copy_to_user_policy(xp, &_ua->policy, XFRM_POLICY_OUT);
+ _ua->aalgos = xt->aalgos;
+ _ua->ealgos = xt->ealgos;
+ _ua->calgos = xt->calgos;
+ _ua->seq = x->km.seq = seq;
+ }
+
+ err = copy_to_user_tmpl(xp, *skb);
if (!err)
- err = copy_to_user_state_sec_ctx(x, skb);
+ err = copy_to_user_state_sec_ctx(x, *skb);
if (!err)
- err = copy_to_user_policy_type(xp->type, skb);
+ err = copy_to_user_policy_type(xp->type, *skb);
if (!err)
- err = xfrm_mark_put(skb, &xp->mark);
+ err = xfrm_mark_put(*skb, &xp->mark);
if (err) {
- nlmsg_cancel(skb, nlh);
+ nlmsg_cancel(*skb, nlh);
return err;
}

- nlmsg_end(skb, nlh);
+ nlmsg_end(*skb, nlh);
return 0;
}

@@ -3072,14 +3107,20 @@ static int xfrm_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *xt,
struct sk_buff *skb;
int err;

- skb = nlmsg_new(xfrm_acquire_msgsize(x, xp), GFP_ATOMIC);
- if (skb == NULL)
- return -ENOMEM;

- err = build_acquire(skb, x, xt, xp);
- BUG_ON(err < 0);
+ err = build_acquire(&skb, x, xt, xp, false);
+ if (err)
+ return err;
+
+ err = xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_ACQUIRE);
+ if ((err && err != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return err;
+
+ err = build_acquire(&skb, x, xt, xp, true);
+ if (err)
+ return err;

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_ACQUIRE);
+ return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_COMPAT_ACQUIRE);
}

/* User gives us xfrm_user_policy_info followed by an array of 0
--
2.13.6


2018-07-26 02:34:39

by Dmitry Safonov

Subject: [PATCH 11/18] xfrm: Add compat support for xfrm_user_expire messages

Parse expire messages sent by userspace according to in_compat_syscall().
Applications that used the native bind() syscall are in XFRMNLGRP_EXPIRE,
so send them xfrm_user_expire messages (64-bit ABI). Compatible
applications are added to the kernel-hidden XFRMNLGRP_COMPAT_EXPIRE group,
so send them xfrm_user_expire_packed messages (32-bit ABI).

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 95 +++++++++++++++++++++++++++++++++++-----------------
1 file changed, 65 insertions(+), 30 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 230462077dc9..ca1a14f45cf7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -67,6 +67,12 @@ struct xfrm_userspi_info_packed {
__u32 max;
} __packed;

+struct xfrm_user_expire_packed {
+ struct xfrm_usersa_info_packed state;
+ __u8 hard;
+ __u8 __pad[3];
+} __packed;
+
/* In-kernel, non-uapi compat groups.
* As compat/native messages differ, send notifications according
* to .bind() caller's ABI. There are *_COMPAT hidden from userspace
@@ -2240,10 +2246,19 @@ static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct xfrm_state *x;
int err;
- struct xfrm_user_expire *ue = nlmsg_data(nlh);
- struct xfrm_usersa_info_packed *p = (struct xfrm_usersa_info_packed *)&ue->state;
+ struct xfrm_user_expire_packed *ue = nlmsg_data(nlh);
+ struct xfrm_usersa_info_packed *p = &ue->state;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);
+ u8 hard;
+
+ if (in_compat_syscall()) {
+ hard = ue->hard;
+ } else {
+ struct xfrm_user_expire *expire = nlmsg_data(nlh);
+
+ hard = expire->hard;
+ }

x = xfrm_state_lookup(net, mark, &p->id.daddr, p->id.spi, p->id.proto, p->family);

@@ -2255,9 +2270,9 @@ static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
err = -EINVAL;
if (x->km.state != XFRM_STATE_VALID)
goto out;
- km_state_expired(x, ue->hard, nlh->nlmsg_pid);
+ km_state_expired(x, hard, nlh->nlmsg_pid);

- if (ue->hard) {
+ if (hard) {
__xfrm_state_delete(x);
xfrm_audit_state_delete(x, 1, true);
}
@@ -2727,33 +2742,49 @@ static int xfrm_netlink_bind(struct net *net, unsigned long *groups)
return 0;
}

-static inline unsigned int xfrm_expire_msgsize(void)
-{
- return NLMSG_ALIGN(sizeof(struct xfrm_user_expire))
- + nla_total_size(sizeof(struct xfrm_mark));
-}
-
-static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c)
+static int build_expire(struct sk_buff **skb, struct xfrm_state *x,
+ const struct km_event *c, bool compat)
{
- struct xfrm_user_expire *ue;
struct nlmsghdr *nlh;
+ unsigned int ue_sz;
int err;

- nlh = nlmsg_put(skb, c->portid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
- if (nlh == NULL)
+ if (compat)
+ ue_sz = NLMSG_ALIGN(sizeof(struct xfrm_user_expire_packed));
+ else
+ ue_sz = NLMSG_ALIGN(sizeof(struct xfrm_user_expire));
+
+ *skb = nlmsg_new(ue_sz + nla_total_size(sizeof(struct xfrm_mark)), GFP_ATOMIC);
+ if (*skb == NULL)
+ return -ENOMEM;
+
+ nlh = nlmsg_put(*skb, c->portid, 0, XFRM_MSG_EXPIRE, ue_sz, 0);
+ if (nlh == NULL) {
+ kfree_skb(*skb);
return -EMSGSIZE;
+ }

- ue = nlmsg_data(nlh);
- copy_to_user_state(x, &ue->state);
- ue->hard = (c->data.hard != 0) ? 1 : 0;
- /* clear the padding bytes */
- memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard));
+ if (compat) {
+ struct xfrm_user_expire_packed *ue = nlmsg_data(nlh);

- err = xfrm_mark_put(skb, &x->mark);
- if (err)
+ copy_to_user_state_compat(x, &ue->state);
+ ue->hard = (c->data.hard != 0) ? 1 : 0;
+ } else {
+ struct xfrm_user_expire *ue = nlmsg_data(nlh);
+
+ copy_to_user_state(x, &ue->state);
+ ue->hard = (c->data.hard != 0) ? 1 : 0;
+ /* clear the padding bytes */
+ memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard));
+ }
+
+ err = xfrm_mark_put(*skb, &x->mark);
+ if (err) {
+ kfree_skb(*skb);
return err;
+ }

- nlmsg_end(skb, nlh);
+ nlmsg_end(*skb, nlh);
return 0;
}

@@ -2761,17 +2792,21 @@ static int xfrm_exp_state_notify(struct xfrm_state *x, const struct km_event *c)
{
struct net *net = xs_net(x);
struct sk_buff *skb;
+ int err;

- skb = nlmsg_new(xfrm_expire_msgsize(), GFP_ATOMIC);
- if (skb == NULL)
- return -ENOMEM;
+ err = build_expire(&skb, x, c, false);
+ if (err)
+ return err;

- if (build_expire(skb, x, c) < 0) {
- kfree_skb(skb);
- return -EMSGSIZE;
- }
+ err = xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+ if ((err && err != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return err;

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+ err = build_expire(&skb, x, c, true);
+ if (err)
+ return err;
+
+ return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_COMPAT_EXPIRE);
}

static int xfrm_aevent_state_notify(struct xfrm_state *x, const struct km_event *c)
--
2.13.6


2018-07-26 02:34:45

by Dmitry Safonov

Subject: [PATCH 09/18] xfrm: Dump usersa_info in compat/native formats

Fill xfrm_usersa_info in netlink messages in the 32-bit or 64-bit UABI,
according to the type of syscall used to dump xfrm state.

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 55 ++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index b123e788488f..63622264a3a9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -799,9 +799,9 @@ static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
}

-static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
+static void __copy_to_user_state(struct xfrm_state *x,
+ struct xfrm_usersa_info_packed *p)
{
- memset(p, 0, sizeof(*p));
memcpy(&p->id, &x->id, sizeof(p->id));
memcpy(&p->sel, &x->sel, sizeof(p->sel));
memcpy(&p->lft, &x->lft, sizeof(p->lft));
@@ -818,11 +818,25 @@ static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
p->seq = x->km.seq;
}

+static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
+{
+ memset(p, 0, sizeof(*p));
+ __copy_to_user_state(x, (struct xfrm_usersa_info_packed *)p);
+}
+
+static void copy_to_user_state_compat(struct xfrm_state *x,
+ struct xfrm_usersa_info_packed *p)
+{
+ memset(p, 0, sizeof(*p));
+ __copy_to_user_state(x, p);
+}
+
struct xfrm_dump_info {
struct sk_buff *in_skb;
struct sk_buff *out_skb;
u32 nlmsg_seq;
u16 nlmsg_flags;
+ bool compat_dump;
};

static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
@@ -882,14 +896,10 @@ static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
}

/* Don't change this without updating xfrm_sa_len! */
-static int copy_to_user_state_extra(struct xfrm_state *x,
- struct xfrm_usersa_info *p,
- struct sk_buff *skb)
+static int __copy_to_user_state_extra(struct xfrm_state *x, struct sk_buff *skb)
{
int ret = 0;

- copy_to_user_state(x, p);
-
if (x->props.extra_flags) {
ret = nla_put_u32(skb, XFRMA_SA_EXTRA_FLAGS,
x->props.extra_flags);
@@ -968,23 +978,42 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
return ret;
}

+static int copy_to_user_state_extra(struct xfrm_state *x,
+ struct xfrm_usersa_info *p, struct sk_buff *skb)
+{
+ copy_to_user_state(x, p);
+ return __copy_to_user_state_extra(x, skb);
+}
+
+static int copy_to_user_state_extra_compat(struct xfrm_state *x,
+ struct xfrm_usersa_info_packed *p, struct sk_buff *skb)
+{
+ copy_to_user_state_compat(x, p);
+ return __copy_to_user_state_extra(x, skb);
+}
+
static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
{
struct xfrm_dump_info *sp = ptr;
struct sk_buff *in_skb = sp->in_skb;
struct sk_buff *skb = sp->out_skb;
- struct xfrm_usersa_info *p;
struct nlmsghdr *nlh;
+ size_t msg_len;
int err;

+ if (sp->compat_dump)
+ msg_len = sizeof(struct xfrm_usersa_info_packed);
+ else
+ msg_len = sizeof(struct xfrm_usersa_info);
nlh = nlmsg_put(skb, NETLINK_CB(in_skb).portid, sp->nlmsg_seq,
- XFRM_MSG_NEWSA, sizeof(*p), sp->nlmsg_flags);
+ XFRM_MSG_NEWSA, msg_len, sp->nlmsg_flags);
if (nlh == NULL)
return -EMSGSIZE;

- p = nlmsg_data(nlh);
-
- err = copy_to_user_state_extra(x, p, skb);
+ if (sp->compat_dump)
+ err = copy_to_user_state_extra_compat(x, nlmsg_data(nlh), skb);
+ else
+ err = copy_to_user_state_extra(x, nlmsg_data(nlh), skb);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -1018,6 +1047,7 @@ static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb)
info.out_skb = skb;
info.nlmsg_seq = cb->nlh->nlmsg_seq;
info.nlmsg_flags = NLM_F_MULTI;
+ info.compat_dump = in_compat_syscall();

if (!cb->args[0]) {
struct nlattr *attrs[XFRMA_MAX+1];
@@ -1064,6 +1094,7 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
info.out_skb = skb;
info.nlmsg_seq = seq;
info.nlmsg_flags = 0;
+ info.compat_dump = in_compat_syscall();

err = dump_one_state(x, 0, &info);
if (err) {
--
2.13.6


2018-07-26 02:35:02

by Dmitry Safonov

Subject: [PATCH 12/18] xfrm: Add compat support for xfrm_userpolicy_info messages

Parse userpolicy messages sent by userspace according to in_compat_syscall().
Applications that used the native bind() syscall are in XFRMNLGRP_POLICY,
so send them xfrm_userpolicy_info messages (64-bit ABI). Compatible
applications are added to the kernel-hidden XFRMNLGRP_COMPAT_POLICY group,
so send them xfrm_userpolicy_info_packed messages (32-bit ABI).

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 73 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 58 insertions(+), 15 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index ca1a14f45cf7..df792a3be8f2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1630,9 +1630,9 @@ static void copy_from_user_policy(struct xfrm_policy *xp,
/* XXX xp->share = p->share; */
}

-static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p, int dir)
+static void __copy_to_user_policy(struct xfrm_policy *xp,
+ struct xfrm_userpolicy_info_packed *p, int dir)
{
- memset(p, 0, sizeof(*p));
memcpy(&p->sel, &xp->selector, sizeof(p->sel));
memcpy(&p->lft, &xp->lft, sizeof(p->lft));
memcpy(&p->curlft, &xp->curlft, sizeof(p->curlft));
@@ -1645,6 +1645,20 @@ static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_i
p->share = XFRM_SHARE_ANY; /* XXX xp->share */
}

+static void copy_to_user_policy(struct xfrm_policy *xp,
+ struct xfrm_userpolicy_info *p, int dir)
+{
+ memset(p, 0, sizeof(*p));
+ __copy_to_user_policy(xp, (struct xfrm_userpolicy_info_packed *)p, dir);
+}
+
+static void copy_to_user_policy_compat(struct xfrm_policy *xp,
+ struct xfrm_userpolicy_info_packed *p, int dir)
+{
+ memset(p, 0, sizeof(*p));
+ __copy_to_user_policy(xp, p, dir);
+}
+
static struct xfrm_policy *xfrm_policy_construct(struct net *net,
struct xfrm_userpolicy_info_packed *p,
struct nlattr **attrs, int *errp)
@@ -1795,19 +1809,26 @@ static inline int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr)
{
struct xfrm_dump_info *sp = ptr;
- struct xfrm_userpolicy_info *p;
struct sk_buff *in_skb = sp->in_skb;
struct sk_buff *skb = sp->out_skb;
struct nlmsghdr *nlh;
+ size_t msg_len;
int err;

+ if (sp->compat_dump)
+ msg_len = sizeof(struct xfrm_userpolicy_info_packed);
+ else
+ msg_len = sizeof(struct xfrm_userpolicy_info);
nlh = nlmsg_put(skb, NETLINK_CB(in_skb).portid, sp->nlmsg_seq,
- XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
+ XFRM_MSG_NEWPOLICY, msg_len, sp->nlmsg_flags);
if (nlh == NULL)
return -EMSGSIZE;

- p = nlmsg_data(nlh);
- copy_to_user_policy(xp, p, dir);
+ if (sp->compat_dump)
+ copy_to_user_policy_compat(xp, nlmsg_data(nlh), dir);
+ else
+ copy_to_user_policy(xp, nlmsg_data(nlh), dir);
+
err = copy_to_user_tmpl(xp, skb);
if (!err)
err = copy_to_user_sec_ctx(xp, skb);
@@ -1852,6 +1873,7 @@ static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
info.out_skb = skb;
info.nlmsg_seq = cb->nlh->nlmsg_seq;
info.nlmsg_flags = NLM_F_MULTI;
+ info.compat_dump = in_compat_syscall();

(void) xfrm_policy_walk(net, walk, dump_one_policy, &info);

@@ -1874,6 +1896,7 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
info.out_skb = skb;
info.nlmsg_seq = seq;
info.nlmsg_flags = 0;
+ info.compat_dump = in_compat_syscall();

err = dump_one_policy(xp, dir, 0, &info);
if (err) {
@@ -3184,18 +3207,24 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
}

-static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_event *c)
+static int __xfrm_notify_policy(struct xfrm_policy *xp, int dir,
+ const struct km_event *c, bool compat)
{
unsigned int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
+ unsigned int headlen, upi_size;
struct net *net = xp_net(xp);
- struct xfrm_userpolicy_info *p;
struct xfrm_userpolicy_id *id;
+ void *userpolicy_info;
struct nlmsghdr *nlh;
struct sk_buff *skb;
- unsigned int headlen;
int err;

- headlen = sizeof(*p);
+ if (compat)
+ upi_size = sizeof(struct xfrm_userpolicy_info_packed);
+ else
+ upi_size = sizeof(struct xfrm_userpolicy_info);
+ headlen = upi_size;
+
if (c->event == XFRM_MSG_DELPOLICY) {
len += nla_total_size(headlen);
headlen = sizeof(*id);
@@ -3213,7 +3242,7 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
if (nlh == NULL)
goto out_free_skb;

- p = nlmsg_data(nlh);
+ userpolicy_info = nlmsg_data(nlh);
if (c->event == XFRM_MSG_DELPOLICY) {
struct nlattr *attr;

@@ -3225,15 +3254,18 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
else
memcpy(&id->sel, &xp->selector, sizeof(id->sel));

- attr = nla_reserve(skb, XFRMA_POLICY, sizeof(*p));
+ attr = nla_reserve(skb, XFRMA_POLICY, upi_size);
err = -EMSGSIZE;
if (attr == NULL)
goto out_free_skb;

- p = nla_data(attr);
+ userpolicy_info = nla_data(attr);
}

- copy_to_user_policy(xp, p, dir);
+ if (compat)
+ copy_to_user_policy_compat(xp, userpolicy_info, dir);
+ else
+ copy_to_user_policy(xp, userpolicy_info, dir);
err = copy_to_user_tmpl(xp, skb);
if (!err)
err = copy_to_user_policy_type(xp->type, skb);
@@ -3244,13 +3276,24 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e

nlmsg_end(skb, nlh);

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_POLICY);
+ return xfrm_nlmsg_multicast(net, skb, 0, compat ?
+ XFRMNLGRP_COMPAT_POLICY : XFRMNLGRP_POLICY);

out_free_skb:
kfree_skb(skb);
return err;
}

+static int xfrm_notify_policy(struct xfrm_policy *xp, int dir,
+ const struct km_event *c)
+{
+ int ret = __xfrm_notify_policy(xp, dir, c, false);
+
+ if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return ret;
+ return __xfrm_notify_policy(xp, dir, c, true);
+}
+
static int xfrm_notify_policy_flush(const struct km_event *c)
{
struct net *net = c->net;
--
2.13.6


2018-07-26 02:35:08

by Dmitry Safonov

Subject: [PATCH 07/18] netlink: Pass groups pointer to .bind()

Netlink messages sent by xfrm differ in size between 64-bit native and
32-bit compatible applications. To decide which UABI the kernel should use
when sending a message, use the type of the bind() syscall.
Xfrm will have kernel-only groups, hidden from userspace, for compatible
applications.
So, pass a pointer to the groups bitmask to the .bind() callback instead
of a single group number.
With later patches xfrm will set the proper compat group for a netlink
socket during bind().
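
For readers unfamiliar with the mask form, here is a small userspace sketch
(not kernel code) of how a callback that receives the whole bitmask can walk
it and also add a group to it. It assumes that group number N (1-based)
corresponds to bit N - 1, as in the netlink_bind() loop that this patch
removes. The names demo_groups_from_mask() and demo_add_group() are
hypothetical, and __builtin_ctzl() assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(unsigned long) * 8 >= 32,
	      "the sketch assumes at least 32 group bits");

/*
 * Collect the 1-based group numbers set in @mask into @out.
 * Returns how many were stored (at most @cap).
 */
static size_t demo_groups_from_mask(unsigned long mask, int *out, size_t cap)
{
	size_t n = 0;

	while (mask && n < cap) {
		int bit = __builtin_ctzl(mask);	/* index of lowest set bit */

		out[n++] = bit + 1;		/* bit index -> group number */
		mask &= mask - 1;		/* clear that bit */
	}
	return n;
}

/* Hypothetical helper: a callback holding a pointer can extend the mask. */
static void demo_add_group(unsigned long *groups, int group)
{
	*groups |= 1UL << (group - 1);
}
```

Because the callback gets a pointer rather than a value, whatever it adds to
the mask is visible to the caller after the callback returns. A single
"int group" argument could not express that.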

Cc: "David S. Miller" <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: Florian Westphal <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jozsef Kadlecsik <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
include/linux/netlink.h | 2 +-
kernel/audit.c | 2 +-
net/core/rtnetlink.c | 14 ++++++--------
net/core/sock_diag.c | 25 ++++++++++++-------------
net/netfilter/nfnetlink.c | 24 ++++++++++++++----------
net/netlink/af_netlink.c | 27 ++++++++++-----------------
net/netlink/af_netlink.h | 4 ++--
net/netlink/genetlink.c | 26 ++++++++++++++++++--------
8 files changed, 64 insertions(+), 60 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index f3075d6c7e82..19202648e04a 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -46,7 +46,7 @@ struct netlink_kernel_cfg {
unsigned int flags;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
- int (*bind)(struct net *net, int group);
+ int (*bind)(struct net *net, unsigned long *groups);
void (*unbind)(struct net *net, int group);
bool (*compare)(struct net *net, struct sock *sk);
};
diff --git a/kernel/audit.c b/kernel/audit.c
index e7478cb58079..87ca0214bcf2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1523,7 +1523,7 @@ static void audit_receive(struct sk_buff *skb)
}

/* Run custom bind function on netlink socket group connect or bind requests. */
-static int audit_bind(struct net *net, int group)
+static int audit_bind(struct net *net, unsigned long *groups)
{
if (!capable(CAP_AUDIT_READ))
return -EPERM;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e3f743c141b3..0465e692ae32 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4683,15 +4683,13 @@ static void rtnetlink_rcv(struct sk_buff *skb)
netlink_rcv_skb(skb, &rtnetlink_rcv_msg);
}

-static int rtnetlink_bind(struct net *net, int group)
+static int rtnetlink_bind(struct net *net, unsigned long *groups)
{
- switch (group) {
- case RTNLGRP_IPV4_MROUTE_R:
- case RTNLGRP_IPV6_MROUTE_R:
- if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
- return -EPERM;
- break;
- }
+ unsigned long mroute_r;
+
+ mroute_r = 1UL << RTNLGRP_IPV4_MROUTE_R | 1UL << RTNLGRP_IPV6_MROUTE_R;
+ if ((*groups & mroute_r) && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+ return -EPERM;
return 0;
}

diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index c37b5be7c5e4..befa6759f2ad 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -273,20 +273,19 @@ static void sock_diag_rcv(struct sk_buff *skb)
mutex_unlock(&sock_diag_mutex);
}

-static int sock_diag_bind(struct net *net, int group)
+static int sock_diag_bind(struct net *net, unsigned long *groups)
{
- switch (group) {
- case SKNLGRP_INET_TCP_DESTROY:
- case SKNLGRP_INET_UDP_DESTROY:
- if (!sock_diag_handlers[AF_INET])
- sock_load_diag_module(AF_INET, 0);
- break;
- case SKNLGRP_INET6_TCP_DESTROY:
- case SKNLGRP_INET6_UDP_DESTROY:
- if (!sock_diag_handlers[AF_INET6])
- sock_load_diag_module(AF_INET6, 0);
- break;
- }
+ unsigned long inet_mask, inet6_mask;
+
+ inet_mask = 1UL << SKNLGRP_INET_TCP_DESTROY;
+ inet_mask |= 1UL << SKNLGRP_INET_UDP_DESTROY;
+ inet6_mask = 1UL << SKNLGRP_INET6_TCP_DESTROY;
+ inet6_mask |= 1UL << SKNLGRP_INET6_UDP_DESTROY;
+
+ if ((*groups & inet_mask) && !sock_diag_handlers[AF_INET])
+ sock_load_diag_module(AF_INET, 0);
+ if ((*groups & inet6_mask) && !sock_diag_handlers[AF_INET6])
+ sock_load_diag_module(AF_INET6, 0);
return 0;
}

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index e1b6be29848d..6a8893df5285 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -556,21 +556,25 @@ static void nfnetlink_rcv(struct sk_buff *skb)
}

#ifdef CONFIG_MODULES
-static int nfnetlink_bind(struct net *net, int group)
+static int nfnetlink_bind(struct net *net, unsigned long *groups)
{
const struct nfnetlink_subsystem *ss;
- int type;
+ unsigned long _groups = *groups;
+ int type, group_bit, group = -1;

- if (group <= NFNLGRP_NONE || group > NFNLGRP_MAX)
- return 0;
+ while ((group_bit = __builtin_ffsl(_groups))) {
+ group += group_bit;

- type = nfnl_group2type[group];
+ type = nfnl_group2type[group];
+ rcu_read_lock();
+ ss = nfnetlink_get_subsys(type << 8);
+ rcu_read_unlock();
+ if (!ss)
+ request_module("nfnetlink-subsys-%d", type);
+
+ _groups >>= group_bit;
+ }

- rcu_read_lock();
- ss = nfnetlink_get_subsys(type << 8);
- rcu_read_unlock();
- if (!ss)
- request_module("nfnetlink-subsys-%d", type);
return 0;
}
#endif
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ac805caed2e2..1e11e706c683 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -668,7 +668,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
struct module *module = NULL;
struct mutex *cb_mutex;
struct netlink_sock *nlk;
- int (*bind)(struct net *net, int group);
+ int (*bind)(struct net *net, unsigned long *groups);
void (*unbind)(struct net *net, int group);
int err = 0;

@@ -969,8 +969,7 @@ static int netlink_realloc_groups(struct sock *sk)
return err;
}

-static void netlink_undo_bind(int group, long unsigned int groups,
- struct sock *sk)
+static void netlink_undo_bind(unsigned long groups, struct sock *sk)
{
struct netlink_sock *nlk = nlk_sk(sk);
int undo;
@@ -978,7 +977,7 @@ static void netlink_undo_bind(int group, long unsigned int groups,
if (!nlk->netlink_unbind)
return;

- for (undo = 0; undo < group; undo++)
+ for (undo = 0; undo < nlk->ngroups; undo++)
if (test_bit(undo, &groups))
nlk->netlink_unbind(sock_net(sk), undo + 1);
}
@@ -991,7 +990,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
struct netlink_sock *nlk = nlk_sk(sk);
struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
int err = 0;
- long unsigned int groups = nladdr->nl_groups;
+ unsigned long groups = nladdr->nl_groups;
bool bound;

if (addr_len < sizeof(struct sockaddr_nl))
@@ -1021,17 +1020,9 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,

netlink_lock_table();
if (nlk->netlink_bind && groups) {
- int group;
-
- for (group = 0; group < nlk->ngroups; group++) {
- if (!test_bit(group, &groups))
- continue;
- err = nlk->netlink_bind(net, group + 1);
- if (!err)
- continue;
- netlink_undo_bind(group, groups, sk);
+ err = nlk->netlink_bind(net, &groups);
+ if (err)
goto unlock;
- }
}

/* No need for barriers here as we return to user-space without
@@ -1042,7 +1033,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
netlink_insert(sk, nladdr->nl_pid) :
netlink_autobind(sock);
if (err) {
- netlink_undo_bind(nlk->ngroups, groups, sk);
+ netlink_undo_bind(groups, sk);
goto unlock;
}
}
@@ -1652,7 +1643,9 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
if (!val || val - 1 >= nlk->ngroups)
return -EINVAL;
if (optname == NETLINK_ADD_MEMBERSHIP && nlk->netlink_bind) {
- err = nlk->netlink_bind(sock_net(sk), val);
+ unsigned long groups = 1UL << val;
+
+ err = nlk->netlink_bind(sock_net(sk), &groups);
if (err)
return err;
}
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 962de7b3c023..e765172abbb7 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -39,7 +39,7 @@ struct netlink_sock {
struct mutex *cb_mutex;
struct mutex cb_def_mutex;
void (*netlink_rcv)(struct sk_buff *skb);
- int (*netlink_bind)(struct net *net, int group);
+ int (*netlink_bind)(struct net *net, unsigned long *groups);
void (*netlink_unbind)(struct net *net, int group);
struct module *module;

@@ -61,7 +61,7 @@ struct netlink_table {
unsigned int groups;
struct mutex *cb_mutex;
struct module *module;
- int (*bind)(struct net *net, int group);
+ int (*bind)(struct net *net, unsigned long *groups);
void (*unbind)(struct net *net, int group);
bool (*compare)(struct net *net, struct sock *sock);
int registered;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 25eeb6d2a75a..a86b105730cf 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -960,28 +960,38 @@ static struct genl_family genl_ctrl __ro_after_init = {
.netnsok = true,
};

-static int genl_bind(struct net *net, int group)
+static int genl_bind(struct net *net, unsigned long *groups)
{
+ unsigned long mcgrps;
struct genl_family *f;
- int err = -ENOENT;
+ int err = 0;
unsigned int id;

down_read(&cb_lock);

idr_for_each_entry(&genl_fam_idr, f, id) {
- if (group >= f->mcgrp_offset &&
- group < f->mcgrp_offset + f->n_mcgrps) {
- int fam_grp = group - f->mcgrp_offset;
+ int fam_grp_bit, fam_grp = -1;
+
+ mcgrps = (1UL << f->n_mcgrps) - 1;
+ mcgrps <<= f->mcgrp_offset;
+ mcgrps &= *groups;
+
+ if (!mcgrps)
+ continue;
+
+ while ((fam_grp_bit = __builtin_ffsl(mcgrps))) {
+ fam_grp += fam_grp_bit;

if (!f->netnsok && net != &init_net)
err = -ENOENT;
else if (f->mcast_bind)
err = f->mcast_bind(net, fam_grp);
- else
- err = 0;
- break;
+
+ if (err)
+ goto out;
}
}
+out:
up_read(&cb_lock);

return err;
--
2.13.6


2018-07-26 02:35:17

by Dmitry Safonov

Subject: [PATCH 05/18] net/xfrm: Parse userspi_info{,_packed} depending on syscall

Struct xfrm_userspi_info differs in size between the 64-bit and 32-bit
UAPI because of the (possible) padding of xfrm_usersa_info. Sizes and
member offsets in bytes:

            32-bit                 |             64-bit
----------------------------------------------------------------------
sizeof(xfrm_userspi_info) = 228    | sizeof(xfrm_userspi_info) = 232
xfrm_userspi_info::info   =   0    | xfrm_userspi_info::info   =   0
xfrm_userspi_info::min    = 220    | xfrm_userspi_info::min    = 224
xfrm_userspi_info::max    = 224    | xfrm_userspi_info::max    = 228

xfrm_alloc_userspi() can handle both UAPIs by checking the type of the
send() syscall that userspace used for XFRM_MSG_ALLOCSPI.
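
The effect can be reproduced in userspace with a much smaller structure.
The sketch below is only an illustration: struct demo_info is a
hypothetical stand-in for xfrm_usersa_info, and the sizes it produces are
not the real ones from the table. It assumes GCC or Clang for
__attribute__((packed)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 64-bit member usually forces 8-byte alignment on 64-bit ABIs,
 * so the compiler may pad the end of the structure. */
struct demo_info {
	uint64_t bytes;
	uint32_t flags;
};

/* Same members, no padding: this layout is identical on every ABI. */
struct demo_info_packed {
	uint64_t bytes;
	uint32_t flags;
} __attribute__((packed));

struct demo_spi {
	struct demo_info info;
	uint32_t min;
	uint32_t max;
};

struct demo_spi_packed {
	struct demo_info_packed info;
	uint32_t min;
	uint32_t max;
} __attribute__((packed));

static_assert(sizeof(struct demo_spi_packed) == 20, "packed size is fixed");
static_assert(offsetof(struct demo_spi_packed, min) == 12, "no padding");

/* Read min/max from a payload using the layout the sender used.
 * For the native case @payload must be suitably aligned. */
static void demo_read_range(const void *payload, int compat,
			    uint32_t *min, uint32_t *max)
{
	if (compat) {
		const struct demo_spi_packed *p = payload;

		*min = p->min;
		*max = p->max;
	} else {
		const struct demo_spi *p = payload;

		*min = p->min;
		*max = p->max;
	}
}
```

On x86-64 the native struct demo_spi is typically 24 bytes with min at
offset 16, while the packed one is always 20 bytes with min at offset 12.
Reading min and max through the wrong layout therefore returns the wrong
words, which is why the parser has to pick the structure by syscall type.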

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index b382cdd3bef6..bf2ca93edaf5 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -61,6 +61,12 @@ struct xfrm_userpolicy_info_packed {
__u8 share;
} __packed;

+struct xfrm_userspi_info_packed {
+ struct xfrm_usersa_info_packed info;
+ __u32 min;
+ __u32 max;
+} __packed;
+
static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
{
struct nlattr *rt = attrs[type];
@@ -1279,11 +1285,21 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
xfrm_address_t *daddr;
int family;
int err;
- u32 mark;
+ u32 mark, spi_min, spi_max;
struct xfrm_mark m;

p = nlmsg_data(nlh);
- err = verify_spi_info(p->info.id.proto, p->min, p->max);
+ if (in_compat_syscall()) {
+ struct xfrm_userspi_info_packed *_p = nlmsg_data(nlh);
+
+ spi_min = _p->min;
+ spi_max = _p->max;
+ } else {
+ spi_min = p->min;
+ spi_max = p->max;
+ }
+
+ err = verify_spi_info(p->info.id.proto, spi_min, spi_max);
if (err)
goto out_noput;

@@ -1310,7 +1326,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
if (x == NULL)
goto out_noput;

- err = xfrm_alloc_spi(x, p->min, p->max);
+ err = xfrm_alloc_spi(x, spi_min, spi_max);
if (err)
goto out;

--
2.13.6


2018-07-26 02:35:26

by Dmitry Safonov

Subject: [PATCH 10/18] xfrm: Send state notifications in compat format too

Applications that used the native bind() syscall are in XFRMNLGRP_SA, so
send them xfrm_usersa_info messages (64-bit ABI). Compatible
applications are added to the kernel-hidden XFRMNLGRP_COMPAT_SA group, so
send them xfrm_usersa_info_packed messages (32-bit ABI).
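
The control flow of the two-pass notification can be sketched in userspace
as follows. This is a simplified model, not the kernel code: demo_send_fn,
demo_notify_both(), struct demo_log and demo_send() are hypothetical names,
and the compat_enabled flag stands in for IS_ENABLED(CONFIG_COMPAT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Builds and multicasts one notification in the native (compat == false)
 * or the compat (compat == true) layout. */
typedef int (*demo_send_fn)(bool compat, void *ctx);

static int demo_notify_both(demo_send_fn send, void *ctx, bool compat_enabled)
{
	int ret = send(false, ctx);

	/* -ESRCH means "no listeners in that group", which is not fatal. */
	if ((ret && ret != -ESRCH) || !compat_enabled)
		return ret;
	return send(true, ctx);
}

/* Example sender: counts the passes and returns a preset native result. */
struct demo_log {
	int native_ret;
	int native_calls;
	int compat_calls;
};

static int demo_send(bool compat, void *ctx)
{
	struct demo_log *log = ctx;

	if (compat) {
		log->compat_calls++;
		return 0;
	}
	log->native_calls++;
	return log->native_ret;
}
```

In this model the compat pass still runs when the native group has no
listeners, because a compat-only listener would otherwise never be
notified. Any other error from the native pass ends the notification early.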

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 63622264a3a9..230462077dc9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2856,18 +2856,24 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
return l;
}

-static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
+static int __xfrm_notify_sa(struct xfrm_state *x,
+ const struct km_event *c, bool compat)
{
struct net *net = xs_net(x);
- struct xfrm_usersa_info *p;
struct xfrm_usersa_id *id;
struct nlmsghdr *nlh;
struct sk_buff *skb;
unsigned int len = xfrm_sa_len(x);
- unsigned int headlen;
+ unsigned int headlen, usersa_info_size;
+ void *usersa_info;
int err;

- headlen = sizeof(*p);
+ if (compat)
+ usersa_info_size = sizeof(struct xfrm_usersa_info_packed);
+ else
+ usersa_info_size = sizeof(struct xfrm_usersa_info);
+ headlen = usersa_info_size;
+
if (c->event == XFRM_MSG_DELSA) {
len += nla_total_size(headlen);
headlen = sizeof(*id);
@@ -2884,7 +2890,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
if (nlh == NULL)
goto out_free_skb;

- p = nlmsg_data(nlh);
+ usersa_info = nlmsg_data(nlh);
if (c->event == XFRM_MSG_DELSA) {
struct nlattr *attr;

@@ -2895,26 +2901,40 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
id->family = x->props.family;
id->proto = x->id.proto;

- attr = nla_reserve(skb, XFRMA_SA, sizeof(*p));
+ attr = nla_reserve(skb, XFRMA_SA, usersa_info_size);
err = -EMSGSIZE;
if (attr == NULL)
goto out_free_skb;

- p = nla_data(attr);
+ usersa_info = nla_data(attr);
}
- err = copy_to_user_state_extra(x, p, skb);
+
+ if (compat)
+ err = copy_to_user_state_extra_compat(x, usersa_info, skb);
+ else
+ err = copy_to_user_state_extra(x, usersa_info, skb);
if (err)
goto out_free_skb;

nlmsg_end(skb, nlh);

- return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_SA);
+ return xfrm_nlmsg_multicast(net, skb, 0,
+ compat ? XFRMNLGRP_COMPAT_SA : XFRMNLGRP_SA);

out_free_skb:
kfree_skb(skb);
return err;
}

+static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
+{
+ int ret = __xfrm_notify_sa(x, c, false);
+
+ if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+ return ret;
+ return __xfrm_notify_sa(x, c, true);
+}
+
static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c)
{

--
2.13.6


2018-07-26 02:35:36

by Dmitry Safonov

Subject: [PATCH 01/18] x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT

The result of in_compat_syscall() can be pictured as:

x86 platform:
---------------------------------------------------
| Arch\syscall | 64-bit | ia32 | x32 |
|-------------------------------------------------|
| x86_64 | false | true | true |
|-------------------------------------------------|
| i686 | false | <true> | false |
---------------------------------------------------

Other platforms:
------------------------------------------------
| Arch\syscall | 64-bit | compat (32?) |
|----------------------------------------------|
| 64-bit | false | true |
|----------------------------------------------|
| 32-bit(?) | false | <false> |
------------------------------------------------

As can be seen, the result of in_compat_syscall() on a generic 32-bit
platform differs from i686.

There is no reason for in_compat_syscall() == true on native i686.
It is also easy to misread code if the result on a native 32-bit platform
differs between arches.
Because of that, non-arch-specific code has many places with:
if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
in different variations.

It looks like the only non-x86 code which uses in_compat_syscall()
without a CONFIG_COMPAT guard is in amd/amdkfd. But according to
the commit a18069c132cb ("amdkfd: Disable support for 32-bit user
processes"), it actually should be disabled on native i686.

Rename in_compat_syscall() to in_32bit_syscall() for x86-specific code
and make in_compat_syscall() false under !CONFIG_COMPAT.

In a following patch I'll clean up generic users which were forced
to check IS_ENABLED(CONFIG_COMPAT) together with in_compat_syscall().
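
The intended semantics can be written down as a small standalone model.
It is an illustration only: struct model and the model_* functions are
hypothetical names, whereas the real helpers take no arguments and read
the state of the current task.

```c
#include <stdbool.h>

/* Hypothetical model of one kernel build handling one syscall. */
struct model {
	bool config_compat;	/* kernel built with CONFIG_COMPAT? */
	bool syscall_is_32bit;	/* entered through ia32 or x32? */
};

/* x86-specific question: is the current syscall a 32-bit one? */
static bool model_in_32bit_syscall(const struct model *m)
{
	return m->syscall_is_32bit;
}

/*
 * Generic question: does the compat ABI apply? Always false when the
 * kernel has no compat layer, even if every syscall is 32-bit.
 */
static bool model_in_compat_syscall(const struct model *m)
{
	return m->config_compat && model_in_32bit_syscall(m);
}
```

In this model native i686 is { .config_compat = false,
.syscall_is_32bit = true }: the 32-bit check stays true for the x86 mmap
code, while the compat check becomes false, matching the generic 32-bit
row of the table above.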

Cc: Andy Lutomirski <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Stultz <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
arch/x86/include/asm/compat.h | 9 ++++++++-
arch/x86/include/asm/ftrace.h | 4 +---
arch/x86/kernel/process_64.c | 4 ++--
arch/x86/kernel/sys_x86_64.c | 11 ++++++-----
arch/x86/mm/hugetlbpage.c | 4 ++--
arch/x86/mm/mmap.c | 2 +-
include/linux/compat.h | 4 ++--
7 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index fb97cf7c4137..626bcf1d037d 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -232,11 +232,18 @@ static inline bool in_x32_syscall(void)
return false;
}

-static inline bool in_compat_syscall(void)
+static inline bool in_32bit_syscall(void)
{
return in_ia32_syscall() || in_x32_syscall();
}
+
+#ifdef CONFIG_COMPAT
+static inline bool in_compat_syscall(void)
+{
+ return in_32bit_syscall();
+}
#define in_compat_syscall in_compat_syscall /* override the generic impl */
+#endif

struct compat_siginfo;
int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index c18ed65287d5..cf350639e76d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -76,9 +76,7 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name
#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
{
- if (in_compat_syscall())
- return true;
- return false;
+ return in_32bit_syscall();
}
#endif /* CONFIG_FTRACE_SYSCALLS && CONFIG_IA32_EMULATION */
#endif /* !COMPILE_OFFSETS */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445fb98d..3a6f3cf27808 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -564,10 +564,10 @@ static void __set_personality_x32(void)
current->mm->context.ia32_compat = TIF_X32;
current->personality &= ~READ_IMPLIES_EXEC;
/*
- * in_compat_syscall() uses the presence of the x32 syscall bit
+ * in_32bit_syscall() uses the presence of the x32 syscall bit
* flag to determine compat status. The x86 mmap() code relies on
* the syscall bitness so set x32 syscall bit right here to make
- * in_compat_syscall() work during exec().
+ * in_32bit_syscall() work during exec().
*
* Pretend to come from a x32 execve.
*/
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 6a78d4b36a79..f7476ce23b6e 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -105,7 +105,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
static void find_start_end(unsigned long addr, unsigned long flags,
unsigned long *begin, unsigned long *end)
{
- if (!in_compat_syscall() && (flags & MAP_32BIT)) {
+ if (!in_32bit_syscall() && (flags & MAP_32BIT)) {
/* This is usually used needed to map code in small
model, so it needs to be in the first 31bit. Limit
it to that. This means we need to move the
@@ -122,7 +122,7 @@ static void find_start_end(unsigned long addr, unsigned long flags,
}

*begin = get_mmap_base(1);
- if (in_compat_syscall())
+ if (in_32bit_syscall())
*end = task_size_32bit();
else
*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
@@ -193,7 +193,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;

/* for MAP_32BIT mappings we force the legacy mmap base */
- if (!in_compat_syscall() && (flags & MAP_32BIT))
+ if (!in_32bit_syscall() && (flags & MAP_32BIT))
goto bottomup;

/* requesting a specific address */
@@ -217,9 +217,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
* in the full address space.
*
- * !in_compat_syscall() check to avoid high addresses for x32.
+ * !in_32bit_syscall() check to avoid high addresses for x32
+ * (and make it no op on native i386).
*/
- if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+ if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;

info.align_mask = 0;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 00b296617ca4..92e4c4b85bba 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -92,7 +92,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
* in the full address space.
*/
- info.high_limit = in_compat_syscall() ?
+ info.high_limit = in_32bit_syscall() ?
task_size_32bit() : task_size_64bit(addr > DEFAULT_MAP_WINDOW);

info.align_mask = PAGE_MASK & ~huge_page_mask(h);
@@ -116,7 +116,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
* in the full address space.
*/
- if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+ if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;

info.align_mask = PAGE_MASK & ~huge_page_mask(h);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 48c591251600..36334ce78be8 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -166,7 +166,7 @@ unsigned long get_mmap_base(int is_legacy)
struct mm_struct *mm = current->mm;

#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
- if (in_compat_syscall()) {
+ if (in_32bit_syscall()) {
return is_legacy ? mm->mmap_compat_legacy_base
: mm->mmap_compat_base;
}
diff --git a/include/linux/compat.h b/include/linux/compat.h
index c68acc47da57..4dd4b00407ab 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -1031,9 +1031,9 @@ static inline struct compat_timeval ns_to_compat_timeval(s64 nsec)
#else /* !CONFIG_COMPAT */

#define is_compat_task() (0)
-#ifndef in_compat_syscall
+/* Ensure no one redefines in_compat_syscall() under !CONFIG_COMPAT */
+#define in_compat_syscall in_compat_syscall
static inline bool in_compat_syscall(void) { return false; }
-#endif

#endif /* CONFIG_COMPAT */

--
2.13.6


2018-07-26 02:35:40

by Dmitry Safonov

Subject: [PATCH 04/18] net/xfrm: Add _packed types for compat users

xfrm_usersa_info and xfrm_userpolicy_info structures differ in size
between the 64-bit and 32-bit ABIs. In the 64-bit ABI there are 4
additional bytes of padding at the end of each structure:

32-bit:
sizeof(xfrm_usersa_info) = 220
sizeof(xfrm_userpolicy_info) = 164
64-bit:
sizeof(xfrm_usersa_info) = 224
sizeof(xfrm_userpolicy_info) = 168

In preparation for adding compat support to xfrm, add _packed
versions of those types.

Parse xfrm_usersa_info and xfrm_userpolicy_info netlink messages
sent by userspace using the _packed structures (as we don't care about
parsing the padding).
Sending _packed notification messages back to userspace will be done
in following patches (in copy_to_user_state() and
copy_to_user_policy()).
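
Where such tail padding comes from can be reproduced with a toy structure.
This is a sketch, not the xfrm layout: struct demo_info and
demo_tail_padding() are hypothetical, and the packed attribute is the
GCC/Clang spelling that the kernel's __packed expands to.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy layout: the 64-bit member sets the alignment of the struct. */
struct demo_info {
	uint64_t bytes;
	uint32_t seq;
};

/* Same members, but the compiler may not add any padding. */
struct demo_info_packed {
	uint64_t bytes;
	uint32_t seq;
} __attribute__((packed));

/* The packed size is ABI-independent: 8 + 4 bytes. */
_Static_assert(sizeof(struct demo_info_packed) == 12,
	       "packed layout must not contain padding");

/* Tail padding added by the natural layout on this ABI. */
static size_t demo_tail_padding(void)
{
	return sizeof(struct demo_info) - sizeof(struct demo_info_packed);
}
```

On ABIs that align 64-bit integers to 8 bytes (x86-64, for instance)
demo_tail_padding() is expected to be 4; on i386, where they are aligned
to 4 bytes inside structures, it is expected to be 0. That is the same
4-byte difference as in the sizes listed above.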

Cc: "David S. Miller" <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
net/xfrm/xfrm_user.c | 89 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 65 insertions(+), 24 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2677cb55b7a8..b382cdd3bef6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -33,6 +33,34 @@
#endif
#include <asm/unaligned.h>

+struct xfrm_usersa_info_packed {
+ struct xfrm_selector sel;
+ struct xfrm_id id;
+ xfrm_address_t saddr;
+ struct xfrm_lifetime_cfg lft;
+ struct xfrm_lifetime_cur curlft;
+ struct xfrm_stats stats;
+ __u32 seq;
+ __u32 reqid;
+ __u16 family;
+ __u8 mode; /* XFRM_MODE_xxx */
+ __u8 replay_window;
+ __u8 flags;
+ __u8 __pad[3];
+} __packed;
+
+struct xfrm_userpolicy_info_packed {
+ struct xfrm_selector sel;
+ struct xfrm_lifetime_cfg lft;
+ struct xfrm_lifetime_cur curlft;
+ __u32 priority;
+ __u32 index;
+ __u8 dir;
+ __u8 action;
+ __u8 flags;
+ __u8 share;
+} __packed;
+
static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
{
struct nlattr *rt = attrs[type];
@@ -115,7 +143,7 @@ static inline int verify_sec_ctx_len(struct nlattr **attrs)
return 0;
}

-static inline int verify_replay(struct xfrm_usersa_info *p,
+static inline int verify_replay(struct xfrm_usersa_info_packed *p,
struct nlattr **attrs)
{
struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL];
@@ -143,7 +171,7 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
return 0;
}

-static int verify_newsa_info(struct xfrm_usersa_info *p,
+static int verify_newsa_info(struct xfrm_usersa_info_packed *p,
struct nlattr **attrs)
{
int err;
@@ -464,7 +492,8 @@ static inline unsigned int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
return len;
}

-static void copy_from_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
+static void copy_from_user_state(struct xfrm_state *x,
+ struct xfrm_usersa_info_packed *p)
{
memcpy(&x->id, &p->id, sizeof(x->id));
memcpy(&x->sel, &p->sel, sizeof(x->sel));
@@ -528,9 +557,8 @@ static void xfrm_update_ae_params(struct xfrm_state *x, struct nlattr **attrs,
}

static struct xfrm_state *xfrm_state_construct(struct net *net,
- struct xfrm_usersa_info *p,
- struct nlattr **attrs,
- int *errp)
+ struct xfrm_usersa_info_packed *p,
+ struct nlattr **attrs, int *errp)
{
struct xfrm_state *x = xfrm_state_alloc(net);
int err = -ENOMEM;
@@ -630,7 +658,7 @@ static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr **attrs)
{
struct net *net = sock_net(skb->sk);
- struct xfrm_usersa_info *p = nlmsg_data(nlh);
+ struct xfrm_usersa_info_packed *p = nlmsg_data(nlh);
struct xfrm_state *x;
int err;
struct km_event c;
@@ -1331,7 +1359,7 @@ static int verify_policy_type(u8 type)
return 0;
}

-static int verify_newpolicy_info(struct xfrm_userpolicy_info *p)
+static int verify_newpolicy_info(struct xfrm_userpolicy_info_packed *p)
{
int ret;

@@ -1513,7 +1541,8 @@ static int copy_from_user_policy_type(u8 *tp, struct nlattr **attrs)
return 0;
}

-static void copy_from_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p)
+static void copy_from_user_policy(struct xfrm_policy *xp,
+ struct xfrm_userpolicy_info_packed *p)
{
xp->priority = p->priority;
xp->index = p->index;
@@ -1540,7 +1569,9 @@ static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_i
p->share = XFRM_SHARE_ANY; /* XXX xp->share */
}

-static struct xfrm_policy *xfrm_policy_construct(struct net *net, struct xfrm_userpolicy_info *p, struct nlattr **attrs, int *errp)
+static struct xfrm_policy *xfrm_policy_construct(struct net *net,
+ struct xfrm_userpolicy_info_packed *p,
+ struct nlattr **attrs, int *errp)
{
struct xfrm_policy *xp = xfrm_policy_alloc(net, GFP_KERNEL);
int err;
@@ -1575,7 +1606,7 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
struct nlattr **attrs)
{
struct net *net = sock_net(skb->sk);
- struct xfrm_userpolicy_info *p = nlmsg_data(nlh);
+ struct xfrm_userpolicy_info_packed *p = nlmsg_data(nlh);
struct xfrm_policy *xp;
struct km_event c;
int err;
@@ -2079,7 +2110,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct xfrm_policy *xp;
struct xfrm_user_polexpire *up = nlmsg_data(nlh);
- struct xfrm_userpolicy_info *p = &up->pol;
+ struct xfrm_userpolicy_info_packed *p = (void *)&up->pol;
u8 type = XFRM_POLICY_TYPE_MAIN;
int err = -ENOENT;
struct xfrm_mark m;
@@ -2140,7 +2171,7 @@ static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
struct xfrm_state *x;
int err;
struct xfrm_user_expire *ue = nlmsg_data(nlh);
- struct xfrm_usersa_info *p = &ue->state;
+ struct xfrm_usersa_info_packed *p = (struct xfrm_usersa_info_packed *)&ue->state;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);

@@ -2178,6 +2209,7 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
struct xfrm_mark mark;

struct xfrm_user_acquire *ua = nlmsg_data(nlh);
+ struct xfrm_userpolicy_info_packed *upi = (void *)&ua->policy;
struct xfrm_state *x = xfrm_state_alloc(net);
int err = -ENOMEM;

@@ -2186,12 +2218,12 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,

xfrm_mark_get(attrs, &mark);

- err = verify_newpolicy_info(&ua->policy);
+ err = verify_newpolicy_info(upi);
if (err)
goto free_state;

/* build an XP */
- xp = xfrm_policy_construct(net, &ua->policy, attrs, &err);
+ xp = xfrm_policy_construct(net, upi, attrs, &err);
if (!xp)
goto free_state;

@@ -2881,11 +2913,21 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
u8 *data, int len, int *dir)
{
struct net *net = sock_net(sk);
- struct xfrm_userpolicy_info *p = (struct xfrm_userpolicy_info *)data;
- struct xfrm_user_tmpl *ut = (struct xfrm_user_tmpl *) (p + 1);
+ struct xfrm_userpolicy_info *upi = (void *)data;
+ struct xfrm_userpolicy_info_packed *_upi = (void *)data;
+ size_t policy_size;
+ struct xfrm_user_tmpl *ut;
struct xfrm_policy *xp;
int nr;

+ if (in_compat_syscall()) {
+ ut = (struct xfrm_user_tmpl *)(_upi + 1);
+ policy_size = sizeof(*_upi);
+ } else {
+ ut = (struct xfrm_user_tmpl *)(upi + 1);
+ policy_size = sizeof(*upi);
+ }
+
switch (sk->sk_family) {
case AF_INET:
if (opt != IP_XFRM_POLICY) {
@@ -2908,15 +2950,14 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,

*dir = -EINVAL;

- if (len < sizeof(*p) ||
- verify_newpolicy_info(p))
+ if (len < policy_size || verify_newpolicy_info(_upi))
return NULL;

- nr = ((len - sizeof(*p)) / sizeof(*ut));
- if (validate_tmpl(nr, ut, p->sel.family))
+ nr = ((len - policy_size) / sizeof(*ut));
+ if (validate_tmpl(nr, ut, _upi->sel.family))
return NULL;

- if (p->dir > XFRM_POLICY_OUT)
+ if (_upi->dir > XFRM_POLICY_OUT)
return NULL;

xp = xfrm_policy_alloc(net, GFP_ATOMIC);
@@ -2925,11 +2966,11 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
return NULL;
}

- copy_from_user_policy(xp, p);
+ copy_from_user_policy(xp, _upi);
xp->type = XFRM_POLICY_TYPE_MAIN;
copy_templates(xp, ut, nr);

- *dir = p->dir;
+ *dir = _upi->dir;

return xp;
}
--
2.13.6


2018-07-26 02:35:39

by Dmitry Safonov

Subject: [PATCH 03/18] selftest/net/xfrm: Add test for ipsec tunnel

This is exhaustive testing for ipsec, covering all encryption/
authentication/compression algorithms. The tests are run in two
network namespaces, connected by veth interfaces. To make exhaustive
testing less time-consuming, the tests are run in parallel tasks,
as specified by a parameter to the selftest.

As the patch set adds support for xfrm in compatible tasks, there are
tests to check structures that differ in size between 64-bit and 32-bit
applications.
The selftest doesn't use libnl, so that it can be easily compiled as a
compatible application and doesn't require a compatible .so.

Here is a diagram of the selftest:

---------------
| selftest |
| (parent) |
---------------
| |
| (pipe) |
----------
/ | | \
/------------- / \ -------------\
| /----- -----\ |
---------|----------|----------------|----------|---------
| --------- --------- --------- --------- |
| | child | | child | NS A | child | | child | |
| --------- --------- --------- --------- |
-------|------------|----------------|-------------|------
veth0 veth1 veth2 vethN
---------|------------|----------------|-------------|----------
| ------------ ------------ ------------ ------------ |
| | gr.child | | gr.child | NS B | gr.child | | gr.child | |
| ------------ ------------ ------------ ------------ |
----------------------------------------------------------------

The parent sends the description of a test (xfrm parameters) to the
child; the child and grandchild set up a tunnel over a veth interface
and test it by sending UDP packets.

Cc: Andrew Morton <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
---
MAINTAINERS | 1 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ipsec.c | 1987 ++++++++++++++++++++++++++++++++
4 files changed, 1990 insertions(+)
create mode 100644 tools/testing/selftests/net/ipsec.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fe4228f78cb..7e20db5d0210 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9938,6 +9938,7 @@ F: net/ipv6/ipcomp6.c
F: net/ipv6/ip6_vti.c
F: include/uapi/linux/xfrm.h
F: include/net/xfrm.h
+F: tools/testing/selftests/net/ipsec.c

NETWORKING [IPv4/IPv6]
M: "David S. Miller" <[email protected]>
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 1a0ac3a29ec5..6896547292cb 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -1,3 +1,4 @@
+ipsec
msg_zerocopy
socket
psock_fanout
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 663e11e85727..9f35c01fbc0a 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -12,6 +12,7 @@ TEST_GEN_FILES = socket
TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
+TEST_GEN_FILES += ipsec
TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict

diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
new file mode 100644
index 000000000000..e0752e127ba7
--- /dev/null
+++ b/tools/testing/selftests/net/ipsec.c
@@ -0,0 +1,1987 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ipsec.c - Check xfrm on veth inside a net-ns.
+ * Copyright (c) 2018 Dmitry Safonov (Arista Networks)
+ */
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <asm/types.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/limits.h>
+#include <linux/netlink.h>
+#include <linux/random.h>
+#include <linux/rtnetlink.h>
+#include <linux/veth.h>
+#include <linux/xfrm.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#define printk(fmt, lvl, ...) \
+ fprintf(stderr, "[%s] (%s:%d)\t" fmt "\n", \
+ lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...) func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...) \
+ printk(fmt, "ERR", ##__VA_ARGS__)
+#define pr_warn(fmt, ...) \
+ printk(fmt, "WARN", ##__VA_ARGS__)
+#define pr_note(fmt, ...) \
+ printk(fmt, "NOTE", ##__VA_ARGS__)
+#define pr_ok(fmt, ...) \
+ printk(fmt, "OK", ##__VA_ARGS__)
+#define pr_debug(fmt, ...) \
+ while (0) { \
+ printk(fmt, "NOTE", ##__VA_ARGS__); \
+ }
+
+#define pr_perror(fmt, ...) pr_p(pr_err, fmt, ##__VA_ARGS__)
+#define pr_pwarn(fmt, ...) pr_p(pr_warn, fmt, ##__VA_ARGS__)
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
+
+#define IPV4_STR_SZ 16 /* xxx.xxx.xxx.xxx is longest + \0 */
+#define MAX_PAYLOAD 2048
+#define XFRM_ALGO_KEY_BUF_SIZE 512
+#define MAX_PROCESSES (1 << 14) /* /16 mask divided by /30 subnets */
+#define INADDR_A ((in_addr_t) 0x0a000000) /* 10.0.0.0 */
+#define INADDR_B ((in_addr_t) 0xc0a80000) /* 192.168.0.0 */
+
+/* /30 mask for one veth connection */
+#define PREFIX_LEN 30
+#define child_ip(nr) (4*nr + 1)
+#define grchild_ip(nr) (4*nr + 2)
+
+#define VETH_FMT "ktst-%d"
+#define VETH_LEN 10
+#define BEGIN_SEQ (time(NULL))
+
+static int nsfd_parent = -1;
+static int nsfd_childa = -1;
+static int nsfd_childb = -1;
+static long page_size;
+
+const unsigned int ping_delay_nsec = 50 * 1000 * 1000;
+const unsigned int ping_timeout = 300;
+const unsigned int ping_count = 100;
+const unsigned int ping_success = 80;
+
+static int unshare_open(void)
+{
+ const char *netns_path = "/proc/self/ns/net";
+ int fd;
+
+ if (unshare(CLONE_NEWNET) != 0) {
+ pr_pwarn("unshare()");
+ return -1;
+ }
+
+ fd = open(netns_path, O_RDONLY);
+ if (fd <= 0) {
+ pr_pwarn("open(%s)", netns_path);
+ return -1;
+ }
+
+ return fd;
+}
+
+static int switch_ns(int fd)
+{
+ if (setns(fd, CLONE_NEWNET)) {
+ pr_pwarn("setns()");
+ return -1;
+ }
+ return 0;
+}
+
+/*
+ * Running the test inside a new parent net namespace to bother less
+ * about cleanup on error-path.
+ */
+static int init_namespaces(void)
+{
+ nsfd_parent = unshare_open();
+ if (nsfd_parent <= 0)
+ return -1;
+
+ nsfd_childa = unshare_open();
+ if (nsfd_childa <= 0)
+ return -1;
+
+ if (switch_ns(nsfd_parent))
+ return -1;
+
+ nsfd_childb = unshare_open();
+ if (nsfd_childb <= 0)
+ return -1;
+
+ if (switch_ns(nsfd_parent))
+ return -1;
+ return 0;
+}
+
+static int netlink_sock(int *sock, uint32_t *seq_nr, int proto)
+{
+ /* Socket is already open: only advance the caller's sequence number */
+ if (*sock > 0) {
+ (*seq_nr)++;
+ return 0;
+ }
+
+ *sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, proto);
+ if (*sock <= 0) {
+ pr_pwarn("socket(AF_NETLINK)");
+ return errno;
+ }
+ /* Hand the initial sequence number back to the caller */
+ *seq_nr = BEGIN_SEQ;
+
+ return 0;
+}
+
+static inline struct rtattr *rtattr_hdr(struct nlmsghdr *nh)
+{
+ return (struct rtattr *)((char *)(nh) + RTA_ALIGN((nh)->nlmsg_len));
+}
+
+static int rtattr_pack(struct nlmsghdr *nh, size_t req_sz,
+ unsigned short rta_type, const void *payload, size_t size)
+{
+ /* NLMSG_ALIGNTO == RTA_ALIGNTO, nlmsg_len already aligned */
+ struct rtattr *attr = rtattr_hdr(nh);
+ size_t nl_size = RTA_ALIGN(nh->nlmsg_len) + RTA_LENGTH(size);
+
+ if (req_sz < nl_size) {
+ pr_err("req buf is too small: %zu < %zu", req_sz, nl_size);
+ return -1;
+ }
+ nh->nlmsg_len = nl_size;
+
+ attr->rta_len = RTA_LENGTH(size); /* XXX: rta_len = size? */
+ attr->rta_type = rta_type;
+ memcpy(RTA_DATA(attr), payload, size);
+
+ return 0;
+}
+
+static struct rtattr *_rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+ unsigned short rta_type, const void *payload, size_t size)
+{
+ struct rtattr *ret = rtattr_hdr(nh);
+
+ if (rtattr_pack(nh, req_sz, rta_type, payload, size))
+ return 0;
+
+ return ret;
+}
+
+static inline struct rtattr *rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+ unsigned short rta_type)
+{
+ return _rtattr_begin(nh, req_sz, rta_type, 0, 0);
+}
+
+static inline void rtattr_end(struct nlmsghdr *nh, struct rtattr *attr)
+{
+ char *nlmsg_end = (char *)nh + nh->nlmsg_len;
+
+ attr->rta_len = nlmsg_end - (char *)attr;
+}
+
+static int veth_pack_peerb(struct nlmsghdr *nh, size_t req_sz,
+ const char *peer, int ns)
+{
+ struct ifinfomsg pi;
+ struct rtattr *peer_attr;
+
+ memset(&pi, 0, sizeof(pi));
+ pi.ifi_family = AF_UNSPEC;
+ pi.ifi_change = 0xFFFFFFFF;
+
+ peer_attr = _rtattr_begin(nh, req_sz, VETH_INFO_PEER, &pi, sizeof(pi));
+ if (!peer_attr)
+ return -1;
+
+ if (rtattr_pack(nh, req_sz, IFLA_IFNAME, peer, strlen(peer)))
+ return -1;
+
+ if (rtattr_pack(nh, req_sz, IFLA_NET_NS_FD, &ns, sizeof(ns)))
+ return -1;
+
+ rtattr_end(nh, peer_attr);
+
+ return 0;
+}
+
+static int netlink_check_answer(int sock)
+{
+ struct nlmsgerror {
+ struct nlmsghdr hdr;
+ int error;
+ struct nlmsghdr orig_msg;
+ } answer;
+
+ if (recv(sock, &answer, sizeof(answer), 0) < 0) {
+ pr_perror("recv()");
+ return -1;
+ } else if (answer.hdr.nlmsg_type != NLMSG_ERROR) {
+ pr_err("expected NLMSG_ERROR, got %d", (int)answer.hdr.nlmsg_type);
+ return -1;
+ } else if (answer.error) {
+ pr_err("NLMSG_ERROR: %d: %s",
+ answer.error, strerror(-answer.error));
+ return answer.error;
+ }
+
+ return 0;
+}
+
+static int veth_add(int sock, uint32_t seq, const char *peera, int ns_a,
+ const char *peerb, int ns_b)
+{
+ uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg info;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+ const char veth_type[] = "veth";
+ struct rtattr *link_info, *info_data;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.info));
+ req.nh.nlmsg_type = RTM_NEWLINK;
+ req.nh.nlmsg_flags = flags;
+ req.nh.nlmsg_seq = seq;
+ req.info.ifi_family = AF_UNSPEC;
+ req.info.ifi_change = 0xFFFFFFFF;
+
+ if (rtattr_pack(&req.nh, sizeof(req), IFLA_IFNAME, peera, strlen(peera)))
+ return -1;
+
+ if (rtattr_pack(&req.nh, sizeof(req), IFLA_NET_NS_FD, &ns_a, sizeof(ns_a)))
+ return -1;
+
+ link_info = rtattr_begin(&req.nh, sizeof(req), IFLA_LINKINFO);
+ if (!link_info)
+ return -1;
+
+ if (rtattr_pack(&req.nh, sizeof(req), IFLA_INFO_KIND, veth_type, sizeof(veth_type)))
+ return -1;
+
+ info_data = rtattr_begin(&req.nh, sizeof(req), IFLA_INFO_DATA);
+ if (!info_data)
+ return -1;
+
+ if (veth_pack_peerb(&req.nh, sizeof(req), peerb, ns_b))
+ return -1;
+
+ rtattr_end(&req.nh, info_data);
+ rtattr_end(&req.nh, link_info);
+
+ if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+ return netlink_check_answer(sock);
+}
+
+static int ip4_addr_set(int sock, uint32_t seq, const char *intf,
+ struct in_addr addr, uint8_t prefix)
+{
+ uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+ struct {
+ struct nlmsghdr nh;
+ struct ifaddrmsg info;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.info));
+ req.nh.nlmsg_type = RTM_NEWADDR;
+ req.nh.nlmsg_flags = flags;
+ req.nh.nlmsg_seq = seq;
+ req.info.ifa_family = AF_INET;
+ req.info.ifa_prefixlen = prefix;
+ req.info.ifa_index = if_nametoindex(intf);
+
+#if 0
+ {
+ char addr_str[IPV4_STR_SZ] = {};
+
+ strncpy(addr_str, inet_ntoa(addr), IPV4_STR_SZ - 1);
+
+ pr_warn("ip addr set %s", addr_str);
+ }
+#endif
+
+ if (rtattr_pack(&req.nh, sizeof(req), IFA_LOCAL, &addr, sizeof(addr)))
+ return -1;
+
+ if (rtattr_pack(&req.nh, sizeof(req), IFA_ADDRESS, &addr, sizeof(addr)))
+ return -1;
+
+ if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+ return netlink_check_answer(sock);
+}
+
+static int link_set_up(int sock, uint32_t seq, const char *intf)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg info;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.info));
+ req.nh.nlmsg_type = RTM_NEWLINK;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = seq;
+ req.info.ifi_family = AF_UNSPEC;
+ req.info.ifi_change = 0xFFFFFFFF;
+ req.info.ifi_index = if_nametoindex(intf);
+ req.info.ifi_flags = IFF_UP;
+ req.info.ifi_change = IFF_UP;
+
+ if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+ return netlink_check_answer(sock);
+}
+
+static int ip4_route_set(int sock, uint32_t seq, const char *intf,
+ struct in_addr src, struct in_addr dst)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct rtmsg rt;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+ unsigned int index = if_nametoindex(intf);
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.rt));
+ req.nh.nlmsg_type = RTM_NEWROUTE;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
+ req.nh.nlmsg_seq = seq;
+ req.rt.rtm_family = AF_INET;
+ req.rt.rtm_dst_len = 32;
+ req.rt.rtm_table = RT_TABLE_MAIN;
+ req.rt.rtm_protocol = RTPROT_BOOT;
+ req.rt.rtm_scope = RT_SCOPE_LINK;
+ req.rt.rtm_type = RTN_UNICAST;
+
+ if (rtattr_pack(&req.nh, sizeof(req), RTA_DST, &dst, sizeof(dst)))
+ return -1;
+
+ if (rtattr_pack(&req.nh, sizeof(req), RTA_PREFSRC, &src, sizeof(src)))
+ return -1;
+
+ if (rtattr_pack(&req.nh, sizeof(req), RTA_OIF, &index, sizeof(index)))
+ return -1;
+
+ if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+
+ return netlink_check_answer(sock);
+}
+
+static int tunnel_set_route(int route_sock, uint32_t *route_seq, char *veth,
+ struct in_addr tunsrc, struct in_addr tundst)
+{
+ if (ip4_addr_set(route_sock, (*route_seq)++, "lo",
+ tunsrc, PREFIX_LEN)) {
+ pr_err("Failed to set ipv4 addr");
+ return -1;
+ }
+
+ if (ip4_route_set(route_sock, (*route_seq)++, veth, tunsrc, tundst)) {
+ pr_err("Failed to set ipv4 route");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int init_child(int nsfd, char *veth, unsigned int src, unsigned int dst)
+{
+ struct in_addr intsrc = inet_makeaddr(INADDR_B, src);
+ struct in_addr tunsrc = inet_makeaddr(INADDR_A, src);
+ struct in_addr tundst = inet_makeaddr(INADDR_A, dst);
+ int route_sock = -1, ret = -1;
+ uint32_t route_seq;
+
+ if (switch_ns(nsfd))
+ return -1;
+
+ if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+ pr_err("Failed to open netlink route socket in child");
+ return -1;
+ }
+
+ if (ip4_addr_set(route_sock, route_seq++, veth, intsrc, PREFIX_LEN)) {
+ pr_err("Failed to set ipv4 addr");
+ goto err;
+ }
+
+ if (link_set_up(route_sock, route_seq++, veth)) {
+ pr_err("Failed to bring up %s", veth);
+ goto err;
+ }
+
+ if (tunnel_set_route(route_sock, &route_seq, veth, tunsrc, tundst)) {
+ pr_err("Failed to add tunnel route on %s", veth);
+ goto err;
+ }
+ ret = 0;
+
+err:
+ close(route_sock);
+ return ret;
+}
+
+#define ALGO_LEN 64
+enum desc_type {
+ CREATE_TUNNEL = 0,
+ ALLOCATE_SPI,
+ MONITOR_ACQUIRE,
+ EXPIRE_STATE,
+ EXPIRE_POLICY,
+};
+struct xfrm_desc {
+ enum desc_type type;
+ uint8_t proto;
+ char a_algo[ALGO_LEN];
+ char e_algo[ALGO_LEN];
+ char c_algo[ALGO_LEN];
+ char ae_algo[ALGO_LEN];
+ unsigned int icv_len;
+ /* unsigned key_len; */
+};
+
+enum msg_type {
+ MSG_ACK = 0,
+ MSG_EXIT,
+ MSG_PING,
+ MSG_XFRM_PREPARE,
+ MSG_XFRM_ADD,
+ MSG_XFRM_DEL,
+ MSG_XFRM_CLEANUP,
+};
+
+struct test_desc {
+ enum msg_type type;
+ union {
+ struct {
+ in_addr_t reply_ip;
+ unsigned int port;
+ } ping;
+ struct xfrm_desc xfrm_desc;
+ } body;
+};
+
+static void write_msg(int fd, struct test_desc *msg)
+{
+ ssize_t bytes = write(fd, msg, sizeof(*msg));
+
+ /* Make sure that write/read is atomic to a pipe */
+ BUILD_BUG_ON(sizeof(struct test_desc) > PIPE_BUF);
+
+ if (bytes < 0) {
+ pr_perror("write()");
+ exit(1);
+ }
+ if (bytes != sizeof(*msg)) {
+ pr_err("sent part of the message %zd/%zu", bytes, sizeof(*msg));
+ exit(1);
+ }
+}
+
+static void read_msg(int fd, struct test_desc *msg)
+{
+ ssize_t bytes = read(fd, msg, sizeof(*msg));
+
+ if (bytes < 0) {
+ pr_perror("read()");
+ exit(1);
+ }
+ if (bytes != sizeof(*msg)) {
+ pr_err("got incomplete message %zd/%zu", bytes, sizeof(*msg));
+ exit(1);
+ }
+}
+
+static int udp_ping_init(struct in_addr listen_ip, unsigned int u_timeout,
+ unsigned int *server_port, int sock[2])
+{
+ struct sockaddr_in server;
+ struct timeval t = { .tv_sec = 0, .tv_usec = u_timeout };
+ socklen_t s_len = sizeof(server);
+
+ sock[0] = socket(AF_INET, SOCK_DGRAM, 0);
+ if (sock[0] < 0) {
+ pr_perror("socket()");
+ return -1;
+ }
+
+ server.sin_family = AF_INET;
+ server.sin_port = 0;
+ memcpy(&server.sin_addr.s_addr, &listen_ip, sizeof(struct in_addr));
+
+ if (bind(sock[0], (struct sockaddr *)&server, s_len)) {
+ pr_perror("bind()");
+ goto err_close_server;
+ }
+
+ if (getsockname(sock[0], (struct sockaddr *)&server, &s_len)) {
+ pr_perror("getsockname()");
+ goto err_close_server;
+ }
+
+ *server_port = ntohs(server.sin_port);
+
+ if (setsockopt(sock[0], SOL_SOCKET, SO_RCVTIMEO, (const char *)&t, sizeof t)) {
+ pr_perror("setsockopt()");
+ goto err_close_server;
+ }
+
+ sock[1] = socket(AF_INET, SOCK_DGRAM, 0);
+ if (sock[1] < 0) {
+ pr_perror("socket()");
+ goto err_close_server;
+ }
+
+ return 0;
+
+err_close_server:
+ close(sock[0]);
+ return -1;
+}
+
+static int udp_ping_send(int sock[2], in_addr_t dest_ip, unsigned int port,
+ char *buf, size_t buf_len)
+{
+ struct sockaddr_in server;
+ const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+ char sock_buf[buf_len];
+ ssize_t r_bytes, s_bytes;
+
+ server.sin_family = AF_INET;
+ server.sin_port = htons(port);
+ server.sin_addr.s_addr = dest_ip;
+
+ s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+ if (s_bytes < 0) {
+ pr_perror("sendto()");
+ return -1;
+ } else if (s_bytes != buf_len) {
+ pr_err("sent part of the message: %zd/%zu", s_bytes, buf_len);
+ return -1;
+ }
+
+ r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+ if (r_bytes < 0) {
+ if (errno != EAGAIN)
+ pr_perror("recv()");
+ return -1;
+ } else if (r_bytes == 0) { /* EOF */
+ pr_err("EOF on reply to ping");
+ return -1;
+ } else if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+ pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int udp_ping_reply(int sock[2], in_addr_t dest_ip, unsigned int port,
+ char *buf, size_t buf_len)
+{
+ struct sockaddr_in server;
+ const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+ char sock_buf[buf_len];
+ ssize_t r_bytes, s_bytes;
+
+ server.sin_family = AF_INET;
+ server.sin_port = htons(port);
+ server.sin_addr.s_addr = dest_ip;
+
+ r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+ if (r_bytes < 0) {
+ if (errno != EAGAIN)
+ pr_perror("recv()");
+ return -1;
+ }
+ if (r_bytes == 0) { /* EOF */
+ pr_err("EOF on reply to ping");
+ return -1;
+ }
+ if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+ pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+ return -1;
+ }
+
+ s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+ if (s_bytes < 0) {
+ pr_perror("sendto()");
+ return -1;
+ } else if (s_bytes != buf_len) {
+ pr_err("sent part of the message: %zd/%zu", s_bytes, buf_len);
+ return -1;
+ }
+
+ return 0;
+}
+
+typedef int (*ping_f)(int sock[2], in_addr_t dest_ip, unsigned int port,
+ char *buf, size_t buf_len);
+static int do_ping(int cmd_fd, char *buf, size_t buf_len, struct in_addr from,
+ bool init_side, int d_port, in_addr_t to, ping_f func)
+{
+ struct test_desc msg;
+ unsigned int s_port, i, ping_succeeded = 0;
+ int ping_sock[2];
+ char to_str[IPV4_STR_SZ] = {}, from_str[IPV4_STR_SZ] = {};
+
+ if (udp_ping_init(from, ping_timeout, &s_port, ping_sock)) {
+ pr_err("Failed to init ping");
+ return -1;
+ }
+
+ memset(&msg, 0, sizeof(msg));
+ msg.type = MSG_PING;
+ msg.body.ping.port = s_port;
+ memcpy(&msg.body.ping.reply_ip, &from, sizeof(from));
+
+ write_msg(cmd_fd, &msg);
+ if (init_side) {
+ /* The other end sends ip to ping */
+ read_msg(cmd_fd, &msg);
+ if (msg.type != MSG_PING) {
+ close(ping_sock[0]);
+ close(ping_sock[1]);
+ return -1;
+ }
+ to = msg.body.ping.reply_ip;
+ d_port = msg.body.ping.port;
+ }
+
+ for (i = 0; i < ping_count ; i++) {
+ struct timespec sleep_time = {
+ .tv_sec = 0,
+ .tv_nsec = ping_delay_nsec,
+ };
+
+ ping_succeeded += !func(ping_sock, to, d_port, buf, buf_len);
+ nanosleep(&sleep_time, 0);
+ }
+
+ close(ping_sock[0]);
+ close(ping_sock[1]);
+
+ strncpy(to_str, inet_ntoa(*(struct in_addr *)&to), IPV4_STR_SZ - 1);
+ strncpy(from_str, inet_ntoa(from), IPV4_STR_SZ - 1);
+
+ if (ping_succeeded < ping_success) {
+ pr_err("ping (%s) %s->%s failed %u/%u times",
+ init_side ? "send" : "reply", from_str, to_str,
+ ping_count - ping_succeeded, ping_count);
+ return -1;
+ }
+
+ pr_debug("ping (%s) %s->%s succeeded %u/%u times",
+ init_side ? "send" : "reply", from_str, to_str,
+ ping_succeeded, ping_count);
+
+ return 0;
+}
+
+static int randomize_buffer(void *buf, size_t buflen)
+{
+ size_t filled = 0;
+
+ /* getrandom() may return fewer bytes than requested: fill the rest */
+ while (filled < buflen) {
+ long ret = syscall(SYS_getrandom, (char *)buf + filled,
+ buflen - filled, 0);
+
+ if (ret < 0) {
+ pr_perror("getrandom()");
+ return -1;
+ }
+ filled += ret;
+ }
+
+ return 0;
+}
+
+static int xfrm_fill_key(char *name, char *buf,
+ size_t buf_len, unsigned int *key_len)
+{
+ /* XXX: use set/map instead of all this */
+ if (strncmp(name, "digest_null", ALGO_LEN) == 0)
+ *key_len = 0;
+ else if (strncmp(name, "ecb(cipher_null)", ALGO_LEN) == 0)
+ *key_len = 0;
+ else if (strncmp(name, "cbc(des)", ALGO_LEN) == 0)
+ *key_len = 64;
+ else if (strncmp(name, "hmac(md5)", ALGO_LEN) == 0)
+ *key_len = 128;
+ else if (strncmp(name, "cmac(aes)", ALGO_LEN) == 0)
+ *key_len = 128;
+ else if (strncmp(name, "xcbc(aes)", ALGO_LEN) == 0)
+ *key_len = 128;
+ else if (strncmp(name, "cbc(cast5)", ALGO_LEN) == 0)
+ *key_len = 128;
+ else if (strncmp(name, "cbc(serpent)", ALGO_LEN) == 0)
+ *key_len = 128;
+ else if (strncmp(name, "hmac(sha1)", ALGO_LEN) == 0)
+ *key_len = 160;
+ else if (strncmp(name, "hmac(rmd160)", ALGO_LEN) == 0)
+ *key_len = 160;
+ else if (strncmp(name, "cbc(des3_ede)", ALGO_LEN) == 0)
+ *key_len = 192;
+ else if (strncmp(name, "hmac(sha256)", ALGO_LEN) == 0)
+ *key_len = 256;
+ else if (strncmp(name, "cbc(aes)", ALGO_LEN) == 0)
+ *key_len = 256;
+ else if (strncmp(name, "cbc(camellia)", ALGO_LEN) == 0)
+ *key_len = 256;
+ else if (strncmp(name, "cbc(twofish)", ALGO_LEN) == 0)
+ *key_len = 256;
+ else if (strncmp(name, "rfc3686(ctr(aes))", ALGO_LEN) == 0)
+ *key_len = 288;
+ else if (strncmp(name, "hmac(sha384)", ALGO_LEN) == 0)
+ *key_len = 384;
+ else if (strncmp(name, "cbc(blowfish)", ALGO_LEN) == 0)
+ *key_len = 448;
+ else if (strncmp(name, "hmac(sha512)", ALGO_LEN) == 0)
+ *key_len = 512;
+ else if (strncmp(name, "rfc4106(gcm(aes))-128", ALGO_LEN) == 0)
+ *key_len = 160;
+ else if (strncmp(name, "rfc4543(gcm(aes))-128", ALGO_LEN) == 0)
+ *key_len = 160;
+ else if (strncmp(name, "rfc4309(ccm(aes))-128", ALGO_LEN) == 0)
+ *key_len = 152;
+ else if (strncmp(name, "rfc4106(gcm(aes))-192", ALGO_LEN) == 0)
+ *key_len = 224;
+ else if (strncmp(name, "rfc4543(gcm(aes))-192", ALGO_LEN) == 0)
+ *key_len = 224;
+ else if (strncmp(name, "rfc4309(ccm(aes))-192", ALGO_LEN) == 0)
+ *key_len = 216;
+ else if (strncmp(name, "rfc4106(gcm(aes))-256", ALGO_LEN) == 0)
+ *key_len = 288;
+ else if (strncmp(name, "rfc4543(gcm(aes))-256", ALGO_LEN) == 0)
+ *key_len = 288;
+ else if (strncmp(name, "rfc4309(ccm(aes))-256", ALGO_LEN) == 0)
+ *key_len = 280;
+ else if (strncmp(name, "rfc7539(chacha20,poly1305)-128", ALGO_LEN) == 0)
+ *key_len = 0;
+ else
+ return -1; /* unknown algorithm name */
+
+ /* *key_len is in bits, while buf_len is in bytes */
+ if ((*key_len + 7) / 8 > buf_len) {
+ pr_err("Can't pack a key - too big for buffer");
+ return -1;
+ }
+
+ return randomize_buffer(buf, (*key_len + 7) / 8);
+}
+
+static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
+ struct xfrm_desc *desc)
+{
+ struct {
+ union {
+ struct xfrm_algo alg;
+ struct xfrm_algo_aead aead;
+ struct xfrm_algo_auth auth;
+ } u;
+ char buf[XFRM_ALGO_KEY_BUF_SIZE];
+ } alg = {};
+ size_t alen, elen, clen, aelen;
+ unsigned short type;
+
+ alen = strlen(desc->a_algo);
+ elen = strlen(desc->e_algo);
+ clen = strlen(desc->c_algo);
+ aelen = strlen(desc->ae_algo);
+
+ /* Verify desc */
+ switch (desc->proto) {
+ case IPPROTO_AH:
+ if (!alen || elen || clen || aelen) {
+ pr_err("BUG: buggy ah desc");
+ return -1;
+ }
+ strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+ if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+ sizeof(alg.buf), &alg.u.alg.alg_key_len))
+ return -1;
+ type = XFRMA_ALG_AUTH;
+ break;
+ case IPPROTO_COMP:
+ if (!clen || elen || alen || aelen) {
+ pr_err("BUG: buggy comp desc");
+ return -1;
+ }
+ strncpy(alg.u.alg.alg_name, desc->c_algo, ALGO_LEN);
+ if (xfrm_fill_key(desc->c_algo, alg.u.alg.alg_key,
+ sizeof(alg.buf), &alg.u.alg.alg_key_len))
+ return -1;
+ type = XFRMA_ALG_COMP;
+ break;
+ case IPPROTO_ESP:
+ if (!((alen && elen) ^ !!aelen) || clen) {
+ pr_err("BUG: buggy esp desc");
+ return -1;
+ }
+ if (aelen) {
+ alg.u.aead.alg_icv_len = desc->icv_len;
+ strncpy(alg.u.aead.alg_name, desc->ae_algo, ALGO_LEN);
+ if (xfrm_fill_key(desc->ae_algo, alg.u.aead.alg_key,
+ sizeof(alg.buf), &alg.u.aead.alg_key_len))
+ return -1;
+ type = XFRMA_ALG_AEAD;
+ } else {
+
+ strncpy(alg.u.alg.alg_name, desc->e_algo, ALGO_LEN);
+ type = XFRMA_ALG_CRYPT;
+ if (xfrm_fill_key(desc->e_algo, alg.u.alg.alg_key,
+ sizeof(alg.buf), &alg.u.alg.alg_key_len))
+ return -1;
+ if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+ return -1;
+
+ strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+ type = XFRMA_ALG_AUTH;
+ if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+ sizeof(alg.buf), &alg.u.alg.alg_key_len))
+ return -1;
+ }
+ break;
+ default:
+ pr_err("BUG: unknown proto in desc");
+ return -1;
+ }
+
+ if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+ return -1;
+
+ return 0;
+}
+
+static inline uint32_t gen_spi(struct in_addr src)
+{
+ return htonl(inet_lnaof(src));
+}
+
+static int xfrm_state_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+ struct in_addr src, struct in_addr dst,
+ struct xfrm_desc *desc)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct xfrm_usersa_info info;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.info));
+ req.nh.nlmsg_type = XFRM_MSG_NEWSA;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = seq;
+
+ /* Fill selector. */
+ memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+ memcpy(&req.info.sel.saddr, &src, sizeof(src));
+ req.info.sel.family = AF_INET;
+ req.info.sel.prefixlen_d = PREFIX_LEN;
+ req.info.sel.prefixlen_s = PREFIX_LEN;
+
+ /* Fill id */
+ memcpy(&req.info.id.daddr, &dst, sizeof(dst));
+ /* Note: zero-spi cannot be deleted */
+ req.info.id.spi = spi;
+ req.info.id.proto = desc->proto;
+
+ memcpy(&req.info.saddr, &src, sizeof(src));
+
+ /* Fill lifetime_cfg */
+ req.info.lft.soft_byte_limit = XFRM_INF;
+ req.info.lft.hard_byte_limit = XFRM_INF;
+ req.info.lft.soft_packet_limit = XFRM_INF;
+ req.info.lft.hard_packet_limit = XFRM_INF;
+
+ req.info.family = AF_INET;
+ req.info.mode = XFRM_MODE_TUNNEL;
+
+ /* XXX: Fill seq, reqid, replay_window, flags? */
+
+ if (xfrm_state_pack_algo(&req.nh, sizeof(req), desc))
+ return -1;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+
+ return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_set(int xfrm_sock, uint32_t *seq,
+ struct in_addr src, struct in_addr dst,
+ struct in_addr tunsrc, struct in_addr tundst,
+ struct xfrm_desc *desc)
+{
+ int err;
+
+ err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc);
+ if (err) {
+ pr_err("Failed to add xfrm state");
+ return -1;
+ }
+
+ err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src, desc);
+ if (err) {
+ pr_err("Failed to add xfrm state");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int xfrm_policy_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+ struct in_addr src, struct in_addr dst, uint8_t dir,
+ struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct xfrm_userpolicy_info info;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+ struct xfrm_user_tmpl tmpl;
+
+ memset(&req, 0, sizeof(req));
+ memset(&tmpl, 0, sizeof(tmpl));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.info));
+ req.nh.nlmsg_type = XFRM_MSG_NEWPOLICY;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = seq;
+
+ /* Fill selector. */
+ memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+ memcpy(&req.info.sel.saddr, &src, sizeof(src));
+ req.info.sel.family = AF_INET;
+ req.info.sel.prefixlen_d = PREFIX_LEN;
+ req.info.sel.prefixlen_s = PREFIX_LEN;
+
+ /* Fill lifetime_cfg */
+ req.info.lft.soft_byte_limit = XFRM_INF;
+ req.info.lft.hard_byte_limit = XFRM_INF;
+ req.info.lft.soft_packet_limit = XFRM_INF;
+ req.info.lft.hard_packet_limit = XFRM_INF;
+
+ req.info.dir = dir;
+
+ /* Fill tmpl */
+ memcpy(&tmpl.id.daddr, &dst, sizeof(dst));
+ /* Note: zero-spi cannot be deleted */
+ tmpl.id.spi = spi;
+ tmpl.id.proto = proto;
+ tmpl.family = AF_INET;
+ memcpy(&tmpl.saddr, &src, sizeof(src));
+ tmpl.mode = XFRM_MODE_TUNNEL;
+ tmpl.aalgos = (~(uint32_t)0);
+ tmpl.ealgos = (~(uint32_t)0);
+ tmpl.calgos = (~(uint32_t)0);
+
+ if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &tmpl, sizeof(tmpl)))
+ return -1;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+
+ return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_prepare(int xfrm_sock, uint32_t *seq,
+ struct in_addr src, struct in_addr dst,
+ struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+ if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+ XFRM_POLICY_OUT, tunsrc, tundst, proto)) {
+ pr_err("Failed to add xfrm policy");
+ return -1;
+ }
+
+ if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src,
+ XFRM_POLICY_IN, tunsrc, tundst, proto)) {
+ pr_err("Failed to add xfrm policy");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int xfrm_policy_del(int xfrm_sock, uint32_t seq,
+ struct in_addr src, struct in_addr dst, uint8_t dir,
+ struct in_addr tunsrc, struct in_addr tundst)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct xfrm_userpolicy_id id;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.id));
+ req.nh.nlmsg_type = XFRM_MSG_DELPOLICY;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = seq;
+
+ /* Fill id */
+ memcpy(&req.id.sel.daddr, &dst, sizeof(dst));
+ memcpy(&req.id.sel.saddr, &src, sizeof(src));
+ req.id.sel.family = AF_INET;
+ req.id.sel.prefixlen_d = PREFIX_LEN;
+ req.id.sel.prefixlen_s = PREFIX_LEN;
+ req.id.dir = dir;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+
+ return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_cleanup(int xfrm_sock, uint32_t *seq,
+ struct in_addr src, struct in_addr dst,
+ struct in_addr tunsrc, struct in_addr tundst)
+{
+ if (xfrm_policy_del(xfrm_sock, (*seq)++, src, dst,
+ XFRM_POLICY_OUT, tunsrc, tundst)) {
+ pr_err("Failed to delete xfrm policy");
+ return -1;
+ }
+
+ if (xfrm_policy_del(xfrm_sock, (*seq)++, dst, src,
+ XFRM_POLICY_IN, tunsrc, tundst)) {
+ pr_err("Failed to delete xfrm policy");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int xfrm_state_del(int xfrm_sock, uint32_t seq, uint32_t spi,
+ struct in_addr src, struct in_addr dst, uint8_t proto)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct xfrm_usersa_id id;
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+ xfrm_address_t saddr = {};
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.id));
+ req.nh.nlmsg_type = XFRM_MSG_DELSA;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = seq;
+
+ memcpy(&req.id.daddr, &dst, sizeof(dst));
+ req.id.family = AF_INET;
+ req.id.proto = proto;
+ /* Note: zero-spi cannot be deleted */
+ req.id.spi = spi;
+
+ memcpy(&saddr, &src, sizeof(src));
+ if (rtattr_pack(&req.nh, sizeof(req), XFRMA_SRCADDR, &saddr, sizeof(saddr)))
+ return -1;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+
+ return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_delete(int xfrm_sock, uint32_t *seq,
+ struct in_addr src, struct in_addr dst,
+ struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+ if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), src, dst, proto)) {
+ pr_err("Failed to remove xfrm state");
+ return -1;
+ }
+
+ if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), dst, src, proto)) {
+ pr_err("Failed to remove xfrm state");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int xfrm_state_allocspi(int xfrm_sock, uint32_t *seq,
+ uint32_t spi, uint8_t proto)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct xfrm_userspi_info spi;
+ } req;
+ struct {
+ struct nlmsghdr nh;
+ union {
+ struct xfrm_usersa_info info;
+ int error;
+ };
+ } answer;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.spi));
+ req.nh.nlmsg_type = XFRM_MSG_ALLOCSPI;
+ req.nh.nlmsg_flags = NLM_F_REQUEST;
+ req.nh.nlmsg_seq = (*seq)++;
+
+ req.spi.info.family = AF_INET;
+ req.spi.min = spi;
+ req.spi.max = spi;
+ req.spi.info.id.proto = proto;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ return -1;
+ }
+
+ if (recv(xfrm_sock, &answer, sizeof(answer), 0) < 0) {
+ pr_perror("recv()");
+ return -1;
+ } else if (answer.nh.nlmsg_type == XFRM_MSG_NEWSA) {
+ uint32_t new_spi = ntohl(answer.info.id.spi);
+
+ if (new_spi != spi) {
+ pr_err("allocated spi is different from requested: %#x != %#x",
+ new_spi, spi);
+ return -1;
+ }
+ return 0;
+ } else if (answer.nh.nlmsg_type != NLMSG_ERROR) {
+ pr_err("expected NLMSG_ERROR, got %d", (int)answer.nh.nlmsg_type);
+ return -1;
+ }
+
+ pr_err("NLMSG_ERROR: %d: %s", answer.error, strerror(-answer.error));
+ return answer.error;
+}
+
+static int netlink_sock_bind(int *sock, uint32_t *seq, int proto, uint32_t groups)
+{
+ struct sockaddr_nl snl = {};
+ socklen_t addr_len;
+ int ret = -1;
+
+ snl.nl_family = AF_NETLINK;
+ snl.nl_groups = groups;
+
+ if (netlink_sock(sock, seq, proto)) {
+ pr_err("Failed to open xfrm netlink socket");
+ return -1;
+ }
+
+ if (bind(*sock, (struct sockaddr *)&snl, sizeof(snl)) < 0) {
+ pr_perror("bind()");
+ goto out_close;
+ }
+
+ addr_len = sizeof(snl);
+ if (getsockname(*sock, (struct sockaddr *)&snl, &addr_len) < 0) {
+ pr_perror("getsockname()");
+ goto out_close;
+ }
+ if (addr_len != sizeof(snl)) {
+ pr_err("Wrong address length %d", addr_len);
+ goto out_close;
+ }
+ if (snl.nl_family != AF_NETLINK) {
+ pr_err("Wrong address family %d", snl.nl_family);
+ goto out_close;
+ }
+ return 0;
+
+out_close:
+ close(*sock);
+ return ret;
+}
+
+static int xfrm_monitor_acquire(int xfrm_sock, uint32_t *seq, unsigned int nr)
+{
+ struct {
+ struct nlmsghdr nh;
+ union {
+ struct xfrm_user_acquire acq;
+ int error;
+ };
+ char attrbuf[MAX_PAYLOAD];
+ } req;
+ struct xfrm_user_tmpl xfrm_tmpl = {};
+ int xfrm_listen = -1, ret = -1;
+ uint32_t seq_listen;
+
+ if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_ACQUIRE))
+ return -1;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.acq));
+ req.nh.nlmsg_type = XFRM_MSG_ACQUIRE;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = (*seq)++;
+
+ req.acq.policy.sel.family = AF_INET;
+ req.acq.aalgos = 0xfeed;
+ req.acq.ealgos = 0xbaad;
+ req.acq.calgos = 0xbabe;
+
+ xfrm_tmpl.family = AF_INET;
+ xfrm_tmpl.id.proto = IPPROTO_ESP;
+ if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &xfrm_tmpl, sizeof(xfrm_tmpl)))
+ goto out_close;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ goto out_close;
+ }
+
+ if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+ pr_perror("recv()");
+ goto out_close;
+ } else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+ pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+ goto out_close;
+ }
+
+ if (req.error) {
+ pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+ ret = req.error;
+ goto out_close;
+ }
+
+ if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+ pr_perror("recv()");
+ goto out_close;
+ }
+
+ if (req.acq.aalgos != 0xfeed || req.acq.ealgos != 0xbaad
+ || req.acq.calgos != 0xbabe) {
+ pr_err("xfrm_user_acquire has changed %x %x %x",
+ req.acq.aalgos, req.acq.ealgos, req.acq.calgos);
+ goto out_close;
+ }
+
+ ret = 0;
+out_close:
+ close(xfrm_listen);
+ return ret;
+}
+
+static int xfrm_expire_state(int xfrm_sock, uint32_t *seq,
+ unsigned int nr, struct xfrm_desc *desc)
+{
+ struct {
+ struct nlmsghdr nh;
+ union {
+ struct xfrm_user_expire expire;
+ int error;
+ };
+ } req;
+ struct in_addr src, dst;
+ int xfrm_listen = -1, ret = -1;
+ uint32_t seq_listen;
+
+ src = inet_makeaddr(INADDR_B, child_ip(nr));
+ dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+
+ if (xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc)) {
+ pr_err("Failed to add xfrm state");
+ return -1;
+ }
+
+ if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+ return -1;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.expire));
+ req.nh.nlmsg_type = XFRM_MSG_EXPIRE;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = (*seq)++;
+
+ memcpy(&req.expire.state.id.daddr, &dst, sizeof(dst));
+ req.expire.state.id.spi = gen_spi(src);
+ req.expire.state.id.proto = desc->proto;
+ req.expire.state.family = AF_INET;
+ req.expire.hard = 0xff;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ goto out_close;
+ }
+
+ if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+ pr_perror("recv()");
+ goto out_close;
+ } else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+ pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+ goto out_close;
+ }
+
+ if (req.error) {
+ pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+ ret = req.error;
+ goto out_close;
+ }
+
+ if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+ pr_perror("recv()");
+ goto out_close;
+ }
+
+ if (req.expire.hard != 0x1) {
+ pr_err("expire.hard is not set: %x", req.expire.hard);
+ goto out_close;
+ }
+
+ ret = 0;
+out_close:
+ close(xfrm_listen);
+ return ret;
+}
+
+static int xfrm_expire_policy(int xfrm_sock, uint32_t *seq,
+ unsigned int nr, struct xfrm_desc *desc)
+{
+ struct {
+ struct nlmsghdr nh;
+ union {
+ struct xfrm_user_polexpire expire;
+ int error;
+ };
+ } req;
+ struct in_addr src, dst, tunsrc, tundst;
+ int xfrm_listen = -1, ret = -1;
+ uint32_t seq_listen;
+
+ src = inet_makeaddr(INADDR_B, child_ip(nr));
+ dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+ tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+ tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+ if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+ XFRM_POLICY_OUT, tunsrc, tundst, desc->proto)) {
+ pr_err("Failed to add xfrm policy");
+ return -1;
+ }
+
+ if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+ return -1;
+
+ memset(&req, 0, sizeof(req));
+ req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(req.expire));
+ req.nh.nlmsg_type = XFRM_MSG_POLEXPIRE;
+ req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
+ req.nh.nlmsg_seq = (*seq)++;
+
+ /* Fill selector. */
+ memcpy(&req.expire.pol.sel.daddr, &dst, sizeof(dst));
+ memcpy(&req.expire.pol.sel.saddr, &src, sizeof(src));
+ req.expire.pol.sel.family = AF_INET;
+ req.expire.pol.sel.prefixlen_d = PREFIX_LEN;
+ req.expire.pol.sel.prefixlen_s = PREFIX_LEN;
+ req.expire.pol.dir = XFRM_POLICY_OUT;
+ req.expire.hard = 0xff;
+
+ if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+ pr_perror("write()");
+ goto out_close;
+ }
+
+ if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+ pr_perror("recv()");
+ goto out_close;
+ } else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+ pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+ goto out_close;
+ }
+
+ if (req.error) {
+ pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+ ret = req.error;
+ goto out_close;
+ }
+
+ if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+ pr_perror("recv()");
+ goto out_close;
+ }
+
+ if (req.expire.hard != 0x1) {
+ pr_err("expire.hard is not set: %x", req.expire.hard);
+ goto out_close;
+ }
+
+ ret = 0;
+out_close:
+ close(xfrm_listen);
+ return ret;
+}
+
+static void print_desc(char *lvl, char *msg, struct xfrm_desc *desc)
+{
+ printk("%s: %s: [%u, '%s', '%s', '%s', '%s', %u]", lvl, msg,
+ (unsigned int)desc->proto, desc->a_algo, desc->e_algo,
+ desc->c_algo, desc->ae_algo, desc->icv_len);
+}
+
+static int child_serv(int xfrm_sock, uint32_t *seq,
+ unsigned int nr, int cmd_fd, void *buf, struct xfrm_desc *desc)
+{
+ struct in_addr src, dst, tunsrc, tundst;
+ struct test_desc msg;
+ int ret = -1;
+
+ src = inet_makeaddr(INADDR_B, child_ip(nr));
+ dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+ tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+ tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+ /* UDP pinging without xfrm */
+ if (do_ping(cmd_fd, buf, page_size, src, true, 0, 0, udp_ping_send)) {
+ pr_err("ping failed before setting xfrm");
+ return -1;
+ }
+
+ memset(&msg, 0, sizeof(msg));
+ msg.type = MSG_XFRM_PREPARE;
+ memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+ write_msg(cmd_fd, &msg);
+
+ if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+ print_desc("ERR", "failed to prepare xfrm", desc);
+ goto cleanup;
+ }
+
+ memset(&msg, 0, sizeof(msg));
+ msg.type = MSG_XFRM_ADD;
+ memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+ write_msg(cmd_fd, &msg);
+ if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+ print_desc("ERR", "failed to set xfrm", desc);
+ goto cleanup;
+ }
+
+ /* UDP pinging with xfrm tunnel */
+ if (do_ping(cmd_fd, buf, page_size, tunsrc,
+ true, 0, 0, udp_ping_send)) {
+ print_desc("ERR", "ping failed for xfrm", desc);
+ goto delete;
+ }
+
+ print_desc("OK", "xfrm", desc);
+ ret = 0;
+delete:
+ /* xfrm delete */
+ memset(&msg, 0, sizeof(msg));
+ msg.type = MSG_XFRM_DEL;
+ memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+ write_msg(cmd_fd, &msg);
+
+ if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+ print_desc("ERR", "failed to remove xfrm", desc);
+ ret = -1;
+ }
+
+cleanup:
+ memset(&msg, 0, sizeof(msg));
+ msg.type = MSG_XFRM_CLEANUP;
+ memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+ write_msg(cmd_fd, &msg);
+ if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+ print_desc("ERR", "failed to cleanup xfrm", desc);
+ ret = -1;
+ }
+ return ret;
+}
+
+static int child_f(unsigned int nr, int test_desc_fd, int cmd_fd, void *buf)
+{
+ struct xfrm_desc desc;
+ struct test_desc msg;
+ int xfrm_sock = -1;
+ uint32_t seq;
+ int ret = 1;
+
+ if (switch_ns(nsfd_childa))
+ exit(1);
+
+ if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+ pr_err("Failed to open xfrm netlink socket");
+ return -1;
+ }
+
+ /* Check that the seqpacket command socket is ready, just to be sure. */
+ memset(&msg, 0, sizeof(msg));
+ msg.type = MSG_ACK;
+ write_msg(cmd_fd, &msg);
+ read_msg(cmd_fd, &msg);
+ if (msg.type != MSG_ACK) {
+ pr_err("Ack failed");
+ exit(1);
+ }
+
+ for (;;) {
+ ssize_t received = read(test_desc_fd, &desc, sizeof(desc));
+
+ if (received == 0) /* EOF */
+ break;
+
+ if (received != sizeof(desc)) {
+ pr_perror("read() returned %zd", received);
+ goto exit;
+ }
+
+ switch (desc.type) {
+ case CREATE_TUNNEL:
+ if (child_serv(xfrm_sock, &seq, nr, cmd_fd, buf, &desc))
+ goto exit;
+ break;
+ case ALLOCATE_SPI:
+ if (xfrm_state_allocspi(xfrm_sock, &seq, -1, desc.proto)) {
+ pr_err("allocspi failed");
+ goto exit;
+ }
+ pr_ok("allocspi");
+ break;
+ case MONITOR_ACQUIRE:
+ if (xfrm_monitor_acquire(xfrm_sock, &seq, nr)) {
+ pr_err("monitor acquire failed");
+ goto exit;
+ }
+ pr_ok("monitor acquire");
+ break;
+ case EXPIRE_STATE:
+ if (xfrm_expire_state(xfrm_sock, &seq, nr, &desc)) {
+ pr_err("expire state failed");
+ goto exit;
+ }
+ pr_ok("expire state");
+ break;
+ case EXPIRE_POLICY:
+ if (xfrm_expire_policy(xfrm_sock, &seq, nr, &desc)) {
+ pr_err("expire policy failed");
+ goto exit;
+ }
+ pr_ok("expire policy");
+ break;
+ default:
+ pr_err("Unknown desc type");
+ goto exit;
+ }
+ }
+
+ ret = 0;
+exit:
+ close(xfrm_sock);
+
+ msg.type = MSG_EXIT;
+ write_msg(cmd_fd, &msg);
+ exit(ret);
+}
+
+static int grand_child_serv(unsigned int nr, int cmd_fd, void *buf,
+ struct test_desc *msg, int xfrm_sock, uint32_t *seq)
+{
+ struct in_addr src, dst, tunsrc, tundst;
+ bool tun_reply;
+ struct xfrm_desc *desc = &msg->body.xfrm_desc;
+
+ src = inet_makeaddr(INADDR_B, grchild_ip(nr));
+ dst = inet_makeaddr(INADDR_B, child_ip(nr));
+ tunsrc = inet_makeaddr(INADDR_A, grchild_ip(nr));
+ tundst = inet_makeaddr(INADDR_A, child_ip(nr));
+
+ switch (msg->type) {
+ case MSG_EXIT:
+ exit(0);
+ case MSG_ACK:
+ write_msg(cmd_fd, msg);
+ break;
+ case MSG_PING:
+ tun_reply = memcmp(&dst, &msg->body.ping.reply_ip, sizeof(in_addr_t));
+ /* UDP pinging without xfrm */
+ if (do_ping(cmd_fd, buf, page_size, tun_reply ? tunsrc : src,
+ false, msg->body.ping.port,
+ msg->body.ping.reply_ip, udp_ping_reply)) {
+ pr_err("ping failed before setting xfrm");
+ return -1;
+ }
+ break;
+ case MSG_XFRM_PREPARE:
+ if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst,
+ desc->proto)) {
+ print_desc("ERR", "failed to prepare xfrm", desc);
+ return -1;
+ }
+ break;
+ case MSG_XFRM_ADD:
+ if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+ print_desc("ERR", "failed to set xfrm", desc);
+ return -1;
+ }
+ break;
+ case MSG_XFRM_DEL:
+ if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst,
+ desc->proto)) {
+ print_desc("ERR", "failed to remove xfrm", desc);
+ return -1;
+ }
+ break;
+ case MSG_XFRM_CLEANUP:
+ if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+ print_desc("ERR", "failed to cleanup xfrm", desc);
+ return -1;
+ }
+ break;
+ default:
+ pr_err("got unknown msg type %d", msg->type);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int grand_child_f(unsigned int nr, int cmd_fd, void *buf)
+{
+ struct test_desc msg;
+ int xfrm_sock = -1;
+ uint32_t seq;
+
+ if (switch_ns(nsfd_childb))
+ exit(1);
+
+ if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+ pr_err("Failed to open xfrm netlink socket");
+ return -1;
+ }
+
+ do {
+ read_msg(cmd_fd, &msg);
+ if (grand_child_serv(nr, cmd_fd, buf, &msg, xfrm_sock, &seq))
+ break;
+ } while (1);
+
+ close(xfrm_sock);
+ exit(1);
+}
+
+static int start_child(unsigned int nr, char *veth, int test_desc_fd[2])
+{
+ uint32_t route_seq;
+ int cmd_sock[2];
+ void *data_map;
+ pid_t child;
+
+ if (init_child(nsfd_childa, veth, child_ip(nr), grchild_ip(nr)))
+ return -1;
+
+ if (init_child(nsfd_childb, veth, grchild_ip(nr), child_ip(nr)))
+ return -1;
+
+ child = fork();
+ if (child < 0) {
+ pr_perror("fork()");
+ return -1;
+ } else if (child) {
+ /* in parent - selftest */
+ return switch_ns(nsfd_parent);
+ }
+
+ if (close(test_desc_fd[1])) {
+ pr_perror("close()");
+ return -1;
+ }
+
+ /* child */
+ data_map = mmap(0, page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (data_map == MAP_FAILED) {
+ pr_perror("mmap()");
+ return -1;
+ }
+ if (randomize_buffer(data_map, page_size))
+ return -1;
+
+ if (socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, cmd_sock)) {
+ pr_perror("socketpair()");
+ return -1;
+ }
+
+ child = fork();
+ if (child < 0) {
+ pr_perror("fork()");
+ return -1;
+ } else if (child) {
+ if (close(cmd_sock[0])) {
+ pr_perror("close()");
+ return -1;
+ }
+ return child_f(nr, test_desc_fd[0], cmd_sock[1], data_map);
+ }
+ if (close(cmd_sock[1])) {
+ pr_perror("close()");
+ return -1;
+ }
+ return grand_child_f(nr, cmd_sock[0], data_map);
+}
+
+static void usage_exit(char **argv)
+{
+ fprintf(stderr, "Usage: %s [nr_process]\n", argv[0]);
+ exit(1);
+}
+
+static int write_desc(int proto, int test_desc_fd,
+ char *a, char *e, char *c, char *ae)
+{
+ struct xfrm_desc desc = {};
+
+ desc.type = CREATE_TUNNEL;
+ desc.proto = proto;
+
+ if (a)
+ strncpy(desc.a_algo, a, ALGO_LEN);
+ if (e)
+ strncpy(desc.e_algo, e, ALGO_LEN);
+ if (c)
+ strncpy(desc.c_algo, c, ALGO_LEN);
+ if (ae)
+ strncpy(desc.ae_algo, ae, ALGO_LEN);
+
+ return write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc);
+}
+
+int proto_list[] = { IPPROTO_AH, IPPROTO_COMP, IPPROTO_ESP };
+char *ah_list[] = {
+ "digest_null", "hmac(md5)", "hmac(sha1)", "hmac(sha256)",
+ "hmac(sha384)", "hmac(sha512)", "hmac(rmd160)",
+ "xcbc(aes)", "cmac(aes)"
+};
+char *comp_list[] = {
+ "deflate"
+#if 0
+ /* No compression backend realization */
+ "lzs", "lzjh"
+#endif
+};
+char *e_list[] = {
+ "ecb(cipher_null)", "cbc(des)", "cbc(des3_ede)", "cbc(cast5)",
+ "cbc(blowfish)", "cbc(aes)", "cbc(serpent)", "cbc(camellia)",
+ "cbc(twofish)", "rfc3686(ctr(aes))"
+};
+char *ae_list[] = {
+#if 0
+ /* not implemented */
+ "rfc4106(gcm(aes))", "rfc4309(ccm(aes))", "rfc4543(gcm(aes))",
+ "rfc7539esp(chacha20,poly1305)"
+#endif
+};
+
+static int write_proto_plan(int fd, int proto)
+{
+ unsigned int i;
+
+ switch (proto) {
+ case IPPROTO_AH:
+ for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+ if (write_desc(proto, fd, ah_list[i], 0, 0, 0)) {
+ pr_err("writing test's desc failed");
+ return -1;
+ }
+ }
+ break;
+ case IPPROTO_COMP:
+ for (i = 0; i < ARRAY_SIZE(comp_list); i++) {
+ if (write_desc(proto, fd, 0, 0, comp_list[i], 0)) {
+ pr_err("writing test's desc failed");
+ return -1;
+ }
+ }
+ break;
+ case IPPROTO_ESP:
+ for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+ unsigned int j;
+
+ for (j = 0; j < ARRAY_SIZE(e_list); j++) {
+ if (write_desc(proto, fd, ah_list[i],
+ e_list[j], 0, 0)) {
+ pr_err("writing test's desc failed");
+ return -1;
+ }
+ }
+ }
+ for (i = 0; i < ARRAY_SIZE(ae_list); i++) {
+ if (write_desc(proto, fd, 0, 0, 0, ae_list[i])) {
+ pr_err("writing test's desc failed");
+ return -1;
+ }
+ }
+ break;
+ default:
+ pr_err("BUG: Specified unknown proto %d", proto);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_compat_struct_tests(int test_desc_fd)
+{
+ struct xfrm_desc desc = {};
+
+ desc.type = ALLOCATE_SPI;
+ desc.proto = IPPROTO_AH;
+ strncpy(desc.a_algo, ah_list[0], ALGO_LEN);
+
+ if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+ return -1;
+
+ desc.type = MONITOR_ACQUIRE;
+ if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+ return -1;
+
+ desc.type = EXPIRE_STATE;
+ if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+ return -1;
+
+ desc.type = EXPIRE_POLICY;
+ if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+ return -1;
+
+ return 0;
+}
+
+static int write_test_plan(int test_desc_fd)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(proto_list); i++) {
+ if (write_proto_plan(test_desc_fd, proto_list[i]))
+ return -1;
+ }
+
+ if (write_compat_struct_tests(test_desc_fd))
+ return -1;
+
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ unsigned int nr_process = 1;
+ int route_sock = -1, ret = 1;
+ int test_desc_fd[2];
+ uint32_t route_seq;
+ unsigned int i;
+
+ if (argc > 2)
+ usage_exit(argv);
+
+ if (argc > 1) {
+ char *endptr;
+ long val;
+
+ errno = 0;
+ val = strtol(argv[1], &endptr, 10);
+ if (errno || endptr == argv[1] || *endptr != '\0') {
+ pr_err("Failed to parse [nr_process]");
+ usage_exit(argv);
+ }
+
+ if (val < 1 || val > MAX_PROCESSES) {
+ pr_err("nr_process should be between [1; %u]", MAX_PROCESSES);
+ usage_exit(argv);
+ }
+ nr_process = val; /* range-checked above, fits in unsigned int */
+ }
+
+ page_size = sysconf(_SC_PAGESIZE);
+ if (page_size < 1) {
+ pr_perror("sysconf()");
+ return 1;
+ }
+
+ if (pipe2(test_desc_fd, O_DIRECT) < 0) {
+ pr_perror("pipe2()");
+ return 1;
+ }
+
+ if (init_namespaces()) {
+ pr_err("Failed to create namespaces");
+ return 1;
+ }
+
+ if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+ pr_err("Failed to open netlink route socket");
+ return 1;
+ }
+
+ for (i = 0; i < nr_process; i++) {
+ char veth[VETH_LEN];
+
+ snprintf(veth, VETH_LEN, VETH_FMT, i);
+
+ if (veth_add(route_sock, route_seq++, veth, nsfd_childa, veth, nsfd_childb)) {
+ pr_err("Failed to create veth device");
+ goto err;
+ }
+
+ if (start_child(i, veth, test_desc_fd)) {
+ pr_err("Child failed to start");
+ goto err;
+ }
+ }
+
+ if (close(test_desc_fd[0])) {
+ pr_perror("close()");
+ goto err;
+ }
+
+ ret = write_test_plan(test_desc_fd[1]);
+ /* XXX: add wait() */
+err:
+ close(route_sock);
+ return ret;
+}
--
2.13.6


2018-07-26 04:24:05

by David Miller

Subject: Re: [PATCH 06/18] netlink: Do not subscribe to non-existent groups

From: Dmitry Safonov <[email protected]>
Date: Thu, 26 Jul 2018 03:31:32 +0100

> Make ABI more strict about subscribing to group > ngroups.
> Code doesn't check for that and it looks bogus.
> (one can subscribe to non-existing group)
> Still, it's possible to bind() to all possible groups with (-1)
>
> Cc: "David S. Miller" <[email protected]>
> Cc: Herbert Xu <[email protected]>
> Cc: Steffen Klassert <[email protected]>
> Cc: [email protected]
> Signed-off-by: Dmitry Safonov <[email protected]>

This really has nothing to do with adding a compat layer for xfrm,
and is a bug fix that should be submitted separately in its own
right.
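
For readers following the thread, here is a minimal sketch of one way such a group check could look. It assumes a 32-bit group bitmask, as in `nl_groups` of `struct sockaddr_nl`, and it masks off non-existent groups so that binding with (-1) still works. The helper names are hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: clear the bits of `groups`
 * that refer to multicast groups above `ngroups`. A mask of all ones
 * therefore ends up meaning "every group that actually exists".
 */
static uint32_t clamp_groups(uint32_t groups, unsigned int ngroups)
{
	if (ngroups == 0)
		return 0;
	if (ngroups >= 32)
		return groups; /* every bit of the mask names an existing group */
	return groups & ((UINT32_C(1) << ngroups) - 1);
}

/* Usage examples for a family that registered four groups. */
static void clamp_groups_examples(void)
{
	assert(clamp_groups(UINT32_MAX, 4) == 0xfu); /* bind to "all" */
	assert(clamp_groups(UINT32_C(1) << 5, 4) == 0); /* group 6 does not exist */
}
```

Whether out-of-range bits should be dropped silently, as here, or rejected with an error is a policy choice that the sketch does not settle.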

2018-07-26 08:51:40

by Florian Westphal

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

Dmitry Safonov <[email protected]> wrote:
> So, here I add a compatible layer to xfrm.
> As xfrm uses netlink notifications, kernel should send them in ABI
> format that an application will parse. The proposed solution is
> to save the ABI of bind() syscall. The realization detail is
> to create kernel-hidden, non visible to userspace netlink groups
> for compat applications.

Why not use the existing netlink support?
Just add the 32-bit skb to skb64->frag_list and let
netlink work out whether a task needs the 64-bit or the 32-bit one.
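
The idea can be illustrated with a toy userspace model. This is only a sketch: `struct toy_msg` and `pick_for_receiver()` are made-up names standing in for an skb with a frag_list and for the receive-side selection, and none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-in for an skb: the native payload plus an optional second copy
 * in the 32-bit layout, playing the role of skb_shinfo(skb)->frag_list.
 */
struct toy_msg {
	const void *data;
	size_t len;
	const struct toy_msg *compat; /* 32-bit layout, or NULL if none */
};

/* Pick the copy that a given receiver should be handed. */
static const struct toy_msg *pick_for_receiver(const struct toy_msg *native,
					       bool receiver_is_compat)
{
	assert(native != NULL);
	if (receiver_is_compat && native->compat)
		return native->compat;
	return native;
}
```

The sender builds both layouts once and attaches the compat one. The per-receiver decision then happens at delivery time, so no separate multicast groups are needed.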

It only needs this small fix to properly signal the end of a dump:
https://marc.info/?l=linux-netdev&m=126625240303351&w=2

I had started a second attempt to make xfrm compat work,
but it's still at an early stage.

One link that might still have some value:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
(compat structure definitions with BUILD_BUG_ON checking)

My plan was to make xfrm compat work strictly as a shrinker (64->32)
and an expander (32->64), i.e. no/little changes to existing code, and
pass all "expanded" skbs through the existing xfrm rcv functions.

Example to illustrate idea:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=c622f067849b02170127b69471cb3481e4bc9e49

... it's supposed to take a 64-bit skb and create a 32-bit one from it.
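
A minimal sketch of the shrinker/expander pair on a made-up two-field message follows. `struct toy_native`, `TOY_COMPAT_SIZE` and both functions are illustrative assumptions, not real xfrm structures or code from the linked commits. The compat layout is kept as a byte buffer so the example does not depend on the compiler's padding rules.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native layout: on x86_64 the compiler pads `bytes` to offset 8. */
struct toy_native {
	uint32_t id;
	uint64_t bytes;
};

/* Compat (i386-style) layout: `bytes` directly follows `id` at offset 4. */
#define TOY_COMPAT_SIZE 12
static_assert(TOY_COMPAT_SIZE == sizeof(uint32_t) + sizeof(uint64_t),
	      "compat layout has no padding");

/* Shrinker: native (64-bit) -> compat (32-bit) layout. */
static void toy_shrink(unsigned char out[TOY_COMPAT_SIZE],
		       const struct toy_native *in)
{
	memcpy(out, &in->id, sizeof(in->id));
	memcpy(out + 4, &in->bytes, sizeof(in->bytes));
}

/* Expander: compat (32-bit) -> native (64-bit) layout, padding zeroed. */
static void toy_expand(struct toy_native *out,
		       const unsigned char in[TOY_COMPAT_SIZE])
{
	memset(out, 0, sizeof(*out));
	memcpy(&out->id, in, sizeof(out->id));
	memcpy(&out->bytes, in + 4, sizeof(out->bytes));
}
```

With this split, incoming compat messages are expanded once and then follow the unchanged native code path. Outgoing messages are built natively and shrunk only for compat receivers.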

Just for reference; I currently don't plan to work on this again.

2018-07-27 07:39:01

by Steffen Klassert

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

On Thu, Jul 26, 2018 at 10:49:59AM +0200, Florian Westphal wrote:
> Dmitry Safonov <[email protected]> wrote:
> > So, here I add a compatible layer to xfrm.
> > As xfrm uses netlink notifications, kernel should send them in ABI
> > format that an application will parse. The proposed solution is
> > to save the ABI of bind() syscall. The realization detail is
> > to create kernel-hidden, non visible to userspace netlink groups
> > for compat applications.
>
> Why not use exisiting netlink support?
> Just add the 32bit skb to skb64->frag_list and let
> netlink find if tasks needs 64 or 32 one.
>
> It only needs this small fix to properly signal the end of a dump:
> https://marc.info/?l=linux-netdev&m=126625240303351&w=2
>
> I had started a second attempt to make xfrm compat work,
> but its still in early stage.
>
> One link that might still have some value:
> https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
> (compat structure definitions with BUILD_BUG_ON checking)
>
> My plan was to make xfrm compat work strictly as shrinker (64->32)
> and expander (32->64), i.e. no/little changes to exisiting code and
> pass all "expanded" skbs through existing xfrm rcv functions.

I agree here with Florian. The code behind this ABI
is already complicated. Please stay away from generic
code as much as possible. Generic and compat code should
be clearly separated.

2018-07-27 13:45:51

by Dmitry Safonov

Subject: Re: [PATCH 06/18] netlink: Do not subscribe to non-existent groups

On Wed, 2018-07-25 at 21:22 -0700, David Miller wrote:
> From: Dmitry Safonov <[email protected]>
> Date: Thu, 26 Jul 2018 03:31:32 +0100
>
> > Make ABI more strict about subscribing to group > ngroups.
> > Code doesn't check for that and it looks bogus.
> > (one can subscribe to non-existing group)
> > Still, it's possible to bind() to all possible groups with (-1)
> >
> > Cc: "David S. Miller" <[email protected]>
> > Cc: Herbert Xu <[email protected]>
> > Cc: Steffen Klassert <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Dmitry Safonov <[email protected]>
>
> This really has nothing to do with adding a compat layer for xfrm,
> and is a bug fix that should be submitted separately in it's own
> right.

Sure, will do.

--
Thanks,
Dmitry

2018-07-27 14:04:54

by Dmitry Safonov

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

On Fri, 2018-07-27 at 09:37 +0200, Steffen Klassert wrote:
> On Thu, Jul 26, 2018 at 10:49:59AM +0200, Florian Westphal wrote:
> > Dmitry Safonov <[email protected]> wrote:
> > > So, here I add a compatible layer to xfrm.
> > > As xfrm uses netlink notifications, kernel should send them in ABI
> > > format that an application will parse. The proposed solution is
> > > to save the ABI of bind() syscall. The realization detail is
> > > to create kernel-hidden, non visible to userspace netlink groups
> > > for compat applications.
> >
> > Why not use exisiting netlink support?
> > Just add the 32bit skb to skb64->frag_list and let
> > netlink find if tasks needs 64 or 32 one.
> >
> > It only needs this small fix to properly signal the end of a dump:
> > https://marc.info/?l=linux-netdev&m=126625240303351&w=2
> >
> > I had started a second attempt to make xfrm compat work,
> > but its still in early stage.
> >
> > One link that might still have some value:
> > https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
> > (compat structure definitions with BUILD_BUG_ON checking)
> >
> > My plan was to make xfrm compat work strictly as shrinker (64->32)
> > and expander (32->64), i.e. no/little changes to exisiting code and
> > pass all "expanded" skbs through existing xfrm rcv functions.
>
> I agree here with Florian. The code behind this ABI
> is already complicated. Please stay away from generic
> code a much as possible. Generic and compat code should
> be clearly separated.

Yeah, I tend to agree that it would be better to separate it.
But:
1. It will copy netlink messages twice, making it O(n) instead of
O(1), where n is the number of bind()s. Probably we don't care much.
2. The unfinished patches at the link already add 500+ lines, as much
as my working patch set, so it will probably add more code.

Probably we care less about the amount of code added and the
additional copies than about separating the compat layer from the main
code. Will look into that.

--
Thanks,
Dmitry

2018-07-27 14:21:31

by Florian Westphal

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

Dmitry Safonov <[email protected]> wrote:
> 1. It will double copy netlink messages, making it O(n) instead of
> O(1), where n - is number of bind()s.. Probably we don't care much.

About those bind() patches, I don't understand why they are needed.

Why can't you just add the compat skb to the native skb when doing
the multicast call?

skb_shinfo(skb)->frag_list = compat_skb;
xfrm_nlmsg_multicast(net, skb, 0, ...

2018-07-27 14:53:02

by Dmitry Safonov

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> Dmitry Safonov <[email protected]> wrote:
> > 1. It will double copy netlink messages, making it O(n) instead of
> > O(1), where n - is number of bind()s.. Probably we don't care much.
>
> About those bind() patches, I don't understand why they are needed.
>
> Why can't you just add the compat skb to the native skb when doing
> the multicast call?
>
> skb_shinfo(skb)->frag_list = compat_skb;
> xfrm_nlmsg_multicast(net, skb, 0, ...

Oh yeah, sorry, I think I misread the patch - will try to add compat
skb in the multicast call.

--
Thanks,
Dmitry

2018-07-27 16:57:00

by Nathan Harold

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

We (Android) are very interested in removing the restriction for 32-bit
userspace processes accessing xfrm netlink on 64-bit kernels. IPsec support
is required to pass Android conformance tests, and any manufacturer wishing
to ship 32-bit userspace with a recent kernel needs out-of-tree changes
(removing the compat_task check) to do so.

That said, it’s not difficult to work around alignment issues directly in
userspace, so maybe we could just remove the check and make this the
caller's responsibility? Here’s an example of the workaround currently in
the Android tree:
https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257

We could also employ a (relatively simple) solution such as the one above in
the uapi XFRM header itself, though it would require a caller to declare
the target kernel ABI at compile time. Maybe that’s not unthinkable for an
uncommon case?

-Nathan
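
As a rough illustration of what a userspace-side alignment workaround can look like, the sketch below forces a 64-bit field onto an 8-byte boundary, so that a 32-bit build produces the layout a 64-bit kernel expects. `struct toy_info` is a made-up structure, not a real xfrm one, and this is not the code behind the link above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout, for illustration only. */
struct toy_info {
	uint32_t flags;
	/*
	 * i386 would naturally place this at offset 4. Requesting 8-byte
	 * alignment reproduces the padding a 64-bit kernel assumes.
	 */
	_Alignas(8) uint64_t bytes;
};

/* Same layout regardless of whether this is built as 32-bit or 64-bit. */
static_assert(offsetof(struct toy_info, bytes) == 8, "bytes at offset 8");
static_assert(sizeof(struct toy_info) == 16, "padded to 16 bytes");
```

The cost of this approach is the one raised in the message: the caller has to know at compile time which kernel ABI it targets.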

On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <[email protected]> wrote:

> On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> > Dmitry Safonov <[email protected]> wrote:
> > > 1. It will double copy netlink messages, making it O(n) instead of
> > > O(1), where n - is number of bind()s.. Probably we don't care much.
> >
> > About those bind() patches, I don't understand why they are needed.
> >
> > Why can't you just add the compat skb to the native skb when doing
> > the multicast call?
> >
> > skb_shinfo(skb)->frag_list = compat_skb;
> > xfrm_nlmsg_multicast(net, skb, 0, ...
>
> Oh yeah, sorry, I think I misread the patch - will try to add compat
> skb in the multicast call.
>
> --
> Thanks,
> Dmitry
>

2018-07-27 17:11:02

by Andy Lutomirski

Subject: Re: [PATCH 00/18] xfrm: Add compat layer



> On Jul 27, 2018, at 9:48 AM, Nathan Harold <[email protected]> wrote:
>
> We (Android) are very interested in removing the restriction for 32-bit userspace processes accessing xfrm netlink on 64-bit kernels. IPsec support is required to pass Android conformance tests, and any manufacturer wishing to ship 32-bit userspace with a recent kernel needs out-of-tree changes (removing the compat_task check) to do so.
>
> That said, it’s not difficult to work around alignment issues directly in userspace, so maybe we could just remove the check and make this the caller's responsibility? Here’s an example of the workaround currently in the Android tree:
> https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257
>
> We could also employ a (relatively simple) solution such as the one above in the uapi XFRM header itself, though it would require a caller to declare the target kernel ABI at compile time. Maybe that’s not unthinkable for an uncommon case?
>

Could there just be an XFRM2 that is entirely identical to XFRM for 64-bit userspace but makes the 32-bit structures match? If there are a grand total of two or so userspace implementations, that should cover most use cases.

> -Nathan
>
>
>> On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <[email protected]> wrote:
>> On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
>> > Dmitry Safonov <[email protected]> wrote:
>> > > 1. It will double copy netlink messages, making it O(n) instead of
>> > > O(1), where n - is number of bind()s.. Probably we don't care much.
>> >
>> > About those bind() patches, I don't understand why they are needed.
>> >
>> > Why can't you just add the compat skb to the native skb when doing
>> > the multicast call?
>> >
>> > skb_shinfo(skb)->frag_list = compat_skb;
>> > xfrm_nlmsg_multicast(net, skb, 0, ...
>>
>> Oh yeah, sorry, I think I misread the patch - will try to add compat
>> skb in the multicast call.
>>
>> --
>> Thanks,
>> Dmitry
>

2018-07-28 16:28:02

by Dmitry Safonov

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

On Fri, 2018-07-27 at 09:48 -0700, Nathan Harold wrote:
> We (Android) are very interested in removing the restriction for
> 32-bit userspace processes accessing xfrm netlink on 64-bit kernels.
> IPsec support is required to pass Android conformance tests, and any
> manufacturer wishing to ship 32-bit userspace with a recent kernel
> needs out-of-tree changes (removing the compat_task check) to do so.

Glad to hear it, that justifies my attempts even more :)

> That said, it’s not difficult to work around alignment issues
> directly in userspace, so maybe we could just remove the check and
> make this the caller's responsibility? Here’s an example of the
> workaround currently in the Android tree:
> https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257

We have much the same workarounds in our userspace.
But I don't think reverting the check makes much sense: it would set
the broken compat ABI in stone.
If you're fine with disgraceful hacks and just want to get rid of an
additional out-of-tree patch, you can make 64-bit syscalls from a
32-bit task (hint: there are examples in the x86 selftests).


> We could also employ a (relatively simple) solution such as the one
> above in the uapi XFRM header itself, though it would require a
> caller to declare the target kernel ABI at compile time. Maybe that’s
> not unthinkable for an uncommon case?

Well, I think I'll rework my patch set according to the criticism and
separate out the compat xfrm layer. I already have a selftest that checks
that 32/64-bit xfrm works, so the most time-consuming part is done.
So, if you can wait a week or two, you could help me justify getting
those patches accepted into mainline.

> On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <[email protected]> wrote:
> > On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> > > Dmitry Safonov <[email protected]> wrote:
> > > > 1. It will double copy netlink messages, making it O(n) instead of
> > > > O(1), where n - is number of bind()s.. Probably we don't care much.
> > >
> > > About those bind() patches, I don't understand why they are needed.
> > >
> > > Why can't you just add the compat skb to the native skb when doing
> > > the multicast call?
> > >
> > > skb_shinfo(skb)->frag_list = compat_skb;
> > > xfrm_nlmsg_multicast(net, skb, 0, ...
> >
> > Oh yeah, sorry, I think I misread the patch - will try to add compat
> > skb in the multicast call.
> >

--
Thanks,
Dmitry

2018-07-28 21:19:59

by David Miller

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

From: Dmitry Safonov <[email protected]>
Date: Sat, 28 Jul 2018 17:26:55 +0100

> Well, I think, I'll rework my patches set according to critics and
> separate compat xfrm layer. I've already a selftest to check that 32/64
> bit xfrm works - so the most time-taking part is done.

The way you've done the compat structures using __packed is only going
to work on x86, just FYI.

The "32-bit alignment for 64-bit objects" thing x86 has is very much
not universal amongst ABIs having 32-bit and 64-bit variants.

2018-07-30 17:40:36

by Dmitry Safonov

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

On Sat, 2018-07-28 at 14:18 -0700, David Miller wrote:
> From: Dmitry Safonov <[email protected]>
> Date: Sat, 28 Jul 2018 17:26:55 +0100
>
> > Well, I think, I'll rework my patches set according to critics and
> > separate compat xfrm layer. I've already a selftest to check that 32/64
> > bit xfrm works - so the most time-taking part is done.
>
> The way you've done the compat structures using __packed is only going
> to work on x86, just FYI.

Thanks for pointing that out. I'll probably put it under something like
HAS_COMPAT_XFRM (unless there is a better idea).

> The "32-bit alignment for 64-bit objects" thing x86 has is very much
> not universal amongst ABIs having 32-bit and 64-bit variants.

--
Thanks,
Dmitry

2018-07-30 19:45:20

by Florian Westphal

Subject: Re: [PATCH 00/18] xfrm: Add compat layer

Dmitry Safonov <[email protected]> wrote:
> On Sat, 2018-07-28 at 14:18 -0700, David Miller wrote:
> > From: Dmitry Safonov <[email protected]>
> > Date: Sat, 28 Jul 2018 17:26:55 +0100
> >
> > > Well, I think, I'll rework my patches set according to critics and
> > > separate compat xfrm layer. I've already a selftest to check that 32/64
> > > bit xfrm works - so the most time-taking part is done.
> >
> > The way you've done the compat structures using __packed is only going
> > to work on x86, just FYI.
>
> Thanks for pointing, so I'll probably cover it under something like
> HAS_COMPAT_XFRM.
> (if there isn't any better idea).

You can do that, I suspect you can use
CONFIG_COMPAT_FOR_U64_ALIGNMENT
as AFAICR the only reason for the compat problem is different alignment
requirements of 64bit integer types in the structs, not e.g. due to
"long" size differences.

Instead of __packed, you can use the "compat" data types, e.g.
compat_u64 instead of u64:

struct compat_xfrm_lifetime_cur {
	compat_u64 bytes, packets, add_time, use_time;
}; /* same size on i386, but only 4-byte alignment required even on x86_64 */
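
To see the effect outside the kernel, here is a small userspace sketch. It assumes GCC or Clang, since it relies on the `aligned` attribute lowering alignment through a typedef. The `toy_` names are stand-ins for illustration, and `toy_compat_u64` only imitates what compat_u64 does on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace imitation of a 4-byte-aligned 64-bit integer type. */
typedef uint64_t __attribute__((aligned(4))) toy_compat_u64;

struct toy_compat_lifetime_cur {
	toy_compat_u64 bytes, packets, add_time, use_time;
};

/* Embedding the struct after a 32-bit field must not introduce padding. */
struct toy_compat_msg {
	uint32_t id;
	struct toy_compat_lifetime_cur cur;
};

static_assert(_Alignof(struct toy_compat_lifetime_cur) == 4,
	      "only 4-byte alignment is required");
static_assert(sizeof(struct toy_compat_lifetime_cur) == 32,
	      "four 64-bit counters");
static_assert(sizeof(struct toy_compat_msg) == 36,
	      "no padding between id and cur");
```

Unlike __packed, this keeps each field naturally sized and only relaxes the alignment requirement, so the i386 layout can be described without packing the entire structure.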

You might be able to reuse
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869

in your patch set.

I can try to submit the first few patches (which are not related to
compat, they just add const qualifiers) for inclusion later this week.