This adds a netlink interface to make reads/writes to mdio buses. This
makes it easier to debug devices. This is especially useful when there
is a PCS involved (and the ethtool reads are faked), when there is no
MAC associated with a PHY, or when the MDIO device is not a PHY.
The closest existing in-kernel interfaces are the SIOCG/SMIIREG ioctls, but
they have several drawbacks:
1. "Write register" is not always exactly that. The kernel will try to
be extra helpful and do things behind the scenes if it detects a
write to the reset bit of a PHY for example.
2. Only one op per syscall. This means that is impossible to implement
many operations in a safe manner. Something as simple as a
read/mask/write cycle can race against an in-kernel driver.
3. Addressing is awkward since you don't address the MDIO bus
directly, rather "the MDIO bus to which this netdev's PHY is
connected". This makes it hard to talk to devices on buses to which
no PHYs are connected, the typical case being Ethernet switches.
To address these shortcomings, this adds a GENL interface with which a user
can interact with an MDIO bus directly. The user sends a program that
mdio-netlink executes, possibly emitting data back to the user. I.e. it
implements a very simple VM. Read/mask/write operations could be
implemented by dedicated commands, but when you start looking at more
advanced things like reading out the VLAN database of a switch you need
state and branching.
To prevent userspace phy drivers, writes are disabled by default, and can
only be enabled by editing the source. This is the same strategy used by
regmap for debugfs writes. Unfortunately, this disallows several useful
features, including
- Register writes (obviously)
- C45-over-C22
- Atomic access to paged registers
- Better MDIO emulation for e.g. QEMU
However, the read-only interface remains broadly useful for debugging.
Users who want to use the above features can re-enable them by defining
MDIO_NETLINK_ALLOW_WRITE and recompiling their kernel.
Signed-off-by: Sean Anderson <[email protected]>
---
This driver was written by Tobias Waldekranz. It is adapted from the
version he released with mdio-tools [1]. This was last discussed 2.5
years ago [2], and I have incorperated his cover letter into this commit
message. The discussion mainly centered around the write capability
allowing for userspace phy drivers. Although it comes with reduced
functionality, I hope this new approach satisfies Andrew. I have also
made several minor changes for style and to stay abrest of changing
APIs.
Tobias, I've taken the liberty of adding some copyright notices
attributing this to you.
[1] https://github.com/wkz/mdio-tools
[2] https://lore.kernel.org/netdev/C42DZQLTPHM5.2THDSRK84BI3T@wkz-x280/
drivers/net/mdio/Kconfig | 8 +
drivers/net/mdio/Makefile | 1 +
drivers/net/mdio/mdio-netlink.c | 464 ++++++++++++++++++++++++++++++
include/uapi/linux/mdio-netlink.h | 61 ++++
4 files changed, 534 insertions(+)
create mode 100644 drivers/net/mdio/mdio-netlink.c
create mode 100644 include/uapi/linux/mdio-netlink.h
diff --git a/drivers/net/mdio/Kconfig b/drivers/net/mdio/Kconfig
index 90309980686e..8a01978e5b51 100644
--- a/drivers/net/mdio/Kconfig
+++ b/drivers/net/mdio/Kconfig
@@ -43,6 +43,14 @@ config ACPI_MDIO
if MDIO_BUS
+config MDIO_NETLINK
+ tristate "Netlink interface for MDIO buses"
+ help
+ Enable a netlink interface to allow reading MDIO buses from
+ userspace. A small virtual machine allows implementing complex
+ operations, such as conditional reads or polling. All operations
+ submitted in the same program are evaluated atomically.
+
config MDIO_DEVRES
tristate
diff --git a/drivers/net/mdio/Makefile b/drivers/net/mdio/Makefile
index 7d4cb4c11e4e..5583d5b8d174 100644
--- a/drivers/net/mdio/Makefile
+++ b/drivers/net/mdio/Makefile
@@ -4,6 +4,7 @@
obj-$(CONFIG_ACPI_MDIO) += acpi_mdio.o
obj-$(CONFIG_FWNODE_MDIO) += fwnode_mdio.o
obj-$(CONFIG_OF_MDIO) += of_mdio.o
+obj-$(CONFIG_MDIO_NETLINK) += mdio-netlink.o
obj-$(CONFIG_MDIO_ASPEED) += mdio-aspeed.o
obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
diff --git a/drivers/net/mdio/mdio-netlink.c b/drivers/net/mdio/mdio-netlink.c
new file mode 100644
index 000000000000..3e32d1a9bab3
--- /dev/null
+++ b/drivers/net/mdio/mdio-netlink.c
@@ -0,0 +1,464 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022-23 Sean Anderson <[email protected]>
+ * Copyright (C) 2020-22 Tobias Waldekranz <[email protected]>
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netlink.h>
+#include <linux/phy.h>
+#include <net/genetlink.h>
+#include <net/netlink.h>
+#include <uapi/linux/mdio-netlink.h>
+
+struct mdio_nl_xfer {
+ struct genl_info *info;
+ struct sk_buff *msg;
+ void *hdr;
+ struct nlattr *data;
+
+ struct mii_bus *mdio;
+ int timeout_ms;
+
+ int prog_len;
+ struct mdio_nl_insn *prog;
+};
+
+static int mdio_nl_open(struct mdio_nl_xfer *xfer);
+static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr);
+
+static int mdio_nl_flush(struct mdio_nl_xfer *xfer)
+{
+ int err;
+
+ err = mdio_nl_close(xfer, false, 0);
+ if (err)
+ return err;
+
+ return mdio_nl_open(xfer);
+}
+
+static int mdio_nl_emit(struct mdio_nl_xfer *xfer, u32 datum)
+{
+ int err = 0;
+
+ if (!nla_put_nohdr(xfer->msg, sizeof(datum), &datum))
+ return 0;
+
+ err = mdio_nl_flush(xfer);
+ if (err)
+ return err;
+
+ return nla_put_nohdr(xfer->msg, sizeof(datum), &datum);
+}
+
+static inline u16 *__arg_r(u32 arg, u16 *regs)
+{
+ WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_REG);
+
+ return ®s[arg & 0x7];
+}
+
+static inline u16 __arg_i(u32 arg)
+{
+ WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_IMM);
+
+ return arg & 0xffff;
+}
+
+static inline u16 __arg_ri(u32 arg, u16 *regs)
+{
+ switch ((enum mdio_nl_argmode)(arg >> 16)) {
+ case MDIO_NL_ARG_IMM:
+ return arg & 0xffff;
+ case MDIO_NL_ARG_REG:
+ return regs[arg & 7];
+ default:
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+}
+
+/* To prevent out-of-tree drivers from being implemented through this
+ * interface, disallow writes by default. This does disallow read-only uses,
+ * such as c45-over-c22 or reading phys with pages. However, with a such a
+ * flexible interface, we must use a big hammer. People who want to use this
+ * will need to modify the source code directly.
+ */
+#undef MDIO_NETLINK_ALLOW_WRITE
+
+static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
+{
+ struct mdio_nl_insn *insn;
+ unsigned long timeout;
+ u16 regs[8] = { 0 };
+ int pc, ret = 0;
+ int phy_id, reg, prtad, devad, val;
+
+ timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
+
+ mutex_lock(&xfer->mdio->mdio_lock);
+
+ for (insn = xfer->prog, pc = 0;
+ pc < xfer->prog_len;
+ insn = &xfer->prog[++pc]) {
+ if (time_after(jiffies, timeout)) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+
+ switch ((enum mdio_nl_op)insn->op) {
+ case MDIO_NL_OP_READ:
+ phy_id = __arg_ri(insn->arg0, regs);
+ prtad = mdio_phy_id_prtad(phy_id);
+ devad = mdio_phy_id_devad(phy_id);
+ reg = __arg_ri(insn->arg1, regs);
+
+ if (mdio_phy_id_is_c45(phy_id))
+ ret = __mdiobus_c45_read(xfer->mdio, prtad,
+ devad, reg);
+ else
+ ret = __mdiobus_read(xfer->mdio, phy_id, reg);
+
+ if (ret < 0)
+ goto exit;
+ *__arg_r(insn->arg2, regs) = ret;
+ ret = 0;
+ break;
+
+ case MDIO_NL_OP_WRITE:
+ phy_id = __arg_ri(insn->arg0, regs);
+ prtad = mdio_phy_id_prtad(phy_id);
+ devad = mdio_phy_id_devad(phy_id);
+ reg = __arg_ri(insn->arg1, regs);
+ val = __arg_ri(insn->arg2, regs);
+
+#ifdef MDIO_NETLINK_ALLOW_WRITE
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ if (mdio_phy_id_is_c45(phy_id))
+ ret = __mdiobus_c45_write(xfer->mdio, prtad,
+ devad, reg, val
+ else
+ ret = __mdiobus_write(xfer->mdio, dev, reg,
+ val);
+#else
+ ret = -EPERM;
+#endif
+ if (ret < 0)
+ goto exit;
+ ret = 0;
+ break;
+
+ case MDIO_NL_OP_AND:
+ *__arg_r(insn->arg2, regs) =
+ __arg_ri(insn->arg0, regs) &
+ __arg_ri(insn->arg1, regs);
+ break;
+
+ case MDIO_NL_OP_OR:
+ *__arg_r(insn->arg2, regs) =
+ __arg_ri(insn->arg0, regs) |
+ __arg_ri(insn->arg1, regs);
+ break;
+
+ case MDIO_NL_OP_ADD:
+ *__arg_r(insn->arg2, regs) =
+ __arg_ri(insn->arg0, regs) +
+ __arg_ri(insn->arg1, regs);
+ break;
+
+ case MDIO_NL_OP_JEQ:
+ if (__arg_ri(insn->arg0, regs) ==
+ __arg_ri(insn->arg1, regs))
+ pc += (s16)__arg_i(insn->arg2);
+ break;
+
+ case MDIO_NL_OP_JNE:
+ if (__arg_ri(insn->arg0, regs) !=
+ __arg_ri(insn->arg1, regs))
+ pc += (s16)__arg_i(insn->arg2);
+ break;
+
+ case MDIO_NL_OP_EMIT:
+ ret = mdio_nl_emit(xfer, __arg_ri(insn->arg0, regs));
+ if (ret < 0)
+ goto exit;
+ ret = 0;
+ break;
+
+ case MDIO_NL_OP_UNSPEC:
+ default:
+ ret = -EINVAL;
+ goto exit;
+ }
+ }
+exit:
+ mutex_unlock(&xfer->mdio->mdio_lock);
+ return ret;
+}
+
+struct mdio_nl_op_proto {
+ u8 arg0;
+ u8 arg1;
+ u8 arg2;
+};
+
+static const struct mdio_nl_op_proto mdio_nl_op_protos[MDIO_NL_OP_MAX + 1] = {
+ [MDIO_NL_OP_READ] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_REG),
+ },
+ [MDIO_NL_OP_WRITE] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ },
+ [MDIO_NL_OP_AND] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_REG),
+ },
+ [MDIO_NL_OP_OR] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_REG),
+ },
+ [MDIO_NL_OP_ADD] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_REG),
+ },
+ [MDIO_NL_OP_JEQ] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_IMM),
+ },
+ [MDIO_NL_OP_JNE] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg2 = BIT(MDIO_NL_ARG_IMM),
+ },
+ [MDIO_NL_OP_EMIT] = {
+ .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
+ .arg1 = BIT(MDIO_NL_ARG_NONE),
+ .arg2 = BIT(MDIO_NL_ARG_NONE),
+ },
+};
+
+static int mdio_nl_validate_insn(const struct nlattr *attr,
+ struct netlink_ext_ack *extack,
+ const struct mdio_nl_insn *insn)
+{
+ const struct mdio_nl_op_proto *proto;
+
+ if (insn->op > MDIO_NL_OP_MAX) {
+ NL_SET_ERR_MSG_ATTR(extack, attr, "Illegal instruction");
+ return -EINVAL;
+ }
+
+ proto = &mdio_nl_op_protos[insn->op];
+
+ if (!(BIT(insn->arg0 >> 16) & proto->arg0)) {
+ NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 0 invalid");
+ return -EINVAL;
+ }
+
+ if (!(BIT(insn->arg1 >> 16) & proto->arg1)) {
+ NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 1 invalid");
+ return -EINVAL;
+ }
+
+ if (!(BIT(insn->arg2 >> 16) & proto->arg2)) {
+ NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 2 invalid");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int mdio_nl_validate_prog(const struct nlattr *attr,
+ struct netlink_ext_ack *extack)
+{
+ const struct mdio_nl_insn *prog = nla_data(attr);
+ int len = nla_len(attr);
+ int i, err = 0;
+
+ if (len % sizeof(*prog)) {
+ NL_SET_ERR_MSG_ATTR(extack, attr, "Unaligned instruction");
+ return -EINVAL;
+ }
+
+ len /= sizeof(*prog);
+ for (i = 0; i < len; i++) {
+ err = mdio_nl_validate_insn(attr, extack, &prog[i]);
+ if (err)
+ break;
+ }
+
+ return err;
+}
+
+static const struct nla_policy mdio_nl_policy[MDIO_NLA_MAX + 1] = {
+ [MDIO_NLA_UNSPEC] = { .type = NLA_UNSPEC, },
+ [MDIO_NLA_BUS_ID] = { .type = NLA_STRING, .len = MII_BUS_ID_SIZE },
+ [MDIO_NLA_TIMEOUT] = NLA_POLICY_MAX(NLA_U16, 10 * MSEC_PER_SEC),
+ [MDIO_NLA_PROG] = NLA_POLICY_VALIDATE_FN(NLA_BINARY,
+ mdio_nl_validate_prog,
+ 0x1000),
+ [MDIO_NLA_DATA] = { .type = NLA_NESTED },
+ [MDIO_NLA_ERROR] = { .type = NLA_S32, },
+};
+
+static struct genl_family mdio_nl_family;
+
+static int mdio_nl_open(struct mdio_nl_xfer *xfer)
+{
+ int err;
+
+ xfer->msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+ if (!xfer->msg) {
+ err = -ENOMEM;
+ goto err;
+ }
+
+ xfer->hdr = genlmsg_put(xfer->msg, xfer->info->snd_portid,
+ xfer->info->snd_seq, &mdio_nl_family,
+ NLM_F_ACK | NLM_F_MULTI, MDIO_GENL_XFER);
+ if (!xfer->hdr) {
+ err = -EMSGSIZE;
+ goto err_free;
+ }
+
+ xfer->data = nla_nest_start(xfer->msg, MDIO_NLA_DATA);
+ if (!xfer->data) {
+ err = -EMSGSIZE;
+ goto err_free;
+ }
+
+ return 0;
+
+err_free:
+ nlmsg_free(xfer->msg);
+err:
+ return err;
+}
+
+static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr)
+{
+ struct nlmsghdr *end;
+ int err;
+
+ nla_nest_end(xfer->msg, xfer->data);
+
+ if (xerr && nla_put_s32(xfer->msg, MDIO_NLA_ERROR, xerr)) {
+ err = mdio_nl_flush(xfer);
+ if (err)
+ goto err_free;
+
+ if (nla_put_s32(xfer->msg, MDIO_NLA_ERROR, xerr)) {
+ err = -EMSGSIZE;
+ goto err_free;
+ }
+ }
+
+ genlmsg_end(xfer->msg, xfer->hdr);
+
+ if (last) {
+ end = nlmsg_put(xfer->msg, xfer->info->snd_portid,
+ xfer->info->snd_seq, NLMSG_DONE, 0,
+ NLM_F_ACK | NLM_F_MULTI);
+ if (!end) {
+ err = mdio_nl_flush(xfer);
+ if (err)
+ goto err_free;
+
+ end = nlmsg_put(xfer->msg, xfer->info->snd_portid,
+ xfer->info->snd_seq, NLMSG_DONE, 0,
+ NLM_F_ACK | NLM_F_MULTI);
+ if (!end) {
+ err = -EMSGSIZE;
+ goto err_free;
+ }
+ }
+ }
+
+ return genlmsg_unicast(genl_info_net(xfer->info), xfer->msg,
+ xfer->info->snd_portid);
+
+err_free:
+ nlmsg_free(xfer->msg);
+ return err;
+}
+
+static int mdio_nl_cmd_xfer(struct sk_buff *skb, struct genl_info *info)
+{
+ struct mdio_nl_xfer xfer;
+ int err;
+
+ if (!info->attrs[MDIO_NLA_BUS_ID] ||
+ !info->attrs[MDIO_NLA_PROG] ||
+ info->attrs[MDIO_NLA_DATA] ||
+ info->attrs[MDIO_NLA_ERROR])
+ return -EINVAL;
+
+ xfer.mdio = mdio_find_bus(nla_data(info->attrs[MDIO_NLA_BUS_ID]));
+ if (!xfer.mdio)
+ return -ENODEV;
+
+ if (info->attrs[MDIO_NLA_TIMEOUT])
+ xfer.timeout_ms = nla_get_u32(info->attrs[MDIO_NLA_TIMEOUT]);
+ else
+ xfer.timeout_ms = 100;
+
+ xfer.info = info;
+ xfer.prog_len = nla_len(info->attrs[MDIO_NLA_PROG]) / sizeof(*xfer.prog);
+ xfer.prog = nla_data(info->attrs[MDIO_NLA_PROG]);
+
+ err = mdio_nl_open(&xfer);
+ if (err)
+ return err;
+
+ err = mdio_nl_eval(&xfer);
+
+ err = mdio_nl_close(&xfer, true, err);
+ return err;
+}
+
+static const struct genl_ops mdio_nl_ops[] = {
+ {
+ .cmd = MDIO_GENL_XFER,
+ .doit = mdio_nl_cmd_xfer,
+ .flags = GENL_ADMIN_PERM,
+ },
+};
+
+static struct genl_family mdio_nl_family = {
+ .name = "mdio",
+ .version = 1,
+ .maxattr = MDIO_NLA_MAX,
+ .netnsok = false,
+ .module = THIS_MODULE,
+ .ops = mdio_nl_ops,
+ .n_ops = ARRAY_SIZE(mdio_nl_ops),
+ .policy = mdio_nl_policy,
+};
+
+static int __init mdio_nl_init(void)
+{
+ return genl_register_family(&mdio_nl_family);
+}
+
+static void __exit mdio_nl_exit(void)
+{
+ genl_unregister_family(&mdio_nl_family);
+}
+
+MODULE_AUTHOR("Tobias Waldekranz <[email protected]>");
+MODULE_DESCRIPTION("MDIO Netlink Interface");
+MODULE_LICENSE("GPL");
+
+module_init(mdio_nl_init);
+module_exit(mdio_nl_exit);
diff --git a/include/uapi/linux/mdio-netlink.h b/include/uapi/linux/mdio-netlink.h
new file mode 100644
index 000000000000..bebd3b45c882
--- /dev/null
+++ b/include/uapi/linux/mdio-netlink.h
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2020-22 Tobias Waldekranz <[email protected]>
+ */
+
+#ifndef _UAPI_LINUX_MDIO_NETLINK_H
+#define _UAPI_LINUX_MDIO_NETLINK_H
+
+#include <linux/types.h>
+
+enum {
+ MDIO_GENL_UNSPEC,
+ MDIO_GENL_XFER,
+
+ __MDIO_GENL_MAX,
+ MDIO_GENL_MAX = __MDIO_GENL_MAX - 1
+};
+
+enum {
+ MDIO_NLA_UNSPEC,
+ MDIO_NLA_BUS_ID, /* string */
+ MDIO_NLA_TIMEOUT, /* u32 */
+ MDIO_NLA_PROG, /* struct mdio_nl_insn[] */
+ MDIO_NLA_DATA, /* nest */
+ MDIO_NLA_ERROR, /* s32 */
+
+ __MDIO_NLA_MAX,
+ MDIO_NLA_MAX = __MDIO_NLA_MAX - 1
+};
+
+enum mdio_nl_op {
+ MDIO_NL_OP_UNSPEC,
+ MDIO_NL_OP_READ, /* read dev(RI), port(RI), dst(R) */
+ MDIO_NL_OP_WRITE, /* write dev(RI), port(RI), src(RI) */
+ MDIO_NL_OP_AND, /* and a(RI), b(RI), dst(R) */
+ MDIO_NL_OP_OR, /* or a(RI), b(RI), dst(R) */
+ MDIO_NL_OP_ADD, /* add a(RI), b(RI), dst(R) */
+ MDIO_NL_OP_JEQ, /* jeq a(RI), b(RI), jmp(I) */
+ MDIO_NL_OP_JNE, /* jeq a(RI), b(RI), jmp(I) */
+ MDIO_NL_OP_EMIT, /* emit src(RI) */
+
+ __MDIO_NL_OP_MAX,
+ MDIO_NL_OP_MAX = __MDIO_NL_OP_MAX - 1
+};
+
+enum mdio_nl_argmode {
+ MDIO_NL_ARG_NONE,
+ MDIO_NL_ARG_REG,
+ MDIO_NL_ARG_IMM,
+ MDIO_NL_ARG_RESERVED
+};
+
+struct mdio_nl_insn {
+ __u64 op:8;
+ __u64 reserved:2;
+ __u64 arg0:18;
+ __u64 arg1:18;
+ __u64 arg2:18;
+};
+
+#endif /* _UAPI_LINUX_MDIO_NETLINK_H */
--
2.35.1.1320.gc452695387.dirty
On Mon, Mar 06, 2023 at 03:45:16PM -0500, Sean Anderson wrote:
> +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
> +{
> + struct mdio_nl_insn *insn;
> + unsigned long timeout;
> + u16 regs[8] = { 0 };
> + int pc, ret = 0;
So "pc" is signed.
> + int phy_id, reg, prtad, devad, val;
> +
> + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
> +
> + mutex_lock(&xfer->mdio->mdio_lock);
> +
> + for (insn = xfer->prog, pc = 0;
> + pc < xfer->prog_len;
xfer->prog_len is signed, so this is a signed comparison.
> + case MDIO_NL_OP_JEQ:
> + if (__arg_ri(insn->arg0, regs) ==
> + __arg_ri(insn->arg1, regs))
> + pc += (s16)__arg_i(insn->arg2);
This adds a signed 16-bit integer to pc, which can make pc negative.
And so the question becomes... what prevents pc becoming negative
and then trying to use a negative number as an index?
I think prog_len and pc should both be unsigned, then the test you
have will be unsigned, and thus wrapping "pc" around zero makes it
a very large integer which fails the test - preventing at least
access outside of the array. Better still would be a validator
that checks that the program is in fact safe to execute.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On 3/6/23 17:48, Russell King (Oracle) wrote:
> On Mon, Mar 06, 2023 at 03:45:16PM -0500, Sean Anderson wrote:
>> +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
>> +{
>> + struct mdio_nl_insn *insn;
>> + unsigned long timeout;
>> + u16 regs[8] = { 0 };
>> + int pc, ret = 0;
>
> So "pc" is signed.
>
>> + int phy_id, reg, prtad, devad, val;
>> +
>> + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
>> +
>> + mutex_lock(&xfer->mdio->mdio_lock);
>> +
>> + for (insn = xfer->prog, pc = 0;
>> + pc < xfer->prog_len;
>
> xfer->prog_len is signed, so this is a signed comparison.
>
>> + case MDIO_NL_OP_JEQ:
>> + if (__arg_ri(insn->arg0, regs) ==
>> + __arg_ri(insn->arg1, regs))
>> + pc += (s16)__arg_i(insn->arg2);
>
> This adds a signed 16-bit integer to pc, which can make pc negative.
>
> And so the question becomes... what prevents pc becoming negative
> and then trying to use a negative number as an index?
We start executing from somewhere on the heap :)
> I think prog_len and pc should both be unsigned, then the test you
> have will be unsigned, and thus wrapping "pc" around zero makes it
> a very large integer which fails the test - preventing at least
> access outside of the array.
Will fix.
> Better still would be a validator
> that checks that the program is in fact safe to execute.
I think mdio_nl_validate_prog could be extended to check for this.
--Sean
Hi Sean,
I love your patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Sean-Anderson/net-mdio-Add-netlink-interface/20230307-044742
patch link: https://lore.kernel.org/r/20230306204517.1953122-1-sean.anderson%40seco.com
patch subject: [PATCH net-next] net: mdio: Add netlink interface
config: sparc-allyesconfig (https://download.01.org/0day-ci/archive/20230307/[email protected]/config)
compiler: sparc64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/78ff5545f403a98977a2db207cc165cb3a3b4d8f
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Sean-Anderson/net-mdio-Add-netlink-interface/20230307-044742
git checkout 78ff5545f403a98977a2db207cc165cb3a3b4d8f
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sparc olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=sparc SHELL=/bin/bash drivers/net/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All warnings (new ones prefixed by >>):
drivers/net/mdio/mdio-netlink.c: In function 'mdio_nl_eval':
>> drivers/net/mdio/mdio-netlink.c:98:40: warning: variable 'val' set but not used [-Wunused-but-set-variable]
98 | int phy_id, reg, prtad, devad, val;
| ^~~
vim +/val +98 drivers/net/mdio/mdio-netlink.c
91
92 static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
93 {
94 struct mdio_nl_insn *insn;
95 unsigned long timeout;
96 u16 regs[8] = { 0 };
97 int pc, ret = 0;
> 98 int phy_id, reg, prtad, devad, val;
99
100 timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
101
102 mutex_lock(&xfer->mdio->mdio_lock);
103
104 for (insn = xfer->prog, pc = 0;
105 pc < xfer->prog_len;
106 insn = &xfer->prog[++pc]) {
107 if (time_after(jiffies, timeout)) {
108 ret = -ETIMEDOUT;
109 break;
110 }
111
112 switch ((enum mdio_nl_op)insn->op) {
113 case MDIO_NL_OP_READ:
114 phy_id = __arg_ri(insn->arg0, regs);
115 prtad = mdio_phy_id_prtad(phy_id);
116 devad = mdio_phy_id_devad(phy_id);
117 reg = __arg_ri(insn->arg1, regs);
118
119 if (mdio_phy_id_is_c45(phy_id))
120 ret = __mdiobus_c45_read(xfer->mdio, prtad,
121 devad, reg);
122 else
123 ret = __mdiobus_read(xfer->mdio, phy_id, reg);
124
125 if (ret < 0)
126 goto exit;
127 *__arg_r(insn->arg2, regs) = ret;
128 ret = 0;
129 break;
130
131 case MDIO_NL_OP_WRITE:
132 phy_id = __arg_ri(insn->arg0, regs);
133 prtad = mdio_phy_id_prtad(phy_id);
134 devad = mdio_phy_id_devad(phy_id);
135 reg = __arg_ri(insn->arg1, regs);
136 val = __arg_ri(insn->arg2, regs);
137
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
> To prevent userspace phy drivers, writes are disabled by default, and can
> only be enabled by editing the source.
Maybe we can just taint the kernel using add_taint()? I'm not sure if
that will prevent vendors writing user space drivers. Thoughts?
-michael
On mån, mar 06, 2023 at 15:45, Sean Anderson <[email protected]> wrote:
> This adds a netlink interface to make reads/writes to mdio buses. This
> makes it easier to debug devices. This is especially useful when there
> is a PCS involved (and the ethtool reads are faked), when there is no
> MAC associated with a PHY, or when the MDIO device is not a PHY.
>
> The closest existing in-kernel interfaces are the SIOCG/SMIIREG ioctls, but
> they have several drawbacks:
>
> 1. "Write register" is not always exactly that. The kernel will try to
> be extra helpful and do things behind the scenes if it detects a
> write to the reset bit of a PHY for example.
>
> 2. Only one op per syscall. This means that is impossible to implement
> many operations in a safe manner. Something as simple as a
> read/mask/write cycle can race against an in-kernel driver.
>
> 3. Addressing is awkward since you don't address the MDIO bus
> directly, rather "the MDIO bus to which this netdev's PHY is
> connected". This makes it hard to talk to devices on buses to which
> no PHYs are connected, the typical case being Ethernet switches.
>
> To address these shortcomings, this adds a GENL interface with which a user
> can interact with an MDIO bus directly. The user sends a program that
> mdio-netlink executes, possibly emitting data back to the user. I.e. it
> implements a very simple VM. Read/mask/write operations could be
> implemented by dedicated commands, but when you start looking at more
> advanced things like reading out the VLAN database of a switch you need
> state and branching.
>
> To prevent userspace phy drivers, writes are disabled by default, and can
> only be enabled by editing the source. This is the same strategy used by
> regmap for debugfs writes. Unfortunately, this disallows several useful
> features, including
>
> - Register writes (obviously)
> - C45-over-C22
> - Atomic access to paged registers
> - Better MDIO emulation for e.g. QEMU
>
> However, the read-only interface remains broadly useful for debugging.
> Users who want to use the above features can re-enable them by defining
> MDIO_NETLINK_ALLOW_WRITE and recompiling their kernel.
What about taking a page from the BPF playbook and require all loaded
programs (MDIO_GENL_XFERs) to be licensed under GPL? That would mean
that the userspace program that generated it would also have to be
GPLed.
My view has always been that a vendor looking to build a userspace SDK
won't be deterred by this limitation. They can easily build
mdio-netlink.ko from mdio-tools and use that to drive it, or (more
likely) they already have their own implementation that they are stuck
with for legacy reasons. In other words: we are only punishing
legitimate users (mdio-tools being one of them, IMO).
Perhaps with this approach we could have our cake and eat it too.
> Signed-off-by: Sean Anderson <[email protected]>
> ---
> This driver was written by Tobias Waldekranz. It is adapted from the
> version he released with mdio-tools [1]. This was last discussed 2.5
> years ago [2], and I have incorperated his cover letter into this commit
> message. The discussion mainly centered around the write capability
> allowing for userspace phy drivers. Although it comes with reduced
> functionality, I hope this new approach satisfies Andrew. I have also
> made several minor changes for style and to stay abrest of changing
> APIs.
>
> Tobias, I've taken the liberty of adding some copyright notices
> attributing this to you.
Fine by me :)
> [1] https://github.com/wkz/mdio-tools
> [2] https://lore.kernel.org/netdev/C42DZQLTPHM5.2THDSRK84BI3T@wkz-x280/
>
> drivers/net/mdio/Kconfig | 8 +
> drivers/net/mdio/Makefile | 1 +
> drivers/net/mdio/mdio-netlink.c | 464 ++++++++++++++++++++++++++++++
> include/uapi/linux/mdio-netlink.h | 61 ++++
> 4 files changed, 534 insertions(+)
> create mode 100644 drivers/net/mdio/mdio-netlink.c
> create mode 100644 include/uapi/linux/mdio-netlink.h
>
> diff --git a/drivers/net/mdio/Kconfig b/drivers/net/mdio/Kconfig
> index 90309980686e..8a01978e5b51 100644
> --- a/drivers/net/mdio/Kconfig
> +++ b/drivers/net/mdio/Kconfig
> @@ -43,6 +43,14 @@ config ACPI_MDIO
>
> if MDIO_BUS
>
> +config MDIO_NETLINK
> + tristate "Netlink interface for MDIO buses"
> + help
> + Enable a netlink interface to allow reading MDIO buses from
> + userspace. A small virtual machine allows implementing complex
> + operations, such as conditional reads or polling. All operations
> + submitted in the same program are evaluated atomically.
> +
> config MDIO_DEVRES
> tristate
>
> diff --git a/drivers/net/mdio/Makefile b/drivers/net/mdio/Makefile
> index 7d4cb4c11e4e..5583d5b8d174 100644
> --- a/drivers/net/mdio/Makefile
> +++ b/drivers/net/mdio/Makefile
> @@ -4,6 +4,7 @@
> obj-$(CONFIG_ACPI_MDIO) += acpi_mdio.o
> obj-$(CONFIG_FWNODE_MDIO) += fwnode_mdio.o
> obj-$(CONFIG_OF_MDIO) += of_mdio.o
> +obj-$(CONFIG_MDIO_NETLINK) += mdio-netlink.o
>
> obj-$(CONFIG_MDIO_ASPEED) += mdio-aspeed.o
> obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
> diff --git a/drivers/net/mdio/mdio-netlink.c b/drivers/net/mdio/mdio-netlink.c
> new file mode 100644
> index 000000000000..3e32d1a9bab3
> --- /dev/null
> +++ b/drivers/net/mdio/mdio-netlink.c
> @@ -0,0 +1,464 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2022-23 Sean Anderson <[email protected]>
> + * Copyright (C) 2020-22 Tobias Waldekranz <[email protected]>
> + */
> +
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/netlink.h>
> +#include <linux/phy.h>
> +#include <net/genetlink.h>
> +#include <net/netlink.h>
> +#include <uapi/linux/mdio-netlink.h>
> +
> +struct mdio_nl_xfer {
> + struct genl_info *info;
> + struct sk_buff *msg;
> + void *hdr;
> + struct nlattr *data;
> +
> + struct mii_bus *mdio;
> + int timeout_ms;
> +
> + int prog_len;
> + struct mdio_nl_insn *prog;
> +};
> +
> +static int mdio_nl_open(struct mdio_nl_xfer *xfer);
> +static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr);
> +
> +static int mdio_nl_flush(struct mdio_nl_xfer *xfer)
> +{
> + int err;
> +
> + err = mdio_nl_close(xfer, false, 0);
> + if (err)
> + return err;
> +
> + return mdio_nl_open(xfer);
> +}
> +
> +static int mdio_nl_emit(struct mdio_nl_xfer *xfer, u32 datum)
> +{
> + int err = 0;
> +
> + if (!nla_put_nohdr(xfer->msg, sizeof(datum), &datum))
> + return 0;
> +
> + err = mdio_nl_flush(xfer);
> + if (err)
> + return err;
> +
> + return nla_put_nohdr(xfer->msg, sizeof(datum), &datum);
> +}
> +
> +static inline u16 *__arg_r(u32 arg, u16 *regs)
> +{
> + WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_REG);
> +
> + return ®s[arg & 0x7];
> +}
> +
> +static inline u16 __arg_i(u32 arg)
> +{
> + WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_IMM);
> +
> + return arg & 0xffff;
> +}
> +
> +static inline u16 __arg_ri(u32 arg, u16 *regs)
> +{
> + switch ((enum mdio_nl_argmode)(arg >> 16)) {
> + case MDIO_NL_ARG_IMM:
> + return arg & 0xffff;
> + case MDIO_NL_ARG_REG:
> + return regs[arg & 7];
> + default:
> + WARN_ON_ONCE(1);
> + return 0;
> + }
> +}
> +
> +/* To prevent out-of-tree drivers from being implemented through this
> + * interface, disallow writes by default. This does disallow read-only uses,
> + * such as c45-over-c22 or reading phys with pages. However, with a such a
> + * flexible interface, we must use a big hammer. People who want to use this
> + * will need to modify the source code directly.
> + */
> +#undef MDIO_NETLINK_ALLOW_WRITE
> +
> +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
> +{
> + struct mdio_nl_insn *insn;
> + unsigned long timeout;
> + u16 regs[8] = { 0 };
> + int pc, ret = 0;
> + int phy_id, reg, prtad, devad, val;
> +
> + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
> +
> + mutex_lock(&xfer->mdio->mdio_lock);
> +
> + for (insn = xfer->prog, pc = 0;
> + pc < xfer->prog_len;
> + insn = &xfer->prog[++pc]) {
> + if (time_after(jiffies, timeout)) {
> + ret = -ETIMEDOUT;
> + break;
> + }
> +
> + switch ((enum mdio_nl_op)insn->op) {
> + case MDIO_NL_OP_READ:
> + phy_id = __arg_ri(insn->arg0, regs);
> + prtad = mdio_phy_id_prtad(phy_id);
> + devad = mdio_phy_id_devad(phy_id);
> + reg = __arg_ri(insn->arg1, regs);
> +
> + if (mdio_phy_id_is_c45(phy_id))
> + ret = __mdiobus_c45_read(xfer->mdio, prtad,
> + devad, reg);
> + else
> + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
> +
> + if (ret < 0)
> + goto exit;
> + *__arg_r(insn->arg2, regs) = ret;
> + ret = 0;
> + break;
> +
> + case MDIO_NL_OP_WRITE:
> + phy_id = __arg_ri(insn->arg0, regs);
> + prtad = mdio_phy_id_prtad(phy_id);
> + devad = mdio_phy_id_devad(phy_id);
> + reg = __arg_ri(insn->arg1, regs);
> + val = __arg_ri(insn->arg2, regs);
> +
> +#ifdef MDIO_NETLINK_ALLOW_WRITE
> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> + if (mdio_phy_id_is_c45(phy_id))
> + ret = __mdiobus_c45_write(xfer->mdio, prtad,
> + devad, reg, val
> + else
> + ret = __mdiobus_write(xfer->mdio, dev, reg,
> + val);
> +#else
> + ret = -EPERM;
> +#endif
> + if (ret < 0)
> + goto exit;
> + ret = 0;
> + break;
> +
> + case MDIO_NL_OP_AND:
> + *__arg_r(insn->arg2, regs) =
> + __arg_ri(insn->arg0, regs) &
> + __arg_ri(insn->arg1, regs);
> + break;
> +
> + case MDIO_NL_OP_OR:
> + *__arg_r(insn->arg2, regs) =
> + __arg_ri(insn->arg0, regs) |
> + __arg_ri(insn->arg1, regs);
> + break;
> +
> + case MDIO_NL_OP_ADD:
> + *__arg_r(insn->arg2, regs) =
> + __arg_ri(insn->arg0, regs) +
> + __arg_ri(insn->arg1, regs);
> + break;
> +
> + case MDIO_NL_OP_JEQ:
> + if (__arg_ri(insn->arg0, regs) ==
> + __arg_ri(insn->arg1, regs))
> + pc += (s16)__arg_i(insn->arg2);
> + break;
> +
> + case MDIO_NL_OP_JNE:
> + if (__arg_ri(insn->arg0, regs) !=
> + __arg_ri(insn->arg1, regs))
> + pc += (s16)__arg_i(insn->arg2);
> + break;
> +
> + case MDIO_NL_OP_EMIT:
> + ret = mdio_nl_emit(xfer, __arg_ri(insn->arg0, regs));
> + if (ret < 0)
> + goto exit;
> + ret = 0;
> + break;
> +
> + case MDIO_NL_OP_UNSPEC:
> + default:
> + ret = -EINVAL;
> + goto exit;
> + }
> + }
> +exit:
> + mutex_unlock(&xfer->mdio->mdio_lock);
> + return ret;
> +}
> +
> +struct mdio_nl_op_proto {
> + u8 arg0;
> + u8 arg1;
> + u8 arg2;
> +};
> +
> +static const struct mdio_nl_op_proto mdio_nl_op_protos[MDIO_NL_OP_MAX + 1] = {
> + [MDIO_NL_OP_READ] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_REG),
> + },
> + [MDIO_NL_OP_WRITE] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + },
> + [MDIO_NL_OP_AND] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_REG),
> + },
> + [MDIO_NL_OP_OR] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_REG),
> + },
> + [MDIO_NL_OP_ADD] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_REG),
> + },
> + [MDIO_NL_OP_JEQ] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_IMM),
> + },
> + [MDIO_NL_OP_JNE] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg2 = BIT(MDIO_NL_ARG_IMM),
> + },
> + [MDIO_NL_OP_EMIT] = {
> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
> + .arg1 = BIT(MDIO_NL_ARG_NONE),
> + .arg2 = BIT(MDIO_NL_ARG_NONE),
> + },
> +};
> +
> +static int mdio_nl_validate_insn(const struct nlattr *attr,
> + struct netlink_ext_ack *extack,
> + const struct mdio_nl_insn *insn)
> +{
> + const struct mdio_nl_op_proto *proto;
> +
> + if (insn->op > MDIO_NL_OP_MAX) {
> + NL_SET_ERR_MSG_ATTR(extack, attr, "Illegal instruction");
> + return -EINVAL;
> + }
> +
> + proto = &mdio_nl_op_protos[insn->op];
> +
> + if (!(BIT(insn->arg0 >> 16) & proto->arg0)) {
> + NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 0 invalid");
> + return -EINVAL;
> + }
> +
> + if (!(BIT(insn->arg1 >> 16) & proto->arg1)) {
> + NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 1 invalid");
> + return -EINVAL;
> + }
> +
> + if (!(BIT(insn->arg2 >> 16) & proto->arg2)) {
> + NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 2 invalid");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int mdio_nl_validate_prog(const struct nlattr *attr,
> + struct netlink_ext_ack *extack)
> +{
> + const struct mdio_nl_insn *prog = nla_data(attr);
> + int len = nla_len(attr);
> + int i, err = 0;
> +
> + if (len % sizeof(*prog)) {
> + NL_SET_ERR_MSG_ATTR(extack, attr, "Unaligned instruction");
> + return -EINVAL;
> + }
> +
> + len /= sizeof(*prog);
> + for (i = 0; i < len; i++) {
> + err = mdio_nl_validate_insn(attr, extack, &prog[i]);
> + if (err)
> + break;
> + }
> +
> + return err;
> +}
> +
> +static const struct nla_policy mdio_nl_policy[MDIO_NLA_MAX + 1] = {
> + [MDIO_NLA_UNSPEC] = { .type = NLA_UNSPEC, },
> + [MDIO_NLA_BUS_ID] = { .type = NLA_STRING, .len = MII_BUS_ID_SIZE },
> + [MDIO_NLA_TIMEOUT] = NLA_POLICY_MAX(NLA_U16, 10 * MSEC_PER_SEC),
> + [MDIO_NLA_PROG] = NLA_POLICY_VALIDATE_FN(NLA_BINARY,
> + mdio_nl_validate_prog,
> + 0x1000),
> + [MDIO_NLA_DATA] = { .type = NLA_NESTED },
> + [MDIO_NLA_ERROR] = { .type = NLA_S32, },
> +};
> +
> +static struct genl_family mdio_nl_family;
> +
> +static int mdio_nl_open(struct mdio_nl_xfer *xfer)
> +{
> + int err;
> +
> + xfer->msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> + if (!xfer->msg) {
> + err = -ENOMEM;
> + goto err;
> + }
> +
> + xfer->hdr = genlmsg_put(xfer->msg, xfer->info->snd_portid,
> + xfer->info->snd_seq, &mdio_nl_family,
> + NLM_F_ACK | NLM_F_MULTI, MDIO_GENL_XFER);
> + if (!xfer->hdr) {
> + err = -EMSGSIZE;
> + goto err_free;
> + }
> +
> + xfer->data = nla_nest_start(xfer->msg, MDIO_NLA_DATA);
> + if (!xfer->data) {
> + err = -EMSGSIZE;
> + goto err_free;
> + }
> +
> + return 0;
> +
> +err_free:
> + nlmsg_free(xfer->msg);
> +err:
> + return err;
> +}
> +
> +static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr)
> +{
> + struct nlmsghdr *end;
> + int err;
> +
> + nla_nest_end(xfer->msg, xfer->data);
> +
> + if (xerr && nla_put_s32(xfer->msg, MDIO_NLA_ERROR, xerr)) {
> + err = mdio_nl_flush(xfer);
> + if (err)
> + goto err_free;
> +
> + if (nla_put_s32(xfer->msg, MDIO_NLA_ERROR, xerr)) {
> + err = -EMSGSIZE;
> + goto err_free;
> + }
> + }
> +
> + genlmsg_end(xfer->msg, xfer->hdr);
> +
> + if (last) {
> + end = nlmsg_put(xfer->msg, xfer->info->snd_portid,
> + xfer->info->snd_seq, NLMSG_DONE, 0,
> + NLM_F_ACK | NLM_F_MULTI);
> + if (!end) {
> + err = mdio_nl_flush(xfer);
> + if (err)
> + goto err_free;
> +
> + end = nlmsg_put(xfer->msg, xfer->info->snd_portid,
> + xfer->info->snd_seq, NLMSG_DONE, 0,
> + NLM_F_ACK | NLM_F_MULTI);
> + if (!end) {
> + err = -EMSGSIZE;
> + goto err_free;
> + }
> + }
> + }
> +
> + return genlmsg_unicast(genl_info_net(xfer->info), xfer->msg,
> + xfer->info->snd_portid);
> +
> +err_free:
> + nlmsg_free(xfer->msg);
> + return err;
> +}
> +
> +static int mdio_nl_cmd_xfer(struct sk_buff *skb, struct genl_info *info)
> +{
> + struct mdio_nl_xfer xfer;
> + int err;
> +
> + if (!info->attrs[MDIO_NLA_BUS_ID] ||
> + !info->attrs[MDIO_NLA_PROG] ||
> + info->attrs[MDIO_NLA_DATA] ||
> + info->attrs[MDIO_NLA_ERROR])
> + return -EINVAL;
> +
> + xfer.mdio = mdio_find_bus(nla_data(info->attrs[MDIO_NLA_BUS_ID]));
> + if (!xfer.mdio)
> + return -ENODEV;
> +
> + if (info->attrs[MDIO_NLA_TIMEOUT])
> + xfer.timeout_ms = nla_get_u32(info->attrs[MDIO_NLA_TIMEOUT]);
> + else
> + xfer.timeout_ms = 100;
> +
> + xfer.info = info;
> + xfer.prog_len = nla_len(info->attrs[MDIO_NLA_PROG]) / sizeof(*xfer.prog);
> + xfer.prog = nla_data(info->attrs[MDIO_NLA_PROG]);
> +
> + err = mdio_nl_open(&xfer);
> + if (err)
> + return err;
> +
> + err = mdio_nl_eval(&xfer);
> +
> + err = mdio_nl_close(&xfer, true, err);
> + return err;
> +}
> +
> +static const struct genl_ops mdio_nl_ops[] = {
> + {
> + .cmd = MDIO_GENL_XFER,
> + .doit = mdio_nl_cmd_xfer,
> + .flags = GENL_ADMIN_PERM,
> + },
> +};
> +
> +static struct genl_family mdio_nl_family = {
> + .name = "mdio",
> + .version = 1,
> + .maxattr = MDIO_NLA_MAX,
> + .netnsok = false,
> + .module = THIS_MODULE,
> + .ops = mdio_nl_ops,
> + .n_ops = ARRAY_SIZE(mdio_nl_ops),
> + .policy = mdio_nl_policy,
> +};
> +
> +static int __init mdio_nl_init(void)
> +{
> + return genl_register_family(&mdio_nl_family);
> +}
> +
> +static void __exit mdio_nl_exit(void)
> +{
> + genl_unregister_family(&mdio_nl_family);
> +}
> +
> +MODULE_AUTHOR("Tobias Waldekranz <[email protected]>");
> +MODULE_DESCRIPTION("MDIO Netlink Interface");
> +MODULE_LICENSE("GPL");
> +
> +module_init(mdio_nl_init);
> +module_exit(mdio_nl_exit);
> diff --git a/include/uapi/linux/mdio-netlink.h b/include/uapi/linux/mdio-netlink.h
> new file mode 100644
> index 000000000000..bebd3b45c882
> --- /dev/null
> +++ b/include/uapi/linux/mdio-netlink.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright (C) 2020-22 Tobias Waldekranz <[email protected]>
> + */
> +
> +#ifndef _UAPI_LINUX_MDIO_NETLINK_H
> +#define _UAPI_LINUX_MDIO_NETLINK_H
> +
> +#include <linux/types.h>
> +
> +enum {
> + MDIO_GENL_UNSPEC,
> + MDIO_GENL_XFER,
> +
> + __MDIO_GENL_MAX,
> + MDIO_GENL_MAX = __MDIO_GENL_MAX - 1
> +};
> +
> +enum {
> + MDIO_NLA_UNSPEC,
> + MDIO_NLA_BUS_ID, /* string */
> + MDIO_NLA_TIMEOUT, /* u32 */
> + MDIO_NLA_PROG, /* struct mdio_nl_insn[] */
> + MDIO_NLA_DATA, /* nest */
> + MDIO_NLA_ERROR, /* s32 */
> +
> + __MDIO_NLA_MAX,
> + MDIO_NLA_MAX = __MDIO_NLA_MAX - 1
> +};
> +
> +enum mdio_nl_op {
> + MDIO_NL_OP_UNSPEC,
> + MDIO_NL_OP_READ, /* read dev(RI), port(RI), dst(R) */
> + MDIO_NL_OP_WRITE, /* write dev(RI), port(RI), src(RI) */
> + MDIO_NL_OP_AND, /* and a(RI), b(RI), dst(R) */
> + MDIO_NL_OP_OR, /* or a(RI), b(RI), dst(R) */
> + MDIO_NL_OP_ADD, /* add a(RI), b(RI), dst(R) */
> + MDIO_NL_OP_JEQ, /* jeq a(RI), b(RI), jmp(I) */
> + MDIO_NL_OP_JNE, /* jeq a(RI), b(RI), jmp(I) */
> + MDIO_NL_OP_EMIT, /* emit src(RI) */
> +
> + __MDIO_NL_OP_MAX,
> + MDIO_NL_OP_MAX = __MDIO_NL_OP_MAX - 1
> +};
> +
> +enum mdio_nl_argmode {
> + MDIO_NL_ARG_NONE,
> + MDIO_NL_ARG_REG,
> + MDIO_NL_ARG_IMM,
> + MDIO_NL_ARG_RESERVED
> +};
> +
> +struct mdio_nl_insn {
> + __u64 op:8;
> + __u64 reserved:2;
> + __u64 arg0:18;
> + __u64 arg1:18;
> + __u64 arg2:18;
> +};
> +
> +#endif /* _UAPI_LINUX_MDIO_NETLINK_H */
> --
> 2.35.1.1320.gc452695387.dirty
On Mon, Mar 06, 2023 at 10:48:48PM +0000, Russell King (Oracle) wrote:
> On Mon, Mar 06, 2023 at 03:45:16PM -0500, Sean Anderson wrote:
> > +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
> > +{
> > + struct mdio_nl_insn *insn;
> > + unsigned long timeout;
> > + u16 regs[8] = { 0 };
> > + int pc, ret = 0;
>
> So "pc" is signed.
>
> > + int phy_id, reg, prtad, devad, val;
> > +
> > + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
> > +
> > + mutex_lock(&xfer->mdio->mdio_lock);
> > +
> > + for (insn = xfer->prog, pc = 0;
> > + pc < xfer->prog_len;
>
> xfer->prog_len is signed, so this is a signed comparison.
>
> > + case MDIO_NL_OP_JEQ:
> > + if (__arg_ri(insn->arg0, regs) ==
> > + __arg_ri(insn->arg1, regs))
> > + pc += (s16)__arg_i(insn->arg2);
>
> This adds a signed 16-bit integer to pc, which can make pc negative.
>
> And so the question becomes... what prevents pc becoming negative
> and then trying to use a negative number as an index?
I don't know ebpf very well, but would it of caught this? I know the
aim of this is to be simple, but due to its simplicity, we are loosing
out on all the inherent safety of eBPF. Is a eBPF interface all that
complex? I assume you just need to add some way to identify MDIO
busses and kfunc to perform a read on the bus?
Andrew
On Tue, Mar 07, 2023 at 12:23:07PM +0100, Michael Walle wrote:
> > To prevent userspace phy drivers, writes are disabled by default, and can
> > only be enabled by editing the source.
>
> Maybe we can just taint the kernel using add_taint()? I'm not sure if
> that will prevent vendors writing user space drivers. Thoughts?
I was thinking about taint as well. But keep the same code structure,
you need to edit it to enable write.
Andrew
On Tue, Mar 07, 2023 at 02:49:27PM +0100, Andrew Lunn wrote:
> On Tue, Mar 07, 2023 at 12:23:07PM +0100, Michael Walle wrote:
> > > To prevent userspace phy drivers, writes are disabled by default, and can
> > > only be enabled by editing the source.
> >
> > Maybe we can just taint the kernel using add_taint()? I'm not sure if
> > that will prevent vendors writing user space drivers. Thoughts?
>
> I was thinking about taint as well. But keep the same code structure,
> you need to edit it to enable write.
But as per the commit message, this locks us out of the following
legitimate uses:
- C45-over-C22 (reads)
- Atomic (why only atomic?) (read) access to paged registers
are we ok with the implications?
> To prevent userspace phy drivers, writes are disabled by default, and can
> only be enabled by editing the source. This is the same strategy used by
> regmap for debugfs writes. Unfortunately, this disallows several useful
> features, including
>
> - Register writes (obviously)
> - C45-over-C22
You could add C45-over-C22 as another op.
This tool is dangerous, even in its read only mode, just like the
IOCTL interface. Interrupt status registers are often clear on read,
so you can loose interrupts. Statistics counters are sometimes clear
on read. BMSR link bit is also latching, so a read of it could mean
you miss link events, etc. Adding C45-over-C22 is just as dangerous,
you can mess up MDIO switches which use the registers for other
things, but by deciding to use this tool you have decided to take the
risk of blowing your foot off.
> - Atomic access to paged registers
> - Better MDIO emulation for e.g. QEMU
>
> However, the read-only interface remains broadly useful for debugging.
I would say it is broadly useful for PHYs. But not Ethernet switches,
when in read only mode.
> +static int mdio_nl_open(struct mdio_nl_xfer *xfer);
> +static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr);
I guess i never did a proper review of this code before, due to not
liking the concept....
Move the code around so these are not needed, unless there are
functions which are mutually recursive.
> +static inline u16 *__arg_r(u32 arg, u16 *regs)
> +{
> + WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_REG);
> +
> + return ®s[arg & 0x7];
> +}
No inline functions in C files. Leave the compiler to decide.
> +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
> +{
> + struct mdio_nl_insn *insn;
> + unsigned long timeout;
> + u16 regs[8] = { 0 };
> + int pc, ret = 0;
> + int phy_id, reg, prtad, devad, val;
> +
> + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
> +
> + mutex_lock(&xfer->mdio->mdio_lock);
Should timeout be set inside the lock, for when you have two
applications running in parallel and each take a while?
> +
> + for (insn = xfer->prog, pc = 0;
> + pc < xfer->prog_len;
> + insn = &xfer->prog[++pc]) {
> + if (time_after(jiffies, timeout)) {
> + ret = -ETIMEDOUT;
> + break;
> + }
> +
> + switch ((enum mdio_nl_op)insn->op) {
> + case MDIO_NL_OP_READ:
> + phy_id = __arg_ri(insn->arg0, regs);
> + prtad = mdio_phy_id_prtad(phy_id);
> + devad = mdio_phy_id_devad(phy_id);
> + reg = __arg_ri(insn->arg1, regs);
> +
> + if (mdio_phy_id_is_c45(phy_id))
> + ret = __mdiobus_c45_read(xfer->mdio, prtad,
> + devad, reg);
> + else
> + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
The application should say if it want to do C22 or C45. As you said in
the cover note, the ioctl interface is limiting when there is no PHY,
so you are artificially adding the same restriction here. Also, you
might want to do C45 on a C22 PHY, e.g. to access EEE registers. Plus
you could consider adding C45 over C22 here.
> +
> + if (ret < 0)
> + goto exit;
> + *__arg_r(insn->arg2, regs) = ret;
> + ret = 0;
> + break;
> +
> + case MDIO_NL_OP_WRITE:
> + phy_id = __arg_ri(insn->arg0, regs);
> + prtad = mdio_phy_id_prtad(phy_id);
> + devad = mdio_phy_id_devad(phy_id);
> + reg = __arg_ri(insn->arg1, regs);
> + val = __arg_ri(insn->arg2, regs);
> +
> +#ifdef MDIO_NETLINK_ALLOW_WRITE
> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
I don't know, but maybe taint on read as well.
> + if (mdio_phy_id_is_c45(phy_id))
> + ret = __mdiobus_c45_write(xfer->mdio, prtad,
> + devad, reg, val
> + else
> + ret = __mdiobus_write(xfer->mdio, dev, reg,
> + val);
> +#else
> + ret = -EPERM;
EPERM is odd, EOPNOTSUPP would be better. EPERM suggests you can run
it as root and it should work.
Andrew
On Tue, Mar 07, 2023 at 03:22:46PM +0100, Andrew Lunn wrote:
> > + switch ((enum mdio_nl_op)insn->op) {
> > + case MDIO_NL_OP_READ:
> > + phy_id = __arg_ri(insn->arg0, regs);
> > + prtad = mdio_phy_id_prtad(phy_id);
> > + devad = mdio_phy_id_devad(phy_id);
> > + reg = __arg_ri(insn->arg1, regs);
> > +
> > + if (mdio_phy_id_is_c45(phy_id))
> > + ret = __mdiobus_c45_read(xfer->mdio, prtad,
> > + devad, reg);
> > + else
> > + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
>
> The application should say if it want to do C22 or C45. As you said in
> the cover note, the ioctl interface is limiting when there is no PHY,
> so you are artificially adding the same restriction here. Also, you
> might want to do C45 on a C22 PHY, e.g. to access EEE registers. Plus
> you could consider adding C45 over C22 here.
Remembering of course that C45-over-C22 on a device that isn't a PHY
could end up causing havoc, but then if you are using this interface,
you already have the gun pointing at your foot... and if you go and
try C45-over-C22 to scan a MDIO bus, you'd definitely be pulling the
trigger too.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Tue, Mar 07, 2023 at 03:33:20PM +0100, Andrew Lunn wrote:
> > - Atomic (why only atomic?) (read) access to paged registers
>
> I would say 'atomic' is wrong, you cannot access paged registers at
> all.
>
> > are we ok with the implications?
>
> I am. Anybody doing this level of debugging should be able to
> recompile the kernel to enable write support. It does limit debugging
> in field, where maybe you cannot recompile the kernel, but to me, that
> is a reasonable trade off.
However, it should be pointed out that disabling write support means
that one can not even read paged registers through this interface.
That leads me to question the whole point of this, because as far as I
can see, the only thing this interface would offer with writes disabled
is the ability to read several non-paged registers consectively while
holding the mdio bus lock. Apart from that, with writes disabled, it
appears to offer nothing over the existing MII ioctls.
With writes enabled, then yes, it offers a slightly better interface
to be able to perform multiple accesses while holding the bus lock.
In that regard, is there any point to having the configuration option
to control whether writes are supported, or is it just better to have
an option to enable/disable the whole interface, and taint the kernel
on *any* use of this interface?
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> - Atomic (why only atomic?) (read) access to paged registers
I would say 'atomic' is wrong, you cannot access paged registers at
all.
> are we ok with the implications?
I am. Anybody doing this level of debugging should be able to
recompile the kernel to enable write support. It does limit debugging
in field, where maybe you cannot recompile the kernel, but to me, that
is a reasonable trade off.
Andrew
On 3/7/23 09:22, Andrew Lunn wrote:
>> To prevent userspace phy drivers, writes are disabled by default, and can
>> only be enabled by editing the source. This is the same strategy used by
>> regmap for debugfs writes. Unfortunately, this disallows several useful
>> features, including
>>
>> - Register writes (obviously)
>> - C45-over-C22
>
> You could add C45-over-C22 as another op.
>
> This tool is dangerous, even in its read only mode, just like the
> IOCTL interface. Interrupt status registers are often clear on read,
> so you can loose interrupts. Statistics counters are sometimes clear
> on read. BMSR link bit is also latching, so a read of it could mean
> you miss link events, etc. Adding C45-over-C22 is just as dangerous,
> you can mess up MDIO switches which use the registers for other
> things, but by deciding to use this tool you have decided to take the
> risk of blowing your foot off.
Yes, and I should probably have commented on this in the commit message.
IMO the things you listed are... iffy but unlikely to cause a
malfunction. Tainting on read would be fine, since it is certainly
possible to imagine hardware which would malfunction on read. I suppose
regmap gets away with this by sticking the whole thing in debugfs.
>> - Atomic access to paged registers
>> - Better MDIO emulation for e.g. QEMU
>>
>> However, the read-only interface remains broadly useful for debugging.
>
> I would say it is broadly useful for PHYs. But not Ethernet switches,
> when in read only mode.
>
>> +static int mdio_nl_open(struct mdio_nl_xfer *xfer);
>> +static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr);
>
> I guess i never did a proper review of this code before, due to not
> liking the concept....
>
> Move the code around so these are not needed, unless there are
> functions which are mutually recursive.
They do indeed call each other (although by my analysis no recursion
results). The forward declaration of mdio_nl_open is unnecessary, so I
will rearrange things to avoid that.
>> +static inline u16 *__arg_r(u32 arg, u16 *regs)
>> +{
>> + WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_REG);
>> +
>> + return ®s[arg & 0x7];
>> +}
>
> No inline functions in C files. Leave the compiler to decide.
OK
>> +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
>> +{
>> + struct mdio_nl_insn *insn;
>> + unsigned long timeout;
>> + u16 regs[8] = { 0 };
>> + int pc, ret = 0;
>> + int phy_id, reg, prtad, devad, val;
>> +
>> + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
>> +
>> + mutex_lock(&xfer->mdio->mdio_lock);
>
> Should timeout be set inside the lock, for when you have two
> applications running in parallel and each take a while?
That seems reasonable.
>> +
>> + for (insn = xfer->prog, pc = 0;
>> + pc < xfer->prog_len;
>> + insn = &xfer->prog[++pc]) {
>> + if (time_after(jiffies, timeout)) {
>> + ret = -ETIMEDOUT;
>> + break;
>> + }
>> +
>> + switch ((enum mdio_nl_op)insn->op) {
>> + case MDIO_NL_OP_READ:
>> + phy_id = __arg_ri(insn->arg0, regs);
>> + prtad = mdio_phy_id_prtad(phy_id);
>> + devad = mdio_phy_id_devad(phy_id);
>> + reg = __arg_ri(insn->arg1, regs);
>> +
>> + if (mdio_phy_id_is_c45(phy_id))
>> + ret = __mdiobus_c45_read(xfer->mdio, prtad,
>> + devad, reg);
>> + else
>> + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
>
> The application should say if it want to do C22 or C45.
The phy_id comes from the application. So it sets MDIO_PHY_ID_C45 if it wants
to use C45.
> As you said in
> the cover note, the ioctl interface is limiting when there is no PHY,
> so you are artificially adding the same restriction here.
I don't follow.
> Also, you
> might want to do C45 on a C22 PHY, e.g. to access EEE registers. Plus
> you could consider adding C45 over C22 here.
As Russell noted, this is dangerous in the general case.
>> +
>> + if (ret < 0)
>> + goto exit;
>> + *__arg_r(insn->arg2, regs) = ret;
>> + ret = 0;
>> + break;
>> +
>> + case MDIO_NL_OP_WRITE:
>> + phy_id = __arg_ri(insn->arg0, regs);
>> + prtad = mdio_phy_id_prtad(phy_id);
>> + devad = mdio_phy_id_devad(phy_id);
>> + reg = __arg_ri(insn->arg1, regs);
>> + val = __arg_ri(insn->arg2, regs);
>> +
>> +#ifdef MDIO_NETLINK_ALLOW_WRITE
>> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
> I don't know, but maybe taint on read as well.
>
>> + if (mdio_phy_id_is_c45(phy_id))
>> + ret = __mdiobus_c45_write(xfer->mdio, prtad,
>> + devad, reg, val
>> + else
>> + ret = __mdiobus_write(xfer->mdio, dev, reg,
>> + val);
>> +#else
>> + ret = -EPERM;
>
> EPERM is odd, EOPNOTSUPP would be better. EPERM suggests you can run
> it as root and it should work.
Well, EPERM is what you get when trying to write a 444 file, which is
effectively what we're enforcing here.
--Sean
On 3/7/23 07:26, Tobias Waldekranz wrote:
> On mån, mar 06, 2023 at 15:45, Sean Anderson <[email protected]> wrote:
>> This adds a netlink interface to make reads/writes to mdio buses. This
>> makes it easier to debug devices. This is especially useful when there
>> is a PCS involved (and the ethtool reads are faked), when there is no
>> MAC associated with a PHY, or when the MDIO device is not a PHY.
>>
>> The closest existing in-kernel interfaces are the SIOCG/SMIIREG ioctls, but
>> they have several drawbacks:
>>
>> 1. "Write register" is not always exactly that. The kernel will try to
>> be extra helpful and do things behind the scenes if it detects a
>> write to the reset bit of a PHY for example.
>>
>> 2. Only one op per syscall. This means that is impossible to implement
>> many operations in a safe manner. Something as simple as a
>> read/mask/write cycle can race against an in-kernel driver.
>>
>> 3. Addressing is awkward since you don't address the MDIO bus
>> directly, rather "the MDIO bus to which this netdev's PHY is
>> connected". This makes it hard to talk to devices on buses to which
>> no PHYs are connected, the typical case being Ethernet switches.
>>
>> To address these shortcomings, this adds a GENL interface with which a user
>> can interact with an MDIO bus directly. The user sends a program that
>> mdio-netlink executes, possibly emitting data back to the user. I.e. it
>> implements a very simple VM. Read/mask/write operations could be
>> implemented by dedicated commands, but when you start looking at more
>> advanced things like reading out the VLAN database of a switch you need
>> state and branching.
>>
>> To prevent userspace phy drivers, writes are disabled by default, and can
>> only be enabled by editing the source. This is the same strategy used by
>> regmap for debugfs writes. Unfortunately, this disallows several useful
>> features, including
>>
>> - Register writes (obviously)
>> - C45-over-C22
>> - Atomic access to paged registers
>> - Better MDIO emulation for e.g. QEMU
>>
>> However, the read-only interface remains broadly useful for debugging.
>> Users who want to use the above features can re-enable them by defining
>> MDIO_NETLINK_ALLOW_WRITE and recompiling their kernel.
>
> What about taking a page from the BPF playbook and require all loaded
> programs (MDIO_GENL_XFERs) to be licensed under GPL? That would mean
> that the userspace program that generated it would also have to be
> GPLed.
>
> My view has always been that a vendor looking to build a userspace SDK
> won't be deterred by this limitation. They can easily build
> mdio-netlink.ko from mdio-tools and use that to drive it, or (more
> likely) they already have their own implementation that they are stuck
> with for legacy reasons. In other words: we are only punishing
> legitimate users (mdio-tools being one of them, IMO).
Yes, I agree with this. It's always seemed silly to me to exclude a good
debugging interface on the grounds that it could be used for userspace
drivers when a vendor can just as easily supply their own proprietary
module implementing the same thing.
Last time, the discussion seemed to get hung up on this topic, so I wanted
to start off with an approach which would obviously prevent misuse (albeit
rather draconian).
--Sean
> Perhaps with this approach we could have our cake and eat it too.
>
>> Signed-off-by: Sean Anderson <[email protected]>
>> ---
>> This driver was written by Tobias Waldekranz. It is adapted from the
>> version he released with mdio-tools [1]. This was last discussed 2.5
>> years ago [2], and I have incorperated his cover letter into this commit
>> message. The discussion mainly centered around the write capability
>> allowing for userspace phy drivers. Although it comes with reduced
>> functionality, I hope this new approach satisfies Andrew. I have also
>> made several minor changes for style and to stay abrest of changing
>> APIs.
>>
>> Tobias, I've taken the liberty of adding some copyright notices
>> attributing this to you.
>
> Fine by me :)
>
>> [1] https://github.com/wkz/mdio-tools
>> [2] https://lore.kernel.org/netdev/C42DZQLTPHM5.2THDSRK84BI3T@wkz-x280/
>>
>> drivers/net/mdio/Kconfig | 8 +
>> drivers/net/mdio/Makefile | 1 +
>> drivers/net/mdio/mdio-netlink.c | 464 ++++++++++++++++++++++++++++++
>> include/uapi/linux/mdio-netlink.h | 61 ++++
>> 4 files changed, 534 insertions(+)
>> create mode 100644 drivers/net/mdio/mdio-netlink.c
>> create mode 100644 include/uapi/linux/mdio-netlink.h
>>
>> diff --git a/drivers/net/mdio/Kconfig b/drivers/net/mdio/Kconfig
>> index 90309980686e..8a01978e5b51 100644
>> --- a/drivers/net/mdio/Kconfig
>> +++ b/drivers/net/mdio/Kconfig
>> @@ -43,6 +43,14 @@ config ACPI_MDIO
>>
>> if MDIO_BUS
>>
>> +config MDIO_NETLINK
>> + tristate "Netlink interface for MDIO buses"
>> + help
>> + Enable a netlink interface to allow reading MDIO buses from
>> + userspace. A small virtual machine allows implementing complex
>> + operations, such as conditional reads or polling. All operations
>> + submitted in the same program are evaluated atomically.
>> +
>> config MDIO_DEVRES
>> tristate
>>
>> diff --git a/drivers/net/mdio/Makefile b/drivers/net/mdio/Makefile
>> index 7d4cb4c11e4e..5583d5b8d174 100644
>> --- a/drivers/net/mdio/Makefile
>> +++ b/drivers/net/mdio/Makefile
>> @@ -4,6 +4,7 @@
>> obj-$(CONFIG_ACPI_MDIO) += acpi_mdio.o
>> obj-$(CONFIG_FWNODE_MDIO) += fwnode_mdio.o
>> obj-$(CONFIG_OF_MDIO) += of_mdio.o
>> +obj-$(CONFIG_MDIO_NETLINK) += mdio-netlink.o
>>
>> obj-$(CONFIG_MDIO_ASPEED) += mdio-aspeed.o
>> obj-$(CONFIG_MDIO_BCM_IPROC) += mdio-bcm-iproc.o
>> diff --git a/drivers/net/mdio/mdio-netlink.c b/drivers/net/mdio/mdio-netlink.c
>> new file mode 100644
>> index 000000000000..3e32d1a9bab3
>> --- /dev/null
>> +++ b/drivers/net/mdio/mdio-netlink.c
>> @@ -0,0 +1,464 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (C) 2022-23 Sean Anderson <[email protected]>
>> + * Copyright (C) 2020-22 Tobias Waldekranz <[email protected]>
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/netlink.h>
>> +#include <linux/phy.h>
>> +#include <net/genetlink.h>
>> +#include <net/netlink.h>
>> +#include <uapi/linux/mdio-netlink.h>
>> +
>> +struct mdio_nl_xfer {
>> + struct genl_info *info;
>> + struct sk_buff *msg;
>> + void *hdr;
>> + struct nlattr *data;
>> +
>> + struct mii_bus *mdio;
>> + int timeout_ms;
>> +
>> + int prog_len;
>> + struct mdio_nl_insn *prog;
>> +};
>> +
>> +static int mdio_nl_open(struct mdio_nl_xfer *xfer);
>> +static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr);
>> +
>> +static int mdio_nl_flush(struct mdio_nl_xfer *xfer)
>> +{
>> + int err;
>> +
>> + err = mdio_nl_close(xfer, false, 0);
>> + if (err)
>> + return err;
>> +
>> + return mdio_nl_open(xfer);
>> +}
>> +
>> +static int mdio_nl_emit(struct mdio_nl_xfer *xfer, u32 datum)
>> +{
>> + int err = 0;
>> +
>> + if (!nla_put_nohdr(xfer->msg, sizeof(datum), &datum))
>> + return 0;
>> +
>> + err = mdio_nl_flush(xfer);
>> + if (err)
>> + return err;
>> +
>> + return nla_put_nohdr(xfer->msg, sizeof(datum), &datum);
>> +}
>> +
>> +static inline u16 *__arg_r(u32 arg, u16 *regs)
>> +{
>> + WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_REG);
>> +
>> + return ®s[arg & 0x7];
>> +}
>> +
>> +static inline u16 __arg_i(u32 arg)
>> +{
>> + WARN_ON_ONCE(arg >> 16 != MDIO_NL_ARG_IMM);
>> +
>> + return arg & 0xffff;
>> +}
>> +
>> +static inline u16 __arg_ri(u32 arg, u16 *regs)
>> +{
>> + switch ((enum mdio_nl_argmode)(arg >> 16)) {
>> + case MDIO_NL_ARG_IMM:
>> + return arg & 0xffff;
>> + case MDIO_NL_ARG_REG:
>> + return regs[arg & 7];
>> + default:
>> + WARN_ON_ONCE(1);
>> + return 0;
>> + }
>> +}
>> +
>> +/* To prevent out-of-tree drivers from being implemented through this
>> + * interface, disallow writes by default. This does disallow read-only uses,
>> + * such as c45-over-c22 or reading phys with pages. However, with a such a
>> + * flexible interface, we must use a big hammer. People who want to use this
>> + * will need to modify the source code directly.
>> + */
>> +#undef MDIO_NETLINK_ALLOW_WRITE
>> +
>> +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
>> +{
>> + struct mdio_nl_insn *insn;
>> + unsigned long timeout;
>> + u16 regs[8] = { 0 };
>> + int pc, ret = 0;
>> + int phy_id, reg, prtad, devad, val;
>> +
>> + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
>> +
>> + mutex_lock(&xfer->mdio->mdio_lock);
>> +
>> + for (insn = xfer->prog, pc = 0;
>> + pc < xfer->prog_len;
>> + insn = &xfer->prog[++pc]) {
>> + if (time_after(jiffies, timeout)) {
>> + ret = -ETIMEDOUT;
>> + break;
>> + }
>> +
>> + switch ((enum mdio_nl_op)insn->op) {
>> + case MDIO_NL_OP_READ:
>> + phy_id = __arg_ri(insn->arg0, regs);
>> + prtad = mdio_phy_id_prtad(phy_id);
>> + devad = mdio_phy_id_devad(phy_id);
>> + reg = __arg_ri(insn->arg1, regs);
>> +
>> + if (mdio_phy_id_is_c45(phy_id))
>> + ret = __mdiobus_c45_read(xfer->mdio, prtad,
>> + devad, reg);
>> + else
>> + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
>> +
>> + if (ret < 0)
>> + goto exit;
>> + *__arg_r(insn->arg2, regs) = ret;
>> + ret = 0;
>> + break;
>> +
>> + case MDIO_NL_OP_WRITE:
>> + phy_id = __arg_ri(insn->arg0, regs);
>> + prtad = mdio_phy_id_prtad(phy_id);
>> + devad = mdio_phy_id_devad(phy_id);
>> + reg = __arg_ri(insn->arg1, regs);
>> + val = __arg_ri(insn->arg2, regs);
>> +
>> +#ifdef MDIO_NETLINK_ALLOW_WRITE
>> + add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>> + if (mdio_phy_id_is_c45(phy_id))
>> + ret = __mdiobus_c45_write(xfer->mdio, prtad,
>> + devad, reg, val
>> + else
>> + ret = __mdiobus_write(xfer->mdio, dev, reg,
>> + val);
>> +#else
>> + ret = -EPERM;
>> +#endif
>> + if (ret < 0)
>> + goto exit;
>> + ret = 0;
>> + break;
>> +
>> + case MDIO_NL_OP_AND:
>> + *__arg_r(insn->arg2, regs) =
>> + __arg_ri(insn->arg0, regs) &
>> + __arg_ri(insn->arg1, regs);
>> + break;
>> +
>> + case MDIO_NL_OP_OR:
>> + *__arg_r(insn->arg2, regs) =
>> + __arg_ri(insn->arg0, regs) |
>> + __arg_ri(insn->arg1, regs);
>> + break;
>> +
>> + case MDIO_NL_OP_ADD:
>> + *__arg_r(insn->arg2, regs) =
>> + __arg_ri(insn->arg0, regs) +
>> + __arg_ri(insn->arg1, regs);
>> + break;
>> +
>> + case MDIO_NL_OP_JEQ:
>> + if (__arg_ri(insn->arg0, regs) ==
>> + __arg_ri(insn->arg1, regs))
>> + pc += (s16)__arg_i(insn->arg2);
>> + break;
>> +
>> + case MDIO_NL_OP_JNE:
>> + if (__arg_ri(insn->arg0, regs) !=
>> + __arg_ri(insn->arg1, regs))
>> + pc += (s16)__arg_i(insn->arg2);
>> + break;
>> +
>> + case MDIO_NL_OP_EMIT:
>> + ret = mdio_nl_emit(xfer, __arg_ri(insn->arg0, regs));
>> + if (ret < 0)
>> + goto exit;
>> + ret = 0;
>> + break;
>> +
>> + case MDIO_NL_OP_UNSPEC:
>> + default:
>> + ret = -EINVAL;
>> + goto exit;
>> + }
>> + }
>> +exit:
>> + mutex_unlock(&xfer->mdio->mdio_lock);
>> + return ret;
>> +}
>> +
>> +struct mdio_nl_op_proto {
>> + u8 arg0;
>> + u8 arg1;
>> + u8 arg2;
>> +};
>> +
>> +static const struct mdio_nl_op_proto mdio_nl_op_protos[MDIO_NL_OP_MAX + 1] = {
>> + [MDIO_NL_OP_READ] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_REG),
>> + },
>> + [MDIO_NL_OP_WRITE] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + },
>> + [MDIO_NL_OP_AND] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_REG),
>> + },
>> + [MDIO_NL_OP_OR] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_REG),
>> + },
>> + [MDIO_NL_OP_ADD] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_REG),
>> + },
>> + [MDIO_NL_OP_JEQ] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_IMM),
>> + },
>> + [MDIO_NL_OP_JNE] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg2 = BIT(MDIO_NL_ARG_IMM),
>> + },
>> + [MDIO_NL_OP_EMIT] = {
>> + .arg0 = BIT(MDIO_NL_ARG_REG) | BIT(MDIO_NL_ARG_IMM),
>> + .arg1 = BIT(MDIO_NL_ARG_NONE),
>> + .arg2 = BIT(MDIO_NL_ARG_NONE),
>> + },
>> +};
>> +
>> +static int mdio_nl_validate_insn(const struct nlattr *attr,
>> + struct netlink_ext_ack *extack,
>> + const struct mdio_nl_insn *insn)
>> +{
>> + const struct mdio_nl_op_proto *proto;
>> +
>> + if (insn->op > MDIO_NL_OP_MAX) {
>> + NL_SET_ERR_MSG_ATTR(extack, attr, "Illegal instruction");
>> + return -EINVAL;
>> + }
>> +
>> + proto = &mdio_nl_op_protos[insn->op];
>> +
>> + if (!(BIT(insn->arg0 >> 16) & proto->arg0)) {
>> + NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 0 invalid");
>> + return -EINVAL;
>> + }
>> +
>> + if (!(BIT(insn->arg1 >> 16) & proto->arg1)) {
>> + NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 1 invalid");
>> + return -EINVAL;
>> + }
>> +
>> + if (!(BIT(insn->arg2 >> 16) & proto->arg2)) {
>> + NL_SET_ERR_MSG_ATTR(extack, attr, "Argument 2 invalid");
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int mdio_nl_validate_prog(const struct nlattr *attr,
>> + struct netlink_ext_ack *extack)
>> +{
>> + const struct mdio_nl_insn *prog = nla_data(attr);
>> + int len = nla_len(attr);
>> + int i, err = 0;
>> +
>> + if (len % sizeof(*prog)) {
>> + NL_SET_ERR_MSG_ATTR(extack, attr, "Unaligned instruction");
>> + return -EINVAL;
>> + }
>> +
>> + len /= sizeof(*prog);
>> + for (i = 0; i < len; i++) {
>> + err = mdio_nl_validate_insn(attr, extack, &prog[i]);
>> + if (err)
>> + break;
>> + }
>> +
>> + return err;
>> +}
>> +
>> +static const struct nla_policy mdio_nl_policy[MDIO_NLA_MAX + 1] = {
>> + [MDIO_NLA_UNSPEC] = { .type = NLA_UNSPEC, },
>> + [MDIO_NLA_BUS_ID] = { .type = NLA_STRING, .len = MII_BUS_ID_SIZE },
>> + [MDIO_NLA_TIMEOUT] = NLA_POLICY_MAX(NLA_U16, 10 * MSEC_PER_SEC),
>> + [MDIO_NLA_PROG] = NLA_POLICY_VALIDATE_FN(NLA_BINARY,
>> + mdio_nl_validate_prog,
>> + 0x1000),
>> + [MDIO_NLA_DATA] = { .type = NLA_NESTED },
>> + [MDIO_NLA_ERROR] = { .type = NLA_S32, },
>> +};
>> +
>> +static struct genl_family mdio_nl_family;
>> +
>> +static int mdio_nl_open(struct mdio_nl_xfer *xfer)
>> +{
>> + int err;
>> +
>> + xfer->msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>> + if (!xfer->msg) {
>> + err = -ENOMEM;
>> + goto err;
>> + }
>> +
>> + xfer->hdr = genlmsg_put(xfer->msg, xfer->info->snd_portid,
>> + xfer->info->snd_seq, &mdio_nl_family,
>> + NLM_F_ACK | NLM_F_MULTI, MDIO_GENL_XFER);
>> + if (!xfer->hdr) {
>> + err = -EMSGSIZE;
>> + goto err_free;
>> + }
>> +
>> + xfer->data = nla_nest_start(xfer->msg, MDIO_NLA_DATA);
>> + if (!xfer->data) {
>> + err = -EMSGSIZE;
>> + goto err_free;
>> + }
>> +
>> + return 0;
>> +
>> +err_free:
>> + nlmsg_free(xfer->msg);
>> +err:
>> + return err;
>> +}
>> +
>> +static int mdio_nl_close(struct mdio_nl_xfer *xfer, bool last, int xerr)
>> +{
>> + struct nlmsghdr *end;
>> + int err;
>> +
>> + nla_nest_end(xfer->msg, xfer->data);
>> +
>> + if (xerr && nla_put_s32(xfer->msg, MDIO_NLA_ERROR, xerr)) {
>> + err = mdio_nl_flush(xfer);
>> + if (err)
>> + goto err_free;
>> +
>> + if (nla_put_s32(xfer->msg, MDIO_NLA_ERROR, xerr)) {
>> + err = -EMSGSIZE;
>> + goto err_free;
>> + }
>> + }
>> +
>> + genlmsg_end(xfer->msg, xfer->hdr);
>> +
>> + if (last) {
>> + end = nlmsg_put(xfer->msg, xfer->info->snd_portid,
>> + xfer->info->snd_seq, NLMSG_DONE, 0,
>> + NLM_F_ACK | NLM_F_MULTI);
>> + if (!end) {
>> + err = mdio_nl_flush(xfer);
>> + if (err)
>> + goto err_free;
>> +
>> + end = nlmsg_put(xfer->msg, xfer->info->snd_portid,
>> + xfer->info->snd_seq, NLMSG_DONE, 0,
>> + NLM_F_ACK | NLM_F_MULTI);
>> + if (!end) {
>> + err = -EMSGSIZE;
>> + goto err_free;
>> + }
>> + }
>> + }
>> +
>> + return genlmsg_unicast(genl_info_net(xfer->info), xfer->msg,
>> + xfer->info->snd_portid);
>> +
>> +err_free:
>> + nlmsg_free(xfer->msg);
>> + return err;
>> +}
>> +
>> +static int mdio_nl_cmd_xfer(struct sk_buff *skb, struct genl_info *info)
>> +{
>> + struct mdio_nl_xfer xfer;
>> + int err;
>> +
>> + if (!info->attrs[MDIO_NLA_BUS_ID] ||
>> + !info->attrs[MDIO_NLA_PROG] ||
>> + info->attrs[MDIO_NLA_DATA] ||
>> + info->attrs[MDIO_NLA_ERROR])
>> + return -EINVAL;
>> +
>> + xfer.mdio = mdio_find_bus(nla_data(info->attrs[MDIO_NLA_BUS_ID]));
>> + if (!xfer.mdio)
>> + return -ENODEV;
>> +
>> + if (info->attrs[MDIO_NLA_TIMEOUT])
>> + xfer.timeout_ms = nla_get_u32(info->attrs[MDIO_NLA_TIMEOUT]);
>> + else
>> + xfer.timeout_ms = 100;
>> +
>> + xfer.info = info;
>> + xfer.prog_len = nla_len(info->attrs[MDIO_NLA_PROG]) / sizeof(*xfer.prog);
>> + xfer.prog = nla_data(info->attrs[MDIO_NLA_PROG]);
>> +
>> + err = mdio_nl_open(&xfer);
>> + if (err)
>> + return err;
>> +
>> + err = mdio_nl_eval(&xfer);
>> +
>> + err = mdio_nl_close(&xfer, true, err);
>> + return err;
>> +}
>> +
>> +static const struct genl_ops mdio_nl_ops[] = {
>> + {
>> + .cmd = MDIO_GENL_XFER,
>> + .doit = mdio_nl_cmd_xfer,
>> + .flags = GENL_ADMIN_PERM,
>> + },
>> +};
>> +
>> +static struct genl_family mdio_nl_family = {
>> + .name = "mdio",
>> + .version = 1,
>> + .maxattr = MDIO_NLA_MAX,
>> + .netnsok = false,
>> + .module = THIS_MODULE,
>> + .ops = mdio_nl_ops,
>> + .n_ops = ARRAY_SIZE(mdio_nl_ops),
>> + .policy = mdio_nl_policy,
>> +};
>> +
>> +static int __init mdio_nl_init(void)
>> +{
>> + return genl_register_family(&mdio_nl_family);
>> +}
>> +
>> +static void __exit mdio_nl_exit(void)
>> +{
>> + genl_unregister_family(&mdio_nl_family);
>> +}
>> +
>> +MODULE_AUTHOR("Tobias Waldekranz <[email protected]>");
>> +MODULE_DESCRIPTION("MDIO Netlink Interface");
>> +MODULE_LICENSE("GPL");
>> +
>> +module_init(mdio_nl_init);
>> +module_exit(mdio_nl_exit);
>> diff --git a/include/uapi/linux/mdio-netlink.h b/include/uapi/linux/mdio-netlink.h
>> new file mode 100644
>> index 000000000000..bebd3b45c882
>> --- /dev/null
>> +++ b/include/uapi/linux/mdio-netlink.h
>> @@ -0,0 +1,61 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +/*
>> + * Copyright (C) 2020-22 Tobias Waldekranz <[email protected]>
>> + */
>> +
>> +#ifndef _UAPI_LINUX_MDIO_NETLINK_H
>> +#define _UAPI_LINUX_MDIO_NETLINK_H
>> +
>> +#include <linux/types.h>
>> +
>> +enum {
>> + MDIO_GENL_UNSPEC,
>> + MDIO_GENL_XFER,
>> +
>> + __MDIO_GENL_MAX,
>> + MDIO_GENL_MAX = __MDIO_GENL_MAX - 1
>> +};
>> +
>> +enum {
>> + MDIO_NLA_UNSPEC,
>> + MDIO_NLA_BUS_ID, /* string */
>> + MDIO_NLA_TIMEOUT, /* u32 */
>> + MDIO_NLA_PROG, /* struct mdio_nl_insn[] */
>> + MDIO_NLA_DATA, /* nest */
>> + MDIO_NLA_ERROR, /* s32 */
>> +
>> + __MDIO_NLA_MAX,
>> + MDIO_NLA_MAX = __MDIO_NLA_MAX - 1
>> +};
>> +
>> +enum mdio_nl_op {
>> + MDIO_NL_OP_UNSPEC,
>> + MDIO_NL_OP_READ, /* read dev(RI), port(RI), dst(R) */
>> + MDIO_NL_OP_WRITE, /* write dev(RI), port(RI), src(RI) */
>> + MDIO_NL_OP_AND, /* and a(RI), b(RI), dst(R) */
>> + MDIO_NL_OP_OR, /* or a(RI), b(RI), dst(R) */
>> + MDIO_NL_OP_ADD, /* add a(RI), b(RI), dst(R) */
>> + MDIO_NL_OP_JEQ, /* jeq a(RI), b(RI), jmp(I) */
>> + MDIO_NL_OP_JNE, /* jeq a(RI), b(RI), jmp(I) */
>> + MDIO_NL_OP_EMIT, /* emit src(RI) */
>> +
>> + __MDIO_NL_OP_MAX,
>> + MDIO_NL_OP_MAX = __MDIO_NL_OP_MAX - 1
>> +};
>> +
>> +enum mdio_nl_argmode {
>> + MDIO_NL_ARG_NONE,
>> + MDIO_NL_ARG_REG,
>> + MDIO_NL_ARG_IMM,
>> + MDIO_NL_ARG_RESERVED
>> +};
>> +
>> +struct mdio_nl_insn {
>> + __u64 op:8;
>> + __u64 reserved:2;
>> + __u64 arg0:18;
>> + __u64 arg1:18;
>> + __u64 arg2:18;
>> +};
>> +
>> +#endif /* _UAPI_LINUX_MDIO_NETLINK_H */
>> --
>> 2.35.1.1320.gc452695387.dirty
On 3/7/23 08:47, Andrew Lunn wrote:
> On Mon, Mar 06, 2023 at 10:48:48PM +0000, Russell King (Oracle) wrote:
>> On Mon, Mar 06, 2023 at 03:45:16PM -0500, Sean Anderson wrote:
>> > +static int mdio_nl_eval(struct mdio_nl_xfer *xfer)
>> > +{
>> > + struct mdio_nl_insn *insn;
>> > + unsigned long timeout;
>> > + u16 regs[8] = { 0 };
>> > + int pc, ret = 0;
>>
>> So "pc" is signed.
>>
>> > + int phy_id, reg, prtad, devad, val;
>> > +
>> > + timeout = jiffies + msecs_to_jiffies(xfer->timeout_ms);
>> > +
>> > + mutex_lock(&xfer->mdio->mdio_lock);
>> > +
>> > + for (insn = xfer->prog, pc = 0;
>> > + pc < xfer->prog_len;
>>
>> xfer->prog_len is signed, so this is a signed comparison.
>>
>> > + case MDIO_NL_OP_JEQ:
>> > + if (__arg_ri(insn->arg0, regs) ==
>> > + __arg_ri(insn->arg1, regs))
>> > + pc += (s16)__arg_i(insn->arg2);
>>
>> This adds a signed 16-bit integer to pc, which can make pc negative.
>>
>> And so the question becomes... what prevents pc becoming negative
>> and then trying to use a negative number as an index?
>
> I don't know ebpf very well, but would it of caught this? I know the
> aim of this is to be simple, but due to its simplicity, we are loosing
> out on all the inherent safety of eBPF. Is a eBPF interface all that
> complex? I assume you just need to add some way to identify MDIO
> busses and kfunc to perform a read on the bus?
Regarding eBPF over netlink, the last time this was discussed, Tobias said
> - Why not use BPF?
>
> That could absolutely be one way forward, but the GENL approach was
> easy to build out-of-tree to prove the idea. Its not obvious how it
> would work though as BPF programs typically run async on some event
> (probe hit, packet received etc.) whereas this is a single execution
> on behalf of a user. So to what would the program be attached? The
> output path is also not straight forward, but it could be done with
> perf events i suppose.
I'm not familiar enough with eBPF to comment further.
--Sean
> Yes, and I should probably have commented on this in the commit message.
> IMO the things you listed are... iffy but unlikely to cause a
> malfunction.
You consider a missed interrupt not a malfunction?
> >> +
> >> + for (insn = xfer->prog, pc = 0;
> >> + pc < xfer->prog_len;
> >> + insn = &xfer->prog[++pc]) {
> >> + if (time_after(jiffies, timeout)) {
> >> + ret = -ETIMEDOUT;
> >> + break;
> >> + }
> >> +
> >> + switch ((enum mdio_nl_op)insn->op) {
> >> + case MDIO_NL_OP_READ:
> >> + phy_id = __arg_ri(insn->arg0, regs);
> >> + prtad = mdio_phy_id_prtad(phy_id);
> >> + devad = mdio_phy_id_devad(phy_id);
> >> + reg = __arg_ri(insn->arg1, regs);
> >> +
> >> + if (mdio_phy_id_is_c45(phy_id))
> >> + ret = __mdiobus_c45_read(xfer->mdio, prtad,
> >> + devad, reg);
> >> + else
> >> + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
> >
> > The application should say if it want to do C22 or C45.
>
> The phy_id comes from the application. So it sets MDIO_PHY_ID_C45 if it wants
> to use C45.
Ah, i misunderstood what mdio_phy_id_is_c45() does.
Anyway, i don't like MDIO_PHY_ID_C45, it has been pretty much removed
everywhere with the refactoring of MDIO drivers to export read and
read_c45 etc. PHY drivers also don't use it, they use c22 or c45
specific methods. So i would prefer an additional attribute. That also
opens up the possibility of adding C45 over C22.
> As Russell noted, this is dangerous in the general case.
And Russell also agreed this whole module is dangerous in general.
Once you accept it is dangerous, its a debug tool only, why not allow
playing with a bit more fire? You could at least poke around the MDIO
bus structures and see if a PHY has been found, and it not, block C45
over C22.
> >> + if (mdio_phy_id_is_c45(phy_id))
> >> + ret = __mdiobus_c45_write(xfer->mdio, prtad,
> >> + devad, reg, val
> >> + else
> >> + ret = __mdiobus_write(xfer->mdio, dev, reg,
> >> + val);
> >> +#else
> >> + ret = -EPERM;
> >
> > EPERM is odd, EOPNOTSUPP would be better. EPERM suggests you can run
> > it as root and it should work.
>
> Well, EPERM is what you get when trying to write a 444 file, which is
> effectively what we're enforcing here.
Does it change to 644 when write is enabled? But netlink does not even
use file access permissions. I would probably trap this earlier, where
you have a extack instance you can return a meaningful error message
string.
Andrew
On 3/7/23 12:23, Andrew Lunn wrote:
>> Yes, and I should probably have commented on this in the commit message.
>> IMO the things you listed are... iffy but unlikely to cause a
>> malfunction.
>
> You consider a missed interrupt not a malfunction?
Hm, yeah that would probably do it.
>> >> +
>> >> + for (insn = xfer->prog, pc = 0;
>> >> + pc < xfer->prog_len;
>> >> + insn = &xfer->prog[++pc]) {
>> >> + if (time_after(jiffies, timeout)) {
>> >> + ret = -ETIMEDOUT;
>> >> + break;
>> >> + }
>> >> +
>> >> + switch ((enum mdio_nl_op)insn->op) {
>> >> + case MDIO_NL_OP_READ:
>> >> + phy_id = __arg_ri(insn->arg0, regs);
>> >> + prtad = mdio_phy_id_prtad(phy_id);
>> >> + devad = mdio_phy_id_devad(phy_id);
>> >> + reg = __arg_ri(insn->arg1, regs);
>> >> +
>> >> + if (mdio_phy_id_is_c45(phy_id))
>> >> + ret = __mdiobus_c45_read(xfer->mdio, prtad,
>> >> + devad, reg);
>> >> + else
>> >> + ret = __mdiobus_read(xfer->mdio, phy_id, reg);
>> >
>> > The application should say if it want to do C22 or C45.
>>
>> The phy_id comes from the application. So it sets MDIO_PHY_ID_C45 if it wants
>> to use C45.
>
> Ah, i misunderstood what mdio_phy_id_is_c45() does.
>
> Anyway, i don't like MDIO_PHY_ID_C45, it has been pretty much removed
> everywhere with the refactoring of MDIO drivers to export read and
> read_c45 etc. PHY drivers also don't use it, they use c22 or c45
> specific methods. So i would prefer an additional attribute. That also
> opens up the possibility of adding C45 over C22.
Well, this is really just because there is an existing way to specify c22
and c45 addresses in a u16. We could definitely add a "please do C45 over
C22" flag. That said, I think that sort of thing is handled better by
allowing writes in the general case.
>> As Russell noted, this is dangerous in the general case.
>
> And Russell also agreed this whole module is dangerous in general.
> Once you accept it is dangerous, its a debug tool only, why not allow
> playing with a bit more fire? You could at least poke around the MDIO
> bus structures and see if a PHY has been found, and it not, block C45
> over C22.
I can look into that.
>> >> + if (mdio_phy_id_is_c45(phy_id))
>> >> + ret = __mdiobus_c45_write(xfer->mdio, prtad,
>> >> + devad, reg, val
>> >> + else
>> >> + ret = __mdiobus_write(xfer->mdio, dev, reg,
>> >> + val);
>> >> +#else
>> >> + ret = -EPERM;
>> >
>> > EPERM is odd, EOPNOTSUPP would be better. EPERM suggests you can run
>> > it as root and it should work.
>>
>> Well, EPERM is what you get when trying to write a 444 file, which is
>> effectively what we're enforcing here.
>
> Does it change to 644 when write is enabled?
Yes. But it is more like 400 and 600.
> But netlink does not even use file access permissions.
Is EPERM reserved only for files?
> I would probably trap this earlier, where you have a extack instance
> you can return a meaningful error message string.
That sounds good.
--Sean