This is the first part of a netlink based alternative userspace interface
for ethtool. It aims to address some long known issues with the ioctl
interface, mainly lack of extensibility, raciness, limited error reporting
and absence of notifications. The goal is to allow the userspace ethtool
utility to provide all features it currently does but without using the
ioctl interface. However, some features provided by the ethtool ioctl API
will be available through other netlink interfaces (rtnetlink, devlink)
where that is more appropriate.
The interface uses generic netlink family "ethtool" and provides multicast
group "monitor" which is used for notifications. Documentation for the
interface is in Documentation/networking/ethtool-netlink.txt file. The
netlink interface is optional; it is built when the CONFIG_ETHTOOL_NETLINK
(bool) option is enabled.
There are three types of request messages distinguished by suffix "_GET"
(query for information), "_SET" (modify parameters) and "_ACT" (perform an
action). Kernel reply messages have name with additional suffix "_REPLY"
(e.g. ETHTOOL_MSG_SETTINGS_GET_REPLY). Most "_SET" and "_ACT" message types
do not have matching reply type as only some of them need additional reply
data beyond numeric error code and extack.
Basic concepts:
- make extensions easier not only by allowing new attributes but also by
imposing as few artificial limits as possible, e.g. by using arbitrary
size bit sets for most bitmap attributes or by not using fixed size
strings
- use extack for error reporting and warnings
- send netlink notifications on changes (even if they were done using the
ioctl interface) and actions
- avoid the racy read/modify/write cycle between kernel and userspace by
sending only attributes which userspace wants to change; there is still
a read/modify/write cycle between generic kernel code and ethtool_ops
handler in NIC driver but it is only in kernel and under a lock
- reduce the number of name lists that need to be kept in sync between
kernel and userspace (e.g. recognized link modes)
- where feasible, allow dump requests to query specific information for all
network devices
- as the lack of extensibility of the ioctl interface led to having too
many commands, group some of them together to one netlink message but
allow querying only part(s) of the information (using "info mask" bitmap)
and modifying only some of the parameters (by providing only some
attributes)
- as parsing and generating netlink messages is more complicated than
simply copying data structures between userspace API and ethtool_ops
handlers (which most ioctl commands do), split the code into multiple
files in net/ethtool directory; move net/core/ethtool.c also to this
directory and rename it to ioctl.c
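The "send only the attributes userspace wants to change" concept from the
list above can be illustrated in plain userspace C. This is only a sketch
under stated assumptions, not code from the series: struct link_settings,
struct settings_request and apply_request() are hypothetical names, and
a NULL member stands in for an attribute that is absent from the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state; not a kernel structure. */
struct link_settings {
	uint32_t speed;
	uint8_t duplex;
	uint8_t autoneg;
};

/* Hypothetical request; a NULL member models an attribute that was not sent. */
struct settings_request {
	const uint32_t *speed;
	const uint8_t *duplex;
	const uint8_t *autoneg;
};

/*
 * Apply only the members present in the request and leave the rest alone.
 * Returns true if any value changed, i.e. when a notification would be
 * worth sending.
 */
static bool apply_request(struct link_settings *cur,
			  const struct settings_request *req)
{
	bool mod = false;

	assert(cur && req);

	if (req->speed && cur->speed != *req->speed) {
		cur->speed = *req->speed;
		mod = true;
	}
	if (req->duplex && cur->duplex != *req->duplex) {
		cur->duplex = *req->duplex;
		mod = true;
	}
	if (req->autoneg && cur->autoneg != *req->autoneg) {
		cur->autoneg = *req->autoneg;
		mod = true;
	}
	return mod;
}
```

A request carrying only the speed leaves duplex and autonegotiation
untouched, so userspace never has to read the current values first.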
The full (work in progress) series, together with the (userspace) ethtool
counterpart can be found at https://github.com/mkubecek/ethnl
Main changes between v5 and v6:
- use ETHTOOL_MSG_ prefix for message types
- replace ETHA_ prefix for netlink attributes by ETHTOOL_A_
- replace ETH_x_IM_y for infomask bits by ETHTOOL_IM_x_y
- split GET reply types from SET requests and notifications
- split kernel and userspace message types into different enums
- remove INFO_GET requests from submitted part
- drop EVENT notifications (use rtnetlink and on-demand string set load)
- reorganize patches to reduce the number of intermittent warnings
- unify request/reply header and its processing
- another nest around strings in a string set for consistency
- more consistent identifier naming
- coding style cleanup
- get rid of some of the helpers
- set bad attribute in extack where applicable
- various bug fixes
- improve documentation and code comments, more kerneldoc comments
- more verbose commit messages
Changes between v4 and v5:
- do not panic on failed initialization, only WARN()
Main changes between RFC v3 and v4:
- use more kerneldoc style comments
- strict attribute policy checking
- use macros for tables of link mode names and parameters
- provide permanent hardware address in rtnetlink
- coding style cleanup
- split too long patches, reorder
- wrap more ETHA_SETTINGS_* attributes in nests
- add also some SET_* implementation into submitted part
Main changes between RFC v2 and RFC v3:
- do not allow building as a module (no netdev notifiers needed)
- drop some obsolete fields
- add permanent hw address, timestamping and private flags support
- rework bitset handling to get rid of variable length arrays
- notify monitor on device renames
- restructure GET_SETTINGS/SET_SETTINGS messages
- split too long patches and submit only first part of the series
Main changes between RFC v1 and RFC v2:
- support dumps for all "get" requests
- provide notifications for changes related to supported request types
- support getting string sets (both global and per device)
- support getting/setting device features
- get rid of family specific header, everything passed as attributes
- split netlink code into multiple files in net/ethtool/ directory
Michal Kubecek (15):
rtnetlink: provide permanent hardware address in RTM_NEWLINK
netlink: rename nl80211_validate_nested() to nla_validate_nested()
ethtool: move to its own directory
ethtool: introduce ethtool netlink interface
ethtool: helper functions for netlink interface
ethtool: netlink bitset handling
ethtool: support for netlink notifications
ethtool: move string arrays into common file
ethtool: generic handlers for GET requests
ethtool: provide string sets with STRSET_GET request
ethtool: provide link mode names as a string set
ethtool: provide link settings and link modes in SETTINGS_GET request
ethtool: add standard notification handler
ethtool: set link settings and link modes with SETTINGS_SET request
ethtool: provide link state in SETTINGS_GET request
Documentation/networking/ethtool-netlink.txt | 399 ++++++++++
include/linux/ethtool.h | 4 +
include/linux/ethtool_netlink.h | 17 +
include/linux/netdevice.h | 12 +
include/net/netlink.h | 8 +-
include/uapi/linux/ethtool.h | 4 +
include/uapi/linux/ethtool_netlink.h | 219 ++++++
include/uapi/linux/if_link.h | 1 +
net/Kconfig | 8 +
net/Makefile | 2 +-
net/core/Makefile | 2 +-
net/core/rtnetlink.c | 5 +
net/ethtool/Makefile | 7 +
net/ethtool/bitset.c | 606 +++++++++++++++
net/ethtool/bitset.h | 40 +
net/ethtool/common.c | 140 ++++
net/ethtool/common.h | 24 +
net/{core/ethtool.c => ethtool/ioctl.c} | 157 +---
net/ethtool/netlink.c | 762 +++++++++++++++++++
net/ethtool/netlink.h | 302 ++++++++
net/ethtool/settings.c | 628 +++++++++++++++
net/ethtool/strset.c | 459 +++++++++++
net/wireless/nl80211.c | 3 +-
23 files changed, 3666 insertions(+), 143 deletions(-)
create mode 100644 Documentation/networking/ethtool-netlink.txt
create mode 100644 include/linux/ethtool_netlink.h
create mode 100644 include/uapi/linux/ethtool_netlink.h
create mode 100644 net/ethtool/Makefile
create mode 100644 net/ethtool/bitset.c
create mode 100644 net/ethtool/bitset.h
create mode 100644 net/ethtool/common.c
create mode 100644 net/ethtool/common.h
rename net/{core/ethtool.c => ethtool/ioctl.c} (93%)
create mode 100644 net/ethtool/netlink.c
create mode 100644 net/ethtool/netlink.h
create mode 100644 net/ethtool/settings.c
create mode 100644 net/ethtool/strset.c
--
2.22.0
Function nl80211_validate_nested() is not specific to nl80211; it is
a counterpart to nla_validate_nested_deprecated() with strict validation.
For consistency with other validation and parse functions, rename it to
nla_validate_nested().
Signed-off-by: Michal Kubecek <[email protected]>
---
include/net/netlink.h | 8 ++++----
net/wireless/nl80211.c | 3 +--
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/include/net/netlink.h b/include/net/netlink.h
index e4650e5b64a1..edb36bf29261 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1736,7 +1736,7 @@ static inline void nla_nest_cancel(struct sk_buff *skb, struct nlattr *start)
}
/**
- * nla_validate_nested - Validate a stream of nested attributes
+ * __nla_validate_nested - Validate a stream of nested attributes
* @start: container attribute
* @maxtype: maximum attribute type to be expected
* @policy: validation policy
@@ -1759,9 +1759,9 @@ static inline int __nla_validate_nested(const struct nlattr *start, int maxtype,
}
static inline int
-nl80211_validate_nested(const struct nlattr *start, int maxtype,
- const struct nla_policy *policy,
- struct netlink_ext_ack *extack)
+nla_validate_nested(const struct nlattr *start, int maxtype,
+ const struct nla_policy *policy,
+ struct netlink_ext_ack *extack)
{
return __nla_validate_nested(start, maxtype, policy,
NL_VALIDATE_STRICT, extack);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index fc83dd179c1a..ac371d40530d 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -12707,8 +12707,7 @@ static int nl80211_vendor_check_policy(const struct wiphy_vendor_command *vcmd,
return -EINVAL;
}
- return nl80211_validate_nested(attr, vcmd->maxattr, vcmd->policy,
- extack);
+ return nla_validate_nested(attr, vcmd->maxattr, vcmd->policy, extack);
}
static int nl80211_vendor_cmd(struct sk_buff *skb, struct genl_info *info)
--
2.22.0
The ethtool netlink interface is going to be split into multiple files, so
it will be more convenient to put all of them in a separate directory
net/ethtool. Start by moving the current ethtool.c with the ioctl interface
into this directory and renaming it to ioctl.c.
Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
---
net/Makefile | 2 +-
net/core/Makefile | 2 +-
net/ethtool/Makefile | 3 +++
net/{core/ethtool.c => ethtool/ioctl.c} | 0
4 files changed, 5 insertions(+), 2 deletions(-)
create mode 100644 net/ethtool/Makefile
rename net/{core/ethtool.c => ethtool/ioctl.c} (100%)
diff --git a/net/Makefile b/net/Makefile
index 449fc0b221f8..848303d98d3d 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_NET) += $(tmp-y)
# LLC has to be linked before the files in net/802/
obj-$(CONFIG_LLC) += llc/
-obj-$(CONFIG_NET) += ethernet/ 802/ sched/ netlink/ bpf/
+obj-$(CONFIG_NET) += ethernet/ 802/ sched/ netlink/ bpf/ ethtool/
obj-$(CONFIG_NETFILTER) += netfilter/
obj-$(CONFIG_INET) += ipv4/
obj-$(CONFIG_TLS) += tls/
diff --git a/net/core/Makefile b/net/core/Makefile
index a104dc8faafc..3e2c378e5f31 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -8,7 +8,7 @@ obj-y := sock.o request_sock.o skbuff.o datagram.o stream.o scm.o \
obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
-obj-y += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
+obj-y += dev.o dev_addr_lists.o dst.o netevent.o \
neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
fib_notifier.o xdp.o flow_offload.o
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
new file mode 100644
index 000000000000..3ebfab2bca66
--- /dev/null
+++ b/net/ethtool/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-y += ioctl.o
diff --git a/net/core/ethtool.c b/net/ethtool/ioctl.c
similarity index 100%
rename from net/core/ethtool.c
rename to net/ethtool/ioctl.c
--
2.22.0
Add basic genetlink and init infrastructure for the netlink interface and
register genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig
option to make the build optional. Add initial overall interface description
into Documentation/networking/ethtool-netlink.txt; further patches will add
more detailed information.
Signed-off-by: Michal Kubecek <[email protected]>
---
Documentation/networking/ethtool-netlink.txt | 208 +++++++++++++++++++
include/linux/ethtool_netlink.h | 9 +
include/uapi/linux/ethtool_netlink.h | 36 ++++
net/Kconfig | 8 +
net/ethtool/Makefile | 6 +-
net/ethtool/netlink.c | 33 +++
net/ethtool/netlink.h | 10 +
7 files changed, 309 insertions(+), 1 deletion(-)
create mode 100644 Documentation/networking/ethtool-netlink.txt
create mode 100644 include/linux/ethtool_netlink.h
create mode 100644 include/uapi/linux/ethtool_netlink.h
create mode 100644 net/ethtool/netlink.c
create mode 100644 net/ethtool/netlink.h
diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
new file mode 100644
index 000000000000..97c369aa290b
--- /dev/null
+++ b/Documentation/networking/ethtool-netlink.txt
@@ -0,0 +1,208 @@
+ Netlink interface for ethtool
+ =============================
+
+
+Basic information
+-----------------
+
+Netlink interface for ethtool uses generic netlink family "ethtool" (userspace
+application should use macros ETHTOOL_GENL_NAME and ETHTOOL_GENL_VERSION
+defined in <linux/ethtool_netlink.h> uapi header). This family does not use
+a specific header, all information in requests and replies is passed using
+netlink attributes.
+
+The ethtool netlink interface uses extended ACK for error and warning
+reporting, userspace application developers are encouraged to make these
+messages available to user in a suitable way.
+
+Requests can be divided into three categories: "get" (retrieving information),
+"set" (setting parameters) and "action" (invoking an action).
+
+All "set" and "action" type requests require admin privileges (CAP_NET_ADMIN
+in the namespace). Most "get" type requests are allowed for anyone but there
+are exceptions (where the response contains sensitive information). In some
+cases, the request as such is allowed for anyone but unprivileged users have
+attributes with sensitive information (e.g. wake-on-lan password) omitted.
+
+
+Conventions
+-----------
+
+Attributes which represent a boolean value usually use u8 type so that we can
+distinguish three states: "on", "off" and "not present" (meaning the
+information is not available in "get" requests or value is not to be changed
+in "set" requests). For these attributes, the "true" value should be passed as
+number 1 but any non-zero value should be understood as "true" by recipient.
+
+In the message structure descriptions below, if an attribute name is suffixed
+with "+", parent nest can contain multiple attributes of the same type. This
+implements an array of entries.
+
+
+Request header
+--------------
+
+Each request or reply message contains a nested attribute with common header.
+Structure of this header is
+
+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
+
+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
+message relates to. One of them is sufficient in requests, if both are used,
+they must identify the same device. Some requests, e.g. global string sets, do
+not require device identification. Most GET requests also allow dump requests
+without device identification to query the same information for all devices
+providing it (each device in a separate message).
+
+Optional info mask allows to ask only for a part of data provided by GET
+request types. If omitted or zero, all data is returned. The two flag bitmaps
+allow enabling request options; ETHTOOL_A_HEADER_GFLAGS are global flags common
+for all request types, flags recognized in ETHTOOL_A_HEADER_RFLAGS and their
+interpretation are specific for each request type. Global flags are
+
+ ETHTOOL_RF_COMPACT use compact format bitsets in reply
+ ETHTOOL_RF_REPLY send optional reply (SET and ACT requests)
+
+Request specific flags are described with each request type. For both flag
+attributes, new flags should follow the general idea that if the flag is not
+set, the behaviour is the same as (or closer to) the behaviour before it was
+introduced.
+
+
+List of message types
+---------------------
+
+All constants identifying message types use ETHTOOL_MSG_ prefix and suffix
+according to message purpose:
+
+ _GET userspace request to retrieve data
+ _SET userspace request to set data
+ _ACT userspace request to perform an action
+ _GET_REPLY kernel reply to a GET request
+ _SET_REPLY kernel reply to a SET request
+ _ACT_REPLY kernel reply to an ACT request
+ _NTF kernel notification
+
+"GET" requests are sent by userspace applications to retrieve device
+information. They usually do not contain any message specific attributes.
+Kernel replies with corresponding "GET_REPLY" message. For most types, "GET"
+request with NLM_F_DUMP and no device identification can be used to query the
+information for all devices supporting the request.
+
+If the data can be also modified, corresponding "SET" message with the same
+layout as "GET" reply is used to request changes. Only attributes where
+a change is requested are included in such request (also, not all attributes
+may be changed). Replies to most "SET" requests consist only of error code and
+extack; if kernel provides additional data, it is sent in the form of
+corresponding "SET_REPLY" message (if ETHTOOL_RF_REPLY flag was set in request
+header).
+
+Data modification also triggers sending a "NTF" message with a notification.
+These usually bear only a subset of attributes which was affected by the
+change. The same notification is issued if the data is modified using other
+means (mostly ioctl ethtool interface). Unlike notifications from ethtool
+netlink code which are only sent if something actually changed, notifications
+triggered by ioctl interface may be sent even if the request did not actually
+change any data.
+
+"ACT" messages request kernel (driver) to perform a specific action. If some
+information is reported by kernel (as requested by ETHTOOL_RF_REPLY flag in
+request header), the reply takes form of an "ACT_REPLY" message. Performing an
+action also triggers a notification ("NTF" message).
+
+Later sections describe the format and semantics of these messages.
+
+
+Request translation
+-------------------
+
+The following table maps ioctl commands to netlink commands providing their
+functionality. Entries with "n/a" in right column are commands which do not
+have their netlink replacement yet.
+
+ioctl command netlink command
+---------------------------------------------------------------------
+ETHTOOL_GSET n/a
+ETHTOOL_SSET n/a
+ETHTOOL_GDRVINFO n/a
+ETHTOOL_GREGS n/a
+ETHTOOL_GWOL n/a
+ETHTOOL_SWOL n/a
+ETHTOOL_GMSGLVL n/a
+ETHTOOL_SMSGLVL n/a
+ETHTOOL_NWAY_RST n/a
+ETHTOOL_GLINK n/a
+ETHTOOL_GEEPROM n/a
+ETHTOOL_SEEPROM n/a
+ETHTOOL_GCOALESCE n/a
+ETHTOOL_SCOALESCE n/a
+ETHTOOL_GRINGPARAM n/a
+ETHTOOL_SRINGPARAM n/a
+ETHTOOL_GPAUSEPARAM n/a
+ETHTOOL_SPAUSEPARAM n/a
+ETHTOOL_GRXCSUM n/a
+ETHTOOL_SRXCSUM n/a
+ETHTOOL_GTXCSUM n/a
+ETHTOOL_STXCSUM n/a
+ETHTOOL_GSG n/a
+ETHTOOL_SSG n/a
+ETHTOOL_TEST n/a
+ETHTOOL_GSTRINGS n/a
+ETHTOOL_PHYS_ID n/a
+ETHTOOL_GSTATS n/a
+ETHTOOL_GTSO n/a
+ETHTOOL_STSO n/a
+ETHTOOL_GPERMADDR rtnetlink RTM_GETLINK
+ETHTOOL_GUFO n/a
+ETHTOOL_SUFO n/a
+ETHTOOL_GGSO n/a
+ETHTOOL_SGSO n/a
+ETHTOOL_GFLAGS n/a
+ETHTOOL_SFLAGS n/a
+ETHTOOL_GPFLAGS n/a
+ETHTOOL_SPFLAGS n/a
+ETHTOOL_GRXFH n/a
+ETHTOOL_SRXFH n/a
+ETHTOOL_GGRO n/a
+ETHTOOL_SGRO n/a
+ETHTOOL_GRXRINGS n/a
+ETHTOOL_GRXCLSRLCNT n/a
+ETHTOOL_GRXCLSRULE n/a
+ETHTOOL_GRXCLSRLALL n/a
+ETHTOOL_SRXCLSRLDEL n/a
+ETHTOOL_SRXCLSRLINS n/a
+ETHTOOL_FLASHDEV n/a
+ETHTOOL_RESET n/a
+ETHTOOL_SRXNTUPLE n/a
+ETHTOOL_GRXNTUPLE n/a
+ETHTOOL_GSSET_INFO n/a
+ETHTOOL_GRXFHINDIR n/a
+ETHTOOL_SRXFHINDIR n/a
+ETHTOOL_GFEATURES n/a
+ETHTOOL_SFEATURES n/a
+ETHTOOL_GCHANNELS n/a
+ETHTOOL_SCHANNELS n/a
+ETHTOOL_SET_DUMP n/a
+ETHTOOL_GET_DUMP_FLAG n/a
+ETHTOOL_GET_DUMP_DATA n/a
+ETHTOOL_GET_TS_INFO n/a
+ETHTOOL_GMODULEINFO n/a
+ETHTOOL_GMODULEEEPROM n/a
+ETHTOOL_GEEE n/a
+ETHTOOL_SEEE n/a
+ETHTOOL_GRSSH n/a
+ETHTOOL_SRSSH n/a
+ETHTOOL_GTUNABLE n/a
+ETHTOOL_STUNABLE n/a
+ETHTOOL_GPHYSTATS n/a
+ETHTOOL_PERQUEUE n/a
+ETHTOOL_GLINKSETTINGS n/a
+ETHTOOL_SLINKSETTINGS n/a
+ETHTOOL_PHY_GTUNABLE n/a
+ETHTOOL_PHY_STUNABLE n/a
+ETHTOOL_GFECPARAM n/a
+ETHTOOL_SFECPARAM n/a
diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
new file mode 100644
index 000000000000..0412adb4f42f
--- /dev/null
+++ b/include/linux/ethtool_netlink.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _LINUX_ETHTOOL_NETLINK_H_
+#define _LINUX_ETHTOOL_NETLINK_H_
+
+#include <uapi/linux/ethtool_netlink.h>
+#include <linux/ethtool.h>
+
+#endif /* _LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
new file mode 100644
index 000000000000..9a0fbd4f85d9
--- /dev/null
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * include/uapi/linux/ethtool_netlink.h - netlink interface for ethtool
+ *
+ * See Documentation/networking/ethtool-netlink.txt in kernel source tree for
+ * documentation of the interface.
+ */
+
+#ifndef _UAPI_LINUX_ETHTOOL_NETLINK_H_
+#define _UAPI_LINUX_ETHTOOL_NETLINK_H_
+
+#include <linux/ethtool.h>
+
+/* message types - userspace to kernel */
+enum {
+ ETHTOOL_MSG_USER_NONE,
+
+ /* add new constants above here */
+ __ETHTOOL_MSG_USER_CNT,
+ ETHTOOL_MSG_USER_MAX = (__ETHTOOL_MSG_USER_CNT - 1)
+};
+
+/* message types - kernel to userspace */
+enum {
+ ETHTOOL_MSG_KERNEL_NONE,
+
+ /* add new constants above here */
+ __ETHTOOL_MSG_KERNEL_CNT,
+ ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
+};
+
+/* generic netlink info */
+#define ETHTOOL_GENL_NAME "ethtool"
+#define ETHTOOL_GENL_VERSION 1
+
+#endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/net/Kconfig b/net/Kconfig
index 57f51a279ad6..65b760d26eec 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -447,6 +447,14 @@ config FAILOVER
migration of VMs with direct attached VFs by failing over to the
paravirtual datapath when the VF is unplugged.
+config ETHTOOL_NETLINK
+ bool "Netlink interface for ethtool"
+ default y
+ help
+ An alternative userspace interface for ethtool based on generic
+ netlink. It provides better extensibility and some new features,
+ e.g. notification messages.
+
endif # if NET
# Used by archs to tell that they support BPF JIT compiler plus which flavour.
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 3ebfab2bca66..f30e0da88be5 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -1,3 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y += ioctl.o
+obj-y += ioctl.o
+
+obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
+
+ethtool_nl-y := netlink.o
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
new file mode 100644
index 000000000000..3c98b41f04e5
--- /dev/null
+++ b/net/ethtool/netlink.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+
+#include <linux/ethtool_netlink.h>
+#include "netlink.h"
+
+/* genetlink setup */
+
+static const struct genl_ops ethtool_genl_ops[] = {
+};
+
+static struct genl_family ethtool_genl_family = {
+ .name = ETHTOOL_GENL_NAME,
+ .version = ETHTOOL_GENL_VERSION,
+ .netnsok = true,
+ .parallel_ops = true,
+ .ops = ethtool_genl_ops,
+ .n_ops = ARRAY_SIZE(ethtool_genl_ops),
+};
+
+/* module setup */
+
+static int __init ethnl_init(void)
+{
+ int ret;
+
+ ret = genl_register_family(&ethtool_genl_family);
+ if (WARN(ret < 0, "ethtool: genetlink family registration failed"))
+ return ret;
+
+ return 0;
+}
+
+subsys_initcall(ethnl_init);
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
new file mode 100644
index 000000000000..257ae55ccc82
--- /dev/null
+++ b/net/ethtool/netlink.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _NET_ETHTOOL_NETLINK_H
+#define _NET_ETHTOOL_NETLINK_H
+
+#include <linux/ethtool_netlink.h>
+#include <linux/netdevice.h>
+#include <net/genetlink.h>
+
+#endif /* _NET_ETHTOOL_NETLINK_H */
--
2.22.0
Add common request/reply header definition and helpers to parse request
header and fill reply header. Provide ethnl_update_* helpers to update
structure members from request attributes (to be used for *_SET requests).
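The semantics of the ethnl_update_* helpers can be modeled outside the
kernel. The following is a hedged userspace sketch, not the code added by
this patch: struct bitfield32, update_u32() and update_bitfield32() are
hypothetical stand-ins, a NULL pointer plays the role of an absent
attribute, and struct bitfield32 mirrors the value/selector pair of
struct nla_bitfield32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the value/selector pair in struct nla_bitfield32. */
struct bitfield32 {
	uint32_t value;		/* new values for the selected bits */
	uint32_t selector;	/* which bits the request wants to touch */
};

/* Set *dst from *attr if the attribute is present; report whether it changed. */
static bool update_u32(uint32_t *dst, const uint32_t *attr)
{
	assert(dst);

	if (!attr || *dst == *attr)
		return false;

	*dst = *attr;
	return true;
}

/* Rewrite only the selected bits of *dst; report whether anything changed. */
static bool update_bitfield32(uint32_t *dst, const struct bitfield32 *change)
{
	uint32_t newval;

	assert(dst);

	if (!change)
		return false;
	newval = (*dst & ~change->selector) | (change->value & change->selector);
	if (*dst == newval)
		return false;

	*dst = newval;
	return true;
}
```

With flags 0xA, value 0x5 and selector 0x3, only the two low bits are
rewritten and the result is 0x9; repeating the same request reports no
change, which is what lets a SET handler skip the notification.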
Signed-off-by: Michal Kubecek <[email protected]>
---
include/uapi/linux/ethtool_netlink.h | 23 ++++
net/ethtool/netlink.c | 173 +++++++++++++++++++++++++++
net/ethtool/netlink.h | 145 ++++++++++++++++++++++
3 files changed, 341 insertions(+)
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 9a0fbd4f85d9..ffd7db0848ef 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -29,6 +29,29 @@ enum {
ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
};
+/* request header */
+
+/* use compact bitsets in reply */
+#define ETHTOOL_RF_COMPACT (1 << 0)
+/* provide optional reply for SET or ACT requests */
+#define ETHTOOL_RF_REPLY (1 << 1)
+
+#define ETHTOOL_RF_ALL (ETHTOOL_RF_COMPACT | \
+ ETHTOOL_RF_REPLY)
+
+enum {
+ ETHTOOL_A_HEADER_UNSPEC,
+ ETHTOOL_A_HEADER_DEV_INDEX, /* u32 */
+ ETHTOOL_A_HEADER_DEV_NAME, /* string */
+ ETHTOOL_A_HEADER_INFOMASK, /* u32 */
+ ETHTOOL_A_HEADER_GFLAGS, /* u32 - ETHTOOL_RF_* */
+ ETHTOOL_A_HEADER_RFLAGS, /* u32 */
+
+ /* add new constants above here */
+ __ETHTOOL_A_HEADER_CNT,
+ ETHTOOL_A_HEADER_MAX = (__ETHTOOL_A_HEADER_CNT - 1)
+};
+
/* generic netlink info */
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 3c98b41f04e5..e13f29bbd625 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -1,8 +1,181 @@
// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+#include <net/sock.h>
#include <linux/ethtool_netlink.h>
#include "netlink.h"
+static struct genl_family ethtool_genl_family;
+
+static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
+ [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
+ [ETHTOOL_A_HEADER_DEV_NAME] = { .type = NLA_NUL_STRING,
+ .len = IFNAMSIZ - 1 },
+ [ETHTOOL_A_HEADER_INFOMASK] = { .type = NLA_U32 },
+ [ETHTOOL_A_HEADER_GFLAGS] = { .type = NLA_U32 },
+ [ETHTOOL_A_HEADER_RFLAGS] = { .type = NLA_U32 },
+};
+
+/**
+ * ethnl_parse_header() - parse request header
+ * @req_info: structure to put results into
+ * @nest: nest attribute with request header
+ * @net: request netns
+ * @extack: netlink extack for error reporting
+ * @policy: netlink attribute policy to validate header; use
+ * @dflt_header_policy (all attributes allowed) if null
+ * @require_dev: fail if no device identified in header
+ *
+ * Parse request header in nested attribute @nest and put results into
+ * the structure pointed to by @req_info. @extack is used for error
+ * reporting. If req_info->dev is not null on return, reference to it has
+ * been taken. If error is returned, *req_info is null initialized and no
+ * reference is held.
+ *
+ * Return: 0 on success or negative error code
+ */
+int ethnl_parse_header(struct ethnl_req_info *req_info,
+ const struct nlattr *nest, struct net *net,
+ struct netlink_ext_ack *extack,
+ const struct nla_policy *policy, bool require_dev)
+{
+ struct nlattr *tb[ETHTOOL_A_HEADER_MAX + 1];
+ const struct nlattr *devname_attr;
+ struct net_device *dev = NULL;
+ int ret;
+
+ if (!nest) {
+ NL_SET_ERR_MSG(extack, "request header missing");
+ return -EINVAL;
+ }
+ ret = nla_parse_nested(tb, ETHTOOL_A_HEADER_MAX, nest,
+ policy ?: dflt_header_policy, extack);
+ if (ret < 0)
+ return ret;
+ devname_attr = tb[ETHTOOL_A_HEADER_DEV_NAME];
+
+ if (tb[ETHTOOL_A_HEADER_DEV_INDEX]) {
+ u32 ifindex = nla_get_u32(tb[ETHTOOL_A_HEADER_DEV_INDEX]);
+
+ dev = dev_get_by_index(net, ifindex);
+ if (!dev) {
+ NL_SET_ERR_MSG_ATTR(extack,
+ tb[ETHTOOL_A_HEADER_DEV_INDEX],
+ "no device matches ifindex");
+ return -ENODEV;
+ }
+ /* if both ifindex and ifname are passed, they must match */
+ if (devname_attr &&
+ strncmp(dev->name, nla_data(devname_attr), IFNAMSIZ)) {
+ dev_put(dev);
+ NL_SET_ERR_MSG_ATTR(extack, nest,
+ "ifindex and name do not match");
+ return -ENODEV;
+ }
+ } else if (devname_attr) {
+ dev = dev_get_by_name(net, nla_data(devname_attr));
+ if (!dev) {
+ NL_SET_ERR_MSG_ATTR(extack, devname_attr,
+ "no device matches name");
+ return -ENODEV;
+ }
+ } else if (require_dev) {
+ NL_SET_ERR_MSG_ATTR(extack, nest,
+ "neither ifindex nor name specified");
+ return -EINVAL;
+ }
+
+ if (dev && !netif_device_present(dev)) {
+ dev_put(dev);
+ NL_SET_ERR_MSG(extack, "device not present");
+ return -ENODEV;
+ }
+
+ req_info->dev = dev;
+ ethnl_update_u32(&req_info->req_mask, tb[ETHTOOL_A_HEADER_INFOMASK]);
+ ethnl_update_u32(&req_info->global_flags, tb[ETHTOOL_A_HEADER_GFLAGS]);
+ ethnl_update_u32(&req_info->req_flags, tb[ETHTOOL_A_HEADER_RFLAGS]);
+
+ return 0;
+}
+
+/**
+ * ethnl_fill_reply_header() - Put standard header into a reply message
+ * @skb: skb with the message
+ * @dev: network device to describe in header
+ * @attrtype: attribute type to use for the nest
+ *
+ * Create a nested attribute with attributes describing given network device.
+ * Clean up on error.
+ *
+ * Return: 0 on success, error value (-EMSGSIZE only) on error
+ */
+int ethnl_fill_reply_header(struct sk_buff *skb, struct net_device *dev,
+ u16 attrtype)
+{
+ struct nlattr *nest;
+
+ if (!dev)
+ return 0;
+ nest = nla_nest_start(skb, attrtype);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, ETHTOOL_A_HEADER_DEV_INDEX, (u32)dev->ifindex) ||
+ nla_put_string(skb, ETHTOOL_A_HEADER_DEV_NAME, dev->name))
+ goto nla_put_failure;
+ /* If more attributes are put into reply header, ethnl_header_size()
+ * must be updated to account for them.
+ */
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
+/**
+ * ethnl_reply_init() - Create skb for a reply and fill device identification
+ * @payload: payload length (without netlink and genetlink header)
+ * @dev: device the reply is about (may be null)
+ * @cmd: ETHTOOL_MSG_* message type for reply
+ * @info: genetlink info of the received packet we respond to
+ * @ehdrp: place to store payload pointer returned by genlmsg_new()
+ *
+ * Return: pointer to allocated skb on success, NULL on error
+ */
+struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
+ u16 hdr_attrtype, struct genl_info *info,
+ void **ehdrp)
+{
+ struct sk_buff *skb;
+
+ skb = genlmsg_new(payload, GFP_KERNEL);
+ if (!skb)
+ goto err;
+ *ehdrp = genlmsg_put_reply(skb, info, &ethtool_genl_family, 0, cmd);
+ if (!*ehdrp)
+ goto err_free;
+
+ if (dev) {
+ int ret;
+
+ ret = ethnl_fill_reply_header(skb, dev, hdr_attrtype);
+ if (ret < 0)
+ goto err_free;
+ }
+ return skb;
+
+err_free:
+ nlmsg_free(skb);
+ if (info)
+ GENL_SET_ERR_MSG(info, "failed to setup reply message");
+err:
+ return NULL;
+}
+
/* genetlink setup */
static const struct genl_ops ethtool_genl_ops[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 257ae55ccc82..5510eb7054b3 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -6,5 +6,150 @@
#include <linux/ethtool_netlink.h>
#include <linux/netdevice.h>
#include <net/genetlink.h>
+#include <net/sock.h>
+
+struct ethnl_req_info;
+
+int ethnl_parse_header(struct ethnl_req_info *req_info,
+ const struct nlattr *nest, struct net *net,
+ struct netlink_ext_ack *extack,
+ const struct nla_policy *policy, bool require_dev);
+int ethnl_fill_reply_header(struct sk_buff *skb, struct net_device *dev,
+ u16 attrtype);
+struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
+ u16 hdr_attrtype, struct genl_info *info,
+ void **ehdrp);
+
+static inline int ethnl_str_size(const char *s)
+{
+ return nla_total_size(strlen(s) + 1);
+}
+
+/* The ethnl_update_* helpers set value pointed to by @dst to the value of
+ * netlink attribute @attr (if attr is not null). They return true if *dst
+ * value was changed, false if not.
+ */
+static inline bool ethnl_update_u32(u32 *dst, struct nlattr *attr)
+{
+ u32 val;
+
+ if (!attr)
+ return false;
+ val = nla_get_u32(attr);
+ if (*dst == val)
+ return false;
+
+ *dst = val;
+ return true;
+}
+
+static inline bool ethnl_update_u8(u8 *dst, struct nlattr *attr)
+{
+ u8 val;
+
+ if (!attr)
+ return false;
+ val = nla_get_u8(attr);
+ if (*dst == val)
+ return false;
+
+ *dst = val;
+ return true;
+}
+
+/* update u32 value used as bool from NLA_U8 attribute */
+static inline bool ethnl_update_bool32(u32 *dst, struct nlattr *attr)
+{
+ u8 val;
+
+ if (!attr)
+ return false;
+ val = !!nla_get_u8(attr);
+ if (!!*dst == val)
+ return false;
+
+ *dst = val;
+ return true;
+}
+
+static inline bool ethnl_update_binary(u8 *dst, unsigned int len,
+ struct nlattr *attr)
+{
+ if (!attr)
+ return false;
+ if (nla_len(attr) < len)
+ len = nla_len(attr);
+ if (!memcmp(dst, nla_data(attr), len))
+ return false;
+
+ memcpy(dst, nla_data(attr), len);
+ return true;
+}
+
+static inline bool ethnl_update_bitfield32(u32 *dst, struct nlattr *attr)
+{
+ struct nla_bitfield32 change;
+ u32 newval;
+
+ if (!attr)
+ return false;
+ change = nla_get_bitfield32(attr);
+ newval = (*dst & ~change.selector) | (change.value & change.selector);
+ if (*dst == newval)
+ return false;
+
+ *dst = newval;
+ return true;
+}
+
+/**
+ * ethnl_is_privileged() - check if request has sufficient privileges
+ * @skb: skb with client request
+ *
+ * Checks if client request has CAP_NET_ADMIN in its netns. Unlike the flags
+ * in genl_ops, this allows finer access control, e.g. allowing or denying
+ * the request based on its contents or withholding only part of the data
+ * from unprivileged users.
+ *
+ * Return: true if request is privileged, false if not
+ */
+static inline bool ethnl_is_privileged(struct sk_buff *skb)
+{
+ struct net *net = sock_net(skb->sk);
+
+ return netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN);
+}
+
+/**
+ * ethnl_reply_header_size() - total size of reply header
+ *
+ * This is an upper estimate so that we do not need to hold RTNL lock longer
+ * than necessary (to prevent rename between size estimate and composing the
+ * message). Accounts only for device ifindex and name as those are the only
+ * attributes ethnl_fill_reply_header() puts into the reply header.
+ */
+static inline unsigned int ethnl_reply_header_size(void)
+{
+ return nla_total_size(nla_total_size(sizeof(u32)) +
+ nla_total_size(IFNAMSIZ));
+}
+
+/**
+ * struct ethnl_req_info - base type of request information for GET requests
+ * @dev: network device the request is for (may be null)
+ * @req_mask: request mask, bitmap of requested information
+ * @global_flags: request flags common for all request types
+ * @req_flags: request flags specific for each request type
+ * @privileged: privileged request (CAP_NET_ADMIN in netns)
+ *
+ * This is a common base, additional members may follow after this structure.
+ */
+struct ethnl_req_info {
+ struct net_device *dev;
+ u32 req_mask;
+ u32 global_flags;
+ u32 req_flags;
+ bool privileged;
+};
#endif /* _NET_ETHTOOL_NETLINK_H */
--
2.22.0
The ethtool netlink code uses common framework for passing arbitrary
length bit sets to allow future extensions. A bitset can be a list (only
one bitmap) or can consist of value and mask pair (used e.g. when client
wants to modify only some bits). A bitset can use one of two formats:
verbose (bit by bit) or compact.
Verbose format consists of bitset size (number of bits), list flag and
an array of bit nests, telling which bits are part of the list or which
bits are in the mask and which of them are to be set. In requests, bits
can be identified by index (position) or by name. In replies, kernel
provides both index and name. Verbose format is suitable for "one shot"
applications like standard ethtool command as it avoids the need to
either keep bit names (e.g. link modes) in sync with kernel or having to
add an extra roundtrip for string set request (e.g. for private flags).
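As a rough illustration of identifying a bit by name, the lookup can be
sketched in plain userspace C as below. This is only a sketch under stated
assumptions: demo_names and demo_name_to_idx() are hypothetical names made up
for this example, and the table contents are placeholders rather than the
kernel's link mode table. Like the kernel helper in this patch, it returns the
table size when the name is not found.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name table, used only for this illustration. */
static const char *const demo_names[] = {
	"10baseT-Half",
	"10baseT-Full",
	"Autoneg",
};

/* Return the index of @name in @names, or @n_names if it is not found. */
static unsigned int demo_name_to_idx(const char *const *names,
				     unsigned int n_names, const char *name)
{
	unsigned int i;

	for (i = 0; i < n_names; i++) {
		/* skip holes in the table (bits without a name) */
		if (names[i] && !strcmp(names[i], name))
			return i;
	}

	return n_names;
}
```

A caller would treat a return value equal to the table size as "bit name not
found" and reject the request.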
Compact format uses one (list) or two (value/mask) arrays of 32-bit
words to store the bitmap(s). It is more suitable for long running
applications (ethtool in monitor mode or network management daemons)
which can retrieve the names once and then pass only compact bitmaps to
save space.
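The value/mask semantics of the compact format can be sketched in userspace C
as follows. This is a minimal sketch, not the code from this patch:
demo_apply_compact() is a hypothetical helper, it assumes both arrays hold at
least DIV_ROUND_UP(nbits, 32) words, and it ignores any bits above nbits in
the last word.

```c
#include <stdint.h>

/* Apply a compact value/mask pair to a bitmap stored as 32-bit words.
 * Bits selected by @mask take their value from @value, all other bits are
 * preserved. Returns 1 if the bitmap was modified, 0 if not.
 */
static int demo_apply_compact(uint32_t *bitmap, const uint32_t *value,
			      const uint32_t *mask, unsigned int nbits)
{
	unsigned int nwords = (nbits + 31) / 32;
	int changed = 0;
	unsigned int i;

	for (i = 0; i < nwords; i++) {
		uint32_t valid = 0xffffffffU;
		uint32_t m, newval;

		/* only the low (nbits % 32) bits of a partial last word count */
		if (i == nwords - 1 && nbits % 32)
			valid = (1U << (nbits % 32)) - 1;
		m = mask[i] & valid;
		newval = (bitmap[i] & ~m) | (value[i] & m);
		if (newval != bitmap[i])
			changed = 1;
		bitmap[i] = newval;
	}

	return changed;
}
```

The return value mirrors the "was anything modified" result that the update
helpers in this series report to their callers.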
Userspace requests can use either format and ETHTOOL_RF_COMPACT flag in
request header tells kernel which format to use in reply. Notifications
always use compact format.
Signed-off-by: Michal Kubecek <[email protected]>
---
Documentation/networking/ethtool-netlink.txt | 61 ++
include/uapi/linux/ethtool_netlink.h | 35 ++
net/ethtool/Makefile | 2 +-
net/ethtool/bitset.c | 606 +++++++++++++++++++
net/ethtool/bitset.h | 40 ++
net/ethtool/netlink.h | 9 +
6 files changed, 752 insertions(+), 1 deletion(-)
create mode 100644 net/ethtool/bitset.c
create mode 100644 net/ethtool/bitset.h
diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
index 97c369aa290b..4636682c551f 100644
--- a/Documentation/networking/ethtool-netlink.txt
+++ b/Documentation/networking/ethtool-netlink.txt
@@ -73,6 +73,67 @@ set, the behaviour is the same as (or closer to) the behaviour before it was
introduced.
+Bit sets
+--------
+
+For short bitmaps of (reasonably) fixed length, standard NLA_BITFIELD32 type
+is used. For arbitrary length bitmaps, ethtool netlink uses a nested attribute
+with contents of one of two forms: compact (two binary bitmaps representing
+bit values and mask of affected bits) and bit-by-bit (list of bits identified
+by either index or name).
+
+Compact form: nested (bitset) attribute contents:
+
+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
+
+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
+words ordered from least significant to most significant (i.e. the same way as
+bitmaps are passed with ioctl interface).
+
+For compact form, ETHTOOL_A_BITSET_SIZE and ETHTOOL_A_BITSET_VALUE are
+mandatory. Similar to BITFIELD32, a compact form bit set requests to set bits
+in the mask to 1 (if the bit is set in value) or 0 (if not) and preserve the
+rest. If ETHTOOL_A_BITSET_LIST is present, there is no mask and bitset
+represents a simple list of bits.
+
+Kernel bit set length may differ from userspace length if older application is
+used on newer kernel or vice versa. If userspace bitmap is longer, an error is
+issued only if the request actually tries to set values of some bits not
+recognized by kernel.
+
+Bit-by-bit form: nested (bitset) attribute contents:
+
+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
+ ETHTOOL_A_BITSET_BITS (nested) array of bits
+ ETHTOOL_A_BITS_BIT+ (nested) one bit
+ ETHTOOL_A_BIT_INDEX (u32) bit index (0 for LSB)
+ ETHTOOL_A_BIT_NAME (string) bit name
+ ETHTOOL_A_BIT_VALUE (flag) present if bit is set
+
+Bit size is optional for bit-by-bit form. ETHTOOL_A_BITSET_BITS nest can only
+contain ETHTOOL_A_BITS_BIT attributes but there can be an arbitrary number of
+them. A bit may be identified by its index or by its name. When used in
+requests, listed bits are set to 0 or 1 according to ETHTOOL_A_BIT_VALUE, the
+rest is preserved. A request fails if index exceeds kernel bit length or if
+name is not recognized.
+
+When ETHTOOL_A_BITSET_LIST flag is present, bitset is interpreted as a simple
+bit list. ETHTOOL_A_BIT_VALUE attributes are not used in such case. Bit list
+represents a bitmap with listed bits set and the rest zero.
+
+In requests, application can use either form. Form used by kernel in reply is
+determined by a flag in flags field of request header. Semantics of value and
+mask depend on the attribute. General idea is that flags control request
+processing, info_mask controls which parts of the information are returned in
+"get" request and index identifies a particular subcommand or an object to
+which the request applies.
+
+
List of message types
---------------------
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index ffd7db0848ef..805f314f4454 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -52,6 +52,41 @@ enum {
ETHTOOL_A_HEADER_MAX = (__ETHTOOL_A_HEADER_CNT - 1)
};
+/* bit sets */
+
+enum {
+ ETHTOOL_A_BIT_UNSPEC,
+ ETHTOOL_A_BIT_INDEX, /* u32 */
+ ETHTOOL_A_BIT_NAME, /* string */
+ ETHTOOL_A_BIT_VALUE, /* flag */
+
+ /* add new constants above here */
+ __ETHTOOL_A_BIT_CNT,
+ ETHTOOL_A_BIT_MAX = (__ETHTOOL_A_BIT_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_BITS_UNSPEC,
+ ETHTOOL_A_BITS_BIT,
+
+ /* add new constants above here */
+ __ETHTOOL_A_BITS_CNT,
+ ETHTOOL_A_BITS_MAX = (__ETHTOOL_A_BITS_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_BITSET_UNSPEC,
+ ETHTOOL_A_BITSET_LIST, /* flag */
+ ETHTOOL_A_BITSET_SIZE, /* u32 */
+ ETHTOOL_A_BITSET_BITS, /* nest - _A_BITS_* */
+ ETHTOOL_A_BITSET_VALUE, /* binary */
+ ETHTOOL_A_BITSET_MASK, /* binary */
+
+ /* add new constants above here */
+ __ETHTOOL_A_BITSET_CNT,
+ ETHTOOL_A_BITSET_MAX = (__ETHTOOL_A_BITSET_CNT - 1)
+};
+
/* generic netlink info */
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index f30e0da88be5..482fdb9380fa 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -4,4 +4,4 @@ obj-y += ioctl.o
obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
-ethtool_nl-y := netlink.o
+ethtool_nl-y := netlink.o bitset.o
diff --git a/net/ethtool/bitset.c b/net/ethtool/bitset.c
new file mode 100644
index 000000000000..80bb6fbb1268
--- /dev/null
+++ b/net/ethtool/bitset.c
@@ -0,0 +1,606 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+
+#include <linux/ethtool_netlink.h>
+#include <linux/bitmap.h>
+#include "netlink.h"
+#include "bitset.h"
+
+static bool ethnl_test_bit(const void *val, unsigned int index, bool is_u32)
+{
+ if (!val)
+ return true;
+ else if (is_u32)
+ return ((const u32 *)val)[index / 32] & (1U << (index % 32));
+ else
+ return test_bit(index, val);
+}
+
+static void __bitmap_to_u32(u32 *dst, const void *src, unsigned int size,
+ bool is_u32)
+{
+ unsigned int full_words = size / 32;
+ const u32 *src32 = src;
+
+ if (!is_u32) {
+ bitmap_to_arr32(dst, src, size);
+ return;
+ }
+
+ memcpy(dst, src32, full_words * sizeof(u32));
+ if (size % 32 != 0)
+ dst[full_words] = src32[full_words] & ((1U << (size % 32)) - 1);
+}
+
+/* convert standard kernel bitmap (long sized words) to ethtool one (u32 words)
+ * bitmap_to_arr32() is not guaranteed to do "in place" conversion correctly;
+ * moreover, we can use the fact that the conversion is no-op except for 64-bit
+ * big endian architectures
+ */
+#if BITS_PER_LONG == 64 && defined(__BIG_ENDIAN)
+void ethnl_bitmap_to_u32(unsigned long *bitmap, unsigned int nwords)
+{
+ u32 *dst = (u32 *)bitmap;
+ unsigned int i;
+
+ for (i = 0; i < nwords; i++) {
+ unsigned long tmp = READ_ONCE(bitmap[i]);
+
+ dst[2 * i] = tmp & 0xffffffff;
+ dst[2 * i + 1] = tmp >> 32;
+ }
+}
+#endif
+
+static const char *bit_name(const void *const names, bool legacy,
+ unsigned int idx)
+{
+ const char (*const legacy_names)[ETH_GSTRING_LEN] =
+ (const char (*const)[ETH_GSTRING_LEN])names;
+ const char *const *simple_names = names;
+
+ return legacy ? legacy_names[idx] : simple_names[idx];
+}
+
+/* calculate size for a bitset attribute
+ * see ethnl_put_bitset() for arguments
+ */
+static int __ethnl_bitset_size(unsigned int size, const void *val,
+ const void *mask, const void *names,
+ unsigned int flags)
+{
+ const bool legacy = flags & ETHNL_BITSET_LEGACY_NAMES;
+ const bool compact = flags & ETHNL_BITSET_COMPACT;
+ const bool is_list = flags & ETHNL_BITSET_LIST;
+ const bool is_u32 = flags & ETHNL_BITSET_U32;
+ unsigned int nwords = DIV_ROUND_UP(size, 32);
+ unsigned int len = 0;
+
+ if (WARN_ON(!compact && !names))
+ return -EINVAL;
+ /* list flag */
+ if (flags & ETHNL_BITSET_LIST)
+ len += nla_total_size(sizeof(u32));
+ /* size */
+ len += nla_total_size(sizeof(u32));
+
+ if (compact) {
+ /* values, mask */
+ len += 2 * nla_total_size(nwords * sizeof(u32));
+ } else {
+ unsigned int bits_len = 0;
+ unsigned int bit_len, i;
+
+ for (i = 0; i < size; i++) {
+ const char *name = bit_name(names, legacy, i) ?: "";
+
+ if ((is_list || mask) &&
+ !ethnl_test_bit(is_list ? val : mask, i, is_u32))
+ continue;
+ /* index */
+ bit_len = nla_total_size(sizeof(u32));
+ /* name */
+ bit_len += ethnl_str_size(name);
+ /* value */
+ if (!is_list && ethnl_test_bit(val, i, is_u32))
+ bit_len += nla_total_size(0);
+
+ /* bit nest */
+ bits_len += nla_total_size(bit_len);
+ }
+ /* bits nest */
+ len += nla_total_size(bits_len);
+ }
+
+ /* outermost nest */
+ return nla_total_size(len);
+}
+
+int ethnl_bitset_size(unsigned int size, const unsigned long *val,
+ const unsigned long *mask, const void *names,
+ unsigned int flags)
+{
+ return __ethnl_bitset_size(size, val, mask, names,
+ flags & ~ETHNL_BITSET_U32);
+}
+
+int ethnl_bitset32_size(unsigned int size, const u32 *val, const u32 *mask,
+ const void *names, unsigned int flags)
+{
+ return __ethnl_bitset_size(size, val, mask, names,
+ flags | ETHNL_BITSET_U32);
+}
+
+/**
+ * __ethnl_put_bitset() - Put a bitset nest into a message
+ * @skb: skb with the message
+ * @attrtype: attribute type for the bitset nest
+ * @size: size of the set in bits
+ * @val: bitset values
+ * @mask: mask of valid bits; NULL is interpreted as "all bits"
+ * @names: bit names (only used for verbose format)
+ * @flags: combination of ETHNL_BITSET_* flags
+ *
+ * This is the actual implementation of putting a bitset nested attribute into
+ * a netlink message but callers are supposed to use either ethnl_put_bitset()
+ * for unsigned long based bitmaps or ethnl_put_bitset32() for u32 based ones.
+ * Cleans the nest up on error.
+ *
+ * Return: 0 on success, negative error value on error
+ */
+static int __ethnl_put_bitset(struct sk_buff *skb, int attrtype,
+ unsigned int size, const void *val,
+ const void *mask, const void *names,
+ unsigned int flags)
+{
+ const bool legacy = flags & ETHNL_BITSET_LEGACY_NAMES;
+ const bool compact = flags & ETHNL_BITSET_COMPACT;
+ const bool is_list = flags & ETHNL_BITSET_LIST;
+ const bool is_u32 = flags & ETHNL_BITSET_U32;
+ struct nlattr *nest;
+ struct nlattr *attr;
+
+ if (WARN_ON(!compact && !names))
+ return -EINVAL;
+ nest = nla_nest_start(skb, attrtype);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (is_list && nla_put_flag(skb, ETHTOOL_A_BITSET_LIST))
+ goto nla_put_failure;
+ if (nla_put_u32(skb, ETHTOOL_A_BITSET_SIZE, size))
+ goto nla_put_failure;
+ if (compact) {
+ unsigned int bytesize = DIV_ROUND_UP(size, 32) * sizeof(u32);
+
+ attr = nla_reserve(skb, ETHTOOL_A_BITSET_VALUE, bytesize);
+ if (!attr)
+ goto nla_put_failure;
+ __bitmap_to_u32(nla_data(attr), val, size, is_u32);
+ if (mask) {
+ attr = nla_reserve(skb, ETHTOOL_A_BITSET_MASK,
+ bytesize);
+ if (!attr)
+ goto nla_put_failure;
+ __bitmap_to_u32(nla_data(attr), mask, size, is_u32);
+ }
+ } else {
+ struct nlattr *bits;
+ unsigned int i;
+
+ bits = nla_nest_start(skb, ETHTOOL_A_BITSET_BITS);
+ if (!bits)
+ goto nla_put_failure;
+ for (i = 0; i < size; i++) {
+ const char *name = bit_name(names, legacy, i) ?: "";
+
+ if ((is_list || mask) &&
+ !ethnl_test_bit(is_list ? val : mask, i, is_u32))
+ continue;
+ attr = nla_nest_start(skb, ETHTOOL_A_BITS_BIT);
+ if (!attr ||
+ nla_put_u32(skb, ETHTOOL_A_BIT_INDEX, i) ||
+ nla_put_string(skb, ETHTOOL_A_BIT_NAME, name))
+ goto nla_put_failure;
+ if (!is_list && ethnl_test_bit(val, i, is_u32) &&
+ nla_put_flag(skb, ETHTOOL_A_BIT_VALUE))
+ goto nla_put_failure;
+ nla_nest_end(skb, attr);
+ }
+ nla_nest_end(skb, bits);
+ }
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
+int ethnl_put_bitset(struct sk_buff *skb, int attrtype, unsigned int size,
+ const unsigned long *val, const unsigned long *mask,
+ const void *names, unsigned int flags)
+{
+ return __ethnl_put_bitset(skb, attrtype, size, val, mask, names,
+ flags & ~ETHNL_BITSET_U32);
+}
+
+int ethnl_put_bitset32(struct sk_buff *skb, int attrtype, unsigned int size,
+ const u32 *val, const u32 *mask, const void *names,
+ unsigned int flags)
+{
+ return __ethnl_put_bitset(skb, attrtype, size, val, mask, names,
+ flags | ETHNL_BITSET_U32);
+}
+
+static const struct nla_policy bitset_policy[ETHTOOL_A_BITSET_MAX + 1] = {
+ [ETHTOOL_A_BITSET_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_BITSET_LIST] = { .type = NLA_FLAG },
+ [ETHTOOL_A_BITSET_SIZE] = { .type = NLA_U32 },
+ [ETHTOOL_A_BITSET_BITS] = { .type = NLA_NESTED },
+ [ETHTOOL_A_BITSET_VALUE] = { .type = NLA_BINARY },
+ [ETHTOOL_A_BITSET_MASK] = { .type = NLA_BINARY },
+};
+
+static const struct nla_policy bit_policy[ETHTOOL_A_BIT_MAX + 1] = {
+ [ETHTOOL_A_BIT_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_BIT_INDEX] = { .type = NLA_U32 },
+ [ETHTOOL_A_BIT_NAME] = { .type = NLA_NUL_STRING },
+ [ETHTOOL_A_BIT_VALUE] = { .type = NLA_FLAG },
+};
+
+static int ethnl_name_to_idx(const void *names, bool legacy,
+ unsigned int n_names, const char *name,
+ unsigned int name_len)
+{
+ unsigned int i;
+
+ for (i = 0; i < n_names; i++) {
+ const char *bname = bit_name(names, legacy, i);
+
+ if (bname && !strncmp(bname, name, name_len) &&
+ strlen(bname) <= name_len)
+ return i;
+ }
+
+ return n_names;
+}
+
+static int ethnl_update_bit(unsigned long *bitmap, unsigned long *bitmask,
+ unsigned int nbits, const struct nlattr *bit_attr,
+ bool is_list, const void *names, bool legacy,
+ struct genl_info *info)
+{
+ struct nlattr *tb[ETHTOOL_A_BIT_MAX + 1];
+ int ret, idx;
+
+ if (nla_type(bit_attr) != ETHTOOL_A_BITS_BIT) {
+ NL_SET_ERR_MSG_ATTR(info->extack, bit_attr,
+ "ETHTOOL_A_BITSET_BITS can contain only ETHTOOL_A_BITS_BIT");
+ return -EINVAL;
+ }
+ ret = nla_parse_nested(tb, ETHTOOL_A_BIT_MAX, bit_attr, bit_policy,
+ info->extack);
+ if (ret < 0)
+ return ret;
+
+ if (tb[ETHTOOL_A_BIT_INDEX]) {
+ const char *name;
+
+ idx = nla_get_u32(tb[ETHTOOL_A_BIT_INDEX]);
+ if (idx >= nbits) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ tb[ETHTOOL_A_BIT_INDEX],
+ "bit index too high");
+ return -EOPNOTSUPP;
+ }
+ name = bit_name(names, legacy, idx);
+ if (tb[ETHTOOL_A_BIT_NAME] && name &&
+ strncmp(nla_data(tb[ETHTOOL_A_BIT_NAME]), name,
+ nla_len(tb[ETHTOOL_A_BIT_NAME]))) {
+ NL_SET_ERR_MSG_ATTR(info->extack, bit_attr,
+ "bit index and name mismatch");
+ return -EINVAL;
+ }
+ } else if (tb[ETHTOOL_A_BIT_NAME]) {
+ idx = ethnl_name_to_idx(names, legacy, nbits,
+ nla_data(tb[ETHTOOL_A_BIT_NAME]),
+ nla_len(tb[ETHTOOL_A_BIT_NAME]));
+ if (idx >= nbits) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ tb[ETHTOOL_A_BIT_NAME],
+ "bit name not found");
+ return -EOPNOTSUPP;
+ }
+ } else {
+ NL_SET_ERR_MSG_ATTR(info->extack, bit_attr,
+ "neither bit index nor name specified");
+ return -EINVAL;
+ }
+
+ if (is_list || tb[ETHTOOL_A_BIT_VALUE])
+ set_bit(idx, bitmap);
+ else
+ clear_bit(idx, bitmap);
+ if (!is_list || bitmask)
+ set_bit(idx, bitmask);
+ return 0;
+}
+
+int ethnl_bitset_is_compact(const struct nlattr *bitset, bool *compact)
+{
+ struct nlattr *tb[ETHTOOL_A_BITSET_MAX + 1];
+ int ret;
+
+ ret = nla_parse_nested(tb, ETHTOOL_A_BITSET_MAX, bitset,
+ bitset_policy, NULL);
+ if (ret < 0)
+ return ret;
+
+ if (tb[ETHTOOL_A_BITSET_BITS]) {
+ if (tb[ETHTOOL_A_BITSET_VALUE] || tb[ETHTOOL_A_BITSET_MASK])
+ return -EINVAL;
+ *compact = false;
+ return 0;
+ }
+ if (!tb[ETHTOOL_A_BITSET_SIZE] || !tb[ETHTOOL_A_BITSET_VALUE])
+ return -EINVAL;
+
+ *compact = true;
+ return 0;
+}
+
+/* 64-bit big endian is the only case when u32 based bitmap and unsigned long
+ * based bitmap layouts differ
+ */
+#if BITS_PER_LONG == 64 && defined(__BIG_ENDIAN)
+/* dst &= src */
+static void __bitmap_and_u32(unsigned long *dst, const u32 *src,
+ unsigned int nbits)
+{
+ unsigned long op;
+
+ while (nbits >= BITS_PER_LONG) {
+ op = src[0] | ((unsigned long)src[1] << 32);
+ *dst &= op;
+
+ dst++;
+ src += 2;
+ nbits -= BITS_PER_LONG;
+ }
+
+ if (!nbits)
+ return;
+ op = src[0];
+ if (nbits > 32)
+ op |= ((unsigned long)src[1] << 32);
+ *dst &= (op & BITMAP_LAST_WORD_MASK(nbits));
+}
+
+/* map1 == map2 */
+static bool __bitmap_equal_u32(const unsigned long *map1, const u32 *map2,
+ unsigned int nbits)
+{
+ unsigned long dword;
+
+ while (nbits >= BITS_PER_LONG) {
+ dword = map2[0] | ((unsigned long)map2[1] << 32);
+ if (*map1 != dword)
+ return false;
+
+ map1++;
+ map2 += 2;
+ nbits -= BITS_PER_LONG;
+ }
+
+ if (!nbits)
+ return true;
+ dword = map2[0];
+ if (nbits > 32)
+ dword |= ((unsigned long)map2[1] << 32);
+ return !((*map1 ^ dword) & BITMAP_LAST_WORD_MASK(nbits));
+}
+#else
+/* On 32-bit and 64-bit LE, unsigned long and u32 bitmap layout is the same
+ * but we must not write past dst buffer if the number of words is odd.
+ */
+static void __bitmap_and_u32(unsigned long *dst, const u32 *src,
+ unsigned int nbits)
+{
+ u32 *dst32 = (u32 *)dst;
+
+ while (nbits >= 32) {
+ *dst32++ &= *src++;
+ nbits -= 32;
+ }
+ if (!nbits)
+ return;
+ *dst32 &= (*src & ((1U << nbits) - 1));
+}
+
+static bool __bitmap_equal_u32(const unsigned long *map1, const u32 *map2,
+ unsigned int nbits)
+{
+ unsigned int full_words = nbits / 32;
+ u32 last_word_mask;
+ u32 *map1_32 = (u32 *)map1;
+
+ if (memcmp(map1, map2, full_words * sizeof(u32)))
+ return false;
+ if (!(nbits % 32))
+ return true;
+ last_word_mask = (1U << (nbits % 32)) - 1;
+ return !((map1_32[full_words] ^ map2[full_words]) & last_word_mask);
+}
+#endif
+
+/* copy unsigned long bitmap to unsigned long or u32 */
+static void __bitmap_to_any(void *dst, const unsigned long *src,
+ unsigned int nbits, bool dst_is_u32)
+{
+ if (dst_is_u32)
+ bitmap_to_arr32(dst, src, nbits);
+ else
+ bitmap_copy(dst, src, nbits);
+}
+
+static bool __bitmap_equal_any(const unsigned long *map1, const void *map2,
+ unsigned int nbits, bool is_u32)
+{
+ if (!is_u32)
+ return bitmap_equal(map1, map2, nbits);
+ else
+ return __bitmap_equal_u32(map1, map2, nbits);
+}
+
+/**
+ * __ethnl_update_bitset() - Apply a bitset nest to a bitmap
+ * @bitmap: bitmap to update
+ * @bitmask: if not null, mask from the nest is copied here
+ * @nbits: size of the updated bitmap in bits
+ * @attr: nest attribute to parse and apply
+ * @err: pointer to variable to put error value (or 0 on success) to
+ * @names: array of bit names; may be null for compact format
+ * @legacy: true if @names is ioctl style array of char[32], false if it is
+ * a simple array of (char *) strings
+ * @info: genetlink info (also used for extack error reporting)
+ * @is_u32: false: bitmaps are unsigned long based, true: u32 based bitmaps
+ *
+ * This is the actual implementation of bitset nested attribute parser but
+ * callers are supposed to use ethnl_update_bitset() for unsigned long based
+ * bitmaps or ethnl_update_bitset32() for u32 based ones.
+ *
+ * Return: true if the bitmap contents was modified, false if not
+ */
+static bool __ethnl_update_bitset(void *bitmap, void *bitmask,
+ unsigned int nbits, const struct nlattr *attr,
+ int *err, const void *names, bool legacy,
+ struct genl_info *info, bool is_u32)
+{
+ struct nlattr *tb[ETHTOOL_A_BITSET_MAX + 1];
+ unsigned int change_bits = 0;
+ unsigned int max_bits = 0;
+ unsigned long *val, *mask;
+ bool mod = false;
+ bool is_list;
+
+ *err = 0;
+ if (!attr)
+ return mod;
+ *err = nla_parse_nested(tb, ETHTOOL_A_BITSET_MAX, attr, bitset_policy,
+ info->extack);
+ if (*err < 0)
+ return mod;
+ *err = -EINVAL;
+ if (tb[ETHTOOL_A_BITSET_BITS] &&
+ (tb[ETHTOOL_A_BITSET_VALUE] || tb[ETHTOOL_A_BITSET_MASK]))
+ return mod;
+ if (!tb[ETHTOOL_A_BITSET_BITS] &&
+ (!tb[ETHTOOL_A_BITSET_SIZE] || !tb[ETHTOOL_A_BITSET_VALUE]))
+ return mod;
+ is_list = (tb[ETHTOOL_A_BITSET_LIST] != NULL);
+ if (is_list && tb[ETHTOOL_A_BITSET_MASK])
+ return mod;
+
+ /* To let new userspace work with old kernel, we allow bitmaps
+ * from userspace to be longer than kernel ones and only issue an
+ * error if userspace actually tries to change a bit not existing
+ * in kernel.
+ */
+ if (tb[ETHTOOL_A_BITSET_SIZE])
+ change_bits = nla_get_u32(tb[ETHTOOL_A_BITSET_SIZE]);
+ max_bits = max_t(unsigned int, nbits, change_bits);
+ mask = bitmap_zalloc(max_bits, GFP_KERNEL);
+ val = bitmap_zalloc(max_bits, GFP_KERNEL);
+
+ if (tb[ETHTOOL_A_BITSET_BITS]) {
+ struct nlattr *bit_attr;
+ int rem;
+
+ if (is_list)
+ bitmap_fill(mask, nbits);
+ else if (is_u32)
+ bitmap_from_arr32(val, bitmap, nbits);
+ else
+ bitmap_copy(val, bitmap, nbits);
+ nla_for_each_nested(bit_attr, tb[ETHTOOL_A_BITSET_BITS], rem) {
+ *err = ethnl_update_bit(val, mask, nbits, bit_attr,
+ is_list, names, legacy, info);
+ if (*err < 0)
+ goto out;
+ }
+ if (bitmask)
+ __bitmap_to_any(bitmask, mask, nbits, is_u32);
+ } else {
+ unsigned int change_words = DIV_ROUND_UP(change_bits, 32);
+
+ *err = 0;
+ if (change_bits == 0 && tb[ETHTOOL_A_BITSET_MASK])
+ goto out;
+ *err = -EINVAL;
+ if (nla_len(tb[ETHTOOL_A_BITSET_VALUE]) <
+ change_words * sizeof(u32))
+ goto out;
+ if (tb[ETHTOOL_A_BITSET_MASK] &&
+ nla_len(tb[ETHTOOL_A_BITSET_MASK]) <
+ change_words * sizeof(u32))
+ goto out;
+
+ bitmap_from_arr32(val, nla_data(tb[ETHTOOL_A_BITSET_VALUE]),
+ change_bits);
+ if (tb[ETHTOOL_A_BITSET_MASK])
+ bitmap_from_arr32(mask,
+ nla_data(tb[ETHTOOL_A_BITSET_MASK]),
+ change_bits);
+ else
+ bitmap_fill(mask, nbits);
+
+ if (nbits < change_bits) {
+ unsigned int idx = find_next_bit(mask, max_bits, nbits);
+
+ *err = -EINVAL;
+ if (idx < max_bits)
+ goto out;
+ }
+
+ if (bitmask)
+ __bitmap_to_any(bitmask, mask, nbits, is_u32);
+ if (!is_list) {
+ bitmap_and(val, val, mask, nbits);
+ bitmap_complement(mask, mask, nbits);
+ if (is_u32)
+ __bitmap_and_u32(mask, bitmap, nbits);
+ else
+ bitmap_and(mask, mask, bitmap, nbits);
+ bitmap_or(val, val, mask, nbits);
+ }
+ }
+
+ mod = !__bitmap_equal_any(val, bitmap, nbits, is_u32);
+ if (mod)
+ __bitmap_to_any(bitmap, val, nbits, is_u32);
+
+ *err = 0;
+out:
+ bitmap_free(val);
+ bitmap_free(mask);
+ return mod;
+}
+
+bool ethnl_update_bitset(unsigned long *bitmap, unsigned long *bitmask,
+ unsigned int nbits, const struct nlattr *attr,
+ int *err, const void *names, bool legacy,
+ struct genl_info *info)
+{
+ return __ethnl_update_bitset(bitmap, bitmask, nbits, attr, err, names,
+ legacy, info, false);
+}
+
+bool ethnl_update_bitset32(u32 *bitmap, u32 *bitmask, unsigned int nbits,
+ const struct nlattr *attr, int *err,
+ const void *names, bool legacy,
+ struct genl_info *info)
+{
+ return __ethnl_update_bitset(bitmap, bitmask, nbits, attr, err, names,
+ legacy, info, true);
+}
diff --git a/net/ethtool/bitset.h b/net/ethtool/bitset.h
new file mode 100644
index 000000000000..761d0c47fe23
--- /dev/null
+++ b/net/ethtool/bitset.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _NET_ETHTOOL_BITSET_H
+#define _NET_ETHTOOL_BITSET_H
+
+/* when set, value and mask bitmaps are arrays of u32, when not, arrays of
+ * unsigned long
+ */
+#define ETHNL_BITSET_U32 BIT(0)
+/* generate a compact format bitset */
+#define ETHNL_BITSET_COMPACT BIT(1)
+/* generate a bit list */
+#define ETHNL_BITSET_LIST BIT(2)
+/* when set, names are interpreted as legacy string set (an array of
+ * char[ETH_GSTRING_LEN]), when not, as a simple array of char *
+ */
+#define ETHNL_BITSET_LEGACY_NAMES BIT(3)
+
+int ethnl_bitset_is_compact(const struct nlattr *bitset, bool *compact);
+int ethnl_bitset_size(unsigned int size, const unsigned long *val,
+ const unsigned long *mask, const void *names,
+ unsigned int flags);
+int ethnl_bitset32_size(unsigned int size, const u32 *val, const u32 *mask,
+ const void *names, unsigned int flags);
+int ethnl_put_bitset(struct sk_buff *skb, int attrtype, unsigned int size,
+ const unsigned long *val, const unsigned long *mask,
+ const void *names, unsigned int flags);
+int ethnl_put_bitset32(struct sk_buff *skb, int attrtype, unsigned int size,
+ const u32 *val, const u32 *mask, const void *names,
+ unsigned int flags);
+bool ethnl_update_bitset(unsigned long *bitmap, unsigned long *bitmask,
+ unsigned int nbits, const struct nlattr *attr,
+ int *err, const void *names, bool legacy,
+ struct genl_info *info);
+bool ethnl_update_bitset32(u32 *bitmap, u32 *bitmask, unsigned int nbits,
+ const struct nlattr *attr, int *err,
+ const void *names, bool legacy,
+ struct genl_info *info);
+
+#endif /* _NET_ETHTOOL_BITSET_H */
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 5510eb7054b3..7f1b9ec1ace7 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -20,6 +20,15 @@ struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
u16 hdr_attrtype, struct genl_info *info,
void **ehdrp);
+#if BITS_PER_LONG == 64 && defined(__BIG_ENDIAN)
+void ethnl_bitmap_to_u32(unsigned long *bitmap, unsigned int nwords);
+#else
+static inline void ethnl_bitmap_to_u32(unsigned long *bitmap,
+ unsigned int nwords)
+{
+}
+#endif
+
static inline int ethnl_str_size(const char *s)
{
return nla_total_size(strlen(s) + 1);
--
2.22.0
Add infrastructure for ethtool netlink notifications. There is only one
multicast group "monitor" which is used to notify userspace about changes
and actions performed. Notification messages (types using suffix _NTF)
share the format with replies to GET requests.
Notifications are supposed to be broadcast on every configuration change,
whether it is done using the netlink interface or ioctl one. Netlink SET
requests only trigger a notification if some data is actually changed.
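The "notify only on change" rule can be sketched in userspace C as below. This
is an illustration under assumptions, not code from this series:
demo_update_u32() imitates the behaviour of the ethnl_update_u32() helper with
a plain pointer standing in for the netlink attribute, and demo_set_speed()
and demo_notifications are hypothetical names that merely count the
notifications which would be sent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many notifications would have been sent. */
static unsigned int demo_notifications;

/* Store *attr into *dst; return true only if the stored value changed.
 * A null @attr means "attribute not present", i.e. nothing to do.
 */
static bool demo_update_u32(uint32_t *dst, const uint32_t *attr)
{
	if (!attr || *dst == *attr)
		return false;

	*dst = *attr;
	return true;
}

/* Hypothetical SET handler: notify only when something was modified. */
static void demo_set_speed(uint32_t *speed, const uint32_t *attr)
{
	if (demo_update_u32(speed, attr))
		demo_notifications++;
}
```

A request that repeats the current value, or omits the attribute altogether,
therefore produces no notification in this sketch.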
To trigger an ethtool notification, both ethtool netlink and external code
use ethtool_notify() helper. This helper requires RTNL to be held and may
sleep. Handlers sending messages for specific notification message types
are registered in ethnl_notify_handlers array. As notifications can be
triggered from other code, ethnl_ok flag is used to prevent an attempt to
send notification before genetlink family is registered.
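The dispatch scheme described above (a handler table indexed by message type,
guarded by a readiness flag) can be sketched in userspace C as follows. All
names here (demo_ready, demo_handlers, demo_notify() and the DEMO_NTF_*
constants) are hypothetical and exist only for this illustration; unlike
ethtool_notify(), the sketch returns a bool so that the outcome is observable,
and it does not model RTNL or the warning for unimplemented types.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*demo_handler_t)(unsigned int cmd);

/* Plays the role of ethnl_ok: set once "registration" has finished. */
static bool demo_ready;
/* Records the last message type a handler was called for. */
static unsigned int demo_last_cmd;

enum {
	DEMO_NTF_NONE,
	DEMO_NTF_SETTINGS,
};

static void demo_settings_ntf(unsigned int cmd)
{
	demo_last_cmd = cmd;
}

/* Handlers indexed by notification message type; gaps stay null. */
static const demo_handler_t demo_handlers[] = {
	[DEMO_NTF_SETTINGS] = demo_settings_ntf,
};

/* Return true if a handler was invoked for @cmd, false otherwise. */
static bool demo_notify(unsigned int cmd)
{
	if (!demo_ready)
		return false;
	if (cmd >= sizeof(demo_handlers) / sizeof(demo_handlers[0]) ||
	    !demo_handlers[cmd])
		return false;

	demo_handlers[cmd](cmd);
	return true;
}
```

Checking the flag first means that a caller which fires before registration is
silently ignored instead of reaching a handler.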
Signed-off-by: Michal Kubecek <[email protected]>
---
include/linux/ethtool_netlink.h | 5 ++++
include/linux/netdevice.h | 12 ++++++++++
include/uapi/linux/ethtool_netlink.h | 2 ++
net/ethtool/netlink.c | 35 ++++++++++++++++++++++++++++
4 files changed, 54 insertions(+)
diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
index 0412adb4f42f..2a15e64a16f3 100644
--- a/include/linux/ethtool_netlink.h
+++ b/include/linux/ethtool_netlink.h
@@ -5,5 +5,10 @@
#include <uapi/linux/ethtool_netlink.h>
#include <linux/ethtool.h>
+#include <linux/netdevice.h>
+
+enum ethtool_multicast_groups {
+ ETHNL_MCGRP_MONITOR,
+};
#endif /* _LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88292953aa6f..c57d9917fd50 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4350,6 +4350,18 @@ struct netdev_notifier_bonding_info {
void netdev_bonding_info_change(struct net_device *dev,
struct netdev_bonding_info *bonding_info);
+#if IS_ENABLED(CONFIG_ETHTOOL_NETLINK)
+void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
+ unsigned int cmd, u32 req_mask, const void *data);
+#else
+static inline void ethtool_notify(struct net_device *dev,
+ struct netlink_ext_ack *extack,
+ unsigned int cmd, u32 req_mask,
+ const void *data)
+{
+}
+#endif
+
static inline
struct sk_buff *skb_gso_segment(struct sk_buff *skb, netdev_features_t features)
{
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 805f314f4454..8938a1f09057 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -91,4 +91,6 @@ enum {
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
+#define ETHTOOL_MCGRP_MONITOR_NAME "monitor"
+
#endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index e13f29bbd625..a7a0bfe1818c 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -6,6 +6,8 @@
static struct genl_family ethtool_genl_family;
+static bool ethnl_ok __read_mostly;
+
static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
[ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
[ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
@@ -176,11 +178,41 @@ struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
return NULL;
}
+/* notifications */
+
+typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
+ struct netlink_ext_ack *extack,
+ unsigned int cmd, u32 req_mask,
+ const void *data);
+
+static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
+};
+
+void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
+ unsigned int cmd, u32 req_mask, const void *data)
+{
+ if (unlikely(!ethnl_ok))
+ return;
+ ASSERT_RTNL();
+
+ if (likely(cmd < ARRAY_SIZE(ethnl_notify_handlers) &&
+ ethnl_notify_handlers[cmd]))
+ ethnl_notify_handlers[cmd](dev, extack, cmd, req_mask, data);
+ else
+ WARN_ONCE(1, "notification %u not implemented (dev=%s, req_mask=0x%x)\n",
+ cmd, netdev_name(dev), req_mask);
+}
+EXPORT_SYMBOL(ethtool_notify);
+
/* genetlink setup */
static const struct genl_ops ethtool_genl_ops[] = {
};
+static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
+ [ETHNL_MCGRP_MONITOR] = { .name = ETHTOOL_MCGRP_MONITOR_NAME },
+};
+
static struct genl_family ethtool_genl_family = {
.name = ETHTOOL_GENL_NAME,
.version = ETHTOOL_GENL_VERSION,
@@ -188,6 +220,8 @@ static struct genl_family ethtool_genl_family = {
.parallel_ops = true,
.ops = ethtool_genl_ops,
.n_ops = ARRAY_SIZE(ethtool_genl_ops),
+ .mcgrps = ethtool_nl_mcgrps,
+ .n_mcgrps = ARRAY_SIZE(ethtool_nl_mcgrps),
};
/* module setup */
@@ -199,6 +233,7 @@ static int __init ethnl_init(void)
ret = genl_register_family(&ethtool_genl_family);
if (WARN(ret < 0, "ethtool: genetlink family registration failed"))
return ret;
+ ethnl_ok = true;
return 0;
}
--
2.22.0
Introduce file net/ethtool/common.c for code shared by the ioctl and netlink
ethtool interfaces. Move name tables of features, RSS hash functions,
tunables and PHY tunables into this file.
Signed-off-by: Michal Kubecek <[email protected]>
---
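Note for readers: the tables moved here are arrays of fixed width strings
indexed by bit or id, so dropping "static" and adding an extern declaration
is enough to share them. A minimal userspace sketch of that layout follows;
all demo_* names and DEMO_GSTRING_LEN are illustrative stand-ins, not kernel
identifiers, and the lookup helper is not part of this patch.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_GSTRING_LEN 32	/* stand-in for ETH_GSTRING_LEN */

enum demo_hash_func {		/* hypothetical indices, not the kernel enum */
	DEMO_HASH_TOP,
	DEMO_HASH_XOR,
	DEMO_HASH_CRC32,
	DEMO_HASH_COUNT,
};

/* Single definition without "static"; other files would declare it extern. */
const char demo_hash_strings[DEMO_HASH_COUNT][DEMO_GSTRING_LEN] = {
	[DEMO_HASH_TOP]   = "toeplitz",
	[DEMO_HASH_XOR]   = "xor",
	[DEMO_HASH_CRC32] = "crc32",
};

/* Return the index of @name in the table, or -1 if it is not present. */
int demo_hash_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < DEMO_HASH_COUNT; i++)
		if (!strncmp(demo_hash_strings[i], name, DEMO_GSTRING_LEN))
			return (int)i;
	return -1;
}
```

Each row occupies exactly DEMO_GSTRING_LEN bytes and unused bytes are zero,
so a string set can be copied out as one contiguous block.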
net/ethtool/Makefile | 2 +-
net/ethtool/common.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
net/ethtool/common.h | 17 +++++++++
net/ethtool/ioctl.c | 83 ++-----------------------------------------
4 files changed, 104 insertions(+), 82 deletions(-)
create mode 100644 net/ethtool/common.c
create mode 100644 net/ethtool/common.h
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 482fdb9380fa..11782306593b 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y += ioctl.o
+obj-y += ioctl.o common.o
obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
new file mode 100644
index 000000000000..b0ce420e994e
--- /dev/null
+++ b/net/ethtool/common.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+
+#include "common.h"
+
+const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
+ [NETIF_F_SG_BIT] = "tx-scatter-gather",
+ [NETIF_F_IP_CSUM_BIT] = "tx-checksum-ipv4",
+ [NETIF_F_HW_CSUM_BIT] = "tx-checksum-ip-generic",
+ [NETIF_F_IPV6_CSUM_BIT] = "tx-checksum-ipv6",
+ [NETIF_F_HIGHDMA_BIT] = "highdma",
+ [NETIF_F_FRAGLIST_BIT] = "tx-scatter-gather-fraglist",
+ [NETIF_F_HW_VLAN_CTAG_TX_BIT] = "tx-vlan-hw-insert",
+
+ [NETIF_F_HW_VLAN_CTAG_RX_BIT] = "rx-vlan-hw-parse",
+ [NETIF_F_HW_VLAN_CTAG_FILTER_BIT] = "rx-vlan-filter",
+ [NETIF_F_HW_VLAN_STAG_TX_BIT] = "tx-vlan-stag-hw-insert",
+ [NETIF_F_HW_VLAN_STAG_RX_BIT] = "rx-vlan-stag-hw-parse",
+ [NETIF_F_HW_VLAN_STAG_FILTER_BIT] = "rx-vlan-stag-filter",
+ [NETIF_F_VLAN_CHALLENGED_BIT] = "vlan-challenged",
+ [NETIF_F_GSO_BIT] = "tx-generic-segmentation",
+ [NETIF_F_LLTX_BIT] = "tx-lockless",
+ [NETIF_F_NETNS_LOCAL_BIT] = "netns-local",
+ [NETIF_F_GRO_BIT] = "rx-gro",
+ [NETIF_F_GRO_HW_BIT] = "rx-gro-hw",
+ [NETIF_F_LRO_BIT] = "rx-lro",
+
+ [NETIF_F_TSO_BIT] = "tx-tcp-segmentation",
+ [NETIF_F_GSO_ROBUST_BIT] = "tx-gso-robust",
+ [NETIF_F_TSO_ECN_BIT] = "tx-tcp-ecn-segmentation",
+ [NETIF_F_TSO_MANGLEID_BIT] = "tx-tcp-mangleid-segmentation",
+ [NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
+ [NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
+ [NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
+ [NETIF_F_GSO_GRE_CSUM_BIT] = "tx-gre-csum-segmentation",
+ [NETIF_F_GSO_IPXIP4_BIT] = "tx-ipxip4-segmentation",
+ [NETIF_F_GSO_IPXIP6_BIT] = "tx-ipxip6-segmentation",
+ [NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
+ [NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT] = "tx-udp_tnl-csum-segmentation",
+ [NETIF_F_GSO_PARTIAL_BIT] = "tx-gso-partial",
+ [NETIF_F_GSO_SCTP_BIT] = "tx-sctp-segmentation",
+ [NETIF_F_GSO_ESP_BIT] = "tx-esp-segmentation",
+ [NETIF_F_GSO_UDP_L4_BIT] = "tx-udp-segmentation",
+
+ [NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
+ [NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
+ [NETIF_F_FCOE_MTU_BIT] = "fcoe-mtu",
+ [NETIF_F_NTUPLE_BIT] = "rx-ntuple-filter",
+ [NETIF_F_RXHASH_BIT] = "rx-hashing",
+ [NETIF_F_RXCSUM_BIT] = "rx-checksum",
+ [NETIF_F_NOCACHE_COPY_BIT] = "tx-nocache-copy",
+ [NETIF_F_LOOPBACK_BIT] = "loopback",
+ [NETIF_F_RXFCS_BIT] = "rx-fcs",
+ [NETIF_F_RXALL_BIT] = "rx-all",
+ [NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
+ [NETIF_F_HW_TC_BIT] = "hw-tc-offload",
+ [NETIF_F_HW_ESP_BIT] = "esp-hw-offload",
+ [NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload",
+ [NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload",
+ [NETIF_F_HW_TLS_RECORD_BIT] = "tls-hw-record",
+ [NETIF_F_HW_TLS_TX_BIT] = "tls-hw-tx-offload",
+ [NETIF_F_HW_TLS_RX_BIT] = "tls-hw-rx-offload",
+};
+
+const char
+rss_hash_func_strings[ETH_RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
+ [ETH_RSS_HASH_TOP_BIT] = "toeplitz",
+ [ETH_RSS_HASH_XOR_BIT] = "xor",
+ [ETH_RSS_HASH_CRC32_BIT] = "crc32",
+};
+
+const char
+tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
+ [ETHTOOL_ID_UNSPEC] = "Unspec",
+ [ETHTOOL_RX_COPYBREAK] = "rx-copybreak",
+ [ETHTOOL_TX_COPYBREAK] = "tx-copybreak",
+ [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout",
+};
+
+const char
+phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
+ [ETHTOOL_ID_UNSPEC] = "Unspec",
+ [ETHTOOL_PHY_DOWNSHIFT] = "phy-downshift",
+ [ETHTOOL_PHY_FAST_LINK_DOWN] = "phy-fast-link-down",
+};
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
new file mode 100644
index 000000000000..41b2efc1e4e1
--- /dev/null
+++ b/net/ethtool/common.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _ETHTOOL_COMMON_H
+#define _ETHTOOL_COMMON_H
+
+#include <linux/ethtool.h>
+
+extern const char
+netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN];
+extern const char
+rss_hash_func_strings[ETH_RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN];
+extern const char
+tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN];
+extern const char
+phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN];
+
+#endif /* _ETHTOOL_COMMON_H */
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 6288e69e94fc..b35366dd9997 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -27,6 +27,8 @@
#include <net/xdp_sock.h>
#include <net/flow_offload.h>
+#include "common.h"
+
/*
* Some useful ethtool_ops methods that're device independent.
* If we find that all drivers want to do the same thing here,
@@ -54,87 +56,6 @@ EXPORT_SYMBOL(ethtool_op_get_ts_info);
#define ETHTOOL_DEV_FEATURE_WORDS ((NETDEV_FEATURE_COUNT + 31) / 32)
-static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
- [NETIF_F_SG_BIT] = "tx-scatter-gather",
- [NETIF_F_IP_CSUM_BIT] = "tx-checksum-ipv4",
- [NETIF_F_HW_CSUM_BIT] = "tx-checksum-ip-generic",
- [NETIF_F_IPV6_CSUM_BIT] = "tx-checksum-ipv6",
- [NETIF_F_HIGHDMA_BIT] = "highdma",
- [NETIF_F_FRAGLIST_BIT] = "tx-scatter-gather-fraglist",
- [NETIF_F_HW_VLAN_CTAG_TX_BIT] = "tx-vlan-hw-insert",
-
- [NETIF_F_HW_VLAN_CTAG_RX_BIT] = "rx-vlan-hw-parse",
- [NETIF_F_HW_VLAN_CTAG_FILTER_BIT] = "rx-vlan-filter",
- [NETIF_F_HW_VLAN_STAG_TX_BIT] = "tx-vlan-stag-hw-insert",
- [NETIF_F_HW_VLAN_STAG_RX_BIT] = "rx-vlan-stag-hw-parse",
- [NETIF_F_HW_VLAN_STAG_FILTER_BIT] = "rx-vlan-stag-filter",
- [NETIF_F_VLAN_CHALLENGED_BIT] = "vlan-challenged",
- [NETIF_F_GSO_BIT] = "tx-generic-segmentation",
- [NETIF_F_LLTX_BIT] = "tx-lockless",
- [NETIF_F_NETNS_LOCAL_BIT] = "netns-local",
- [NETIF_F_GRO_BIT] = "rx-gro",
- [NETIF_F_GRO_HW_BIT] = "rx-gro-hw",
- [NETIF_F_LRO_BIT] = "rx-lro",
-
- [NETIF_F_TSO_BIT] = "tx-tcp-segmentation",
- [NETIF_F_GSO_ROBUST_BIT] = "tx-gso-robust",
- [NETIF_F_TSO_ECN_BIT] = "tx-tcp-ecn-segmentation",
- [NETIF_F_TSO_MANGLEID_BIT] = "tx-tcp-mangleid-segmentation",
- [NETIF_F_TSO6_BIT] = "tx-tcp6-segmentation",
- [NETIF_F_FSO_BIT] = "tx-fcoe-segmentation",
- [NETIF_F_GSO_GRE_BIT] = "tx-gre-segmentation",
- [NETIF_F_GSO_GRE_CSUM_BIT] = "tx-gre-csum-segmentation",
- [NETIF_F_GSO_IPXIP4_BIT] = "tx-ipxip4-segmentation",
- [NETIF_F_GSO_IPXIP6_BIT] = "tx-ipxip6-segmentation",
- [NETIF_F_GSO_UDP_TUNNEL_BIT] = "tx-udp_tnl-segmentation",
- [NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT] = "tx-udp_tnl-csum-segmentation",
- [NETIF_F_GSO_PARTIAL_BIT] = "tx-gso-partial",
- [NETIF_F_GSO_SCTP_BIT] = "tx-sctp-segmentation",
- [NETIF_F_GSO_ESP_BIT] = "tx-esp-segmentation",
- [NETIF_F_GSO_UDP_L4_BIT] = "tx-udp-segmentation",
-
- [NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
- [NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
- [NETIF_F_FCOE_MTU_BIT] = "fcoe-mtu",
- [NETIF_F_NTUPLE_BIT] = "rx-ntuple-filter",
- [NETIF_F_RXHASH_BIT] = "rx-hashing",
- [NETIF_F_RXCSUM_BIT] = "rx-checksum",
- [NETIF_F_NOCACHE_COPY_BIT] = "tx-nocache-copy",
- [NETIF_F_LOOPBACK_BIT] = "loopback",
- [NETIF_F_RXFCS_BIT] = "rx-fcs",
- [NETIF_F_RXALL_BIT] = "rx-all",
- [NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
- [NETIF_F_HW_TC_BIT] = "hw-tc-offload",
- [NETIF_F_HW_ESP_BIT] = "esp-hw-offload",
- [NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload",
- [NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload",
- [NETIF_F_HW_TLS_RECORD_BIT] = "tls-hw-record",
- [NETIF_F_HW_TLS_TX_BIT] = "tls-hw-tx-offload",
- [NETIF_F_HW_TLS_RX_BIT] = "tls-hw-rx-offload",
-};
-
-static const char
-rss_hash_func_strings[ETH_RSS_HASH_FUNCS_COUNT][ETH_GSTRING_LEN] = {
- [ETH_RSS_HASH_TOP_BIT] = "toeplitz",
- [ETH_RSS_HASH_XOR_BIT] = "xor",
- [ETH_RSS_HASH_CRC32_BIT] = "crc32",
-};
-
-static const char
-tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
- [ETHTOOL_ID_UNSPEC] = "Unspec",
- [ETHTOOL_RX_COPYBREAK] = "rx-copybreak",
- [ETHTOOL_TX_COPYBREAK] = "tx-copybreak",
- [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout",
-};
-
-static const char
-phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
- [ETHTOOL_ID_UNSPEC] = "Unspec",
- [ETHTOOL_PHY_DOWNSHIFT] = "phy-downshift",
- [ETHTOOL_PHY_FAST_LINK_DOWN] = "phy-fast-link-down",
-};
-
static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
{
struct ethtool_gfeatures cmd = {
--
2.22.0
A significant part of GET request processing is common to most request
types, but it cannot easily be separated from type specific code because we
need to alternate between common actions (parsing the common request header,
allocating the message, filling netlink/genetlink headers etc.) and specific
actions (querying the device, composing the reply). The processing also
happens in three different situations: "do" request, "dump" request and
notification, each doing things in a slightly different way.
The request specific code is implemented in four or five callbacks defined
in an instance of struct get_request_ops:
parse_request() - parse incoming message
prepare_data() - retrieve data from driver or NIC
reply_size() - estimate reply message size
fill_reply() - compose reply message
cleanup() - (optional) clean up additional data
Other members of struct get_request_ops describe the data structure holding
information from the client request and data used to compose the message. The
standard handlers ethnl_get_doit(), ethnl_get_dumpit(), ethnl_get_start()
and ethnl_get_done() can then be used as genl_ops handlers. The notification
handler will be introduced in a later patch.
Signed-off-by: Michal Kubecek <[email protected]>
---
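Note for readers: the callback flow described above can be modelled in plain
userspace C. The sketch below is only an illustration under simplifying
assumptions: every demo_* name is hypothetical, the "MTU" request type does
not exist in this series, and a character buffer stands in for the netlink
message. It shows the single allocation split at repdata_offset and the order
in which the generic handler calls the type specific callbacks.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for struct ethnl_reply_data / struct ethnl_req_info. */
struct demo_reply_data {
	int dev_id;				/* device of the current reply */
};

struct demo_req_info {
	struct demo_reply_data *reply_data;	/* points into the same block */
	int dev_id;				/* device from client request */
};

/* Simplified stand-in for struct get_request_ops. */
struct demo_get_ops {
	size_t data_size;			/* size of the whole block */
	size_t repdata_offset;			/* start of the reply data part */
	int (*parse_request)(struct demo_req_info *req);	/* optional */
	int (*prepare_data)(struct demo_req_info *req);
	int (*reply_size)(const struct demo_req_info *req);
	int (*fill_reply)(char *buf, size_t len,
			  const struct demo_req_info *req);
	void (*cleanup)(struct demo_req_info *req);		/* optional */
};

/* Hypothetical request type: request info first, reply data second. */
struct demo_mtu_data {
	struct demo_req_info req;		/* must be at offset 0 */
	struct demo_reply_data reply;
	int mtu;				/* type specific reply data */
};

static int demo_mtu_prepare(struct demo_req_info *req)
{
	struct demo_mtu_data *data = (struct demo_mtu_data *)req;

	data->mtu = 1500 + req->reply_data->dev_id;	/* fake device query */
	return 0;
}

static int demo_mtu_reply_size(const struct demo_req_info *req)
{
	(void)req;
	return 32;				/* an upper estimate is enough */
}

static int demo_mtu_fill(char *buf, size_t len,
			 const struct demo_req_info *req)
{
	const struct demo_mtu_data *data = (const struct demo_mtu_data *)req;
	int n = snprintf(buf, len, "dev%d mtu=%d",
			 req->reply_data->dev_id, data->mtu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

const struct demo_get_ops demo_mtu_ops = {
	.data_size	= sizeof(struct demo_mtu_data),
	.repdata_offset	= offsetof(struct demo_mtu_data, reply),
	.prepare_data	= demo_mtu_prepare,
	.reply_size	= demo_mtu_reply_size,
	.fill_reply	= demo_mtu_fill,
};

/* Generic handler: common steps alternate with type specific callbacks. */
int demo_get_doit(const struct demo_get_ops *ops, int dev_id,
		  char *out, size_t out_len)
{
	struct demo_req_info *req = malloc(ops->data_size);
	int ret;

	if (!req)
		return -1;
	memset(req, 0, ops->repdata_offset);	/* request info part only */
	req->reply_data = (struct demo_reply_data *)
			  ((char *)req + ops->repdata_offset);
	req->dev_id = dev_id;
	if (ops->parse_request) {
		ret = ops->parse_request(req);
		if (ret < 0)
			goto out_free;
	}
	/* reply data part is reinitialized before each reply */
	memset(req->reply_data, 0, ops->data_size - ops->repdata_offset);
	req->reply_data->dev_id = dev_id;

	ret = ops->prepare_data(req);
	if (ret < 0)
		goto out_free;
	ret = ops->reply_size(req);
	if (ret < 0)
		goto out_cleanup;
	if ((size_t)ret > out_len) {
		ret = -1;
		goto out_cleanup;
	}
	ret = ops->fill_reply(out, (size_t)ret, req);
out_cleanup:
	if (ops->cleanup)
		ops->cleanup(req);
out_free:
	free(req);
	return ret;
}
```

In this model a dump would call the part from the second memset() to the
cleanup once per device while keeping the request info part untouched.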
net/ethtool/netlink.c | 337 ++++++++++++++++++++++++++++++++++++++++++
net/ethtool/netlink.h | 128 ++++++++++++++++
2 files changed, 465 insertions(+)
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index a7a0bfe1818c..6d326cc25aac 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -178,6 +178,343 @@ struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
return NULL;
}
+/* GET request helpers */
+
+/**
+ * struct ethnl_dump_ctx - context structure for generic dumpit() callback
+ * @ops: request ops of currently processed message type
+ * @req_info: parsed request header of processed request
+ * @pos_hash: saved iteration position - hashbucket
+ * @pos_idx: saved iteration position - index
+ *
+ * These parameters are kept in struct netlink_callback as context preserved
+ * between iterations. They are initialized by ethnl_get_start() and used in
+ * ethnl_get_dumpit() and ethnl_get_done().
+ */
+struct ethnl_dump_ctx {
+ const struct get_request_ops *ops;
+ struct ethnl_req_info *req_info;
+ int pos_hash;
+ int pos_idx;
+};
+
+static const struct get_request_ops *get_requests[__ETHTOOL_MSG_USER_CNT] = {
+};
+
+static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
+{
+ return (struct ethnl_dump_ctx *)cb->ctx;
+}
+
+/**
+ * ethnl_alloc_get_data() - Allocate and initialize data for a GET request
+ * @ops: instance of struct get_request_ops describing size and layout
+ *
+ * This initializes only the first part (req_info), second part (reply_data)
+ * is initialized before filling the reply data into it (which is done for
+ * each iteration in dump requests).
+ *
+ * Return: pointer to allocated and initialized data, NULL on error
+ */
+static struct ethnl_req_info *
+ethnl_alloc_get_data(const struct get_request_ops *ops)
+{
+ struct ethnl_req_info *req_info;
+
+ req_info = kmalloc(ops->data_size, GFP_KERNEL);
+ if (!req_info)
+ return NULL;
+
+ memset(req_info, '\0', ops->repdata_offset);
+ req_info->reply_data =
+ (struct ethnl_reply_data *)((char *)req_info +
+ ops->repdata_offset);
+
+ return req_info;
+}
+
+/**
+ * ethnl_free_get_data() - free GET request data
+ * @ops: instance of struct get_request_ops describing the layout
+ * @req_info: pointer to embedded struct ethnl_req_info (at offset 0)
+ *
+ * Calls ->cleanup() handler if defined and frees the data block.
+ */
+static void ethnl_free_get_data(const struct get_request_ops *ops,
+ struct ethnl_req_info *req_info)
+{
+ if (ops->cleanup)
+ ops->cleanup(req_info);
+ kfree(req_info);
+}
+
+/**
+ * ethnl_std_parse() - Parse request message
+ * @req_info: pointer to structure to put data into
+ * @nlhdr: pointer to request message header
+ * @net: request netns
+ * @request_ops: struct get_request_ops for request type
+ * @extack: netlink extack for error reporting
+ * @require_dev: fail if no device identified in header
+ *
+ * Parse universal request header and call request specific ->parse_request()
+ * callback (if defined) to parse the rest of the message.
+ *
+ * Return: 0 on success or negative error code
+ */
+static int ethnl_std_parse(struct ethnl_req_info *req_info,
+ const struct nlmsghdr *nlhdr, struct net *net,
+ const struct get_request_ops *request_ops,
+ struct netlink_ext_ack *extack, bool require_dev)
+{
+ struct nlattr **tb;
+ int ret;
+
+ tb = kmalloc_array(request_ops->max_attr + 1, sizeof(tb[0]),
+ GFP_KERNEL);
+ if (!tb)
+ return -ENOMEM;
+
+ ret = nlmsg_parse(nlhdr, GENL_HDRLEN, tb, request_ops->max_attr,
+ request_ops->request_policy, extack);
+ if (ret < 0)
+ goto out;
+ ret = ethnl_parse_header(req_info, tb[request_ops->hdr_attr], net,
+ extack, request_ops->header_policy,
+ require_dev);
+ if (ret < 0)
+ goto out;
+
+ if (request_ops->parse_request) {
+ ret = request_ops->parse_request(req_info, tb, extack);
+ if (ret < 0)
+ goto out;
+ }
+
+ if (req_info->req_mask == 0)
+ req_info->req_mask = request_ops->default_infomask;
+ if (req_info->req_flags & ~request_ops->all_reqflags) {
+ ret = -EOPNOTSUPP;
+ NL_SET_ERR_MSG_ATTR(extack, tb[ETHTOOL_A_HEADER_RFLAGS],
+ "unsupported request specific flags");
+ goto out;
+ }
+
+ ret = 0;
+out:
+ kfree(tb);
+ return ret;
+}
+
+/**
+ * ethnl_init_reply_data() - Initialize reply data for GET request
+ * @req_info: pointer to embedded struct ethnl_req_info
+ * @ops: instance of struct get_request_ops describing the layout
+ * @dev: network device to initialize the reply for
+ *
+ * Fills the reply data part with zeros and sets the dev member. Must be called
+ * before calling the ->fill_reply() callback (for each iteration when handling
+ * dump requests).
+ */
+static void ethnl_init_reply_data(const struct ethnl_req_info *req_info,
+ const struct get_request_ops *ops,
+ struct net_device *dev)
+{
+ memset(req_info->reply_data, '\0',
+ ops->data_size - ops->repdata_offset);
+ req_info->reply_data->dev = dev;
+}
+
+/* generic ->doit() handler for GET type requests */
+static int ethnl_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+ const u8 cmd = info->genlhdr->cmd;
+ const struct get_request_ops *ops;
+ struct ethnl_req_info *req_info;
+ struct sk_buff *rskb;
+ void *reply_payload;
+ int reply_len;
+ int ret;
+
+ ops = get_requests[cmd];
+ if (WARN_ONCE(!ops, "cmd %u has no get_request_ops\n", cmd))
+ return -EOPNOTSUPP;
+ req_info = ethnl_alloc_get_data(ops);
+ if (!req_info)
+ return -ENOMEM;
+ ret = ethnl_std_parse(req_info, info->nlhdr, genl_info_net(info), ops,
+ info->extack, !ops->allow_nodev_do);
+ if (ret < 0)
+ goto err_dev;
+ req_info->privileged = ethnl_is_privileged(skb);
+ ethnl_init_reply_data(req_info, ops, req_info->dev);
+
+ rtnl_lock();
+ ret = ops->prepare_data(req_info, info);
+ if (ret < 0)
+ goto err_rtnl;
+ ret = ops->reply_size(req_info);
+ if (ret < 0)
+ goto err_cleanup;
+ reply_len = ret;
+ ret = -ENOMEM;
+ rskb = ethnl_reply_init(reply_len, req_info->dev, ops->reply_cmd,
+ ops->hdr_attr, info, &reply_payload);
+ if (!rskb)
+ goto err_cleanup;
+ ret = ops->fill_reply(rskb, req_info);
+ if (ret < 0)
+ goto err_msg;
+ rtnl_unlock();
+
+ genlmsg_end(rskb, reply_payload);
+ if (req_info->dev)
+ dev_put(req_info->dev);
+ ethnl_free_get_data(ops, req_info);
+ return genlmsg_reply(rskb, info);
+
+err_msg:
+ WARN_ONCE(ret == -EMSGSIZE,
+ "calculated message payload length (%d) not sufficient\n",
+ reply_len);
+ nlmsg_free(rskb);
+err_cleanup:
+ if (ops->cleanup)
+ ops->cleanup(req_info);
+err_rtnl:
+ rtnl_unlock();
+err_dev:
+ if (req_info->dev)
+ dev_put(req_info->dev);
+ /* free only after the last access to req_info */
+ kfree(req_info);
+ return ret;
+}
+
+static int ethnl_get_dump_one(struct sk_buff *skb,
+ struct net_device *dev,
+ const struct get_request_ops *ops,
+ struct ethnl_req_info *req_info)
+{
+ int ret;
+
+ ethnl_init_reply_data(req_info, ops, dev);
+ rtnl_lock();
+ ret = ops->prepare_data(req_info, NULL);
+ if (ret < 0)
+ goto out;
+ ret = ethnl_fill_reply_header(skb, dev, ops->hdr_attr);
+ if (ret < 0)
+ goto out_cleanup;
+ ret = ops->fill_reply(skb, req_info);
+
+out_cleanup:
+ if (ops->cleanup)
+ ops->cleanup(req_info);
+out:
+ rtnl_unlock();
+ req_info->reply_data->dev = NULL;
+ return ret;
+}
+
+/* generic ->dumpit() handler for GET requests; device iteration copied from
+ * rtnl_dump_ifinfo()
+ */
+static int ethnl_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct ethnl_dump_ctx *ctx = ethnl_dump_context(cb);
+ struct ethnl_req_info *req_info = ctx->req_info;
+ const struct get_request_ops *ops = ctx->ops;
+ struct net *net = sock_net(skb->sk);
+ int s_idx = ctx->pos_idx;
+ struct hlist_head *head;
+ struct net_device *dev;
+ int h, idx = 0;
+ int ret = 0;
+ void *ehdr;
+
+ for (h = ctx->pos_hash; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
+ idx = 0;
+ head = &net->dev_index_head[h];
+ hlist_for_each_entry(dev, head, index_hlist) {
+ if (idx < s_idx)
+ goto cont;
+ ehdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq,
+ &ethtool_genl_family, 0,
+ ops->reply_cmd);
+ ret = ethnl_get_dump_one(skb, dev, ops, req_info);
+ if (ret < 0) {
+ genlmsg_cancel(skb, ehdr);
+ if (ret == -EOPNOTSUPP)
+ goto cont;
+ if (likely(skb->len))
+ goto out;
+ goto out_err;
+ }
+ genlmsg_end(skb, ehdr);
+cont:
+ idx++;
+ }
+ }
+out:
+ ret = skb->len;
+out_err:
+ ctx->pos_hash = h;
+ ctx->pos_idx = idx;
+ cb->seq = net->dev_base_seq;
+ nl_dump_check_consistent(cb, nlmsg_hdr(skb));
+
+ return ret;
+}
+
+/* generic ->start() handler for GET requests */
+static int ethnl_get_start(struct netlink_callback *cb)
+{
+ struct ethnl_dump_ctx *ctx = ethnl_dump_context(cb);
+ const struct get_request_ops *ops;
+ struct ethnl_req_info *req_info;
+ struct genlmsghdr *ghdr;
+ int ret;
+
+ BUILD_BUG_ON(sizeof(*ctx) > sizeof(cb->ctx));
+
+ ghdr = nlmsg_data(cb->nlh);
+ ops = get_requests[ghdr->cmd];
+ if (WARN_ONCE(!ops, "cmd %u has no get_request_ops\n", ghdr->cmd))
+ return -EOPNOTSUPP;
+ req_info = ethnl_alloc_get_data(ops);
+ if (!req_info)
+ return -ENOMEM;
+
+ ret = ethnl_std_parse(req_info, cb->nlh, sock_net(cb->skb->sk), ops,
+ cb->extack, false);
+ if (req_info->dev) {
+ /* We ignore device specification in dump requests but as the
+ * same parser as for non-dump (doit) requests is used, it
+ * would take reference to the device if it finds one
+ */
+ dev_put(req_info->dev);
+ req_info->dev = NULL;
+ }
+ if (ret < 0) {
+ kfree(req_info);
+ return ret;
+ }
+ req_info->privileged = ethnl_is_privileged(cb->skb);
+
+ ctx->ops = ops;
+ ctx->req_info = req_info;
+ ctx->pos_hash = 0;
+ ctx->pos_idx = 0;
+
+ return 0;
+}
+
+/* generic ->done() handler for GET requests */
+static int ethnl_get_done(struct netlink_callback *cb)
+{
+ struct ethnl_dump_ctx *ctx = ethnl_dump_context(cb);
+
+ kfree(ctx->req_info);
+
+ return 0;
+}
+
/* notifications */
typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 7f1b9ec1ace7..6a9695c3b0c6 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -143,8 +143,32 @@ static inline unsigned int ethnl_reply_header_size(void)
nla_total_size(IFNAMSIZ));
}
+/* GET request handling */
+
+struct ethnl_reply_data;
+
+/* The structure holding data for unified processing GET requests consists of
+ * two parts: request info and reply data. Request info is related to client
+ * request and for dump request it stays constant through all processing;
+ * reply data contains data for composing a reply message. When processing
+ * a dump request, request info is filled only once but reply data is filled
+ * from scratch for each reply message.
+ *
+ * +-----------------+-----------------+------------------+-----------------+
+ * | common_req_info | specific info | ethnl_reply_data | specific data |
+ * +-----------------+-----------------+------------------+-----------------+
+ * |<---------- request info --------->|<----------- reply data ----------->|
+ *
+ * Request info always starts at offset 0 with struct ethnl_req_info which
+ * holds information from parsing the common header. It may be followed by
+ * other members for request attributes specific for current message type.
+ * Reply data starts with struct ethnl_reply_data which may be followed by
+ * other members holding data needed to compose a message.
+ */
+
/**
* struct ethnl_req_info - base type of request information for GET requests
+ * @reply_data: pointer to reply data within the same block
* @dev: network device the request is for (may be null)
* @req_mask: request mask, bitmap of requested information
* @global_flags: request flags common for all request types
@@ -154,6 +178,7 @@ static inline unsigned int ethnl_reply_header_size(void)
* This is a common base, additional members may follow after this structure.
*/
struct ethnl_req_info {
+ struct ethnl_reply_data *reply_data;
struct net_device *dev;
u32 req_mask;
u32 global_flags;
@@ -161,4 +186,107 @@ struct ethnl_req_info {
bool privileged;
};
+/**
+ * struct ethnl_reply_data - base type of reply data for GET requests
+ * @dev: device for current reply message; in single shot requests it is
+ * equal to &ethnl_req_info.dev; in dumps it's different for each
+ * reply message
+ * @info_mask: bitmap of information actually provided in reply; it is a subset
+ * of &ethnl_req_info.req_mask with cleared bits corresponding to
+ * information which cannot be provided
+ *
+ * This structure is usually followed by additional members filled by
+ * ->prepare_data() and used by ->cleanup().
+ */
+struct ethnl_reply_data {
+ struct net_device *dev;
+ u32 info_mask;
+};
+
+static inline int ethnl_before_ops(struct net_device *dev)
+{
+ if (dev && dev->ethtool_ops->begin)
+ return dev->ethtool_ops->begin(dev);
+ else
+ return 0;
+}
+
+static inline void ethnl_after_ops(struct net_device *dev)
+{
+ if (dev && dev->ethtool_ops->complete)
+ dev->ethtool_ops->complete(dev);
+}
+
+/**
+ * struct get_request_ops - unified handling of GET requests
+ * @request_cmd: command id for request (GET)
+ * @reply_cmd: command id for reply (GET_REPLY)
+ * @hdr_attr: attribute type for request header
+ * @max_attr: maximum (top level) attribute type
+ * @data_size: total length of data structure
+ * @repdata_offset: offset of "reply data" part (struct ethnl_reply_data)
+ * @request_policy: netlink policy for message contents
+ * @header_policy: (optional) netlink policy for request header
+ * @default_infomask: default infomask (to use if none specified)
+ * @all_reqflags: allowed request specific flags
+ * @allow_nodev_do: allow non-dump request with no device identification
+ * @parse_request:
+ * Parse request except common header (struct ethnl_req_info). Common
+ * header is already filled on entry, the rest up to @repdata_offset
+ * is zero initialized. This callback should only modify type specific
+ * request info by parsed attributes from request message.
+ * @prepare_data:
+ * Retrieve and prepare data needed to compose a reply message. Calls to
+ * ethtool_ops handlers should be limited to this callback. Common reply
+ * data (struct ethnl_reply_data) is filled on entry, type specific part
+ * after it is zero initialized. This callback should only modify the
+ * type specific part of reply data. Device identification from struct
+ * ethnl_reply_data is to be used as for dump requests, it iterates
+ * through network devices while ethnl_req_info::dev points to the
+ * device from client request.
+ * @reply_size:
+ * Estimate reply message size. Returned value must be sufficient for
+ * message payload without common reply header. The callback may return an
+ * estimate higher than actual message size if exact calculation would
+ * not be worth the saved memory space.
+ * @fill_reply:
+ * Fill reply message payload (except for common header) from reply data.
+ * The callback must not generate more payload than previously called
+ * ->reply_size() estimated.
+ * @cleanup:
+ * Optional cleanup called when reply data is no longer needed. Can be
+ * used e.g. to free any additional data structures outside the main
+ * structure which were allocated by ->prepare_data(). When processing
+ * dump requests, ->cleanup() is called for each message.
+ *
+ * Description of variable parts of GET request handling when using the unified
+ * infrastructure. When used, a pointer to an instance of this structure is to
+ * be added to &get_requests array and generic handlers ethnl_get_doit(),
+ * ethnl_get_dumpit(), ethnl_get_start() and ethnl_get_done() used in
+ * @ethtool_genl_ops.
+ */
+struct get_request_ops {
+ u8 request_cmd;
+ u8 reply_cmd;
+ u16 hdr_attr;
+ unsigned int max_attr;
+ unsigned int data_size;
+ unsigned int repdata_offset;
+ const struct nla_policy *request_policy;
+ const struct nla_policy *header_policy;
+ u32 default_infomask;
+ u32 all_reqflags;
+ bool allow_nodev_do;
+
+ int (*parse_request)(struct ethnl_req_info *req_info,
+ struct nlattr **tb,
+ struct netlink_ext_ack *extack);
+ int (*prepare_data)(struct ethnl_req_info *req_info,
+ struct genl_info *info);
+ int (*reply_size)(const struct ethnl_req_info *req_info);
+ int (*fill_reply)(struct sk_buff *skb,
+ const struct ethnl_req_info *req_info);
+ void (*cleanup)(struct ethnl_req_info *req_info);
+};
+
#endif /* _NET_ETHTOOL_NETLINK_H */
--
2.22.0
Requests contents of one or more string sets, i.e. indexed arrays of
strings; this information is provided by ETHTOOL_GSSET_INFO and
ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all
information can be retrieved with one request and multiple string sets can
be requested at once.
There are three types of requests:
- no NLM_F_DUMP, no device: get "global" stringsets
- no NLM_F_DUMP, with device: get string sets related to the device
- NLM_F_DUMP, no device: get device related string sets for all devices
Client can request either all string sets of given type (global or device
related) or only specific sets. With ETHTOOL_RF_STRSET_COUNTS flag set, only
set sizes (numbers of strings) are returned.
Signed-off-by: Michal Kubecek <[email protected]>
---
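Note for readers: the three request types listed above reduce to a small
decision on the NLM_F_DUMP flag and the presence of a device in the header.
The sketch below is illustrative only; the demo_* names are hypothetical and
do not appear in net/ethtool/strset.c. Treating a dump that also names a
device as "all devices" is an assumption based on the generic dump handler
ignoring device identification.

```c
#include <stdbool.h>

enum demo_strset_scope {
	DEMO_STRSET_GLOBAL,	/* no NLM_F_DUMP, no device */
	DEMO_STRSET_ONE_DEV,	/* no NLM_F_DUMP, with device */
	DEMO_STRSET_ALL_DEVS,	/* NLM_F_DUMP */
};

/* Classify a STRSET_GET request by dump flag and device presence. */
enum demo_strset_scope demo_strset_scope(bool dump, bool have_dev)
{
	if (dump)
		return DEMO_STRSET_ALL_DEVS;	/* device in header is ignored */
	return have_dev ? DEMO_STRSET_ONE_DEV : DEMO_STRSET_GLOBAL;
}

/* Number of string entries a reply carries for a set of @set_size strings. */
unsigned int demo_strset_nstrings(bool counts_only, unsigned int set_size)
{
	return counts_only ? 0 : set_size;	/* counts flag: sizes only */
}
```

With the counts flag a client can size its buffers first and fetch the
strings in a second request only if it needs them.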
Documentation/networking/ethtool-netlink.txt | 57 ++-
include/uapi/linux/ethtool.h | 2 +
include/uapi/linux/ethtool_netlink.h | 60 +++
net/ethtool/Makefile | 2 +-
net/ethtool/netlink.c | 8 +
net/ethtool/netlink.h | 4 +
net/ethtool/strset.c | 453 +++++++++++++++++++
7 files changed, 583 insertions(+), 3 deletions(-)
create mode 100644 net/ethtool/strset.c
diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
index 4636682c551f..21e5030734aa 100644
--- a/Documentation/networking/ethtool-netlink.txt
+++ b/Documentation/networking/ethtool-netlink.txt
@@ -148,6 +148,14 @@ according to message purpose:
_ACT_REPLY kernel reply to an ACT request
_NTF kernel notification
+Userspace to kernel:
+
+ ETHTOOL_MSG_STRSET_GET get string set
+
+Kernel to userspace:
+
+ ETHTOOL_MSG_STRSET_GET_REPLY string set contents
+
"GET" requests are sent by userspace applications to retrieve device
information. They usually do not contain any message specific attributes.
Kernel replies with corresponding "GET_REPLY" message. For most types, "GET"
@@ -178,6 +186,51 @@ action also triggers a notification ("NTF" message).
Later sections describe the format and semantics of these messages.
+STRSET_GET
+----------
+
+Requests contents of a string set as provided by ioctl commands
+ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS. String sets are not user writeable so
+that the corresponding SET_STRSET message is only used in kernel replies.
+There are two types of string sets: global (independent of a device, e.g.
+device feature names) and device specific (e.g. device private flags).
+
+Request contents:
+
+ ETHTOOL_A_STRSET_HEADER (nested) request header
+ ETHTOOL_A_STRSET_STRINGSETS (nested) string set to request
+ ETHTOOL_A_STRINGSETS_STRINGSET+ (nested) one string set
+ ETHTOOL_A_STRINGSET_ID (u32) set id
+
+Request specific flag:
+
+ ETHTOOL_RF_STRSET_COUNTS send only string counts in reply
+
+Kernel response contents:
+
+ ETHTOOL_A_STRSET_HEADER (nested) reply header
+ ETHTOOL_A_STRSET_STRINGSETS (nested) array of string sets
+ ETHTOOL_A_STRINGSETS_STRINGSET+ (nested) one string set
+ ETHTOOL_A_STRINGSET_ID (u32) set id
+ ETHTOOL_A_STRINGSET_COUNT (u32) number of strings
+ ETHTOOL_A_STRINGSET_STRINGS (nested) array of strings
+ ETHTOOL_A_STRINGS_STRING+ (nested) one string
+ ETHTOOL_A_STRING_INDEX (u32) string index
+ ETHTOOL_A_STRING_VALUE (string) string value
+
+Device identification in request header is optional. Depending on its presence
+and NLM_F_DUMP flag, there are three types of STRSET_GET requests:
+
+ - no NLM_F_DUMP, no device: get "global" stringsets
+ - no NLM_F_DUMP, with device: get string sets related to the device
+ - NLM_F_DUMP, no device: get device related string sets for all devices
+
+If there is no ETHTOOL_A_STRSET_STRINGSETS array, all string sets of requested
+type are returned, otherwise only those specified in the request.
+Flag ETHTOOL_RF_STRSET_COUNTS tells kernel to only return string counts of the
+sets, not the actual strings.
+
+
Request translation
-------------------
@@ -212,7 +265,7 @@ ETHTOOL_STXCSUM n/a
ETHTOOL_GSG n/a
ETHTOOL_SSG n/a
ETHTOOL_TEST n/a
-ETHTOOL_GSTRINGS n/a
+ETHTOOL_GSTRINGS ETHTOOL_MSG_STRSET_GET
ETHTOOL_PHYS_ID n/a
ETHTOOL_GSTATS n/a
ETHTOOL_GTSO n/a
@@ -240,7 +293,7 @@ ETHTOOL_FLASHDEV n/a
ETHTOOL_RESET n/a
ETHTOOL_SRXNTUPLE n/a
ETHTOOL_GRXNTUPLE n/a
-ETHTOOL_GSSET_INFO n/a
+ETHTOOL_GSSET_INFO ETHTOOL_MSG_STRSET_GET
ETHTOOL_GRXFHINDIR n/a
ETHTOOL_SRXFHINDIR n/a
ETHTOOL_GFEATURES n/a
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index dd06302aa93e..4e4e28e77c7a 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -582,6 +582,8 @@ enum ethtool_stringset {
ETH_SS_TUNABLES,
ETH_SS_PHY_STATS,
ETH_SS_PHY_TUNABLES,
+
+ ETH_SS_COUNT
};
/**
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 8938a1f09057..11b8519d2c1d 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -14,6 +14,7 @@
/* message types - userspace to kernel */
enum {
ETHTOOL_MSG_USER_NONE,
+ ETHTOOL_MSG_STRSET_GET,
/* add new constants above here */
__ETHTOOL_MSG_USER_CNT,
@@ -23,6 +24,7 @@ enum {
/* message types - kernel to userspace */
enum {
ETHTOOL_MSG_KERNEL_NONE,
+ ETHTOOL_MSG_STRSET_GET_REPLY,
/* add new constants above here */
__ETHTOOL_MSG_KERNEL_CNT,
@@ -87,6 +89,64 @@ enum {
ETHTOOL_A_BITSET_MAX = (__ETHTOOL_A_BITSET_CNT - 1)
};
+/* string sets */
+
+enum {
+ ETHTOOL_A_STRING_UNSPEC,
+ ETHTOOL_A_STRING_INDEX, /* u32 */
+ ETHTOOL_A_STRING_VALUE, /* string */
+
+ /* add new constants above here */
+ __ETHTOOL_A_STRING_CNT,
+ ETHTOOL_A_STRING_MAX = (__ETHTOOL_A_STRING_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_STRINGS_UNSPEC,
+ ETHTOOL_A_STRINGS_STRING, /* nest - _A_STRING_* */
+
+ /* add new constants above here */
+ __ETHTOOL_A_STRINGS_CNT,
+ ETHTOOL_A_STRINGS_MAX = (__ETHTOOL_A_STRINGS_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_STRINGSET_UNSPEC,
+ ETHTOOL_A_STRINGSET_ID, /* u32 */
+ ETHTOOL_A_STRINGSET_COUNT, /* u32 */
+ ETHTOOL_A_STRINGSET_STRINGS, /* nest - _A_STRINGS_* */
+
+ /* add new constants above here */
+ __ETHTOOL_A_STRINGSET_CNT,
+ ETHTOOL_A_STRINGSET_MAX = (__ETHTOOL_A_STRINGSET_CNT - 1)
+};
+
+/* STRSET */
+
+enum {
+ ETHTOOL_A_STRSET_UNSPEC,
+ ETHTOOL_A_STRSET_HEADER, /* nest - _A_HEADER_* */
+ ETHTOOL_A_STRSET_STRINGSETS, /* nest - _A_STRINGSETS_* */
+
+ /* add new constants above here */
+ __ETHTOOL_A_STRSET_CNT,
+ ETHTOOL_A_STRSET_MAX = (__ETHTOOL_A_STRSET_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_STRINGSETS_UNSPEC,
+ ETHTOOL_A_STRINGSETS_STRINGSET, /* nest - _A_STRINGSET_* */
+
+ /* add new constants above here */
+ __ETHTOOL_A_STRINGSETS_CNT,
+ ETHTOOL_A_STRINGSETS_MAX = (__ETHTOOL_A_STRINGSETS_CNT - 1)
+};
+
+/* return only string counts, not the strings */
+#define ETHTOOL_RF_STRSET_COUNTS (1 << 0)
+
+#define ETHTOOL_RF_STRSET_ALL (ETHTOOL_RF_STRSET_COUNTS)
+
/* generic netlink info */
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 11782306593b..11ceb00821b3 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -4,4 +4,4 @@ obj-y += ioctl.o common.o
obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
-ethtool_nl-y := netlink.o bitset.o
+ethtool_nl-y := netlink.o bitset.o strset.o
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 6d326cc25aac..41d7fedd3dd6 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -199,6 +199,7 @@ struct ethnl_dump_ctx {
};
static const struct get_request_ops *get_requests[__ETHTOOL_MSG_USER_CNT] = {
+ [ETHTOOL_MSG_STRSET_GET] = &strset_request_ops,
};
static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -544,6 +545,13 @@ EXPORT_SYMBOL(ethtool_notify);
/* genetlink setup */
static const struct genl_ops ethtool_genl_ops[] = {
+ {
+ .cmd = ETHTOOL_MSG_STRSET_GET,
+ .doit = ethnl_get_doit,
+ .start = ethnl_get_start,
+ .dumpit = ethnl_get_dumpit,
+ .done = ethnl_get_done,
+ },
};
static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 6a9695c3b0c6..2352fd9c17c3 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -289,4 +289,8 @@ struct get_request_ops {
void (*cleanup)(struct ethnl_req_info *req_info);
};
+/* request handlers */
+
+extern const struct get_request_ops strset_request_ops;
+
#endif /* _NET_ETHTOOL_NETLINK_H */
diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
new file mode 100644
index 000000000000..fd7229379158
--- /dev/null
+++ b/net/ethtool/strset.c
@@ -0,0 +1,453 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+
+#include <linux/ethtool.h>
+#include <linux/phy.h>
+#include "netlink.h"
+#include "common.h"
+
+enum strset_type {
+ ETH_SS_TYPE_NONE,
+ ETH_SS_TYPE_LEGACY,
+ ETH_SS_TYPE_SIMPLE,
+};
+
+struct strset_info {
+ enum strset_type type;
+ bool per_dev;
+ bool free_data;
+ unsigned int count;
+ union {
+ const char (*legacy)[ETH_GSTRING_LEN];
+ const char * const *simple;
+ void *ptr;
+ } data;
+};
+
+static const struct strset_info info_template[] = {
+ [ETH_SS_TEST] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = true,
+ },
+ [ETH_SS_STATS] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = true,
+ },
+ [ETH_SS_PRIV_FLAGS] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = true,
+ },
+ [ETH_SS_NTUPLE_FILTERS] = {
+ .type = ETH_SS_TYPE_NONE,
+ },
+ [ETH_SS_FEATURES] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = false,
+ .count = ARRAY_SIZE(netdev_features_strings),
+ .data = { .legacy = netdev_features_strings },
+ },
+ [ETH_SS_RSS_HASH_FUNCS] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = false,
+ .count = ARRAY_SIZE(rss_hash_func_strings),
+ .data = { .legacy = rss_hash_func_strings },
+ },
+ [ETH_SS_TUNABLES] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = false,
+ .count = ARRAY_SIZE(tunable_strings),
+ .data = { .legacy = tunable_strings },
+ },
+ [ETH_SS_PHY_STATS] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = true,
+ },
+ [ETH_SS_PHY_TUNABLES] = {
+ .type = ETH_SS_TYPE_LEGACY,
+ .per_dev = false,
+ .count = ARRAY_SIZE(phy_tunable_strings),
+ .data = { .legacy = phy_tunable_strings },
+ },
+};
+
+struct strset_data {
+ struct ethnl_req_info reqinfo_base;
+ u32 req_ids;
+ bool counts_only;
+
+ /* everything below here will be reset for each device in dumps */
+ struct ethnl_reply_data repdata_base;
+ struct strset_info info[ETH_SS_COUNT];
+};
+
+static const struct nla_policy strset_get_policy[ETHTOOL_A_STRSET_MAX + 1] = {
+ [ETHTOOL_A_STRSET_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_STRSET_HEADER] = { .type = NLA_NESTED },
+ [ETHTOOL_A_STRSET_STRINGSETS] = { .type = NLA_NESTED },
+};
+
+static const struct nla_policy
+get_stringset_policy[ETHTOOL_A_STRINGSET_MAX + 1] = {
+ [ETHTOOL_A_STRINGSET_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_STRINGSET_ID] = { .type = NLA_U32 },
+ [ETHTOOL_A_STRINGSET_COUNT] = { .type = NLA_REJECT },
+ [ETHTOOL_A_STRINGSET_STRINGS] = { .type = NLA_REJECT },
+};
+
+/**
+ * strset_include() - test if a string set should be included in reply
+ * @data: pointer to request data structure
+ * @id: id of string set to check (ETH_SS_* constants)
+ */
+static bool strset_include(const struct strset_data *data, u32 id)
+{
+ bool per_dev;
+
+ BUILD_BUG_ON(ETH_SS_COUNT >= BITS_PER_BYTE * sizeof(data->req_ids));
+
+ if (data->req_ids)
+ return data->req_ids & (1U << id);
+ per_dev = data->info[id].per_dev;
+ if (data->info[id].type == ETH_SS_TYPE_NONE)
+ return false;
+
+ return data->repdata_base.dev ? per_dev : !per_dev;
+}
+
+static int strset_get_id(const struct nlattr *nest, u32 *val,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[ETHTOOL_A_STRINGSET_MAX + 1];
+ int ret;
+
+ ret = nla_parse_nested(tb, ETHTOOL_A_STRINGSET_MAX, nest,
+ get_stringset_policy, extack);
+ if (ret < 0)
+ return ret;
+ if (!tb[ETHTOOL_A_STRINGSET_ID])
+ return -EINVAL;
+
+ *val = nla_get_u32(tb[ETHTOOL_A_STRINGSET_ID]);
+ return 0;
+}
+
+static const struct nla_policy strset_hdr_policy[ETHTOOL_A_HEADER_MAX + 1] = {
+ [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
+ [ETHTOOL_A_HEADER_DEV_NAME] = { .type = NLA_NUL_STRING,
+ .len = IFNAMSIZ - 1 },
+ [ETHTOOL_A_HEADER_INFOMASK] = { .type = NLA_REJECT },
+ [ETHTOOL_A_HEADER_GFLAGS] = { .type = NLA_REJECT },
+ [ETHTOOL_A_HEADER_RFLAGS] = { .type = NLA_U32 },
+};
+
+static const struct nla_policy
+strset_stringsets_policy[ETHTOOL_A_STRINGSETS_MAX + 1] = {
+ [ETHTOOL_A_STRINGSETS_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_STRINGSETS_STRINGSET] = { .type = NLA_NESTED },
+};
+
+/* parse_request() handler */
+static int strset_parse(struct ethnl_req_info *req_info,
+ struct nlattr **tb, struct netlink_ext_ack *extack)
+{
+ struct strset_data *data =
+ container_of(req_info, struct strset_data, reqinfo_base);
+ struct nlattr *nest = tb[ETHTOOL_A_STRSET_STRINGSETS];
+ struct nlattr *attr;
+ int rem, ret;
+
+ if (!nest)
+ return 0;
+ ret = nla_validate_nested(nest, ETHTOOL_A_STRINGSETS_MAX,
+ strset_stringsets_policy, extack);
+ if (ret < 0)
+ return ret;
+
+ nla_for_each_nested(attr, nest, rem) {
+ u32 id;
+
+ if (WARN_ONCE(nla_type(attr) != ETHTOOL_A_STRINGSETS_STRINGSET,
+ "unexpected attrtype %u in ETHTOOL_A_STRSET_STRINGSETS\n",
+ nla_type(attr)))
+ return -EINVAL;
+
+ ret = strset_get_id(attr, &id, extack);
+ if (ret < 0)
+ return ret;
+ if (id >= ETH_SS_COUNT) {
+ NL_SET_ERR_MSG_ATTR(extack, attr,
+ "unknown string set id");
+ return -EOPNOTSUPP;
+ }
+
+ data->req_ids |= (1U << id);
+ }
+
+ return 0;
+}
+
+/* cleanup() handler - free allocated data (if any) */
+static void strset_cleanup(struct ethnl_req_info *req_info)
+{
+ struct strset_data *data =
+ container_of(req_info, struct strset_data, reqinfo_base);
+ unsigned int i;
+
+ for (i = 0; i < ETH_SS_COUNT; i++)
+ if (data->info[i].free_data) {
+ kfree(data->info[i].data.ptr);
+ data->info[i].data.ptr = NULL;
+ data->info[i].free_data = false;
+ }
+}
+
+static int strset_prepare_set(struct strset_info *info, struct net_device *dev,
+ unsigned int id, bool counts_only)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ void *strings;
+ int count, ret;
+
+ if (id == ETH_SS_PHY_STATS && dev->phydev &&
+ !ops->get_ethtool_phy_stats)
+ ret = phy_ethtool_get_sset_count(dev->phydev);
+ else if (ops->get_sset_count && ops->get_strings)
+ ret = ops->get_sset_count(dev, id);
+ else
+ ret = -EOPNOTSUPP;
+ if (ret <= 0) {
+ info->count = 0;
+ return 0;
+ }
+
+ count = ret;
+ if (!counts_only) {
+ strings = kcalloc(count, ETH_GSTRING_LEN, GFP_KERNEL);
+ if (!strings)
+ return -ENOMEM;
+ if (id == ETH_SS_PHY_STATS && dev->phydev &&
+ !ops->get_ethtool_phy_stats)
+ phy_ethtool_get_strings(dev->phydev, strings);
+ else
+ ops->get_strings(dev, id, strings);
+ info->data.legacy = strings;
+ info->free_data = true;
+ }
+ info->count = count;
+
+ return 0;
+}
+
+/* prepare_data() handler */
+static int strset_prepare(struct ethnl_req_info *req_info,
+ struct genl_info *info)
+{
+ struct strset_data *data =
+ container_of(req_info, struct strset_data, reqinfo_base);
+ struct net_device *dev = data->repdata_base.dev;
+ unsigned int i;
+ int ret;
+
+ BUILD_BUG_ON(ARRAY_SIZE(info_template) != ETH_SS_COUNT);
+ memcpy(&data->info, &info_template, sizeof(data->info));
+
+ if (!dev) {
+ for (i = 0; i < ETH_SS_COUNT; i++) {
+ if ((data->req_ids & (1U << i)) &&
+ data->info[i].per_dev) {
+ if (info)
+ GENL_SET_ERR_MSG(info, "requested per device strings without dev");
+ return -EINVAL;
+ }
+ }
+ }
+
+ ret = ethnl_before_ops(dev);
+ if (ret < 0)
+ goto err_strset;
+ for (i = 0; i < ETH_SS_COUNT; i++) {
+ if (!strset_include(data, i) || !data->info[i].per_dev)
+ continue;
+ if (WARN_ONCE(data->info[i].type != ETH_SS_TYPE_LEGACY,
+ "unexpected string set type %u\n",
+ data->info[i].type))
+ goto err_ops;
+
+ ret = strset_prepare_set(&data->info[i], dev, i,
+ data->counts_only);
+ if (ret < 0)
+ goto err_ops;
+ }
+ ethnl_after_ops(dev);
+
+ return 0;
+err_ops:
+ ethnl_after_ops(dev);
+err_strset:
+ strset_cleanup(req_info);
+ return ret;
+}
+
+/* calculate size of ETHTOOL_A_STRINGSETS_STRINGSET nest for one string set */
+static int strset_set_size(const struct strset_info *info, bool counts_only)
+{
+ unsigned int len = 0;
+ unsigned int i;
+
+ if (info->count == 0)
+ return 0;
+ if (counts_only)
+ return nla_total_size(2 * nla_total_size(sizeof(u32)));
+
+ for (i = 0; i < info->count; i++) {
+ const char *str;
+
+ if (info->type == ETH_SS_TYPE_LEGACY)
+ str = info->data.legacy[i];
+ else
+ str = info->data.simple[i];
+
+ /* ETHTOOL_A_STRING_INDEX, ETHTOOL_A_STRING_VALUE, nest */
+ len += nla_total_size(nla_total_size(sizeof(u32)) +
+ ethnl_str_size(str));
+ }
+ /* ETHTOOL_A_STRINGSET_ID, ETHTOOL_A_STRINGSET_COUNT */
+ len = 2 * nla_total_size(sizeof(u32)) + nla_total_size(len);
+
+ return nla_total_size(len);
+}
+
+/* reply_size() handler */
+static int strset_size(const struct ethnl_req_info *req_info)
+{
+ const struct strset_data *data =
+ container_of(req_info, struct strset_data, reqinfo_base);
+ unsigned int i;
+ int len = 0;
+ int ret;
+
+ len += ethnl_reply_header_size();
+ for (i = 0; i < ETH_SS_COUNT; i++) {
+ const struct strset_info *info = &data->info[i];
+
+ if (!strset_include(data, i) || info->type == ETH_SS_TYPE_NONE)
+ continue;
+
+ ret = strset_set_size(info, data->counts_only);
+ if (ret < 0)
+ return ret;
+ len += ret;
+ }
+
+ return len;
+}
+
+/* fill one string into reply */
+static int strset_fill_string(struct sk_buff *skb,
+ const struct strset_info *info, u32 idx)
+{
+ struct nlattr *string;
+ const char *value;
+
+ if (info->type == ETH_SS_TYPE_LEGACY)
+ value = info->data.legacy[idx];
+ else
+ value = info->data.simple[idx];
+
+ string = nla_nest_start(skb, ETHTOOL_A_STRINGS_STRING);
+ if (!string)
+ return -EMSGSIZE;
+ if (nla_put_u32(skb, ETHTOOL_A_STRING_INDEX, idx) ||
+ nla_put_string(skb, ETHTOOL_A_STRING_VALUE, value))
+ return -EMSGSIZE;
+ nla_nest_end(skb, string);
+
+ return 0;
+}
+
+/* fill one string set into reply */
+static int strset_fill_set(struct sk_buff *skb, const struct strset_data *data,
+ u32 id)
+{
+ const struct strset_info *info = &data->info[id];
+ struct nlattr *strings;
+ struct nlattr *nest;
+ unsigned int i = (unsigned int)(-1);
+
+ if (info->type == ETH_SS_TYPE_NONE)
+ return -EOPNOTSUPP;
+ if (info->count == 0)
+ return 0;
+ nest = nla_nest_start(skb, ETHTOOL_A_STRINGSETS_STRINGSET);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, ETHTOOL_A_STRINGSET_ID, id) ||
+ nla_put_u32(skb, ETHTOOL_A_STRINGSET_COUNT, info->count))
+ goto nla_put_failure;
+
+ if (!data->counts_only) {
+ strings = nla_nest_start(skb, ETHTOOL_A_STRINGSET_STRINGS);
+ if (!strings)
+ goto nla_put_failure;
+ for (i = 0; i < info->count; i++) {
+ if (strset_fill_string(skb, info, i) < 0)
+ goto nla_put_failure;
+ }
+ nla_nest_end(skb, strings);
+ }
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
+/* fill_reply() handler */
+static int strset_fill(struct sk_buff *skb,
+ const struct ethnl_req_info *req_info)
+{
+ const struct strset_data *data =
+ container_of(req_info, struct strset_data, reqinfo_base);
+ struct nlattr *nest;
+ unsigned int i;
+ int ret;
+
+ nest = nla_nest_start(skb, ETHTOOL_A_STRSET_STRINGSETS);
+ if (!nest)
+ return -EMSGSIZE;
+
+ for (i = 0; i < ETH_SS_COUNT; i++) {
+ if (strset_include(data, i)) {
+ ret = strset_fill_set(skb, data, i);
+ if (ret < 0)
+ goto nla_put_failure;
+ }
+ }
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return ret;
+}
+
+const struct get_request_ops strset_request_ops = {
+ .request_cmd = ETHTOOL_MSG_STRSET_GET,
+ .reply_cmd = ETHTOOL_MSG_STRSET_GET_REPLY,
+ .hdr_attr = ETHTOOL_A_STRSET_HEADER,
+ .max_attr = ETHTOOL_A_STRSET_MAX,
+ .data_size = sizeof(struct strset_data),
+ .repdata_offset = offsetof(struct strset_data, repdata_base),
+ .all_reqflags = ETHTOOL_RF_STRSET_ALL,
+ .allow_nodev_do = true,
+
+ .parse_request = strset_parse,
+ .prepare_data = strset_prepare,
+ .reply_size = strset_size,
+ .fill_reply = strset_fill,
+ .cleanup = strset_cleanup,
+};
--
2.22.0
Implement SETTINGS_GET netlink request to get link settings and link mode
information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl
commands.
The information in SETTINGS_GET_REPLY message sent as reply is divided into
two parts and client can use info mask in request header to select only
one of them:
- ETHTOOL_IM_SETTINGS_LINKINFO: physical port, phy MDIO address, MDI(-X)
status, MDI(-X) control and transceiver
- ETHTOOL_IM_SETTINGS_LINKMODES: supported and advertised link modes,
autonegotiation state, link speed and duplex
SETTINGS_GET request can be used with NLM_F_DUMP (and without device
identification) to request the information for all devices in current
network namespace providing the data.
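
As a rough illustration of the info mask described above, here is a minimal
userspace-side sketch. The two ETHTOOL_IM_SETTINGS_* values are copied from
the uapi header added by this patch; the helper functions
settings_mask_valid(), settings_mask_parts() and settings_mask_selfcheck()
are hypothetical names introduced only for this example and are not part of
the patch. It only shows how the bits combine, not how the request is
actually built or sent.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from include/uapi/linux/ethtool_netlink.h in this patch. */
#define ETHTOOL_IM_SETTINGS_LINKINFO	(1U << 0)
#define ETHTOOL_IM_SETTINGS_LINKMODES	(1U << 1)
#define ETHTOOL_IM_SETTINGS_ALL	(ETHTOOL_IM_SETTINGS_LINKINFO | \
				 ETHTOOL_IM_SETTINGS_LINKMODES)

/* Hypothetical helper: true if the mask uses only bits defined here. */
static int settings_mask_valid(uint32_t mask)
{
	return (mask & ~ETHTOOL_IM_SETTINGS_ALL) == 0;
}

/* Hypothetical helper: how many reply nests the mask selects. */
static unsigned int settings_mask_parts(uint32_t mask)
{
	unsigned int parts = 0;

	if (mask & ETHTOOL_IM_SETTINGS_LINKINFO)
		parts++;	/* ETHTOOL_A_SETTINGS_LINK_INFO nest */
	if (mask & ETHTOOL_IM_SETTINGS_LINKMODES)
		parts++;	/* ETHTOOL_A_SETTINGS_LINK_MODES nest */
	return parts;
}

/* Hypothetical self-check: both bits together select both nests. */
static void settings_mask_selfcheck(void)
{
	assert(settings_mask_valid(ETHTOOL_IM_SETTINGS_ALL));
	assert(settings_mask_parts(ETHTOOL_IM_SETTINGS_ALL) == 2);
}
```

A client interested only in link modes would pass
ETHTOOL_IM_SETTINGS_LINKMODES, for which settings_mask_parts() returns 1.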
Signed-off-by: Michal Kubecek <[email protected]>
---
Documentation/networking/ethtool-netlink.txt | 48 +++-
include/linux/ethtool_netlink.h | 3 +
include/uapi/linux/ethtool_netlink.h | 49 ++++
net/ethtool/Makefile | 2 +-
net/ethtool/common.c | 48 ++++
net/ethtool/common.h | 4 +
net/ethtool/ioctl.c | 48 ----
net/ethtool/netlink.c | 8 +
net/ethtool/netlink.h | 1 +
net/ethtool/settings.c | 259 +++++++++++++++++++
10 files changed, 419 insertions(+), 51 deletions(-)
create mode 100644 net/ethtool/settings.c
diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
index 21e5030734aa..1d803488e02c 100644
--- a/Documentation/networking/ethtool-netlink.txt
+++ b/Documentation/networking/ethtool-netlink.txt
@@ -151,10 +151,12 @@ according to message purpose:
Userspace to kernel:
ETHTOOL_MSG_STRSET_GET get string set
+ ETHTOOL_MSG_SETTINGS_GET get device settings
Kernel to userspace:
ETHTOOL_MSG_STRSET_GET_REPLY string set contents
+ ETHTOOL_MSG_SETTINGS_GET_REPLY device settings
"GET" requests are sent by userspace applications to retrieve device
information. They usually do not contain any message specific attributes.
@@ -231,6 +233,48 @@ Flag ETHTOOL_A_STRSET_COUNTS tells kernel to only return string counts of the
sets, not the actual strings.
+SETTINGS_GET
+------------
+
+SETTINGS_GET request retrieves information provided by ETHTOOL_GLINKSETTINGS
+and ETHTOOL_GSET ioctl requests. The request doesn't use any attributes other
+than the request header.
+
+Request attributes:
+
+ ETHTOOL_A_SETTINGS_HEADER (nested) request header
+
+Info mask bits meaning:
+
+ ETHTOOL_IM_SETTINGS_LINKINFO link settings
+ ETHTOOL_IM_SETTINGS_LINKMODES link modes and related
+
+Response contents:
+
+ ETHTOOL_A_SETTINGS_HEADER (nested) reply header
+ ETHTOOL_A_SETTINGS_LINK_INFO (nested) link settings
+ ETHTOOL_A_LINKINFO_PORT (u8) physical port
+ ETHTOOL_A_LINKINFO_PHYADDR (u8) phy MDIO address
+ ETHTOOL_A_LINKINFO_TP_MDIX (u8) MDI(-X) status
+ ETHTOOL_A_LINKINFO_TP_MDIX_CTRL (u8) MDI(-X) control
+ ETHTOOL_A_LINKINFO_TRANSCEIVER (u8) transceiver
+ ETHTOOL_A_SETTINGS_LINK_MODES (nested) link modes
+ ETHTOOL_A_LINKMODES_AUTONEG (u8) autonegotiation status
+ ETHTOOL_A_LINKMODES_OURS (bitset) advertised link modes
+ ETHTOOL_A_LINKMODES_PEER (bitset) partner link modes
+ ETHTOOL_A_LINKMODES_SPEED (u32) link speed (Mb/s)
+ ETHTOOL_A_LINKMODES_DUPLEX (u8) duplex mode
+
+Most of the attributes and their values have the same meaning as matching
+members of the corresponding ioctl structures. For ETHTOOL_A_LINKMODES_OURS,
+value represents advertised modes and mask represents supported modes.
+ETHTOOL_A_LINKMODES_PEER in the reply is a bit list.
+
+SETTINGS_GET requests allow dumps. Messages in the same format as the reply are
+also broadcast as notifications when these settings change, whether through the
+netlink or the ioctl ethtool interface.
+
+
Request translation
-------------------
@@ -240,7 +284,7 @@ have their netlink replacement yet.
ioctl command netlink command
---------------------------------------------------------------------
-ETHTOOL_GSET n/a
+ETHTOOL_GSET ETHTOOL_MSG_SETTINGS_GET
ETHTOOL_SSET n/a
ETHTOOL_GDRVINFO n/a
ETHTOOL_GREGS n/a
@@ -314,7 +358,7 @@ ETHTOOL_GTUNABLE n/a
ETHTOOL_STUNABLE n/a
ETHTOOL_GPHYSTATS n/a
ETHTOOL_PERQUEUE n/a
-ETHTOOL_GLINKSETTINGS n/a
+ETHTOOL_GLINKSETTINGS ETHTOOL_MSG_SETTINGS_GET
ETHTOOL_SLINKSETTINGS n/a
ETHTOOL_PHY_GTUNABLE n/a
ETHTOOL_PHY_STUNABLE n/a
diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
index 2a15e64a16f3..e770e6e9acca 100644
--- a/include/linux/ethtool_netlink.h
+++ b/include/linux/ethtool_netlink.h
@@ -7,6 +7,9 @@
#include <linux/ethtool.h>
#include <linux/netdevice.h>
+#define __ETHTOOL_LINK_MODE_MASK_NWORDS \
+ DIV_ROUND_UP(__ETHTOOL_LINK_MODE_MASK_NBITS, 32)
+
enum ethtool_multicast_groups {
ETHNL_MCGRP_MONITOR,
};
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 11b8519d2c1d..a046dd8da50e 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -15,6 +15,7 @@
enum {
ETHTOOL_MSG_USER_NONE,
ETHTOOL_MSG_STRSET_GET,
+ ETHTOOL_MSG_SETTINGS_GET,
/* add new constants above here */
__ETHTOOL_MSG_USER_CNT,
@@ -25,6 +26,7 @@ enum {
enum {
ETHTOOL_MSG_KERNEL_NONE,
ETHTOOL_MSG_STRSET_GET_REPLY,
+ ETHTOOL_MSG_SETTINGS_GET_REPLY,
/* add new constants above here */
__ETHTOOL_MSG_KERNEL_CNT,
@@ -147,6 +149,53 @@ enum {
#define ETHTOOL_RF_STRSET_ALL (ETHTOOL_RF_STRSET_COUNTS)
+/* SETTINGS */
+
+enum {
+ ETHTOOL_A_SETTINGS_UNSPEC,
+ ETHTOOL_A_SETTINGS_HEADER, /* nest - _A_HEADER_* */
+ ETHTOOL_A_SETTINGS_LINK_INFO, /* nest - _A_LINKINFO_* */
+ ETHTOOL_A_SETTINGS_LINK_MODES, /* nest - _A_LINKMODES_* */
+
+ /* add new constants above here */
+ __ETHTOOL_A_SETTINGS_CNT,
+ ETHTOOL_A_SETTINGS_MAX = (__ETHTOOL_A_SETTINGS_CNT - 1)
+};
+
+#define ETHTOOL_IM_SETTINGS_LINKINFO (1U << 0)
+#define ETHTOOL_IM_SETTINGS_LINKMODES (1U << 1)
+
+#define ETHTOOL_IM_SETTINGS_ALL (ETHTOOL_IM_SETTINGS_LINKINFO | \
+ ETHTOOL_IM_SETTINGS_LINKMODES)
+
+#define ETHTOOL_RF_SETTINGS_ALL 0
+
+enum {
+ ETHTOOL_A_LINKINFO_UNSPEC,
+ ETHTOOL_A_LINKINFO_PORT, /* u8 */
+ ETHTOOL_A_LINKINFO_PHYADDR, /* u8 */
+ ETHTOOL_A_LINKINFO_TP_MDIX, /* u8 */
+ ETHTOOL_A_LINKINFO_TP_MDIX_CTRL, /* u8 */
+ ETHTOOL_A_LINKINFO_TRANSCEIVER, /* u8 */
+
+ /* add new constants above here */
+ __ETHTOOL_A_LINKINFO_CNT,
+ ETHTOOL_A_LINKINFO_MAX = (__ETHTOOL_A_LINKINFO_CNT - 1)
+};
+
+enum {
+ ETHTOOL_A_LINKMODES_UNSPEC,
+ ETHTOOL_A_LINKMODES_AUTONEG, /* u8 */
+ ETHTOOL_A_LINKMODES_OURS, /* bitset */
+ ETHTOOL_A_LINKMODES_PEER, /* bitset */
+ ETHTOOL_A_LINKMODES_SPEED, /* u32 */
+ ETHTOOL_A_LINKMODES_DUPLEX, /* u8 */
+
+ /* add new constants above here */
+ __ETHTOOL_A_LINKMODES_CNT,
+ ETHTOOL_A_LINKMODES_MAX = (__ETHTOOL_A_LINKMODES_CNT - 1)
+};
+
/* generic netlink info */
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 11ceb00821b3..1155e5e9ef69 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -4,4 +4,4 @@ obj-y += ioctl.o common.o
obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
-ethtool_nl-y := netlink.o bitset.o strset.o
+ethtool_nl-y := netlink.o bitset.o strset.o settings.o
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index b0ce420e994e..abb00b3a7e77 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -82,3 +82,51 @@ phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
[ETHTOOL_PHY_DOWNSHIFT] = "phy-downshift",
[ETHTOOL_PHY_FAST_LINK_DOWN] = "phy-fast-link-down",
};
+
+/* return false if legacy contained non-0 deprecated fields
+ * maxtxpkt/maxrxpkt. rest of ksettings always updated
+ */
+bool
+convert_legacy_settings_to_link_ksettings(
+ struct ethtool_link_ksettings *link_ksettings,
+ const struct ethtool_cmd *legacy_settings)
+{
+ bool retval = true;
+
+ memset(link_ksettings, 0, sizeof(*link_ksettings));
+
+ /* This is used to tell users that driver is still using these
+ * deprecated legacy fields, and they should not use
+ * %ETHTOOL_GLINKSETTINGS/%ETHTOOL_SLINKSETTINGS
+ */
+ if (legacy_settings->maxtxpkt ||
+ legacy_settings->maxrxpkt)
+ retval = false;
+
+ ethtool_convert_legacy_u32_to_link_mode(
+ link_ksettings->link_modes.supported,
+ legacy_settings->supported);
+ ethtool_convert_legacy_u32_to_link_mode(
+ link_ksettings->link_modes.advertising,
+ legacy_settings->advertising);
+ ethtool_convert_legacy_u32_to_link_mode(
+ link_ksettings->link_modes.lp_advertising,
+ legacy_settings->lp_advertising);
+ link_ksettings->base.speed
+ = ethtool_cmd_speed(legacy_settings);
+ link_ksettings->base.duplex
+ = legacy_settings->duplex;
+ link_ksettings->base.port
+ = legacy_settings->port;
+ link_ksettings->base.phy_address
+ = legacy_settings->phy_address;
+ link_ksettings->base.autoneg
+ = legacy_settings->autoneg;
+ link_ksettings->base.mdio_support
+ = legacy_settings->mdio_support;
+ link_ksettings->base.eth_tp_mdix
+ = legacy_settings->eth_tp_mdix;
+ link_ksettings->base.eth_tp_mdix_ctrl
+ = legacy_settings->eth_tp_mdix_ctrl;
+ return retval;
+}
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
index 41b2efc1e4e1..0381936d8e1e 100644
--- a/net/ethtool/common.h
+++ b/net/ethtool/common.h
@@ -14,4 +14,8 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN];
extern const char
phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN];
+bool convert_legacy_settings_to_link_ksettings(
+ struct ethtool_link_ksettings *link_ksettings,
+ const struct ethtool_cmd *legacy_settings);
+
#endif /* _ETHTOOL_COMMON_H */
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index b35366dd9997..ed53e07d619e 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -352,54 +352,6 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
}
EXPORT_SYMBOL(ethtool_convert_link_mode_to_legacy_u32);
-/* return false if legacy contained non-0 deprecated fields
- * maxtxpkt/maxrxpkt. rest of ksettings always updated
- */
-static bool
-convert_legacy_settings_to_link_ksettings(
- struct ethtool_link_ksettings *link_ksettings,
- const struct ethtool_cmd *legacy_settings)
-{
- bool retval = true;
-
- memset(link_ksettings, 0, sizeof(*link_ksettings));
-
- /* This is used to tell users that driver is still using these
- * deprecated legacy fields, and they should not use
- * %ETHTOOL_GLINKSETTINGS/%ETHTOOL_SLINKSETTINGS
- */
- if (legacy_settings->maxtxpkt ||
- legacy_settings->maxrxpkt)
- retval = false;
-
- ethtool_convert_legacy_u32_to_link_mode(
- link_ksettings->link_modes.supported,
- legacy_settings->supported);
- ethtool_convert_legacy_u32_to_link_mode(
- link_ksettings->link_modes.advertising,
- legacy_settings->advertising);
- ethtool_convert_legacy_u32_to_link_mode(
- link_ksettings->link_modes.lp_advertising,
- legacy_settings->lp_advertising);
- link_ksettings->base.speed
- = ethtool_cmd_speed(legacy_settings);
- link_ksettings->base.duplex
- = legacy_settings->duplex;
- link_ksettings->base.port
- = legacy_settings->port;
- link_ksettings->base.phy_address
- = legacy_settings->phy_address;
- link_ksettings->base.autoneg
- = legacy_settings->autoneg;
- link_ksettings->base.mdio_support
- = legacy_settings->mdio_support;
- link_ksettings->base.eth_tp_mdix
- = legacy_settings->eth_tp_mdix;
- link_ksettings->base.eth_tp_mdix_ctrl
- = legacy_settings->eth_tp_mdix_ctrl;
- return retval;
-}
-
/* return false if ksettings link modes had higher bits
* set. legacy_settings always updated (best effort)
*/
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index cba1f2259248..6c0cfa9001a1 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -280,6 +280,7 @@ struct ethnl_dump_ctx {
static const struct get_request_ops *get_requests[__ETHTOOL_MSG_USER_CNT] = {
[ETHTOOL_MSG_STRSET_GET] = &strset_request_ops,
+ [ETHTOOL_MSG_SETTINGS_GET] = &settings_request_ops,
};
static struct ethnl_dump_ctx *ethnl_dump_context(struct netlink_callback *cb)
@@ -632,6 +633,13 @@ static const struct genl_ops ethtool_genl_ops[] = {
.dumpit = ethnl_get_dumpit,
.done = ethnl_get_done,
},
+ {
+ .cmd = ETHTOOL_MSG_SETTINGS_GET,
+ .doit = ethnl_get_doit,
+ .start = ethnl_get_start,
+ .dumpit = ethnl_get_dumpit,
+ .done = ethnl_get_done,
+ },
};
static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index d85b1edc1b91..27832a3956c8 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -294,5 +294,6 @@ struct get_request_ops {
/* request handlers */
extern const struct get_request_ops strset_request_ops;
+extern const struct get_request_ops settings_request_ops;
#endif /* _NET_ETHTOOL_NETLINK_H */
diff --git a/net/ethtool/settings.c b/net/ethtool/settings.c
new file mode 100644
index 000000000000..11ec30b9d48b
--- /dev/null
+++ b/net/ethtool/settings.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+
+#include "netlink.h"
+#include "common.h"
+#include "bitset.h"
+
+struct settings_data {
+ struct ethnl_req_info reqinfo_base;
+
+ /* everything below here will be reset for each device in dumps */
+ struct ethnl_reply_data repdata_base;
+ struct ethtool_link_ksettings ksettings;
+ struct ethtool_link_settings *lsettings;
+ bool lpm_empty;
+};
+
+static const struct nla_policy
+settings_get_policy[ETHTOOL_A_SETTINGS_MAX + 1] = {
+ [ETHTOOL_A_SETTINGS_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_SETTINGS_HEADER] = { .type = NLA_NESTED },
+ [ETHTOOL_A_SETTINGS_LINK_INFO] = { .type = NLA_REJECT },
+ [ETHTOOL_A_SETTINGS_LINK_MODES] = { .type = NLA_REJECT },
+};
+
+static int ethnl_get_link_ksettings(struct genl_info *info,
+ struct net_device *dev,
+ struct ethtool_link_ksettings *ksettings)
+{
+ int ret;
+
+ ret = __ethtool_get_link_ksettings(dev, ksettings);
+
+ if (ret < 0 && info)
+ GENL_SET_ERR_MSG(info, "failed to retrieve link settings");
+ return ret;
+}
+
+/* prepare_data() handler */
+static int settings_prepare(struct ethnl_req_info *req_info,
+ struct genl_info *info)
+{
+ struct settings_data *data =
+ container_of(req_info, struct settings_data, reqinfo_base);
+ struct net_device *dev = data->repdata_base.dev;
+ u32 req_mask = req_info->req_mask;
+ int ret;
+
+ data->lsettings = &data->ksettings.base;
+ data->lpm_empty = true;
+
+ ret = ethnl_before_ops(dev);
+ if (ret < 0)
+ return ret;
+ if (req_mask &
+ (ETHTOOL_IM_SETTINGS_LINKINFO | ETHTOOL_IM_SETTINGS_LINKMODES)) {
+ ret = ethnl_get_link_ksettings(info, dev, &data->ksettings);
+ if (ret < 0)
+ req_mask &= ~(ETHTOOL_IM_SETTINGS_LINKINFO |
+ ETHTOOL_IM_SETTINGS_LINKMODES);
+ }
+ if (req_mask & ETHTOOL_IM_SETTINGS_LINKMODES) {
+ data->lpm_empty =
+ bitmap_empty(data->ksettings.link_modes.lp_advertising,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+ ethnl_bitmap_to_u32(data->ksettings.link_modes.supported,
+ __ETHTOOL_LINK_MODE_MASK_NWORDS);
+ ethnl_bitmap_to_u32(data->ksettings.link_modes.advertising,
+ __ETHTOOL_LINK_MODE_MASK_NWORDS);
+ ethnl_bitmap_to_u32(data->ksettings.link_modes.lp_advertising,
+ __ETHTOOL_LINK_MODE_MASK_NWORDS);
+ }
+ ethnl_after_ops(dev);
+
+ data->repdata_base.info_mask = req_mask;
+ if (req_info->req_mask & ~req_mask && info)
+ GENL_SET_ERR_MSG(info,
+ "not all requested data could be retrieved");
+ return 0;
+}
+
+static int settings_linkinfo_size(void)
+{
+ int len = 0;
+
+ /* port, phyaddr, mdix, mdixctrl, transcvr */
+ len += 5 * nla_total_size(sizeof(u8));
+ /* mdio_support */
+ len += nla_total_size(sizeof(struct nla_bitfield32));
+
+ /* nest */
+ return nla_total_size(len);
+}
+
+static int
+settings_linkmodes_size(const struct ethtool_link_ksettings *ksettings,
+ bool compact)
+{
+ unsigned int flags = compact ? ETHNL_BITSET_COMPACT : 0;
+ u32 *supported = (u32 *)ksettings->link_modes.supported;
+ u32 *advertising = (u32 *)ksettings->link_modes.advertising;
+ u32 *lp_advertising = (u32 *)ksettings->link_modes.lp_advertising;
+ int len = 0, ret;
+
+ /* speed, duplex, autoneg */
+ len += nla_total_size(sizeof(u32)) + 2 * nla_total_size(sizeof(u8));
+ ret = ethnl_bitset32_size(__ETHTOOL_LINK_MODE_MASK_NBITS, advertising,
+ supported, link_mode_names, flags);
+ if (ret < 0)
+ return ret;
+ len += ret;
+ ret = ethnl_bitset32_size(__ETHTOOL_LINK_MODE_MASK_NBITS,
+ lp_advertising, NULL, link_mode_names,
+ flags | ETHNL_BITSET_LIST);
+ if (ret < 0)
+ return ret;
+ len += ret;
+
+ /* nest */
+ return nla_total_size(len);
+}
+
+/* reply_size() handler
+ *
+ * To keep things simple, reserve space for some attributes which may not
+ * be added to the message (e.g. ETHTOOL_A_SETTINGS_SOPASS); therefore the
+ * length returned may be bigger than the actual length of the message sent.
+ */
+static int settings_size(const struct ethnl_req_info *req_info)
+{
+ struct settings_data *data =
+ container_of(req_info, struct settings_data, reqinfo_base);
+ u32 info_mask = data->repdata_base.info_mask;
+ bool compact = req_info->global_flags & ETHTOOL_RF_COMPACT;
+ int len = 0, ret;
+
+ len += ethnl_reply_header_size();
+ if (info_mask & ETHTOOL_IM_SETTINGS_LINKINFO)
+ len += settings_linkinfo_size();
+ if (info_mask & ETHTOOL_IM_SETTINGS_LINKMODES) {
+ ret = settings_linkmodes_size(&data->ksettings, compact);
+ if (ret < 0)
+ return ret;
+ len += ret;
+ }
+
+ return len;
+}
+
+static int settings_fill_linkinfo(struct sk_buff *skb,
+ const struct ethtool_link_settings *lsettings)
+{
+ struct nlattr *nest;
+
+ nest = nla_nest_start(skb, ETHTOOL_A_SETTINGS_LINK_INFO);
+ if (!nest)
+ return -EMSGSIZE;
+
+ if (nla_put_u8(skb, ETHTOOL_A_LINKINFO_PORT, lsettings->port) ||
+ nla_put_u8(skb, ETHTOOL_A_LINKINFO_PHYADDR,
+ lsettings->phy_address) ||
+ nla_put_u8(skb, ETHTOOL_A_LINKINFO_TP_MDIX,
+ lsettings->eth_tp_mdix) ||
+ nla_put_u8(skb, ETHTOOL_A_LINKINFO_TP_MDIX_CTRL,
+ lsettings->eth_tp_mdix_ctrl) ||
+ nla_put_u8(skb, ETHTOOL_A_LINKINFO_TRANSCEIVER,
+ lsettings->transceiver)) {
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+ }
+
+ nla_nest_end(skb, nest);
+ return 0;
+}
+
+static int
+settings_fill_linkmodes(struct sk_buff *skb,
+ const struct ethtool_link_ksettings *ksettings,
+ bool lpm_empty, bool compact)
+{
+ const u32 *supported = (const u32 *)ksettings->link_modes.supported;
+ const u32 *advertising = (const u32 *)ksettings->link_modes.advertising;
+ const u32 *lp_adv = (const u32 *)ksettings->link_modes.lp_advertising;
+ const unsigned int flags = compact ? ETHNL_BITSET_COMPACT : 0;
+ const struct ethtool_link_settings *lsettings = &ksettings->base;
+ struct nlattr *nest;
+ int ret;
+
+ nest = nla_nest_start(skb, ETHTOOL_A_SETTINGS_LINK_MODES);
+ if (!nest)
+ return -EMSGSIZE;
+ if (nla_put_u8(skb, ETHTOOL_A_LINKMODES_AUTONEG, lsettings->autoneg))
+ goto nla_put_failure;
+
+ ret = ethnl_put_bitset32(skb, ETHTOOL_A_LINKMODES_OURS,
+ __ETHTOOL_LINK_MODE_MASK_NBITS, advertising,
+ supported, link_mode_names, flags);
+ if (ret < 0)
+ goto nla_put_failure;
+ if (!lpm_empty) {
+ ret = ethnl_put_bitset32(skb, ETHTOOL_A_LINKMODES_PEER,
+ __ETHTOOL_LINK_MODE_MASK_NBITS,
+ lp_adv, NULL, link_mode_names,
+ flags | ETHNL_BITSET_LIST);
+ if (ret < 0)
+ goto nla_put_failure;
+ }
+
+ if (nla_put_u32(skb, ETHTOOL_A_LINKMODES_SPEED, lsettings->speed) ||
+ nla_put_u8(skb, ETHTOOL_A_LINKMODES_DUPLEX, lsettings->duplex))
+ goto nla_put_failure;
+
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
+/* fill_reply() handler */
+static int settings_fill(struct sk_buff *skb,
+ const struct ethnl_req_info *req_info)
+{
+ const struct settings_data *data =
+ container_of(req_info, struct settings_data, reqinfo_base);
+ u32 info_mask = data->repdata_base.info_mask;
+ bool compact = req_info->global_flags & ETHTOOL_RF_COMPACT;
+ int ret;
+
+ if (info_mask & ETHTOOL_IM_SETTINGS_LINKINFO) {
+ ret = settings_fill_linkinfo(skb, data->lsettings);
+ if (ret < 0)
+ return ret;
+ }
+ if (info_mask & ETHTOOL_IM_SETTINGS_LINKMODES) {
+ ret = settings_fill_linkmodes(skb, &data->ksettings,
+ data->lpm_empty, compact);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
+const struct get_request_ops settings_request_ops = {
+ .request_cmd = ETHTOOL_MSG_SETTINGS_GET,
+ .reply_cmd = ETHTOOL_MSG_SETTINGS_GET_REPLY,
+ .hdr_attr = ETHTOOL_A_SETTINGS_HEADER,
+ .max_attr = ETHTOOL_A_SETTINGS_MAX,
+ .data_size = sizeof(struct settings_data),
+ .repdata_offset = offsetof(struct settings_data, repdata_base),
+ .request_policy = settings_get_policy,
+ .default_infomask = ETHTOOL_IM_SETTINGS_ALL,
+ .all_reqflags = ETHTOOL_RF_SETTINGS_ALL,
+
+ .prepare_data = settings_prepare,
+ .reply_size = settings_size,
+ .fill_reply = settings_fill,
+};
--
2.22.0
The ethtool netlink notifications have the same format as the related GET
replies. If the generic GET handling framework is used to process GET
requests, its callbacks and the instance of struct get_request_ops can
therefore also be used to compose the corresponding notification message.
Provide function ethnl_std_notify() to be used as the notification handler
in the ethnl_notify_handlers table.
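The idea of reusing the GET callbacks for notifications can be illustrated
with a small userspace sketch. It is not kernel code: struct demo_ops, the
demo_* functions and the text "message" format are all hypothetical
stand-ins, assumed here only to show one ops instance serving both a reply
and a notification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for struct get_request_ops. */
struct demo_ops {
	int (*prepare_data)(int *speed);
	int (*reply_size)(const int *speed);
	int (*fill_reply)(char *buf, size_t len, const int *speed);
};

static int demo_prepare(int *speed)
{
	*speed = 1000;	/* pretend the driver was queried */
	return 0;
}

static int demo_size(const int *speed)
{
	(void)speed;
	return 32;	/* upper bound, may exceed what is actually written */
}

static int demo_fill(char *buf, size_t len, const int *speed)
{
	int n = snprintf(buf, len, "speed=%d", *speed);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

static const struct demo_ops demo_settings_ops = {
	.prepare_data	= demo_prepare,
	.reply_size	= demo_size,
	.fill_reply	= demo_fill,
};

/* Shared composer: the same three callbacks serve replies and notifications. */
static int demo_compose(const struct demo_ops *ops, const char *kind,
			char *out, size_t outlen)
{
	char body[64];
	int speed, size, n;

	assert(ops && kind && out);
	if (ops->prepare_data(&speed) < 0)
		return -1;
	size = ops->reply_size(&speed);
	if (size < 0 || (size_t)size > sizeof(body))
		return -1;
	if (ops->fill_reply(body, (size_t)size, &speed) < 0)
		return -1;
	n = snprintf(out, outlen, "%s %s", kind, body);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

int demo_get_reply(char *out, size_t outlen)
{
	return demo_compose(&demo_settings_ops, "REPLY", out, outlen);
}

int demo_notify(char *out, size_t outlen)
{
	return demo_compose(&demo_settings_ops, "NTF", out, outlen);
}
```

In this sketch only the message kind differs between the two paths; the
payload is produced by the same callbacks, which is the property the
notification handler relies on.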
Signed-off-by: Michal Kubecek <[email protected]>
---
net/ethtool/netlink.c | 74 +++++++++++++++++++++++++++++++++++++++++++
net/ethtool/netlink.h | 3 +-
2 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 6c0cfa9001a1..9ff17ef05023 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -7,6 +7,7 @@
static struct genl_family ethtool_genl_family;
static bool ethnl_ok __read_mostly;
+static u32 ethnl_bcast_seq;
#define __LINK_MODE_NAME(speed, type, duplex) \
#speed "base" #type "/" #duplex
@@ -258,6 +259,18 @@ struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
return NULL;
}
+static void *ethnl_bcastmsg_put(struct sk_buff *skb, u8 cmd)
+{
+ return genlmsg_put(skb, 0, ++ethnl_bcast_seq, &ethtool_genl_family, 0,
+ cmd);
+}
+
+static int ethnl_multicast(struct sk_buff *skb, struct net_device *dev)
+{
+ return genlmsg_multicast_netns(&ethtool_genl_family, dev_net(dev), skb,
+ 0, ETHNL_MCGRP_MONITOR, GFP_KERNEL);
+}
+
/* GET request helpers */
/**
@@ -597,6 +610,67 @@ static int ethnl_get_done(struct netlink_callback *cb)
return 0;
}
+static const struct get_request_ops *ethnl_std_notify_to_ops(unsigned int cmd)
+{
+ WARN_ONCE(1, "unexpected notification type %u\n", cmd);
+ return NULL;
+}
+
+/* generic notification handler */
+static void ethnl_std_notify(struct net_device *dev,
+ struct netlink_ext_ack *extack, unsigned int cmd,
+ u32 req_mask, const void *data)
+{
+ const struct get_request_ops *ops;
+ struct ethnl_req_info *req_info;
+ struct sk_buff *skb;
+ void *reply_payload;
+ int reply_len;
+ int ret;
+
+ ops = ethnl_std_notify_to_ops(cmd);
+ if (!ops)
+ return;
+
+ req_info = ethnl_alloc_get_data(ops);
+ if (!req_info)
+ return;
+ req_info->dev = dev;
+ req_info->req_mask = req_mask;
+ req_info->global_flags |= ETHTOOL_RF_COMPACT;
+
+ ethnl_init_reply_data(req_info, ops, dev);
+ ret = ops->prepare_data(req_info, NULL);
+ if (ret < 0)
+ goto err_data;
+ reply_len = ops->reply_size(req_info);
+ if (reply_len < 0)
+ goto err_data;
+ skb = genlmsg_new(reply_len, GFP_KERNEL);
+ if (!skb)
+ goto err_data;
+ reply_payload = ethnl_bcastmsg_put(skb, cmd);
+ if (!reply_payload)
+ goto err_skb;
+
+ ret = ethnl_fill_reply_header(skb, dev, ops->hdr_attr);
+ if (ret < 0)
+ goto err_skb;
+ ret = ops->fill_reply(skb, req_info);
+ if (ret < 0)
+ goto err_skb;
+ ethnl_free_get_data(ops, req_info);
+ genlmsg_end(skb, reply_payload);
+
+ ethnl_multicast(skb, dev);
+ return;
+
+err_skb:
+ nlmsg_free(skb);
+err_data:
+ ethnl_free_get_data(ops, req_info);
+}
+
/* notifications */
typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 27832a3956c8..6512d9d508bf 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -265,7 +265,8 @@ static inline void ethnl_after_ops(struct net_device *dev)
* infrastructure. When used, a pointer to an instance of this structure is to
* be added to &get_requests array and generic handlers ethnl_get_doit(),
* ethnl_get_dumpit(), ethnl_get_start() and ethnl_get_done() used in
- * @ethnl_genl_ops
+ * @ethnl_genl_ops; ethnl_std_notify() can be used in @ethnl_notify_handlers
+ * to send notifications of the corresponding type.
*/
struct get_request_ops {
u8 request_cmd;
--
2.22.0
Implement the SETTINGS_SET netlink request, which allows setting link
settings and advertised link modes as an alternative to the
ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET ioctl commands.
Only values which are to be changed should be included in the request.
Omitted attributes are left at their current values.
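The "only send what should change" semantics can be sketched in plain
userspace C as follows. The struct and function names (demo_link,
demo_request, demo_update_u8(), demo_update_u32(), demo_apply()) are
hypothetical; a NULL pointer stands in for an attribute that is absent from
the request. This only mirrors the general pattern, not the actual kernel
helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical current device state. */
struct demo_link {
	uint32_t speed;
	uint8_t duplex;
	uint8_t autoneg;
};

/* Hypothetical parsed request: NULL means "attribute not present". */
struct demo_request {
	const uint32_t *speed;
	const uint8_t *duplex;
	const uint8_t *autoneg;
};

/* Overwrite *dst only if the attribute is present; report real changes. */
static bool demo_update_u8(uint8_t *dst, const uint8_t *attr)
{
	if (!attr || *dst == *attr)
		return false;
	*dst = *attr;
	return true;
}

static bool demo_update_u32(uint32_t *dst, const uint32_t *attr)
{
	if (!attr || *dst == *attr)
		return false;
	*dst = *attr;
	return true;
}

/* Returns true if anything was modified, i.e. a notification is due. */
bool demo_apply(struct demo_link *cur, const struct demo_request *req)
{
	bool mod = false;

	assert(cur && req);
	if (demo_update_u32(&cur->speed, req->speed))
		mod = true;
	if (demo_update_u8(&cur->duplex, req->duplex))
		mod = true;
	if (demo_update_u8(&cur->autoneg, req->autoneg))
		mod = true;
	return mod;
}
```

Because absent attributes never touch the current state, userspace does not
need to read the full settings first, which is what removes the racy
read/modify/write cycle across the kernel boundary.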
The ETHTOOL_A_SETTINGS_LINK_MODES nested attribute is used to set (or
modify) advertised link modes and related settings (autonegotiation, speed
and duplex). The kernel implements logic which was already partially
implemented in ethtool for the ioctl interface: if autonegotiation is on
(either it was on already or the request turns it on), no link mode change
is requested (no ETHTOOL_A_LINKMODES_OURS attribute) and speed or duplex is
provided, advertised link modes are set to the supported modes matching the
requested speed and/or duplex.
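The selection rule above can be sketched in userspace C. The mode table
below is a tiny hypothetical subset packed into a 32-bit mask, and all
demo_* names are assumptions made for illustration; a NULL speed or duplex
pointer means "not requested". Bits without a defined speed (such as pause
flags) are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SPEED_UNKNOWN	(-1)

enum { DEMO_HALF = 0, DEMO_FULL = 1, DEMO_DUPLEX_UNKNOWN = 0xff };

struct demo_mode {
	int speed;
	uint8_t duplex;
};

/* Bit index -> parameters; hypothetical subset of the real table. */
static const struct demo_mode demo_modes[] = {
	{ 10, DEMO_HALF },				/* bit 0 */
	{ 10, DEMO_FULL },				/* bit 1 */
	{ 100, DEMO_HALF },				/* bit 2 */
	{ 100, DEMO_FULL },				/* bit 3 */
	{ 1000, DEMO_FULL },				/* bit 4 */
	{ DEMO_SPEED_UNKNOWN, DEMO_DUPLEX_UNKNOWN },	/* bit 5, e.g. a pause flag */
};

#define DEMO_NMODES	(sizeof(demo_modes) / sizeof(demo_modes[0]))

/* Return the new advertising mask: supported modes matching the request. */
uint32_t demo_auto_linkmodes(uint32_t supported, uint32_t advertising,
			     const int *speed, const uint8_t *duplex)
{
	size_t i;

	assert(speed || duplex);	/* caller requests at least one of them */
	for (i = 0; i < DEMO_NMODES; i++) {
		const struct demo_mode *m = &demo_modes[i];
		uint32_t bit = UINT32_C(1) << i;

		if (m->speed == DEMO_SPEED_UNKNOWN)
			continue;	/* keep special bits as they are */
		if ((supported & bit) &&
		    (!speed || m->speed == *speed) &&
		    (!duplex || m->duplex == *duplex))
			advertising |= bit;
		else
			advertising &= ~bit;
	}
	return advertising;
}
```

For example, requesting only speed 100 on a device supporting everything in
the table would advertise both 100 Mb/s modes, while adding full duplex
narrows that to the single matching mode.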
The ETHTOOL_A_SETTINGS_LINK_INFO nested attribute is used to set physical
port, PHY MDIO address and MDI(-X) control. An attempt to modify other
attributes reported by the corresponding GET request is rejected.
When any data is modified, an ETHTOOL_MSG_SETTINGS_NTF message in the same
format as a reply to a GET request is sent to notify userspace about the
changes. The same notification is also sent when these settings are
modified using the ioctl interface.
Signed-off-by: Michal Kubecek <[email protected]>
---
Documentation/networking/ethtool-netlink.txt | 34 +-
include/uapi/linux/ethtool_netlink.h | 2 +
net/ethtool/ioctl.c | 18 +-
net/ethtool/netlink.c | 11 +
net/ethtool/netlink.h | 2 +
net/ethtool/settings.c | 332 +++++++++++++++++++
6 files changed, 395 insertions(+), 4 deletions(-)
diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
index 1d803488e02c..05bc8f5f8654 100644
--- a/Documentation/networking/ethtool-netlink.txt
+++ b/Documentation/networking/ethtool-netlink.txt
@@ -152,11 +152,13 @@ Userspace to kernel:
ETHTOOL_MSG_STRSET_GET get string set
ETHTOOL_MSG_SETTINGS_GET get device settings
+ ETHTOOL_MSG_SETTINGS_SET set device settings
Kernel to userspace:
ETHTOOL_MSG_STRSET_GET_REPLY string set contents
ETHTOOL_MSG_SETTINGS_GET_REPLY device settings
+ ETHTOOL_MSG_SETTINGS_NTF device settings notification
"GET" requests are sent by userspace applications to retrieve device
information. They usually do not contain any message specific attributes.
@@ -275,6 +277,34 @@ to them are broadcasted as notifications on change of these settings using
netlink or ioctl ethtool interface.
+SETTINGS_SET
+------------
+
+SETTINGS_SET request allows setting some of the data reported by SETTINGS_GET.
+Request flags, info_mask and index are ignored. These attributes are allowed
+to be passed with SETTINGS_SET request:
+
+ ETHTOOL_A_SETTINGS_HEADER (nested) request header
+ ETHTOOL_A_SETTINGS_LINK_INFO (nested) link settings
+ ETHTOOL_A_LINKINFO_PORT (u8) physical port
+ ETHTOOL_A_LINKINFO_PHYADDR (u8) MDIO address of phy
+ ETHTOOL_A_LINKINFO_TP_MDIX_CTRL (u8) MDI(-X) control
+ ETHTOOL_A_SETTINGS_LINK_MODES (nested) link modes
+ ETHTOOL_A_LINKMODES_AUTONEG (u8) autonegotiation
+ ETHTOOL_A_LINKMODES_OURS (bitset) advertised link modes
+ ETHTOOL_A_LINKMODES_SPEED (u32) link speed (Mb/s)
+ ETHTOOL_A_LINKMODES_DUPLEX (u8) duplex mode
+
+The ETHTOOL_A_LINKMODES_OURS bit set allows setting advertised link modes. If
+autonegotiation is on (either set now or kept from before), the request does
+not change advertised modes (no ETHTOOL_A_LINKMODES_OURS attribute) and at
+least one of speed and duplex is specified, the kernel sets advertised modes
+to all supported modes matching speed, duplex or both (whichever is given).
+With the ioctl interface, this autoselection is done by the ethtool utility;
+the netlink interface is meant to allow requesting changes without knowing
+what exactly the kernel supports.
+
+
Request translation
-------------------
@@ -285,7 +315,7 @@ have their netlink replacement yet.
ioctl command netlink command
---------------------------------------------------------------------
ETHTOOL_GSET ETHTOOL_MSG_SETTINGS_GET
-ETHTOOL_SSET n/a
+ETHTOOL_SSET ETHTOOL_MSG_SETTINGS_SET
ETHTOOL_GDRVINFO n/a
ETHTOOL_GREGS n/a
ETHTOOL_GWOL n/a
@@ -359,7 +389,7 @@ ETHTOOL_STUNABLE n/a
ETHTOOL_GPHYSTATS n/a
ETHTOOL_PERQUEUE n/a
ETHTOOL_GLINKSETTINGS ETHTOOL_MSG_SETTINGS_GET
-ETHTOOL_SLINKSETTINGS n/a
+ETHTOOL_SLINKSETTINGS ETHTOOL_MSG_SETTINGS_SET
ETHTOOL_PHY_GTUNABLE n/a
ETHTOOL_PHY_STUNABLE n/a
ETHTOOL_GFECPARAM n/a
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index a046dd8da50e..8ccf66ed3f58 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -16,6 +16,7 @@ enum {
ETHTOOL_MSG_USER_NONE,
ETHTOOL_MSG_STRSET_GET,
ETHTOOL_MSG_SETTINGS_GET,
+ ETHTOOL_MSG_SETTINGS_SET,
/* add new constants above here */
__ETHTOOL_MSG_USER_CNT,
@@ -27,6 +28,7 @@ enum {
ETHTOOL_MSG_KERNEL_NONE,
ETHTOOL_MSG_STRSET_GET_REPLY,
ETHTOOL_MSG_SETTINGS_GET_REPLY,
+ ETHTOOL_MSG_SETTINGS_NTF,
/* add new constants above here */
__ETHTOOL_MSG_KERNEL_CNT,
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index ed53e07d619e..504ab2f7009c 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -26,6 +26,7 @@
#include <net/devlink.h>
#include <net/xdp_sock.h>
#include <net/flow_offload.h>
+#include <linux/ethtool_netlink.h>
#include "common.h"
@@ -565,7 +566,13 @@ static int ethtool_set_link_ksettings(struct net_device *dev,
!= link_ksettings.base.link_mode_masks_nwords)
return -EINVAL;
- return dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
+ err = dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
+ if (err >= 0)
+ ethtool_notify(dev, NULL, ETHTOOL_MSG_SETTINGS_NTF,
+ ETHTOOL_IM_SETTINGS_LINKINFO |
+ ETHTOOL_IM_SETTINGS_LINKMODES,
+ NULL);
+ return err;
}
/* Query device for its ethtool_cmd settings.
@@ -614,6 +621,7 @@ static int ethtool_set_settings(struct net_device *dev, void __user *useraddr)
{
struct ethtool_link_ksettings link_ksettings;
struct ethtool_cmd cmd;
+ int ret;
ASSERT_RTNL();
@@ -626,7 +634,13 @@ static int ethtool_set_settings(struct net_device *dev, void __user *useraddr)
return -EINVAL;
link_ksettings.base.link_mode_masks_nwords =
__ETHTOOL_LINK_MODE_MASK_NU32;
- return dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
+ ret = dev->ethtool_ops->set_link_ksettings(dev, &link_ksettings);
+ if (ret >= 0)
+ ethtool_notify(dev, NULL, ETHTOOL_MSG_SETTINGS_NTF,
+ ETHTOOL_IM_SETTINGS_LINKINFO |
+ ETHTOOL_IM_SETTINGS_LINKMODES,
+ NULL);
+ return ret;
}
static noinline_for_stack int ethtool_get_drvinfo(struct net_device *dev,
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 9ff17ef05023..69b6dfe2a1c8 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -612,6 +612,11 @@ static int ethnl_get_done(struct netlink_callback *cb)
static const struct get_request_ops *ethnl_std_notify_to_ops(unsigned int cmd)
{
+ switch (cmd) {
+ case ETHTOOL_MSG_SETTINGS_NTF:
+ return &settings_request_ops;
+ }
+
WARN_ONCE(1, "unexpected notification type %u\n", cmd);
return NULL;
}
@@ -679,6 +684,7 @@ typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
const void *data);
static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
+ [ETHTOOL_MSG_SETTINGS_NTF] = ethnl_std_notify,
};
void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
@@ -714,6 +720,11 @@ static const struct genl_ops ethtool_genl_ops[] = {
.dumpit = ethnl_get_dumpit,
.done = ethnl_get_done,
},
+ {
+ .cmd = ETHTOOL_MSG_SETTINGS_SET,
+ .flags = GENL_UNS_ADMIN_PERM,
+ .doit = ethnl_set_settings,
+ },
};
static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 6512d9d508bf..43fdf11cfc6d 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -297,4 +297,6 @@ struct get_request_ops {
extern const struct get_request_ops strset_request_ops;
extern const struct get_request_ops settings_request_ops;
+int ethnl_set_settings(struct sk_buff *skb, struct genl_info *info);
+
#endif /* _NET_ETHTOOL_NETLINK_H */
diff --git a/net/ethtool/settings.c b/net/ethtool/settings.c
index 11ec30b9d48b..2fc961297076 100644
--- a/net/ethtool/settings.c
+++ b/net/ethtool/settings.c
@@ -14,6 +14,99 @@ struct settings_data {
bool lpm_empty;
};
+struct link_mode_info {
+ int speed;
+ u8 duplex;
+};
+
+#define __DEFINE_LINK_MODE_PARAMS(_speed, _type, _duplex) \
+ [ETHTOOL_LINK_MODE(_speed, _type, _duplex)] = { \
+ .speed = SPEED_ ## _speed, \
+ .duplex = __DUPLEX_ ## _duplex \
+ }
+#define __DUPLEX_Half DUPLEX_HALF
+#define __DUPLEX_Full DUPLEX_FULL
+#define __DEFINE_SPECIAL_MODE_PARAMS(_mode) \
+ [ETHTOOL_LINK_MODE_ ## _mode ## _BIT] = { \
+ .speed = SPEED_UNKNOWN, \
+ .duplex = DUPLEX_UNKNOWN, \
+ }
+
+static const struct link_mode_info link_mode_params[] = {
+ __DEFINE_LINK_MODE_PARAMS(10, T, Half),
+ __DEFINE_LINK_MODE_PARAMS(10, T, Full),
+ __DEFINE_LINK_MODE_PARAMS(100, T, Half),
+ __DEFINE_LINK_MODE_PARAMS(100, T, Full),
+ __DEFINE_LINK_MODE_PARAMS(1000, T, Half),
+ __DEFINE_LINK_MODE_PARAMS(1000, T, Full),
+ __DEFINE_SPECIAL_MODE_PARAMS(Autoneg),
+ __DEFINE_SPECIAL_MODE_PARAMS(TP),
+ __DEFINE_SPECIAL_MODE_PARAMS(AUI),
+ __DEFINE_SPECIAL_MODE_PARAMS(MII),
+ __DEFINE_SPECIAL_MODE_PARAMS(FIBRE),
+ __DEFINE_SPECIAL_MODE_PARAMS(BNC),
+ __DEFINE_LINK_MODE_PARAMS(10000, T, Full),
+ __DEFINE_SPECIAL_MODE_PARAMS(Pause),
+ __DEFINE_SPECIAL_MODE_PARAMS(Asym_Pause),
+ __DEFINE_LINK_MODE_PARAMS(2500, X, Full),
+ __DEFINE_SPECIAL_MODE_PARAMS(Backplane),
+ __DEFINE_LINK_MODE_PARAMS(1000, KX, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, KX4, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, KR, Full),
+ [ETHTOOL_LINK_MODE_10000baseR_FEC_BIT] = {
+ .speed = SPEED_10000,
+ .duplex = DUPLEX_FULL,
+ },
+ __DEFINE_LINK_MODE_PARAMS(20000, MLD2, Full),
+ __DEFINE_LINK_MODE_PARAMS(20000, KR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(40000, KR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(40000, CR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(40000, SR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(40000, LR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(56000, KR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(56000, CR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(56000, SR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(56000, LR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(25000, CR, Full),
+ __DEFINE_LINK_MODE_PARAMS(25000, KR, Full),
+ __DEFINE_LINK_MODE_PARAMS(25000, SR, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, CR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, KR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, KR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, SR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, CR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, LR4_ER4, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, SR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(1000, X, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, CR, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, SR, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, LR, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, LRM, Full),
+ __DEFINE_LINK_MODE_PARAMS(10000, ER, Full),
+ __DEFINE_LINK_MODE_PARAMS(2500, T, Full),
+ __DEFINE_LINK_MODE_PARAMS(5000, T, Full),
+ __DEFINE_SPECIAL_MODE_PARAMS(FEC_NONE),
+ __DEFINE_SPECIAL_MODE_PARAMS(FEC_RS),
+ __DEFINE_SPECIAL_MODE_PARAMS(FEC_BASER),
+ __DEFINE_LINK_MODE_PARAMS(50000, KR, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, SR, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, CR, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, LR_ER_FR, Full),
+ __DEFINE_LINK_MODE_PARAMS(50000, DR, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, KR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, SR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, CR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, LR2_ER2_FR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(100000, DR2, Full),
+ __DEFINE_LINK_MODE_PARAMS(200000, KR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(200000, SR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(200000, LR4_ER4_FR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(200000, DR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(200000, CR4, Full),
+ __DEFINE_LINK_MODE_PARAMS(100, T1, Full),
+ __DEFINE_LINK_MODE_PARAMS(1000, T1, Full),
+};
+
static const struct nla_policy
settings_get_policy[ETHTOOL_A_SETTINGS_MAX + 1] = {
[ETHTOOL_A_SETTINGS_UNSPEC] = { .type = NLA_REJECT },
@@ -257,3 +350,242 @@ const struct get_request_ops settings_request_ops = {
.reply_size = settings_size,
.fill_reply = settings_fill,
};
+
+/* SETTINGS_SET */
+
+static const struct nla_policy settings_hdr_policy[ETHTOOL_A_HEADER_MAX + 1] = {
+ [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
+ [ETHTOOL_A_HEADER_DEV_NAME] = { .type = NLA_NUL_STRING,
+ .len = IFNAMSIZ - 1 },
+ [ETHTOOL_A_HEADER_INFOMASK] = { .type = NLA_REJECT },
+ [ETHTOOL_A_HEADER_GFLAGS] = { .type = NLA_U32 },
+ [ETHTOOL_A_HEADER_RFLAGS] = { .type = NLA_REJECT },
+};
+
+static const struct nla_policy
+linkinfo_set_policy[ETHTOOL_A_LINKINFO_MAX + 1] = {
+ [ETHTOOL_A_LINKINFO_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_LINKINFO_PORT] = { .type = NLA_U8 },
+ [ETHTOOL_A_LINKINFO_PHYADDR] = { .type = NLA_U8 },
+ [ETHTOOL_A_LINKINFO_TP_MDIX] = { .type = NLA_REJECT },
+ [ETHTOOL_A_LINKINFO_TP_MDIX_CTRL] = { .type = NLA_U8 },
+ [ETHTOOL_A_LINKINFO_TRANSCEIVER] = { .type = NLA_REJECT },
+};
+
+static const struct nla_policy
+linkmodes_set_policy[ETHTOOL_A_LINKMODES_MAX + 1] = {
+ [ETHTOOL_A_LINKMODES_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_LINKMODES_AUTONEG] = { .type = NLA_U8 },
+ [ETHTOOL_A_LINKMODES_OURS] = { .type = NLA_NESTED },
+ [ETHTOOL_A_LINKMODES_PEER] = { .type = NLA_REJECT },
+ [ETHTOOL_A_LINKMODES_SPEED] = { .type = NLA_U32 },
+ [ETHTOOL_A_LINKMODES_DUPLEX] = { .type = NLA_U8 },
+};
+
+static const struct nla_policy
+settings_set_policy[ETHTOOL_A_SETTINGS_MAX + 1] = {
+ [ETHTOOL_A_SETTINGS_UNSPEC] = { .type = NLA_REJECT },
+ [ETHTOOL_A_SETTINGS_HEADER] = { .type = NLA_NESTED },
+ [ETHTOOL_A_SETTINGS_LINK_INFO] = { .type = NLA_NESTED },
+ [ETHTOOL_A_SETTINGS_LINK_MODES] = { .type = NLA_NESTED },
+};
+
+static int ethnl_set_link_ksettings(struct genl_info *info,
+ struct net_device *dev,
+ struct ethtool_link_ksettings *ksettings)
+{
+ int ret = dev->ethtool_ops->set_link_ksettings(dev, ksettings);
+
+ if (ret < 0)
+ GENL_SET_ERR_MSG(info, "link settings update failed");
+ return ret;
+}
+
+/* Set advertised link modes to all supported modes matching requested speed
+ * and duplex values. Called when autonegotiation is on, speed or duplex is
+ * requested but no link mode change. This is done in userspace with ioctl()
+ * interface, move it into kernel for netlink.
+ * Returns true if advertised modes bitmap was modified.
+ */
+static bool settings_auto_linkmodes(struct ethtool_link_ksettings *ksettings,
+ bool req_speed, bool req_duplex)
+{
+ unsigned long *advertising = ksettings->link_modes.advertising;
+ unsigned long *supported = ksettings->link_modes.supported;
+ DECLARE_BITMAP(old_adv, __ETHTOOL_LINK_MODE_MASK_NBITS);
+ unsigned int i;
+
+ BUILD_BUG_ON(ARRAY_SIZE(link_mode_params) !=
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+
+ bitmap_copy(old_adv, advertising, __ETHTOOL_LINK_MODE_MASK_NBITS);
+
+ for (i = 0; i < __ETHTOOL_LINK_MODE_MASK_NBITS; i++) {
+ const struct link_mode_info *info = &link_mode_params[i];
+
+ if (info->speed == SPEED_UNKNOWN)
+ continue;
+ if (test_bit(i, supported) &&
+ (!req_speed || info->speed == ksettings->base.speed) &&
+ (!req_duplex || info->duplex == ksettings->base.duplex))
+ set_bit(i, advertising);
+ else
+ clear_bit(i, advertising);
+ }
+
+ return !bitmap_equal(old_adv, advertising,
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static int settings_update_linkinfo(struct genl_info *info, struct nlattr *nest,
+ struct ethtool_link_settings *lsettings)
+{
+ struct nlattr *tb[ETHTOOL_A_LINKINFO_MAX + 1];
+ int ret;
+
+ if (!nest)
+ return 0;
+ ret = nla_parse_nested(tb, ETHTOOL_A_LINKINFO_MAX, nest,
+ linkinfo_set_policy, info->extack);
+ if (ret < 0)
+ return ret;
+
+ ret = 0;
+ if (ethnl_update_u8(&lsettings->port, tb[ETHTOOL_A_LINKINFO_PORT]))
+ ret = 1;
+ if (ethnl_update_u8(&lsettings->phy_address,
+ tb[ETHTOOL_A_LINKINFO_PHYADDR]))
+ ret = 1;
+ if (ethnl_update_u8(&lsettings->eth_tp_mdix_ctrl,
+ tb[ETHTOOL_A_LINKINFO_TP_MDIX_CTRL]))
+ ret = 1;
+
+ return ret;
+}
+
+static int settings_update_linkmodes(struct genl_info *info,
+ const struct nlattr *nest,
+ struct ethtool_link_ksettings *ksettings)
+{
+ struct ethtool_link_settings *lsettings = &ksettings->base;
+ struct nlattr *tb[ETHTOOL_A_LINKMODES_MAX + 1];
+ bool req_speed, req_duplex;
+ bool mod = false;
+ int ret;
+
+ if (!nest)
+ return 0;
+ ret = nla_parse_nested(tb, ETHTOOL_A_LINKMODES_MAX, nest,
+ linkmodes_set_policy, info->extack);
+ if (ret < 0)
+ return ret;
+ req_speed = tb[ETHTOOL_A_LINKMODES_SPEED];
+ req_duplex = tb[ETHTOOL_A_LINKMODES_DUPLEX];
+
+ if (ethnl_update_u8(&lsettings->autoneg,
+ tb[ETHTOOL_A_LINKMODES_AUTONEG]))
+ mod = true;
+ if (ethnl_update_bitset(ksettings->link_modes.advertising, NULL,
+ __ETHTOOL_LINK_MODE_MASK_NBITS,
+ tb[ETHTOOL_A_LINKMODES_OURS],
+ &ret, link_mode_names, false, info))
+ mod = true;
+ if (ret < 0)
+ return ret;
+ if (ethnl_update_u32(&lsettings->speed, tb[ETHTOOL_A_LINKMODES_SPEED]))
+ mod = true;
+ if (ethnl_update_u8(&lsettings->duplex, tb[ETHTOOL_A_LINKMODES_DUPLEX]))
+ mod = true;
+
+ if (!tb[ETHTOOL_A_LINKMODES_OURS] && lsettings->autoneg &&
+ (req_speed || req_duplex) &&
+ settings_auto_linkmodes(ksettings, req_speed, req_duplex))
+ mod = true;
+
+ return mod;
+}
+
+/* Update device settings using ->set_link_ksettings() callback */
+static int ethnl_update_ksettings(struct genl_info *info, struct nlattr **tb,
+ struct net_device *dev, u32 *req_mask)
+{
+ struct ethtool_link_ksettings ksettings = {};
+ struct ethtool_link_settings *lsettings;
+ u32 mod_mask = 0;
+ int ret;
+
+ ret = ethnl_get_link_ksettings(info, dev, &ksettings);
+ if (ret < 0)
+ return ret;
+ lsettings = &ksettings.base;
+
+ ret = settings_update_linkinfo(info, tb[ETHTOOL_A_SETTINGS_LINK_INFO],
+ lsettings);
+ if (ret < 0)
+ return ret;
+ if (ret)
+ mod_mask |= ETHTOOL_IM_SETTINGS_LINKINFO;
+
+ ret = settings_update_linkmodes(info, tb[ETHTOOL_A_SETTINGS_LINK_MODES],
+ &ksettings);
+ if (ret < 0)
+ return ret;
+ if (ret)
+ mod_mask |= ETHTOOL_IM_SETTINGS_LINKMODES;
+
+ if (mod_mask) {
+ ret = ethnl_set_link_ksettings(info, dev, &ksettings);
+ if (ret < 0)
+ return ret;
+ *req_mask |= mod_mask;
+ }
+
+ return 0;
+}
+
+int ethnl_set_settings(struct sk_buff *skb, struct genl_info *info)
+{
+ struct nlattr *tb[ETHTOOL_A_SETTINGS_MAX + 1];
+ struct ethnl_req_info req_info = {};
+ struct net_device *dev;
+ u32 req_mask = 0;
+ int ret;
+
+ ret = nlmsg_parse(info->nlhdr, GENL_HDRLEN, tb,
+ ETHTOOL_A_SETTINGS_MAX, settings_set_policy,
+ info->extack);
+ if (ret < 0)
+ return ret;
+ ret = ethnl_parse_header(&req_info, tb[ETHTOOL_A_SETTINGS_HEADER],
+ genl_info_net(info), info->extack,
+ settings_hdr_policy, true);
+ if (ret < 0)
+ return ret;
+ dev = req_info.dev;
+
+ rtnl_lock();
+ ret = ethnl_before_ops(dev);
+ if (ret < 0)
+ goto out_rtnl;
+ if (tb[ETHTOOL_A_SETTINGS_LINK_INFO] ||
+ tb[ETHTOOL_A_SETTINGS_LINK_MODES]) {
+ ret = -EOPNOTSUPP;
+ if (!dev->ethtool_ops->get_link_ksettings)
+ goto out_ops;
+ ret = ethnl_update_ksettings(info, tb, dev, &req_mask);
+ if (ret < 0)
+ goto out_ops;
+ }
+ ret = 0;
+
+out_ops:
+ if (req_mask)
+ ethtool_notify(dev, NULL, ETHTOOL_MSG_SETTINGS_NTF, req_mask,
+ NULL);
+ ethnl_after_ops(dev);
+out_rtnl:
+ rtnl_unlock();
+ dev_put(dev);
+ return ret;
+}
--
2.22.0
Unlike e.g. netdev features, the ethtool ioctl interface requires the link
mode table to be in sync between kernel and userspace for userspace to be
able to display and set all link modes supported by the kernel. With the way
arbitrary length bitsets are implemented in the netlink interface, this is
no longer needed.
To allow userspace to access all link modes the running kernel supports, add
a table of ethernet link mode names and make it available as a string set to
userspace STRSET_GET requests. Add a build time check to make sure names
are defined for all modes declared in enum ethtool_link_mode_bit_indices.
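The table-plus-build-check technique can be shown with a reduced userspace
sketch. The enum, the DEMO_* macros and demo_mode_name() are hypothetical
and cover only four modes; C11 static_assert is used here as a stand-in for
the kernel's build time check. Note that a size comparison of this kind
catches a table that ends too early, but a gap in the middle would only show
up as a NULL entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of link mode bit indices. */
enum demo_mode_bit {
	DEMO_MODE_10baseT_Half_BIT,
	DEMO_MODE_10baseT_Full_BIT,
	DEMO_MODE_Autoneg_BIT,
	DEMO_MODE_1000baseKX_Full_BIT,
	DEMO_MODE_NBITS
};

/* Compose the enum constant from speed, type and duplex. */
#define DEMO_MODE(speed, type, duplex) \
	DEMO_MODE_ ## speed ## base ## type ## _ ## duplex ## _BIT
/* Regular modes derive both index and name from the same three tokens. */
#define DEMO_NAME(speed, type, duplex) \
	[DEMO_MODE(speed, type, duplex)] = #speed "base" #type "/" #duplex
/* Special modes carry an explicit name. */
#define DEMO_SPECIAL(mode, name) \
	[DEMO_MODE_ ## mode ## _BIT] = name

static const char *const demo_mode_names[] = {
	DEMO_NAME(10, T, Half),
	DEMO_NAME(10, T, Full),
	DEMO_SPECIAL(Autoneg, "Autoneg"),
	DEMO_NAME(1000, KX, Full),
};

/* Fails to compile if the last declared mode has no table entry. */
static_assert(sizeof(demo_mode_names) / sizeof(demo_mode_names[0]) ==
	      DEMO_MODE_NBITS, "link mode name table out of sync");

/* Look up a name by bit index; NULL for an out-of-range index. */
const char *demo_mode_name(unsigned int bit)
{
	if (bit >= DEMO_MODE_NBITS)
		return NULL;
	return demo_mode_names[bit];
}
```

Using designated initializers keyed by the enum constant keeps each name
attached to its bit regardless of the order of the table entries.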
Signed-off-by: Michal Kubecek <[email protected]>
---
include/linux/ethtool.h | 4 ++
include/uapi/linux/ethtool.h | 2 +
net/ethtool/netlink.c | 83 ++++++++++++++++++++++++++++++++++++
net/ethtool/netlink.h | 2 +
net/ethtool/strset.c | 6 +++
5 files changed, 97 insertions(+)
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 95991e4300bf..5caef65d93d6 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -102,6 +102,10 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
#define __ETHTOOL_DECLARE_LINK_MODE_MASK(name) \
DECLARE_BITMAP(name, __ETHTOOL_LINK_MODE_MASK_NBITS)
+/* compose link mode index from speed, type and duplex */
+#define ETHTOOL_LINK_MODE(speed, type, duplex) \
+ ETHTOOL_LINK_MODE_ ## speed ## base ## type ## _ ## duplex ## _BIT
+
/* drivers must ignore base.cmd and base.link_mode_masks_nwords
* fields, but they are allowed to overwrite them (will be ignored).
*/
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 4e4e28e77c7a..6ad298224352 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -571,6 +571,7 @@ struct ethtool_pauseparam {
* @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
* @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
* @ETH_SS_PHY_TUNABLES: PHY tunable names
+ * @ETH_SS_LINK_MODES: link mode names
*/
enum ethtool_stringset {
ETH_SS_TEST = 0,
@@ -582,6 +583,7 @@ enum ethtool_stringset {
ETH_SS_TUNABLES,
ETH_SS_PHY_STATS,
ETH_SS_PHY_TUNABLES,
+ ETH_SS_LINK_MODES,
ETH_SS_COUNT
};
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 41d7fedd3dd6..cba1f2259248 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -8,6 +8,86 @@ static struct genl_family ethtool_genl_family;
static bool ethnl_ok __read_mostly;
+#define __LINK_MODE_NAME(speed, type, duplex) \
+ #speed "base" #type "/" #duplex
+#define __DEFINE_LINK_MODE_NAME(speed, type, duplex) \
+ [ETHTOOL_LINK_MODE(speed, type, duplex)] = \
+ __LINK_MODE_NAME(speed, type, duplex)
+#define __DEFINE_SPECIAL_MODE_NAME(_mode, _name) \
+ [ETHTOOL_LINK_MODE_ ## _mode ## _BIT] = _name
+
+const char *const link_mode_names[] = {
+ __DEFINE_LINK_MODE_NAME(10, T, Half),
+ __DEFINE_LINK_MODE_NAME(10, T, Full),
+ __DEFINE_LINK_MODE_NAME(100, T, Half),
+ __DEFINE_LINK_MODE_NAME(100, T, Full),
+ __DEFINE_LINK_MODE_NAME(1000, T, Half),
+ __DEFINE_LINK_MODE_NAME(1000, T, Full),
+ __DEFINE_SPECIAL_MODE_NAME(Autoneg, "Autoneg"),
+ __DEFINE_SPECIAL_MODE_NAME(TP, "TP"),
+ __DEFINE_SPECIAL_MODE_NAME(AUI, "AUI"),
+ __DEFINE_SPECIAL_MODE_NAME(MII, "MII"),
+ __DEFINE_SPECIAL_MODE_NAME(FIBRE, "FIBRE"),
+ __DEFINE_SPECIAL_MODE_NAME(BNC, "BNC"),
+ __DEFINE_LINK_MODE_NAME(10000, T, Full),
+ __DEFINE_SPECIAL_MODE_NAME(Pause, "Pause"),
+ __DEFINE_SPECIAL_MODE_NAME(Asym_Pause, "Asym_Pause"),
+ __DEFINE_LINK_MODE_NAME(2500, X, Full),
+ __DEFINE_SPECIAL_MODE_NAME(Backplane, "Backplane"),
+ __DEFINE_LINK_MODE_NAME(1000, KX, Full),
+ __DEFINE_LINK_MODE_NAME(10000, KX4, Full),
+ __DEFINE_LINK_MODE_NAME(10000, KR, Full),
+ [ETHTOOL_LINK_MODE_10000baseR_FEC_BIT] = "10000baseR_FEC",
+ __DEFINE_LINK_MODE_NAME(20000, MLD2, Full),
+ __DEFINE_LINK_MODE_NAME(20000, KR2, Full),
+ __DEFINE_LINK_MODE_NAME(40000, KR4, Full),
+ __DEFINE_LINK_MODE_NAME(40000, CR4, Full),
+ __DEFINE_LINK_MODE_NAME(40000, SR4, Full),
+ __DEFINE_LINK_MODE_NAME(40000, LR4, Full),
+ __DEFINE_LINK_MODE_NAME(56000, KR4, Full),
+ __DEFINE_LINK_MODE_NAME(56000, CR4, Full),
+ __DEFINE_LINK_MODE_NAME(56000, SR4, Full),
+ __DEFINE_LINK_MODE_NAME(56000, LR4, Full),
+ __DEFINE_LINK_MODE_NAME(25000, CR, Full),
+ __DEFINE_LINK_MODE_NAME(25000, KR, Full),
+ __DEFINE_LINK_MODE_NAME(25000, SR, Full),
+ __DEFINE_LINK_MODE_NAME(50000, CR2, Full),
+ __DEFINE_LINK_MODE_NAME(50000, KR2, Full),
+ __DEFINE_LINK_MODE_NAME(100000, KR4, Full),
+ __DEFINE_LINK_MODE_NAME(100000, SR4, Full),
+ __DEFINE_LINK_MODE_NAME(100000, CR4, Full),
+ __DEFINE_LINK_MODE_NAME(100000, LR4_ER4, Full),
+ __DEFINE_LINK_MODE_NAME(50000, SR2, Full),
+ __DEFINE_LINK_MODE_NAME(1000, X, Full),
+ __DEFINE_LINK_MODE_NAME(10000, CR, Full),
+ __DEFINE_LINK_MODE_NAME(10000, SR, Full),
+ __DEFINE_LINK_MODE_NAME(10000, LR, Full),
+ __DEFINE_LINK_MODE_NAME(10000, LRM, Full),
+ __DEFINE_LINK_MODE_NAME(10000, ER, Full),
+ __DEFINE_LINK_MODE_NAME(2500, T, Full),
+ __DEFINE_LINK_MODE_NAME(5000, T, Full),
+ __DEFINE_SPECIAL_MODE_NAME(FEC_NONE, "None"),
+ __DEFINE_SPECIAL_MODE_NAME(FEC_RS, "RS"),
+ __DEFINE_SPECIAL_MODE_NAME(FEC_BASER, "BASER"),
+ __DEFINE_LINK_MODE_NAME(50000, KR, Full),
+ __DEFINE_LINK_MODE_NAME(50000, SR, Full),
+ __DEFINE_LINK_MODE_NAME(50000, CR, Full),
+ __DEFINE_LINK_MODE_NAME(50000, LR_ER_FR, Full),
+ __DEFINE_LINK_MODE_NAME(50000, DR, Full),
+ __DEFINE_LINK_MODE_NAME(100000, KR2, Full),
+ __DEFINE_LINK_MODE_NAME(100000, SR2, Full),
+ __DEFINE_LINK_MODE_NAME(100000, CR2, Full),
+ __DEFINE_LINK_MODE_NAME(100000, LR2_ER2_FR2, Full),
+ __DEFINE_LINK_MODE_NAME(100000, DR2, Full),
+ __DEFINE_LINK_MODE_NAME(200000, KR4, Full),
+ __DEFINE_LINK_MODE_NAME(200000, SR4, Full),
+ __DEFINE_LINK_MODE_NAME(200000, LR4_ER4_FR4, Full),
+ __DEFINE_LINK_MODE_NAME(200000, DR4, Full),
+ __DEFINE_LINK_MODE_NAME(200000, CR4, Full),
+ __DEFINE_LINK_MODE_NAME(100, T1, Full),
+ __DEFINE_LINK_MODE_NAME(1000, T1, Full),
+};
+
static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
[ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
[ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
@@ -575,6 +655,9 @@ static int __init ethnl_init(void)
{
int ret;
+ BUILD_BUG_ON(ARRAY_SIZE(link_mode_names) !=
+ __ETHTOOL_LINK_MODE_MASK_NBITS);
+
 ret = genl_register_family(&ethtool_genl_family);
if (WARN(ret < 0, "ethtool: genetlink family registration failed"))
return ret;
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index 2352fd9c17c3..d85b1edc1b91 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -10,6 +10,8 @@
struct ethnl_req_info;
+extern const char *const link_mode_names[];
+
int ethnl_parse_header(struct ethnl_req_info *req_info,
const struct nlattr *nest, struct net *net,
struct netlink_ext_ack *extack,
diff --git a/net/ethtool/strset.c b/net/ethtool/strset.c
index fd7229379158..514ef04709d3 100644
--- a/net/ethtool/strset.c
+++ b/net/ethtool/strset.c
@@ -67,6 +67,12 @@ static const struct strset_info info_template[] = {
.count = ARRAY_SIZE(phy_tunable_strings),
.data = { .legacy = phy_tunable_strings },
},
+ [ETH_SS_LINK_MODES] = {
+ .type = ETH_SS_TYPE_SIMPLE,
+ .per_dev = false,
+ .count = __ETHTOOL_LINK_MODE_MASK_NBITS,
+ .data = { .simple = link_mode_names },
+ },
};
struct strset_data {
--
2.22.0
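The name table in this patch relies on two preprocessor features: token pasting to build the enum index and stringification to build the name. A minimal userspace sketch of the same technique follows; the DEMO_* enum, macros and helper functions are hypothetical stand-ins and not part of the patch, only the macro structure mirrors __DEFINE_LINK_MODE_NAME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few ETHTOOL_LINK_MODE_*_BIT indices. */
enum demo_mode {
	DEMO_10baseT_Half,
	DEMO_10baseT_Full,
	DEMO_Autoneg,
	DEMO_MODE_COUNT
};

/* Token pasting builds the enum constant, stringification builds the name;
 * adjacent string literals are concatenated by the compiler.
 */
#define DEMO_LINK_MODE(speed, type, duplex) \
	DEMO_ ## speed ## base ## type ## _ ## duplex
#define DEMO_LINK_MODE_NAME(speed, type, duplex) \
	#speed "base" #type "/" #duplex
#define DEMO_DEFINE_LINK_MODE_NAME(speed, type, duplex) \
	[DEMO_LINK_MODE(speed, type, duplex)] = \
	DEMO_LINK_MODE_NAME(speed, type, duplex)

static const char *const demo_mode_names[] = {
	DEMO_DEFINE_LINK_MODE_NAME(10, T, Half),
	DEMO_DEFINE_LINK_MODE_NAME(10, T, Full),
	[DEMO_Autoneg] = "Autoneg",
};

/* Compile-time size check, playing the role of BUILD_BUG_ON() in the patch. */
static_assert(sizeof(demo_mode_names) / sizeof(demo_mode_names[0]) ==
	      DEMO_MODE_COUNT, "name table out of sync with enum");

/* Return the name for a mode index, or NULL if the index is out of range. */
static const char *demo_mode_name(unsigned int idx)
{
	return idx < (unsigned int)DEMO_MODE_COUNT ? demo_mode_names[idx] : NULL;
}

/* Return the mode index for a name, or -1 if the name is not known. */
static int demo_mode_index(const char *name)
{
	unsigned int i;

	for (i = 0; i < (unsigned int)DEMO_MODE_COUNT; i++)
		if (!strcmp(demo_mode_names[i], name))
			return (int)i;
	return -1;
}
```

Because the designated initializers are keyed by the enum constants, entries can be listed in any order, and the size check catches a table that grew past the enum.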
Permanent hardware address of a network device was traditionally provided
via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
ethtool netlink interface, rtnetlink is much more suitable for it so let's
add it to the RTM_NEWLINK message.
Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
permanent address is all zeros (i.e. device driver did not fill it). As
permanent address is not modifiable, reject userspace requests containing
IFLA_PERM_ADDRESS attribute.
Note: we already provide permanent hardware address for bond slaves;
unfortunately we cannot drop that attribute for backward compatibility
reasons.
v5 -> v6: only add the attribute if permanent address is not zero
Signed-off-by: Michal Kubecek <[email protected]>
---
include/uapi/linux/if_link.h | 1 +
net/core/rtnetlink.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6f75bda2c2d7..1c79d6283a4d 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -167,6 +167,7 @@ enum {
IFLA_NEW_IFINDEX,
IFLA_MIN_MTU,
IFLA_MAX_MTU,
+ IFLA_PERM_ADDRESS,
__IFLA_MAX
};
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1ee6460f8275..9aae53e8914e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1027,6 +1027,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
+ nla_total_size(4) /* IFLA_CARRIER_DOWN_COUNT */
+ nla_total_size(4) /* IFLA_MIN_MTU */
+ nla_total_size(4) /* IFLA_MAX_MTU */
+ + nla_total_size(MAX_ADDR_LEN) /* IFLA_PERM_ADDRESS */
+ 0;
}
@@ -1691,6 +1692,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
nla_put_s32(skb, IFLA_NEW_IFINDEX, new_ifindex) < 0)
goto nla_put_failure;
+ if (memchr_inv(dev->perm_addr, '\0', dev->addr_len) &&
+ nla_put(skb, IFLA_PERM_ADDRESS, dev->addr_len, dev->perm_addr))
+ goto nla_put_failure;
rcu_read_lock();
if (rtnl_fill_link_af(skb, dev, ext_filter_mask))
@@ -1750,6 +1754,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_CARRIER_DOWN_COUNT] = { .type = NLA_U32 },
[IFLA_MIN_MTU] = { .type = NLA_U32 },
[IFLA_MAX_MTU] = { .type = NLA_U32 },
+ [IFLA_PERM_ADDRESS] = { .type = NLA_REJECT },
};
static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
--
2.22.0
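The "only add the attribute if permanent address is not zero" rule is implemented above with memchr_inv(), which is a kernel-only helper. A userspace sketch of the same test is shown below, assuming a plain byte loop is acceptable; the function name demo_perm_addr_is_set() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the memchr_inv(addr, '\0', len) test in the patch:
 * returns true if at least one byte of the address is non-zero, i.e. the
 * driver filled in a permanent address.
 */
static bool demo_perm_addr_is_set(const unsigned char *addr, size_t len)
{
	size_t i;

	assert(addr != NULL || len == 0);
	for (i = 0; i < len; i++)
		if (addr[i] != 0)
			return true;
	return false;
}
```

An all-zero address (or a zero-length one) yields false, which corresponds to the case where IFLA_PERM_ADDRESS is omitted from the message.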
Add information about device link state (as provided by ETHTOOL_GLINK ioctl
command) into the SETTINGS_GET reply when ETHTOOL_IM_SETTINGS_LINKSTATE
bit is set in the request info mask.
We cannot use NLA_FLAG for link state as we need three states: off, on and
unknown. The attribute is encapsulated in a nest to allow future extensions
(e.g. link down reason or more detailed link state information).
Signed-off-by: Michal Kubecek <[email protected]>
---
Documentation/networking/ethtool-netlink.txt | 5 ++-
include/uapi/linux/ethtool_netlink.h | 14 +++++++-
net/ethtool/common.c | 8 +++++
net/ethtool/common.h | 3 ++
net/ethtool/ioctl.c | 8 ++---
net/ethtool/settings.c | 37 ++++++++++++++++++++
6 files changed, 69 insertions(+), 6 deletions(-)
diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
index 05bc8f5f8654..dc06e33329a4 100644
--- a/Documentation/networking/ethtool-netlink.txt
+++ b/Documentation/networking/ethtool-netlink.txt
@@ -250,6 +250,7 @@ Info mask bits meaning:
ETHTOOL_IM_SETTINGS_LINKINFO link settings
ETHTOOL_IM_SETTINGS_LINKMODES link modes and related
+ ETHTOOL_IM_SETTINGS_LINKSTATE link state
Response contents:
@@ -266,6 +267,8 @@ Response contents:
ETHTOOL_A_LINKMODES_PEER (bitset) partner link modes
ETHTOOL_A_LINKMODES_SPEED (u32) link speed (Mb/s)
ETHTOOL_A_LINKMODES_DUPLEX (u8) duplex mode
+ ETHTOOL_A_SETTINGS_LINK_STATE (nested) link state
+ ETHTOOL_A_LINKSTATE_LINK (u8) link on/off/unknown
Most of the attributes and their values have the same meaning as matching
members of the corresponding ioctl structures. For ETHTOOL_A_LINKMODES_OURS,
@@ -323,7 +326,7 @@ ETHTOOL_SWOL n/a
ETHTOOL_GMSGLVL n/a
ETHTOOL_SMSGLVL n/a
ETHTOOL_NWAY_RST n/a
-ETHTOOL_GLINK n/a
+ETHTOOL_GLINK ETHNL_CMD_GET_SETTINGS
ETHTOOL_GEEPROM n/a
ETHTOOL_SEEPROM n/a
ETHTOOL_GCOALESCE n/a
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index 8ccf66ed3f58..46c13455246f 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -158,6 +158,7 @@ enum {
ETHTOOL_A_SETTINGS_HEADER, /* nest - _A_HEADER_* */
ETHTOOL_A_SETTINGS_LINK_INFO, /* nest - _A_LINKINFO_* */
ETHTOOL_A_SETTINGS_LINK_MODES, /* nest - _A_LINKMODES_* */
+ ETHTOOL_A_SETTINGS_LINK_STATE, /* nest - _A_LINKSTATE_* */
/* add new constants above here */
__ETHTOOL_A_SETTINGS_CNT,
@@ -166,9 +167,11 @@ enum {
#define ETHTOOL_IM_SETTINGS_LINKINFO (1U << 0)
#define ETHTOOL_IM_SETTINGS_LINKMODES (1U << 1)
+#define ETHTOOL_IM_SETTINGS_LINKSTATE (1U << 2)
#define ETHTOOL_IM_SETTINGS_ALL (ETHTOOL_IM_SETTINGS_LINKINFO | \
- ETHTOOL_IM_SETTINGS_LINKMODES)
+ ETHTOOL_IM_SETTINGS_LINKMODES | \
+ ETHTOOL_IM_SETTINGS_LINKSTATE)
#define ETHTOOL_RF_SETTINGS_ALL 0
@@ -198,6 +201,15 @@ enum {
ETHTOOL_A_LINKMODES_MAX = (__ETHTOOL_A_LINKMODES_CNT - 1)
};
+enum {
+ ETHTOOL_A_LINKSTATE_UNSPEC,
+ ETHTOOL_A_LINKSTATE_LINK, /* u8 */
+
+ /* add new constants above here */
+ __ETHTOOL_A_LINKSTATE_CNT,
+ ETHTOOL_A_LINKSTATE_MAX = (__ETHTOOL_A_LINKSTATE_CNT - 1)
+};
+
/* generic netlink info */
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index abb00b3a7e77..b06635ad2620 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -130,3 +130,11 @@ convert_legacy_settings_to_link_ksettings(
= legacy_settings->eth_tp_mdix_ctrl;
return retval;
}
+
+int __ethtool_get_link(struct net_device *dev)
+{
+ if (!dev->ethtool_ops->get_link)
+ return -EOPNOTSUPP;
+
+ return netif_running(dev) && dev->ethtool_ops->get_link(dev);
+}
diff --git a/net/ethtool/common.h b/net/ethtool/common.h
index 0381936d8e1e..a2c1504576c2 100644
--- a/net/ethtool/common.h
+++ b/net/ethtool/common.h
@@ -3,6 +3,7 @@
#ifndef _ETHTOOL_COMMON_H
#define _ETHTOOL_COMMON_H
+#include <linux/netdevice.h>
#include <linux/ethtool.h>
extern const char
@@ -14,6 +15,8 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN];
extern const char
phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN];
+int __ethtool_get_link(struct net_device *dev);
+
bool convert_legacy_settings_to_link_ksettings(
struct ethtool_link_ksettings *link_ksettings,
const struct ethtool_cmd *legacy_settings);
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 504ab2f7009c..853b8c21a5e5 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -1359,12 +1359,12 @@ static int ethtool_nway_reset(struct net_device *dev)
static int ethtool_get_link(struct net_device *dev, char __user *useraddr)
{
struct ethtool_value edata = { .cmd = ETHTOOL_GLINK };
+ int link = __ethtool_get_link(dev);
- if (!dev->ethtool_ops->get_link)
- return -EOPNOTSUPP;
-
- edata.data = netif_running(dev) && dev->ethtool_ops->get_link(dev);
+ if (link < 0)
+ return link;
+ edata.data = link;
if (copy_to_user(useraddr, &edata, sizeof(edata)))
return -EFAULT;
return 0;
diff --git a/net/ethtool/settings.c b/net/ethtool/settings.c
index 2fc961297076..079d3776df71 100644
--- a/net/ethtool/settings.c
+++ b/net/ethtool/settings.c
@@ -11,6 +11,7 @@ struct settings_data {
struct ethnl_reply_data repdata_base;
struct ethtool_link_ksettings ksettings;
struct ethtool_link_settings *lsettings;
+ int link;
bool lpm_empty;
};
@@ -113,6 +114,7 @@ settings_get_policy[ETHTOOL_A_SETTINGS_MAX + 1] = {
[ETHTOOL_A_SETTINGS_HEADER] = { .type = NLA_NESTED },
[ETHTOOL_A_SETTINGS_LINK_INFO] = { .type = NLA_REJECT },
[ETHTOOL_A_SETTINGS_LINK_MODES] = { .type = NLA_REJECT },
+ [ETHTOOL_A_SETTINGS_LINK_STATE] = { .type = NLA_REJECT },
};
static int ethnl_get_link_ksettings(struct genl_info *info,
@@ -140,6 +142,7 @@ static int settings_prepare(struct ethnl_req_info *req_info,
data->lsettings = &data->ksettings.base;
data->lpm_empty = true;
+ data->link = -EOPNOTSUPP;
ret = ethnl_before_ops(dev);
if (ret < 0)
@@ -162,6 +165,8 @@ static int settings_prepare(struct ethnl_req_info *req_info,
ethnl_bitmap_to_u32(data->ksettings.link_modes.lp_advertising,
__ETHTOOL_LINK_MODE_MASK_NWORDS);
}
+ if (req_mask & ETHTOOL_IM_SETTINGS_LINKSTATE)
+ data->link = __ethtool_get_link(dev);
ethnl_after_ops(dev);
data->repdata_base.info_mask = req_mask;
@@ -212,6 +217,13 @@ settings_linkmodes_size(const struct ethtool_link_ksettings *ksettings,
return nla_total_size(len);
}
+static int settings_linkstate_size(int link)
+{
+ if (link < 0)
+ return nla_total_size(0);
+ return nla_total_size(nla_total_size(sizeof(u8)));
+}
+
/* reply_size() handler
*
* To keep things simple, reserve space for some attributes which may not
@@ -235,6 +247,8 @@ static int settings_size(const struct ethnl_req_info *req_info)
return ret;
len += ret;
}
+ if (info_mask & ETHTOOL_IM_SETTINGS_LINKSTATE)
+ len += settings_linkstate_size(data->link);
return len;
}
@@ -310,6 +324,23 @@ settings_fill_linkmodes(struct sk_buff *skb,
return -EMSGSIZE;
}
+static int settings_fill_linkstate(struct sk_buff *skb, int link)
+{
+ struct nlattr *nest;
+
+ nest = nla_nest_start(skb, ETHTOOL_A_SETTINGS_LINK_STATE);
+ if (!nest)
+ return -EMSGSIZE;
+ if (link >= 0 && nla_put_u8(skb, ETHTOOL_A_LINKSTATE_LINK, !!link))
+ goto nla_put_failure;
+ nla_nest_end(skb, nest);
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, nest);
+ return -EMSGSIZE;
+}
+
/* fill_reply() handler */
static int settings_fill(struct sk_buff *skb,
const struct ethnl_req_info *req_info)
@@ -331,6 +362,11 @@ static int settings_fill(struct sk_buff *skb,
if (ret < 0)
return ret;
}
+ if (info_mask & ETHTOOL_IM_SETTINGS_LINKSTATE) {
+ ret = settings_fill_linkstate(skb, data->link);
+ if (ret < 0)
+ return ret;
+ }
return 0;
}
@@ -389,6 +425,7 @@ settings_set_policy[ETHTOOL_A_SETTINGS_MAX + 1] = {
[ETHTOOL_A_SETTINGS_HEADER] = { .type = NLA_NESTED },
[ETHTOOL_A_SETTINGS_LINK_INFO] = { .type = NLA_NESTED },
[ETHTOOL_A_SETTINGS_LINK_MODES] = { .type = NLA_NESTED },
+ [ETHTOOL_A_SETTINGS_LINK_STATE] = { .type = NLA_REJECT },
};
static int ethnl_set_link_ksettings(struct genl_info *info,
--
2.22.0
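The size reservation in settings_linkstate_size() and the three-state encoding can be checked by hand. The sketch below reimplements the netlink attribute size arithmetic in userspace, assuming the usual 4-byte attribute header and 4-byte alignment; all DEMO_* and demo_* names are hypothetical, and demo_decode_link() only illustrates how a receiver might interpret the optional u8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed netlink attribute layout: 4-byte header, 4-byte alignment. */
#define DEMO_NLA_ALIGNTO 4
#define DEMO_NLA_ALIGN(len) \
	(((len) + DEMO_NLA_ALIGNTO - 1) & ~(DEMO_NLA_ALIGNTO - 1))
#define DEMO_NLA_HDRLEN 4

static_assert((DEMO_NLA_HDRLEN % DEMO_NLA_ALIGNTO) == 0,
	      "attribute header must be aligned");

/* Header plus payload, padded to the alignment boundary. */
static int demo_nla_total_size(int payload)
{
	return DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + payload);
}

/* Mirrors settings_linkstate_size(): an empty nest if the state is unknown,
 * otherwise a nest holding one u8 attribute.
 */
static int demo_linkstate_size(int link)
{
	if (link < 0)
		return demo_nla_total_size(0);
	return demo_nla_total_size(demo_nla_total_size((int)sizeof(uint8_t)));
}

enum demo_link_state { DEMO_LINK_UNKNOWN, DEMO_LINK_OFF, DEMO_LINK_ON };

/* Receiver side: a missing attribute (NULL) means "unknown", any non-zero
 * value means "on".
 */
static enum demo_link_state demo_decode_link(const uint8_t *link_attr)
{
	if (!link_attr)
		return DEMO_LINK_UNKNOWN;
	return *link_attr ? DEMO_LINK_ON : DEMO_LINK_OFF;
}
```

Under these assumptions the empty nest takes 4 bytes and the nest with the u8 takes 12 (a 4-byte nest header plus an 8-byte padded inner attribute).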
Tue, Jul 02, 2019 at 01:49:44PM CEST, [email protected] wrote:
>Permanent hardware address of a network device was traditionally provided
>via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
>ethtool netlink interface, rtnetlink is much more suitable for it so let's
>add it to the RTM_NEWLINK message.
>
>Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
>permanent address is all zeros (i.e. device driver did not fill it). As
>permanent address is not modifiable, reject userspace requests containing
>IFLA_PERM_ADDRESS attribute.
>
>Note: we already provide permanent hardware address for bond slaves;
>unfortunately we cannot drop that attribute for backward compatibility
>reasons.
>
>v5 -> v6: only add the attribute if permanent address is not zero
>
>Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Tue, Jul 02, 2019 at 01:49:49PM CEST, [email protected] wrote:
>Function nl80211_validate_nested() is not specific to nl80211, it's
>a counterpart to nla_validate_nested_deprecated() with strict validation.
>For consistency with other validation and parse functions, rename it to
>nla_validate_nested().
>
>Signed-off-by: Michal Kubecek <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
On Tue, 2019-07-02 at 13:49 +0200, Michal Kubecek wrote:
> Function nl80211_validate_nested() is not specific to nl80211, it's
> a counterpart to nla_validate_nested_deprecated() with strict validation.
> For consistency with other validation and parse functions, rename it to
> nla_validate_nested().
Umm, right, not sure how that happened. Sorry about that.
Reviewed-by: Johannes Berg <[email protected]>
johannes
Tue, Jul 02, 2019 at 01:49:59PM CEST, [email protected] wrote:
>Basic genetlink and init infrastructure for the netlink interface, register
>genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to
>make the build optional. Add initial overall interface description into
>Documentation/networking/ethtool-netlink.txt, further patches will add more
>detailed information.
>
>Signed-off-by: Michal Kubecek <[email protected]>
>---
> Documentation/networking/ethtool-netlink.txt | 208 +++++++++++++++++++
> include/linux/ethtool_netlink.h | 9 +
> include/uapi/linux/ethtool_netlink.h | 36 ++++
> net/Kconfig | 8 +
> net/ethtool/Makefile | 6 +-
> net/ethtool/netlink.c | 33 +++
> net/ethtool/netlink.h | 10 +
> 7 files changed, 309 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/networking/ethtool-netlink.txt
> create mode 100644 include/linux/ethtool_netlink.h
> create mode 100644 include/uapi/linux/ethtool_netlink.h
> create mode 100644 net/ethtool/netlink.c
> create mode 100644 net/ethtool/netlink.h
>
>diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
>new file mode 100644
>index 000000000000..97c369aa290b
>--- /dev/null
>+++ b/Documentation/networking/ethtool-netlink.txt
>@@ -0,0 +1,208 @@
>+ Netlink interface for ethtool
>+ =============================
>+
>+
>+Basic information
>+-----------------
>+
>+Netlink interface for ethtool uses generic netlink family "ethtool" (userspace
>+application should use macros ETHTOOL_GENL_NAME and ETHTOOL_GENL_VERSION
>+defined in <linux/ethtool_netlink.h> uapi header). This family does not use
>+a specific header, all information in requests and replies is passed using
>+netlink attributes.
>+
>+The ethtool netlink interface uses extended ACK for error and warning
>+reporting, userspace application developers are encouraged to make these
>+messages available to user in a suitable way.
>+
>+Requests can be divided into three categories: "get" (retrieving information),
>+"set" (setting parameters) and "action" (invoking an action).
>+
>+All "set" and "action" type requests require admin privileges (CAP_NET_ADMIN
>+in the namespace). Most "get" type requests are allowed for anyone but there
>+are exceptions (where the response contains sensitive information). In some
>+cases, the request as such is allowed for anyone but unprivileged users have
>+attributes with sensitive information (e.g. wake-on-lan password) omitted.
>+
>+
>+Conventions
>+-----------
>+
>+Attributes which represent a boolean value usually use u8 type so that we can
>+distinguish three states: "on", "off" and "not present" (meaning the
>+information is not available in "get" requests or value is not to be changed
>+in "set" requests). For these attributes, the "true" value should be passed as
>+number 1 but any non-zero value should be understood as "true" by recipient.
>+
>+In the message structure descriptions below, if an attribute name is suffixed
>+with "+", parent nest can contain multiple attributes of the same type. This
>+implements an array of entries.
>+
>+
>+Request header
>+--------------
>+
>+Each request or reply message contains a nested attribute with common header.
>+Structure of this header is
Missing ":"
>+
>+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
>+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
>+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
>+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
>+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
>+
>+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
>+message relates to. One of them is sufficient in requests, if both are used,
>+they must identify the same device. Some requests, e.g. global string sets, do
>+not require device identification. Most GET requests also allow dump requests
>+without device identification to query the same information for all devices
>+providing it (each device in a separate message).
>+
>+Optional info mask allows to ask only for a part of data provided by GET
How does this "infomask" work? What do the bits relate to? Is it request
specific?
>+request types. If omitted or zero, all data is returned. The two flag bitmaps
>+allow enabling requestoptions; ETHTOOL_A_HEADER_GFLAGS are global flags common
s/requestoptions;/request options./ ?
>+for all request types, flags recognized in ETHTOOL_A_HEADER_RFLAGS and their
>+interpretation are specific for each request type. Global flags are
>+
>+ ETHTOOL_RF_COMPACT use compact format bitsets in reply
Why "RF"? Isn't this "GF"? I would like "ETHTOOL_GFLAG_COMPACT" better.
>+ ETHTOOL_RF_REPLY send optional reply (SET and ACT requests)
>+
>+Request specific flags are described with each request type. For both flag
>+attributes, new flags should follow the general idea that if the flag is not
>+set, the behaviour is the same as (or closer to) the behaviour before it was
"closer to" ? That would be unfortunate I believe...
>+introduced.
>+
>+
>+List of message types
>+---------------------
>+
>+All constants identifying message types use ETHTOOL_CMD_ prefix and suffix
>+according to message purpose:
>+
>+ _GET userspace request to retrieve data
>+ _SET userspace request to set data
>+ _ACT userspace request to perform an action
>+ _GET_REPLY kernel reply to a GET request
>+ _SET_REPLY kernel reply to a SET request
>+ _ACT_REPLY kernel reply to an ACT request
>+ _NTF kernel notification
>+
>+"GET" requests are sent by userspace applications to retrieve device
>+information. They usually do not contain any message specific attributes.
>+Kernel replies with corresponding "GET_REPLY" message. For most types, "GET"
>+request with NLM_F_DUMP and no device identification can be used to query the
>+information for all devices supporting the request.
>+
>+If the data can be also modified, corresponding "SET" message with the same
>+layout as "GET" reply is used to request changes. Only attributes where
s/"GET" reply/"GET_REPLY"/ ?
Maybe better to emphasize that the "GET_REPLY" is the one corresponding
to "SET". But perhaps I got this sentence all wrong :/
>+a change is requested are included in such request (also, not all attributes
>+may be changed). Replies to most "SET" request consist only of error code and
>+extack; if kernel provides additional data, it is sent in the form of
>+corresponding "SET_REPLY" message (if ETHTOOL_RF_REPLY flag was set in request
>+header).
>+
>+Data modification also triggers sending a "NTF" message with a notification.
>+These usually bear only a subset of attributes which was affected by the
>+change. The same notification is issued if the data is modified using other
>+means (mostly ioctl ethtool interface). Unlike notifications from ethtool
>+netlink code which are only sent if something actually changed, notifications
>+triggered by ioctl interface may be sent even if the request did not actually
>+change any data.
Interesting. What's the reason for that?
>+
>+"ACT" messages request kernel (driver) to perform a specific action. If some
>+information is reported by kernel (as requested by ETHTOOL_RF_REPLY flag in
>+request header), the reply takes form of an "ACT_REPLY" message. Performing an
>+action also triggers a notification ("NTF" message).
>+
>+Later sections describe the format and semantics of these messages.
>+
>+
>+Request translation
>+-------------------
>+
>+The following table maps ioctl commands to netlink commands providing their
>+functionality. Entries with "n/a" in right column are commands which do not
>+have their netlink replacement yet.
>+
>+ioctl command netlink command
>+---------------------------------------------------------------------
>+ETHTOOL_GSET n/a
>+ETHTOOL_SSET n/a
>+ETHTOOL_GDRVINFO n/a
>+ETHTOOL_GREGS n/a
>+ETHTOOL_GWOL n/a
>+ETHTOOL_SWOL n/a
>+ETHTOOL_GMSGLVL n/a
>+ETHTOOL_SMSGLVL n/a
>+ETHTOOL_NWAY_RST n/a
>+ETHTOOL_GLINK n/a
>+ETHTOOL_GEEPROM n/a
>+ETHTOOL_SEEPROM n/a
>+ETHTOOL_GCOALESCE n/a
>+ETHTOOL_SCOALESCE n/a
>+ETHTOOL_GRINGPARAM n/a
>+ETHTOOL_SRINGPARAM n/a
>+ETHTOOL_GPAUSEPARAM n/a
>+ETHTOOL_SPAUSEPARAM n/a
>+ETHTOOL_GRXCSUM n/a
>+ETHTOOL_SRXCSUM n/a
>+ETHTOOL_GTXCSUM n/a
>+ETHTOOL_STXCSUM n/a
>+ETHTOOL_GSG n/a
>+ETHTOOL_SSG n/a
>+ETHTOOL_TEST n/a
>+ETHTOOL_GSTRINGS n/a
>+ETHTOOL_PHYS_ID n/a
>+ETHTOOL_GSTATS n/a
>+ETHTOOL_GTSO n/a
>+ETHTOOL_STSO n/a
>+ETHTOOL_GPERMADDR rtnetlink RTM_GETLINK
>+ETHTOOL_GUFO n/a
>+ETHTOOL_SUFO n/a
>+ETHTOOL_GGSO n/a
>+ETHTOOL_SGSO n/a
>+ETHTOOL_GFLAGS n/a
>+ETHTOOL_SFLAGS n/a
>+ETHTOOL_GPFLAGS n/a
>+ETHTOOL_SPFLAGS n/a
>+ETHTOOL_GRXFH n/a
>+ETHTOOL_SRXFH n/a
>+ETHTOOL_GGRO n/a
>+ETHTOOL_SGRO n/a
>+ETHTOOL_GRXRINGS n/a
>+ETHTOOL_GRXCLSRLCNT n/a
>+ETHTOOL_GRXCLSRULE n/a
>+ETHTOOL_GRXCLSRLALL n/a
>+ETHTOOL_SRXCLSRLDEL n/a
>+ETHTOOL_SRXCLSRLINS n/a
>+ETHTOOL_FLASHDEV n/a
>+ETHTOOL_RESET n/a
>+ETHTOOL_SRXNTUPLE n/a
>+ETHTOOL_GRXNTUPLE n/a
>+ETHTOOL_GSSET_INFO n/a
>+ETHTOOL_GRXFHINDIR n/a
>+ETHTOOL_SRXFHINDIR n/a
>+ETHTOOL_GFEATURES n/a
>+ETHTOOL_SFEATURES n/a
>+ETHTOOL_GCHANNELS n/a
>+ETHTOOL_SCHANNELS n/a
>+ETHTOOL_SET_DUMP n/a
>+ETHTOOL_GET_DUMP_FLAG n/a
>+ETHTOOL_GET_DUMP_DATA n/a
>+ETHTOOL_GET_TS_INFO n/a
>+ETHTOOL_GMODULEINFO n/a
>+ETHTOOL_GMODULEEEPROM n/a
>+ETHTOOL_GEEE n/a
>+ETHTOOL_SEEE n/a
>+ETHTOOL_GRSSH n/a
>+ETHTOOL_SRSSH n/a
>+ETHTOOL_GTUNABLE n/a
>+ETHTOOL_STUNABLE n/a
>+ETHTOOL_GPHYSTATS n/a
>+ETHTOOL_PERQUEUE n/a
>+ETHTOOL_GLINKSETTINGS n/a
>+ETHTOOL_SLINKSETTINGS n/a
>+ETHTOOL_PHY_GTUNABLE n/a
>+ETHTOOL_PHY_STUNABLE n/a
>+ETHTOOL_GFECPARAM n/a
>+ETHTOOL_SFECPARAM n/a
>diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
>new file mode 100644
>index 000000000000..0412adb4f42f
>--- /dev/null
>+++ b/include/linux/ethtool_netlink.h
>@@ -0,0 +1,9 @@
>+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>+
>+#ifndef _LINUX_ETHTOOL_NETLINK_H_
>+#define _LINUX_ETHTOOL_NETLINK_H_
>+
>+#include <uapi/linux/ethtool_netlink.h>
>+#include <linux/ethtool.h>
>+
>+#endif /* _LINUX_ETHTOOL_NETLINK_H_ */
>diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
>new file mode 100644
>index 000000000000..9a0fbd4f85d9
>--- /dev/null
>+++ b/include/uapi/linux/ethtool_netlink.h
>@@ -0,0 +1,36 @@
>+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>+/*
>+ * include/uapi/linux/ethtool_netlink.h - netlink interface for ethtool
>+ *
>+ * See Documentation/networking/ethtool-netlink.txt in kernel source tree for
>+ * documentation of the interface.
>+ */
>+
>+#ifndef _UAPI_LINUX_ETHTOOL_NETLINK_H_
>+#define _UAPI_LINUX_ETHTOOL_NETLINK_H_
>+
>+#include <linux/ethtool.h>
>+
>+/* message types - userspace to kernel */
>+enum {
>+ ETHTOOL_MSG_USER_NONE,
>+
>+ /* add new constants above here */
>+ __ETHTOOL_MSG_USER_CNT,
>+ ETHTOOL_MSG_USER_MAX = (__ETHTOOL_MSG_USER_CNT - 1)
>+};
>+
>+/* message types - kernel to userspace */
>+enum {
>+ ETHTOOL_MSG_KERNEL_NONE,
>+
>+ /* add new constants above here */
>+ __ETHTOOL_MSG_KERNEL_CNT,
>+ ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
>+};
>+
>+/* generic netlink info */
>+#define ETHTOOL_GENL_NAME "ethtool"
>+#define ETHTOOL_GENL_VERSION 1
>+
>+#endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */
>diff --git a/net/Kconfig b/net/Kconfig
>index 57f51a279ad6..65b760d26eec 100644
>--- a/net/Kconfig
>+++ b/net/Kconfig
>@@ -447,6 +447,14 @@ config FAILOVER
> migration of VMs with direct attached VFs by failing over to the
> paravirtual datapath when the VF is unplugged.
>
>+config ETHTOOL_NETLINK
>+ bool "Netlink interface for ethtool"
>+ default y
>+ help
>+ An alternative userspace interface for ethtool based on generic
>+ netlink. It provides better extensibility and some new features,
>+ e.g. notification messages.
>+
> endif # if NET
>
> # Used by archs to tell that they support BPF JIT compiler plus which flavour.
>diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
>index 3ebfab2bca66..f30e0da88be5 100644
>--- a/net/ethtool/Makefile
>+++ b/net/ethtool/Makefile
>@@ -1,3 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0
>
>-obj-y += ioctl.o
>+obj-y += ioctl.o
>+
>+obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
Hmm, I wonder, why not make this always on? We want users to use it, and
the memory savings when it is off would be minimal. RTNetlink is also
always on, and so is the ethtool ioctl.
>+
>+ethtool_nl-y := netlink.o
>diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
>new file mode 100644
>index 000000000000..3c98b41f04e5
>--- /dev/null
>+++ b/net/ethtool/netlink.c
>@@ -0,0 +1,33 @@
>+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>+
>+#include <linux/ethtool_netlink.h>
>+#include "netlink.h"
>+
>+/* genetlink setup */
>+
>+static const struct genl_ops ethtool_genl_ops[] = {
>+};
>+
>+static struct genl_family ethtool_genl_family = {
>+ .name = ETHTOOL_GENL_NAME,
>+ .version = ETHTOOL_GENL_VERSION,
>+ .netnsok = true,
>+ .parallel_ops = true,
>+ .ops = ethtool_genl_ops,
>+ .n_ops = ARRAY_SIZE(ethtool_genl_ops),
>+};
>+
>+/* module setup */
>+
>+static int __init ethnl_init(void)
>+{
>+ int ret;
>+
>+ ret = genl_register_family(&ethtool_genl_family);
>+ if (WARN(ret < 0, "ethtool: genetlink family registration failed"))
WARN(ret, ...)
>+ return ret;
>+
>+ return 0;
>+}
>+
>+subsys_initcall(ethnl_init);
>diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
>new file mode 100644
>index 000000000000..257ae55ccc82
>--- /dev/null
>+++ b/net/ethtool/netlink.h
>@@ -0,0 +1,10 @@
>+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>+
>+#ifndef _NET_ETHTOOL_NETLINK_H
>+#define _NET_ETHTOOL_NETLINK_H
>+
>+#include <linux/ethtool_netlink.h>
>+#include <linux/netdevice.h>
>+#include <net/genetlink.h>
>+
>+#endif /* _NET_ETHTOOL_NETLINK_H */
>--
>2.22.0
>
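The info mask rule from the quoted documentation ("if omitted or zero, all data is returned") can be sketched as follows. This is only an illustration of the documented semantics, not the kernel implementation; the DEMO_IM_* bits are hypothetical and merely modelled on the ETHTOOL_IM_SETTINGS_* bits used elsewhere in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-request info mask bits. */
#define DEMO_IM_LINKINFO	(1U << 0)
#define DEMO_IM_LINKMODES	(1U << 1)
#define DEMO_IM_LINKSTATE	(1U << 2)
#define DEMO_IM_ALL		(DEMO_IM_LINKINFO | \
				 DEMO_IM_LINKMODES | \
				 DEMO_IM_LINKSTATE)

static_assert(DEMO_IM_ALL == 0x7, "all-mask must cover every defined bit");

/* An omitted attribute is modelled as 0, so "omitted" and "zero" both
 * select all parts of the reply.
 */
static uint32_t demo_effective_infomask(uint32_t requested)
{
	return requested ? requested : DEMO_IM_ALL;
}
```

With this reading the bits are specific to each request type, and a client that wants everything can simply leave the attribute out.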
Tue, Jul 02, 2019 at 01:50:04PM CEST, [email protected] wrote:
>Add common request/reply header definition and helpers to parse request
>header and fill reply header. Provide ethnl_update_* helpers to update
>structure members from request attributes (to be used for *_SET requests).
>
>Signed-off-by: Michal Kubecek <[email protected]>
>---
> include/uapi/linux/ethtool_netlink.h | 23 ++++
> net/ethtool/netlink.c | 173 +++++++++++++++++++++++++++
> net/ethtool/netlink.h | 145 ++++++++++++++++++++++
> 3 files changed, 341 insertions(+)
>
>diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
>index 9a0fbd4f85d9..ffd7db0848ef 100644
>--- a/include/uapi/linux/ethtool_netlink.h
>+++ b/include/uapi/linux/ethtool_netlink.h
>@@ -29,6 +29,29 @@ enum {
> ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
> };
>
>+/* request header */
>+
>+/* use compact bitsets in reply */
>+#define ETHTOOL_RF_COMPACT (1 << 0)
"COMPACT_BITSETS"?
>+/* provide optional reply for SET or ACT requests */
>+#define ETHTOOL_RF_REPLY (1 << 1)
"OPTIONAL_REPLY"?
>+
>+#define ETHTOOL_RF_ALL (ETHTOOL_RF_COMPACT | \
>+ ETHTOOL_RF_REPLY)
>+
>+enum {
>+ ETHTOOL_A_HEADER_UNSPEC,
>+ ETHTOOL_A_HEADER_DEV_INDEX, /* u32 */
>+ ETHTOOL_A_HEADER_DEV_NAME, /* string */
>+ ETHTOOL_A_HEADER_INFOMASK, /* u32 */
>+ ETHTOOL_A_HEADER_GFLAGS, /* u32 - ETHTOOL_RF_* */
>+ ETHTOOL_A_HEADER_RFLAGS, /* u32 */
>+
>+ /* add new constants above here */
>+ __ETHTOOL_A_HEADER_CNT,
>+ ETHTOOL_A_HEADER_MAX = (__ETHTOOL_A_HEADER_CNT - 1)
>+};
>+
> /* generic netlink info */
> #define ETHTOOL_GENL_NAME "ethtool"
> #define ETHTOOL_GENL_VERSION 1
>diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
>index 3c98b41f04e5..e13f29bbd625 100644
>--- a/net/ethtool/netlink.c
>+++ b/net/ethtool/netlink.c
>@@ -1,8 +1,181 @@
> // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>
>+#include <net/sock.h>
> #include <linux/ethtool_netlink.h>
> #include "netlink.h"
>
>+static struct genl_family ethtool_genl_family;
>+
>+static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
>+ [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
>+ [ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
>+ [ETHTOOL_A_HEADER_DEV_NAME] = { .type = NLA_NUL_STRING,
>+ .len = IFNAMSIZ - 1 },
>+ [ETHTOOL_A_HEADER_INFOMASK] = { .type = NLA_U32 },
>+ [ETHTOOL_A_HEADER_GFLAGS] = { .type = NLA_U32 },
>+ [ETHTOOL_A_HEADER_RFLAGS] = { .type = NLA_U32 },
>+};
>+
>+/**
>+ * ethnl_parse_header() - parse request header
>+ * @req_info: structure to put results into
>+ * @nest: nest attribute with request header
>+ * @net: request netns
>+ * @extack: netlink extack for error reporting
>+ * @policy: netlink attribute policy to validate header; use
>+ * @dflt_header_policy (all attributes allowed) if null
>+ * @require_dev: fail if no device identified in header
>+ *
>+ * Parse request header in nested attribute @nest and puts results into
>+ * the structure pointed to by @req_info. Extack from @info is used for error
>+ * reporting. If req_info->dev is not null on return, reference to it has
>+ * been taken. If error is returned, *req_info is null initialized and no
>+ * reference is held.
>+ *
>+ * Return: 0 on success or negative error code
>+ */
>+int ethnl_parse_header(struct ethnl_req_info *req_info,
>+ const struct nlattr *nest, struct net *net,
s/nest/header/ ? Nest is way too generic and really tells nothing :/
>+ struct netlink_ext_ack *extack,
>+ const struct nla_policy *policy, bool require_dev)
>+{
>+ struct nlattr *tb[ETHTOOL_A_HEADER_MAX + 1];
>+ const struct nlattr *devname_attr;
>+ struct net_device *dev = NULL;
>+ int ret;
>+
>+ if (!nest) {
>+ NL_SET_ERR_MSG(extack, "request header missing");
>+ return -EINVAL;
>+ }
>+ ret = nla_parse_nested(tb, ETHTOOL_A_HEADER_MAX, nest,
>+ policy ?: dflt_header_policy, extack);
>+ if (ret < 0)
if (ret)
Same remark goes to the rest of the code (also the rest of the patches),
in case called function cannot return positive values.
>+ return ret;
>+ devname_attr = tb[ETHTOOL_A_HEADER_DEV_NAME];
>+
>+ if (tb[ETHTOOL_A_HEADER_DEV_INDEX]) {
>+ u32 ifindex = nla_get_u32(tb[ETHTOOL_A_HEADER_DEV_INDEX]);
>+
>+ dev = dev_get_by_index(net, ifindex);
>+ if (!dev) {
>+ NL_SET_ERR_MSG_ATTR(extack,
>+ tb[ETHTOOL_A_HEADER_DEV_INDEX],
>+ "no device matches ifindex");
>+ return -ENODEV;
>+ }
>+ /* if both ifindex and ifname are passed, they must match */
>+ if (devname_attr &&
>+ strncmp(dev->name, nla_data(devname_attr), IFNAMSIZ)) {
>+ dev_put(dev);
>+ NL_SET_ERR_MSG_ATTR(extack, nest,
>+ "ifindex and name do not match");
>+ return -ENODEV;
>+ }
>+ } else if (devname_attr) {
>+ dev = dev_get_by_name(net, nla_data(devname_attr));
>+ if (!dev) {
>+ NL_SET_ERR_MSG_ATTR(extack, devname_attr,
>+ "no device matches name");
>+ return -ENODEV;
>+ }
>+ } else if (require_dev) {
>+ NL_SET_ERR_MSG_ATTR(extack, nest,
>+ "neither ifindex nor name specified");
>+ return -EINVAL;
>+ }
>+
>+ if (dev && !netif_device_present(dev)) {
>+ dev_put(dev);
>+ NL_SET_ERR_MSG(extack, "device not present");
>+ return -ENODEV;
>+ }
>+
>+ req_info->dev = dev;
>+ ethnl_update_u32(&req_info->req_mask, tb[ETHTOOL_A_HEADER_INFOMASK]);
>+ ethnl_update_u32(&req_info->global_flags, tb[ETHTOOL_A_HEADER_GFLAGS]);
>+ ethnl_update_u32(&req_info->req_flags, tb[ETHTOOL_A_HEADER_RFLAGS]);
Just:
req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
...
Not sure what ethnl_update_u32() is good for, but it is not needed here.
>+
>+ return 0;
>+}
>+
>+/**
>+ * ethnl_fill_reply_header() - Put standard header into a reply message
>+ * @skb: skb with the message
>+ * @dev: network device to describe in header
>+ * @attrtype: attribute type to use for the nest
>+ *
>+ * Create a nested attribute with attributes describing given network device.
>+ * Clean up on error.
Cleanup is obvious, no need to mention it in API docs.
>+ *
>+ * Return: 0 on success, error value (-EMSGSIZE only) on error
>+ */
>+int ethnl_fill_reply_header(struct sk_buff *skb, struct net_device *dev,
>+ u16 attrtype)
>+{
>+ struct nlattr *nest;
>+
>+ if (!dev)
>+ return 0;
>+ nest = nla_nest_start(skb, attrtype);
>+ if (!nest)
>+ return -EMSGSIZE;
>+
>+ if (nla_put_u32(skb, ETHTOOL_A_HEADER_DEV_INDEX, (u32)dev->ifindex) ||
>+ nla_put_string(skb, ETHTOOL_A_HEADER_DEV_NAME, dev->name))
>+ goto nla_put_failure;
>+ /* If more attributes are put into reply header, ethnl_header_size()
>+ * must be updated to account for them.
>+ */
>+
>+ nla_nest_end(skb, nest);
>+ return 0;
>+
>+nla_put_failure:
>+ nla_nest_cancel(skb, nest);
>+ return -EMSGSIZE;
>+}
>+
>+/**
>+ * ethnl_reply_init() - Create skb for a reply and fill device identification
>+ * @payload: payload length (without netlink and genetlink header)
>+ * @dev: device the reply is about (may be null)
>+ * @cmd: ETHTOOL_MSG_* message type for reply
>+ * @info: genetlink info of the received packet we respond to
>+ * @ehdrp: place to store payload pointer returned by genlmsg_new()
>+ *
>+ * Return: pointer to allocated skb on success, NULL on error
>+ */
>+struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
>+ u16 hdr_attrtype, struct genl_info *info,
>+ void **ehdrp)
>+{
>+ struct sk_buff *skb;
>+
>+ skb = genlmsg_new(payload, GFP_KERNEL);
>+ if (!skb)
>+ goto err;
>+ *ehdrp = genlmsg_put_reply(skb, info, &ethtool_genl_family, 0, cmd);
>+ if (!*ehdrp)
>+ goto err_free;
>+
>+ if (dev) {
>+ int ret;
>+
>+ ret = ethnl_fill_reply_header(skb, dev, hdr_attrtype);
>+ if (ret < 0)
>+ goto err;
>+ }
>+ return skb;
>+
>+err_free:
>+ nlmsg_free(skb);
>+ if (info)
>+ GENL_SET_ERR_MSG(info, "failed to setup reply message");
>+err:
Why also not fillup extack msg here?
>+ return NULL;
>+}
>+
> /* genetlink setup */
>
> static const struct genl_ops ethtool_genl_ops[] = {
>diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
>index 257ae55ccc82..5510eb7054b3 100644
>--- a/net/ethtool/netlink.h
>+++ b/net/ethtool/netlink.h
>@@ -6,5 +6,150 @@
> #include <linux/ethtool_netlink.h>
> #include <linux/netdevice.h>
> #include <net/genetlink.h>
>+#include <net/sock.h>
>+
>+struct ethnl_req_info;
>+
>+int ethnl_parse_header(struct ethnl_req_info *req_info,
>+ const struct nlattr *nest, struct net *net,
>+ struct netlink_ext_ack *extack,
>+ const struct nla_policy *policy, bool require_dev);
>+int ethnl_fill_reply_header(struct sk_buff *skb, struct net_device *dev,
>+ u16 attrtype);
>+struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
>+ u16 hdr_attrtype, struct genl_info *info,
>+ void **ehdrp);
>+
>+static inline int ethnl_str_size(const char *s)
If you really need this helper, put it into netlink code. There's nothing
ethtool-specific about this.
>+{
>+ return nla_total_size(strlen(s) + 1);
>+}
>+
>+/* The ethnl_update_* helpers set value pointed to by @dst to the value of
>+ * netlink attribute @attr (if attr is not null). They return true if *dst
>+ * value was changed, false if not.
>+ */
>+static inline bool ethnl_update_u32(u32 *dst, struct nlattr *attr)
I'm still not sure I'm convinced about these "update helpers" :)
>+{
>+ u32 val;
>+
>+ if (!attr)
>+ return false;
>+ val = nla_get_u32(attr);
>+ if (*dst == val)
>+ return false;
>+
>+ *dst = val;
>+ return true;
>+}
>+
>+static inline bool ethnl_update_u8(u8 *dst, struct nlattr *attr)
>+{
>+ u8 val;
>+
>+ if (!attr)
>+ return false;
>+ val = nla_get_u8(attr);
>+ if (*dst == val)
>+ return false;
>+
>+ *dst = val;
>+ return true;
>+}
>+
>+/* update u32 value used as bool from NLA_U8 attribute */
>+static inline bool ethnl_update_bool32(u32 *dst, struct nlattr *attr)
>+{
>+ u8 val;
>+
>+ if (!attr)
>+ return false;
>+ val = !!nla_get_u8(attr);
>+ if (!!*dst == val)
>+ return false;
>+
>+ *dst = val;
>+ return true;
>+}
>+
>+static inline bool ethnl_update_binary(u8 *dst, unsigned int len,
void *dst
>+ struct nlattr *attr)
>+{
>+ if (!attr)
>+ return false;
>+ if (nla_len(attr) < len)
>+ len = nla_len(attr);
>+ if (!memcmp(dst, nla_data(attr), len))
>+ return false;
>+
>+ memcpy(dst, nla_data(attr), len);
>+ return true;
>+}
>+
>+static inline bool ethnl_update_bitfield32(u32 *dst, struct nlattr *attr)
>+{
>+ struct nla_bitfield32 change;
>+ u32 newval;
>+
>+ if (!attr)
>+ return false;
>+ change = nla_get_bitfield32(attr);
>+ newval = (*dst & ~change.selector) | (change.value & change.selector);
>+ if (*dst == newval)
>+ return false;
>+
>+ *dst = newval;
>+ return true;
>+}
>+
>+/**
>+ * ethnl_is_privileged() - check if request has sufficient privileges
>+ * @skb: skb with client request
>+ *
>+ * Checks if client request has CAP_NET_ADMIN in its netns. Unlike the flags
>+ * in genl_ops, this allows finer access control, e.g. allowing or denying
>+ * the request based on its contents or withholding only part of the data
>+ * from unprivileged users.
>+ *
>+ * Return: true if request is privileged, false if not
>+ */
>+static inline bool ethnl_is_privileged(struct sk_buff *skb)
I wonder why you need this helper. Genetlink uses
ops->flags & GENL_ADMIN_PERM for this.
>+{
>+ struct net *net = sock_net(skb->sk);
>+
>+ return netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN);
>+}
>+
>+/**
>+ * ethnl_reply_header_size() - total size of reply header
>+ *
>+ * This is an upper estimate so that we do not need to hold RTNL lock longer
>+ * than necessary (to prevent rename between size estimate and composing the
I guess this description is not relevant anymore. I don't see why to
hold rtnl mutex for this function...
>+ * message). Accounts only for device ifindex and name as those are the only
>+ * attributes ethnl_fill_reply_header() puts into the reply header.
>+ */
>+static inline unsigned int ethnl_reply_header_size(void)
>+{
>+ return nla_total_size(nla_total_size(sizeof(u32)) +
>+ nla_total_size(IFNAMSIZ));
>+}
>+
>+/**
>+ * struct ethnl_req_info - base type of request information for GET requests
>+ * @dev: network device the request is for (may be null)
>+ * @req_mask: request mask, bitmap of requested information
>+ * @global_flags: request flags common for all request types
>+ * @req_flags: request flags specific for each request type
>+ * @privileged: privileged request (CAP_NET_ADMIN in netns)
>+ *
>+ * This is a common base, additional members may follow after this structure.
>+ */
>+struct ethnl_req_info {
>+ struct net_device *dev;
>+ u32 req_mask;
>+ u32 global_flags;
>+ u32 req_flags;
>+ bool privileged;
>+};
>
> #endif /* _NET_ETHTOOL_NETLINK_H */
>--
>2.22.0
>
On Tue, Jul 02, 2019 at 02:25:21PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 01:49:59PM CEST, [email protected] wrote:
> >+Request header
> >+--------------
> >+
> >+Each request or reply message contains a nested attribute with common header.
> >+Structure of this header is
>
> Missing ":"
OK
> >+
> >+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
> >+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
> >+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
> >+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
> >+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
> >+
> >+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
> >+message relates to. One of them is sufficient in requests, if both are used,
> >+they must identify the same device. Some requests, e.g. global string sets, do
> >+not require device identification. Most GET requests also allow dump requests
> >+without device identification to query the same information for all devices
> >+providing it (each device in a separate message).
> >+
> >+Optional info mask allows to ask only for a part of data provided by GET
>
> How this "infomask" works? What are the bits related to? Is that request
> specific?
The interpretation is request specific: the information returned for
a GET request is divided into multiple parts and the client can choose to
request only some of them (usually one). In the code so far, infomask bits
correspond to top level (nest) attributes but I would rather not make it
a strict rule.
I'll make the paragraph more verbose.
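A minimal sketch of how such a request-specific info mask could be interpreted, assuming (as the quoted documentation says) that an omitted or zero mask selects everything. The SKETCH_IM_* bits and all three helpers are hypothetical names for illustration only, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical info mask bits for one GET request type; in the code so far
 * each bit would correspond to one top level nest in the reply.
 */
#define SKETCH_IM_LINKINFO	(1U << 0)
#define SKETCH_IM_LINKMODES	(1U << 1)
#define SKETCH_IM_LINKSTATE	(1U << 2)
#define SKETCH_IM_ALL		(SKETCH_IM_LINKINFO | SKETCH_IM_LINKMODES | \
				 SKETCH_IM_LINKSTATE)

/* An omitted or zero mask means "return everything". */
static uint32_t sketch_effective_mask(uint32_t req_mask)
{
	return req_mask ? req_mask : SKETCH_IM_ALL;
}

/* True if every requested bit is known for this request type. */
static bool sketch_mask_valid(uint32_t req_mask)
{
	return !(req_mask & ~SKETCH_IM_ALL);
}

/* True if the reply should contain the part selected by @bit. */
static bool sketch_part_requested(uint32_t req_mask, uint32_t bit)
{
	return !!(sketch_effective_mask(req_mask) & bit);
}
```

A GET handler written along these lines would call sketch_part_requested() once per reply part and skip the parts the client did not ask for.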
> >+request types. If omitted or zero, all data is returned. The two flag bitmaps
> >+allow enabling requestoptions; ETHTOOL_A_HEADER_GFLAGS are global flags common
>
> s/requestoptions;/request options./ ?
Yes.
> >+for all request types, flags recognized in ETHTOOL_A_HEADER_RFLAGS and their
> >+interpretation are specific for each request type. Global flags are
> >+
> >+ ETHTOOL_RF_COMPACT use compact format bitsets in reply
>
> Why "RF"? Isn't this "GF"? I would like "ETHTOOL_GFLAG_COMPACT" better.
RF as Request Flags. At the moment, global flags use ETHTOOL_RF_name
pattern and request specific flags ETHTOOL_RF_msgtype_name. GFLAG and
RFLAG would probably show the relation better, so how about
ETHTOOL_GFLAG_name for global
ETHTOOL_RFLAG_msgtype_name for request specific
> >+ ETHTOOL_RF_REPLY send optional reply (SET and ACT requests)
> >+
> >+Request specific flags are described with each request type. For both flag
> >+attributes, new flags should follow the general idea that if the flag is not
> >+set, the behaviour is the same as (or closer to) the behaviour before it was
>
> "closer to" ? That would be unfortunate I believe...
There may be situations where it cannot be exactly the same, e.g.
because the flag affects interpretation of an attribute which was
introduced together with the flag. How about "...the behaviour is
backward compatible"?
> >+List of message types
> >+---------------------
> >+
> >+All constants identifying message types use ETHTOOL_CMD_ prefix and suffix
> >+according to message purpose:
> >+
> >+ _GET userspace request to retrieve data
> >+ _SET userspace request to set data
> >+ _ACT userspace request to perform an action
> >+ _GET_REPLY kernel reply to a GET request
> >+ _SET_REPLY kernel reply to a SET request
> >+ _ACT_REPLY kernel reply to an ACT request
> >+ _NTF kernel notification
> >+
> >+"GET" requests are sent by userspace applications to retrieve device
> >+information. They usually do not contain any message specific attributes.
> >+Kernel replies with corresponding "GET_REPLY" message. For most types, "GET"
> >+request with NLM_F_DUMP and no device identification can be used to query the
> >+information for all devices supporting the request.
> >+
> >+If the data can be also modified, corresponding "SET" message with the same
> >+layout as "GET" reply is used to request changes. Only attributes where
>
> s/"GET" reply"/"GET_REPLY" ?
> Maybe better to emphasize that the "GET_REPLY" is the one corresponding
> with "SET". But perhaps I got this sentence all wrong :/
OK
> >+a change is requested are included in such request (also, not all attributes
> >+may be changed). Replies to most "SET" request consist only of error code and
> >+extack; if kernel provides additional data, it is sent in the form of
> >+corresponding "SET_REPLY" message (if ETHTOOL_RF_REPLY flag was set in request
> >+header).
> >+
> >+Data modification also triggers sending a "NTF" message with a notification.
> >+These usually bear only a subset of attributes which was affected by the
> >+change. The same notification is issued if the data is modified using other
> >+means (mostly ioctl ethtool interface). Unlike notifications from ethtool
> >+netlink code which are only sent if something actually changed, notifications
> >+triggered by ioctl interface may be sent even if the request did not actually
> >+change any data.
>
> Interesting. What's the reason for that?
Most setting commands in ioctl interface do not even query the original
state, they just pass the structure from ioctl() to ethtool_ops handler.
We could add retrieving the original state first but I suppose we would
still have to call the handler anyway even if requested values are the
same (as that's what kernel does now) and it's not clear if omitting the
notification in such case is the right thing to do.
> >diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
> >index 3ebfab2bca66..f30e0da88be5 100644
> >--- a/net/ethtool/Makefile
> >+++ b/net/ethtool/Makefile
> >@@ -1,3 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0
> >
> >-obj-y += ioctl.o
> >+obj-y += ioctl.o
> >+
> >+obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
>
> Hmm, I wonder, why not to make this always on? We want users to use
> it, memory savings in case it is off would be minimal. RTNetlink is also
> always on. Ethtool ioctl is also always on.
We have already discussed this in the previous version. Someone claimed
earlier that building a kernel without ethtool interface would make
sense for some minimalistic systems. My plan is to make the ioctl
interface also optional once it's possible for (sufficiently new)
ethtool to work without it.
Michal
On Tue, 2 Jul 2019 13:49:44 +0200 (CEST)
Michal Kubecek <[email protected]> wrote:
> Permanent hardware address of a network device was traditionally provided
> via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
> ethtool netlink interface, rtnetlink is much more suitable for it so let's
> add it to the RTM_NEWLINK message.
>
> Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
> permanent address is all zeros (i.e. device driver did not fill it). As
> permanent address is not modifiable, reject userspace requests containing
> IFLA_PERM_ADDRESS attribute.
>
> Note: we already provide permanent hardware address for bond slaves;
> unfortunately we cannot drop that attribute for backward compatibility
> reasons.
>
> v5 -> v6: only add the attribute if permanent address is not zero
>
> Signed-off-by: Michal Kubecek <[email protected]>
Do you want to make an iproute patch to display this?
Acked-by: Stephen Hemminger <[email protected]>
On Tue, Jul 02, 2019 at 03:05:15PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 01:50:04PM CEST, [email protected] wrote:
> >
> >+/* request header */
> >+
> >+/* use compact bitsets in reply */
> >+#define ETHTOOL_RF_COMPACT (1 << 0)
>
> "COMPACT_BITSETS"?
>
> >+/* provide optional reply for SET or ACT requests */
> >+#define ETHTOOL_RF_REPLY (1 << 1)
>
> "OPTIONAL_REPLY"?
OK
> >+ ret = nla_parse_nested(tb, ETHTOOL_A_HEADER_MAX, nest,
> >+ policy ?: dflt_header_policy, extack);
> >+ if (ret < 0)
>
> if (ret)
>
> Same remark goes to the rest of the code (also the rest of the patches),
> in case called function cannot return positive values.
The "if (ret < 0)" idiom for "on error do ..." is so ubiquitous through
the whole kernel that I don't think it's worth it to carefully check
which function can return a positive value and which cannot and risk
that one day I overlook that some function. And yet another question is
what exactly "cannot return" means: is it whenever the function does not
return a positive value or only if it's explicitly documented not to?
Looking at existing networking code, e.g. net/netfilter (except ipvs),
net/sched or net/core/rtnetlink.c are using "if (ret < 0)" rather
uniformly. And (as you objected to the check of genl_register_family() in the
previous patch) even genetlink itself has
err = genl_register_family(&genl_ctrl);
if (err < 0)
goto problem;
in genl_init().
>
>
> >+ return ret;
> >+ devname_attr = tb[ETHTOOL_A_HEADER_DEV_NAME];
> >+
> >+ if (tb[ETHTOOL_A_HEADER_DEV_INDEX]) {
> >+ u32 ifindex = nla_get_u32(tb[ETHTOOL_A_HEADER_DEV_INDEX]);
> >+
> >+ dev = dev_get_by_index(net, ifindex);
> >+ if (!dev) {
> >+ NL_SET_ERR_MSG_ATTR(extack,
> >+ tb[ETHTOOL_A_HEADER_DEV_INDEX],
> >+ "no device matches ifindex");
> >+ return -ENODEV;
> >+ }
> >+ /* if both ifindex and ifname are passed, they must match */
> >+ if (devname_attr &&
> >+ strncmp(dev->name, nla_data(devname_attr), IFNAMSIZ)) {
> >+ dev_put(dev);
> >+ NL_SET_ERR_MSG_ATTR(extack, nest,
> >+ "ifindex and name do not match");
> >+ return -ENODEV;
> >+ }
> >+ } else if (devname_attr) {
> >+ dev = dev_get_by_name(net, nla_data(devname_attr));
> >+ if (!dev) {
> >+ NL_SET_ERR_MSG_ATTR(extack, devname_attr,
> >+ "no device matches name");
> >+ return -ENODEV;
> >+ }
> >+ } else if (require_dev) {
> >+ NL_SET_ERR_MSG_ATTR(extack, nest,
> >+ "neither ifindex nor name specified");
> >+ return -EINVAL;
> >+ }
> >+
> >+ if (dev && !netif_device_present(dev)) {
> >+ dev_put(dev);
> >+ NL_SET_ERR_MSG(extack, "device not present");
> >+ return -ENODEV;
> >+ }
> >+
> >+ req_info->dev = dev;
> >+ ethnl_update_u32(&req_info->req_mask, tb[ETHTOOL_A_HEADER_INFOMASK]);
> >+ ethnl_update_u32(&req_info->global_flags, tb[ETHTOOL_A_HEADER_GFLAGS]);
> >+ ethnl_update_u32(&req_info->req_flags, tb[ETHTOOL_A_HEADER_RFLAGS]);
>
> Just:
> req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
> ...
>
> Not sure what ethnl_update_u32() is good for, but it is not needed here.
That would result in null pointer dereference if the attribute is
missing. So you would need at least
if (tb[ETHTOOL_A_HEADER_INFOMASK])
req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
if (tb[ETHTOOL_A_HEADER_GFLAGS])
req_info->global_flags =
nla_get_u32(tb[ETHTOOL_A_HEADER_GFLAGS]);
if (tb[ETHTOOL_A_HEADER_RFLAGS])
req_info->req_flags = nla_get_u32(tb[ETHTOOL_A_HEADER_RFLAGS]);
I don't think it looks better.
> >+
> >+ return 0;
> >+}
> >+
> >+/**
> >+ * ethnl_fill_reply_header() - Put standard header into a reply message
> >+ * @skb: skb with the message
> >+ * @dev: network device to describe in header
> >+ * @attrtype: attribute type to use for the nest
> >+ *
> >+ * Create a nested attribute with attributes describing given network device.
> >+ * Clean up on error.
>
> Cleanup is obvious, no need to mention it in API docs.
OK
> >+ *
> >+ * Return: 0 on success, error value (-EMSGSIZE only) on error
> >+ */
> >+int ethnl_fill_reply_header(struct sk_buff *skb, struct net_device *dev,
> >+ u16 attrtype)
> >+{
> >+ struct nlattr *nest;
> >+
> >+ if (!dev)
> >+ return 0;
> >+ nest = nla_nest_start(skb, attrtype);
> >+ if (!nest)
> >+ return -EMSGSIZE;
> >+
> >+ if (nla_put_u32(skb, ETHTOOL_A_HEADER_DEV_INDEX, (u32)dev->ifindex) ||
> >+ nla_put_string(skb, ETHTOOL_A_HEADER_DEV_NAME, dev->name))
> >+ goto nla_put_failure;
> >+ /* If more attributes are put into reply header, ethnl_header_size()
> >+ * must be updated to account for them.
> >+ */
> >+
> >+ nla_nest_end(skb, nest);
> >+ return 0;
> >+
> >+nla_put_failure:
> >+ nla_nest_cancel(skb, nest);
> >+ return -EMSGSIZE;
> >+}
> >+
> >+/**
> >+ * ethnl_reply_init() - Create skb for a reply and fill device identification
> >+ * @payload: payload length (without netlink and genetlink header)
> >+ * @dev: device the reply is about (may be null)
> >+ * @cmd: ETHTOOL_MSG_* message type for reply
> >+ * @info: genetlink info of the received packet we respond to
> >+ * @ehdrp: place to store payload pointer returned by genlmsg_new()
> >+ *
> >+ * Return: pointer to allocated skb on success, NULL on error
> >+ */
> >+struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
> >+ u16 hdr_attrtype, struct genl_info *info,
> >+ void **ehdrp)
> >+{
> >+ struct sk_buff *skb;
> >+
> >+ skb = genlmsg_new(payload, GFP_KERNEL);
> >+ if (!skb)
> >+ goto err;
> >+ *ehdrp = genlmsg_put_reply(skb, info, &ethtool_genl_family, 0, cmd);
> >+ if (!*ehdrp)
> >+ goto err_free;
> >+
> >+ if (dev) {
> >+ int ret;
> >+
> >+ ret = ethnl_fill_reply_header(skb, dev, hdr_attrtype);
> >+ if (ret < 0)
> >+ goto err;
> >+ }
> >+ return skb;
> >+
> >+err_free:
> >+ nlmsg_free(skb);
> >+ if (info)
> >+ GENL_SET_ERR_MSG(info, "failed to setup reply message");
> >+err:
>
> Why also not fillup extack msg here?
Right, err label should be right below the nlmsg_free(skb), thanks. And
now I noticed another mistake: on ethnl_fill_reply_header() failure, we
should go to err_free, not err.
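The corrected label order can be sketched in userspace as follows. This is only a model of the control flow, assuming malloc()/free() in place of genlmsg_new()/nlmsg_free() and a plain string pointer in place of extack; every sketch_* name is made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for genl_info; only carries the error message. */
struct sketch_info {
	const char *err_msg;	/* plays the role of the extack message */
};

enum sketch_fail {
	SKETCH_FAIL_NONE,
	SKETCH_FAIL_ALLOC,	/* genlmsg_new() analogue fails */
	SKETCH_FAIL_PUT,	/* genlmsg_put_reply() analogue fails */
	SKETCH_FAIL_HEADER,	/* ethnl_fill_reply_header() analogue fails */
};

static int sketch_live_bufs;	/* buffers allocated and not yet freed */

static void *sketch_alloc(bool fail)
{
	void *buf = fail ? NULL : malloc(64);

	if (buf)
		sketch_live_bufs++;
	return buf;
}

static void sketch_free(void *buf)
{
	free(buf);
	sketch_live_bufs--;
}

static void *sketch_reply_init(enum sketch_fail fail, struct sketch_info *info)
{
	void *buf;

	buf = sketch_alloc(fail == SKETCH_FAIL_ALLOC);
	if (!buf)
		goto err;
	if (fail == SKETCH_FAIL_PUT)
		goto err_free;
	if (fail == SKETCH_FAIL_HEADER)
		goto err_free;	/* "goto err" here would leak the buffer */
	return buf;

err_free:
	sketch_free(buf);
err:
	/* the message is set on every failure path, allocation included */
	if (info)
		info->err_msg = "failed to setup reply message";
	return NULL;
}
```

With the labels in this order, each of the three failure points frees what was allocated and reports the same message.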
> >+static inline int ethnl_str_size(const char *s)
>
> If you really need this helper, put it into netlink code. There's nothing
> ethtool-specific about this.
OK, I'll look into it. I've been already thinking about some kind of
NLA_SIZEOF() macro as about 1/3 of all uses of nla_total_size() follow
the nla_total_size(sizeof(...)) pattern (and lot more should follow it
but are written like e.g. nla_total_size(4) instead). This is another
common pattern.
> >+/* The ethnl_update_* helpers set value pointed to by @dst to the value of
> >+ * netlink attribute @attr (if attr is not null). They return true if *dst
> >+ * value was changed, false if not.
> >+ */
> >+static inline bool ethnl_update_u32(u32 *dst, struct nlattr *attr)
>
> I'm still not sure I'm convinced about these "update helpers" :)
Just imagine what would e.g.
if (ethnl_update_u32(&data.rx_pending, tb[ETHTOOL_A_RING_RX_PENDING]))
mod = true;
if (ethnl_update_u32(&data.rx_mini_pending,
tb[ETHTOOL_A_RING_RX_MINI_PENDING]))
mod = true;
if (ethnl_update_u32(&data.rx_jumbo_pending,
tb[ETHTOOL_A_RING_RX_JUMBO_PENDING]))
mod = true;
if (ethnl_update_u32(&data.tx_pending, tb[ETHTOOL_A_RING_TX_PENDING]))
mod = true;
if (!mod)
return 0;
look like without them. And coalescing parameters would be much worse
(22 attributes / struct members).
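For readers without the kernel tree at hand, the pattern above can be modelled in userspace roughly like this. It is a sketch under assumptions: struct sketch_attr stands in for a parsed u32 netlink attribute (NULL meaning "not present in the request"), struct sketch_ring for the driver data, and none of the sketch_* names exist in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a parsed u32 netlink attribute; a NULL pointer
 * means the attribute was not present in the request.
 */
struct sketch_attr {
	uint32_t value;
};

struct sketch_ring {
	uint32_t rx_pending;
	uint32_t tx_pending;
};

/* Same contract as ethnl_update_u32(): copy if present, report a change. */
static bool sketch_update_u32(uint32_t *dst, const struct sketch_attr *attr)
{
	if (!attr || *dst == attr->value)
		return false;
	*dst = attr->value;
	return true;
}

/* Returns true if anything changed, i.e. the driver handler must be called. */
static bool sketch_set_ring(struct sketch_ring *ring,
			    const struct sketch_attr *rx,
			    const struct sketch_attr *tx)
{
	bool mod = false;

	mod |= sketch_update_u32(&ring->rx_pending, rx);
	mod |= sketch_update_u32(&ring->tx_pending, tx);
	return mod;
}
```

The point of the return value is that a request which repeats the current values, or carries no attributes at all, can be answered without touching the device.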
> >+{
> >+ u32 val;
> >+
> >+ if (!attr)
> >+ return false;
> >+ val = nla_get_u32(attr);
> >+ if (*dst == val)
> >+ return false;
> >+
> >+ *dst = val;
> >+ return true;
> >+}
...
> >+static inline bool ethnl_update_binary(u8 *dst, unsigned int len,
>
> void *dst
OK.
> >+/**
> >+ * ethnl_is_privileged() - check if request has sufficient privileges
> >+ * @skb: skb with client request
> >+ *
> >+ * Checks if client request has CAP_NET_ADMIN in its netns. Unlike the flags
> >+ * in genl_ops, this allows finer access control, e.g. allowing or denying
> >+ * the request based on its contents or withholding only part of the data
> >+ * from unprivileged users.
> >+ *
> >+ * Return: true if request is privileged, false if not
> >+ */
> >+static inline bool ethnl_is_privileged(struct sk_buff *skb)
>
> I wonder why you need this helper. Genetlink uses
> ops->flags & GENL_ADMIN_PERM for this.
It's explained in the function description. Sometimes we need finer
control than by request message type. An example is the WoL password:
ETHTOOL_GWOL is privileged because of it but I believe there is no
reason why unprivileged user couldn't see enabled WoL modes, we can
simply omit the password for him. (Also, it allows to combine query for
WoL settings with other unprivileged settings.)
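A rough userspace sketch of that idea: the reply is always produced, and only the secret part depends on the privilege check. The structures and sketch_fill_wol_reply() are hypothetical; the 6-byte password length is assumed to match the SecureOn password size used by the ioctl interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SOPASS_LEN 6	/* assumed SecureOn password length in bytes */

struct sketch_wol_state {
	uint32_t modes;				/* enabled WoL modes */
	uint8_t sopass[SKETCH_SOPASS_LEN];	/* secret */
};

struct sketch_wol_reply {
	uint32_t modes;
	bool has_sopass;
	uint8_t sopass[SKETCH_SOPASS_LEN];
};

static void sketch_fill_wol_reply(struct sketch_wol_reply *reply,
				  const struct sketch_wol_state *state,
				  bool privileged)
{
	memset(reply, 0, sizeof(*reply));
	reply->modes = state->modes;	/* visible to everyone */
	if (!privileged)
		return;			/* omit the password, do not fail */
	reply->has_sopass = true;
	memcpy(reply->sopass, state->sopass, sizeof(reply->sopass));
}
```

In the kernel, the privileged argument would come from a check like ethnl_is_privileged() on the request skb rather than from a per-command GENL_ADMIN_PERM flag.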
> >+/**
> >+ * ethnl_reply_header_size() - total size of reply header
> >+ *
> >+ * This is an upper estimate so that we do not need to hold RTNL lock longer
> >+ * than necessary (to prevent rename between size estimate and composing the
>
> I guess this description is not relevant anymore. I don't see why to
> hold rtnl mutex for this function...
You don't need it for this function, it's the other way around: unless
you hold RTNL lock for the whole time covering both checking needed
message size and filling the message - and we don't - the device could
be renamed in between. Thus if we returned size based on current device
name, it might not be sufficient at the time the header is filled.
That's why this function returns maximum possible size (which is
actually a constant).
Michal
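To make the "maximum possible size" argument concrete, here is a small userspace sketch of the size arithmetic. It assumes the usual netlink attribute layout (4-byte attribute header, payload padded to 4 bytes) and IFNAMSIZ of 16; the sketch_* helpers are local re-implementations for illustration, not the kernel functions themselves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NLA_ALIGNTO	4	/* attributes are padded to 4 bytes */
#define SKETCH_NLA_HDRLEN	4	/* aligned size of the attribute header */
#define SKETCH_IFNAMSIZ		16	/* name buffer size including the NUL */

static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) &
	       ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Size of one attribute with @payload bytes of data, including padding. */
static size_t sketch_nla_total_size(size_t payload)
{
	return sketch_nla_align(SKETCH_NLA_HDRLEN + payload);
}

/* Constant upper bound: u32 ifindex plus the longest possible name. */
static size_t sketch_reply_header_max_size(void)
{
	return sketch_nla_total_size(sketch_nla_total_size(sizeof(uint32_t)) +
				     sketch_nla_total_size(SKETCH_IFNAMSIZ));
}

/* Exact size for the name the device has at this moment. */
static size_t sketch_reply_header_size_for(const char *name)
{
	return sketch_nla_total_size(sketch_nla_total_size(sizeof(uint32_t)) +
				     sketch_nla_total_size(strlen(name) + 1));
}
```

Under these assumptions the bound works out to 32 bytes, while a header for "eth0" would need only 24. A buffer sized for the current name could therefore be too small if the device were renamed to something longer before the header is filled, which is why the constant bound is used.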
> >+ * message). Accounts only for device ifindex and name as those are the only
> >+ * attributes ethnl_fill_reply_header() puts into the reply header.
> >+ */
> >+static inline unsigned int ethnl_reply_header_size(void)
> >+{
> >+ return nla_total_size(nla_total_size(sizeof(u32)) +
> >+ nla_total_size(IFNAMSIZ));
> >+}
On Tue, Jul 02, 2019 at 07:55:00AM -0700, Stephen Hemminger wrote:
> On Tue, 2 Jul 2019 13:49:44 +0200 (CEST)
> Michal Kubecek <[email protected]> wrote:
>
> > Permanent hardware address of a network device was traditionally provided
> > via ethtool ioctl interface but as Jiri Pirko pointed out in a review of
> > ethtool netlink interface, rtnetlink is much more suitable for it so let's
> > add it to the RTM_NEWLINK message.
> >
> > Add IFLA_PERM_ADDRESS attribute to RTM_NEWLINK messages unless the
> > permanent address is all zeros (i.e. device driver did not fill it). As
> > permanent address is not modifiable, reject userspace requests containing
> > IFLA_PERM_ADDRESS attribute.
> >
> > Note: we already provide permanent hardware address for bond slaves;
> > unfortunately we cannot drop that attribute for backward compatibility
> > reasons.
> >
> > v5 -> v6: only add the attribute if permanent address is not zero
> >
> > Signed-off-by: Michal Kubecek <[email protected]>
>
> Do you want to make an iproute patch to display this?
Yes, I'm going to submit it once this patch gets into net-next.
Michal
> Acked-by: Stephen Hemminger <[email protected]>
On Tue, 2 Jul 2019 18:34:37 +0200, Michal Kubecek wrote:
> > >+ ret = nla_parse_nested(tb, ETHTOOL_A_HEADER_MAX, nest,
> > >+ policy ?: dflt_header_policy, extack);
> > >+ if (ret < 0)
> >
> > if (ret)
> >
> > Same remark goes to the rest of the code (also the rest of the patches),
> > in case called function cannot return positive values.
>
> The "if (ret < 0)" idiom for "on error do ..." is so ubiquitous through
> the whole kernel that I don't think it's worth it to carefully check
> which function can return a positive value and which cannot and risk
> that one day I overlook that some function. And yet another question is
> what exactly "cannot return" means: is it whenever the function does not
> return a positive value or only if it's explicitly documented not to?
>
> Looking at existing networking code, e.g. net/netfilter (except ipvs),
> net/sched or net/core/rtnetlink.c are using "if (ret < 0)" rather
> uniformly. And (as you objected to the check of genl_register_family() in the
> previous patch) even genetlink itself has
>
> err = genl_register_family(&genl_ctrl);
> if (err < 0)
> goto problem;
>
> in genl_init().
I agree with Jiri, if a function only returns "0, or -errno" it's
easier to parse if the error check is not only for negative values.
At least to my eyes.
What I'm not sure about is whether we want to delay the merging of this
interface over this..
On Tue, 2 Jul 2019 13:49:59 +0200 (CEST), Michal Kubecek wrote:
> diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
> new file mode 100644
> index 000000000000..97c369aa290b
> --- /dev/null
> +++ b/Documentation/networking/ethtool-netlink.txt
> @@ -0,0 +1,208 @@
> + Netlink interface for ethtool
> + =============================
> +
> +
> +Basic information
> +-----------------
Probably not a blocker for initial merging, but please note a TODO to
convert the documentation to ReST.
On Tue, 2 Jul 2019 13:50:04 +0200 (CEST), Michal Kubecek wrote:
> Add common request/reply header definition and helpers to parse request
> header and fill reply header. Provide ethnl_update_* helpers to update
> structure members from request attributes (to be used for *_SET requests).
>
> Signed-off-by: Michal Kubecek <[email protected]>
> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
> index 3c98b41f04e5..e13f29bbd625 100644
> --- a/net/ethtool/netlink.c
> +++ b/net/ethtool/netlink.c
> @@ -1,8 +1,181 @@
> // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>
> +#include <net/sock.h>
> #include <linux/ethtool_netlink.h>
> #include "netlink.h"
>
> +static struct genl_family ethtool_genl_family;
> +
> +static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
> + [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
I think we want strict checking on all new netlink interfaces, and
unfortunately that feature is opt-in.. so you need to add:
.strict_start_type = ETHTOOL_A_HEADER_UNSPEC + 1
To the first attr.
> + [ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
> + [ETHTOOL_A_HEADER_DEV_NAME] = { .type = NLA_NUL_STRING,
> + .len = IFNAMSIZ - 1 },
> + [ETHTOOL_A_HEADER_INFOMASK] = { .type = NLA_U32 },
> + [ETHTOOL_A_HEADER_GFLAGS] = { .type = NLA_U32 },
> + [ETHTOOL_A_HEADER_RFLAGS] = { .type = NLA_U32 },
> +};
On Tue, 2 Jul 2019 13:50:34 +0200 (CEST), Michal Kubecek wrote:
> +const char *const link_mode_names[] = {
> + __DEFINE_LINK_MODE_NAME(10, T, Half),
> + __DEFINE_LINK_MODE_NAME(10, T, Full),
> + __DEFINE_LINK_MODE_NAME(100, T, Half),
> + __DEFINE_LINK_MODE_NAME(100, T, Full),
> + __DEFINE_LINK_MODE_NAME(1000, T, Half),
> + __DEFINE_LINK_MODE_NAME(1000, T, Full),
> + __DEFINE_SPECIAL_MODE_NAME(Autoneg, "Autoneg"),
> + __DEFINE_SPECIAL_MODE_NAME(TP, "TP"),
> + __DEFINE_SPECIAL_MODE_NAME(AUI, "AUI"),
> + __DEFINE_SPECIAL_MODE_NAME(MII, "MII"),
> + __DEFINE_SPECIAL_MODE_NAME(FIBRE, "FIBRE"),
> + __DEFINE_SPECIAL_MODE_NAME(BNC, "BNC"),
> + __DEFINE_LINK_MODE_NAME(10000, T, Full),
> + __DEFINE_SPECIAL_MODE_NAME(Pause, "Pause"),
> + __DEFINE_SPECIAL_MODE_NAME(Asym_Pause, "Asym_Pause"),
> + __DEFINE_LINK_MODE_NAME(2500, X, Full),
> + __DEFINE_SPECIAL_MODE_NAME(Backplane, "Backplane"),
> + __DEFINE_LINK_MODE_NAME(1000, KX, Full),
...
> + __DEFINE_LINK_MODE_NAME(5000, T, Full),
> + __DEFINE_SPECIAL_MODE_NAME(FEC_NONE, "None"),
> + __DEFINE_SPECIAL_MODE_NAME(FEC_RS, "RS"),
> + __DEFINE_SPECIAL_MODE_NAME(FEC_BASER, "BASER"),
Why are port types and FEC params among link mode strings?
> + __DEFINE_LINK_MODE_NAME(50000, KR, Full),
...
> + __DEFINE_LINK_MODE_NAME(1000, T1, Full),
> +};
On Tue, 2 Jul 2019 19:04:19 -0700, Jakub Kicinski wrote:
> On Tue, 2 Jul 2019 13:50:34 +0200 (CEST), Michal Kubecek wrote:
> > +const char *const link_mode_names[] = {
> > + __DEFINE_LINK_MODE_NAME(10, T, Half),
> > + __DEFINE_LINK_MODE_NAME(10, T, Full),
> > + __DEFINE_LINK_MODE_NAME(100, T, Half),
> > + __DEFINE_LINK_MODE_NAME(100, T, Full),
> > + __DEFINE_LINK_MODE_NAME(1000, T, Half),
> > + __DEFINE_LINK_MODE_NAME(1000, T, Full),
> > + __DEFINE_SPECIAL_MODE_NAME(Autoneg, "Autoneg"),
> > + __DEFINE_SPECIAL_MODE_NAME(TP, "TP"),
> > + __DEFINE_SPECIAL_MODE_NAME(AUI, "AUI"),
> > + __DEFINE_SPECIAL_MODE_NAME(MII, "MII"),
> > + __DEFINE_SPECIAL_MODE_NAME(FIBRE, "FIBRE"),
> > + __DEFINE_SPECIAL_MODE_NAME(BNC, "BNC"),
>
> > + __DEFINE_LINK_MODE_NAME(10000, T, Full),
> > + __DEFINE_SPECIAL_MODE_NAME(Pause, "Pause"),
> > + __DEFINE_SPECIAL_MODE_NAME(Asym_Pause, "Asym_Pause"),
> > + __DEFINE_LINK_MODE_NAME(2500, X, Full),
> > + __DEFINE_SPECIAL_MODE_NAME(Backplane, "Backplane"),
> > + __DEFINE_LINK_MODE_NAME(1000, KX, Full),
> ...
> > + __DEFINE_LINK_MODE_NAME(5000, T, Full),
> > + __DEFINE_SPECIAL_MODE_NAME(FEC_NONE, "None"),
> > + __DEFINE_SPECIAL_MODE_NAME(FEC_RS, "RS"),
> > + __DEFINE_SPECIAL_MODE_NAME(FEC_BASER, "BASER"),
>
> Why are port types and FEC params among link mode strings?
Ah, FEC for autoneg, but port type?
> > + __DEFINE_LINK_MODE_NAME(50000, KR, Full),
> ...
> > + __DEFINE_LINK_MODE_NAME(1000, T1, Full),
> > +};
On Tue, Jul 02, 2019 at 06:29:56PM -0700, Jakub Kicinski wrote:
> On Tue, 2 Jul 2019 13:49:59 +0200 (CEST), Michal Kubecek wrote:
> > diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
> > new file mode 100644
> > index 000000000000..97c369aa290b
> > --- /dev/null
> > +++ b/Documentation/networking/ethtool-netlink.txt
> > @@ -0,0 +1,208 @@
> > + Netlink interface for ethtool
> > + =============================
> > +
> > +
> > +Basic information
> > +-----------------
>
> Probably not a blocker for initial merging, but please note a TODO to
> convert the documentation to ReST.
Yes, I want to do that. What stopped me was that I wasn't sure what to
do with the message structure descriptions. I guess I'll leave them as
preformatted text (literal paragraph) for now and leave finding something
more fancy for later.
Michal
On Tue, Jul 02, 2019 at 06:37:24PM -0700, Jakub Kicinski wrote:
> On Tue, 2 Jul 2019 13:50:04 +0200 (CEST), Michal Kubecek wrote:
> > Add common request/reply header definition and helpers to parse request
> > header and fill reply header. Provide ethnl_update_* helpers to update
> > structure members from request attributes (to be used for *_SET requests).
> >
> > Signed-off-by: Michal Kubecek <[email protected]>
>
> > diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
> > index 3c98b41f04e5..e13f29bbd625 100644
> > --- a/net/ethtool/netlink.c
> > +++ b/net/ethtool/netlink.c
> > @@ -1,8 +1,181 @@
> > // SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> >
> > +#include <net/sock.h>
> > #include <linux/ethtool_netlink.h>
> > #include "netlink.h"
> >
> > +static struct genl_family ethtool_genl_family;
> > +
> > +static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
> > + [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
>
> I think we want strict checking on all new netlink interfaces, and
> unfortunately that feature is opt-in.. so you need to add:
>
> .strict_start_type = ETHTOOL_A_HEADER_UNSPEC + 1
>
> To the first attr.
Oops... I'll have to check again how this works. I thought using
nla_parse_nested() instead of nla_parse_nested_deprecated() is
sufficient to have everything strict checked.
Michal
On Tue, Jul 02, 2019 at 07:11:24PM -0700, Jakub Kicinski wrote:
> On Tue, 2 Jul 2019 19:04:19 -0700, Jakub Kicinski wrote:
> > On Tue, 2 Jul 2019 13:50:34 +0200 (CEST), Michal Kubecek wrote:
> > > +const char *const link_mode_names[] = {
> > > + __DEFINE_LINK_MODE_NAME(10, T, Half),
> > > + __DEFINE_LINK_MODE_NAME(10, T, Full),
> > > + __DEFINE_LINK_MODE_NAME(100, T, Half),
> > > + __DEFINE_LINK_MODE_NAME(100, T, Full),
> > > + __DEFINE_LINK_MODE_NAME(1000, T, Half),
> > > + __DEFINE_LINK_MODE_NAME(1000, T, Full),
> > > + __DEFINE_SPECIAL_MODE_NAME(Autoneg, "Autoneg"),
> > > + __DEFINE_SPECIAL_MODE_NAME(TP, "TP"),
> > > + __DEFINE_SPECIAL_MODE_NAME(AUI, "AUI"),
> > > + __DEFINE_SPECIAL_MODE_NAME(MII, "MII"),
> > > + __DEFINE_SPECIAL_MODE_NAME(FIBRE, "FIBRE"),
> > > + __DEFINE_SPECIAL_MODE_NAME(BNC, "BNC"),
> >
> > > + __DEFINE_LINK_MODE_NAME(10000, T, Full),
> > > + __DEFINE_SPECIAL_MODE_NAME(Pause, "Pause"),
> > > + __DEFINE_SPECIAL_MODE_NAME(Asym_Pause, "Asym_Pause"),
> > > + __DEFINE_LINK_MODE_NAME(2500, X, Full),
> > > + __DEFINE_SPECIAL_MODE_NAME(Backplane, "Backplane"),
> > > + __DEFINE_LINK_MODE_NAME(1000, KX, Full),
> > ...
> > > + __DEFINE_LINK_MODE_NAME(5000, T, Full),
> > > + __DEFINE_SPECIAL_MODE_NAME(FEC_NONE, "None"),
> > > + __DEFINE_SPECIAL_MODE_NAME(FEC_RS, "RS"),
> > > + __DEFINE_SPECIAL_MODE_NAME(FEC_BASER, "BASER"),
> >
> > Why are port types and FEC params among link mode strings?
>
> Ah, FEC for autoneg, but port type?
The bits in supported bitmap are used to pass information which port
types the device supports (but the information which port is selected
is passed in a different way :-( ). It is used by ethtool to provide the
"Supported ports:" line in "ethtool <dev>" output.
I don't like this design where link modes are mixed with a few different
and only loosely related bitmaps. Maybe it would be cleaner to split it
into multiple bitmaps, later change the backend (ethtool_ops) too, and
only translate to/from this combined bitmap for the legacy ioctl interface.
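To illustrate the split idea, here is a minimal userspace sketch, not kernel code. It assumes the bit positions follow the order of the quoted link_mode_names[] table (TP=7, AUI=8, MII=9, FIBRE=10, BNC=11, Backplane=16) and only handles the first 32 bits. All sketch_* names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_BIT(n) (UINT32_C(1) << (n))

/* Port type bits, assuming positions match the quoted table order. */
#define SKETCH_PORT_MASK (SKETCH_BIT(7) | SKETCH_BIT(8) | SKETCH_BIT(9) | \
			  SKETCH_BIT(10) | SKETCH_BIT(11) | SKETCH_BIT(16))

/* Hypothetical split representation: port types apart from everything else. */
struct sketch_split {
	uint32_t ports;	/* supported port types only */
	uint32_t rest;	/* link modes and the remaining special bits */
};

/* Translate the legacy combined bitmap into the split form. */
static struct sketch_split sketch_split_supported(uint32_t combined)
{
	struct sketch_split s = {
		.ports = combined & SKETCH_PORT_MASK,
		.rest = combined & ~SKETCH_PORT_MASK,
	};

	return s;
}

/* Translate back for a legacy (ioctl style) consumer. */
static uint32_t sketch_join_supported(struct sketch_split s)
{
	return s.ports | s.rest;
}
```

Since the two masks are disjoint, splitting and joining again returns the original bitmap, which is what a legacy translation layer would rely on.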
Michal
>
> > > + __DEFINE_LINK_MODE_NAME(50000, KR, Full),
> > ...
> > > + __DEFINE_LINK_MODE_NAME(1000, T1, Full),
> > > +};
>
Tue, Jul 02, 2019 at 04:52:41PM CEST, [email protected] wrote:
>On Tue, Jul 02, 2019 at 02:25:21PM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 01:49:59PM CEST, [email protected] wrote:
>> >+Request header
>> >+--------------
>> >+
>> >+Each request or reply message contains a nested attribute with common header.
>> >+Structure of this header is
>>
>> Missing ":"
>
>OK
>
>> >+
>> >+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
>> >+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
>> >+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
>> >+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
>> >+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
>> >+
>> >+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
>> >+message relates to. One of them is sufficient in requests, if both are used,
>> >+they must identify the same device. Some requests, e.g. global string sets, do
>> >+not require device identification. Most GET requests also allow dump requests
>> >+without device identification to query the same information for all devices
>> >+providing it (each device in a separate message).
>> >+
>> >+Optional info mask allows to ask only for a part of data provided by GET
>>
>> How this "infomask" works? What are the bits related to? Is that request
>> specific?
>
>The interpretation is request specific, the information returned for
>a GET request is divided into multiple parts and client can choose to
>request one of them (usually one). In the code so far, infomask bits
>correspond to top level (nest) attributes but I would rather not make it
>a strict rule.
Wait, so it is a matter of verbosity? If you have multiple parts and the
user is able to choose one of them, why not have multiple get commands
instead, one per bit? This infomask construct seems redundant to me.
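For reference, the infomask rule from the quoted text ("if omitted or zero, all data is returned") can be sketched roughly as below. This is only an illustration: the function name is hypothetical, and silently ignoring unknown bits is an assumption, not something the patch states.

```c
#include <stdint.h>

/*
 * Return the set of reply parts to fill for a GET request.
 * requested: value of the info mask attribute, 0 if it was omitted
 * all_parts: every part this request type knows about
 * Assumption: bits outside all_parts are ignored rather than rejected.
 */
static uint32_t sketch_effective_infomask(uint32_t requested,
					  uint32_t all_parts)
{
	if (!requested)
		return all_parts;
	return requested & all_parts;
}
```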
>
>I'll make the paragraph more verbose.
>
>> >+request types. If omitted or zero, all data is returned. The two flag bitmaps
>> >+allow enabling requestoptions; ETHTOOL_A_HEADER_GFLAGS are global flags common
>>
>> s/requestoptions;/request options./ ?
>
>Yes.
>
>> >+for all request types, flags recognized in ETHTOOL_A_HEADER_RFLAGS and their
>> >+interpretation are specific for each request type. Global flags are
>> >+
>> >+ ETHTOOL_RF_COMPACT use compact format bitsets in reply
>>
>> Why "RF"? Isn't this "GF"? I would like "ETHTOOL_GFLAG_COMPACT" better.
>
>RF as Request Flags. At the moment, global flags use ETHTOOL_RF_name
>pattern and request specific flags ETHTOOL_RF_msgtype_name. GFLAG and
>RFLAG would probably show the relation better, so how about
>
> ETHTOOL_GFLAG_name for global
> ETHTOOL_RFLAG_msgtype_name for request specific
Yep, as I suggested. Looks fine to me.
>
>> >+ ETHTOOL_RF_REPLY send optional reply (SET and ACT requests)
>> >+
>> >+Request specific flags are described with each request type. For both flag
>> >+attributes, new flags should follow the general idea that if the flag is not
>> >+set, the behaviour is the same as (or closer to) the behaviour before it was
>>
>> "closer to" ? That would be unfortunate I believe...
>
>There may be situations where it cannot be exactly the same, e.g.
>because the flag affects interpretation of an attribute which was
>introduced together with the flag. How about "...the behaviour is
>backward compatible"?
Ok.
>
>
>> >+List of message types
>> >+---------------------
>> >+
>> >+All constants identifying message types use ETHTOOL_CMD_ prefix and suffix
>> >+according to message purpose:
>> >+
>> >+ _GET userspace request to retrieve data
>> >+ _SET userspace request to set data
>> >+ _ACT userspace request to perform an action
>> >+ _GET_REPLY kernel reply to a GET request
>> >+ _SET_REPLY kernel reply to a SET request
>> >+ _ACT_REPLY kernel reply to an ACT request
>> >+ _NTF kernel notification
>> >+
>> >+"GET" requests are sent by userspace applications to retrieve device
>> >+information. They usually do not contain any message specific attributes.
>> >+Kernel replies with corresponding "GET_REPLY" message. For most types, "GET"
>> >+request with NLM_F_DUMP and no device identification can be used to query the
>> >+information for all devices supporting the request.
>> >+
>> >+If the data can be also modified, corresponding "SET" message with the same
>> >+layout as "GET" reply is used to request changes. Only attributes where
>>
>> s/"GET" reply"/"GET_REPLY" ?
>> Maybe better to emphasize that the "GET_REPLY" is the one corresponding
>> with "SET". But perhaps I got this sentence all wrong :/
>
>OK
>
>> >+a change is requested are included in such request (also, not all attributes
>> >+may be changed). Replies to most "SET" request consist only of error code and
>> >+extack; if kernel provides additional data, it is sent in the form of
>> >+corresponding "SET_REPLY" message (if ETHTOOL_RF_REPLY flag was set in request
>> >+header).
>> >+
>> >+Data modification also triggers sending a "NTF" message with a notification.
>> >+These usually bear only a subset of attributes which was affected by the
>> >+change. The same notification is issued if the data is modified using other
>> >+means (mostly ioctl ethtool interface). Unlike notifications from ethtool
>> >+netlink code which are only sent if something actually changed, notifications
>> >+triggered by ioctl interface may be sent even if the request did not actually
>> >+change any data.
>>
>> Interesting. What's the reason for that?
>
>Most setting commands in ioctl interface do not even query the original
>state, they just pass the structure from ioctl() to ethtool_ops handler.
>We could add retrieving the original state first but I suppose we would
>still have to call the handler anyway even if requested values are the
>same (as that's what kernel does now) and it's not clear if omitting the
>notification in such case is the right thing to do.
Okay, got it. Better notification with no change than no notification.
>
>> >diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
>> >index 3ebfab2bca66..f30e0da88be5 100644
>> >--- a/net/ethtool/Makefile
>> >+++ b/net/ethtool/Makefile
>> >@@ -1,3 +1,7 @@
>> > # SPDX-License-Identifier: GPL-2.0
>> >
>> >-obj-y += ioctl.o
>> >+obj-y += ioctl.o
>> >+
>> >+obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
>>
>> Hmm, I wonder, why not to make this always on? We want users to use
>> it, memory savings in case it is off would be minimal. RTNetlink is also
>> always on. Ethtool ioctl is also always on.
>
>We have already discussed this in the previous version. Someone claimed
>earlier that building a kernel without ethtool interface would make
>sense for some minimalistic systems. My plan is to make the ioctl
>interface also optional once it's possible for (sufficiently new)
>ethtool to work without it.
Okay, pardon me. I don't recall that conversation.
>
>Michal
Tue, Jul 02, 2019 at 06:34:37PM CEST, [email protected] wrote:
>On Tue, Jul 02, 2019 at 03:05:15PM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 01:50:04PM CEST, [email protected] wrote:
>> >
>> >+/* request header */
>> >+
>> >+/* use compact bitsets in reply */
>> >+#define ETHTOOL_RF_COMPACT (1 << 0)
>>
>> "COMPACT_BITSETS"?
>>
>> >+/* provide optional reply for SET or ACT requests */
>> >+#define ETHTOOL_RF_REPLY (1 << 1)
>>
>> "OPTIONAL_REPLY"?
>
>OK
>
>> >+ ret = nla_parse_nested(tb, ETHTOOL_A_HEADER_MAX, nest,
>> >+ policy ?: dflt_header_policy, extack);
>> >+ if (ret < 0)
>>
>> if (ret)
>>
>> Same remark goes to the rest of the code (also the rest of the patches),
>> in case called function cannot return positive values.
>
>The "if (ret < 0)" idiom for "on error do ..." is so ubiquitous through
>the whole kernel that I don't think it's worth it to carefully check
>which function can return a positive value and which cannot and risk
>that one day I overlook that some function. And yet another question is
>what exactly "cannot return" means: is it whenever the function does not
>return a positive value or only if it's explicitly documented not to?
>
>Looking at existing networking code, e.g. net/netfilter (except ipvs),
>net/sched or net/core/rtnetlink.c are using "if (ret < 0)" rather
>uniformly. And (as you objected to the check of genl_register_family()
>previous patch) even genetlink itself has
>
> err = genl_register_family(&genl_ctrl);
> if (err < 0)
> goto problem;
>
>in genl_init().
>
>>
>>
>> >+ return ret;
>> >+ devname_attr = tb[ETHTOOL_A_HEADER_DEV_NAME];
>> >+
>> >+ if (tb[ETHTOOL_A_HEADER_DEV_INDEX]) {
>> >+ u32 ifindex = nla_get_u32(tb[ETHTOOL_A_HEADER_DEV_INDEX]);
>> >+
>> >+ dev = dev_get_by_index(net, ifindex);
>> >+ if (!dev) {
>> >+ NL_SET_ERR_MSG_ATTR(extack,
>> >+ tb[ETHTOOL_A_HEADER_DEV_INDEX],
>> >+ "no device matches ifindex");
>> >+ return -ENODEV;
>> >+ }
>> >+ /* if both ifindex and ifname are passed, they must match */
>> >+ if (devname_attr &&
>> >+ strncmp(dev->name, nla_data(devname_attr), IFNAMSIZ)) {
>> >+ dev_put(dev);
>> >+ NL_SET_ERR_MSG_ATTR(extack, nest,
>> >+ "ifindex and name do not match");
>> >+ return -ENODEV;
>> >+ }
>> >+ } else if (devname_attr) {
>> >+ dev = dev_get_by_name(net, nla_data(devname_attr));
>> >+ if (!dev) {
>> >+ NL_SET_ERR_MSG_ATTR(extack, devname_attr,
>> >+ "no device matches name");
>> >+ return -ENODEV;
>> >+ }
>> >+ } else if (require_dev) {
>> >+ NL_SET_ERR_MSG_ATTR(extack, nest,
>> >+ "neither ifindex nor name specified");
>> >+ return -EINVAL;
>> >+ }
>> >+
>> >+ if (dev && !netif_device_present(dev)) {
>> >+ dev_put(dev);
>> >+ NL_SET_ERR_MSG(extack, "device not present");
>> >+ return -ENODEV;
>> >+ }
>> >+
>> >+ req_info->dev = dev;
>> >+ ethnl_update_u32(&req_info->req_mask, tb[ETHTOOL_A_HEADER_INFOMASK]);
>> >+ ethnl_update_u32(&req_info->global_flags, tb[ETHTOOL_A_HEADER_GFLAGS]);
>> >+ ethnl_update_u32(&req_info->req_flags, tb[ETHTOOL_A_HEADER_RFLAGS]);
>>
>> Just:
>> req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
>> ...
>>
>> Not sure what ethnl_update_u32() is good for, but it is not needed here.
>
>That would result in null pointer dereference if the attribute is
>missing. So you would need at least
>
> if (tb[ETHTOOL_A_HEADER_INFOMASK])
> req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
> if (tb[ETHTOOL_A_HEADER_GFLAGS])
> req_info->global_flags =
> nla_get_u32(tb[ETHTOOL_A_HEADER_GFLAGS]);
> if (tb[ETHTOOL_A_HEADER_RFLAGS])
> req_info->req_flags = nla_get_u32(tb[ETHTOOL_A_HEADER_RFLAGS]);
Yeah, sure.
>
>I don't think it looks better.
Better than hiding something inside a helper in my opinion - helper that
is there for different reason moreover. Much easier to read the code
and follow.
>
>> >+
>> >+ return 0;
>> >+}
>> >+
>> >+/**
>> >+ * ethnl_fill_reply_header() - Put standard header into a reply message
>> >+ * @skb: skb with the message
>> >+ * @dev: network device to describe in header
>> >+ * @attrtype: attribute type to use for the nest
>> >+ *
>> >+ * Create a nested attribute with attributes describing given network device.
>> >+ * Clean up on error.
>>
>> Cleanup is obvious, no need to mention it in API docs.
>
>OK
>
>> >+ *
>> >+ * Return: 0 on success, error value (-EMSGSIZE only) on error
>> >+ */
>> >+int ethnl_fill_reply_header(struct sk_buff *skb, struct net_device *dev,
>> >+ u16 attrtype)
>> >+{
>> >+ struct nlattr *nest;
>> >+
>> >+ if (!dev)
>> >+ return 0;
>> >+ nest = nla_nest_start(skb, attrtype);
>> >+ if (!nest)
>> >+ return -EMSGSIZE;
>> >+
>> >+ if (nla_put_u32(skb, ETHTOOL_A_HEADER_DEV_INDEX, (u32)dev->ifindex) ||
>> >+ nla_put_string(skb, ETHTOOL_A_HEADER_DEV_NAME, dev->name))
>> >+ goto nla_put_failure;
>> >+ /* If more attributes are put into reply header, ethnl_header_size()
>> >+ * must be updated to account for them.
>> >+ */
>> >+
>> >+ nla_nest_end(skb, nest);
>> >+ return 0;
>> >+
>> >+nla_put_failure:
>> >+ nla_nest_cancel(skb, nest);
>> >+ return -EMSGSIZE;
>> >+}
>> >+
>> >+/**
>> >+ * ethnl_reply_init() - Create skb for a reply and fill device identification
>> >+ * @payload: payload length (without netlink and genetlink header)
>> >+ * @dev: device the reply is about (may be null)
>> >+ * @cmd: ETHTOOL_MSG_* message type for reply
>> >+ * @info: genetlink info of the received packet we respond to
>> >+ * @ehdrp: place to store payload pointer returned by genlmsg_new()
>> >+ *
>> >+ * Return: pointer to allocated skb on success, NULL on error
>> >+ */
>> >+struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
>> >+ u16 hdr_attrtype, struct genl_info *info,
>> >+ void **ehdrp)
>> >+{
>> >+ struct sk_buff *skb;
>> >+
>> >+ skb = genlmsg_new(payload, GFP_KERNEL);
>> >+ if (!skb)
>> >+ goto err;
>> >+ *ehdrp = genlmsg_put_reply(skb, info, &ethtool_genl_family, 0, cmd);
>> >+ if (!*ehdrp)
>> >+ goto err_free;
>> >+
>> >+ if (dev) {
>> >+ int ret;
>> >+
>> >+ ret = ethnl_fill_reply_header(skb, dev, hdr_attrtype);
>> >+ if (ret < 0)
>> >+ goto err;
>> >+ }
>> >+ return skb;
>> >+
>> >+err_free:
>> >+ nlmsg_free(skb);
>> >+ if (info)
>> >+ GENL_SET_ERR_MSG(info, "failed to setup reply message");
>> >+err:
>>
>> Why also not fillup extack msg here?
>
>Right, err label should be right below the nlmsg_free(skb), thanks. And
>now I noticed another mistake: on ethnl_fill_reply_header() failure, we
>should go to err_free, not err.
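The corrected error path described in the quoted reply (every failure after allocation frees the buffer, and every failure sets the message) can be modelled in plain userspace C. This is a sketch with stand-in helpers, not the kernel function: sketch_alloc(), sketch_free() and the fail_* parameters are hypothetical and exist only to exercise the label ordering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_buf { int unused; };

static int sketch_live_bufs;		/* allocated but not yet freed */
static const char *sketch_err_msg;	/* stands in for the extack message */

static struct sketch_buf *sketch_alloc(bool fail)
{
	struct sketch_buf *buf = fail ? NULL : malloc(sizeof(*buf));

	if (buf)
		sketch_live_bufs++;
	return buf;
}

static void sketch_free(struct sketch_buf *buf)
{
	free(buf);
	sketch_live_bufs--;
}

/* Model of the reply setup: allocate, put message header, fill reply header. */
static struct sketch_buf *sketch_reply_init(bool fail_alloc, bool fail_put,
					    bool fail_header)
{
	struct sketch_buf *buf;

	sketch_err_msg = NULL;
	buf = sketch_alloc(fail_alloc);
	if (!buf)
		goto err;		/* nothing to free yet */
	if (fail_put)
		goto err_free;
	if (fail_header)
		goto err_free;		/* was "goto err" in the patch */
	return buf;

err_free:
	sketch_free(buf);
err:					/* label directly below the free */
	sketch_err_msg = "failed to setup reply message";
	return NULL;
}
```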
>
>> >+static inline int ethnl_str_size(const char *s)
>>
>> If you really need this helper, put it into netlink code. There's nothing
>> ethtool-specific about this.
>
>OK, I'll look into it. I've been already thinking about some kind of
>NLA_SIZEOF() macro as about 1/3 of all uses of nla_total_size() follow
>the nla_total_size(sizeof(...)) pattern (and lot more should follow it
>but are written like e.g. nla_total_size(4) instead). This is another
>common pattern.
>
>> >+/* The ethnl_update_* helpers set value pointed to by @dst to the value of
>> >+ * netlink attribute @attr (if attr is not null). They return true if *dst
>> >+ * value was changed, false if not.
>> >+ */
>> >+static inline bool ethnl_update_u32(u32 *dst, struct nlattr *attr)
>>
>> I'm still not sure I'm convinced about these "update helpers" :)
>
>Just imagine what would e.g.
>
> if (ethnl_update_u32(&data.rx_pending, tb[ETHTOOL_A_RING_RX_PENDING]))
> mod = true;
> if (ethnl_update_u32(&data.rx_mini_pending,
> tb[ETHTOOL_A_RING_RX_MINI_PENDING]))
> mod = true;
> if (ethnl_update_u32(&data.rx_jumbo_pending,
> tb[ETHTOOL_A_RING_RX_JUMBO_PENDING]))
> mod = true;
> if (ethnl_update_u32(&data.tx_pending, tb[ETHTOOL_A_RING_TX_PENDING]))
> mod = true;
> if (!mod)
> return 0;
>
>look like without them. And coalescing parameters would be much worse
>(22 attributes / struct members).
No, I understand your motivation, don't get me wrong. I just wonder why
no other netlink implementation needs such a mechanism. Maybe I'm not
looking closely enough. But if one does, it should rather be a netlink helper.
Regarding the example code you have here: it is preferred to store the
function result in a variable and check that variable in the "if". But in
your code, couldn't this be done without ifs?
bool mod = false;
ethnl_update_u32(&mod, &data.rx_pending, tb[ETHTOOL_A_RING_RX_PENDING]);
ethnl_update_u32(&mod, &data.rx_mini_pending,
		 tb[ETHTOOL_A_RING_RX_MINI_PENDING]);
ethnl_update_u32(&mod, &data.rx_jumbo_pending,
		 tb[ETHTOOL_A_RING_RX_JUMBO_PENDING]);
ethnl_update_u32(&mod, &data.tx_pending, tb[ETHTOOL_A_RING_TX_PENDING]);
if (!mod)
	return 0;
>
>> >+{
>> >+ u32 val;
>> >+
>> >+ if (!attr)
>> >+ return false;
>> >+ val = nla_get_u32(attr);
>> >+ if (*dst == val)
>> >+ return false;
>> >+
>> >+ *dst = val;
>> >+ return true;
>> >+}
>...
>> >+static inline bool ethnl_update_binary(u8 *dst, unsigned int len,
>>
>> void *dst
>
>OK.
>
>> >+/**
>> >+ * ethnl_is_privileged() - check if request has sufficient privileges
>> >+ * @skb: skb with client request
>> >+ *
>> >+ * Checks if client request has CAP_NET_ADMIN in its netns. Unlike the flags
>> >+ * in genl_ops, this allows finer access control, e.g. allowing or denying
>> >+ * the request based on its contents or witholding only part of the data
>> >+ * from unprivileged users.
>> >+ *
>> >+ * Return: true if request is privileged, false if not
>> >+ */
>> >+static inline bool ethnl_is_privileged(struct sk_buff *skb)
>>
>> I wonder why you need this helper. Genetlink uses
>> ops->flags & GENL_ADMIN_PERM for this.
>
>It's explained in the function description. Sometimes we need finer
>control than by request message type. An example is the WoL password:
>ETHTOOL_GWOL is privileged because of it but I believe there is no
>reason why unprivileged user couldn't see enabled WoL modes, we can
>simply omit the password for him. (Also, it allows to combine query for
>WoL settings with other unprivileged settings.)
Why can't we rather have:
ETHTOOL_WOL_GET for all
ETHTOOL_WOL_PASSWORD_GET with GENL_ADMIN_PERM
?
Better to stick with what we have in genetlink rather than to bend the
implementation from the very beginning, I think.
>
>> >+/**
>> >+ * ethnl_reply_header_size() - total size of reply header
>> >+ *
>> >+ * This is an upper estimate so that we do not need to hold RTNL lock longer
>> >+ * than necessary (to prevent rename between size estimate and composing the
>>
>> I guess this description is not relevant anymore. I don't see why to
>> hold rtnl mutex for this function...
>
>You don't need it for this function, it's the other way around: unless
>you hold RTNL lock for the whole time covering both checking needed
>message size and filling the message - and we don't - the device could
>be renamed in between. Thus if we returned size based on current device
>name, it might not be sufficient at the time the header is filled.
>That's why this function returns maximum possible size (which is
>actually a constant).
I suggest dropping the description. It is misleading. Perhaps it is
something to have in the patch description, but not here in the code.
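For what it's worth, the constant that the quoted ethnl_reply_header_size() works out to can be checked with a small userspace model of the netlink size helpers. The SKETCH_* macros below mirror NLA_ALIGN, NLA_HDRLEN (4 bytes) and IFNAMSIZ (16), but they are local stand-ins, not the kernel definitions.

```c
#include <stdint.h>

#define SKETCH_NLA_ALIGNTO	4u
#define SKETCH_NLA_ALIGN(len)	(((len) + SKETCH_NLA_ALIGNTO - 1) & \
				 ~(SKETCH_NLA_ALIGNTO - 1))
#define SKETCH_NLA_HDRLEN	4u	/* aligned size of the attribute header */
#define SKETCH_IFNAMSIZ		16u

/* Attribute header plus payload, padded to a 4 byte boundary. */
static unsigned int sketch_nla_total_size(unsigned int payload)
{
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + payload);
}

/* Nest containing a u32 ifindex and a name of up to IFNAMSIZ bytes. */
static unsigned int sketch_reply_header_size(void)
{
	unsigned int ifindex = sketch_nla_total_size((unsigned int)sizeof(uint32_t));
	unsigned int name = sketch_nla_total_size(SKETCH_IFNAMSIZ);

	return sketch_nla_total_size(ifindex + name);
}
```

With these values the inner attributes take 8 and 20 bytes, so the nest comes out at 32 bytes regardless of the current device name.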
>
>Michal
>
>> >+ * message). Accounts only for device ifindex and name as those are the only
>> >+ * attributes ethnl_fill_reply_header() puts into the reply header.
>> >+ */
>> >+static inline unsigned int ethnl_reply_header_size(void)
>> >+{
>> >+ return nla_total_size(nla_total_size(sizeof(u32)) +
>> >+ nla_total_size(IFNAMSIZ));
>> >+}
On Wed, Jul 03, 2019 at 12:04:35PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 06:34:37PM CEST, [email protected] wrote:
> >On Tue, Jul 02, 2019 at 03:05:15PM +0200, Jiri Pirko wrote:
> >> Tue, Jul 02, 2019 at 01:50:04PM CEST, [email protected] wrote:
> >> >+
> >> >+ req_info->dev = dev;
> >> >+ ethnl_update_u32(&req_info->req_mask, tb[ETHTOOL_A_HEADER_INFOMASK]);
> >> >+ ethnl_update_u32(&req_info->global_flags, tb[ETHTOOL_A_HEADER_GFLAGS]);
> >> >+ ethnl_update_u32(&req_info->req_flags, tb[ETHTOOL_A_HEADER_RFLAGS]);
> >>
> >> Just:
> >> req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
> >> ...
> >>
> >> Not sure what ethnl_update_u32() is good for, but it is not needed here.
> >
> >That would result in null pointer dereference if the attribute is
> >missing. So you would need at least
> >
> > if (tb[ETHTOOL_A_HEADER_INFOMASK])
> > req_info->req_mask = nla_get_u32(tb[ETHTOOL_A_HEADER_INFOMASK]);
> > if (tb[ETHTOOL_A_HEADER_GFLAGS])
> > req_info->global_flags =
> > nla_get_u32(tb[ETHTOOL_A_HEADER_GFLAGS]);
> > if (tb[ETHTOOL_A_HEADER_RFLAGS])
> > req_info->req_flags = nla_get_u32(tb[ETHTOOL_A_HEADER_RFLAGS]);
>
> Yeah, sure.
>
> >
> >I don't think it looks better.
>
> Better than hiding something inside a helper in my opinion - helper that
> is there for different reason moreover. Much easier to read the code
> and follow.
OK, I'll use nla_get_u32() directly here. With the change below, use of
ethnl_update_u32() would really look unnatural.
> >> >+/* The ethnl_update_* helpers set value pointed to by @dst to the value of
> >> >+ * netlink attribute @attr (if attr is not null). They return true if *dst
> >> >+ * value was changed, false if not.
> >> >+ */
> >> >+static inline bool ethnl_update_u32(u32 *dst, struct nlattr *attr)
> >>
> >> I'm still not sure I'm convinced about these "update helpers" :)
> >
> >Just imagine what would e.g.
> >
> > if (ethnl_update_u32(&data.rx_pending, tb[ETHTOOL_A_RING_RX_PENDING]))
> > mod = true;
> > if (ethnl_update_u32(&data.rx_mini_pending,
> > tb[ETHTOOL_A_RING_RX_MINI_PENDING]))
> > mod = true;
> > if (ethnl_update_u32(&data.rx_jumbo_pending,
> > tb[ETHTOOL_A_RING_RX_JUMBO_PENDING]))
> > mod = true;
> > if (ethnl_update_u32(&data.tx_pending, tb[ETHTOOL_A_RING_TX_PENDING]))
> > mod = true;
> > if (!mod)
> > return 0;
> >
> >look like without them. And coalescing parameters would be much worse
> >(22 attributes / struct members).
>
> No, I understand your motivation, don't get me wrong. I just wonder why
> no other netlink implementation needs such a mechanism. Maybe I'm not
> looking closely enough. But if one does, it should rather be a netlink helper.
I'll check some existing interfaces to see how they handle "set" type
requests.
> Regarding the example code you have here: it is preferred to store the
> function result in a variable and check that variable in the "if". But in
> your code, couldn't this be done without ifs?
>
> bool mod = false;
>
> ethnl_update_u32(&mod, &data.rx_pending, tb[ETHTOOL_A_RING_RX_PENDING]);
> ethnl_update_u32(&mod, &data.rx_mini_pending,
>		   tb[ETHTOOL_A_RING_RX_MINI_PENDING]);
> ethnl_update_u32(&mod, &data.rx_jumbo_pending,
>		   tb[ETHTOOL_A_RING_RX_JUMBO_PENDING]);
> ethnl_update_u32(&mod, &data.tx_pending, tb[ETHTOOL_A_RING_TX_PENDING]);
>
> if (!mod)
> return 0;
Ah, right. Somehow I completely missed the possibility that the update
helper can use "set or leave as it is" logic instead of "set to true or
false". Thanks, I'll rewrite the update helpers in this style.
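A rough userspace sketch of what such a helper might look like follows. It is not the final kernel code: a plain pointer stands in for struct nlattr (NULL meaning the attribute is absent) and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy *attr into *dst if the attribute is present and differs.
 * *mod is set to true on a change and otherwise left as it is, so
 * several calls can share one flag without any "if" at the call site.
 */
static void sketch_update_u32(bool *mod, uint32_t *dst, const uint32_t *attr)
{
	if (!attr || *dst == *attr)
		return;
	*dst = *attr;
	*mod = true;
}
```

The important property is that the helper never resets *mod to false, so a later call that changes nothing cannot hide an earlier modification.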
Michal
Tue, Jul 02, 2019 at 01:50:09PM CEST, [email protected] wrote:
>The ethtool netlink code uses common framework for passing arbitrary
>length bit sets to allow future extensions. A bitset can be a list (only
>one bitmap) or can consist of value and mask pair (used e.g. when client
>want to modify only some bits). A bitset can use one of two formats:
>verbose (bit by bit) or compact.
>
>Verbose format consists of bitset size (number of bits), list flag and
>an array of bit nests, telling which bits are part of the list or which
>bits are in the mask and which of them are to be set. In requests, bits
>can be identified by index (position) or by name. In replies, kernel
>provides both index and name. Verbose format is suitable for "one shot"
>applications like standard ethtool command as it avoids the need to
>either keep bit names (e.g. link modes) in sync with kernel or having to
>add an extra roundtrip for string set request (e.g. for private flags).
>
>Compact format uses one (list) or two (value/mask) arrays of 32-bit
>words to store the bitmap(s). It is more suitable for long running
>applications (ethtool in monitor mode or network management daemons)
>which can retrieve the names once and then pass only compact bitmaps to
>save space.
>
>Userspace requests can use either format and ETHTOOL_RF_COMPACT flag in
>request header tells kernel which format to use in reply. Notifications
>always use compact format.
>
>Signed-off-by: Michal Kubecek <[email protected]>
>---
> Documentation/networking/ethtool-netlink.txt | 61 ++
> include/uapi/linux/ethtool_netlink.h | 35 ++
> net/ethtool/Makefile | 2 +-
> net/ethtool/bitset.c | 606 +++++++++++++++++++
> net/ethtool/bitset.h | 40 ++
> net/ethtool/netlink.h | 9 +
> 6 files changed, 752 insertions(+), 1 deletion(-)
> create mode 100644 net/ethtool/bitset.c
> create mode 100644 net/ethtool/bitset.h
>
>diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
>index 97c369aa290b..4636682c551f 100644
>--- a/Documentation/networking/ethtool-netlink.txt
>+++ b/Documentation/networking/ethtool-netlink.txt
>@@ -73,6 +73,67 @@ set, the behaviour is the same as (or closer to) the behaviour before it was
> introduced.
>
>
>+Bit sets
>+--------
>+
>+For short bitmaps of (reasonably) fixed length, standard NLA_BITFIELD32 type
>+is used. For arbitrary length bitmaps, ethtool netlink uses a nested attribute
>+with contents of one of two forms: compact (two binary bitmaps representing
>+bit values and mask of affected bits) and bit-by-bit (list of bits identified
>+by either index or name).
>+
>+Compact form: nested (bitset) atrribute contents:
>+
>+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
>+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
>+
>+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
>+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use
a nested array of NLA_BITFIELD32 instead?
>+words ordered from least significant to most significant (i.e. the same way as
>+bitmaps are passed with ioctl interface).
>+
>+For compact form, ETHTOOL_A_BITSET_SIZE and ETHTOOL_A_BITSET_VALUE are
>+mandatory. Similar to BITFIELD32, a compact form bit set requests to set bits
Double space^^
>+in the mask to 1 (if the bit is set in value) or 0 (if not) and preserve the
>+rest. If ETHTOOL_A_BITSET_LIST is present, there is no mask and bitset
>+represents a simple list of bits.
Okay, that is a bit confusing. Why not rename it to something like:
ETHTOOL_A_BITSET_NO_MASK (flag)
?
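To make sure we read the compact form the same way, here is how I understand the value/mask semantics from the quoted text, as a userspace sketch. The function name is hypothetical, a NULL mask stands in for the ETHTOOL_A_BITSET_LIST flag, and handling of bits beyond ETHTOOL_A_BITSET_SIZE in the last word is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a compact bitset to a bitmap of nwords 32-bit words.
 * With a mask: bits in the mask take their value from "value",
 * all other bits are preserved.
 * Without a mask (list form): the bitmap becomes exactly "value".
 */
static void sketch_bitset_apply(uint32_t *bitmap, const uint32_t *value,
				const uint32_t *mask, size_t nwords)
{
	size_t i;

	for (i = 0; i < nwords; i++) {
		uint32_t m = mask ? mask[i] : UINT32_MAX;

		bitmap[i] = (bitmap[i] & ~m) | (value[i] & m);
	}
}
```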
>+
>+Kernel bit set length may differ from userspace length if older application is
>+used on newer kernel or vice versa. If userspace bitmap is longer, an error is
>+issued only if the request actually tries to set values of some bits not
>+recognized by kernel.
>+
>+Bit-by-bit form: nested (bitset) attribute contents:
>+
>+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>+ ETHTOOL_A_BITSET_BIT (nested) array of bits
>+ ETHTOOL_A_BITSET_BIT+ (nested) one bit
>+ ETHTOOL_A_BIT_INDEX (u32) bit index (0 for LSB)
>+ ETHTOOL_A_BIT_NAME (string) bit name
>+ ETHTOOL_A_BIT_VALUE (flag) present if bit is set
>+
>+Bit size is optional for bit-by-bit form. ETHTOOL_A_BITSET_BITS nest can only
>+contain ETHTOOL_A_BITS_BIT attributes but there can be an arbitrary number of
>+them. A bit may be identified by its index or by its name. When used in
>+requests, listed bits are set to 0 or 1 according to ETHTOOL_A_BIT_VALUE, the
>+rest is preserved. A request fails if index exceeds kernel bit length or if
>+name is not recognized.
>+
>+When ETHTOOL_A_BITSET_LIST flag is present, bitset is interpreted as a simple
>+bit list. ETHTOOL_A_BIT_VALUE attributes are not used in such case. Bit list
>+represents a bitmap with listed bits set and the rest zero.
>+
>+In requests, application can use either form. Form used by kernel in reply is
>+determined by a flag in flags field of request header. Semantics of value and
>+mask depends on the attribute. General idea is that flags control request
>+processing, info_mask control which parts of the information are returned in
>+"get" request and index identifies a particular subcommand or an object to
>+which the request applies.
This is quite complex and confusing. Having the same API for 2 APIs is
odd. The API should be crystal clear, easy to use.
Why can't you have 2 commands, one working with bit arrays only, one
working with strings? Something like:
X_GET
ETHTOOL_A_BITS (nested)
ETHTOOL_A_BIT_ARRAY (BITFIELD32)
X_NAMES_GET
ETHTOOL_A_BIT_NAMES (nested)
ETHTOOL_A_BIT_INDEX
ETHTOOL_A_BIT_NAME
For set, you can also have multiple cmds:
X_SET - to set many at once, by bit index
ETHTOOL_A_BITS (nested)
ETHTOOL_A_BIT_ARRAY (BITFIELD32)
X_ONE_SET - to set one, by bit index
ETHTOOL_A_BIT_INDEX
ETHTOOL_A_BIT_VALUE
X_ONE_SET - to set one, by name
ETHTOOL_A_BIT_NAME
ETHTOOL_A_BIT_VALUE
[...]
Tue, Jul 02, 2019 at 01:50:14PM CEST, [email protected] wrote:
>Add infrastructure for ethtool netlink notifications. There is only one
>multicast group "monitor" which is used to notify userspace about changes
>and actions performed. Notification messages (types using suffix _NTF)
>share the format with replies to GET requests.
>
>Notifications are supposed to be broadcasted on every configuration change,
>whether it is done using the netlink interface or ioctl one. Netlink SET
>requests only trigger a notification if some data is actually changed.
>
>To trigger an ethtool notification, both ethtool netlink and external code
>use ethtool_notify() helper. This helper requires RTNL to be held and may
>sleep. Handlers sending messages for specific notification message types
>are registered in ethnl_notify_handlers array. As notifications can be
>triggered from other code, ethnl_ok flag is used to prevent an attempt to
>send notification before genetlink family is registered.
>
>Signed-off-by: Michal Kubecek <[email protected]>
>---
> include/linux/ethtool_netlink.h | 5 ++++
> include/linux/netdevice.h | 12 ++++++++++
> include/uapi/linux/ethtool_netlink.h | 2 ++
> net/ethtool/netlink.c | 35 ++++++++++++++++++++++++++++
> 4 files changed, 54 insertions(+)
>
>diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
>index 0412adb4f42f..2a15e64a16f3 100644
>--- a/include/linux/ethtool_netlink.h
>+++ b/include/linux/ethtool_netlink.h
>@@ -5,5 +5,10 @@
>
> #include <uapi/linux/ethtool_netlink.h>
> #include <linux/ethtool.h>
>+#include <linux/netdevice.h>
>+
>+enum ethtool_multicast_groups {
>+ ETHNL_MCGRP_MONITOR,
>+};
>
> #endif /* _LINUX_ETHTOOL_NETLINK_H_ */
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 88292953aa6f..c57d9917fd50 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -4350,6 +4350,18 @@ struct netdev_notifier_bonding_info {
> void netdev_bonding_info_change(struct net_device *dev,
> struct netdev_bonding_info *bonding_info);
>
>+#if IS_ENABLED(CONFIG_ETHTOOL_NETLINK)
>+void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
>+ unsigned int cmd, u32 req_mask, const void *data);
>+#else
>+static inline void ethtool_notify(struct net_device *dev,
>+ struct netlink_ext_ack *extack,
>+ unsigned int cmd, u32 req_mask,
>+ const void *data)
>+{
>+}
>+#endif
>+
> static inline
> struct sk_buff *skb_gso_segment(struct sk_buff *skb, netdev_features_t features)
> {
>diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
>index 805f314f4454..8938a1f09057 100644
>--- a/include/uapi/linux/ethtool_netlink.h
>+++ b/include/uapi/linux/ethtool_netlink.h
>@@ -91,4 +91,6 @@ enum {
> #define ETHTOOL_GENL_NAME "ethtool"
> #define ETHTOOL_GENL_VERSION 1
>
>+#define ETHTOOL_MCGRP_MONITOR_NAME "monitor"
>+
> #endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */
>diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
>index e13f29bbd625..a7a0bfe1818c 100644
>--- a/net/ethtool/netlink.c
>+++ b/net/ethtool/netlink.c
>@@ -6,6 +6,8 @@
>
> static struct genl_family ethtool_genl_family;
>
>+static bool ethnl_ok __read_mostly;
>+
> static const struct nla_policy dflt_header_policy[ETHTOOL_A_HEADER_MAX + 1] = {
> [ETHTOOL_A_HEADER_UNSPEC] = { .type = NLA_REJECT },
> [ETHTOOL_A_HEADER_DEV_INDEX] = { .type = NLA_U32 },
>@@ -176,11 +178,41 @@ struct sk_buff *ethnl_reply_init(size_t payload, struct net_device *dev, u8 cmd,
> return NULL;
> }
>
>+/* notifications */
>+
>+typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
>+ struct netlink_ext_ack *extack,
>+ unsigned int cmd, u32 req_mask,
>+ const void *data);
>+
>+static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
>+};
>+
>+void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
>+ unsigned int cmd, u32 req_mask, const void *data)
What's "req_mask" ?
>+{
>+ if (unlikely(!ethnl_ok))
>+ return;
>+ ASSERT_RTNL();
>+
>+ if (likely(cmd < ARRAY_SIZE(ethnl_notify_handlers) &&
>+ ethnl_notify_handlers[cmd]))
How could it be null?
>+ ethnl_notify_handlers[cmd](dev, extack, cmd, req_mask, data);
>+ else
>+ WARN_ONCE(1, "notification %u not implemented (dev=%s, req_mask=0x%x)\n",
>+ cmd, netdev_name(dev), req_mask);
>+}
>+EXPORT_SYMBOL(ethtool_notify);
>+
> /* genetlink setup */
>
> static const struct genl_ops ethtool_genl_ops[] = {
> };
>
>+static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
>+ [ETHNL_MCGRP_MONITOR] = { .name = ETHTOOL_MCGRP_MONITOR_NAME },
>+};
>+
> static struct genl_family ethtool_genl_family = {
> .name = ETHTOOL_GENL_NAME,
> .version = ETHTOOL_GENL_VERSION,
>@@ -188,6 +220,8 @@ static struct genl_family ethtool_genl_family = {
> .parallel_ops = true,
> .ops = ethtool_genl_ops,
> .n_ops = ARRAY_SIZE(ethtool_genl_ops),
>+ .mcgrps = ethtool_nl_mcgrps,
>+ .n_mcgrps = ARRAY_SIZE(ethtool_nl_mcgrps),
> };
>
> /* module setup */
>@@ -199,6 +233,7 @@ static int __init ethnl_init(void)
> ret = genl_register_family(&ethtool_genl_family);
> if (WARN(ret < 0, "ethtool: genetlink family registration failed"))
> return ret;
>+ ethnl_ok = true;
>
> return 0;
> }
>--
>2.22.0
>
On Tue, 2019-07-02 at 13:50 +0200, Michal Kubecek wrote:
>
> +static bool ethnl_ok __read_mostly;
Not sure it makes a big difference, but it could probably be
__ro_after_init instead?
johannes
Tue, Jul 02, 2019 at 01:50:19PM CEST, [email protected] wrote:
>Introduce file net/ethtool/common.c for code shared by ioctl and netlink
>ethtool interface. Move name tables of features, RSS hash functions,
>tunables and PHY tunables into this file.
>
>Signed-off-by: Michal Kubecek <[email protected]>
>---
> net/ethtool/Makefile | 2 +-
> net/ethtool/common.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
> net/ethtool/common.h | 17 +++++++++
> net/ethtool/ioctl.c | 83 ++-----------------------------------------
> 4 files changed, 104 insertions(+), 82 deletions(-)
> create mode 100644 net/ethtool/common.c
> create mode 100644 net/ethtool/common.h
>
>diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
>index 482fdb9380fa..11782306593b 100644
>--- a/net/ethtool/Makefile
>+++ b/net/ethtool/Makefile
>@@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0
>
>-obj-y += ioctl.o
>+obj-y += ioctl.o common.o
>
> obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
>
>diff --git a/net/ethtool/common.c b/net/ethtool/common.c
>new file mode 100644
>index 000000000000..b0ce420e994e
>--- /dev/null
>+++ b/net/ethtool/common.c
>@@ -0,0 +1,84 @@
>+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>+
>+#include "common.h"
>+
>+const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
const char *netdev_features_strings[NETDEV_FEATURE_COUNT] = {
?
Same with the other arrays.
[...]
On Wed, 2019-07-03 at 13:49 +0200, Jiri Pirko wrote:
>
> > +Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
> > +to a multiple of 32 bits. They consist of 32-bit words in host byte order,
>
> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use a
> nested array of NLA_BITFIELD32 instead?
That would seem kind of awkward to use, IMHO.
Perhaps better to make some kind of generic "arbitrary size bitfield"
attribute type?
Not really sure we want the complexity with _LIST and _SIZE, since you
should always be able to express it as _VALUE and _MASK, right?
Trying to think how we should express this best - bitfield32 is just a
mask/value struct, for arbitrary size I guess we *could* just make it
kind of a binary with arbitrary length that must be a multiple of 2
bytes (or 2 u32-bit-words?) and then the first half is the value and the
second half is the mask? Some more validation would be nicer, but having
a generic attribute that actually is nested is awkward too.
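As a rough illustration of the "first half value, second half mask" idea,
here is a minimal userspace sketch in C. The names struct bitfield_view and
bitfield_split() are hypothetical, and the check that value has no bits
outside the mask is only one guess at what "more validation" could mean;
nothing here is an existing netlink API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a "value then mask" bitfield payload. */
struct bitfield_view {
	const uint32_t *value;	/* first half of the payload */
	const uint32_t *mask;	/* second half of the payload */
	size_t nwords;		/* number of 32-bit words in each half */
};

/*
 * Split a payload of len bytes into its value and mask halves.
 * Returns 0 on success, -1 if the length is zero or not a multiple of
 * two 32-bit words, or if value has a bit set outside the mask.
 */
int bitfield_split(const uint32_t *payload, size_t len,
		   struct bitfield_view *view)
{
	size_t i, nwords;

	if (len == 0 || len % (2 * sizeof(uint32_t)) != 0)
		return -1;
	nwords = len / (2 * sizeof(uint32_t));

	/* assumed extra validation: value must be a subset of mask */
	for (i = 0; i < nwords; i++)
		if (payload[i] & ~payload[nwords + i])
			return -1;

	view->value = payload;
	view->mask = payload + nwords;
	view->nwords = nwords;
	return 0;
}
```

With this layout the receiver learns the bitmap size from the attribute
length alone, so no separate size attribute is needed in the sketch.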
johannes
On Wed, Jul 03, 2019 at 03:33:52PM +0200, Jiri Pirko wrote:
> >+/* notifications */
> >+
> >+typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
> >+ struct netlink_ext_ack *extack,
> >+ unsigned int cmd, u32 req_mask,
> >+ const void *data);
> >+
> >+static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
> >+};
> >+
> >+void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
> >+ unsigned int cmd, u32 req_mask, const void *data)
>
> What's "req_mask" ?
It's an infomask, to be interpreted the same way as if it came from the
request header (the notification triggered by a SET request or its ioctl
equivalent uses the same format as the corresponding GET_REPLY message
and is created by the same code). But it could be called infomask, I have
no strong opinion about that.
> >+{
> >+ if (unlikely(!ethnl_ok))
> >+ return;
> >+ ASSERT_RTNL();
> >+
> >+ if (likely(cmd < ARRAY_SIZE(ethnl_notify_handlers) &&
> >+ ethnl_notify_handlers[cmd]))
>
> How could it be null?
Notification message types share the enum with other kernel messages:
/* message types - kernel to userspace */
enum {
ETHTOOL_MSG_KERNEL_NONE,
ETHTOOL_MSG_STRSET_GET_REPLY,
ETHTOOL_MSG_SETTINGS_GET_REPLY,
ETHTOOL_MSG_SETTINGS_NTF,
ETHTOOL_MSG_SETTINGS_SET_REPLY,
ETHTOOL_MSG_INFO_GET_REPLY,
ETHTOOL_MSG_PARAMS_GET_REPLY,
ETHTOOL_MSG_PARAMS_NTF,
ETHTOOL_MSG_NWAYRST_NTF,
ETHTOOL_MSG_PHYSID_NTF,
ETHTOOL_MSG_RESET_NTF,
ETHTOOL_MSG_RESET_ACT_REPLY,
ETHTOOL_MSG_RXFLOW_GET_REPLY,
ETHTOOL_MSG_RXFLOW_NTF,
ETHTOOL_MSG_RXFLOW_SET_REPLY,
/* add new constants above here */
__ETHTOOL_MSG_KERNEL_CNT,
ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
};
Only entries for *_NTF types are non-null in ethnl_notify_handlers[]:
static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
[ETHTOOL_MSG_SETTINGS_NTF] = ethnl_std_notify,
[ETHTOOL_MSG_PARAMS_NTF] = ethnl_std_notify,
[ETHTOOL_MSG_NWAYRST_NTF] = ethnl_nwayrst_notify,
[ETHTOOL_MSG_PHYSID_NTF] = ethnl_physid_notify,
[ETHTOOL_MSG_RESET_NTF] = ethnl_reset_notify,
[ETHTOOL_MSG_RXFLOW_NTF] = ethnl_rxflow_notify,
};
If the check above fails, it means that kernel code tried to send
a notification with a type which does not exist or is not a notification,
i.e. a bug in kernel; that's why the WARN_ONCE.
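The sparse table and the check can be sketched in plain userspace C as
follows. The MSG_* constants, std_notify() and notify() are simplified,
hypothetical stand-ins for the real enum and handlers, and returning -1
stands in for the WARN_ONCE() branch.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical message types; replies and notifications share one enum. */
enum {
	MSG_NONE,
	MSG_SETTINGS_GET_REPLY,
	MSG_SETTINGS_NTF,
	MSG_PARAMS_GET_REPLY,
	MSG_PARAMS_NTF,
	MSG_CNT
};

typedef int (*notify_handler_t)(unsigned int cmd);

/* Trivial handler: report which message type it was called for. */
int std_notify(unsigned int cmd)
{
	return (int)cmd;
}

/* Only the *_NTF slots are filled; every other slot is implicitly NULL. */
const notify_handler_t notify_handlers[] = {
	[MSG_SETTINGS_NTF]	= std_notify,
	[MSG_PARAMS_NTF]	= std_notify,
};

/* Returns the handler result, or -1 where the kernel code would warn. */
int notify(unsigned int cmd)
{
	if (cmd < ARRAY_SIZE(notify_handlers) && notify_handlers[cmd])
		return notify_handlers[cmd](cmd);
	return -1;
}
```

The bounds check covers types beyond the last initialized slot, the NULL
check covers reply types that sit between notification types.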
Michal
> >+ ethnl_notify_handlers[cmd](dev, extack, cmd, req_mask, data);
> >+ else
> >+ WARN_ONCE(1, "notification %u not implemented (dev=%s, req_mask=0x%x)\n",
> >+ cmd, netdev_name(dev), req_mask);
> >+}
> >+EXPORT_SYMBOL(ethtool_notify);
On Wed, Jul 03, 2019 at 03:39:54PM +0200, Johannes Berg wrote:
> On Tue, 2019-07-02 at 13:50 +0200, Michal Kubecek wrote:
> >
> > +static bool ethnl_ok __read_mostly;
>
> Not sure it makes a big difference, but it could probably be
> __ro_after_init instead?
Yes, that's more fitting; the flag is initialized to false, changes to
true once ethtool netlink is ready and never changes back. I wasn't
aware of __ro_after_init annotation.
Michal
Tue, Jul 02, 2019 at 01:50:24PM CEST, [email protected] wrote:
[...]
>+/* generic ->doit() handler for GET type requests */
>+static int ethnl_get_doit(struct sk_buff *skb, struct genl_info *info)
It is very unfortunate for review to introduce a function in a patch and
not use it. In general, this approach is frowned upon. You should use
whatever you introduce in the same patch. I understand it is sometimes
hard.
IIUC, you have one ethnl_get_doit for all possible commands, and you
have this ops to do cmd-specific tasks. That is quite unusual. Plus if
you consider the complicated datastructures connected with this,
I'm lost from the beginning :( Any particular reason for this indirection?
I don't think any other generic netlink code does that (correct me if
I'm wrong). The nice thing about generic netlink is the fact that
you have separate handlers per cmd.
I don't think you need these ops and indirections. For the common parts,
just have a set of common helpers, as the other generic netlink users
are doing. The code would be much easier to read and follow then.
>+{
>+ const u8 cmd = info->genlhdr->cmd;
>+ const struct get_request_ops *ops;
>+ struct ethnl_req_info *req_info;
>+ struct sk_buff *rskb;
>+ void *reply_payload;
>+ int reply_len;
>+ int ret;
>+
>+ ops = get_requests[cmd];
>+ if (WARN_ONCE(!ops, "cmd %u has no get_request_ops\n", cmd))
>+ return -EOPNOTSUPP;
>+ req_info = ethnl_alloc_get_data(ops);
>+ if (!req_info)
>+ return -ENOMEM;
>+ ret = ethnl_std_parse(req_info, info->nlhdr, genl_info_net(info), ops,
>+ info->extack, !ops->allow_nodev_do);
>+ if (ret < 0)
>+ goto err_dev;
>+ req_info->privileged = ethnl_is_privileged(skb);
>+ ethnl_init_reply_data(req_info, ops, req_info->dev);
>+
>+ rtnl_lock();
>+ ret = ops->prepare_data(req_info, info);
>+ if (ret < 0)
>+ goto err_rtnl;
>+ reply_len = ops->reply_size(req_info);
>+ if (ret < 0)
>+ goto err_cleanup;
>+ ret = -ENOMEM;
>+ rskb = ethnl_reply_init(reply_len, req_info->dev, ops->reply_cmd,
>+ ops->hdr_attr, info, &reply_payload);
>+ if (!rskb)
>+ goto err_cleanup;
>+ ret = ops->fill_reply(rskb, req_info);
>+ if (ret < 0)
>+ goto err_msg;
>+ rtnl_unlock();
>+
>+ genlmsg_end(rskb, reply_payload);
>+ if (req_info->dev)
>+ dev_put(req_info->dev);
>+ ethnl_free_get_data(ops, req_info);
>+ return genlmsg_reply(rskb, info);
>+
>+err_msg:
>+ WARN_ONCE(ret == -EMSGSIZE,
>+ "calculated message payload length (%d) not sufficient\n",
>+ reply_len);
>+ nlmsg_free(rskb);
>+err_cleanup:
>+ ethnl_free_get_data(ops, req_info);
>+err_rtnl:
>+ rtnl_unlock();
>+err_dev:
>+ if (req_info->dev)
>+ dev_put(req_info->dev);
>+ return ret;
>+}
[...]
Wed, Jul 03, 2019 at 03:44:57PM CEST, [email protected] wrote:
>On Wed, 2019-07-03 at 13:49 +0200, Jiri Pirko wrote:
>>
>> > +Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
>> > +to a multiple of 32 bits. They consist of 32-bit words in host byte order,
>>
>> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use a
>> nested array of NLA_BITFIELD32 instead?
>
>That would seem kind of awkward to use, IMHO.
>
>Perhaps better to make some kind of generic "arbitrary size bitfield"
>attribute type?
Yep, I believe I was trying to make this point during bitfield32
discussion, failed apparently. So if we have "NLA_BITFIELD" with
arbitrary size, that sounds good to me.
>
>Not really sure we want the complexity with _LIST and _SIZE, since you
>should always be able to express it as _VALUE and _MASK, right?
>
>Trying to think how we should express this best - bitfield32 is just a
>mask/value struct, for arbitrary size I guess we *could* just make it
>kind of a binary with arbitrary length that must be a multiple of 2
>bytes (or 2 u32-bit-words?) and then the first half is the value and the
>second half is the mask? Some more validation would be nicer, but having
>a generic attribute that actually is nested is awkward too.
>
>johannes
>
>
On Wed, Jul 03, 2019 at 03:44:52PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 01:50:19PM CEST, [email protected] wrote:
> >Introduce file net/ethtool/common.c for code shared by ioctl and netlink
> >ethtool interface. Move name tables of features, RSS hash functions,
> >tunables and PHY tunables into this file.
> >
> >Signed-off-by: Michal Kubecek <[email protected]>
> >---
> > net/ethtool/Makefile | 2 +-
> > net/ethtool/common.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
> > net/ethtool/common.h | 17 +++++++++
> > net/ethtool/ioctl.c | 83 ++-----------------------------------------
> > 4 files changed, 104 insertions(+), 82 deletions(-)
> > create mode 100644 net/ethtool/common.c
> > create mode 100644 net/ethtool/common.h
> >
> >diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
> >index 482fdb9380fa..11782306593b 100644
> >--- a/net/ethtool/Makefile
> >+++ b/net/ethtool/Makefile
> >@@ -1,6 +1,6 @@
> > # SPDX-License-Identifier: GPL-2.0
> >
> >-obj-y += ioctl.o
> >+obj-y += ioctl.o common.o
> >
> > obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
> >
> >diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> >new file mode 100644
> >index 000000000000..b0ce420e994e
> >--- /dev/null
> >+++ b/net/ethtool/common.c
> >@@ -0,0 +1,84 @@
> >+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> >+
> >+#include "common.h"
> >+
> >+const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
>
> const char *netdev_features_strings[NETDEV_FEATURE_COUNT] = {
> ?
>
> Same with the other arrays.
These are not new tables, this patch only moves existing tables from
ioctl.c (originally net/core/ethtool.c) into common.c so that they can
be used by both ioctl and netlink code.
This fixed size string array format is used by ETHTOOL_GSTRINGS ioctl
command. So if we switch these into simple const char *table[], we can
get rid of some complexity in strset.c and bitset.c (the "simple" vs.
"legacy" string set mess) but we would have to convert them into the
fixed size string array in ioctl ETHTOOL_GSTRINGS handler. And then we
would also have to convert (or rather "index") string sets retrieved
from NIC driver (e.g. private flags, stats, tests) - which also means an
extra kmalloc() (or rather kmalloc_array()).
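A minimal userspace sketch of that conversion, assuming a pointer table as
input. GSTRING_LEN stands in for ETH_GSTRING_LEN, names_to_fixed() is a
hypothetical helper and calloc() stands in for kmalloc_array(); this is
not code from the series.

```c
#include <stdlib.h>
#include <string.h>

#define GSTRING_LEN 32	/* stands in for ETH_GSTRING_LEN */

/*
 * Copy count names from a pointer table into a newly allocated,
 * zero-filled buffer of count fixed-size slots of GSTRING_LEN bytes.
 * Names that do not fit are truncated so that every slot stays
 * NUL-terminated. Returns NULL on allocation failure; the caller
 * frees the buffer.
 */
char *names_to_fixed(const char *const *names, size_t count)
{
	char *buf = calloc(count, GSTRING_LEN);
	size_t i;

	if (!buf)
		return NULL;
	for (i = 0; i < count; i++)
		strncpy(buf + i * GSTRING_LEN, names[i], GSTRING_LEN - 1);
	return buf;
}
```

This is the extra allocation and copy that every ETHTOOL_GSTRINGS call
would pay for if the tables were stored as plain pointer arrays.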
It's an option I'm certainly open to if we agree on it but it's not for
free.
Michal
On Wed, Jul 03, 2019 at 04:25:10PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 01:50:24PM CEST, [email protected] wrote:
>
> [...]
>
> >+/* generic ->doit() handler for GET type requests */
> >+static int ethnl_get_doit(struct sk_buff *skb, struct genl_info *info)
>
> It is very unfortunate for review to introduce a function in a patch and
> not use it. In general, this approach is frowned upon. You should use
> whatever you introduce in the same patch. I understand it is sometimes
> hard.
It's not as if I introduced something and didn't show how to use it.
First use is in the very next patch so if you insist on reading each
patch separately without context, just combine 09/15 and 10/15 together;
the overlap is minimal (10/15 adds an entry into get_requests[]
introduced in 09/15).
I could have done that myself but the resulting patch would add over
1000 lines (also something frowned upon in general) and if someone asked
if it could be split, the only honest answer I could give would be:
"Of course it should be split, it consists of two completely logically
separated parts (which are also 99% separated in code)."
> IIUC, you have one ethnl_get_doit for all possible commands, and you
Not all of them, only GET requests (and related notifications) and out
of them, only those which fit the common pattern. There will be e.g. Rx
rules and stats (maybe others) where dump request won't be iterating
through devices so that they will need at least their own dumpit
handler.
> have this ops to do cmd-specific tasks. That is quite unusual. Plus if
> you consider the complicated datastructures connected with this,
> I'm lost from the beginning :( Any particular reason for this indirection?
> I don't think any other generic netlink code does that (correct me if
> I'm wrong). The nice thing about generic netlink is the fact that
> you have separate handlers per cmd.
>
> I don't think you need these ops and indirections. For the common parts,
> just have a set of common helpers, as the other generic netlink users
> are doing. The code would be much easier to read and follow then.
As I said last time, what you suggest is going back to what I already
had in the early versions; so I have pretty good idea what the result
would look like.
I could go that way, having a separate main handler for each request
type and call common helpers from it. But as there would always be
a doit() handler, a dumpit() handler and mostly also a notification
handler, I would have to factor out the functions which are now
callbacks in struct get_request_ops anyway. To avoid too many
parameters, I would end up with structures very similar to what I have
now. (Not really "I would", the structures were already there, the only
difference was that the "request" and "data" parts were two structures
rather than one.)
So at the moment, I would have 5 functions looking almost the same as
ethnl_get_doit(), 5 functions looking almost like ethnl_get_dumpit() and
2 functions looking like ethnl_std_notify(), with the prospect of more
to be added. Any change in the logic would need to be repeated for all
of them. Moreover, you also proposed (or rather requested) to drop the
infomask concept and split the message types into multiple separate
ones. With that change, the number of near copies would be 21 doit(),
21 dumpit() and 13 notification handlers (for now, that is).
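To make the trade-off concrete, here is a heavily simplified userspace
sketch of the ops approach. struct get_ops, generic_get() and the
settings_*() callbacks are hypothetical and only mimic the shape of
struct get_request_ops and ethnl_get_doit(); they are not the code from
the patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct get_request_ops. */
struct get_ops {
	int (*prepare_data)(int *data);
	int (*reply_size)(const int *data);
	int (*fill_reply)(char *buf, int len, const int *data);
};

/*
 * One generic handler shared by every request type that fits the common
 * pattern: prepare data, size the reply, fill the reply. Returns the
 * number of bytes written to buf or a negative value on error.
 */
int generic_get(const struct get_ops *ops, char *buf, int buflen)
{
	int data = 0;
	int len, ret;

	ret = ops->prepare_data(&data);
	if (ret < 0)
		return ret;
	len = ops->reply_size(&data);
	if (len < 0)
		return len;
	if (len > buflen)
		return -1;
	ret = ops->fill_reply(buf, len, &data);
	if (ret < 0)
		return ret;
	return len;
}

/* Request-specific callbacks for a hypothetical "settings" request. */
int settings_prepare(int *data)
{
	*data = 42;
	return 0;
}

int settings_size(const int *data)
{
	(void)data;
	return 1;
}

int settings_fill(char *buf, int len, const int *data)
{
	if (len < 1)
		return -1;
	buf[0] = (char)*data;
	return 0;
}

const struct get_ops settings_ops = {
	.prepare_data	= settings_prepare,
	.reply_size	= settings_size,
	.fill_reply	= settings_fill,
};
```

Adding another request type in this scheme means adding three callbacks
and one ops structure; the sequencing and error handling in generic_get()
stay in one place instead of being repeated per handler.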
I'm also not happy about the way typical GET and SET request processing
looks now. But I would much rather go in the opposite direction: define
relationship between message attributes and data structure members so
that most of the size estimate, data prepare, message fill and data
update functions which are all repeating the same pattern could be
replaced by universal functions doing these actions according to the
description. The direction you suggest is the direction I came from.
Seriously, I don't know what to think. Anywhere I look, return code is
checked with "if (ret < 0)" (sure, some use "if (ret)" but it's
certainly not prevalent or universally preferred, more like 1:1), now
you tell me it's wrong. Networking stack is full of simple helpers and
wrappers, yet you keep telling me simple wrappers are wrong. Networking
stack is full of abstractions and ops, you tell me it's wrong. It's
really confusing...
Michal
On Wed, Jul 03, 2019 at 01:49:33PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 01:50:09PM CEST, [email protected] wrote:
> >diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
> >index 97c369aa290b..4636682c551f 100644
> >--- a/Documentation/networking/ethtool-netlink.txt
> >+++ b/Documentation/networking/ethtool-netlink.txt
> >@@ -73,6 +73,67 @@ set, the behaviour is the same as (or closer to) the behaviour before it was
> > introduced.
> >
> >
> >+Bit sets
> >+--------
> >+
> >+For short bitmaps of (reasonably) fixed length, standard NLA_BITFIELD32 type
> >+is used. For arbitrary length bitmaps, ethtool netlink uses a nested attribute
> >+with contents of one of two forms: compact (two binary bitmaps representing
> >+bit values and mask of affected bits) and bit-by-bit (list of bits identified
> >+by either index or name).
> >+
> >+Compact form: nested (bitset) attribute contents:
> >+
> >+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
> >+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
> >+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
> >+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
> >+
> >+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
> >+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
>
> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use a
> nested array of NLA_BITFIELD32 instead?
That would mean a layout like
4 bytes of attr header
4 bytes of value
4 bytes of mask
4 bytes of attr header
4 bytes of value
4 bytes of mask
...
i.e. interleaved headers, words of value and words of mask. Having value
and mask contiguous looks cleaner to me. Also, I can quickly check the
sizes without iterating through a (potentially long) array.
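To put rough numbers on the two layouts, here is a small sketch in C. It
counts only the 4-byte attribute headers and the 32-bit words and ignores
the enclosing nest header; all helper names are hypothetical.

```c
#include <stddef.h>

#define ATTR_HDR_LEN	4	/* netlink attribute header: u16 len + u16 type */
#define WORD_LEN	4	/* one 32-bit word */

/* Number of 32-bit words needed to hold nbits bits. */
size_t bitset_words(size_t nbits)
{
	return (nbits + 31) / 32;
}

/* Bytes used if every word is its own header + value + mask attribute. */
size_t interleaved_len(size_t nbits)
{
	return bitset_words(nbits) * (ATTR_HDR_LEN + 2 * WORD_LEN);
}

/* Bytes used if value and mask are two contiguous binary attributes. */
size_t contiguous_len(size_t nbits)
{
	return 2 * (ATTR_HDR_LEN + bitset_words(nbits) * WORD_LEN);
}

/*
 * Size check that needs no iteration: is one bitmap attribute payload
 * long enough for nbits bits rounded up to a multiple of 32?
 */
int bitmap_len_ok(size_t payload_len, size_t nbits)
{
	return payload_len >= bitset_words(nbits) * WORD_LEN;
}
```

Under these assumptions the contiguous layout has a fixed header cost of
two attributes, while the interleaved one adds a header per 32 bits.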
> >+words ordered from least significant to most significant (i.e. the same way as
> >+bitmaps are passed with ioctl interface).
> >+
> >+For compact form, ETHTOOL_A_BITSET_SIZE and ETHTOOL_A_BITSET_VALUE are
> >+mandatory. Similar to BITFIELD32, a compact form bit set requests to set bits
>
> Double space^^
Hm, I have to learn how to tell vim not to do that with "gq".
> >+in the mask to 1 (if the bit is set in value) or 0 (if not) and preserve the
> >+rest. If ETHTOOL_A_BITSET_LIST is present, there is no mask and bitset
> >+represents a simple list of bits.
>
> Okay, that is a bit confusing. Why not rename it to something like:
> ETHTOOL_A_BITSET_NO_MASK (flag)
> ?
From the logical point of view, it's used for lists - list of link
modes, list of netdev features, list of timestamping modes etc.
The point is that in userspace requests, we sometimes want to change
some values (enable A, disable B), sometimes to define the list of
values to be set (I want (only) A, C and E to be enabled). In kernel
replies, sometimes there is a natural value/mask pairing (e.g.
advertised and supported link modes, enabled and supported WoL modes)
but often there is just one bitmap.
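The two request semantics can be sketched like this. bitset_apply() is a
hypothetical userspace helper written only to show the difference; it is
not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a compact bitset request to the current bitmap cur[] of nwords
 * 32-bit words.
 *
 * With is_list set (the ETHTOOL_A_BITSET_LIST case) there is no mask and
 * value[] replaces the whole bitmap: listed bits are set, the rest is
 * cleared. Otherwise only bits set in mask[] are changed to the matching
 * bit of value[] and all other bits are preserved.
 */
void bitset_apply(uint32_t *cur, const uint32_t *value, const uint32_t *mask,
		  size_t nwords, int is_list)
{
	size_t i;

	for (i = 0; i < nwords; i++) {
		if (is_list)
			cur[i] = value[i];
		else
			cur[i] = (cur[i] & ~mask[i]) | (value[i] & mask[i]);
	}
}
```

For example, with bit 0 as A and bit 1 as B, a current bitmap of 0x6 and
a request with value 0x1 and mask 0x3 ("enable A, disable B") gives 0x5,
while a list request with value 0x15 ("only A, C and E") gives 0x15
regardless of the previous state.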
> >+Kernel bit set length may differ from userspace length if older application is
> >+used on newer kernel or vice versa. If userspace bitmap is longer, an error is
> >+issued only if the request actually tries to set values of some bits not
> >+recognized by kernel.
> >+
> >+Bit-by-bit form: nested (bitset) attribute contents:
> >+
> >+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
> >+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
> >+ ETHTOOL_A_BITSET_BIT (nested) array of bits
> >+ ETHTOOL_A_BITSET_BIT+ (nested) one bit
> >+ ETHTOOL_A_BIT_INDEX (u32) bit index (0 for LSB)
> >+ ETHTOOL_A_BIT_NAME (string) bit name
> >+ ETHTOOL_A_BIT_VALUE (flag) present if bit is set
> >+
> >+Bit size is optional for bit-by-bit form. ETHTOOL_A_BITSET_BITS nest can only
> >+contain ETHTOOL_A_BITS_BIT attributes but there can be an arbitrary number of
> >+them. A bit may be identified by its index or by its name. When used in
> >+requests, listed bits are set to 0 or 1 according to ETHTOOL_A_BIT_VALUE, the
> >+rest is preserved. A request fails if index exceeds kernel bit length or if
> >+name is not recognized.
> >+
> >+When ETHTOOL_A_BITSET_LIST flag is present, bitset is interpreted as a simple
> >+bit list. ETHTOOL_A_BIT_VALUE attributes are not used in such case. Bit list
> >+represents a bitmap with listed bits set and the rest zero.
> >+
> >+In requests, application can use either form. Form used by kernel in reply is
> >+determined by a flag in flags field of request header. Semantics of value and
> >+mask depends on the attribute. General idea is that flags control request
> >+processing, info_mask control which parts of the information are returned in
> >+"get" request and index identifies a particular subcommand or an object to
> >+which the request applies.
>
> This is quite complex and confusing. Having the same API for 2 APIs is
> odd. The API should be crystal clear, easy to use.
>
> Why can't you have 2 commands, one working with bit arrays only, one
> working with strings? Something like:
> X_GET
> ETHTOOL_A_BITS (nested)
> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
> X_NAMES_GET
> ETHTOOL_A_BIT_NAMES (nested)
> ETHTOOL_A_BIT_INDEX
> ETHTOOL_A_BIT_NAME
>
> For set, you can also have multiple cmds:
> X_SET - to set many at once, by bit index
> ETHTOOL_A_BITS (nested)
> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
> X_ONE_SET - to set one, by bit index
> ETHTOOL_A_BIT_INDEX
> ETHTOOL_A_BIT_VALUE
> X_ONE_SET - to set one, by name
> ETHTOOL_A_BIT_NAME
> ETHTOOL_A_BIT_VALUE
This looks as if you assume there is nothing except the bitset in the
message but that is not true. Even with your proposed breaking of
current groups, you would still have e.g. 4 bitsets in reply to netdev
features query, 3 in timestamping info GET request and often bitsets
combined with other data (e.g. WoL modes and optional WoL password).
If you wanted to further refine the message granularity to the level of
single parameters, we might be out of message type ids already.
Unless you want to forget about structured data completely and turn
everything into tunables - but that's a rather scary idea.
Michal
Wed, Jul 03, 2019 at 08:18:51PM CEST, [email protected] wrote:
>On Wed, Jul 03, 2019 at 01:49:33PM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 01:50:09PM CEST, [email protected] wrote:
>> >diff --git a/Documentation/networking/ethtool-netlink.txt b/Documentation/networking/ethtool-netlink.txt
>> >index 97c369aa290b..4636682c551f 100644
>> >--- a/Documentation/networking/ethtool-netlink.txt
>> >+++ b/Documentation/networking/ethtool-netlink.txt
>> >@@ -73,6 +73,67 @@ set, the behaviour is the same as (or closer to) the behaviour before it was
>> > introduced.
>> >
>> >
>> >+Bit sets
>> >+--------
>> >+
>> >+For short bitmaps of (reasonably) fixed length, standard NLA_BITFIELD32 type
>> >+is used. For arbitrary length bitmaps, ethtool netlink uses a nested attribute
>> >+with contents of one of two forms: compact (two binary bitmaps representing
>> >+bit values and mask of affected bits) and bit-by-bit (list of bits identified
>> >+by either index or name).
>> >+
>> >+Compact form: nested (bitset) attribute contents:
>> >+
>> >+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>> >+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>> >+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
>> >+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
>> >+
>> >+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
>> >+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
>>
>> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use a
>> nested array of NLA_BITFIELD32 instead?
>
>That would mean a layout like
>
> 4 bytes of attr header
> 4 bytes of value
> 4 bytes of mask
> 4 bytes of attr header
> 4 bytes of value
> 4 bytes of mask
> ...
>
>i.e. interleaved headers, words of value and words of mask. Having value
>and mask contiguous looks cleaner to me. Also, I can quickly check the
>sizes without iterating through a (potentially long) array.
Yeah, if you are not happy with this, I suggest to introduce
NLA_BITFIELD with arbitrary size. That would be probably cleanest.
>
>> >+words ordered from least significant to most significant (i.e. the same way as
>> >+bitmaps are passed with ioctl interface).
>> >+
>> >+For compact form, ETHTOOL_A_BITSET_SIZE and ETHTOOL_A_BITSET_VALUE are
>> >+mandatory. Similar to BITFIELD32, a compact form bit set requests to set bits
>>
>> Double space^^
>
>Hm, I have to learn how to tell vim not to do that with "gq".
>
>> >+in the mask to 1 (if the bit is set in value) or 0 (if not) and preserve the
>> >+rest. If ETHTOOL_A_BITSET_LIST is present, there is no mask and bitset
>> >+represents a simple list of bits.
>>
>> Okay, that is a bit confusing. Why not to rename to something like:
>> ETHTOOL_A_BITSET_NO_MASK (flag)
>> ?
>
>From the logical point of view, it's used for lists - list of link
>modes, list of netdev features, list of timestamping modes etc.
>
>The point is that in userspace requests, we sometimes want to change
>some values (enable A, disable B), sometimes to define the list of
>values to be set (I want (only) A, C and E to be enabled). In kernel
>replies, sometimes there is a natural value/mask pairing (e.g.
>advertised and supported link modes, enabled and supported WoL modes)
>but often there is just one bitmap.
>
>> >+Kernel bit set length may differ from userspace length if older application is
>> >+used on newer kernel or vice versa. If userspace bitmap is longer, an error is
>> >+issued only if the request actually tries to set values of some bits not
>> >+recognized by kernel.
>> >+
>> >+Bit-by-bit form: nested (bitset) attribute contents:
>> >+
>> >+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>> >+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>> >+ ETHTOOL_A_BITSET_BITS (nested) array of bits
>> >+ ETHTOOL_A_BITS_BIT+ (nested) one bit
>> >+ ETHTOOL_A_BIT_INDEX (u32) bit index (0 for LSB)
>> >+ ETHTOOL_A_BIT_NAME (string) bit name
>> >+ ETHTOOL_A_BIT_VALUE (flag) present if bit is set
>> >+
>> >+Bit size is optional for bit-by-bit form. ETHTOOL_A_BITSET_BITS nest can only
>> >+contain ETHTOOL_A_BITS_BIT attributes but there can be an arbitrary number of
>> >+them. A bit may be identified by its index or by its name. When used in
>> >+requests, listed bits are set to 0 or 1 according to ETHTOOL_A_BIT_VALUE, the
>> >+rest is preserved. A request fails if index exceeds kernel bit length or if
>> >+name is not recognized.
>> >+
>> >+When ETHTOOL_A_BITSET_LIST flag is present, bitset is interpreted as a simple
>> >+bit list. ETHTOOL_A_BIT_VALUE attributes are not used in such case. Bit list
>> >+represents a bitmap with listed bits set and the rest zero.
>> >+
>> >+In requests, application can use either form. Form used by kernel in reply is
>> >+determined by a flag in flags field of request header. Semantics of value and
>> >+mask depend on the attribute. General idea is that flags control request
>> >+processing, info_mask controls which parts of the information are returned in
>> >+"get" request and index identifies a particular subcommand or an object to
>> >+which the request applies.
>>
>> This is quite complex and confusing. Having the same API for 2 APIs is
>> odd. The API should be crystal clear, easy to use.
>>
>> Why can't you have 2 commands, one working with bit arrays only, one
>> working with strings? Something like:
>> X_GET
>> ETHTOOL_A_BITS (nested)
>> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
>> X_NAMES_GET
>> ETHTOOL_A_BIT_NAMES (nested)
>> ETHTOOL_A_BIT_INDEX
>> ETHTOOL_A_BIT_NAME
>>
>> For set, you can also have multiple cmds:
>> X_SET - to set many at once, by bit index
>> ETHTOOL_A_BITS (nested)
>> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
>> X_ONE_SET - to set one, by bit index
>> ETHTOOL_A_BIT_INDEX
>> ETHTOOL_A_BIT_VALUE
>> X_ONE_SET - to set one, by name
>> ETHTOOL_A_BIT_NAME
>> ETHTOOL_A_BIT_VALUE
>
>This looks as if you assume there is nothing except the bitset in the
>message but that is not true. Even with your proposed breaking of
>current groups, you would still have e.g. 4 bitsets in reply to netdev
>features query, 3 in timestamping info GET request and often bitsets
>combined with other data (e.g. WoL modes and optional WoL password).
>If you wanted to further refine the message granularity to the level of
>single parameters, we might be out of message type ids already.
You can still have multiple bitsets (bitfields) in a single message and
have a separate cmd/cmds to get the string-bit mapping. No need to mangle it.
>
>Unless you want to forget about structured data completely and turn
>everything into tunables - but that's rather scary idea.
>
>Michal
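For illustration only, the compact form semantics quoted above (set the bits selected by the mask to the corresponding value bits, preserve the rest, and fail only if the request selects a bit the kernel does not know) can be sketched in plain userspace C. The helper name bitset_apply_compact() and its signature are hypothetical and not taken from the patch; the kernel code may well be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): apply a compact-form bitset,
 * given as value/mask arrays of 32-bit words covering ubits bits, to a
 * destination bitmap which is kbits bits long.
 * Returns 0 on success, -1 if the mask selects a bit with index >= kbits.
 */
static int bitset_apply_compact(uint32_t *dst, unsigned int kbits,
				const uint32_t *value, const uint32_t *mask,
				unsigned int ubits)
{
	unsigned int kwords = (kbits + 31) / 32;
	unsigned int uwords = (ubits + 31) / 32;
	unsigned int i;

	assert(dst && value && mask);

	/* First pass: reject requests selecting bits beyond kbits. */
	for (i = 0; i < uwords; i++) {
		uint32_t known = 0;

		if (i < kbits / 32)
			known = UINT32_MAX;
		else if (i == kbits / 32 && kbits % 32)
			known = (UINT32_C(1) << (kbits % 32)) - 1;
		if (mask[i] & ~known)
			return -1;
	}

	/* Second pass: update selected bits, preserve all others. */
	for (i = 0; i < kwords && i < uwords; i++)
		dst[i] = (dst[i] & ~mask[i]) | (value[i] & mask[i]);

	return 0;
}
```

Validating before modifying means a rejected request leaves the destination untouched, which is the behaviour one would want from a SET request that fails.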
Wed, Jul 03, 2019 at 04:16:14PM CEST, [email protected] wrote:
>On Wed, Jul 03, 2019 at 03:33:52PM +0200, Jiri Pirko wrote:
>> >+/* notifications */
>> >+
>> >+typedef void (*ethnl_notify_handler_t)(struct net_device *dev,
>> >+ struct netlink_ext_ack *extack,
>> >+ unsigned int cmd, u32 req_mask,
>> >+ const void *data);
>> >+
>> >+static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
>> >+};
>> >+
>> >+void ethtool_notify(struct net_device *dev, struct netlink_ext_ack *extack,
>> >+ unsigned int cmd, u32 req_mask, const void *data)
>>
>> What's "req_mask" ?
>
>It's infomask to interpret the same way as if it came from request
>header (the notification triggered by a SET request or its ioctl
>equivalent uses the same format as corresponding GET_REPLY message and
>is created by the same code). But it could be called infomask, I have no
>strong opinion about that.
The name should be the same throughout the code so the reader can track it.
>
>> >+{
>> >+ if (unlikely(!ethnl_ok))
>> >+ return;
>> >+ ASSERT_RTNL();
>> >+
>> >+ if (likely(cmd < ARRAY_SIZE(ethnl_notify_handlers) &&
>> >+ ethnl_notify_handlers[cmd]))
>>
>> How could it be null?
>
>Notification message types share the enum with other kernel messages:
>
>/* message types - kernel to userspace */
>enum {
> ETHTOOL_MSG_KERNEL_NONE,
> ETHTOOL_MSG_STRSET_GET_REPLY,
> ETHTOOL_MSG_SETTINGS_GET_REPLY,
> ETHTOOL_MSG_SETTINGS_NTF,
> ETHTOOL_MSG_SETTINGS_SET_REPLY,
> ETHTOOL_MSG_INFO_GET_REPLY,
> ETHTOOL_MSG_PARAMS_GET_REPLY,
> ETHTOOL_MSG_PARAMS_NTF,
> ETHTOOL_MSG_NWAYRST_NTF,
> ETHTOOL_MSG_PHYSID_NTF,
> ETHTOOL_MSG_RESET_NTF,
> ETHTOOL_MSG_RESET_ACT_REPLY,
> ETHTOOL_MSG_RXFLOW_GET_REPLY,
> ETHTOOL_MSG_RXFLOW_NTF,
> ETHTOOL_MSG_RXFLOW_SET_REPLY,
>
> /* add new constants above here */
> __ETHTOOL_MSG_KERNEL_CNT,
> ETHTOOL_MSG_KERNEL_MAX = (__ETHTOOL_MSG_KERNEL_CNT - 1)
>};
>
>Only entries for *_NTF types are non-null in ethnl_notify_handlers[]:
>
>static const ethnl_notify_handler_t ethnl_notify_handlers[] = {
> [ETHTOOL_MSG_SETTINGS_NTF] = ethnl_std_notify,
> [ETHTOOL_MSG_PARAMS_NTF] = ethnl_std_notify,
> [ETHTOOL_MSG_NWAYRST_NTF] = ethnl_nwayrst_notify,
> [ETHTOOL_MSG_PHYSID_NTF] = ethnl_physid_notify,
> [ETHTOOL_MSG_RESET_NTF] = ethnl_reset_notify,
> [ETHTOOL_MSG_RXFLOW_NTF] = ethnl_rxflow_notify,
>};
>
>If the check above fails, it means that kernel code tried to send
>a notification with type which does not exist or is not a notification,
>i.e. a bug in kernel; that's why the WARN_ONCE.
Got it, thanks!
>
>Michal
>
>> >+ ethnl_notify_handlers[cmd](dev, extack, cmd, req_mask, data);
>> >+ else
>> >+ WARN_ONCE(1, "notification %u not implemented (dev=%s, req_mask=0x%x)\n",
>> >+ cmd, netdev_name(dev), req_mask);
>> >+}
>> >+EXPORT_SYMBOL(ethtool_notify);
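To make the NULL check discussed above concrete, here is a small userspace sketch of a sparse handler table built with designated initializers. The enum values, handler and function names are made up for the example and only mimic the pattern; they are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative message types; replies and notifications share one enum. */
enum {
	MSG_NONE,
	MSG_SETTINGS_GET_REPLY,
	MSG_SETTINGS_NTF,
	MSG_PARAMS_GET_REPLY,
	MSG_PARAMS_NTF,
	MSG_CNT
};

typedef int (*notify_handler_t)(unsigned int cmd);

/* Stand-in for a real handler: just reports which type it handled. */
static int std_notify(unsigned int cmd)
{
	return (int)cmd;
}

/* Only *_NTF slots are filled; all other slots are implicitly NULL. */
static const notify_handler_t notify_handlers[] = {
	[MSG_SETTINGS_NTF] = std_notify,
	[MSG_PARAMS_NTF] = std_notify,
};

/* Returns the handler result, or -1 where the kernel would WARN_ONCE(). */
static int notify(unsigned int cmd)
{
	assert(ARRAY_SIZE(notify_handlers) <= MSG_CNT);

	if (cmd < ARRAY_SIZE(notify_handlers) && notify_handlers[cmd])
		return notify_handlers[cmd](cmd);
	return -1;
}
```

The array is only as long as the highest initialized index plus one, so both the bounds check and the NULL check are needed: the first catches types past the last notification, the second catches reply types in between.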
Wed, Jul 03, 2019 at 04:37:22PM CEST, [email protected] wrote:
>On Wed, Jul 03, 2019 at 03:44:52PM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 01:50:19PM CEST, [email protected] wrote:
>> >Introduce file net/ethtool/common.c for code shared by ioctl and netlink
>> >ethtool interface. Move name tables of features, RSS hash functions,
>> >tunables and PHY tunables into this file.
>> >
>> >Signed-off-by: Michal Kubecek <[email protected]>
>> >---
>> > net/ethtool/Makefile | 2 +-
>> > net/ethtool/common.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
>> > net/ethtool/common.h | 17 +++++++++
>> > net/ethtool/ioctl.c | 83 ++-----------------------------------------
>> > 4 files changed, 104 insertions(+), 82 deletions(-)
>> > create mode 100644 net/ethtool/common.c
>> > create mode 100644 net/ethtool/common.h
>> >
>> >diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
>> >index 482fdb9380fa..11782306593b 100644
>> >--- a/net/ethtool/Makefile
>> >+++ b/net/ethtool/Makefile
>> >@@ -1,6 +1,6 @@
>> > # SPDX-License-Identifier: GPL-2.0
>> >
>> >-obj-y += ioctl.o
>> >+obj-y += ioctl.o common.o
>> >
>> > obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
>> >
>> >diff --git a/net/ethtool/common.c b/net/ethtool/common.c
>> >new file mode 100644
>> >index 000000000000..b0ce420e994e
>> >--- /dev/null
>> >+++ b/net/ethtool/common.c
>> >@@ -0,0 +1,84 @@
>> >+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>> >+
>> >+#include "common.h"
>> >+
>> >+const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
>>
>> const char *netdev_features_strings[NETDEV_FEATURE_COUNT] = {
>> ?
>>
>> Same with the other arrays.
>
>These are not new tables, this patch only moves existing tables from
>ioctl.c (originally net/core/ethtool.c) into common.c so that they can
>be used by both ioctl and netlink code.
>
>This fixed size string array format is used by ETHTOOL_GSTRINGS ioctl
>command. So if we switch these into simple const char *table[], we can
>get rid of some complexity in strset.c and bitset.c (the "simple" vs.
>"legacy" string set mess) but we would have to convert them into the
>fixed size string array in ioctl ETHTOOL_GSTRINGS handler. And then we
>would also have to convert (or rather "index") string sets retrieved
>from NIC driver (e.g. private flags, stats, tests) - which also means an
>extra kmalloc() (or rather kmalloc_array()).
>
>It's an option I'm certainly open to if we agree on it but it's not for
>free.
Got it. I don't think we need to do this now. But it would certainly be
nice to fix this later on.
>
>Michal
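The conversion cost mentioned above can be pictured with a userspace sketch: if the tables were plain "const char *" arrays, the ETHTOOL_GSTRINGS path would have to copy them into fixed size slots. The function name strings_to_fixed() is hypothetical, GSTRING_LEN stands in for ETH_GSTRING_LEN (32), and always NUL-terminating the slot is an assumption of this sketch rather than something the ioctl interface guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GSTRING_LEN 32	/* stand-in for ETH_GSTRING_LEN */

/* Hypothetical helper: copy a table of C strings into an array of
 * fixed size slots as used by the legacy string set format. Names longer
 * than GSTRING_LEN - 1 characters are truncated; unused bytes are zero.
 */
static void strings_to_fixed(char (*dst)[GSTRING_LEN],
			     const char *const *src, size_t count)
{
	size_t i;

	assert(dst && src);
	for (i = 0; i < count; i++) {
		memset(dst[i], 0, GSTRING_LEN);
		strncpy(dst[i], src[i], GSTRING_LEN - 1);
	}
}
```

In the kernel, the destination buffer is where the extra kmalloc_array() would come in; here the caller is simply expected to provide it.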
Tue, Jul 02, 2019 at 01:50:29PM CEST, [email protected] wrote:
[...]
>@@ -87,6 +89,64 @@ enum {
> ETHTOOL_A_BITSET_MAX = (__ETHTOOL_A_BITSET_CNT - 1)
You don't need "()". Same for the others below.
> };
>
>+/* string sets */
>+
>+enum {
>+ ETHTOOL_A_STRING_UNSPEC,
>+ ETHTOOL_A_STRING_INDEX, /* u32 */
>+ ETHTOOL_A_STRING_VALUE, /* string */
>+
>+ /* add new constants above here */
>+ __ETHTOOL_A_STRING_CNT,
>+ ETHTOOL_A_STRING_MAX = (__ETHTOOL_A_STRING_CNT - 1)
>+};
>+
>+enum {
>+ ETHTOOL_A_STRINGS_UNSPEC,
>+ ETHTOOL_A_STRINGS_STRING, /* nest - _A_STRINGS_* */
>+
>+ /* add new constants above here */
>+ __ETHTOOL_A_STRINGS_CNT,
>+ ETHTOOL_A_STRINGS_MAX = (__ETHTOOL_A_STRINGS_CNT - 1)
>+};
>+
>+enum {
>+ ETHTOOL_A_STRINGSET_UNSPEC,
>+ ETHTOOL_A_STRINGSET_ID, /* u32 */
>+ ETHTOOL_A_STRINGSET_COUNT, /* u32 */
>+ ETHTOOL_A_STRINGSET_STRINGS, /* nest - _A_STRINGS_* */
>+
>+ /* add new constants above here */
>+ __ETHTOOL_A_STRINGSET_CNT,
>+ ETHTOOL_A_STRINGSET_MAX = (__ETHTOOL_A_STRINGSET_CNT - 1)
>+};
>+
>+/* STRSET */
>+
>+enum {
>+ ETHTOOL_A_STRSET_UNSPEC,
>+ ETHTOOL_A_STRSET_HEADER, /* nest - _A_HEADER_* */
>+ ETHTOOL_A_STRSET_STRINGSETS, /* nest - _A_STRINGSETS_* */
>+
>+ /* add new constants above here */
>+ __ETHTOOL_A_STRSET_CNT,
>+ ETHTOOL_A_STRSET_MAX = (__ETHTOOL_A_STRSET_CNT - 1)
>+};
>+
>+enum {
>+ ETHTOOL_A_STRINGSETS_UNSPEC,
>+ ETHTOOL_A_STRINGSETS_STRINGSET, /* nest - _A_STRINGSET_* */
>+
>+ /* add new constants above here */
>+ __ETHTOOL_A_STRINGSETS_CNT,
>+ ETHTOOL_A_STRINGSETS_MAX = (__ETHTOOL_A_STRINGSETS_CNT - 1)
>+};
>+
[...]
>+ nla_for_each_nested(attr, nest, rem) {
>+ u32 id;
>+
>+ if (WARN_ONCE(nla_type(attr) != ETHTOOL_A_STRINGSETS_STRINGSET,
>+ "unexpected attrtype %u in ETHTOOL_A_STRSET_STRINGSETS\n",
>+ nla_type(attr)))
>+ return -EINVAL;
>+
>+ ret = strset_get_id(attr, &id, extack);
>+ if (ret < 0)
>+ return ret;
>+ if (id >= ETH_SS_COUNT) {
>+ NL_SET_ERR_MSG_ATTR(extack, attr,
>+ "unknown string set id");
>+ return -EOPNOTSUPP;
>+ }
>+
>+ data->req_ids |= (1U << id);
You don't need "()" here either.
[...]
Wed, Jul 03, 2019 at 07:53:39PM CEST, [email protected] wrote:
>On Wed, Jul 03, 2019 at 04:25:10PM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 01:50:24PM CEST, [email protected] wrote:
>>
>> [...]
>>
>> >+/* generic ->doit() handler for GET type requests */
>> >+static int ethnl_get_doit(struct sk_buff *skb, struct genl_info *info)
>>
>> It is very unfortunate for review to introduce a function in a patch and
>> not use it. In general, this approach is frowned upon. You should use
>> whatever you introduce in the same patch. I understand it is sometimes
>> hard.
>
>It's not as if I introduced something and didn't show how to use it.
>First use is in the very next patch so if you insist on reading each
>patch separately without context, just combine 09/15 and 10/15 together;
>the overlap is minimal (10/15 adds an entry into get_requests[]
>introduced in 09/15).
>
>I could have done that myself but the resulting patch would add over
>1000 lines (also something frowned upon in general) and if someone asked
>if it could be split, the only honest answer I could give would be:
>"Of course it should be split, it consists of two completely logically
>separated parts (which are also 99% separated in code)."
>
>> IIUC, you have one ethnl_get_doit for all possible commands, and you
>
>Not all of them, only GET requests (and related notifications) and out
>of them, only those which fit the common pattern. There will be e.g. Rx
>rules and stats (maybe others) where dump request won't be iterating
>through devices so that they will need at least their own dumpit
>handler.
>
>> have this ops to do cmd-specific tasks. That is quite unusual. Plus if
>> you consider the complicated datastructures connected with this,
>> I'm lost from the beginning :( Any particular reason for this indirection?
>> I don't think any other generic netlink code does that (correct me if
>> I'm wrong). The nice thing about generic netlink is the fact that
>> you have separate handlers per cmd.
>>
>> I don't think you need these ops and indirections. For the common parts,
>> just have a set of common helpers, as the other generic netlink users
>> are doing. The code would be much easier to read and follow then.
>
>As I said last time, what you suggest is going back to what I already
>had in the early versions; so I have pretty good idea what the result
>would look like.
>
>I could go that way, having a separate main handler for each request
>type and call common helpers from it. But as there would always be
>a doit() handler, a dumpit() handler and mostly also a notification
>handler, I would have to factor out the functions which are now
>callbacks in struct get_request_ops anyway. To avoid too many
>parameters, I would end up with structures very similar to what I have
>now. (Not really "I would", the structures were already there, the only
>difference was that the "request" and "data" parts were two structures
>rather than one.)
>
>So at the moment, I would have 5 functions looking almost the same as
>ethnl_get_doit(), 5 functions looking almost as ethnl_get_dumpit() and
>2 functions looking like ethnl_std_notify(), with the prospect of more
>to be added. Any change in the logic would need to be repeated for all
>of them. Moreover, you also proposed (or rather requested) to drop the
>infomask concept and split the message types into multiple separate
>ones. With that change, the number of almost copies would be 21 doit(),
>21 dumpit() and 13 notification handlers (for now, that is).
I understand. It's a tradeoff. The code as you introduce it is hard for
me to follow, so I thought that the other way would help readability.
Also it seems to me that you replicate a lot of the generic netlink API
(per-cmd doit/dumpit ops and privileged/GENL_ADMIN_PERM) in your code.
It seems more natural to use the API as others are doing.
>
>I'm also not happy about the way typical GET and SET request processing
>looks now. But I would much rather go in the opposite direction: define
>relationship between message attributes and data structure members so
>that most of the size estimate, data prepare, message fill and data
>update functions which are all repeating the same pattern could be
>replaced by universal functions doing these actions according to the
>description. The direction you suggest is the direction I came from.
>
>Seriously, I don't know what to think. Anywhere I look, return code is
>checked with "if (ret < 0)" (sure, some use "if (ret)" but it's
>certainly not prevalent or universally preferred, more like 1:1), now
>you tell me it's wrong. Networking stack is full of simple helpers and
>wrappers, yet you keep telling me simple wrappers are wrong. Networking
>stack is full of abstractions and ops, you tell me it's wrong. It's
>really confusing...
It is all just a matter of readability I believe.
For example when I see "if (ret < 0) goto err" I assume that there
might be positive non-error value returned. There are many places where
the code is not in optimal shape. But for new code, I believe we have to
be careful.
Simple helpers are fine as long as they don't hide simple things
under the hood. A typical example is "myown_lock() myown_unlock()" which
just call mutex_lock/unlock. Another nice example is a macro putting
netlink attributes with a goto nla_failure inside - this was removed
a couple of years ago. The code still has many things like this. Again, for
new code, I believe we have to be careful.
Tue, Jul 02, 2019 at 01:50:24PM CEST, [email protected] wrote:
[...]
>+/* The structure holding data for unified processing GET requests consists of
>+ * two parts: request info and reply data. Request info is related to client
>+ * request and for dump request it stays constant through all processing;
>+ * reply data contains data for composing a reply message. When processing
>+ * a dump request, request info is filled only once but reply data is filled
>+ * from scratch for each reply message.
>+ *
>+ * +-----------------+-----------------+------------------+-----------------+
>+ * | common_req_info | specific info | ethnl_reply_data | specific data |
>+ * +-----------------+-----------------+------------------+-----------------+
>+ * |<---------- request info --------->|<----------- reply data ----------->|
>+ *
>+ * Request info always starts at offset 0 with struct ethnl_req_info which
>+ * holds information from parsing the common header. It may be followed by
>+ * other members for request attributes specific for current message type.
>+ * Reply data starts with struct ethnl_reply_data which may be followed by
>+ * other members holding data needed to compose a message.
>+ */
>+
[...]
>+/**
>+ * struct get_request_ops - unified handling of GET requests
>+ * @request_cmd: command id for request (GET)
>+ * @reply_cmd: command id for reply (GET_REPLY)
>+ * @hdr_attr: attribute type for request header
>+ * @max_attr: maximum (top level) attribute type
>+ * @data_size: total length of data structure
>+ * @repdata_offset: offset of "reply data" part (struct ethnl_reply_data)
For example, this looks quite scary to me. You have one big chunk of
data (according to the scheme above) specific for cmd with reply starting
at arbitrary offset.
>+ * @request_policy: netlink policy for message contents
>+ * @header_policy: (optional) netlink policy for request header
>+ * @default_infomask: default infomask (to use if none specified)
>+ * @all_reqflags: allowed request specific flags
>+ * @allow_nodev_do: allow non-dump request with no device identification
>+ * @parse_request:
>+ * Parse request except common header (struct ethnl_req_info). Common
>+ * header is already filled on entry, the rest up to @repdata_offset
>+ * is zero initialized. This callback should only modify type specific
>+ * request info by parsed attributes from request message.
>+ * @prepare_data:
>+ * Retrieve and prepare data needed to compose a reply message. Calls to
>+ * ethtool_ops handlers should be limited to this callback. Common reply
>+ * data (struct ethnl_reply_data) is filled on entry, type specific part
>+ * after it is zero initialized. This callback should only modify the
>+ * type specific part of reply data. Device identification from struct
>+ * ethnl_reply_data is to be used, as for dump requests it iterates
>+ * through network devices while common_req_info::dev points to the
>+ * device from client request.
>+ * @reply_size:
>+ * Estimate reply message size. Returned value must be sufficient for
>+ * message payload without common reply header. The callback may return
>+ * an estimate higher than actual message size if exact calculation would
>+ * not be worth the saved memory space.
>+ * @fill_reply:
>+ * Fill reply message payload (except for common header) from reply data.
>+ * The callback must not generate more payload than previously called
>+ * ->reply_size() estimated.
>+ * @cleanup:
>+ * Optional cleanup called when reply data is no longer needed. Can be
>+ * used e.g. to free any additional data structures outside the main
>+ * structure which were allocated by ->prepare_data(). When processing
>+ * dump requests, ->cleanup() is called for each message.
>+ *
>+ * Description of variable parts of GET request handling when using the unified
>+ * infrastructure. When used, a pointer to an instance of this structure is to
>+ * be added to &get_requests array and generic handlers ethnl_get_doit(),
>+ * ethnl_get_dumpit(), ethnl_get_start() and ethnl_get_done() used in
>+ * @ethnl_genl_ops
>+ */
>+struct get_request_ops {
>+ u8 request_cmd;
>+ u8 reply_cmd;
>+ u16 hdr_attr;
>+ unsigned int max_attr;
>+ unsigned int data_size;
>+ unsigned int repdata_offset;
>+ const struct nla_policy *request_policy;
>+ const struct nla_policy *header_policy;
>+ u32 default_infomask;
>+ u32 all_reqflags;
>+ bool allow_nodev_do;
>+
>+ int (*parse_request)(struct ethnl_req_info *req_info,
>+ struct nlattr **tb,
>+ struct netlink_ext_ack *extack);
>+ int (*prepare_data)(struct ethnl_req_info *req_info,
>+ struct genl_info *info);
>+ int (*reply_size)(const struct ethnl_req_info *req_info);
>+ int (*fill_reply)(struct sk_buff *skb,
>+ const struct ethnl_req_info *req_info);
>+ void (*cleanup)(struct ethnl_req_info *req_info);
>+};
>+
> #endif /* _NET_ETHTOOL_NETLINK_H */
>--
>2.22.0
>
On Thu, Jul 04, 2019 at 10:49:13AM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 01:50:24PM CEST, [email protected] wrote:
>
> [...]
>
>
> >+/* The structure holding data for unified processing GET requests consists of
> >+ * two parts: request info and reply data. Request info is related to client
> >+ * request and for dump request it stays constant through all processing;
> >+ * reply data contains data for composing a reply message. When processing
> >+ * a dump request, request info is filled only once but reply data is filled
> >+ * from scratch for each reply message.
> >+ *
> >+ * +-----------------+-----------------+------------------+-----------------+
> >+ * | common_req_info | specific info | ethnl_reply_data | specific data |
> >+ * +-----------------+-----------------+------------------+-----------------+
> >+ * |<---------- request info --------->|<----------- reply data ----------->|
> >+ *
> >+ * Request info always starts at offset 0 with struct ethnl_req_info which
> >+ * holds information from parsing the common header. It may be followed by
> >+ * other members for request attributes specific for current message type.
> >+ * Reply data starts with struct ethnl_reply_data which may be followed by
> >+ * other members holding data needed to compose a message.
> >+ */
> >+
>
> [...]
>
>
> >+/**
> >+ * struct get_request_ops - unified handling of GET requests
> >+ * @request_cmd: command id for request (GET)
> >+ * @reply_cmd: command id for reply (GET_REPLY)
> >+ * @hdr_attr: attribute type for request header
> >+ * @max_attr: maximum (top level) attribute type
> >+ * @data_size: total length of data structure
> >+ * @repdata_offset: offset of "reply data" part (struct ethnl_reply_data)
>
> For example, this looks quite scary to me. You have one big chunk of
> data (according to the scheme above) specific for cmd with reply starting
> at arbitrary offset.
We can split it into two structures, one for request related data with
struct ethnl_req_info embedded at offset 0 and one for reply related
data with struct ethnl_reply_data embedded at offset 0. It would then
probably be more convenient to have a pointer to the request info from
the reply data. The code would get a bit simpler in a few places at the
expense of an extra kmalloc().
Michal
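A rough userspace sketch of the two layouts, to show what the trade-off looks like. All structure and function names below are simplified stand-ins invented for the example (not the structures from the patch), and calloc() plays the role of the extra kmalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the common request and reply parts. */
struct req_info {
	int dev_index;
	unsigned int req_mask;
};

struct reply_data {
	const struct req_info *req;	/* used by the split variant only */
	int dev_index;
};

/* Variant 1: one chunk, reply part found through a stored offset. */
struct settings_chunk {
	struct req_info reqinfo_base;	/* always at offset 0 */
	unsigned int req_specific;
	struct reply_data repdata_base;	/* at "repdata_offset" */
	unsigned int reply_specific;
};

static struct reply_data *chunk_reply(struct req_info *req,
				      size_t repdata_offset)
{
	assert(req);
	return (struct reply_data *)((char *)req + repdata_offset);
}

/* Variant 2: two structures, each with its common part at offset 0. */
struct settings_req {
	struct req_info base;
	unsigned int req_specific;
};

struct settings_reply {
	struct reply_data base;
	unsigned int reply_specific;
};

/* Allocate zeroed reply data and link it back to its request. */
static struct reply_data *split_reply_alloc(const struct req_info *req,
					    size_t reply_size)
{
	struct reply_data *rep;

	assert(req && reply_size >= sizeof(*rep));
	rep = calloc(1, reply_size);
	if (rep)
		rep->req = req;
	return rep;
}
```

In the split variant no offset arithmetic is needed and a dump can reuse one request structure while allocating fresh reply data per message, at the price of the second allocation.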
On Thu, Jul 04, 2019 at 10:04:35AM +0200, Jiri Pirko wrote:
> Wed, Jul 03, 2019 at 08:18:51PM CEST, [email protected] wrote:
> >On Wed, Jul 03, 2019 at 01:49:33PM +0200, Jiri Pirko wrote:
> >> Tue, Jul 02, 2019 at 01:50:09PM CEST, [email protected] wrote:
> >> >+Compact form: nested (bitset) atrribute contents:
> >> >+
> >> >+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
> >> >+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
> >> >+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
> >> >+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
> >> >+
> >> >+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
> >> >+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
> >>
> >> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you user
> >> nested array of NLA_BITFIELD32 instead?
> >
> >That would mean a layout like
> >
> > 4 bytes of attr header
> > 4 bytes of value
> > 4 bytes of mask
> > 4 bytes of attr header
> > 4 bytes of value
> > 4 bytes of mask
> > ...
> >
> >i.e. interleaved headers, words of value and words of mask. Having value
> >and mask contiguous looks cleaner to me. Also, I can quickly check the
> >sizes without iterating through a (potentially long) array.
>
> Yeah, if you are not happy with this, I suggest introducing
> NLA_BITFIELD with arbitrary size. That would probably be the cleanest.
There is still the question of whether it should be implemented as a nested
attribute which could look like the current compact form without the
"list" flag (if there is no mask, it's a list), or as an unstructured data
block consisting of a u32 bit length and one or two bitmaps of
corresponding length. I would prefer the nested attribute: netlink was
designed to represent structured data, and passing structures as binary goes
against the design (I just looked at VFINFO in rtnetlink a few days ago,
it's awful, IMHO).
Either way, I would still prefer to have bitmaps represented as an array
of 32-bit blocks in host byte order. This would be easy to handle in
kernel both in places where we have u32 based bitmaps and unsigned long
based ones. Other options seem less appealing:
- u8 based: only complicates processing
- u64 based: have to care about alignment
- unsigned long based: alignment and also problems with 64-bit kernel
vs. 32-bit userspace
> >> This is quite complex and confusing. Having the same API for 2 APIs is
> >> odd. The API should be crystal clear, easy to use.
> >>
> >> Why can't you have 2 commands, one working with bit arrays only, one
> >> working with strings? Something like:
> >> X_GET
> >> ETHTOOL_A_BITS (nested)
> >> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
> >> X_NAMES_GET
> >> ETHTOOL_A_BIT_NAMES (nested)
> >> ETHTOOL_A_BIT_INDEX
> >> ETHTOOL_A_BIT_NAME
> >>
> >> For set, you can also have multiple cmds:
> >> X_SET - to set many at once, by bit index
> >> ETHTOOL_A_BITS (nested)
> >> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
> >> X_ONE_SET - to set one, by bit index
> >> ETHTOOL_A_BIT_INDEX
> >> ETHTOOL_A_BIT_VALUE
> >> X_ONE_SET - to set one, by name
> >> ETHTOOL_A_BIT_NAME
> >> ETHTOOL_A_BIT_VALUE
> >
> >This looks as if you assume there is nothing except the bitset in the
> >message but that is not true. Even with your proposed breaking of
> >current groups, you would still have e.g. 4 bitsets in reply to netdev
> >features query, 3 in timestamping info GET request and often bitsets
> >combined with other data (e.g. WoL modes and optional WoL password).
> >If you wanted to further refine the message granularity to the level of
> >single parameters, we might be out of message type ids already.
>
> You can still have multiple bitsets(bitfields) in single message and
> have separate cmd/cmds to get string-bit mapping. No need to mangle it.
Let's take a look at what it means in practice. The command is
ethtool --set-priv-flags eth3 legacy-rx on
on an ixgbe card. Currently, ethtool (from the github repository) does
------------------------------------------------------------------------
ETHTOOL_CMD_SETTINGS_SET (U->K, 68 bytes)
ETHTOOL_A_HEADER
ETHTOOL_A_DEV_NAME = "eth3"
ETHTOOL_A_SETTINGS_PRIV_FLAGS
ETHTOOL_A_BITSET_BITS
ETHTOOL_A_BITS_BIT
ETHTOOL_A_BIT_NAME = "legacy-rx"
ETHTOOL_A_BIT_VALUE
NLMSG_ERR (K->U, 36 bytes) err = 0
------------------------------------------------------------------------
If we had only compact form (or some of the NLA_BITFIELD solutions we
are talking about), you would need
------------------------------------------------------------------------
ETHTOOL_CMD_STRSET_GET (U->K, 52 bytes)
ETHTOOL_A_HEADER
ETHTOOL_A_DEV_NAME = "eth3"
ETHTOOL_A_STRSET_STRINGSETS
ETHTOOL_A_STRINGSETS_STRINGSET
ETHTOOL_A_STRINGSET_ID = 2 (ETH_SS_PRIV_FLAGS)
ETHTOOL_CMD_STRSET_GET_REPLY (K->U, 128 bytes)
ETHTOOL_A_HEADER
ETHTOOL_A_DEV_INDEX = 9
ETHTOOL_A_DEV_NAME = "eth3"
ETHTOOL_A_STRSET_STRINGSETS
ETHTOOL_A_STRINGSETS_STRINGSET
ETHTOOL_A_STRINGSET_ID = 2 (ETH_SS_PRIV_FLAGS)
ETHTOOL_A_STRINGSET_COUNT = 2
ETHTOOL_A_STRINGSET_STRINGS
ETHTOOL_A_STRINGS_STRING
ETHTOOL_A_STRING_INDEX = 0
ETHTOOL_A_STRING_VALUE = "legacy-rx"
ETHTOOL_A_STRINGS_STRING
ETHTOOL_A_STRING_INDEX = 1
ETHTOOL_A_STRING_VALUE = "vf-ipsec"
NLMSG_ERR (K->U, 36 bytes) err = 0
ETHTOOL_CMD_SETTINGS_SET (U->K, 64 bytes)
ETHTOOL_A_HEADER
ETHTOOL_A_DEV_NAME = "eth3"
ETHTOOL_A_SETTINGS_PRIV_FLAGS
ETHTOOL_A_BITSET_SIZE = 2
ETHTOOL_A_BITSET_VALUE = 00000001
ETHTOOL_A_BITSET_MASK = 00000001
NLMSG_ERR (K->U, 36 bytes) err = 0
------------------------------------------------------------------------
That's an extra roundtrip, a lot more chatter, and the SETTINGS_SET message is
only 4 bytes shorter in the end. And we can consider ourselves lucky
this NIC has only two private flags. Or that we didn't need to enable or
disable a netdev feature (56 bits) or link mode (69 bits and growing).
We could reduce the overhead by allowing STRSET_GET query to only ask
for specific string(s) but there would still be the extra roundtrip
which I dislike in the ioctl interface. Florian also said in the v5
discussion that he would like it to be possible to get names and data
together in one request.
Michal
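What userspace has to do in the compact-only scenario, once it has the name table from STRSET_GET_REPLY, is roughly the following. This is a sketch under the assumption that the names arrive as a simple array indexed by bit number; bitset_set_by_name() is a hypothetical helper, not an existing ethtool function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: look up a flag name in a name table retrieved
 * earlier and record the requested state in a compact value/mask pair.
 * Returns the bit index, or -1 if the name is not in the table.
 */
static int bitset_set_by_name(const char *const *names, unsigned int count,
			      const char *name, int on,
			      uint32_t *value, uint32_t *mask)
{
	unsigned int i;

	assert(names && name && value && mask);
	for (i = 0; i < count; i++) {
		if (strcmp(names[i], name) == 0) {
			uint32_t bit = UINT32_C(1) << (i % 32);

			mask[i / 32] |= bit;	/* bit is affected */
			if (on)
				value[i / 32] |= bit;
			else
				value[i / 32] &= ~bit;
			return (int)i;
		}
	}
	return -1;
}
```

With the two ixgbe names from the dump above, "legacy-rx on" yields value 00000001 and mask 00000001, matching the SETTINGS_SET message shown. With the bit-by-bit form, this lookup happens in the kernel instead and the name table never has to cross the socket.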
On Thu, 2019-07-04 at 13:52 +0200, Michal Kubecek wrote:
>
> There is still the question if it should be implemented as a nested
> attribute which could look like the current compact form without the
> "list" flag (if there is no mask, it's a list). Or an unstructured data
> block consisting of u32 bit length
You wouldn't really need the length, since the attribute has a length
already :-)
And then, if you just concatenate the value and mask, the existing
NLA_BITFIELD32 becomes a special case.
> and one or two bitmaps of
> corresponding length. I would prefer the nested attribute, netlink was
> designed to represent structured data, passing structures as binary goes
> against the design (just looked at VFINFO in rtnetlink few days ago,
> it's awful, IMHO).
Yeah, I dunno. On the one hand I completely agree, on the other hand
NLA_BITFIELD32 already goes the other way, and is there now...
> Either way, I would still prefer to have bitmaps represented as an array
> of 32-bit blocks in host byte order. This would be easy to handle in
> kernel both in places where we have u32 based bitmaps and unsigned long
> based ones. Other options seem less appealing:
>
> - u8 based: only complicates processing
> - u64 based: have to care about alignment
> - unsigned long based: alignment and also problems with 64-bit kernel
> vs. 32-bit userspace
Agree with this.
johannes
On Wed, 2019-07-03 at 16:37 +0200, Jiri Pirko wrote:
> Wed, Jul 03, 2019 at 03:44:57PM CEST, [email protected] wrote:
> > On Wed, 2019-07-03 at 13:49 +0200, Jiri Pirko wrote:
> > >
> > > > +Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
> > > > +to a multiple of 32 bits. They consist of 32-bit words in host byte order,
> > >
> > > Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use a
> > > nested array of NLA_BITFIELD32 instead?
> >
> > That would seem kind of awkward to use, IMHO.
> >
> > Perhaps better to make some kind of generic "arbitrary size bitfield"
> > attribute type?
>
> Yep, I believe I was trying to make this point during bitfield32
> discussion, failed apparently. So if we have "NLA_BITFIELD" with
> arbitrary size, that sounds good to me.
I guess it could be the same way - just have the content be
  u32 value[N];
  u32 select[N];

where N = nla_len(attr) / 8
That'd be compatible with NLA_BITFIELD32, and we could basically change
all occurrences of NLA_BITFIELD32 to NLA_BITFIELD, and have NLA_BITFIELD
take something like a "max_bit" for the .len field or something like
that? And an entry in the validation union to point to a "u32 *mask"
instead of the current validation_data that just points to a single u32
mask...
So overall seems like a pretty simple extension to NLA_BITFIELD32 that
handles NLA_BITFIELD32 as a special case with simply .len=32.
(len is a 16-bit field, but a 64k bitmap should be sufficient I hope?)
johannes
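
The value[N]/select[N] layout proposed above could be consumed roughly as follows. This is a userspace sketch under assumptions: ex_bitfield_apply() is a hypothetical name, and payload_len stands in for what nla_len() would return in the kernel. With N = 1 the payload has the same shape as the existing NLA_BITFIELD32 (one value word followed by one selector word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply a payload laid out as u32 value[N] followed by u32 select[N]
 * to an existing bitmap. Only bits set in select[] are changed.
 * Returns 0 on success, -1 if the payload length does not fit the layout.
 */
static inline int ex_bitfield_apply(uint32_t *target, size_t target_words,
				    const uint32_t *payload, size_t payload_len)
{
	size_t n = payload_len / 8;	/* one value word plus one select word */
	const uint32_t *value = payload;
	const uint32_t *select = payload + n;
	size_t i;

	assert(target != NULL && payload != NULL);
	if (payload_len == 0 || payload_len % 8 != 0 || n > target_words)
		return -1;

	for (i = 0; i < n; i++)
		target[i] = (target[i] & ~select[i]) | (value[i] & select[i]);
	return 0;
}
```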
On Thu, Jul 04, 2019 at 02:03:02PM +0200, Johannes Berg wrote:
> On Thu, 2019-07-04 at 13:52 +0200, Michal Kubecek wrote:
> >
> > There is still the question if it it should be implemented as a nested
> > attribute which could look like the current compact form without the
> > "list" flag (if there is no mask, it's a list). Or an unstructured data
> > block consisting of u32 bit length
>
> You wouldn't really need the length, since the attribute has a length
> already :-)
It has byte length, not bit length. The bitmaps we are dealing with
can have any bit length, not necessarily multiples of 8 (or even 32).
Michal
On Thu, 2019-07-04 at 14:17 +0200, Michal Kubecek wrote:
> On Thu, Jul 04, 2019 at 02:03:02PM +0200, Johannes Berg wrote:
> > On Thu, 2019-07-04 at 13:52 +0200, Michal Kubecek wrote:
> > >
> > > There is still the question if it it should be implemented as a nested
> > > attribute which could look like the current compact form without the
> > > "list" flag (if there is no mask, it's a list). Or an unstructured data
> > > block consisting of u32 bit length
> >
> > You wouldn't really need the length, since the attribute has a length
> > already :-)
>
> It has byte length, not bit length. The bitmaps we are dealing with
> can have any bit length, not necessarily multiples of 8 (or even 32).
Not sure why that matters? You have the mask, so you don't really need
to additionally say that you're only going up to a certain bit?
I mean, say you want to set some bits <=17, why would you need to say
that they're <=17 if you have a
  value: 0b00000000'000000xx'xxxxxxxx'xxxxxxxx
  mask:  0b00000000'00000011'11111111'11111111
johannes
On Thu, Jul 04, 2019 at 02:21:52PM +0200, Johannes Berg wrote:
> On Thu, 2019-07-04 at 14:17 +0200, Michal Kubecek wrote:
> > On Thu, Jul 04, 2019 at 02:03:02PM +0200, Johannes Berg wrote:
> > > On Thu, 2019-07-04 at 13:52 +0200, Michal Kubecek wrote:
> > > >
> > > > There is still the question if it it should be implemented as a nested
> > > > attribute which could look like the current compact form without the
> > > > "list" flag (if there is no mask, it's a list). Or an unstructured data
> > > > block consisting of u32 bit length
> > >
> > > You wouldn't really need the length, since the attribute has a length
> > > already :-)
> >
> > It has byte length, not bit length. The bitmaps we are dealing with
> > can have any bit length, not necessarily multiples of 8 (or even 32).
>
> Not sure why that matters? You have the mask, so you don't really need
> to additionally say that you're only going up to a certain bit?
>
> I mean, say you want to set some bits <=17, why would you need to say
> that they're <=17 if you have a
> value: 0b00000000'000000xx'xxxxxxxx'xxxxxxxx
> mask: 0b00000000'00000011'11111111'11111111
One scenario that I can see off the top of my head would be a user
running

  ethtool -s <dev> advertise 0x...

with the hex value representing some subset of link modes. Now if the ethtool
version is behind the kernel and recognizes fewer link modes than the kernel,
but in a way that the number rounded up to bytes or words would be the
same, the kernel has no way to recognize whether those zero bits on top of
the mask are zero on purpose or just because userspace doesn't know about
them. In general, I believe the absence of bit length information is
something protocols would have to work around sometimes.
The submitted implementation doesn't have this problem as it can tell
kernel "this is a list" (i.e. I'm not sending a value/mask pair, I want
exactly these bits to be set). Thus it can easily implement requests of
both types (value/mask or just value):

  ethtool -s <dev> advertise 0x2f
  ethtool -s <dev> advertise 0x08/0x0c
  ethtool -s <dev> advertise 100baseT/Full off 1000baseT/Full on

and could be as easily extended to support also

  ethtool -s <dev> advertise 100baseT/Full 1000baseT/Full

Michal
On Thu, 2019-07-04 at 14:53 +0200, Michal Kubecek wrote:
>
> > value: 0b00000000'000000xx'xxxxxxxx'xxxxxxxx
> > mask: 0b00000000'00000011'11111111'11111111
>
> One scenario that I can see from the top of my head would be user
> running
>
> ethtool -s <dev> advertise 0x...
The "0x..." here would be the *value* in the NLA_BITFIELD32 parlance,
right?
What would the "selector" be? I assume the selector would be "whatever
ethtool knows about"?
> with hex value representing some subset of link modes. Now if ethtool
> version is behind kernel and recognizes fewer link modes than kernel
> but in a way that the number rounded up to bytes or words would be the
> same, kernel has no way to recognize of those zero bits on top of the
> mask are zero on purpose or just because userspace doesn't know about
> them. In general, I believe the absence of bit length information is
> something protocols would have to work around sometimes.
>
> The submitted implementation doesn't have this problem as it can tell
> kernel "this is a list" (i.e. I'm not sending a value/mask pair, I want
> exactly these bits to be set).
OK, here I guess I see what you mean. You're saying if ethtool were to
send a value/mask of "0..0100/0..0111" you wouldn't know what to do with
BIT(4) as long as the kernel knows about that bit?
I guess the difference now is depending on the operation. NLA_BITFIELD32
is sort of built on the assumption of having a "toggle" operation. If
you want to have a "set to" operation, then you don't really need the
selector/mask at all, just the value.
johannes
> OK, here I guess I see what you mean. You're saying if ethtool were to
> send a value/mask of "0..0100/0..0111" you wouldn't know what to do with
> BIT(4) as long as the kernel knows about that bit?
>
> I guess the difference now is depending on the operation. NLA_BITFIELD32
> is sort of built on the assumption of having a "toggle" operation. If
> you want to have a "set to" operation, then you don't really need the
> selector/mask at all, just the value.
I don't think it is as simple as this. User space has a few different
things it wants to pass to the kernel:

  - I want to set this bit to 0
  - I want to set this bit to 1
  - I don't want to change this bit
  - In my world view, this bit is unused

The kernel has had a long history of trouble with flag bits in system
calls. It has not validated that unused bits are clear. Meaning when
you actually want to make use of the unused bits you cannot because
userspace has been passing random values in them since day 1.
We need a design which is clear to everybody which bits are unused and
should be validated as being unused and an error returned if an unused
bit is actually used. A value and a mask is not sufficient for
this. We need the length in bits.
Andrew
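
A sketch of the validation argued for above, assuming the sender declares a bit length next to value and mask. The function name and the choice of -EINVAL are assumptions for illustration; the idea is that any bit at or beyond the declared length must be clear in both bitmaps, so that those positions remain usable later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: 'nbits' is the bit length declared by the sender.
 * A set bit at position >= nbits in value or mask is rejected.
 */
static inline int ex_bitset_check(const uint32_t *value, const uint32_t *mask,
				  size_t nwords, size_t nbits)
{
	size_t i;

	assert(value != NULL && mask != NULL);
	for (i = 0; i < nwords; i++) {
		size_t first = i * 32;	/* index of the lowest bit in word i */
		uint32_t allowed;

		if (nbits >= first + 32)
			allowed = UINT32_MAX;
		else if (nbits <= first)
			allowed = 0;
		else
			allowed = (UINT32_C(1) << (nbits - first)) - 1;

		if ((value[i] | mask[i]) & ~allowed)
			return -EINVAL;
	}
	return 0;
}
```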
On Wed, Jul 03, 2019 at 12:04:35PM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 06:34:37PM CEST, [email protected] wrote:
> >On Tue, Jul 02, 2019 at 03:05:15PM +0200, Jiri Pirko wrote:
> >> Tue, Jul 02, 2019 at 01:50:04PM CEST, [email protected] wrote:
> >> >+/**
> >> >+ * ethnl_is_privileged() - check if request has sufficient privileges
> >> >+ * @skb: skb with client request
> >> >+ *
> >> >+ * Checks if client request has CAP_NET_ADMIN in its netns. Unlike the flags
> >> >+ * in genl_ops, this allows finer access control, e.g. allowing or denying
> >> >+ * the request based on its contents or witholding only part of the data
> >> >+ * from unprivileged users.
> >> >+ *
> >> >+ * Return: true if request is privileged, false if not
> >> >+ */
> >> >+static inline bool ethnl_is_privileged(struct sk_buff *skb)
> >>
> >> I wonder why you need this helper. Genetlink uses
> >> ops->flags & GENL_ADMIN_PERM for this.
> >
> >It's explained in the function description. Sometimes we need finer
> >control than by request message type. An example is the WoL password:
> >ETHTOOL_GWOL is privileged because of it but I believe there si no
> >reason why unprivileged user couldn't see enabled WoL modes, we can
> >simply omit the password for him. (Also, it allows to combine query for
> >WoL settings with other unprivileged settings.)
>
> Why can't we have rather:
> ETHTOOL_WOL_GET for all
> ETHTOOL_WOL_PASSWORD_GET with GENL_ADMIN_PERM
> ?
> Better to stick with what we have in gennetlink rather then to bend the
> implementation from the very beginning I think.
We can. But it would also mean two separate SET requests (or breaking
the rule that _GET_REPLY, _SET and _NTF share the layout). That would be
unfortunate as ethtool_ops callback does not actually allow setting only
the modes so that the ETHTOOL_MSG_WOL_SET request (which would have to
go first as many drivers ignore .sopass if WAKE_MAGICSECURE is not set)
would have to pass a different password (most likely just leaving what
->get_wol() put there) and that password would be actually set until the
second request arrives. There goes the idea of getting rid of ioctl
interface raciness...
I would rather see returning to WoL modes not being visible to
unprivileged users than that (even if there is no actual reason for it).
Anyway, shortening the series left WoL settings out of the first part so
that I can split this out for now and leave the discussion for when we
get to WoL one day.
> >> >+/**
> >> >+ * ethnl_reply_header_size() - total size of reply header
> >> >+ *
> >> >+ * This is an upper estimate so that we do not need to hold RTNL lock longer
> >> >+ * than necessary (to prevent rename between size estimate and composing the
> >>
> >> I guess this description is not relevant anymore. I don't see why to
> >> hold rtnl mutex for this function...
> >
> >You don't need it for this function, it's the other way around: unless
> >you hold RTNL lock for the whole time covering both checking needed
> >message size and filling the message - and we don't - the device could
> >be renamed in between. Thus if we returned size based on current device
> >name, it might not be sufficient at the time the header is filled.
> >That's why this function returns maximum possible size (which is
> >actually a constant).
>
> I suggest to avoid the description. It is misleading. Perhaps something
> to have in a patch description but not here in code.
The reason I put the comment there was to prevent someone "optimizing"
the helper by using strlen() later. Maybe something shorter and more to
the point, e.g.

  Using IFNAMSIZ is faster and prevents a race if the device is renamed
  before we fill the name into skb.

?
Michal
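
To illustrate the sizing argument above: a userspace sketch that computes the reply header size from the constant maximum name length instead of strlen() of the current name. All EX_/ex_ names are stand-ins defined here for illustration (they mirror, but are not, the kernel's IFNAMSIZ, NLA_HDRLEN and nla_total_size()), and the header is assumed to be a nest holding only ifindex and name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_IFNAMSIZ	16	/* same value as IFNAMSIZ */
#define EX_NLA_HDRLEN	4	/* attribute header: u16 length + u16 type */
#define EX_NLA_ALIGN(len)	(((len) + (size_t)3) & ~(size_t)3)

/* Size of one attribute including header and padding. */
static inline size_t ex_nla_total_size(size_t payload)
{
	return EX_NLA_ALIGN(EX_NLA_HDRLEN + payload);
}

/*
 * Upper bound for a nested header with ifindex (u32) and device name.
 * Using the maximum name length keeps the result constant, so a rename
 * between size estimation and message filling cannot overflow the skb.
 */
static inline size_t ex_reply_header_size(void)
{
	size_t inner = ex_nla_total_size(sizeof(uint32_t)) +
		       ex_nla_total_size(EX_IFNAMSIZ);

	assert(inner > 0);
	return ex_nla_total_size(inner);
}
```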
Mon, Jul 08, 2019 at 02:22:51PM CEST, [email protected] wrote:
>On Wed, Jul 03, 2019 at 12:04:35PM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 06:34:37PM CEST, [email protected] wrote:
>> >On Tue, Jul 02, 2019 at 03:05:15PM +0200, Jiri Pirko wrote:
>> >> Tue, Jul 02, 2019 at 01:50:04PM CEST, [email protected] wrote:
>> >> >+/**
>> >> >+ * ethnl_is_privileged() - check if request has sufficient privileges
>> >> >+ * @skb: skb with client request
>> >> >+ *
>> >> >+ * Checks if client request has CAP_NET_ADMIN in its netns. Unlike the flags
>> >> >+ * in genl_ops, this allows finer access control, e.g. allowing or denying
>> >> >+ * the request based on its contents or witholding only part of the data
>> >> >+ * from unprivileged users.
>> >> >+ *
>> >> >+ * Return: true if request is privileged, false if not
>> >> >+ */
>> >> >+static inline bool ethnl_is_privileged(struct sk_buff *skb)
>> >>
>> >> I wonder why you need this helper. Genetlink uses
>> >> ops->flags & GENL_ADMIN_PERM for this.
>> >
>> >It's explained in the function description. Sometimes we need finer
>> >control than by request message type. An example is the WoL password:
>> >ETHTOOL_GWOL is privileged because of it but I believe there si no
>> >reason why unprivileged user couldn't see enabled WoL modes, we can
>> >simply omit the password for him. (Also, it allows to combine query for
>> >WoL settings with other unprivileged settings.)
>>
>> Why can't we have rather:
>> ETHTOOL_WOL_GET for all
>> ETHTOOL_WOL_PASSWORD_GET with GENL_ADMIN_PERM
>> ?
>> Better to stick with what we have in gennetlink rather then to bend the
>> implementation from the very beginning I think.
>
>We can. But it would also mean two separate SET requests (or breaking
>the rule that _GET_REPLY, _SET and _NTF share the layout). That would be
>unfortunate as ethtool_ops callback does not actually allow setting only
>the modes so that the ETHTOOL_MSG_WOL_SET request (which would have to
>go first as many drivers ignore .sopass if WAKE_MAGICSECURE is not set)
>would have to pass a different password (most likely just leaving what
>->get_wol() put there) and that password would be actually set until the
>second request arrives. There goes the idea of getting rid of ioctl
>interface raciness...
I understand. That is my concern, not to bring baggage from ioctl :/
>
>I would rather see returning to WoL modes not being visible to
>unprivileged users than that (even if there is no actual reason for it).
>Anyway, shortening the series left WoL settings out if the first part so
>that I can split this out for now and leave the discussion for when we
>get to WoL one day.
Fine.
>
>> >> >+/**
>> >> >+ * ethnl_reply_header_size() - total size of reply header
>> >> >+ *
>> >> >+ * This is an upper estimate so that we do not need to hold RTNL lock longer
>> >> >+ * than necessary (to prevent rename between size estimate and composing the
>> >>
>> >> I guess this description is not relevant anymore. I don't see why to
>> >> hold rtnl mutex for this function...
>> >
>> >You don't need it for this function, it's the other way around: unless
>> >you hold RTNL lock for the whole time covering both checking needed
>> >message size and filling the message - and we don't - the device could
>> >be renamed in between. Thus if we returned size based on current device
>> >name, it might not be sufficient at the time the header is filled.
>> >That's why this function returns maximum possible size (which is
>> >actually a constant).
>>
>> I suggest to avoid the description. It is misleading. Perhaps something
>> to have in a patch description but not here in code.
>
>The reason I put the comment there was to prevent someone "optimizing"
>the helper by using strlen() later. Maybe something shorter and more to
>the point, e.g.
>
> Using IFNAMSIZ is faster and prevents a race if the device is renamed
> before we fill the name into skb.
>
>?
Sounds good, thanks!
>
>Michal
On Wed, Jul 03, 2019 at 10:41:51AM +0200, Jiri Pirko wrote:
> Tue, Jul 02, 2019 at 04:52:41PM CEST, [email protected] wrote:
> >On Tue, Jul 02, 2019 at 02:25:21PM +0200, Jiri Pirko wrote:
> >> Tue, Jul 02, 2019 at 01:49:59PM CEST, [email protected] wrote:
> >> >+
> >> >+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
> >> >+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
> >> >+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
> >> >+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
> >> >+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
> >> >+
> >> >+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
> >> >+message relates to. One of them is sufficient in requests, if both are used,
> >> >+they must identify the same device. Some requests, e.g. global string sets, do
> >> >+not require device identification. Most GET requests also allow dump requests
> >> >+without device identification to query the same information for all devices
> >> >+providing it (each device in a separate message).
> >> >+
> >> >+Optional info mask allows to ask only for a part of data provided by GET
> >>
> >> How this "infomask" works? What are the bits related to? Is that request
> >> specific?
> >
> >The interpretation is request specific, the information returned for
> >a GET request is divided into multiple parts and client can choose to
> >request one of them (usually one). In the code so far, infomask bits
> >correspond to top level (nest) attributes but I would rather not make it
> >a strict rule.
>
> Wait, so it is a matter of verbosity? If you have multiple parts and the
> user is able to chose one of them, why don't you rather have multiple
> get commands, one per bit. This infomask construct seems redundant to me.
I thought it was a matter of verbosity because it is a very basic
element of the design, it was even advertised in the cover letter among
the basic ideas, it has been there since the very beginning and in five
previous versions over a year and a half, no one questioned it. That's
why I thought you objected to the unclear description, not to the
concept as such.
There are two reasons for this design. First is to reduce the number of
requests needed to get the information. This is not so much a problem of
ethtool itself; the only existing commands that would result in multiple
request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
also "ethtool -x/-X <dev>" but even if the indirection table and hash
key have different bits assigned now, they don't have to be split even
if we split other commands. It may be a bigger problem for daemons wanting
to keep track of system configuration which would have to issue many
requests whenever a new device appears.
Second reason is that with 8-bit genetlink command/message id, the space
is not as infinite as it might seem. I counted quickly, right now the
full series uses 14 ids for kernel messages, with split you propose it
would most likely grow to 44. For full implementation of all ethtool
functionality, we could get to ~60 ids. It's still only 1/4 of the
available space but it's not clear what the future development will look
like. We would certainly need to be careful not to start allocating new
commands for single parameters and try to be foreseeing about what can
be grouped together. But we will need to do that in any case.
On kernel side, splitting existing messages would make some things a bit
easier. It would also reduce the number of scenarios where only part of
requested information is available or only part of a SET request fails.
Michal
On Mon, 2019-07-08 at 19:27 +0200, Michal Kubecek wrote:
>
> Second reason is that with 8-bit genetlink command/message id, the space
> is not as infinite as it might seem.
FWIW, there isn't really any good reason for this, we have like 16
reserved bits in the genl header.
OTOH, having a LOT of ops will certainly cost space in the kernel
image...
johannes
Mon, Jul 08, 2019 at 07:27:29PM CEST, [email protected] wrote:
>On Wed, Jul 03, 2019 at 10:41:51AM +0200, Jiri Pirko wrote:
>> Tue, Jul 02, 2019 at 04:52:41PM CEST, [email protected] wrote:
>> >On Tue, Jul 02, 2019 at 02:25:21PM +0200, Jiri Pirko wrote:
>> >> Tue, Jul 02, 2019 at 01:49:59PM CEST, [email protected] wrote:
>> >> >+
>> >> >+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
>> >> >+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
>> >> >+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
>> >> >+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
>> >> >+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
>> >> >+
>> >> >+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
>> >> >+message relates to. One of them is sufficient in requests, if both are used,
>> >> >+they must identify the same device. Some requests, e.g. global string sets, do
>> >> >+not require device identification. Most GET requests also allow dump requests
>> >> >+without device identification to query the same information for all devices
>> >> >+providing it (each device in a separate message).
>> >> >+
>> >> >+Optional info mask allows to ask only for a part of data provided by GET
>> >>
>> >> How this "infomask" works? What are the bits related to? Is that request
>> >> specific?
>> >
>> >The interpretation is request specific, the information returned for
>> >a GET request is divided into multiple parts and client can choose to
>> >request one of them (usually one). In the code so far, infomask bits
>> >correspond to top level (nest) attributes but I would rather not make it
>> >a strict rule.
>>
>> Wait, so it is a matter of verbosity? If you have multiple parts and the
>> user is able to chose one of them, why don't you rather have multiple
>> get commands, one per bit. This infomask construct seems redundant to me.
>
>I thought it was a matter of verbosity because it is a very basic
>element of the design, it was even advertised in the cover letter among
>the basic ideas, it has been there since the very beginning and in five
>previous versions through year and a half, noone did question it. That's
>why I thought you objected against unclear description, not against the
>concept as such.
>
>There are two reasons for this design. First is to reduce the number of
>requests needed to get the information. This is not so much a problem of
>ethtool itself; the only existing commands that would result in multiple
>request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
>also "ethtool -x/-X <dev>" but even if the indirection table and hash
>key have different bits assigned now, they don't have to be split even
>if we split other commands. It may be bigger problem for daemons wanting
>to keep track of system configuration which would have to issue many
>requests whenever a new device appears.
>
>Second reason is that with 8-bit genetlink command/message id, the space
>is not as infinite as it might seem. I counted quickly, right now the
>full series uses 14 ids for kernel messages, with split you propose it
>would most likely grow to 44. For full implementation of all ethtool
>functionality, we could get to ~60 ids. It's still only 1/4 of the
>available space but it's not clear what the future development will look
>like. We would certainly need to be careful not to start allocating new
>commands for single parameters and try to be foreseeing about what can
>be grouped together. But we will need to do that in any case.
>
>On kernel side, splitting existing messages would make some things a bit
>easier. It would also reduce the number of scenarios where only part of
>requested information is available or only part of a SET request fails.
Okay, I got your point. So why don't we look at it from the other angle.
Why don't we have only a single get/set command that would be in general
used to get/set ALL info from/to the kernel, where we can have these
bits (perhaps rather a varlen bitfield) for the user to indicate which data
he is interested in? This scales. The other commands would be
just for actions.
Something like RTM_GETLINK/RTM_SETLINK. Makes sense?
>
>Michal
On Mon, Jul 08, 2019 at 09:26:29PM +0200, Jiri Pirko wrote:
> Mon, Jul 08, 2019 at 07:27:29PM CEST, [email protected] wrote:
> >
> >There are two reasons for this design. First is to reduce the number of
> >requests needed to get the information. This is not so much a problem of
> >ethtool itself; the only existing commands that would result in multiple
> >request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
> >also "ethtool -x/-X <dev>" but even if the indirection table and hash
> >key have different bits assigned now, they don't have to be split even
> >if we split other commands. It may be bigger problem for daemons wanting
> >to keep track of system configuration which would have to issue many
> >requests whenever a new device appears.
> >
> >Second reason is that with 8-bit genetlink command/message id, the space
> >is not as infinite as it might seem. I counted quickly, right now the
> >full series uses 14 ids for kernel messages, with split you propose it
> >would most likely grow to 44. For full implementation of all ethtool
> >functionality, we could get to ~60 ids. It's still only 1/4 of the
> >available space but it's not clear what the future development will look
> >like. We would certainly need to be careful not to start allocating new
> >commands for single parameters and try to be foreseeing about what can
> >be grouped together. But we will need to do that in any case.
> >
> >On kernel side, splitting existing messages would make some things a bit
> >easier. It would also reduce the number of scenarios where only part of
> >requested information is available or only part of a SET request fails.
>
> Okay, I got your point. So why don't we look at if from the other angle.
> Why don't we have only single get/set command that would be in general
> used to get/set ALL info from/to the kernel. Where we can have these
> bits (perhaps rather varlen bitfield) to for user to indicate which data
> is he interested in? This scales. The other commands would be
> just for action.
>
> Something like RTM_GETLINK/RTM_SETLINK. Makes sense?
It's certainly an option but at first glance it seems like just moving
what I tried to avoid one level lower. It would work around the u8 issue
(but as Johannes pointed out, we can handle it with genetlink when/if
the time comes). We would almost certainly have to split the replies
into multiple messages to keep the packet size reasonable. I'll have to
think more about the consequences for both kernel and userspace.
My gut feeling is that out of the two extreme options (one universal
message type and message types corresponding to current infomask bits),
the latter is more appealing. After all, ethtool has been gathering
features that would need those ~60 message types for 20 years.
Michal
Mon, Jul 08, 2019 at 09:26:29PM CEST, [email protected] wrote:
>Mon, Jul 08, 2019 at 07:27:29PM CEST, [email protected] wrote:
>>On Wed, Jul 03, 2019 at 10:41:51AM +0200, Jiri Pirko wrote:
>>> Tue, Jul 02, 2019 at 04:52:41PM CEST, [email protected] wrote:
>>> >On Tue, Jul 02, 2019 at 02:25:21PM +0200, Jiri Pirko wrote:
>>> >> Tue, Jul 02, 2019 at 01:49:59PM CEST, [email protected] wrote:
>>> >> >+
>>> >> >+ ETHTOOL_A_HEADER_DEV_INDEX (u32) device ifindex
>>> >> >+ ETHTOOL_A_HEADER_DEV_NAME (string) device name
>>> >> >+ ETHTOOL_A_HEADER_INFOMASK (u32) info mask
>>> >> >+ ETHTOOL_A_HEADER_GFLAGS (u32) flags common for all requests
>>> >> >+ ETHTOOL_A_HEADER_RFLAGS (u32) request specific flags
>>> >> >+
>>> >> >+ETHTOOL_A_HEADER_DEV_INDEX and ETHTOOL_A_HEADER_DEV_NAME identify the device
>>> >> >+message relates to. One of them is sufficient in requests, if both are used,
>>> >> >+they must identify the same device. Some requests, e.g. global string sets, do
>>> >> >+not require device identification. Most GET requests also allow dump requests
>>> >> >+without device identification to query the same information for all devices
>>> >> >+providing it (each device in a separate message).
>>> >> >+
>>> >> >+Optional info mask allows to ask only for a part of data provided by GET
>>> >>
>>> >> How this "infomask" works? What are the bits related to? Is that request
>>> >> specific?
>>> >
>>> >The interpretation is request specific, the information returned for
>>> >a GET request is divided into multiple parts and client can choose to
>>> >request one of them (usually one). In the code so far, infomask bits
>>> >correspond to top level (nest) attributes but I would rather not make it
>>> >a strict rule.
>>>
>>> Wait, so it is a matter of verbosity? If you have multiple parts and the
>>> user is able to chose one of them, why don't you rather have multiple
>>> get commands, one per bit. This infomask construct seems redundant to me.
>>
>>I thought it was a matter of verbosity because it is a very basic
>>element of the design, it was even advertised in the cover letter among
>>the basic ideas, it has been there since the very beginning and in five
>>previous versions through year and a half, noone did question it. That's
>>why I thought you objected against unclear description, not against the
>>concept as such.
>>
>>There are two reasons for this design. First is to reduce the number of
>>requests needed to get the information. This is not so much a problem of
>>ethtool itself; the only existing commands that would result in multiple
>>request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
>>also "ethtool -x/-X <dev>" but even if the indirection table and hash
>>key have different bits assigned now, they don't have to be split even
>>if we split other commands. It may be bigger problem for daemons wanting
>>to keep track of system configuration which would have to issue many
>>requests whenever a new device appears.
>>
>>Second reason is that with 8-bit genetlink command/message id, the space
>>is not as infinite as it might seem. I counted quickly, right now the
>>full series uses 14 ids for kernel messages, with split you propose it
>>would most likely grow to 44. For full implementation of all ethtool
>>functionality, we could get to ~60 ids. It's still only 1/4 of the
>>available space but it's not clear what the future development will look
>>like. We would certainly need to be careful not to start allocating new
>>commands for single parameters and try to be foreseeing about what can
>>be grouped together. But we will need to do that in any case.
>>
>>On kernel side, splitting existing messages would make some things a bit
>>easier. It would also reduce the number of scenarios where only part of
>>requested information is available or only part of a SET request fails.
>
>Okay, I got your point. So why don't we look at if from the other angle.
>Why don't we have only single get/set command that would be in general
>used to get/set ALL info from/to the kernel. Where we can have these
>bits (perhaps rather varlen bitfield) to for user to indicate which data
>is he interested in? This scales. The other commands would be
>just for action.
>
>Something like RTM_GETLINK/RTM_SETLINK. Makes sense?
+ I think this might save a lot of complexity around your proposed
inner ops.
>
>
>>
>>Michal
Mon, Jul 08, 2019 at 10:22:19PM CEST, [email protected] wrote:
>On Mon, Jul 08, 2019 at 09:26:29PM +0200, Jiri Pirko wrote:
>> Mon, Jul 08, 2019 at 07:27:29PM CEST, [email protected] wrote:
>> >
>> >There are two reasons for this design. First is to reduce the number of
>> >requests needed to get the information. This is not so much a problem of
>> >ethtool itself; the only existing commands that would result in multiple
>> >request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
>> >also "ethtool -x/-X <dev>" but even if the indirection table and hash
>> >key have different bits assigned now, they don't have to be split even
>> >if we split other commands. It may be bigger problem for daemons wanting
>> >to keep track of system configuration which would have to issue many
>> >requests whenever a new device appears.
>> >
>> >Second reason is that with 8-bit genetlink command/message id, the space
>> >is not as infinite as it might seem. I counted quickly, right now the
>> >full series uses 14 ids for kernel messages, with split you propose it
>> >would most likely grow to 44. For full implementation of all ethtool
>> >functionality, we could get to ~60 ids. It's still only 1/4 of the
>> >available space but it's not clear what the future development will look
>> >like. We would certainly need to be careful not to start allocating new
>> >commands for single parameters and try to be foreseeing about what can
>> >be grouped together. But we will need to do that in any case.
>> >
>> >On kernel side, splitting existing messages would make some things a bit
>> >easier. It would also reduce the number of scenarios where only part of
>> >requested information is available or only part of a SET request fails.
>>
>> Okay, I got your point. So why don't we look at if from the other angle.
>> Why don't we have only single get/set command that would be in general
>> used to get/set ALL info from/to the kernel. Where we can have these
>> bits (perhaps rather varlen bitfield) to for user to indicate which data
>> is he interested in? This scales. The other commands would be
>> just for action.
>>
>> Something like RTM_GETLINK/RTM_SETLINK. Makes sense?
>
>It's certainly an option but at the first glance it seems as just moving
>what I tried to avoid one level lower. It would work around the u8 issue
>(but as Johannes pointed out, we can handle it with genetlink when/if
>the time comes). We would almost certainly have to split the replies
>into multiple messages to keep the packet size reasonable. I'll have to
>think more about the consequences for both kernel and userspace.
>
>My gut feeling is that out of the two extreme options (one universal
>message type and message types corresponding to current infomask bits),
>the latter is more appealing. After all, ethtool has been gathering
>features that would need those ~60 message types for 20 years.
Yeah, but I think that we have to do one or another. Anything in between
makes the code complex and uapi confusing. Let's start clean :)
Thu, Jul 04, 2019 at 01:52:36PM CEST, [email protected] wrote:
>On Thu, Jul 04, 2019 at 10:04:35AM +0200, Jiri Pirko wrote:
>> Wed, Jul 03, 2019 at 08:18:51PM CEST, [email protected] wrote:
>> >On Wed, Jul 03, 2019 at 01:49:33PM +0200, Jiri Pirko wrote:
>> >> Tue, Jul 02, 2019 at 01:50:09PM CEST, [email protected] wrote:
>> >> >+Compact form: nested (bitset) attribute contents:
>> >> >+
>> >> >+ ETHTOOL_A_BITSET_LIST (flag) no mask, only a list
>> >> >+ ETHTOOL_A_BITSET_SIZE (u32) number of significant bits
>> >> >+ ETHTOOL_A_BITSET_VALUE (binary) bitmap of bit values
>> >> >+ ETHTOOL_A_BITSET_MASK (binary) bitmap of valid bits
>> >> >+
>> >> >+Value and mask must have length at least ETHTOOL_A_BITSET_SIZE bits rounded up
>> >> >+to a multiple of 32 bits. They consist of 32-bit words in host byte order,
>> >>
>> >> Looks like the blocks are similar to NLA_BITFIELD32. Why don't you use
>> >> a nested array of NLA_BITFIELD32 instead?
>> >
>> >That would mean a layout like
>> >
>> > 4 bytes of attr header
>> > 4 bytes of value
>> > 4 bytes of mask
>> > 4 bytes of attr header
>> > 4 bytes of value
>> > 4 bytes of mask
>> > ...
>> >
>> >i.e. interleaved headers, words of value and words of mask. Having value
>> >and mask contiguous looks cleaner to me. Also, I can quickly check the
>> >sizes without iterating through a (potentially long) array.
>>
>> Yeah, if you are not happy with this, I suggest to introduce
>> NLA_BITFIELD with arbitrary size. That would be probably cleanest.
>
>There is still the question whether it should be implemented as a nested
>attribute which could look like the current compact form without the
>"list" flag (if there is no mask, it's a list). Or an unstructured data
>block consisting of u32 bit length and one or two bitmaps of
>corresponding length. I would prefer the nested attribute, netlink was
>designed to represent structured data, passing structures as binary goes
>against the design (just looked at VFINFO in rtnetlink few days ago,
>it's awful, IMHO).
>
>Either way, I would still prefer to have bitmaps represented as an array
>of 32-bit blocks in host byte order. This would be easy to handle in
>kernel both in places where we have u32 based bitmaps and unsigned long
>based ones. Other options seem less appealing:
>
> - u8 based: only complicates processing
> - u64 based: have to care about alignment
> - unsigned long based: alignment and also problems with 64-bit kernel
> vs. 32-bit userspace
>
>> >> This is quite complex and confusing. Having the same API for 2 APIs is
>> >> odd. The API should be crystal clear, easy to use.
>> >>
>> >> Why can't you have 2 commands, one working with bit arrays only, one
>> >> working with strings? Something like:
>> >> X_GET
>> >> ETHTOOL_A_BITS (nested)
>> >> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
>> >> X_NAMES_GET
>> >> ETHTOOL_A_BIT_NAMES (nested)
>> >> ETHTOOL_A_BIT_INDEX
>> >> ETHTOOL_A_BIT_NAME
>> >>
>> >> For set, you can also have multiple cmds:
>> >> X_SET - to set many at once, by bit index
>> >> ETHTOOL_A_BITS (nested)
>> >> ETHTOOL_A_BIT_ARRAY (BITFIELD32)
>> >> X_ONE_SET - to set one, by bit index
>> >> ETHTOOL_A_BIT_INDEX
>> >> ETHTOOL_A_BIT_VALUE
>> >> X_ONE_SET - to set one, by name
>> >> ETHTOOL_A_BIT_NAME
>> >> ETHTOOL_A_BIT_VALUE
>> >
>> >This looks as if you assume there is nothing except the bitset in the
>> >message but that is not true. Even with your proposed breaking of
>> >current groups, you would still have e.g. 4 bitsets in reply to netdev
>> >features query, 3 in timestamping info GET request and often bitsets
>> >combined with other data (e.g. WoL modes and optional WoL password).
>> >If you wanted to further refine the message granularity to the level of
>> >single parameters, we might be out of message type ids already.
>>
>> You can still have multiple bitsets(bitfields) in single message and
>> have separate cmd/cmds to get string-bit mapping. No need to mangle it.
>
>Let's take a look at what it means in practice, the command is
>
> ethtool --set-priv-flags eth3 legacy-rx on
>
>on an ixgbe card. Currently, ethtool (from the github repository) does
>
>------------------------------------------------------------------------
>ETHTOOL_CMD_SETTINGS_SET (U->K, 68 bytes)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> ETHTOOL_A_BITSET_BITS
> ETHTOOL_A_BITS_BIT
> ETHTOOL_A_BIT_NAME = "legacy-rx"
> ETHTOOL_A_BIT_VALUE
>
>NLMSG_ERR (K->U, 36 bytes) err = 0
>------------------------------------------------------------------------
>
>If we had only compact form (or some of the NLA_BITFIELD solutions we
>are talking about), you would need
>
>------------------------------------------------------------------------
>ETHTOOL_CMD_STRSET_GET (U->K, 52 bytes)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_STRSET_STRINGSETS
> ETHTOOL_A_STRINGSETS_STRINGSET
> ETHTOOL_A_STRINGSET_ID = 2 (ETH_SS_PRIV_FLAGS)
>
>ETHTOOL_CMD_STRSET_GET_REPLY (K->U, 128 bytes)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_INDEX = 9
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_STRSET_STRINGSETS
> ETHTOOL_A_STRINGSETS_STRINGSET
> ETHTOOL_A_STRINGSET_ID = 2 (ETH_SS_PRIV_FLAGS)
> ETHTOOL_A_STRINGSET_COUNT = 2
> ETHTOOL_A_STRINGSET_STRINGS
> ETHTOOL_A_STRINGS_STRING
> ETHTOOL_A_STRING_INDEX = 0
> ETHTOOL_A_STRING_VALUE = "legacy-rx"
> ETHTOOL_A_STRINGS_STRING
> ETHTOOL_A_STRING_INDEX = 1
> ETHTOOL_A_STRING_VALUE = "vf-ipsec"
>
>NLMSG_ERR (K->U, 36 bytes) err = 0
>
>ETHTOOL_CMD_SETTINGS_SET (U->K, 64 bytes)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> ETHTOOL_A_BITSET_SIZE = 2
> ETHTOOL_A_BITSET_VALUE = 00000001
> ETHTOOL_A_BITSET_MASK = 00000001
>
>NLMSG_ERR (K->U, 36 bytes) err = 0
>------------------------------------------------------------------------
>
>That's an extra roundtrip, lot more chat and the SETTINGS_SET message is
>only 4 bytes shorter in the end. And we can consider ourselves lucky
>this NIC has only two private flags. Or that we didn't need to enable or
>disable a netdev feature (56 bits) or link mode (69 bits and growing).
>
>We could reduce the overhead by allowing STRSET_GET query to only ask
>for specific string(s) but there would still be the extra roundtrip
>which I dislike in the ioctl interface. Florian also said in the v5
>discussion that he would like if it was possible to get names and data
>together in one request.
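
The byte counts in the two dumps above can be reproduced from the
netlink alignment rules. What follows is only a sketch in plain
userspace C, not code from the series: the helper names (attr_size(),
str_attr_size(), nest_size(), verbose_set_size(), compact_set_size())
are made up for illustration, the header sizes are those of struct
nlmsghdr, struct genlmsghdr and struct nlattr, and string attributes
are assumed to carry their terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NLMSG_HDR_LEN	16u	/* sizeof(struct nlmsghdr) */
#define GENL_HDR_LEN	4u	/* sizeof(struct genlmsghdr) */
#define ATTR_HDR_LEN	4u	/* sizeof(struct nlattr) */

/* Attribute with payload_len bytes of payload, padded to 4 bytes. */
static size_t attr_size(size_t payload_len)
{
	return (ATTR_HDR_LEN + payload_len + 3) & ~(size_t)3;
}

/* String attribute; the terminating NUL is assumed to be sent as well. */
static size_t str_attr_size(const char *s)
{
	return attr_size(strlen(s) + 1);
}

/* Nested attribute around already padded contents. */
static size_t nest_size(size_t contents_len)
{
	return ATTR_HDR_LEN + contents_len;
}

/* SETTINGS_SET with one private flag in verbose (by name) form. */
static size_t verbose_set_size(const char *dev, const char *flag)
{
	size_t header = nest_size(str_attr_size(dev));
	/* BIT nest: name string plus an empty (flag) value attribute */
	size_t bit = nest_size(str_attr_size(flag) + attr_size(0));
	/* PRIV_FLAGS nest > BITS nest > BIT nest */
	size_t priv_flags = nest_size(nest_size(bit));

	return NLMSG_HDR_LEN + GENL_HDR_LEN + header + priv_flags;
}

/* SETTINGS_SET with the private flags in compact (size/value/mask) form. */
static size_t compact_set_size(const char *dev, unsigned int nbits)
{
	size_t bitmap_len = 4 * ((nbits + 31) / 32);	/* whole u32 words */
	size_t header = nest_size(str_attr_size(dev));
	size_t priv_flags;

	assert(nbits > 0);
	/* PRIV_FLAGS nest: u32 size, value bitmap, mask bitmap */
	priv_flags = nest_size(attr_size(4) + 2 * attr_size(bitmap_len));

	return NLMSG_HDR_LEN + GENL_HDR_LEN + header + priv_flags;
}
```

Under these assumptions verbose_set_size("eth3", "legacy-rx") gives 68
and compact_set_size("eth3", 2) gives 64, which agrees with the
"only 4 bytes shorter" remark.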
I understand. So how about avoiding the bitfield altogether and just
having an array of either bits or strings, or combinations?

    ETHTOOL_CMD_SETTINGS_SET (U->K)
        ETHTOOL_A_HEADER
            ETHTOOL_A_DEV_NAME = "eth3"
        ETHTOOL_A_SETTINGS_PRIV_FLAGS
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_NAME = "legacy-rx"
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)

or the same with index instead of string

    ETHTOOL_CMD_SETTINGS_SET (U->K)
        ETHTOOL_A_HEADER
            ETHTOOL_A_DEV_NAME = "eth3"
        ETHTOOL_A_SETTINGS_PRIV_FLAGS
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_INDEX = 0
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)

For set you can combine both when you want to set multiple bits:

    ETHTOOL_CMD_SETTINGS_SET (U->K)
        ETHTOOL_A_HEADER
            ETHTOOL_A_DEV_NAME = "eth3"
        ETHTOOL_A_SETTINGS_PRIV_FLAGS
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_INDEX = 2
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_INDEX = 8
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_NAME = "legacy-rx"
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)

For get this might be a bit bigger message:

    ETHTOOL_CMD_SETTINGS_GET_REPLY (K->U)
        ETHTOOL_A_HEADER
            ETHTOOL_A_DEV_NAME = "eth3"
        ETHTOOL_A_SETTINGS_PRIV_FLAGS
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_INDEX = 0
                ETHTOOL_A_FLAG_NAME = "legacy-rx"
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_INDEX = 1
                ETHTOOL_A_FLAG_NAME = "vf-ipsec"
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
            ETHTOOL_A_SETTINGS_PRIV_FLAG
                ETHTOOL_A_FLAG_INDEX = 8
                ETHTOOL_A_FLAG_NAME = "something-else"
                ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>
>Michal
On Tue, Jul 09, 2019 at 03:42:12PM +0200, Jiri Pirko wrote:
> Mon, Jul 08, 2019 at 10:22:19PM CEST, [email protected] wrote:
> >On Mon, Jul 08, 2019 at 09:26:29PM +0200, Jiri Pirko wrote:
> >> Mon, Jul 08, 2019 at 07:27:29PM CEST, [email protected] wrote:
> >> >
> >> >There are two reasons for this design. First is to reduce the number of
> >> >requests needed to get the information. This is not so much a problem of
> >> >ethtool itself; the only existing commands that would result in multiple
> >> >request messages would be "ethtool <dev>" and "ethtool -s <dev>". Maybe
> >> >also "ethtool -x/-X <dev>" but even if the indirection table and hash
> >> >key have different bits assigned now, they don't have to be split even
> >> >if we split other commands. It may be bigger problem for daemons wanting
> >> >to keep track of system configuration which would have to issue many
> >> >requests whenever a new device appears.
> >> >
> >> >Second reason is that with 8-bit genetlink command/message id, the space
> >> >is not as infinite as it might seem. I counted quickly, right now the
> >> >full series uses 14 ids for kernel messages, with split you propose it
> >> >would most likely grow to 44. For full implementation of all ethtool
> >> >functionality, we could get to ~60 ids. It's still only 1/4 of the
> >> >available space but it's not clear what the future development will look
> >> >like. We would certainly need to be careful not to start allocating new
> >> >commands for single parameters and try to be foreseeing about what can
> >> >be grouped together. But we will need to do that in any case.
> >> >
> >> >On kernel side, splitting existing messages would make some things a bit
> >> >easier. It would also reduce the number of scenarios where only part of
> >> >requested information is available or only part of a SET request fails.
> >>
> >> Okay, I got your point. So why don't we look at if from the other angle.
> >> Why don't we have only single get/set command that would be in general
> >> used to get/set ALL info from/to the kernel. Where we can have these
> >> bits (perhaps rather varlen bitfield) to for user to indicate which data
> >> is he interested in? This scales. The other commands would be
> >> just for action.
> >>
> >> Something like RTM_GETLINK/RTM_SETLINK. Makes sense?
> >
> >It's certainly an option but at the first glance it seems as just moving
> >what I tried to avoid one level lower. It would work around the u8 issue
> >(but as Johannes pointed out, we can handle it with genetlink when/if
> >the time comes). We would almost certainly have to split the replies
> >into multiple messages to keep the packet size reasonable. I'll have to
> >think more about the consequences for both kernel and userspace.
> >
> >My gut feeling is that out of the two extreme options (one universal
> >message type and message types corresponding to current infomask bits),
> >the latter is more appealing. After all, ethtool has been gathering
> >features that would need those ~60 message types for 20 years.
>
> Yeah, but I think that we have to do one or another. Anything in between
> makes the code complex and uapi confusing. Let's start clean :)
I'll split the messages for v7.
Michal
On Tue, Jul 09, 2019 at 04:18:17PM +0200, Jiri Pirko wrote:
>
> I understand. So how about avoid the bitfield all together and just
> have array of either bits of strings or combinations?
>
> ETHTOOL_CMD_SETTINGS_SET (U->K)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_NAME = "legacy-rx"
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>
> or the same with index instead of string
>
> ETHTOOL_CMD_SETTINGS_SET (U->K)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_INDEX = 0
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>
>
> For set you can combine both when you want to set multiple bits:
>
> ETHTOOL_CMD_SETTINGS_SET (U->K)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_INDEX = 2
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_INDEX = 8
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_NAME = "legacy-rx"
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>
>
> For get this might be a bit bigger message:
>
> ETHTOOL_CMD_SETTINGS_GET_REPLY (K->U)
> ETHTOOL_A_HEADER
> ETHTOOL_A_DEV_NAME = "eth3"
> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_INDEX = 0
> ETHTOOL_A_FLAG_NAME = "legacy-rx"
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_INDEX = 1
> ETHTOOL_A_FLAG_NAME = "vf-ipsec"
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> ETHTOOL_A_SETTINGS_PRIV_FLAG
> ETHTOOL_A_FLAG_INDEX = 8
> ETHTOOL_A_FLAG_NAME = "something-else"
> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
This is perfect for "one shot" applications but not so much for
long-running ones, whether "ethtool --monitor" or management and
monitoring daemons. Repeating the names in every notification message
would be a waste; it is much more convenient to load the strings only
once and cache them. Even if we omit the names in notifications (and
possibly in GET replies if the client opts for it), this format still
takes 12-16 bytes per bit.
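
The 12-16 byte figure can be checked with a few lines of arithmetic.
This is a rough sketch in userspace C under stated assumptions: each
nameless per-flag entry is a nest holding a u32 index plus an NLA_FLAG
value that is present only when the bit is set, the compact form is a
u32 size followed by value and mask bitmaps of whole u32 words, and the
outer container header is left out on both sides. All function names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_HDR_LEN	4u	/* sizeof(struct nlattr) */

/* Attribute with payload_len bytes of payload, padded to 4 bytes. */
static size_t attr_size(size_t payload_len)
{
	return (ATTR_HDR_LEN + payload_len + 3) & ~(size_t)3;
}

/* One nameless per-flag nest: nest header, u32 index, flag if set. */
static size_t flag_entry_size(int bit_is_set)
{
	return ATTR_HDR_LEN + attr_size(sizeof(uint32_t)) +
	       (bit_is_set ? attr_size(0) : 0);
}

/* A list of nbits per-flag entries, nset of which have the bit set. */
static size_t flag_list_size(unsigned int nbits, unsigned int nset)
{
	assert(nset <= nbits);
	return nset * flag_entry_size(1) + (nbits - nset) * flag_entry_size(0);
}

/* Compact form: u32 size plus value and mask bitmaps. */
static size_t compact_size(unsigned int nbits)
{
	size_t bitmap_len = 4 * ((nbits + 31) / 32);

	return attr_size(sizeof(uint32_t)) + 2 * attr_size(bitmap_len);
}
```

For 69 link modes with every bit set, flag_list_size(69, 69) comes to
1104 bytes while compact_size(69) is 40 bytes, so the gap matters for
clients that receive every notification.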

So the problem I'm trying to address is that there are two types of
clients with very different modes of operation and different
preferences.

Looking at bitset.c, I would rather say that most of the complexity
and ugliness comes from dealing with both unsigned long based bitmaps
and u32 based ones. Originally, there were functions working with
unsigned long based bitmaps, and the variants with the "32" suffix were
wrappers around them which converted u32 bitmaps to unsigned long ones
and back. This became a problem when the kernel started issuing
warnings about variable length arrays, as getting rid of them meant two
kmalloc() and two kfree() calls for each u32 bitmap operation, even
though most of the bitmaps are rather short in practice.

Maybe the wrapper could do something like

int ethnl_put_bitset32(const u32 *value, const u32 *mask,
		       unsigned int size, ...)
{
	unsigned long fixed_value[2], fixed_mask[2];
	unsigned long *tmp_value = fixed_value;
	unsigned long *tmp_mask = fixed_mask;
	int ret;

	/* allocate only if the bitmap does not fit the on-stack buffers */
	if (size > sizeof(fixed_value) * BITS_PER_BYTE) {
		tmp_value = bitmap_alloc(size, GFP_KERNEL);
		if (!tmp_value)
			return -ENOMEM;
		tmp_mask = bitmap_alloc(size, GFP_KERNEL);
		if (!tmp_mask) {
			bitmap_free(tmp_value);
			return -ENOMEM;
		}
	}

	bitmap_from_arr32(tmp_value, value, size);
	bitmap_from_arr32(tmp_mask, mask, size);
	ret = ethnl_put_bitset(tmp_value, tmp_mask, size, ...);

	/* release the temporaries only if they were heap allocated */
	if (tmp_value != fixed_value) {
		bitmap_free(tmp_mask);
		bitmap_free(tmp_value);
	}
	return ret;
}

This way we would make bitset.c code cleaner while avoiding allocating
short bitmaps (which is the most common case).
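
The same pattern can be tried outside the kernel. The block below is
a minimal userspace sketch, not the bitset.c code: from_arr32() is
a hypothetical stand-in for bitmap_from_arr32(), count_bits() stands in
for whatever unsigned long based function the wrapper would call, and
count_bits32() plays the role of the "32" wrapper. Short bitmaps stay
in a two-word on-stack buffer and only longer ones are allocated.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR(nbits)	(((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define FIXED_LONGS		2

/* Copy the first nbits bits of a u32 array into an unsigned long bitmap. */
static void from_arr32(unsigned long *dst, const uint32_t *src,
		       unsigned int nbits)
{
	unsigned int i;

	memset(dst, 0, LONGS_FOR(nbits) * sizeof(unsigned long));
	for (i = 0; i < nbits; i++)
		if (src[i / 32] & (UINT32_C(1) << (i % 32)))
			dst[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
}

/* Example consumer of an unsigned long bitmap: count the set bits. */
static int count_bits(const unsigned long *map, unsigned int nbits)
{
	unsigned int i;
	int count = 0;

	for (i = 0; i < nbits; i++)
		if (map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			count++;
	return count;
}

/* u32 wrapper: returns the bit count, or -1 if allocation fails. */
static int count_bits32(const uint32_t *value, unsigned int nbits)
{
	unsigned long fixed[FIXED_LONGS];
	unsigned long *tmp = fixed;
	int ret;

	assert(value);
	/* fall back to the heap only when the stack buffer is too small */
	if (nbits > FIXED_LONGS * BITS_PER_LONG) {
		tmp = calloc(LONGS_FOR(nbits), sizeof(unsigned long));
		if (!tmp)
			return -1;
	}

	from_arr32(tmp, value, nbits);
	ret = count_bits(tmp, nbits);

	if (tmp != fixed)
		free(tmp);
	return ret;
}
```

With 64-bit unsigned long the stack buffers cover 128 bits, which is
more than the 56 netdev features or 69 link modes mentioned earlier in
the thread; with 32-bit unsigned long the link modes would already take
the allocating path.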
Michal
Wed, Jul 10, 2019 at 02:38:03PM CEST, [email protected] wrote:
>On Tue, Jul 09, 2019 at 04:18:17PM +0200, Jiri Pirko wrote:
>>
>> I understand. So how about avoid the bitfield all together and just
>> have array of either bits of strings or combinations?
>>
>> ETHTOOL_CMD_SETTINGS_SET (U->K)
>> ETHTOOL_A_HEADER
>> ETHTOOL_A_DEV_NAME = "eth3"
>> ETHTOOL_A_SETTINGS_PRIV_FLAGS
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_NAME = "legacy-rx"
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>>
>> or the same with index instead of string
>>
>> ETHTOOL_CMD_SETTINGS_SET (U->K)
>> ETHTOOL_A_HEADER
>> ETHTOOL_A_DEV_NAME = "eth3"
>> ETHTOOL_A_SETTINGS_PRIV_FLAGS
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_INDEX = 0
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>>
>>
>> For set you can combine both when you want to set multiple bits:
>>
>> ETHTOOL_CMD_SETTINGS_SET (U->K)
>> ETHTOOL_A_HEADER
>> ETHTOOL_A_DEV_NAME = "eth3"
>> ETHTOOL_A_SETTINGS_PRIV_FLAGS
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_INDEX = 2
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_INDEX = 8
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_NAME = "legacy-rx"
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>>
>>
>> For get this might be a bit bigger message:
>>
>> ETHTOOL_CMD_SETTINGS_GET_REPLY (K->U)
>> ETHTOOL_A_HEADER
>> ETHTOOL_A_DEV_NAME = "eth3"
>> ETHTOOL_A_SETTINGS_PRIV_FLAGS
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_INDEX = 0
>> ETHTOOL_A_FLAG_NAME = "legacy-rx"
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_INDEX = 1
>> ETHTOOL_A_FLAG_NAME = "vf-ipsec"
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>> ETHTOOL_A_SETTINGS_PRIV_FLAG
>> ETHTOOL_A_FLAG_INDEX = 8
>> ETHTOOL_A_FLAG_NAME = "something-else"
>> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
>
>This is perfect for "one shot" applications but not so much for long
>running ones, either "ethtool --monitor" or management or monitoring
>daemons. Repeating the names in every notification message would be
>a waste, it's much more convenient to load the strings only once and
Yeah, for those applications, the ETHTOOL_A_FLAG_NAME could be omitted.
>cache them. Even if we omit the names in notifications (and possibly the
>GET replies if client opts for it), this format still takes 12-16 bytes
>per bit.
>
>So the problem I'm trying to address is that there are two types of
>clients with very different mode of work and different preferences.
>
>Looking at the bitset.c, I would rather say that most of the complexity
>and ugliness comes from dealing with both unsigned long based bitmaps
>and u32 based ones. Originally, there were functions working with
>unsigned long based bitmaps and the variants with "32" suffix were
>wrappers around them which converted u32 bitmaps to unsigned long ones
>and back. This became a problem when kernel started issuing warnings
>about variable length arrays as getting rid of them meant two kmalloc()
>and two kfree() for each u32 bitmap operation, even if most of the
>bitmaps are in rather short in practice.
>
>Maybe the wrapper could do something like
>
>int ethnl_put_bitset32(const u32 *value, const u32 *mask,
> unsigned int size, ...)
>{
> unsigned long fixed_value[2], fixed_mask[2];
> unsigned long *tmp_value = fixed_value;
> unsigned long *tmp_mask = fixed_mask;
>
> if (size > sizeof(fixed_value) * BITS_PER_BYTE) {
> tmp_value = bitmap_alloc(size);
> if (!tmp_value)
> return -ENOMEM;
> tmp_mask = bitmap_alloc(size);
> if (!tmp_mask) {
> kfree(tmp_value);
> return -ENOMEM;
> }
> }
>
> bitmap_from_arr32(tmp_value, value, size);
> bitmap_from_arr32(tmp_mask, mask, size);
> ret = ethnl_put_bitset(tmp_value, tmp_mask, size, ...);
>}
>
>This way we would make bitset.c code cleaner while avoiding allocating
>short bitmaps (which is the most common case).
I'm primarily concerned about the uapi. Plus, if the uapi approach is
unified for both index and string, we can omit this whole bitset
abomination...
>
>Michal
On Wed, Jul 10, 2019 at 02:59:43PM +0200, Jiri Pirko wrote:
> Wed, Jul 10, 2019 at 02:38:03PM CEST, [email protected] wrote:
> >On Tue, Jul 09, 2019 at 04:18:17PM +0200, Jiri Pirko wrote:
> >>
> >> I understand. So how about avoid the bitfield all together and just
> >> have array of either bits of strings or combinations?
> >>
> >> ETHTOOL_CMD_SETTINGS_SET (U->K)
> >> ETHTOOL_A_HEADER
> >> ETHTOOL_A_DEV_NAME = "eth3"
> >> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_NAME = "legacy-rx"
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >>
> >> or the same with index instead of string
> >>
> >> ETHTOOL_CMD_SETTINGS_SET (U->K)
> >> ETHTOOL_A_HEADER
> >> ETHTOOL_A_DEV_NAME = "eth3"
> >> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_INDEX = 0
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >>
> >>
> >> For set you can combine both when you want to set multiple bits:
> >>
> >> ETHTOOL_CMD_SETTINGS_SET (U->K)
> >> ETHTOOL_A_HEADER
> >> ETHTOOL_A_DEV_NAME = "eth3"
> >> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_INDEX = 2
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_INDEX = 8
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_NAME = "legacy-rx"
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >>
> >>
> >> For get this might be a bit bigger message:
> >>
> >> ETHTOOL_CMD_SETTINGS_GET_REPLY (K->U)
> >> ETHTOOL_A_HEADER
> >> ETHTOOL_A_DEV_NAME = "eth3"
> >> ETHTOOL_A_SETTINGS_PRIV_FLAGS
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_INDEX = 0
> >> ETHTOOL_A_FLAG_NAME = "legacy-rx"
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_INDEX = 1
> >> ETHTOOL_A_FLAG_NAME = "vf-ipsec"
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >> ETHTOOL_A_SETTINGS_PRIV_FLAG
> >> ETHTOOL_A_FLAG_INDEX = 8
> >> ETHTOOL_A_FLAG_NAME = "something-else"
> >> ETHTOOL_A_FLAG_VALUE (NLA_FLAG)
> >
> >This is perfect for "one shot" applications but not so much for long
> >running ones, either "ethtool --monitor" or management or monitoring
> >daemons. Repeating the names in every notification message would be
> >a waste, it's much more convenient to load the strings only once and
>
> Yeah, for those aplications, the ETHTOOL_A_FLAG_NAME could be omitted
>
>
> >cache them. Even if we omit the names in notifications (and possibly the
> >GET replies if client opts for it), this format still takes 12-16 bytes
> >per bit.
> >
> >So the problem I'm trying to address is that there are two types of
> >clients with very different mode of work and different preferences.
> >
> >Looking at the bitset.c, I would rather say that most of the complexity
> >and ugliness comes from dealing with both unsigned long based bitmaps
> >and u32 based ones. Originally, there were functions working with
> >unsigned long based bitmaps and the variants with "32" suffix were
> >wrappers around them which converted u32 bitmaps to unsigned long ones
> >and back. This became a problem when kernel started issuing warnings
> >about variable length arrays as getting rid of them meant two kmalloc()
> >and two kfree() for each u32 bitmap operation, even if most of the
> >bitmaps are in rather short in practice.
> >
> >Maybe the wrapper could do something like
> >
> >int ethnl_put_bitset32(const u32 *value, const u32 *mask,
> > unsigned int size, ...)
> >{
> > unsigned long fixed_value[2], fixed_mask[2];
> > unsigned long *tmp_value = fixed_value;
> > unsigned long *tmp_mask = fixed_mask;
> >
> > if (size > sizeof(fixed_value) * BITS_PER_BYTE) {
> > tmp_value = bitmap_alloc(size);
> > if (!tmp_value)
> > return -ENOMEM;
> > tmp_mask = bitmap_alloc(size);
> > if (!tmp_mask) {
> > kfree(tmp_value);
> > return -ENOMEM;
> > }
> > }
> >
> > bitmap_from_arr32(tmp_value, value, size);
> > bitmap_from_arr32(tmp_mask, mask, size);
> > ret = ethnl_put_bitset(tmp_value, tmp_mask, size, ...);
> >}
> >
> >This way we would make bitset.c code cleaner while avoiding allocating
> >short bitmaps (which is the most common case).
>
> I'm primarily concerned about the uapi. Plus if the uapi approach is united
> for both index and string, we can omit this whole bitset abomination...
I'm afraid I don't understand this comment. Whatever the representation
of bitmaps (both simple bitmaps and value/mask pairs) is going to be, we
will need a function for parsing them (currently ethnl_update_bitset())
and a function for filling them into the message (currently
ethnl_put_bitset()). Unless you are suggesting to write a copy of
essentially the same parser and composer for each of the bitsets (there
are already 15 of them, plus 4 NLA_BITFIELD32 attributes which I'm
seriously considering replacing with arbitrary length bitsets as well
to make the UAPI as future proof as possible).

After all, what you suggested above is exactly the same structure as my
bitset in verbose form, except that you omit the size (which is
a problem, as discussed in another part of the thread) and put the
contents of the BITS container directly under the main container.
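
To illustrate why one parser serves both forms, here is a small sketch
in userspace C. It is not ethnl_update_bitset(); struct bit_entry,
update_compact() and update_verbose() are hypothetical names, and the
bitmap is assumed to be an array of u32 words with bit i stored in word
i / 32. A per-bit list and a value/mask pair end up as the same update
of the target bitmap, and the known size is what allows an out of range
index to be rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed entry of a per-bit (verbose) list. */
struct bit_entry {
	unsigned int index;
	bool value;
};

/* Compact form: take bits selected by mask from value, keep the rest.
 * Returns true if the bitmap was modified.
 */
static bool update_compact(uint32_t *bitmap, const uint32_t *value,
			   const uint32_t *mask, unsigned int nbits)
{
	unsigned int nwords = (nbits + 31) / 32;
	bool mod = false;
	unsigned int i;

	for (i = 0; i < nwords; i++) {
		uint32_t word = (bitmap[i] & ~mask[i]) | (value[i] & mask[i]);

		mod |= (word != bitmap[i]);
		bitmap[i] = word;
	}
	return mod;
}

/* Verbose form: apply entries one by one.
 * Returns 1 if modified, 0 if unchanged, -1 if an index is out of range
 * (in which case the bitmap is left untouched).
 */
static int update_verbose(uint32_t *bitmap, unsigned int nbits,
			  const struct bit_entry *entries, size_t count)
{
	int mod = 0;
	size_t i;

	/* validate everything first so that a bad request changes nothing */
	for (i = 0; i < count; i++)
		if (entries[i].index >= nbits)
			return -1;

	for (i = 0; i < count; i++) {
		uint32_t bit = UINT32_C(1) << (entries[i].index % 32);
		uint32_t *word = &bitmap[entries[i].index / 32];
		uint32_t updated = entries[i].value ? (*word | bit)
						    : (*word & ~bit);

		mod |= (updated != *word);
		*word = updated;
	}
	return mod;
}
```

Whichever form arrives on the wire, the caller only needs the updated
bitmap and the "modified" result, so the choice of representation does
not have to leak into the per-command handlers.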
Michal