This is an almost complete rework of [0].
This series introduces generic XDP statistics infra based on rtnl
xstats (previously Ethtool standard stats), and wires up the drivers
which collect appropriate statistics to this new interface. Finally,
it introduces XDP/XSK statistics to all XDP-capable Intel drivers.
Those counters are:
* packets: number of frames passed to bpf_prog_run_xdp().
* bytes: number of bytes that went through bpf_prog_run_xdp().
* errors: number of general XDP errors, if the driver has one
unified counter.
* aborted: number of XDP_ABORTED returns.
* drop: number of XDP_DROP returns.
* invalid: number of returns of disallowed values (i.e. not XDP_*).
* pass: number of XDP_PASS returns.
* redirect: number of successfully performed XDP_REDIRECT requests.
* redirect_errors: number of failed XDP_REDIRECT requests.
* tx: number of successfully performed XDP_TX requests.
* tx_errors: number of failed XDP_TX requests.
* xmit_packets: number of successfully transmitted XDP/XSK frames.
* xmit_bytes: number of successfully transmitted XDP/XSK bytes.
* xmit_errors: number of XDP/XSK frames that failed to transmit.
* xmit_full: number of times an XDP/XSK queue was full at the moment
of transmission.
To provide them, developers need to implement .ndo_get_xdp_stats()
and, if they want to expose stats on a per-channel basis,
.ndo_get_xdp_stats_nch(). include/net/xdp.h contains some helper
structs and functions which might be useful for doing this.
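For illustration, a driver-side implementation might look roughly
like this ('mydrv' and its fields are hypothetical, only the NDOs,
the uAPI constants and struct ifla_xdp_stats come from this series):

static int mydrv_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
{
	const struct mydrv_priv *priv = netdev_priv(dev);

	switch (attr_id) {
	case IFLA_XDP_XSTATS_TYPE_XDP:
		/* Stats are kept per channel */
		return priv->num_channels;
	default:
		return -EOPNOTSUPP;
	}
}

static int mydrv_get_xdp_stats(const struct net_device *dev, u32 attr_id,
			       void *attr_data)
{
	const struct mydrv_priv *priv = netdev_priv(dev);
	struct ifla_xdp_stats *xdp_stats = attr_data;
	u32 i;

	if (attr_id != IFLA_XDP_XSTATS_TYPE_XDP)
		return -EOPNOTSUPP;

	for (i = 0; i < priv->num_channels; i++, xdp_stats++) {
		/* One &ifla_xdp_stats per channel; fields not filled
		 * here stay zeroed, the buffer comes zero-allocated
		 */
		xdp_stats->pass = priv->channels[i].xdp_pass;
		xdp_stats->drop = priv->channels[i].xdp_drop;
	}

	return 0;
}

and then in the driver's net_device_ops:

	.ndo_get_xdp_stats_nch	= mydrv_get_xdp_stats_nch,
	.ndo_get_xdp_stats	= mydrv_get_xdp_stats,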
It is up to developers to decide whether to implement XDP stats in
their drivers, depending on their needs and so on, but if they do,
it is implied that the stats will be implemented using this new
infra rather than custom Ethtool entries. The XDP stats {,type} list
can be expanded if needed as the counters are provided as nested NL
attrs.
There's an option to provide XDP and XSK counters separately, and I
used it in the mlx5 (it has separate RQs and SQs) and Intel (they
have separate NAPI poll routines) drivers.
Example output of iproute2's new command:
$ ip link xdpstats dev enp178s0
16: enp178s0:
xdp-channel0-rx_xdp_packets: 0
xdp-channel0-rx_xdp_bytes: 1
xdp-channel0-rx_xdp_errors: 2
xdp-channel0-rx_xdp_aborted: 3
xdp-channel0-rx_xdp_drop: 4
xdp-channel0-rx_xdp_invalid: 5
xdp-channel0-rx_xdp_pass: 6
xdp-channel0-rx_xdp_redirect: 7
xdp-channel0-rx_xdp_redirect_errors: 8
xdp-channel0-rx_xdp_tx: 9
xdp-channel0-rx_xdp_tx_errors: 10
xdp-channel0-tx_xdp_xmit_packets: 11
xdp-channel0-tx_xdp_xmit_bytes: 12
xdp-channel0-tx_xdp_xmit_errors: 13
xdp-channel0-tx_xdp_xmit_full: 14
[ ... ]
This series doesn't touch the existing Ethtool-exposed XDP stats due
to the potential breakage of custom{,er} scripts and other stuff
that might be hardwired to their presence. Developers are free to
drop them on their own if possible. In an ideal case we would see
Ethtool stats free from anything related to XDP, but it's unlikely
to happen (:
XDP_PASS kpps on an ice NIC with ~50 Gbps line rate:
Frame size      64 |   128 |   256 |   512 |  1024 |  1532
-----------------------------------------------------------
net-next     23557 | 23750 | 20731 | 11723 |  6270 |  4377
This series  23484 | 23812 | 20679 | 11720 |  6270 |  4377
The same situation with XDP_DROP and several more common cases:
nothing past stddev (which is a bit surprising, but I'm not
complaining at all).
A brief series breakdown:
* 1-2: introduce new infra and APIs, for rtnetlink and drivers
respectively;
* 3-19: add needed callback to the existing drivers to export
their stats using new infra. Some of the patches are cosmetic
prereqs;
* 20-25: add XDP/XSK stats to all Intel drivers;
* 26: mention generic XDP stats in Documentation.
This set is also available here: [1]
A separate iproute2-next patch will be published in parallel,
for now you can find it here: [2]
From v1 [0]:
- use rtnl xstats instead of Ethtool standard -- XDP stats are
purely software while Ethtool infra was designed for HW stats
(Jakub);
- split xmit into xmit_packets and xmit_bytes, add xmit_full;
- don't touch existing drivers custom XDP stats exposed via
Ethtool (Jakub);
- add a bunch of helper structs and functions to reduce boilerplate
and code duplication in new drivers;
- merge with the series which adds XDP stats to Intel drivers;
- add some perf numbers per Jakub's request.
[0] https://lore.kernel.org/all/[email protected]
[1] https://github.com/alobakin/linux/pull/11
[2] https://github.com/alobakin/iproute2/pull/1
Alexander Lobakin (26):
rtnetlink: introduce generic XDP statistics
xdp: provide common driver helpers for implementing XDP stats
ena: implement generic XDP statistics callbacks
dpaa2: implement generic XDP stats callbacks
enetc: implement generic XDP stats callbacks
mvneta: reformat mvneta_netdev_ops
mvneta: add .ndo_get_xdp_stats() callback
mvpp2: provide .ndo_get_xdp_stats() callback
mlx5: don't mix XDP_DROP and Rx XDP error cases
mlx5: provide generic XDP stats callbacks
sf100, sfx: implement generic XDP stats callbacks
veth: don't mix XDP_DROP counter with Rx XDP errors
veth: drop 'xdp_' prefix from packets and bytes stats
veth: reformat veth_netdev_ops
veth: add generic XDP stats callbacks
virtio_net: don't mix XDP_DROP counter with Rx XDP errors
virtio_net: rename xdp_tx{,_drops} SQ stats to xdp_xmit{,_errors}
virtio_net: reformat virtnet_netdev
virtio_net: add callbacks for generic XDP stats
i40e: add XDP and XSK generic per-channel statistics
ice: add XDP and XSK generic per-channel statistics
igb: add XDP generic per-channel statistics
igc: bail out early on XSK xmit if no descs are available
igc: add XDP and XSK generic per-channel statistics
ixgbe: add XDP and XSK generic per-channel statistics
Documentation: reflect generic XDP statistics
Documentation/networking/statistics.rst | 33 +++
drivers/net/ethernet/amazon/ena/ena_netdev.c | 53 ++++
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 45 +++
drivers/net/ethernet/freescale/enetc/enetc.c | 48 ++++
drivers/net/ethernet/freescale/enetc/enetc.h | 3 +
.../net/ethernet/freescale/enetc/enetc_pf.c | 2 +
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 38 ++-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 40 ++-
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 33 ++-
drivers/net/ethernet/intel/ice/ice.h | 2 +
drivers/net/ethernet/intel/ice/ice_lib.c | 21 ++
drivers/net/ethernet/intel/ice/ice_main.c | 17 ++
drivers/net/ethernet/intel/ice/ice_txrx.c | 33 ++-
drivers/net/ethernet/intel/ice/ice_txrx.h | 12 +-
drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 3 +
drivers/net/ethernet/intel/ice/ice_xsk.c | 51 +++-
drivers/net/ethernet/intel/igb/igb.h | 14 +-
drivers/net/ethernet/intel/igb/igb_main.c | 102 ++++++-
drivers/net/ethernet/intel/igc/igc.h | 3 +
drivers/net/ethernet/intel/igc/igc_main.c | 89 +++++-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 +
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 3 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 69 ++++-
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 56 +++-
drivers/net/ethernet/marvell/mvneta.c | 78 +++++-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 51 ++++
drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 +
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +
.../ethernet/mellanox/mlx5/core/en_stats.c | 76 +++++
.../ethernet/mellanox/mlx5/core/en_stats.h | 3 +
drivers/net/ethernet/sfc/ef100_netdev.c | 2 +
drivers/net/ethernet/sfc/efx.c | 2 +
drivers/net/ethernet/sfc/efx_common.c | 42 +++
drivers/net/ethernet/sfc/efx_common.h | 3 +
drivers/net/veth.c | 128 +++++++--
drivers/net/virtio_net.c | 104 +++++--
include/linux/if_link.h | 39 ++-
include/linux/netdevice.h | 13 +
include/net/xdp.h | 162 +++++++++++
include/uapi/linux/if_link.h | 67 +++++
net/core/rtnetlink.c | 264 ++++++++++++++++++
net/core/xdp.c | 124 ++++++++
45 files changed, 1798 insertions(+), 143 deletions(-)
--
2.33.1
Lots of XDP-enabled drivers provide some statistics on XDP program
runs and the different actions taken (number of passes, drops,
redirects etc.). Given that these are quite similar across all the
drivers (which is expected), we can implement some sort of generic
statistics using the rtnetlink xstats infra to provide a way of
exposing XDP/XSK statistics without code and stringset duplication
inside the drivers' Ethtool callbacks.
These 15 fields provided by the standard XDP stats should cover most
stats that might be interesting to collect and track.
Note that most NIC drivers keep XDP statistics on a per-channel
basis, so this also introduces a new callback for getting the number
of channels which a driver will provide stats for. If it's not
implemented, we assume the driver's stats are shared across channels.
If it is implemented, it should return either the number of channels;
0 if stats for this type (XDP or XSK) are shared; -EOPNOTSUPP if the
driver doesn't store such a type of statistics; or -ENODATA if it
does, but can't provide them right now for any reason (purely for
better code readability, it acts the same as -EOPNOTSUPP).
Stats are provided as nested attrs to be able to expand them later
on.
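To illustrate, the resulting attribute layout is roughly as follows
(a sketch based on the nesting in the patch below; the SCOPE_CHANNEL
nest is repeated per channel, or replaced by a single SCOPE_SHARED
one for netdev-wide stats):

IFLA_STATS_LINK_XDP_XSTATS (nest)
    IFLA_XDP_XSTATS_TYPE_XDP (nest)
        IFLA_XDP_XSTATS_SCOPE_CHANNEL (nest)
            IFLA_XDP_XSTATS_PACKETS (u64)
            IFLA_XDP_XSTATS_BYTES (u64)
            ... up to IFLA_XDP_XSTATS_XMIT_FULL (u64)
    IFLA_XDP_XSTATS_TYPE_XSK (nest)
        (same as above)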
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
---
include/linux/if_link.h | 39 +++++-
include/linux/netdevice.h | 12 ++
include/uapi/linux/if_link.h | 67 +++++++++
net/core/rtnetlink.c | 264 +++++++++++++++++++++++++++++++++++
4 files changed, 381 insertions(+), 1 deletion(-)
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 622658dfbf0a..a0dac6cb3e6a 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -4,8 +4,8 @@
#include <uapi/linux/if_link.h>
-/* We don't want this structure exposed to user space */
+/* We don't want these structures exposed to user space */
struct ifla_vf_stats {
__u64 rx_packets;
__u64 tx_packets;
@@ -30,4 +30,41 @@ struct ifla_vf_info {
__u32 trusted;
__be16 vlan_proto;
};
+
+/**
+ * struct ifla_xdp_stats - driver-side XDP statistics
+ * @packets: number of frames passed to bpf_prog_run_xdp().
+ * @bytes: number of bytes that went through bpf_prog_run_xdp().
+ * @errors: number of general XDP errors, if the driver has one unified counter.
+ * @aborted: number of %XDP_ABORTED returns.
+ * @drop: number of %XDP_DROP returns.
+ * @invalid: number of returns of disallowed values (i.e. not XDP_*).
+ * @pass: number of %XDP_PASS returns.
+ * @redirect: number of successfully performed %XDP_REDIRECT requests.
+ * @redirect_errors: number of failed %XDP_REDIRECT requests.
+ * @tx: number of successfully performed %XDP_TX requests.
+ * @tx_errors: number of failed %XDP_TX requests.
+ * @xmit_packets: number of successfully transmitted XDP/XSK frames.
+ * @xmit_bytes: number of successfully transmitted XDP/XSK bytes.
+ * @xmit_errors: number of XDP/XSK frames that failed to transmit.
+ * @xmit_full: number of times the XDP/XSK queue was full on transmission.
+ */
+struct ifla_xdp_stats {
+ __u64 packets;
+ __u64 bytes;
+ __u64 errors;
+ __u64 aborted;
+ __u64 drop;
+ __u64 invalid;
+ __u64 pass;
+ __u64 redirect;
+ __u64 redirect_errors;
+ __u64 tx;
+ __u64 tx_errors;
+ __u64 xmit_packets;
+ __u64 xmit_bytes;
+ __u64 xmit_errors;
+ __u64 xmit_full;
+};
+
#endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index db3bff1ae7fd..058a00c2d19a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1327,6 +1327,13 @@ struct netdev_net_notifier {
* queue id bound to an AF_XDP socket. The flags field specifies if
* only RX, only Tx, or both should be woken up using the flags
* XDP_WAKEUP_RX and XDP_WAKEUP_TX.
+ * int (*ndo_get_xdp_stats_nch)(const struct net_device *dev, u32 attr_id);
+ * Get the number of channels which ndo_get_xdp_stats will return
+ * statistics for.
+ *
+ * int (*ndo_get_xdp_stats)(const struct net_device *dev, u32 attr_id,
+ * void *attr_data);
+ * Get attr_id XDP statistics into the attr_data pointer.
* struct devlink_port *(*ndo_get_devlink_port)(struct net_device *dev);
* Get devlink port instance associated with a given netdev.
* Called with a reference on the netdevice and devlink locks only,
@@ -1550,6 +1557,11 @@ struct net_device_ops {
struct xdp_buff *xdp);
int (*ndo_xsk_wakeup)(struct net_device *dev,
u32 queue_id, u32 flags);
+ int (*ndo_get_xdp_stats_nch)(const struct net_device *dev,
+ u32 attr_id);
+ int (*ndo_get_xdp_stats)(const struct net_device *dev,
+ u32 attr_id,
+ void *attr_data);
struct devlink_port * (*ndo_get_devlink_port)(struct net_device *dev);
int (*ndo_tunnel_ctl)(struct net_device *dev,
struct ip_tunnel_parm *p, int cmd);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index eebd3894fe89..dc1dd31e8274 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -1147,6 +1147,7 @@ enum {
IFLA_STATS_LINK_XSTATS_SLAVE,
IFLA_STATS_LINK_OFFLOAD_XSTATS,
IFLA_STATS_AF_SPEC,
+ IFLA_STATS_LINK_XDP_XSTATS,
__IFLA_STATS_MAX,
};
@@ -1175,6 +1176,72 @@ enum {
};
#define IFLA_OFFLOAD_XSTATS_MAX (__IFLA_OFFLOAD_XSTATS_MAX - 1)
+/* These are embedded into IFLA_STATS_LINK_XDP_XSTATS */
+enum {
+ IFLA_XDP_XSTATS_TYPE_UNSPEC,
+ /* Stats collected on a "regular" channel(s) */
+ IFLA_XDP_XSTATS_TYPE_XDP,
+ /* Stats collected on an XSK channel(s) */
+ IFLA_XDP_XSTATS_TYPE_XSK,
+
+ __IFLA_XDP_XSTATS_TYPE_CNT,
+};
+
+#define IFLA_XDP_XSTATS_TYPE_START (IFLA_XDP_XSTATS_TYPE_UNSPEC + 1)
+#define IFLA_XDP_XSTATS_TYPE_MAX (__IFLA_XDP_XSTATS_TYPE_CNT - 1)
+
+/* Embedded into IFLA_XDP_XSTATS_TYPE_XDP or IFLA_XDP_XSTATS_TYPE_XSK */
+enum {
+ IFLA_XDP_XSTATS_SCOPE_UNSPEC,
+ /* netdev-wide stats */
+ IFLA_XDP_XSTATS_SCOPE_SHARED,
+ /* Per-channel stats */
+ IFLA_XDP_XSTATS_SCOPE_CHANNEL,
+
+ __IFLA_XDP_XSTATS_SCOPE_CNT,
+};
+
+/* Embedded into IFLA_XDP_XSTATS_SCOPE_SHARED/IFLA_XDP_XSTATS_SCOPE_CHANNEL */
+enum {
+ /* Padding for 64-bit alignment */
+ IFLA_XDP_XSTATS_UNSPEC,
+ /* Number of frames passed to bpf_prog_run_xdp() */
+ IFLA_XDP_XSTATS_PACKETS,
+	/* Number of bytes that went through bpf_prog_run_xdp() */
+ IFLA_XDP_XSTATS_BYTES,
+	/* Number of general XDP errors if the driver counts them together */
+ IFLA_XDP_XSTATS_ERRORS,
+ /* Number of %XDP_ABORTED returns */
+ IFLA_XDP_XSTATS_ABORTED,
+ /* Number of %XDP_DROP returns */
+ IFLA_XDP_XSTATS_DROP,
+	/* Number of returns of disallowed values (i.e. not XDP_*) */
+ IFLA_XDP_XSTATS_INVALID,
+ /* Number of %XDP_PASS returns */
+ IFLA_XDP_XSTATS_PASS,
+ /* Number of successfully performed %XDP_REDIRECT requests */
+ IFLA_XDP_XSTATS_REDIRECT,
+ /* Number of failed %XDP_REDIRECT requests */
+ IFLA_XDP_XSTATS_REDIRECT_ERRORS,
+ /* Number of successfully performed %XDP_TX requests */
+ IFLA_XDP_XSTATS_TX,
+ /* Number of failed %XDP_TX requests */
+ IFLA_XDP_XSTATS_TX_ERRORS,
+ /* Number of successfully transmitted XDP/XSK frames */
+ IFLA_XDP_XSTATS_XMIT_PACKETS,
+ /* Number of successfully transmitted XDP/XSK bytes */
+ IFLA_XDP_XSTATS_XMIT_BYTES,
+ /* Number of XDP/XSK frames failed to transmit */
+ IFLA_XDP_XSTATS_XMIT_ERRORS,
+	/* Number of times the XDP/XSK queue was full on transmission */
+ IFLA_XDP_XSTATS_XMIT_FULL,
+
+ __IFLA_XDP_XSTATS_CNT,
+};
+
+#define IFLA_XDP_XSTATS_START (IFLA_XDP_XSTATS_UNSPEC + 1)
+#define IFLA_XDP_XSTATS_MAX (__IFLA_XDP_XSTATS_CNT - 1)
+
/* XDP section */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6f25c0a8aebe..b7db68fb0879 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -5107,6 +5107,262 @@ static int rtnl_get_offload_stats_size(const struct net_device *dev)
return nla_size;
}
+#define IFLA_XDP_XSTATS_NUM (__IFLA_XDP_XSTATS_CNT - \
+ IFLA_XDP_XSTATS_START)
+
+static_assert(sizeof(struct ifla_xdp_stats) / sizeof(__u64) ==
+ IFLA_XDP_XSTATS_NUM);
+
+static u32 rtnl_get_xdp_stats_num(u32 attr_id)
+{
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ return IFLA_XDP_XSTATS_NUM;
+ default:
+ return 0;
+ }
+}
+
+static bool rtnl_get_xdp_stats_xdpxsk(struct sk_buff *skb, u32 ch,
+ const void *attr_data)
+{
+ const struct ifla_xdp_stats *xstats = attr_data;
+
+ xstats += ch;
+
+ if (nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PACKETS, xstats->packets,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_BYTES, xstats->bytes,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_ERRORS, xstats->errors,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_ABORTED, xstats->aborted,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_DROP, xstats->drop,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_INVALID, xstats->invalid,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PASS, xstats->pass,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_REDIRECT, xstats->redirect,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_REDIRECT_ERRORS,
+ xstats->redirect_errors,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_TX, xstats->tx,
+ IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_TX_ERRORS,
+ xstats->tx_errors, IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_PACKETS,
+ xstats->xmit_packets, IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_BYTES,
+ xstats->xmit_bytes, IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_ERRORS,
+ xstats->xmit_errors, IFLA_XDP_XSTATS_UNSPEC) ||
+ nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_FULL,
+ xstats->xmit_full, IFLA_XDP_XSTATS_UNSPEC))
+ return false;
+
+ return true;
+}
+
+static bool rtnl_get_xdp_stats_one(struct sk_buff *skb, u32 attr_id,
+ u32 scope_id, u32 ch, const void *attr_data)
+{
+ struct nlattr *scope;
+
+ scope = nla_nest_start_noflag(skb, scope_id);
+ if (!scope)
+ return false;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ if (!rtnl_get_xdp_stats_xdpxsk(skb, ch, attr_data))
+ goto fail;
+
+ break;
+ default:
+fail:
+ nla_nest_cancel(skb, scope);
+
+ return false;
+ }
+
+ nla_nest_end(skb, scope);
+
+ return true;
+}
+
+static bool rtnl_get_xdp_stats(struct sk_buff *skb,
+ const struct net_device *dev,
+ int *idxattr, int *prividx)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+ struct nlattr *xstats, *type = NULL;
+ u32 saved_ch = *prividx & U16_MAX;
+ u32 saved_attr = *prividx >> 16;
+ bool nuke_xstats = true;
+ u32 attr_id, ch = 0;
+ int ret;
+
+ if (!ops || !ops->ndo_get_xdp_stats)
+ goto nodata;
+
+ *idxattr = IFLA_STATS_LINK_XDP_XSTATS;
+
+ xstats = nla_nest_start_noflag(skb, IFLA_STATS_LINK_XDP_XSTATS);
+ if (!xstats)
+ return false;
+
+ for (attr_id = IFLA_XDP_XSTATS_TYPE_START;
+ attr_id < __IFLA_XDP_XSTATS_TYPE_CNT;
+ attr_id++) {
+ u32 nstat, scope_id, nch;
+ bool nuke_type = true;
+ void *attr_data;
+ size_t size;
+
+ if (attr_id > saved_attr)
+ saved_ch = 0;
+ if (attr_id < saved_attr)
+ continue;
+
+ nstat = rtnl_get_xdp_stats_num(attr_id);
+ if (!nstat)
+ continue;
+
+ scope_id = IFLA_XDP_XSTATS_SCOPE_SHARED;
+ nch = 1;
+
+ if (!ops->ndo_get_xdp_stats_nch)
+ goto shared;
+
+ ret = ops->ndo_get_xdp_stats_nch(dev, attr_id);
+ if (ret == -EOPNOTSUPP || ret == -ENODATA)
+ continue;
+ if (ret < 0)
+ goto out;
+ if (!ret)
+ goto shared;
+
+ scope_id = IFLA_XDP_XSTATS_SCOPE_CHANNEL;
+ nch = ret;
+
+shared:
+ size = array3_size(nch, nstat, sizeof(__u64));
+ if (unlikely(size == SIZE_MAX)) {
+ ret = -EOVERFLOW;
+ goto out;
+ }
+
+ attr_data = kzalloc(size, GFP_KERNEL);
+ if (!attr_data) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = ops->ndo_get_xdp_stats(dev, attr_id, attr_data);
+ if (ret == -EOPNOTSUPP || ret == -ENODATA)
+ goto kfree_cont;
+ if (ret) {
+kfree_out:
+ kfree(attr_data);
+ goto out;
+ }
+
+ ret = -EMSGSIZE;
+
+ type = nla_nest_start_noflag(skb, attr_id);
+ if (!type)
+ goto kfree_out;
+
+ for (ch = saved_ch; ch < nch; ch++)
+ if (!rtnl_get_xdp_stats_one(skb, attr_id, scope_id,
+ ch, attr_data)) {
+ if (nuke_type)
+ nla_nest_cancel(skb, type);
+ else
+ nla_nest_end(skb, type);
+
+ goto kfree_out;
+ } else {
+ nuke_xstats = false;
+ nuke_type = false;
+ }
+
+ nla_nest_end(skb, type);
+kfree_cont:
+ kfree(attr_data);
+ }
+
+ ret = 0;
+
+out:
+ if (nuke_xstats)
+ nla_nest_cancel(skb, xstats);
+ else
+ nla_nest_end(skb, xstats);
+
+ if (ret && ret != -EOPNOTSUPP && ret != -ENODATA) {
+ /* If the driver has 60+ queues, we can run out of skb
+ * tailroom even when putting stats for one type. Save
+ * channel number in prividx to resume from it next time
+ * rather than restaring the whole type and running into
+		 * rather than restarting the whole type and running into
+ */
+ *prividx = (attr_id << 16) | ch;
+ return false;
+ }
+
+ *prividx = 0;
+nodata:
+ *idxattr = 0;
+
+ return true;
+}
+
+static size_t rtnl_get_xdp_stats_size(const struct net_device *dev)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+ size_t size = 0;
+ u32 attr_id;
+
+ if (!ops || !ops->ndo_get_xdp_stats)
+ return 0;
+
+ for (attr_id = IFLA_XDP_XSTATS_TYPE_START;
+ attr_id < __IFLA_XDP_XSTATS_TYPE_CNT;
+ attr_id++) {
+ u32 nstat = rtnl_get_xdp_stats_num(attr_id);
+ u32 nch = 1;
+ int ret;
+
+ if (!nstat)
+ continue;
+
+ if (!ops->ndo_get_xdp_stats_nch)
+ goto shared;
+
+ ret = ops->ndo_get_xdp_stats_nch(dev, attr_id);
+ if (ret < 0)
+ continue;
+ if (ret > 0)
+ nch = ret;
+
+shared:
+ size += nla_total_size(0) + /* IFLA_XDP_XSTATS_TYPE_* */
+ (nla_total_size(0) + /* IFLA_XDP_XSTATS_SCOPE_* */
+ nla_total_size_64bit(sizeof(__u64)) * nstat) * nch;
+ }
+
+ if (size)
+ size += nla_total_size(0); /* IFLA_STATS_LINK_XDP_XSTATS */
+
+ return size;
+}
+
static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags, unsigned int filter_mask,
@@ -5243,6 +5499,11 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
*idxattr = 0;
}
+ if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_XDP_XSTATS,
+ *idxattr) &&
+ !rtnl_get_xdp_stats(skb, dev, idxattr, prividx))
+ goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
@@ -5318,6 +5579,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
rcu_read_unlock();
}
+ if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_XDP_XSTATS, 0))
+ size += rtnl_get_xdp_stats_size(dev);
+
return size;
}
--
2.33.1
The ena driver has 6 XDP counters collected per channel. Add
callbacks for getting the number of channels and those counters
using the generic XDP stats infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/amazon/ena/ena_netdev.c | 53 ++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 7d5d885d85d5..83e9b85cc998 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3313,12 +3313,65 @@ static void ena_get_stats64(struct net_device *netdev,
stats->tx_errors = 0;
}
+static int ena_get_xdp_stats_nch(const struct net_device *netdev, u32 attr_id)
+{
+ const struct ena_adapter *adapter = netdev_priv(netdev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return adapter->num_io_queues;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int ena_get_xdp_stats(const struct net_device *netdev, u32 attr_id,
+ void *attr_data)
+{
+ const struct ena_adapter *adapter = netdev_priv(netdev);
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < adapter->num_io_queues; i++) {
+ const struct u64_stats_sync *syncp;
+ const struct ena_stats_rx *stats;
+ u32 start;
+
+ stats = &adapter->rx_ring[i].rx_stats;
+ syncp = &adapter->rx_ring[i].syncp;
+
+ do {
+ start = u64_stats_fetch_begin_irq(syncp);
+
+ xdp_stats->drop = stats->xdp_drop;
+ xdp_stats->pass = stats->xdp_pass;
+ xdp_stats->tx = stats->xdp_tx;
+ xdp_stats->redirect = stats->xdp_redirect;
+ xdp_stats->aborted = stats->xdp_aborted;
+ xdp_stats->invalid = stats->xdp_invalid;
+ } while (u64_stats_fetch_retry_irq(syncp, start));
+
+ xdp_stats++;
+ }
+
+ return 0;
+}
+
static const struct net_device_ops ena_netdev_ops = {
.ndo_open = ena_open,
.ndo_stop = ena_close,
.ndo_start_xmit = ena_start_xmit,
.ndo_select_queue = ena_select_queue,
.ndo_get_stats64 = ena_get_stats64,
+ .ndo_get_xdp_stats_nch = ena_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = ena_get_xdp_stats,
.ndo_tx_timeout = ena_tx_timeout,
.ndo_change_mtu = ena_change_mtu,
.ndo_set_mac_address = NULL,
--
2.33.1
Add several shorthands to reduce driver boilerplate and unify
storing and accessing generic XDP statistics in the drivers.
If the driver has one of xdp_{rx,tx}_drv_stats embedded into
a ring structure, it can reuse pretty much everything, but needs
to implement its own .ndo_get_xdp_stats() and, if needed,
.ndo_get_xdp_stats_nch(). If the driver stores a separate array
of xdp_drv_stats, it can then export it as net_device::xstats,
implement only .ndo_get_xdp_stats_nch() and wire up
xdp_get_drv_stats_generic() as .ndo_get_xdp_stats().
Both the XDP and XSK blocks of xdp_drv_stats are cacheline-aligned
to avoid false sharing; only the extremely unlikely 'aborted' and
'invalid' counters fall out of a 64-byte CL. xdp_rx_drv_stats_local
is provided to be put on the stack for collecting the stats on the
hotpath, with the real container and its atomic/seqcount sync points
being accessed just once when exiting Rx NAPI polling.
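A rough hotpath sketch of that scheme ('mydrv' names are
hypothetical, the xdp_* types and helpers are the ones added here;
XDP_TX/XDP_REDIRECT/XDP_ABORTED handling is omitted for brevity):

static u32 mydrv_run_xdp(struct bpf_prog *prog, struct xdp_buff *xdp,
			 struct xdp_rx_drv_stats_local *lrstats)
{
	u32 act = bpf_prog_run_xdp(prog, xdp);

	/* Count every frame and its length on the stack */
	lrstats->packets++;
	lrstats->bytes += xdp->data_end - xdp->data;

	switch (act) {
	case XDP_PASS:
		lrstats->pass++;
		break;
	case XDP_DROP:
		lrstats->drop++;
		break;
	default:
		lrstats->invalid++;
		break;
	}

	return act;
}

The Rx NAPI poll routine then declares a zeroed
struct xdp_rx_drv_stats_local on its stack, passes it to the
function above for each frame, and calls xdp_update_rx_drv_stats()
on the ring's xdp_rx (or xsk_rx) block once right before exiting.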
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
---
include/linux/netdevice.h | 1 +
include/net/xdp.h | 162 ++++++++++++++++++++++++++++++++++++++
net/core/xdp.c | 124 +++++++++++++++++++++++++++++
3 files changed, 287 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 058a00c2d19a..728c650d290e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2225,6 +2225,7 @@ struct net_device {
struct pcpu_lstats __percpu *lstats;
struct pcpu_sw_netstats __percpu *tstats;
struct pcpu_dstats __percpu *dstats;
+ struct xdp_drv_stats /* per-channel */ *xstats;
};
#if IS_ENABLED(CONFIG_GARP)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 447f9b1578f3..e4f06a34d462 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -7,6 +7,7 @@
#define __LINUX_NET_XDP_H__
#include <linux/skbuff.h> /* skb_shared_info */
+#include <linux/u64_stats_sync.h> /* u64_stats_* */
/**
* DOC: XDP RX-queue information
@@ -292,4 +293,165 @@ void xdp_attachment_setup(struct xdp_attachment_info *info,
#define DEV_MAP_BULK_SIZE XDP_BULK_QUEUE_SIZE
+/* Suggested XDP/XSK driver stats. They mirror &ifla_xdp_stats except
+ * for the generic errors; refer to its documentation for the details.
+ * The intended usage is to either have them as a standalone array
+ * of xdp_drv_stats, or embed &xdp_{rx,tx}_drv_stats into a ring
+ * structure. Having separate XDP and XSK counters is recommended.
+ */
+
+struct ifla_xdp_stats;
+
+struct xdp_rx_drv_stats {
+ struct u64_stats_sync syncp;
+ u64_stats_t packets;
+ u64_stats_t bytes;
+ u64_stats_t pass;
+ u64_stats_t drop;
+ u64_stats_t redirect;
+ u64_stats_t tx;
+ u64_stats_t redirect_errors;
+ u64_stats_t tx_errors;
+ u64_stats_t aborted;
+ u64_stats_t invalid;
+};
+
+struct xdp_tx_drv_stats {
+ struct u64_stats_sync syncp;
+ u64_stats_t packets;
+ u64_stats_t bytes;
+ u64_stats_t errors;
+ u64_stats_t full;
+};
+
+struct xdp_drv_stats {
+ struct xdp_rx_drv_stats xdp_rx;
+ struct xdp_tx_drv_stats xdp_tx;
+ struct xdp_rx_drv_stats xsk_rx ____cacheline_aligned;
+ struct xdp_tx_drv_stats xsk_tx;
+} ____cacheline_aligned;
+
+/* Shortened copy of Rx stats to put on stack */
+struct xdp_rx_drv_stats_local {
+ u32 bytes;
+ u32 packets;
+ u32 pass;
+ u32 drop;
+ u32 tx;
+ u32 tx_errors;
+ u32 redirect;
+ u32 redirect_errors;
+ u32 aborted;
+ u32 invalid;
+};
+
+#define xdp_init_rx_drv_stats(rstats) u64_stats_init(&(rstats)->syncp)
+#define xdp_init_tx_drv_stats(tstats) u64_stats_init(&(tstats)->syncp)
+
+/**
+ * xdp_init_drv_stats - initialize driver XDP stats
+ * @xdp_stats: driver container if it uses generic xdp_drv_stats
+ *
+ * Initializes atomic/seqcount sync points inside the containers.
+ */
+static inline void xdp_init_drv_stats(struct xdp_drv_stats *xdp_stats)
+{
+ xdp_init_rx_drv_stats(&xdp_stats->xdp_rx);
+ xdp_init_tx_drv_stats(&xdp_stats->xdp_tx);
+ xdp_init_rx_drv_stats(&xdp_stats->xsk_rx);
+ xdp_init_tx_drv_stats(&xdp_stats->xsk_tx);
+}
+
+/**
+ * xdp_update_rx_drv_stats - update driver XDP stats
+ * @rstats: target driver container
+ * @lrstats: filled onstack structure
+ *
+ * Adds Rx path XDP statistics from the onstack structure to the
+ * driver container, respecting atomic/seqcount synchronization.
+ * Typical usage is to call it at the end of Rx NAPI polling.
+ */
+static inline void
+xdp_update_rx_drv_stats(struct xdp_rx_drv_stats *rstats,
+ const struct xdp_rx_drv_stats_local *lrstats)
+{
+ if (!lrstats->packets)
+ return;
+
+ u64_stats_update_begin(&rstats->syncp);
+ u64_stats_add(&rstats->packets, lrstats->packets);
+ u64_stats_add(&rstats->bytes, lrstats->bytes);
+ u64_stats_add(&rstats->pass, lrstats->pass);
+ u64_stats_add(&rstats->drop, lrstats->drop);
+ u64_stats_add(&rstats->redirect, lrstats->redirect);
+ u64_stats_add(&rstats->tx, lrstats->tx);
+ u64_stats_add(&rstats->redirect_errors, lrstats->redirect_errors);
+ u64_stats_add(&rstats->tx_errors, lrstats->tx_errors);
+ u64_stats_add(&rstats->aborted, lrstats->aborted);
+ u64_stats_add(&rstats->invalid, lrstats->invalid);
+ u64_stats_update_end(&rstats->syncp);
+}
+
+/**
+ * xdp_update_tx_drv_stats - update driver XDP stats
+ * @tstats: target driver container
+ * @packets: onstack packet counter
+ * @bytes: onstack octet counter
+ *
+ * Adds onstack packet/byte Tx XDP counter values from the current session
+ * to the driver container. Typical usage is to call it on completion path /
+ * Tx NAPI polling.
+ */
+static inline void xdp_update_tx_drv_stats(struct xdp_tx_drv_stats *tstats,
+ u32 packets, u32 bytes)
+{
+ if (!packets)
+ return;
+
+ u64_stats_update_begin(&tstats->syncp);
+ u64_stats_add(&tstats->packets, packets);
+ u64_stats_add(&tstats->bytes, bytes);
+ u64_stats_update_end(&tstats->syncp);
+}
+
+/**
+ * xdp_update_tx_drv_err - update driver Tx XDP errors counter
+ * @tstats: target driver container
+ * @num: onstack error counter / number of non-xmitted frames
+ *
+ * Adds the onstack error Tx XDP counter value from the current
+ * session to the driver container. Typical usage is to call it on
+ * the error path of .ndo_xdp_xmit() / XSK zerocopy xmit.
+ */
+static inline void xdp_update_tx_drv_err(struct xdp_tx_drv_stats *tstats,
+ u32 num)
+{
+ u64_stats_update_begin(&tstats->syncp);
+ u64_stats_add(&tstats->errors, num);
+ u64_stats_update_end(&tstats->syncp);
+}
+
+/**
+ * xdp_update_tx_drv_full - update driver Tx XDP ring full counter
+ * @tstats: target driver container
+ *
+ * Increments the Tx XDP ring full counter of the driver container.
+ * Typical usage is to call it in case no free descs are available
+ * on a ring in .ndo_xdp_xmit() / XSK zerocopy xmit.
+ */
+static inline void xdp_update_tx_drv_full(struct xdp_tx_drv_stats *tstats)
+{
+ u64_stats_update_begin(&tstats->syncp);
+ u64_stats_inc(&tstats->full);
+ u64_stats_update_end(&tstats->syncp);
+}
+
+void xdp_fetch_rx_drv_stats(struct ifla_xdp_stats *if_stats,
+ const struct xdp_rx_drv_stats *rstats);
+void xdp_fetch_tx_drv_stats(struct ifla_xdp_stats *if_stats,
+ const struct xdp_tx_drv_stats *tstats);
+int xdp_get_drv_stats_generic(const struct net_device *dev, u32 attr_id,
+ void *attr_data);
+
#endif /* __LINUX_NET_XDP_H__ */
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 5ddc29f29bad..24980207303c 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -611,3 +611,127 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
return nxdpf;
}
+
+/**
+ * xdp_fetch_rx_drv_stats - helper for implementing .ndo_get_xdp_stats()
+ * @if_stats: target container passed from rtnetlink core
+ * @rstats: driver container if it uses generic xdp_rx_drv_stats
+ *
+ * Fetches Rx path XDP statistics from a suggested driver structure to
+ * the one used by rtnetlink, respecting atomic/seqcount synchronization.
+ */
+void xdp_fetch_rx_drv_stats(struct ifla_xdp_stats *if_stats,
+ const struct xdp_rx_drv_stats *rstats)
+{
+ u32 start;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&rstats->syncp);
+
+ if_stats->packets = u64_stats_read(&rstats->packets);
+ if_stats->bytes = u64_stats_read(&rstats->bytes);
+ if_stats->pass = u64_stats_read(&rstats->pass);
+ if_stats->drop = u64_stats_read(&rstats->drop);
+ if_stats->tx = u64_stats_read(&rstats->tx);
+ if_stats->tx_errors = u64_stats_read(&rstats->tx_errors);
+ if_stats->redirect = u64_stats_read(&rstats->redirect);
+ if_stats->redirect_errors =
+ u64_stats_read(&rstats->redirect_errors);
+ if_stats->aborted = u64_stats_read(&rstats->aborted);
+ if_stats->invalid = u64_stats_read(&rstats->invalid);
+ } while (u64_stats_fetch_retry_irq(&rstats->syncp, start));
+}
+EXPORT_SYMBOL_GPL(xdp_fetch_rx_drv_stats);
+
+/**
+ * xdp_fetch_tx_drv_stats - helper for implementing .ndo_get_xdp_stats()
+ * @if_stats: target container passed from rtnetlink core
+ * @tstats: driver container if it uses generic xdp_tx_drv_stats
+ *
+ * Fetches Tx path XDP statistics from a suggested driver structure to
+ * the one used by rtnetlink, respecting atomic/seqcount synchronization.
+ */
+void xdp_fetch_tx_drv_stats(struct ifla_xdp_stats *if_stats,
+ const struct xdp_tx_drv_stats *tstats)
+{
+ u32 start;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&tstats->syncp);
+
+ if_stats->xmit_packets = u64_stats_read(&tstats->packets);
+ if_stats->xmit_bytes = u64_stats_read(&tstats->bytes);
+ if_stats->xmit_errors = u64_stats_read(&tstats->errors);
+ if_stats->xmit_full = u64_stats_read(&tstats->full);
+ } while (u64_stats_fetch_retry_irq(&tstats->syncp, start));
+}
+EXPORT_SYMBOL_GPL(xdp_fetch_tx_drv_stats);
+
+/**
+ * xdp_get_drv_stats_generic - generic implementation of .ndo_get_xdp_stats()
+ * @dev: network interface device structure
+ * @attr_id: type of statistics (XDP, XSK, ...)
+ * @attr_data: target stats container
+ *
+ * Returns 0 on success, -%EOPNOTSUPP if either driver or this function doesn't
+ * support this attr_id, -%ENODATA if the driver supports attr_id, but can't
+ * provide anything right now, and -%EINVAL if driver configuration is invalid.
+ */
+int xdp_get_drv_stats_generic(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const bool xsk = attr_id == IFLA_XDP_XSTATS_TYPE_XSK;
+ const struct xdp_drv_stats *drv_iter = dev->xstats;
+ const struct net_device_ops *ops = dev->netdev_ops;
+ struct ifla_xdp_stats *iter = attr_data;
+ int nch;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ if (unlikely(!ops->ndo_bpf))
+ return -EINVAL;
+
+ break;
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ if (!ops->ndo_xsk_wakeup)
+ return -EOPNOTSUPP;
+
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ if (unlikely(!drv_iter || !ops->ndo_get_xdp_stats_nch))
+ return -EINVAL;
+
+ nch = ops->ndo_get_xdp_stats_nch(dev, attr_id);
+ switch (nch) {
+ case 0:
+ /* Stats are shared across the netdev */
+ nch = 1;
+ break;
+ case 1 ... INT_MAX:
+ /* Stats are per-channel */
+ break;
+ default:
+ return nch;
+ }
+
+ for (i = 0; i < nch; i++) {
+ const struct xdp_rx_drv_stats *rstats;
+ const struct xdp_tx_drv_stats *tstats;
+
+ rstats = xsk ? &drv_iter->xsk_rx : &drv_iter->xdp_rx;
+ xdp_fetch_rx_drv_stats(iter, rstats);
+
+ tstats = xsk ? &drv_iter->xsk_tx : &drv_iter->xdp_tx;
+ xdp_fetch_tx_drv_stats(iter, tstats);
+
+ drv_iter++;
+ iter++;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_get_drv_stats_generic);
--
2.33.1
Add the ability for dpaa2 to query its 5 per-channel XDP counters
using the generic XDP stats infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
.../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 6451c8383639..7715aecedacc 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -1973,6 +1973,49 @@ static void dpaa2_eth_get_stats(struct net_device *net_dev,
}
}
+static int dpaa2_eth_get_xdp_stats_nch(const struct net_device *net_dev,
+ u32 attr_id)
+{
+ const struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return priv->num_channels;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int dpaa2_eth_get_xdp_stats(const struct net_device *net_dev,
+ u32 attr_id, void *attr_data)
+{
+ const struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < priv->num_channels; i++) {
+ const struct dpaa2_eth_ch_stats *ch_stats;
+
+ ch_stats = &priv->channel[i]->stats;
+
+ xdp_stats->drop = ch_stats->xdp_drop;
+ xdp_stats->redirect = ch_stats->xdp_redirect;
+ xdp_stats->tx = ch_stats->xdp_tx;
+ xdp_stats->tx_errors = ch_stats->xdp_tx_err;
+
+ xdp_stats++;
+ }
+
+ return 0;
+}
+
/* Copy mac unicast addresses from @net_dev to @priv.
* Its sole purpose is to make dpaa2_eth_set_rx_mode() more readable.
*/
@@ -2601,6 +2644,8 @@ static const struct net_device_ops dpaa2_eth_ops = {
.ndo_stop = dpaa2_eth_stop,
.ndo_set_mac_address = dpaa2_eth_set_addr,
.ndo_get_stats64 = dpaa2_eth_get_stats,
+ .ndo_get_xdp_stats_nch = dpaa2_eth_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = dpaa2_eth_get_xdp_stats,
.ndo_set_rx_mode = dpaa2_eth_set_rx_mode,
.ndo_set_features = dpaa2_eth_set_features,
.ndo_eth_ioctl = dpaa2_eth_ioctl,
--
2.33.1
Similarly to dpaa2, enetc stores 5 per-channel counters for XDP.
Add the necessary callbacks to be able to access them using the new
generic XDP stats infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 48 +++++++++++++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 3 ++
.../net/ethernet/freescale/enetc/enetc_pf.c | 2 +
3 files changed, 53 insertions(+)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 504e12554079..ec62765377a7 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2575,6 +2575,54 @@ struct net_device_stats *enetc_get_stats(struct net_device *ndev)
return stats;
}
+int enetc_get_xdp_stats_nch(const struct net_device *ndev, u32 attr_id)
+{
+ const struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return max(priv->num_rx_rings, priv->num_tx_rings);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_get_xdp_stats(const struct net_device *ndev, u32 attr_id,
+ void *attr_data)
+{
+ struct ifla_xdp_stats *xdp_iter, *xdp_stats = attr_data;
+ const struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ const struct enetc_ring_stats *stats;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < priv->num_tx_rings; i++) {
+ stats = &priv->tx_ring[i]->stats;
+ xdp_iter = xdp_stats + i;
+
+ xdp_iter->tx = stats->xdp_tx;
+ xdp_iter->tx_errors = stats->xdp_tx_drops;
+ }
+
+ for (i = 0; i < priv->num_rx_rings; i++) {
+ stats = &priv->rx_ring[i]->stats;
+ xdp_iter = xdp_stats + i;
+
+ xdp_iter->drop = stats->xdp_drops;
+ xdp_iter->redirect = stats->xdp_redirect;
+ xdp_iter->redirect_errors = stats->xdp_redirect_failures;
+ xdp_iter->redirect_errors += stats->xdp_redirect_sg;
+ }
+
+ return 0;
+}
+
static int enetc_set_rss(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index fb39e406b7fc..8f175f0194e3 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -389,6 +389,9 @@ void enetc_start(struct net_device *ndev);
void enetc_stop(struct net_device *ndev);
netdev_tx_t enetc_xmit(struct sk_buff *skb, struct net_device *ndev);
struct net_device_stats *enetc_get_stats(struct net_device *ndev);
+int enetc_get_xdp_stats_nch(const struct net_device *ndev, u32 attr_id);
+int enetc_get_xdp_stats(const struct net_device *ndev, u32 attr_id,
+ void *attr_data);
int enetc_set_features(struct net_device *ndev,
netdev_features_t features);
int enetc_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index fe6a544f37f0..c7776b842a91 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -729,6 +729,8 @@ static const struct net_device_ops enetc_ndev_ops = {
.ndo_stop = enetc_close,
.ndo_start_xmit = enetc_xmit,
.ndo_get_stats = enetc_get_stats,
+ .ndo_get_xdp_stats_nch = enetc_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = enetc_get_xdp_stats,
.ndo_set_mac_address = enetc_pf_set_mac_addr,
.ndo_set_rx_mode = enetc_pf_set_rx_mode,
.ndo_vlan_rx_add_vid = enetc_vlan_rx_add_vid,
--
2.33.1
Some of the initializers are aligned with spaces, others with tabs.
Reindent them using tabs only.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 80e4b500695e..7c30417a0464 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4949,18 +4949,18 @@ static int mvneta_setup_tc(struct net_device *dev, enum tc_setup_type type,
}
static const struct net_device_ops mvneta_netdev_ops = {
- .ndo_open = mvneta_open,
- .ndo_stop = mvneta_stop,
- .ndo_start_xmit = mvneta_tx,
- .ndo_set_rx_mode = mvneta_set_rx_mode,
- .ndo_set_mac_address = mvneta_set_mac_addr,
- .ndo_change_mtu = mvneta_change_mtu,
- .ndo_fix_features = mvneta_fix_features,
- .ndo_get_stats64 = mvneta_get_stats64,
- .ndo_eth_ioctl = mvneta_ioctl,
- .ndo_bpf = mvneta_xdp,
- .ndo_xdp_xmit = mvneta_xdp_xmit,
- .ndo_setup_tc = mvneta_setup_tc,
+ .ndo_open = mvneta_open,
+ .ndo_stop = mvneta_stop,
+ .ndo_start_xmit = mvneta_tx,
+ .ndo_set_rx_mode = mvneta_set_rx_mode,
+ .ndo_set_mac_address = mvneta_set_mac_addr,
+ .ndo_change_mtu = mvneta_change_mtu,
+ .ndo_fix_features = mvneta_fix_features,
+ .ndo_get_stats64 = mvneta_get_stats64,
+ .ndo_eth_ioctl = mvneta_ioctl,
+ .ndo_bpf = mvneta_xdp,
+ .ndo_xdp_xmit = mvneta_xdp_xmit,
+ .ndo_setup_tc = mvneta_setup_tc,
};
static const struct ethtool_ops mvneta_eth_tool_ops = {
--
2.33.1
The mvneta driver implements 7 per-cpu XDP counters, which means we
can provide them only as a global sum across CPUs.
Implement a callback for querying them using the generic XDP stats
infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 54 +++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 7c30417a0464..5bb0bbfa1ee6 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -802,6 +802,59 @@ mvneta_get_stats64(struct net_device *dev,
stats->tx_dropped = dev->stats.tx_dropped;
}
+static int mvneta_get_xdp_stats(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const struct mvneta_port *pp = netdev_priv(dev);
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ u32 cpu;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for_each_possible_cpu(cpu) {
+ const struct mvneta_pcpu_stats *stats;
+ const struct mvneta_stats *ps;
+ u64 xdp_xmit_err;
+ u64 xdp_redirect;
+ u64 xdp_tx_err;
+ u64 xdp_pass;
+ u64 xdp_drop;
+ u64 xdp_xmit;
+ u64 xdp_tx;
+ u32 start;
+
+ stats = per_cpu_ptr(pp->stats, cpu);
+ ps = &stats->es.ps;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+
+ xdp_drop = ps->xdp_drop;
+ xdp_pass = ps->xdp_pass;
+ xdp_redirect = ps->xdp_redirect;
+ xdp_tx = ps->xdp_tx;
+ xdp_tx_err = ps->xdp_tx_err;
+ xdp_xmit = ps->xdp_xmit;
+ xdp_xmit_err = ps->xdp_xmit_err;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+ xdp_stats->drop += xdp_drop;
+ xdp_stats->pass += xdp_pass;
+ xdp_stats->redirect += xdp_redirect;
+ xdp_stats->tx += xdp_tx;
+ xdp_stats->tx_errors += xdp_tx_err;
+ xdp_stats->xmit_packets += xdp_xmit;
+ xdp_stats->xmit_errors += xdp_xmit_err;
+ }
+
+ return 0;
+}
+
/* Rx descriptors helper methods */
/* Checks whether the RX descriptor having this status is both the first
@@ -4957,6 +5010,7 @@ static const struct net_device_ops mvneta_netdev_ops = {
.ndo_change_mtu = mvneta_change_mtu,
.ndo_fix_features = mvneta_fix_features,
.ndo_get_stats64 = mvneta_get_stats64,
+ .ndo_get_xdp_stats = mvneta_get_xdp_stats,
.ndo_eth_ioctl = mvneta_ioctl,
.ndo_bpf = mvneta_xdp,
.ndo_xdp_xmit = mvneta_xdp_xmit,
--
2.33.1
They get updated not only on the XDP path. Moreover, the packet
counter stores the total number of frames, not only the ones passed
to bpf_prog_run_xdp(), so it's rather confusing.
Drop the xdp_ prefix from both of them to not mix XDP-only stats
with the general ones.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/veth.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0e6c030576f4..ac3b1a2a91c8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -38,10 +38,10 @@
#define VETH_XDP_BATCH 16
struct veth_stats {
+ u64 packets;
+ u64 bytes;
u64 rx_drops;
/* xdp */
- u64 xdp_packets;
- u64 xdp_bytes;
u64 xdp_redirect;
u64 xdp_errors;
u64 xdp_drops;
@@ -93,8 +93,8 @@ struct veth_q_stat_desc {
#define VETH_RQ_STAT(m) offsetof(struct veth_stats, m)
static const struct veth_q_stat_desc veth_rq_stats_desc[] = {
- { "xdp_packets", VETH_RQ_STAT(xdp_packets) },
- { "xdp_bytes", VETH_RQ_STAT(xdp_bytes) },
+ { "packets", VETH_RQ_STAT(packets) },
+ { "bytes", VETH_RQ_STAT(bytes) },
{ "drops", VETH_RQ_STAT(rx_drops) },
{ "xdp_redirect", VETH_RQ_STAT(xdp_redirect) },
{ "xdp_errors", VETH_RQ_STAT(xdp_errors) },
@@ -378,9 +378,9 @@ static void veth_stats_rx(struct veth_stats *result, struct net_device *dev)
int i;
result->peer_tq_xdp_xmit_err = 0;
- result->xdp_packets = 0;
+ result->packets = 0;
result->xdp_tx_err = 0;
- result->xdp_bytes = 0;
+ result->bytes = 0;
result->rx_drops = 0;
for (i = 0; i < dev->num_rx_queues; i++) {
u64 packets, bytes, drops, xdp_tx_err, peer_tq_xdp_xmit_err;
@@ -391,14 +391,14 @@ static void veth_stats_rx(struct veth_stats *result, struct net_device *dev)
start = u64_stats_fetch_begin_irq(&stats->syncp);
peer_tq_xdp_xmit_err = stats->vs.peer_tq_xdp_xmit_err;
xdp_tx_err = stats->vs.xdp_tx_err;
- packets = stats->vs.xdp_packets;
- bytes = stats->vs.xdp_bytes;
+ packets = stats->vs.packets;
+ bytes = stats->vs.bytes;
drops = stats->vs.rx_drops;
} while (u64_stats_fetch_retry_irq(&stats->syncp, start));
result->peer_tq_xdp_xmit_err += peer_tq_xdp_xmit_err;
result->xdp_tx_err += xdp_tx_err;
- result->xdp_packets += packets;
- result->xdp_bytes += bytes;
+ result->packets += packets;
+ result->bytes += bytes;
result->rx_drops += drops;
}
}
@@ -418,8 +418,8 @@ static void veth_get_stats64(struct net_device *dev,
veth_stats_rx(&rx, dev);
tot->tx_dropped += rx.xdp_tx_err;
tot->rx_dropped = rx.rx_drops + rx.peer_tq_xdp_xmit_err;
- tot->rx_bytes = rx.xdp_bytes;
- tot->rx_packets = rx.xdp_packets;
+ tot->rx_bytes = rx.bytes;
+ tot->rx_packets = rx.packets;
rcu_read_lock();
peer = rcu_dereference(priv->peer);
@@ -431,8 +431,8 @@ static void veth_get_stats64(struct net_device *dev,
veth_stats_rx(&rx, peer);
tot->tx_dropped += rx.peer_tq_xdp_xmit_err;
tot->rx_dropped += rx.xdp_tx_err;
- tot->tx_bytes += rx.xdp_bytes;
- tot->tx_packets += rx.xdp_packets;
+ tot->tx_bytes += rx.bytes;
+ tot->tx_packets += rx.packets;
}
rcu_read_unlock();
}
@@ -867,7 +867,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
/* ndo_xdp_xmit */
struct xdp_frame *frame = veth_ptr_to_xdp(ptr);
- stats->xdp_bytes += frame->len;
+ stats->bytes += frame->len;
frame = veth_xdp_rcv_one(rq, frame, bq, stats);
if (frame) {
/* XDP_PASS */
@@ -882,7 +882,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
/* ndo_start_xmit */
struct sk_buff *skb = ptr;
- stats->xdp_bytes += skb->len;
+ stats->bytes += skb->len;
skb = veth_xdp_rcv_skb(rq, skb, bq, stats);
if (skb)
napi_gro_receive(&rq->xdp_napi, skb);
@@ -895,10 +895,10 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
u64_stats_update_begin(&rq->stats.syncp);
rq->stats.vs.xdp_redirect += stats->xdp_redirect;
- rq->stats.vs.xdp_bytes += stats->xdp_bytes;
+ rq->stats.vs.bytes += stats->bytes;
rq->stats.vs.xdp_drops += stats->xdp_drops;
rq->stats.vs.rx_drops += stats->rx_drops;
- rq->stats.vs.xdp_packets += done;
+ rq->stats.vs.packets += done;
u64_stats_update_end(&rq->stats.syncp);
return done;
--
2.33.1
The mlx5 driver has a bunch of per-channel stats for XDP. 7 and 5 of
them can be exported through the generic XDP stats infra for XDP and
XSK respectively.
Add the necessary callbacks for that. Note that the driver doesn't
expose XSK stats if an XSK setup has never been requested.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 ++
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +
.../ethernet/mellanox/mlx5/core/en_stats.c | 69 +++++++++++++++++++
3 files changed, 76 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 48b12ee44b8d..cc8cf3ff7d49 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1212,4 +1212,9 @@ int mlx5e_set_vf_rate(struct net_device *dev, int vf, int min_tx_rate, int max_t
int mlx5e_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivi);
int mlx5e_get_vf_stats(struct net_device *dev, int vf, struct ifla_vf_stats *vf_stats);
#endif
+
+int mlx5e_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id);
+int mlx5e_get_xdp_stats(const struct net_device *dev, u32 attr_id,
+ void *attr_data);
+
#endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 65571593ec5c..d5b3abf09c82 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4532,6 +4532,8 @@ const struct net_device_ops mlx5e_netdev_ops = {
.ndo_setup_tc = mlx5e_setup_tc,
.ndo_select_queue = mlx5e_select_queue,
.ndo_get_stats64 = mlx5e_get_stats,
+ .ndo_get_xdp_stats_nch = mlx5e_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = mlx5e_get_xdp_stats,
.ndo_set_rx_mode = mlx5e_set_rx_mode,
.ndo_set_mac_address = mlx5e_set_mac,
.ndo_vlan_rx_add_vid = mlx5e_vlan_rx_add_vid,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 3631dafb4ea2..834457e3f19a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2292,3 +2292,72 @@ unsigned int mlx5e_nic_stats_grps_num(struct mlx5e_priv *priv)
{
return ARRAY_SIZE(mlx5e_nic_stats_grps);
}
+
+int mlx5e_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ const struct mlx5e_priv *priv = netdev_priv(dev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return priv->max_nch;
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ return priv->xsk.ever_used ? priv->max_nch : -ENODATA;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int mlx5e_get_xdp_stats(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const struct mlx5e_priv *priv = netdev_priv(dev);
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ if (!priv->xsk.ever_used)
+ return -ENODATA;
+
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < priv->max_nch; i++) {
+ const struct mlx5e_channel_stats *cs = priv->channel_stats + i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ /* mlx5e_rq_stats rq */
+ xdp_stats->errors = cs->rq.xdp_errors;
+ xdp_stats->drop = cs->rq.xdp_drop;
+ xdp_stats->redirect = cs->rq.xdp_redirect;
+ /* mlx5e_xdpsq_stats rq_xdpsq */
+ xdp_stats->tx = cs->rq_xdpsq.xmit;
+ xdp_stats->tx_errors = cs->rq_xdpsq.err +
+ cs->rq_xdpsq.full;
+ /* mlx5e_xdpsq_stats xdpsq */
+ xdp_stats->xmit_packets = cs->xdpsq.xmit;
+ xdp_stats->xmit_errors = cs->xdpsq.err;
+ xdp_stats->xmit_full = cs->xdpsq.full;
+ break;
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ /* mlx5e_rq_stats xskrq */
+ xdp_stats->errors = cs->xskrq.xdp_errors;
+ xdp_stats->drop = cs->xskrq.xdp_drop;
+ xdp_stats->redirect = cs->xskrq.xdp_redirect;
+ /* mlx5e_xdpsq_stats xsksq */
+ xdp_stats->xmit_packets = cs->xsksq.xmit;
+ xdp_stats->xmit_errors = cs->xsksq.err;
+ xdp_stats->xmit_full = cs->xsksq.full;
+ break;
+ }
+
+ xdp_stats++;
+ }
+
+ return 0;
+}
--
2.33.1
Like mvneta, mvpp2 stores 7 XDP counters in per-cpu containers.
Expose them via the generic XDP stats infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 51 +++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 97bd2ee8a010..58203cde3b60 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -5131,6 +5131,56 @@ mvpp2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
stats->tx_dropped = dev->stats.tx_dropped;
}
+static int mvpp2_get_xdp_stats_ndo(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const struct mvpp2_port *port = netdev_priv(dev);
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ u32 cpu, start;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for_each_possible_cpu(cpu) {
+ const struct mvpp2_pcpu_stats *ps;
+ u64 xdp_xmit_err;
+ u64 xdp_redirect;
+ u64 xdp_tx_err;
+ u64 xdp_pass;
+ u64 xdp_drop;
+ u64 xdp_xmit;
+ u64 xdp_tx;
+
+ ps = per_cpu_ptr(port->stats, cpu);
+
+ do {
+ start = u64_stats_fetch_begin_irq(&ps->syncp);
+
+ xdp_redirect = ps->xdp_redirect;
+ xdp_pass = ps->xdp_pass;
+ xdp_drop = ps->xdp_drop;
+ xdp_xmit = ps->xdp_xmit;
+ xdp_xmit_err = ps->xdp_xmit_err;
+ xdp_tx = ps->xdp_tx;
+ xdp_tx_err = ps->xdp_tx_err;
+ } while (u64_stats_fetch_retry_irq(&ps->syncp, start));
+
+ xdp_stats->redirect += xdp_redirect;
+ xdp_stats->pass += xdp_pass;
+ xdp_stats->drop += xdp_drop;
+ xdp_stats->xmit_packets += xdp_xmit;
+ xdp_stats->xmit_errors += xdp_xmit_err;
+ xdp_stats->tx += xdp_tx;
+ xdp_stats->tx_errors += xdp_tx_err;
+ }
+
+ return 0;
+}
+
static int mvpp2_set_ts_config(struct mvpp2_port *port, struct ifreq *ifr)
{
struct hwtstamp_config config;
@@ -5719,6 +5769,7 @@ static const struct net_device_ops mvpp2_netdev_ops = {
.ndo_set_mac_address = mvpp2_set_mac_address,
.ndo_change_mtu = mvpp2_change_mtu,
.ndo_get_stats64 = mvpp2_get_stats64,
+ .ndo_get_xdp_stats = mvpp2_get_xdp_stats_ndo,
.ndo_eth_ioctl = mvpp2_ioctl,
.ndo_vlan_rx_add_vid = mvpp2_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = mvpp2_vlan_rx_kill_vid,
--
2.33.1
Provide a separate counter, [rx_]xdp_errors, for drops related to
XDP_ABORTED and other errors/exceptions, and leave [rx_]xdp_drop
only for the XDP_DROP case.
This aligns the driver better with the generic XDP stats.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 ++-
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 7 +++++++
drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 3 +++
3 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 2f0df5cc1a2d..a9a8535c828b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -156,7 +156,8 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
case XDP_ABORTED:
xdp_abort:
trace_xdp_exception(rq->netdev, prog, act);
- fallthrough;
+ rq->stats->xdp_errors++;
+ return true;
case XDP_DROP:
rq->stats->xdp_drop++;
return true;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 3c91a11e27ad..3631dafb4ea2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -141,6 +141,7 @@ static const struct counter_desc sw_stats_desc[] = {
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_complete_tail) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_complete_tail_slow) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_csum_unnecessary_inner) },
+ { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_errors) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_drop) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_redirect) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xdp_tx_xmit) },
@@ -208,6 +209,7 @@ static const struct counter_desc sw_stats_desc[] = {
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_csum_none) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_ecn_mark) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_removed_vlan_packets) },
+ { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_xdp_errors) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_xdp_drop) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_xdp_redirect) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_xsk_wqe_err) },
@@ -298,6 +300,7 @@ static void mlx5e_stats_grp_sw_update_stats_xskrq(struct mlx5e_sw_stats *s,
s->rx_xsk_csum_none += xskrq_stats->csum_none;
s->rx_xsk_ecn_mark += xskrq_stats->ecn_mark;
s->rx_xsk_removed_vlan_packets += xskrq_stats->removed_vlan_packets;
+ s->rx_xsk_xdp_errors += xskrq_stats->xdp_errors;
s->rx_xsk_xdp_drop += xskrq_stats->xdp_drop;
s->rx_xsk_xdp_redirect += xskrq_stats->xdp_redirect;
s->rx_xsk_wqe_err += xskrq_stats->wqe_err;
@@ -331,6 +334,7 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
s->rx_csum_complete_tail_slow += rq_stats->csum_complete_tail_slow;
s->rx_csum_unnecessary += rq_stats->csum_unnecessary;
s->rx_csum_unnecessary_inner += rq_stats->csum_unnecessary_inner;
+ s->rx_xdp_errors += rq_stats->xdp_errors;
s->rx_xdp_drop += rq_stats->xdp_drop;
s->rx_xdp_redirect += rq_stats->xdp_redirect;
s->rx_wqe_err += rq_stats->wqe_err;
@@ -1766,6 +1770,7 @@ static const struct counter_desc rq_stats_desc[] = {
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_unnecessary) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_unnecessary_inner) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, csum_none) },
+ { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_errors) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_drop) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, xdp_redirect) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, lro_packets) },
@@ -1869,6 +1874,7 @@ static const struct counter_desc xskrq_stats_desc[] = {
{ MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, csum_none) },
{ MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, ecn_mark) },
{ MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, removed_vlan_packets) },
+ { MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, xdp_errors) },
{ MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, xdp_drop) },
{ MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, xdp_redirect) },
{ MLX5E_DECLARE_XSKRQ_STAT(struct mlx5e_rq_stats, wqe_err) },
@@ -1940,6 +1946,7 @@ static const struct counter_desc ptp_rq_stats_desc[] = {
{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, csum_unnecessary) },
{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, csum_unnecessary_inner) },
{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, csum_none) },
+ { MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, xdp_errors) },
{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, xdp_drop) },
{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, xdp_redirect) },
{ MLX5E_DECLARE_PTP_RQ_STAT(struct mlx5e_rq_stats, lro_packets) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 2c1ed5b81be6..dd33465af0ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -158,6 +158,7 @@ struct mlx5e_sw_stats {
u64 rx_csum_complete_tail;
u64 rx_csum_complete_tail_slow;
u64 rx_csum_unnecessary_inner;
+ u64 rx_xdp_errors;
u64 rx_xdp_drop;
u64 rx_xdp_redirect;
u64 rx_xdp_tx_xmit;
@@ -237,6 +238,7 @@ struct mlx5e_sw_stats {
u64 rx_xsk_csum_none;
u64 rx_xsk_ecn_mark;
u64 rx_xsk_removed_vlan_packets;
+ u64 rx_xsk_xdp_errors;
u64 rx_xsk_xdp_drop;
u64 rx_xsk_xdp_redirect;
u64 rx_xsk_wqe_err;
@@ -335,6 +337,7 @@ struct mlx5e_rq_stats {
u64 mcast_packets;
u64 ecn_mark;
u64 removed_vlan_packets;
+ u64 xdp_errors;
u64 xdp_drop;
u64 xdp_redirect;
u64 wqe_err;
--
2.33.1
Dedicate a separate counter to tracking XDP_ABORTED and other XDP
errors, and leave xdp_drop solely for the XDP_DROP case.
This is needed to better align with the generic XDP stats.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/virtio_net.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c74af526d79b..112ceda3dcf7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -92,6 +92,7 @@ struct virtnet_rq_stats {
u64 xdp_tx;
u64 xdp_redirects;
u64 xdp_drops;
+ u64 xdp_errors;
u64 kicks;
};
@@ -115,6 +116,7 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
{ "xdp_tx", VIRTNET_RQ_STAT(xdp_tx) },
{ "xdp_redirects", VIRTNET_RQ_STAT(xdp_redirects) },
{ "xdp_drops", VIRTNET_RQ_STAT(xdp_drops) },
+ { "xdp_errors", VIRTNET_RQ_STAT(xdp_errors) },
{ "kicks", VIRTNET_RQ_STAT(kicks) },
};
@@ -818,7 +820,8 @@ static struct sk_buff *receive_small(struct net_device *dev,
trace_xdp_exception(vi->dev, xdp_prog, act);
goto err_xdp;
case XDP_DROP:
- goto err_xdp;
+ stats->xdp_drops++;
+ goto xdp_drop;
}
}
rcu_read_unlock();
@@ -843,8 +846,9 @@ static struct sk_buff *receive_small(struct net_device *dev,
return skb;
err_xdp:
+ stats->xdp_errors++;
+xdp_drop:
rcu_read_unlock();
- stats->xdp_drops++;
err_len:
stats->drops++;
put_page(page);
@@ -1033,7 +1037,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
case XDP_DROP:
if (unlikely(xdp_page != page))
__free_pages(xdp_page, 0);
- goto err_xdp;
+
+ if (unlikely(act != XDP_DROP))
+ goto err_xdp;
+
+ stats->xdp_drops++;
+ goto xdp_drop;
}
}
rcu_read_unlock();
@@ -1103,8 +1112,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
return head_skb;
err_xdp:
+ stats->xdp_errors++;
+xdp_drop:
rcu_read_unlock();
- stats->xdp_drops++;
err_skb:
put_page(page);
while (num_buf-- > 1) {
--
2.33.1
Export 4 per-channel XDP counters for both the ef100 and efx
drivers using the generic XDP stats infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/ethernet/sfc/ef100_netdev.c | 2 ++
drivers/net/ethernet/sfc/efx.c | 2 ++
drivers/net/ethernet/sfc/efx_common.c | 42 +++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_common.h | 3 ++
4 files changed, 49 insertions(+)
diff --git a/drivers/net/ethernet/sfc/ef100_netdev.c b/drivers/net/ethernet/sfc/ef100_netdev.c
index 67fe44db6b61..0367f7e043d8 100644
--- a/drivers/net/ethernet/sfc/ef100_netdev.c
+++ b/drivers/net/ethernet/sfc/ef100_netdev.c
@@ -219,6 +219,8 @@ static const struct net_device_ops ef100_netdev_ops = {
.ndo_start_xmit = ef100_hard_start_xmit,
.ndo_tx_timeout = efx_watchdog,
.ndo_get_stats64 = efx_net_stats,
+ .ndo_get_xdp_stats_nch = efx_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = efx_get_xdp_stats,
.ndo_change_mtu = efx_change_mtu,
.ndo_validate_addr = eth_validate_addr,
.ndo_set_mac_address = efx_set_mac_address,
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index a8c252e2b252..a6a015c4d3b4 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -588,6 +588,8 @@ static const struct net_device_ops efx_netdev_ops = {
.ndo_open = efx_net_open,
.ndo_stop = efx_net_stop,
.ndo_get_stats64 = efx_net_stats,
+ .ndo_get_xdp_stats_nch = efx_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = efx_get_xdp_stats,
.ndo_tx_timeout = efx_watchdog,
.ndo_start_xmit = efx_hard_start_xmit,
.ndo_validate_addr = eth_validate_addr,
diff --git a/drivers/net/ethernet/sfc/efx_common.c b/drivers/net/ethernet/sfc/efx_common.c
index f187631b2c5c..c2bf79fd66b4 100644
--- a/drivers/net/ethernet/sfc/efx_common.c
+++ b/drivers/net/ethernet/sfc/efx_common.c
@@ -606,6 +606,48 @@ void efx_net_stats(struct net_device *net_dev, struct rtnl_link_stats64 *stats)
spin_unlock_bh(&efx->stats_lock);
}
+int efx_get_xdp_stats_nch(const struct net_device *net_dev, u32 attr_id)
+{
+ const struct efx_nic *efx = netdev_priv(net_dev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return efx->n_channels;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int efx_get_xdp_stats(const struct net_device *net_dev, u32 attr_id,
+ void *attr_data)
+{
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ struct efx_nic *efx = netdev_priv(net_dev);
+ const struct efx_channel *channel;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ spin_lock_bh(&efx->stats_lock);
+
+ efx_for_each_channel(channel, efx) {
+ xdp_stats->drop = channel->n_rx_xdp_drops;
+ xdp_stats->errors = channel->n_rx_xdp_bad_drops;
+ xdp_stats->redirect = channel->n_rx_xdp_redirect;
+ xdp_stats->tx = channel->n_rx_xdp_tx;
+
+ xdp_stats++;
+ }
+
+ spin_unlock_bh(&efx->stats_lock);
+
+ return 0;
+}
+
/* Push loopback/power/transmit disable settings to the PHY, and reconfigure
* the MAC appropriately. All other PHY configuration changes are pushed
* through phy_op->set_settings(), and pushed asynchronously to the MAC
diff --git a/drivers/net/ethernet/sfc/efx_common.h b/drivers/net/ethernet/sfc/efx_common.h
index 65513fd0cf6c..987d7c6608a2 100644
--- a/drivers/net/ethernet/sfc/efx_common.h
+++ b/drivers/net/ethernet/sfc/efx_common.h
@@ -32,6 +32,9 @@ void efx_start_all(struct efx_nic *efx);
void efx_stop_all(struct efx_nic *efx);
void efx_net_stats(struct net_device *net_dev, struct rtnl_link_stats64 *stats);
+int efx_get_xdp_stats_nch(const struct net_device *net_dev, u32 attr_id);
+int efx_get_xdp_stats(const struct net_device *net_dev, u32 attr_id,
+ void *attr_data);
int efx_create_reset_workqueue(void);
void efx_queue_reset_work(struct efx_nic *efx);
--
2.33.1
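sfc is the first driver in this excerpt to wire up both callbacks,
so the contract is worth restating: .ndo_get_xdp_stats_nch() reports
how many per-channel struct ifla_xdp_stats slots the driver will
fill for a given attr_id, the core allocates that many (apparently
zeroed, judging by the drivers here filling only a subset of the
fields), and .ndo_get_xdp_stats() fills them; both return
-EOPNOTSUPP for statistics types the driver doesn't track. A minimal
sketch for a hypothetical single-queue driver -- struct my_priv and
its fields are illustrative, not from any patch:

  struct my_priv {
          u64 xdp_drops;
          u64 xdp_redirects;
  };

  static int my_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
  {
          switch (attr_id) {
          case IFLA_XDP_XSTATS_TYPE_XDP:
                  return 1;       /* one "channel" worth of counters */
          default:
                  return -EOPNOTSUPP;
          }
  }

  static int my_get_xdp_stats(const struct net_device *dev, u32 attr_id,
                              void *attr_data)
  {
          const struct my_priv *priv = netdev_priv(dev);
          struct ifla_xdp_stats *xdp_stats = attr_data;

          if (attr_id != IFLA_XDP_XSTATS_TYPE_XDP)
                  return -EOPNOTSUPP;

          /* Only the counters this driver maintains get written. */
          xdp_stats->drop = priv->xdp_drops;
          xdp_stats->redirect = priv->xdp_redirects;

          return 0;
  }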
To align better with other drivers and the generic XDP stats,
rename xdp_tx{,_drops} to xdp_xmit{,_errors}, as they're used on
the .ndo_xdp_xmit() path.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/virtio_net.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 112ceda3dcf7..bf407423c929 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -77,8 +77,8 @@ struct virtnet_sq_stats {
struct u64_stats_sync syncp;
u64 packets;
u64 bytes;
- u64 xdp_tx;
- u64 xdp_tx_drops;
+ u64 xdp_xmit;
+ u64 xdp_xmit_errors;
u64 kicks;
u64 tx_timeouts;
};
@@ -102,8 +102,8 @@ struct virtnet_rq_stats {
static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = {
{ "packets", VIRTNET_SQ_STAT(packets) },
{ "bytes", VIRTNET_SQ_STAT(bytes) },
- { "xdp_tx", VIRTNET_SQ_STAT(xdp_tx) },
- { "xdp_tx_drops", VIRTNET_SQ_STAT(xdp_tx_drops) },
+ { "xdp_xmit", VIRTNET_SQ_STAT(xdp_xmit) },
+ { "xdp_xmit_errors", VIRTNET_SQ_STAT(xdp_xmit_errors) },
{ "kicks", VIRTNET_SQ_STAT(kicks) },
{ "tx_timeouts", VIRTNET_SQ_STAT(tx_timeouts) },
};
@@ -629,8 +629,8 @@ static int virtnet_xdp_xmit(struct net_device *dev,
u64_stats_update_begin(&sq->stats.syncp);
sq->stats.bytes += bytes;
sq->stats.packets += packets;
- sq->stats.xdp_tx += n;
- sq->stats.xdp_tx_drops += n - nxmit;
+ sq->stats.xdp_xmit += n;
+ sq->stats.xdp_xmit_errors += n - nxmit;
sq->stats.kicks += kicks;
u64_stats_update_end(&sq->stats.syncp);
--
2.33.1
Some of the initializers are aligned with spaces, others with tabs.
Reindent them using tabs only.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/virtio_net.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bf407423c929..f7c5511e510c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2710,15 +2710,15 @@ static void virtnet_tx_timeout(struct net_device *dev, unsigned int txqueue)
}
static const struct net_device_ops virtnet_netdev = {
- .ndo_open = virtnet_open,
- .ndo_stop = virtnet_close,
- .ndo_start_xmit = start_xmit,
- .ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = virtnet_set_mac_address,
- .ndo_set_rx_mode = virtnet_set_rx_mode,
- .ndo_get_stats64 = virtnet_stats,
- .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
- .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
+ .ndo_open = virtnet_open,
+ .ndo_stop = virtnet_close,
+ .ndo_start_xmit = start_xmit,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_set_mac_address = virtnet_set_mac_address,
+ .ndo_set_rx_mode = virtnet_set_rx_mode,
+ .ndo_get_stats64 = virtnet_stats,
+ .ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
+ .ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
.ndo_bpf = virtnet_xdp,
.ndo_xdp_xmit = virtnet_xdp_xmit,
.ndo_features_check = passthru_features_check,
--
2.33.1
Some of the initializers are aligned with spaces, others with tabs.
Reindent them using tabs only.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/veth.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index ac3b1a2a91c8..3eb24a5c2d45 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1532,13 +1532,13 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
static const struct net_device_ops veth_netdev_ops = {
- .ndo_init = veth_dev_init,
- .ndo_open = veth_open,
- .ndo_stop = veth_close,
- .ndo_start_xmit = veth_xmit,
- .ndo_get_stats64 = veth_get_stats64,
- .ndo_set_rx_mode = veth_set_multicast_list,
- .ndo_set_mac_address = eth_mac_addr,
+ .ndo_init = veth_dev_init,
+ .ndo_open = veth_open,
+ .ndo_stop = veth_close,
+ .ndo_start_xmit = veth_xmit,
+ .ndo_get_stats64 = veth_get_stats64,
+ .ndo_set_rx_mode = veth_set_multicast_list,
+ .ndo_set_mac_address = eth_mac_addr,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = veth_poll_controller,
#endif
--
2.33.1
Add the callbacks needed to query virtio-net's 7 per-channel XDP
counters via the generic XDP stats infra.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/virtio_net.c | 56 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f7c5511e510c..0b4cc9662d91 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1919,6 +1919,60 @@ static void virtnet_stats(struct net_device *dev,
tot->rx_frame_errors = dev->stats.rx_frame_errors;
}
+static int virtnet_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ const struct virtnet_info *vi = netdev_priv(dev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return vi->curr_queue_pairs;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int virtnet_get_xdp_stats(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const struct virtnet_info *vi = netdev_priv(dev);
+ struct ifla_xdp_stats *xdp_stats = attr_data;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < vi->curr_queue_pairs; i++) {
+ const struct virtnet_rq_stats *rqs = &vi->rq[i].stats;
+ const struct virtnet_sq_stats *sqs = &vi->sq[i].stats;
+ u32 start;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&rqs->syncp);
+
+ xdp_stats->packets = rqs->xdp_packets;
+ xdp_stats->tx = rqs->xdp_tx;
+ xdp_stats->redirect = rqs->xdp_redirects;
+ xdp_stats->drop = rqs->xdp_drops;
+ xdp_stats->errors = rqs->xdp_errors;
+ } while (u64_stats_fetch_retry_irq(&rqs->syncp, start));
+
+ do {
+ start = u64_stats_fetch_begin_irq(&sqs->syncp);
+
+ xdp_stats->xmit_packets = sqs->xdp_xmit;
+ xdp_stats->xmit_errors = sqs->xdp_xmit_errors;
+ } while (u64_stats_fetch_retry_irq(&sqs->syncp, start));
+
+ xdp_stats++;
+ }
+
+ return 0;
+}
+
static void virtnet_ack_link_announce(struct virtnet_info *vi)
{
rtnl_lock();
@@ -2717,6 +2771,8 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_set_mac_address = virtnet_set_mac_address,
.ndo_set_rx_mode = virtnet_set_rx_mode,
.ndo_get_stats64 = virtnet_stats,
+ .ndo_get_xdp_stats_nch = virtnet_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = virtnet_get_xdp_stats,
.ndo_vlan_rx_add_vid = virtnet_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
.ndo_bpf = virtnet_xdp,
--
2.33.1
Expose veth's 7 per-channel XDP counters by providing callbacks
for the generic XDP stats infra. Since veth transmits XDP frames
onto its peer's Rx queues, the xmit counters are read from the
peer device and folded back into this device's channel slots.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/veth.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 3eb24a5c2d45..c12209fbd1bd 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -437,6 +437,71 @@ static void veth_get_stats64(struct net_device *dev,
rcu_read_unlock();
}
+static int veth_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return max(dev->real_num_rx_queues, dev->real_num_tx_queues);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int veth_get_xdp_stats(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const struct veth_priv *priv = netdev_priv(dev);
+ const struct net_device *peer = rtnl_dereference(priv->peer);
+ struct ifla_xdp_stats *xdp_iter, *xdp_stats = attr_data;
+ const struct veth_rq_stats *rq_stats;
+ u64 xmit_packets, xmit_errors;
+ u32 i, start;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < dev->real_num_rx_queues; i++) {
+ rq_stats = &priv->rq[i].stats;
+ xdp_iter = xdp_stats + i;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&rq_stats->syncp);
+
+ xdp_iter->errors = rq_stats->vs.xdp_errors;
+ xdp_iter->redirect = rq_stats->vs.xdp_redirect;
+ xdp_iter->drop = rq_stats->vs.xdp_drops;
+ xdp_iter->tx = rq_stats->vs.xdp_tx;
+ xdp_iter->tx_errors = rq_stats->vs.xdp_tx_err;
+ } while (u64_stats_fetch_retry_irq(&rq_stats->syncp, start));
+ }
+
+ if (!peer)
+ return 0;
+
+ priv = netdev_priv(peer);
+
+ for (i = 0; i < peer->real_num_rx_queues; i++) {
+ rq_stats = &priv->rq[i].stats;
+ xdp_iter = xdp_stats + (i % dev->real_num_tx_queues);
+
+ do {
+ start = u64_stats_fetch_begin_irq(&rq_stats->syncp);
+
+ xmit_packets = rq_stats->vs.peer_tq_xdp_xmit;
+ xmit_errors = rq_stats->vs.peer_tq_xdp_xmit_err;
+ } while (u64_stats_fetch_retry_irq(&rq_stats->syncp, start));
+
+ xdp_iter->xmit_packets += xmit_packets;
+ xdp_iter->xmit_errors += xmit_errors;
+ }
+
+ return 0;
+}
+
/* fake multicast ability */
static void veth_set_multicast_list(struct net_device *dev)
{
@@ -1537,6 +1602,8 @@ static const struct net_device_ops veth_netdev_ops = {
.ndo_stop = veth_close,
.ndo_start_xmit = veth_xmit,
.ndo_get_stats64 = veth_get_stats64,
+ .ndo_get_xdp_stats_nch = veth_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = veth_get_xdp_stats,
.ndo_set_rx_mode = veth_set_multicast_list,
.ndo_set_mac_address = eth_mac_addr,
#ifdef CONFIG_NET_POLL_CONTROLLER
--
2.33.1
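One subtlety in the veth callback above: veth transmits XDP frames
onto its peer's Rx queues, so peer_tq_xdp_xmit{,_err} live in the
peer's per-queue stats. The second loop walks the peer's Rx queues
and folds them back into this device's channel slots modulo
real_num_tx_queues, which is why the xmit counters are accumulated
with += while the first loop can assign directly. The fold,
condensed into a hypothetical standalone helper (all names are
illustrative):

  /* Fold n_peer_rxq peer-side xmit counters into n_local slots. */
  static void my_fold_peer_xmit(struct ifla_xdp_stats *xdp_stats,
                                const u64 *peer_xmit, u32 n_peer_rxq,
                                u32 n_local)
  {
          u32 i;

          for (i = 0; i < n_peer_rxq; i++) {
                  /* Several peer Rx queues may map onto one local
                   * slot, hence the accumulation.
                   */
                  xdp_stats[i % n_local].xmit_packets += peer_xmit[i];
          }
  }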
Similarly to mlx5, count XDP_ABORTED and other Rx XDP errors
separately from XDP_DROP to better align with generic XDP stats.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
---
drivers/net/veth.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 5ca0a899101d..0e6c030576f4 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -43,6 +43,7 @@ struct veth_stats {
u64 xdp_packets;
u64 xdp_bytes;
u64 xdp_redirect;
+ u64 xdp_errors;
u64 xdp_drops;
u64 xdp_tx;
u64 xdp_tx_err;
@@ -96,6 +97,7 @@ static const struct veth_q_stat_desc veth_rq_stats_desc[] = {
{ "xdp_bytes", VETH_RQ_STAT(xdp_bytes) },
{ "drops", VETH_RQ_STAT(rx_drops) },
{ "xdp_redirect", VETH_RQ_STAT(xdp_redirect) },
+ { "xdp_errors", VETH_RQ_STAT(xdp_errors) },
{ "xdp_drops", VETH_RQ_STAT(xdp_drops) },
{ "xdp_tx", VETH_RQ_STAT(xdp_tx) },
{ "xdp_tx_errors", VETH_RQ_STAT(xdp_tx_err) },
@@ -655,16 +657,18 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
fallthrough;
case XDP_ABORTED:
trace_xdp_exception(rq->dev, xdp_prog, act);
- fallthrough;
+ goto err_xdp;
case XDP_DROP:
stats->xdp_drops++;
- goto err_xdp;
+ goto xdp_drop;
}
}
rcu_read_unlock();
return frame;
err_xdp:
+ stats->xdp_errors++;
+xdp_drop:
rcu_read_unlock();
xdp_return_frame(frame);
xdp_xmit:
@@ -805,7 +809,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
fallthrough;
case XDP_ABORTED:
trace_xdp_exception(rq->dev, xdp_prog, act);
- fallthrough;
+ stats->xdp_errors++;
+ goto xdp_drop;
case XDP_DROP:
stats->xdp_drops++;
goto xdp_drop;
--
2.33.1
There's no need to fetch an XSK pool descriptor when the ring is
full; we can quit early under an unlikely branch instead.
Unfortunately, we can't skip taking the lock here, since
igc_desc_unused() assumes it is called with the lock held.
This was designed to track the xsk_tx::full counter, but won't hurt
either way.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
---
drivers/net/ethernet/intel/igc/igc_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 8e448288ee26..7d0c540d6b76 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2604,6 +2604,8 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
__netif_tx_lock(nq, cpu);
budget = igc_desc_unused(ring);
+ if (unlikely(!budget))
+ goto out_unlock;
while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
u32 cmd_type, olinfo_status;
@@ -2644,6 +2646,7 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
xsk_tx_release(pool);
}
+out_unlock:
__netif_tx_unlock(nq);
}
--
2.33.1
Make the ixgbe driver collect and provide all generic XDP/XSK
counters.
Unfortunately, XDP rings have the lifetime of an XDP prog, and all
ring stats structures get wiped on xsk_pool attach/detach, so
store the XDP stats in a separate array with the lifetime of the
netdev.
Reuse all previously introduced helpers and
xdp_get_drv_stats_generic(). The performance deviation from
incrementing a bunch of counters on the hotpath stays within
stddev at [64 ... 1532] frame sizes.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 +
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c | 3 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 69 ++++++++++++++++---
drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 56 +++++++++++----
4 files changed, 106 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4a69823e6abd..d60794636925 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -349,6 +349,7 @@ struct ixgbe_ring {
struct ixgbe_tx_queue_stats tx_stats;
struct ixgbe_rx_queue_stats rx_stats;
};
+ struct xdp_drv_stats *xdp_stats;
u16 rx_offset;
struct xdp_rxq_info xdp_rxq;
spinlock_t tx_lock; /* used in XDP mode */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 86b11164655e..c146963adbd5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -951,6 +951,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
ring->queue_index = xdp_idx;
set_ring_xdp(ring);
spin_lock_init(&ring->tx_lock);
+ ring->xdp_stats = adapter->netdev->xstats + xdp_idx;
/* assign ring to adapter */
WRITE_ONCE(adapter->xdp_ring[xdp_idx], ring);
@@ -994,6 +995,7 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
/* apply Rx specific ring traits */
ring->count = adapter->rx_ring_count;
ring->queue_index = rxr_idx;
+ ring->xdp_stats = adapter->netdev->xstats + rxr_idx;
/* assign ring to adapter */
WRITE_ONCE(adapter->rx_ring[rxr_idx], ring);
@@ -1303,4 +1305,3 @@ void ixgbe_tx_ctxtdesc(struct ixgbe_ring *tx_ring, u32 vlan_macip_lens,
context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd);
context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
}
-
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0f9f022260d7..d1cfd7d6a72b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1246,8 +1246,11 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
return true;
}
- if (ring_is_xdp(tx_ring))
+ if (ring_is_xdp(tx_ring)) {
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->xdp_tx,
+ total_packets, total_bytes);
return !!budget;
+ }
netdev_tx_completed_queue(txring_txq(tx_ring),
total_packets, total_bytes);
@@ -2196,7 +2199,8 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
struct ixgbe_ring *rx_ring,
- struct xdp_buff *xdp)
+ struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err, result = IXGBE_XDP_PASS;
struct bpf_prog *xdp_prog;
@@ -2209,40 +2213,57 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
if (!xdp_prog)
goto xdp_out;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
prefetchw(xdp->data_hard_start); /* xdp_frame write */
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
break;
case XDP_TX:
xdpf = xdp_convert_buff_to_frame(xdp);
- if (unlikely(!xdpf))
+ if (unlikely(!xdpf)) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
ring = ixgbe_determine_xdp_ring(adapter);
if (static_branch_unlikely(&ixgbe_xdp_locking_key))
spin_lock(&ring->tx_lock);
result = ixgbe_xmit_xdp_ring(ring, xdpf);
if (static_branch_unlikely(&ixgbe_xdp_locking_key))
spin_unlock(&ring->tx_lock);
- if (result == IXGBE_XDP_CONSUMED)
+ if (result == IXGBE_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
break;
case XDP_REDIRECT:
err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
result = IXGBE_XDP_REDIR;
+ lrstats->redirect++;
break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough; /* handle aborts by dropping packet */
+ /* handle aborts by dropping packet */
+ result = IXGBE_XDP_CONSUMED;
+ break;
case XDP_DROP:
result = IXGBE_XDP_CONSUMED;
+ lrstats->drop++;
break;
}
xdp_out:
@@ -2301,6 +2322,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
unsigned int mss = 0;
#endif /* IXGBE_FCOE */
u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+ struct xdp_rx_drv_stats_local lrstats = { };
unsigned int offset = rx_ring->rx_offset;
unsigned int xdp_xmit = 0;
struct xdp_buff xdp;
@@ -2348,7 +2370,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
/* At larger PAGE_SIZE, frame_sz depend on len size */
xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
#endif
- skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
+ skb = ixgbe_run_xdp(adapter, rx_ring, &xdp, &lrstats);
}
if (IS_ERR(skb)) {
@@ -2440,6 +2462,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
rx_ring->stats.packets += total_rx_packets;
rx_ring->stats.bytes += total_rx_bytes;
u64_stats_update_end(&rx_ring->syncp);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xdp_rx, &lrstats);
q_vector->rx.total_packets += total_rx_packets;
q_vector->rx.total_bytes += total_rx_bytes;
@@ -8552,8 +8575,10 @@ int ixgbe_xmit_xdp_ring(struct ixgbe_ring *ring,
len = xdpf->len;
- if (unlikely(!ixgbe_desc_unused(ring)))
+ if (unlikely(!ixgbe_desc_unused(ring))) {
+ xdp_update_tx_drv_full(&ring->xdp_stats->xdp_tx);
return IXGBE_XDP_CONSUMED;
+ }
dma = dma_map_single(ring->dev, xdpf->data, len, DMA_TO_DEVICE);
if (dma_mapping_error(ring->dev, dma))
@@ -10257,12 +10282,26 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
if (unlikely(flags & XDP_XMIT_FLUSH))
ixgbe_xdp_ring_update_tail(ring);
+ if (unlikely(nxmit < n))
+ xdp_update_tx_drv_err(&ring->xdp_stats->xdp_tx, n - nxmit);
+
if (static_branch_unlikely(&ixgbe_xdp_locking_key))
spin_unlock(&ring->tx_lock);
return nxmit;
}
+static int ixgbe_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ return IXGBE_MAX_XDP_QS;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_open = ixgbe_open,
.ndo_stop = ixgbe_close,
@@ -10306,6 +10345,8 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_bpf = ixgbe_xdp,
.ndo_xdp_xmit = ixgbe_xdp_xmit,
.ndo_xsk_wakeup = ixgbe_xsk_wakeup,
+ .ndo_get_xdp_stats_nch = ixgbe_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = xdp_get_drv_stats_generic,
};
static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
@@ -10712,6 +10753,16 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
netdev->watchdog_timeo = 5 * HZ;
strlcpy(netdev->name, pci_name(pdev), sizeof(netdev->name));
+ netdev->xstats = devm_kcalloc(&pdev->dev, IXGBE_MAX_XDP_QS,
+ sizeof(*netdev->xstats), GFP_KERNEL);
+ if (!netdev->xstats) {
+ err = -ENOMEM;
+ goto err_ioremap;
+ }
+
+ for (i = 0; i < IXGBE_MAX_XDP_QS; i++)
+ xdp_init_drv_stats(netdev->xstats + i);
+
/* Setup hw api */
hw->mac.ops = *ii->mac_ops;
hw->mac.type = ii->mac;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index db2bc58dfcfd..47c4b4621ab1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -96,7 +96,8 @@ int ixgbe_xsk_pool_setup(struct ixgbe_adapter *adapter,
static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
struct ixgbe_ring *rx_ring,
- struct xdp_buff *xdp)
+ struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err, result = IXGBE_XDP_PASS;
struct bpf_prog *xdp_prog;
@@ -104,41 +105,58 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
struct xdp_frame *xdpf;
u32 act;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
xdp_prog = READ_ONCE(rx_ring->xdp_prog);
act = bpf_prog_run_xdp(xdp_prog, xdp);
if (likely(act == XDP_REDIRECT)) {
err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
+ lrstats->redirect++;
return IXGBE_XDP_REDIR;
}
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
break;
case XDP_TX:
xdpf = xdp_convert_buff_to_frame(xdp);
- if (unlikely(!xdpf))
+ if (unlikely(!xdpf)) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
ring = ixgbe_determine_xdp_ring(adapter);
if (static_branch_unlikely(&ixgbe_xdp_locking_key))
spin_lock(&ring->tx_lock);
result = ixgbe_xmit_xdp_ring(ring, xdpf);
if (static_branch_unlikely(&ixgbe_xdp_locking_key))
spin_unlock(&ring->tx_lock);
- if (result == IXGBE_XDP_CONSUMED)
+ if (result == IXGBE_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough; /* handle aborts by dropping packet */
+ /* handle aborts by dropping packet */
+ result = IXGBE_XDP_CONSUMED;
+ break;
case XDP_DROP:
result = IXGBE_XDP_CONSUMED;
+ lrstats->drop++;
break;
}
return result;
@@ -246,6 +264,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
struct ixgbe_adapter *adapter = q_vector->adapter;
u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+ struct xdp_rx_drv_stats_local lrstats = { };
unsigned int xdp_res, xdp_xmit = 0;
bool failure = false;
struct sk_buff *skb;
@@ -299,7 +318,8 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
bi->xdp->data_end = bi->xdp->data + size;
xsk_buff_dma_sync_for_cpu(bi->xdp, rx_ring->xsk_pool);
- xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);
+ xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp,
+ &lrstats);
if (xdp_res) {
if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR))
@@ -349,6 +369,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
rx_ring->stats.packets += total_rx_packets;
rx_ring->stats.bytes += total_rx_bytes;
u64_stats_update_end(&rx_ring->syncp);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xsk_rx, &lrstats);
q_vector->rx.total_packets += total_rx_packets;
q_vector->rx.total_bytes += total_rx_bytes;
@@ -392,6 +413,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
while (budget-- > 0) {
if (unlikely(!ixgbe_desc_unused(xdp_ring)) ||
!netif_carrier_ok(xdp_ring->netdev)) {
+ xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xsk_tx);
work_done = false;
break;
}
@@ -448,9 +470,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
unsigned int total_packets = 0, total_bytes = 0;
struct xsk_buff_pool *pool = tx_ring->xsk_pool;
+ u32 xdp_frames = 0, xdp_bytes = 0;
+ u32 xsk_frames = 0, xsk_bytes = 0;
union ixgbe_adv_tx_desc *tx_desc;
struct ixgbe_tx_buffer *tx_bi;
- u32 xsk_frames = 0;
tx_bi = &tx_ring->tx_buffer_info[ntc];
tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
@@ -459,13 +482,14 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
break;
- total_bytes += tx_bi->bytecount;
- total_packets += tx_bi->gso_segs;
-
- if (tx_bi->xdpf)
+ if (tx_bi->xdpf) {
ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
- else
+ xdp_bytes += tx_bi->bytecount;
+ xdp_frames++;
+ } else {
+ xsk_bytes += tx_bi->bytecount;
xsk_frames++;
+ }
tx_bi->xdpf = NULL;
@@ -483,11 +507,17 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
}
tx_ring->next_to_clean = ntc;
+ total_bytes = xdp_bytes + xsk_bytes;
+ total_packets = xdp_frames + xsk_frames;
u64_stats_update_begin(&tx_ring->syncp);
tx_ring->stats.bytes += total_bytes;
tx_ring->stats.packets += total_packets;
u64_stats_update_end(&tx_ring->syncp);
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->xdp_tx, xdp_frames,
+ xdp_bytes);
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->xsk_tx, xsk_frames,
+ xsk_bytes);
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
--
2.33.1
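The hotpath pattern above is shared by all the Intel drivers in
this series: verdicts are counted into an onstack
struct xdp_rx_drv_stats_local during the NAPI poll and flushed into
the ring's netdev-lifetime stats exactly once per poll, so the hot
loop pays only for plain local increments. A schematic sketch of
that shape, using the helper calls visible in the diffs; my_ring,
my_clean_rx_irq() and the loop body are simplified placeholders,
not real driver code:

  struct my_ring {
          struct xdp_drv_stats *xdp_stats;  /* netdev-lifetime entry */
  };

  static int my_clean_rx_irq(struct my_ring *rx_ring, int budget)
  {
          struct xdp_rx_drv_stats_local lrstats = { };
          int done = 0;

          while (done < budget /* && frames pending */) {
                  /* Run the XDP prog, then count the verdict onstack: */
                  lrstats.packets++;
                  lrstats.pass++; /* or drop/tx/redirect/aborted/... */
                  done++;
          }

          /* One synchronized update per poll instead of per frame. */
          xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xdp_rx, &lrstats);

          return done;
  }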
Make the i40e driver collect and provide all generic XDP/XSK
counters.
Unfortunately, XDP rings have the lifetime of an XDP prog, and all
ring stats structures get wiped on xsk_pool attach/detach, so
store the XDP stats in a separate array with the lifetime of the
VSI.
Reuse all previously introduced helpers and
xdp_get_drv_stats_generic(). The performance deviation from
incrementing a bunch of counters on the hotpath stays within
stddev at [64 ... 1532] frame sizes.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
---
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 38 +++++++++++++++++++-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 40 +++++++++++++++++----
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 33 +++++++++++++----
5 files changed, 99 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 4d939af0a626..2e2a3936332f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -942,6 +942,7 @@ struct i40e_vsi {
irqreturn_t (*irq_handler)(int irq, void *data);
unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled qps */
+ struct xdp_drv_stats *xdp_stats; /* XDP/XSK stats array */
} ____cacheline_internodealigned_in_smp;
struct i40e_netdev_priv {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e118cf9265c7..e3619fc13630 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11087,7 +11087,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
{
struct i40e_ring **next_rings;
- int size;
+ int size, i;
int ret = 0;
/* allocate memory for both Tx, XDP Tx and Rx ring pointers */
@@ -11103,6 +11103,15 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
}
vsi->rx_rings = next_rings;
+ vsi->xdp_stats = kcalloc(vsi->alloc_queue_pairs,
+ sizeof(*vsi->xdp_stats),
+ GFP_KERNEL);
+ if (!vsi->xdp_stats)
+ goto err_xdp_stats;
+
+ for (i = 0; i < vsi->alloc_queue_pairs; i++)
+ xdp_init_drv_stats(vsi->xdp_stats + i);
+
if (alloc_qvectors) {
/* allocate memory for q_vector pointers */
size = sizeof(struct i40e_q_vector *) * vsi->num_q_vectors;
@@ -11115,6 +11124,10 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
return ret;
err_vectors:
+ kfree(vsi->xdp_stats);
+ vsi->xdp_stats = NULL;
+
+err_xdp_stats:
kfree(vsi->tx_rings);
return ret;
}
@@ -11225,6 +11238,10 @@ static void i40e_vsi_free_arrays(struct i40e_vsi *vsi, bool free_qvectors)
kfree(vsi->q_vectors);
vsi->q_vectors = NULL;
}
+
+ kfree(vsi->xdp_stats);
+ vsi->xdp_stats = NULL;
+
kfree(vsi->tx_rings);
vsi->tx_rings = NULL;
vsi->rx_rings = NULL;
@@ -11347,6 +11364,7 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
if (vsi->back->hw_features & I40E_HW_WB_ON_ITR_CAPABLE)
ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
ring->itr_setting = pf->tx_itr_default;
+ ring->xdp_stats = vsi->xdp_stats + i;
WRITE_ONCE(vsi->tx_rings[i], ring++);
if (!i40e_enabled_xdp_vsi(vsi))
@@ -11365,6 +11383,7 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
set_ring_xdp(ring);
ring->itr_setting = pf->tx_itr_default;
+ ring->xdp_stats = vsi->xdp_stats + i;
WRITE_ONCE(vsi->xdp_rings[i], ring++);
setup_rx:
@@ -11378,6 +11397,7 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
ring->size = 0;
ring->dcb_tc = 0;
ring->itr_setting = pf->rx_itr_default;
+ ring->xdp_stats = vsi->xdp_stats + i;
WRITE_ONCE(vsi->rx_rings[i], ring);
}
@@ -13308,6 +13328,19 @@ static int i40e_xdp(struct net_device *dev,
}
}
+static int i40e_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ const struct i40e_netdev_priv *np = netdev_priv(dev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ return np->vsi->alloc_queue_pairs;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static const struct net_device_ops i40e_netdev_ops = {
.ndo_open = i40e_open,
.ndo_stop = i40e_close,
@@ -13343,6 +13376,8 @@ static const struct net_device_ops i40e_netdev_ops = {
.ndo_bpf = i40e_xdp,
.ndo_xdp_xmit = i40e_xdp_xmit,
.ndo_xsk_wakeup = i40e_xsk_wakeup,
+ .ndo_get_xdp_stats_nch = i40e_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = xdp_get_drv_stats_generic,
.ndo_dfwd_add_station = i40e_fwd_add,
.ndo_dfwd_del_station = i40e_fwd_del,
};
@@ -13487,6 +13522,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
netdev->netdev_ops = &i40e_netdev_ops;
netdev->watchdog_timeo = 5 * HZ;
i40e_set_ethtool_ops(netdev);
+ netdev->xstats = vsi->xdp_stats;
/* MTU range: 68 - 9706 */
netdev->min_mtu = ETH_MIN_MTU;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 10a83e5385c7..8854004fbec3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1027,8 +1027,11 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
i40e_update_tx_stats(tx_ring, total_packets, total_bytes);
i40e_arm_wb(tx_ring, vsi, budget);
- if (ring_is_xdp(tx_ring))
+ if (ring_is_xdp(tx_ring)) {
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->xdp_tx,
+ total_packets, total_bytes);
return !!budget;
+ }
/* notify netdev of completed buffers */
netdev_tx_completed_queue(txring_txq(tx_ring),
@@ -2290,8 +2293,10 @@ int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
* i40e_run_xdp - run an XDP program
* @rx_ring: Rx ring being processed
* @xdp: XDP buffer containing the frame
+ * @lrstats: onstack Rx XDP stats
**/
-static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
+static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err, result = I40E_XDP_PASS;
struct i40e_ring *xdp_ring;
@@ -2303,33 +2308,48 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
if (!xdp_prog)
goto xdp_out;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
prefetchw(xdp->data_hard_start); /* xdp_frame write */
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
break;
case XDP_TX:
xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
- if (result == I40E_XDP_CONSUMED)
+ if (result == I40E_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
break;
case XDP_REDIRECT:
err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
result = I40E_XDP_REDIR;
+ lrstats->redirect++;
break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough; /* handle aborts by dropping packet */
+ /* handle aborts by dropping packet */
+ result = I40E_XDP_CONSUMED;
+ break;
case XDP_DROP:
result = I40E_XDP_CONSUMED;
+ lrstats->drop++;
break;
}
xdp_out:
@@ -2441,6 +2461,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
{
unsigned int total_rx_bytes = 0, total_rx_packets = 0, frame_sz = 0;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+ struct xdp_rx_drv_stats_local lrstats = { };
unsigned int offset = rx_ring->rx_offset;
struct sk_buff *skb = rx_ring->skb;
unsigned int xdp_xmit = 0;
@@ -2512,7 +2533,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
/* At larger PAGE_SIZE, frame_sz depend on len size */
xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
#endif
- xdp_res = i40e_run_xdp(rx_ring, &xdp);
+ xdp_res = i40e_run_xdp(rx_ring, &xdp, &lrstats);
}
if (xdp_res) {
@@ -2569,6 +2590,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
rx_ring->skb = skb;
i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xdp_rx, &lrstats);
/* guarantee a trip back through this routine if there was a failure */
return failure ? budget : (int)total_rx_packets;
@@ -3696,6 +3718,7 @@ static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
dma_addr_t dma;
if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
+ xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
xdp_ring->tx_stats.tx_busy++;
return I40E_XDP_CONSUMED;
}
@@ -3923,5 +3946,8 @@ int i40e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
if (unlikely(flags & XDP_XMIT_FLUSH))
i40e_xdp_ring_update_tail(xdp_ring);
+ if (unlikely(nxmit < n))
+ xdp_update_tx_drv_err(&xdp_ring->xdp_stats->xdp_tx, n - nxmit);
+
return nxmit;
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index bfc2845c99d1..dcfcf20e2ea9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -368,6 +368,7 @@ struct i40e_ring {
struct i40e_tx_queue_stats tx_stats;
struct i40e_rx_queue_stats rx_stats;
};
+ struct xdp_drv_stats *xdp_stats;
unsigned int size; /* length of descriptor ring in bytes */
dma_addr_t dma; /* physical address of ring */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index ea06e957393e..54c5b8abbb53 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -143,16 +143,21 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
* i40e_run_xdp_zc - Executes an XDP program on an xdp_buff
* @rx_ring: Rx ring
* @xdp: xdp_buff used as input to the XDP program
+ * @lrstats: onstack Rx XDP stats structure
*
* Returns any of I40E_XDP_{PASS, CONSUMED, TX, REDIR}
**/
-static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
+static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err, result = I40E_XDP_PASS;
struct i40e_ring *xdp_ring;
struct bpf_prog *xdp_prog;
u32 act;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
/* NB! xdp_prog will always be !NULL, due to the fact that
* this path is enabled by setting an XDP program.
*/
@@ -161,29 +166,41 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
if (likely(act == XDP_REDIRECT)) {
err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
+ lrstats->redirect++;
return I40E_XDP_REDIR;
}
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
break;
case XDP_TX:
xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
result = i40e_xmit_xdp_tx_ring(xdp, xdp_ring);
- if (result == I40E_XDP_CONSUMED)
+ if (result == I40E_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough; /* handle aborts by dropping packet */
+ /* handle aborts by dropping packet */
+ result = I40E_XDP_CONSUMED;
+ break;
case XDP_DROP:
result = I40E_XDP_CONSUMED;
+ lrstats->drop++;
break;
}
return result;
@@ -325,6 +342,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
{
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+ struct xdp_rx_drv_stats_local lrstats = { };
u16 next_to_clean = rx_ring->next_to_clean;
u16 count_mask = rx_ring->count - 1;
unsigned int xdp_res, xdp_xmit = 0;
@@ -366,7 +384,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
xsk_buff_set_size(bi, size);
xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);
- xdp_res = i40e_run_xdp_zc(rx_ring, bi);
+ xdp_res = i40e_run_xdp_zc(rx_ring, bi, &lrstats);
i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets,
&rx_bytes, size, xdp_res);
total_rx_packets += rx_packets;
@@ -383,6 +401,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xsk_rx, &lrstats);
if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
if (failure || next_to_clean == rx_ring->next_to_use)
@@ -489,6 +508,8 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
i40e_xdp_ring_update_tail(xdp_ring);
i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
+ xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xsk_tx, nb_pkts,
+ total_bytes);
return nb_pkts < budget;
}
--
2.33.1
Add a couple of hints on how to retrieve and implement generic XDP
statistics for drivers/interfaces. Mention that it's undesirable to
include the related XDP counters in driver-defined Ethtool stats.
Signed-off-by: Alexander Lobakin <[email protected]>
---
Documentation/networking/statistics.rst | 33 +++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
index c9aeb70dafa2..ec5d14f279e1 100644
--- a/Documentation/networking/statistics.rst
+++ b/Documentation/networking/statistics.rst
@@ -41,6 +41,29 @@ If `-s` is specified once the detailed errors won't be shown.
`ip` supports JSON formatting via the `-j` option.
+For some interfaces, standard XDP statistics are available.
+They can be accessed in the same way, e.g. via `ip`::
+
+ $ ip link xdpstats dev enp178s0
+ 16: enp178s0:
+ xdp-channel0-rx_xdp_packets: 0
+ xdp-channel0-rx_xdp_bytes: 1
+ xdp-channel0-rx_xdp_errors: 2
+ xdp-channel0-rx_xdp_aborted: 3
+ xdp-channel0-rx_xdp_drop: 4
+ xdp-channel0-rx_xdp_invalid: 5
+ xdp-channel0-rx_xdp_pass: 6
+ xdp-channel0-rx_xdp_redirect: 7
+ xdp-channel0-rx_xdp_redirect_errors: 8
+ xdp-channel0-rx_xdp_tx: 9
+ xdp-channel0-rx_xdp_tx_errors: 10
+ xdp-channel0-tx_xdp_xmit_packets: 11
+ xdp-channel0-tx_xdp_xmit_bytes: 12
+ xdp-channel0-tx_xdp_xmit_errors: 13
+ xdp-channel0-tx_xdp_xmit_full: 14
+
+These counters are usually per-channel. JSON output is also supported via the `-j` option.
+
Protocol-specific statistics
----------------------------
@@ -147,6 +170,8 @@ Statistics are reported both in the responses to link information
requests (`RTM_GETLINK`) and statistic requests (`RTM_GETSTATS`,
when `IFLA_STATS_LINK_64` bit is set in the `.filter_mask` of the request).
+The `IFLA_STATS_LINK_XDP_XSTATS` bit is used to retrieve the standard XDP statistics.
+
ethtool
-------
@@ -206,6 +231,14 @@ Retrieving ethtool statistics is a multi-syscall process, drivers are advised
to keep the number of statistics constant to avoid race conditions with
user space trying to read them.
+It is up to the developers whether to implement XDP statistics or not, due to
+possible performance hits. If implemented, it is encouraged to export them using
+the generic XDP statistics infrastructure, not driver-defined Ethtool stats.
+This can be achieved by implementing `.ndo_get_xdp_stats` and, optionally but
+preferably, `.ndo_get_xdp_stats_nch`. There are several common helper structures
+and functions in `include/net/xdp.h` to make this simpler and keep the code
+compact.
+
Statistics must persist across routine operations like bringing the interface
down and up.
--
2.33.1
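For drivers that, like the Intel ones in this series, keep a
struct xdp_drv_stats array with netdev lifetime and update it via
the helpers, the documentation's advice boils down to very little
glue. A hedged sketch -- MY_MAX_QS and my_netdev_ops are
hypothetical, while xdp_get_drv_stats_generic() is the series'
ready-made .ndo_get_xdp_stats for this layout:

  static int my_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
  {
          switch (attr_id) {
          case IFLA_XDP_XSTATS_TYPE_XDP:
          case IFLA_XDP_XSTATS_TYPE_XSK:
                  return MY_MAX_QS;       /* length of netdev->xstats */
          default:
                  return -EOPNOTSUPP;
          }
  }

  static const struct net_device_ops my_netdev_ops = {
          /* ... */
          .ndo_get_xdp_stats_nch  = my_get_xdp_stats_nch,
          .ndo_get_xdp_stats      = xdp_get_drv_stats_generic,
  };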
Make the ice driver collect and provide all generic XDP/XSK
counters.
Unfortunately, XDP rings have the lifetime of an XDP prog, and all
ring stats structures get wiped on xsk_pool attach/detach, so
store the XDP stats in a separate array with the lifetime of the
VSI. The new alloc_xdp_stats field is used to compute the maximum
possible number of XDP-enabled queues just once and refer to it
later.
Reuse all previously introduced helpers and
xdp_get_drv_stats_generic(). The performance deviation from
incrementing a bunch of counters on the hotpath stays within
stddev at [64 ... 1532] frame sizes.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Maciej Fijalkowski <[email protected]>
---
drivers/net/ethernet/intel/ice/ice.h | 2 +
drivers/net/ethernet/intel/ice/ice_lib.c | 21 ++++++++
drivers/net/ethernet/intel/ice/ice_main.c | 17 +++++++
drivers/net/ethernet/intel/ice/ice_txrx.c | 33 +++++++++---
drivers/net/ethernet/intel/ice/ice_txrx.h | 12 +++--
drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 3 ++
drivers/net/ethernet/intel/ice/ice_xsk.c | 51 ++++++++++++++-----
7 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index b67ad51cbcc9..6cef8b4e887f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -387,8 +387,10 @@ struct ice_vsi {
struct ice_tc_cfg tc_cfg;
struct bpf_prog *xdp_prog;
struct ice_tx_ring **xdp_rings; /* XDP ring array */
+ struct xdp_drv_stats *xdp_stats; /* XDP stats array */
unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled qps */
u16 num_xdp_txq; /* Used XDP queues */
+ u16 alloc_xdp_stats; /* Length of xdp_stats array */
u8 xdp_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
struct net_device **target_netdevs;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 40562600a8cf..934152216df5 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -73,6 +73,7 @@ static int ice_vsi_alloc_arrays(struct ice_vsi *vsi)
{
struct ice_pf *pf = vsi->back;
struct device *dev;
+ u32 i;
dev = ice_pf_to_dev(pf);
if (vsi->type == ICE_VSI_CHNL)
@@ -115,8 +116,23 @@ static int ice_vsi_alloc_arrays(struct ice_vsi *vsi)
if (!vsi->af_xdp_zc_qps)
goto err_zc_qps;
+ vsi->alloc_xdp_stats = max_t(u16, vsi->alloc_rxq, num_possible_cpus());
+
+ vsi->xdp_stats = kcalloc(vsi->alloc_xdp_stats, sizeof(*vsi->xdp_stats),
+ GFP_KERNEL);
+ if (!vsi->xdp_stats)
+ goto err_xdp_stats;
+
+ for (i = 0; i < vsi->alloc_xdp_stats; i++)
+ xdp_init_drv_stats(vsi->xdp_stats + i);
+
return 0;
+err_xdp_stats:
+ vsi->alloc_xdp_stats = 0;
+
+ bitmap_free(vsi->af_xdp_zc_qps);
+ vsi->af_xdp_zc_qps = NULL;
err_zc_qps:
devm_kfree(dev, vsi->q_vectors);
err_vectors:
@@ -317,6 +333,10 @@ static void ice_vsi_free_arrays(struct ice_vsi *vsi)
dev = ice_pf_to_dev(pf);
+ kfree(vsi->xdp_stats);
+ vsi->xdp_stats = NULL;
+ vsi->alloc_xdp_stats = 0;
+
if (vsi->af_xdp_zc_qps) {
bitmap_free(vsi->af_xdp_zc_qps);
vsi->af_xdp_zc_qps = NULL;
@@ -1422,6 +1442,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
ring->netdev = vsi->netdev;
ring->dev = dev;
ring->count = vsi->num_rx_desc;
+ ring->xdp_stats = vsi->xdp_stats + i;
WRITE_ONCE(vsi->rx_rings[i], ring);
}
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f2a5f2f965d1..94d0bf440a49 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2481,6 +2481,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
xdp_ring->next_rs = ICE_TX_THRESH - 1;
xdp_ring->dev = dev;
xdp_ring->count = vsi->num_tx_desc;
+ xdp_ring->xdp_stats = vsi->xdp_stats + i;
WRITE_ONCE(vsi->xdp_rings[i], xdp_ring);
if (ice_setup_tx_ring(xdp_ring))
goto free_xdp_rings;
@@ -2837,6 +2838,19 @@ static int ice_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
}
+static int ice_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ const struct ice_netdev_priv *np = netdev_priv(dev);
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ return np->vsi->alloc_xdp_stats;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
/**
* ice_ena_misc_vector - enable the non-queue interrupts
* @pf: board private structure
@@ -3280,6 +3294,7 @@ static int ice_cfg_netdev(struct ice_vsi *vsi)
ice_set_netdev_features(netdev);
ice_set_ops(netdev);
+ netdev->xstats = vsi->xdp_stats;
if (vsi->type == ICE_VSI_PF) {
SET_NETDEV_DEV(netdev, ice_pf_to_dev(vsi->back));
@@ -8608,4 +8623,6 @@ static const struct net_device_ops ice_netdev_ops = {
.ndo_bpf = ice_xdp,
.ndo_xdp_xmit = ice_xdp_xmit,
.ndo_xsk_wakeup = ice_xsk_wakeup,
+ .ndo_get_xdp_stats_nch = ice_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = xdp_get_drv_stats_generic,
};
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index bc3ba19dc88f..d32d6f2975b5 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -532,19 +532,25 @@ ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, unsigned int __maybe_unused s
* @xdp: xdp_buff used as input to the XDP program
* @xdp_prog: XDP program to run
* @xdp_ring: ring to be used for XDP_TX action
+ * @lrstats: onstack Rx XDP stats
*
* Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
*/
static int
ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
+ struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err;
u32 act;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
return ICE_XDP_PASS;
case XDP_TX:
if (static_branch_unlikely(&ice_xdp_locking_key))
@@ -552,22 +558,31 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
err = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
if (static_branch_unlikely(&ice_xdp_locking_key))
spin_unlock(&xdp_ring->tx_lock);
- if (err == ICE_XDP_CONSUMED)
+ if (err == ICE_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
return err;
case XDP_REDIRECT:
err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
+ lrstats->redirect++;
return ICE_XDP_REDIR;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough;
+ return ICE_XDP_CONSUMED;
case XDP_DROP:
+ lrstats->drop++;
return ICE_XDP_CONSUMED;
}
}
@@ -627,6 +642,9 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
if (static_branch_unlikely(&ice_xdp_locking_key))
spin_unlock(&xdp_ring->tx_lock);
+ if (unlikely(nxmit < n))
+ xdp_update_tx_drv_err(&xdp_ring->xdp_stats->xdp_tx, n - nxmit);
+
return nxmit;
}
@@ -1089,6 +1107,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
{
unsigned int total_rx_bytes = 0, total_rx_pkts = 0, frame_sz = 0;
u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+ struct xdp_rx_drv_stats_local lrstats = { };
unsigned int offset = rx_ring->rx_offset;
struct ice_tx_ring *xdp_ring = NULL;
unsigned int xdp_res, xdp_xmit = 0;
@@ -1173,7 +1192,8 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
if (!xdp_prog)
goto construct_skb;
- xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog, xdp_ring);
+ xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog, xdp_ring,
+ &lrstats);
if (!xdp_res)
goto construct_skb;
if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
@@ -1254,6 +1274,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
rx_ring->skb = skb;
ice_update_rx_ring_stats(rx_ring, total_rx_pkts, total_rx_bytes);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xdp_rx, &lrstats);
/* guarantee a trip back through this routine if there was a failure */
return failure ? budget : (int)total_rx_pkts;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index c56dd1749903..c54be60c3479 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -284,9 +284,9 @@ struct ice_rx_ring {
struct ice_rxq_stats rx_stats;
struct ice_q_stats stats;
struct u64_stats_sync syncp;
+ struct xdp_drv_stats *xdp_stats;
- struct rcu_head rcu; /* to avoid race on free */
- /* CL4 - 3rd cacheline starts here */
+ /* CL4 - 4th cacheline starts here */
struct ice_channel *ch;
struct bpf_prog *xdp_prog;
struct ice_tx_ring *xdp_ring;
@@ -298,6 +298,9 @@ struct ice_rx_ring {
u8 dcb_tc; /* Traffic class of ring */
u8 ptp_rx;
u8 flags;
+
+ /* CL5 - 5th cacheline starts here */
+ struct rcu_head rcu; /* to avoid race on free */
} ____cacheline_internodealigned_in_smp;
struct ice_tx_ring {
@@ -324,13 +327,16 @@ struct ice_tx_ring {
/* stats structs */
struct ice_q_stats stats;
struct u64_stats_sync syncp;
- struct ice_txq_stats tx_stats;
+ struct xdp_drv_stats *xdp_stats;
/* CL3 - 3rd cacheline starts here */
+ struct ice_txq_stats tx_stats;
struct rcu_head rcu; /* to avoid race on free */
DECLARE_BITMAP(xps_state, ICE_TX_NBITS); /* XPS Config State */
struct ice_channel *ch;
struct ice_ptp_tx *tx_tstamps;
+
+ /* CL4 - 4th cacheline starts here */
spinlock_t tx_lock;
u32 txq_teid; /* Added Tx queue TEID */
#define ICE_TX_FLAGS_RING_XDP BIT(0)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 1dd7e84f41f8..7dc287bc3a1a 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -258,6 +258,8 @@ static void ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
xdp_ring->next_dd = ICE_TX_THRESH - 1;
xdp_ring->next_to_clean = ntc;
ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
+ xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xdp_tx, total_pkts,
+ total_bytes);
}
/**
@@ -277,6 +279,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
ice_clean_xdp_irq(xdp_ring);
if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
+ xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
xdp_ring->tx_stats.tx_busy++;
return ICE_XDP_CONSUMED;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index ff55cb415b11..62ef47a38d93 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -454,42 +454,58 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff **xdp_arr)
* @xdp: xdp_buff used as input to the XDP program
* @xdp_prog: XDP program to run
* @xdp_ring: ring to be used for XDP_TX action
+ * @lrstats: onstack Rx XDP stats
*
* Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
*/
static int
ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
+ struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err, result = ICE_XDP_PASS;
u32 act;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
act = bpf_prog_run_xdp(xdp_prog, xdp);
if (likely(act == XDP_REDIRECT)) {
err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
+ lrstats->redirect++;
return ICE_XDP_REDIR;
}
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
break;
case XDP_TX:
result = ice_xmit_xdp_buff(xdp, xdp_ring);
- if (result == ICE_XDP_CONSUMED)
+ if (result == ICE_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough;
+ result = ICE_XDP_CONSUMED;
+ break;
case XDP_DROP:
result = ICE_XDP_CONSUMED;
+ lrstats->drop++;
break;
}
@@ -507,6 +523,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
{
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+ struct xdp_rx_drv_stats_local lrstats = { };
struct ice_tx_ring *xdp_ring;
unsigned int xdp_xmit = 0;
struct bpf_prog *xdp_prog;
@@ -548,7 +565,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
xsk_buff_set_size(*xdp, size);
xsk_buff_dma_sync_for_cpu(*xdp, rx_ring->xsk_pool);
- xdp_res = ice_run_xdp_zc(rx_ring, *xdp, xdp_prog, xdp_ring);
+ xdp_res = ice_run_xdp_zc(rx_ring, *xdp, xdp_prog, xdp_ring,
+ &lrstats);
if (xdp_res) {
if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))
xdp_xmit |= xdp_res;
@@ -598,6 +616,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xsk_rx, &lrstats);
if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
@@ -629,6 +648,7 @@ static bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, int budget)
struct ice_tx_buf *tx_buf;
if (unlikely(!ICE_DESC_UNUSED(xdp_ring))) {
+ xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xsk_tx);
xdp_ring->tx_stats.tx_busy++;
work_done = false;
break;
@@ -686,11 +706,11 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
*/
bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
{
- int total_packets = 0, total_bytes = 0;
s16 ntc = xdp_ring->next_to_clean;
+ u32 xdp_frames = 0, xdp_bytes = 0;
+ u32 xsk_frames = 0, xsk_bytes = 0;
struct ice_tx_desc *tx_desc;
struct ice_tx_buf *tx_buf;
- u32 xsk_frames = 0;
bool xmit_done;
tx_desc = ICE_TX_DESC(xdp_ring, ntc);
@@ -702,13 +722,14 @@ bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
break;
- total_bytes += tx_buf->bytecount;
- total_packets++;
-
if (tx_buf->raw_buf) {
ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
tx_buf->raw_buf = NULL;
+
+ xdp_bytes += tx_buf->bytecount;
+ xdp_frames++;
} else {
+ xsk_bytes += tx_buf->bytecount;
xsk_frames++;
}
@@ -736,7 +757,13 @@ bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
if (xsk_uses_need_wakeup(xdp_ring->xsk_pool))
xsk_set_tx_need_wakeup(xdp_ring->xsk_pool);
- ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes);
+ ice_update_tx_ring_stats(xdp_ring, xdp_frames + xsk_frames,
+ xdp_bytes + xsk_bytes);
+ xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xdp_tx, xdp_frames,
+ xdp_bytes);
+ xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xsk_tx, xsk_frames,
+ xsk_bytes);
+
xmit_done = ice_xmit_zc(xdp_ring, ICE_DFLT_IRQ_WORK);
return budget > 0 && xmit_done;
--
2.33.1
Make igc driver collect and provide all generic XDP/XSK counters.
Unfortunately, igc has a unified igc_ring structure for both Rx
and Tx, so embedding xdp_drv_stats there would bloat it for no good
reason. Store the stats in a separate array with the lifetime of an
igc_adapter instead.
IGC_MAX_QUEUES is introduced purely for convenience, to avoid
hardcoding max(RX, TX) all the time.
Reuse all previously introduced helpers and
xdp_get_drv_stats_generic(). The performance deviation from
incrementing a bunch of counters on the hotpath stays within stddev
at [64 ... 1532] byte frame sizes.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
---
drivers/net/ethernet/intel/igc/igc.h | 3 +
drivers/net/ethernet/intel/igc/igc_main.c | 88 +++++++++++++++++++----
2 files changed, 77 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 3e386c38d016..ec46134227ee 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -21,6 +21,8 @@ void igc_ethtool_set_ops(struct net_device *);
/* Transmit and receive queues */
#define IGC_MAX_RX_QUEUES 4
#define IGC_MAX_TX_QUEUES 4
+#define IGC_MAX_QUEUES max(IGC_MAX_RX_QUEUES, \
+ IGC_MAX_TX_QUEUES)
#define MAX_Q_VECTORS 8
#define MAX_STD_JUMBO_FRAME_SIZE 9216
@@ -125,6 +127,7 @@ struct igc_ring {
struct sk_buff *skb;
};
};
+ struct xdp_drv_stats *xdp_stats;
struct xdp_rxq_info xdp_rxq;
struct xsk_buff_pool *xsk_pool;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 7d0c540d6b76..2ffe4b2bfde7 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2148,8 +2148,10 @@ static int igc_xdp_init_tx_descriptor(struct igc_ring *ring,
u32 cmd_type, olinfo_status;
int err;
- if (!igc_desc_unused(ring))
+ if (!igc_desc_unused(ring)) {
+ xdp_update_tx_drv_full(&ring->xdp_stats->xdp_tx);
return -EBUSY;
+ }
buffer = &ring->tx_buffer_info[ring->next_to_use];
err = igc_xdp_init_tx_buffer(buffer, xdpf, ring);
@@ -2214,36 +2216,51 @@ static int igc_xdp_xmit_back(struct igc_adapter *adapter, struct xdp_buff *xdp)
/* This function assumes rcu_read_lock() is held by the caller. */
static int __igc_xdp_run_prog(struct igc_adapter *adapter,
struct bpf_prog *prog,
- struct xdp_buff *xdp)
+ struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
- u32 act = bpf_prog_run_xdp(prog, xdp);
+ u32 act;
+
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+ act = bpf_prog_run_xdp(prog, xdp);
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
return IGC_XDP_PASS;
case XDP_TX:
- if (igc_xdp_xmit_back(adapter, xdp) < 0)
+ if (igc_xdp_xmit_back(adapter, xdp) < 0) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
return IGC_XDP_TX;
case XDP_REDIRECT:
- if (xdp_do_redirect(adapter->netdev, xdp, prog) < 0)
+ if (xdp_do_redirect(adapter->netdev, xdp, prog) < 0) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
+ lrstats->redirect++;
return IGC_XDP_REDIRECT;
- break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(adapter->netdev, prog, act);
- fallthrough;
+ return IGC_XDP_CONSUMED;
case XDP_DROP:
+ lrstats->drop++;
return IGC_XDP_CONSUMED;
}
}
static struct sk_buff *igc_xdp_run_prog(struct igc_adapter *adapter,
- struct xdp_buff *xdp)
+ struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
struct bpf_prog *prog;
int res;
@@ -2254,7 +2271,7 @@ static struct sk_buff *igc_xdp_run_prog(struct igc_adapter *adapter,
goto out;
}
- res = __igc_xdp_run_prog(adapter, prog, xdp);
+ res = __igc_xdp_run_prog(adapter, prog, xdp, lrstats);
out:
return ERR_PTR(-res);
@@ -2309,6 +2326,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
unsigned int total_bytes = 0, total_packets = 0;
struct igc_adapter *adapter = q_vector->adapter;
struct igc_ring *rx_ring = q_vector->rx.ring;
+ struct xdp_rx_drv_stats_local lrstats = { };
struct sk_buff *skb = rx_ring->skb;
u16 cleaned_count = igc_desc_unused(rx_ring);
int xdp_status = 0, rx_buffer_pgcnt;
@@ -2356,7 +2374,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
xdp_prepare_buff(&xdp, pktbuf - igc_rx_offset(rx_ring),
igc_rx_offset(rx_ring) + pkt_offset, size, false);
- skb = igc_xdp_run_prog(adapter, &xdp);
+ skb = igc_xdp_run_prog(adapter, &xdp, &lrstats);
}
if (IS_ERR(skb)) {
@@ -2425,6 +2443,7 @@ static int igc_clean_rx_irq(struct igc_q_vector *q_vector, const int budget)
rx_ring->skb = skb;
igc_update_rx_stats(q_vector, total_packets, total_bytes);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->xdp_rx, &lrstats);
if (cleaned_count)
igc_alloc_rx_buffers(rx_ring, cleaned_count);
@@ -2481,6 +2500,7 @@ static void igc_dispatch_skb_zc(struct igc_q_vector *q_vector,
static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
{
struct igc_adapter *adapter = q_vector->adapter;
+ struct xdp_rx_drv_stats_local lrstats = { };
struct igc_ring *ring = q_vector->rx.ring;
u16 cleaned_count = igc_desc_unused(ring);
int total_bytes = 0, total_packets = 0;
@@ -2529,7 +2549,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
bi->xdp->data_end = bi->xdp->data + size;
xsk_buff_dma_sync_for_cpu(bi->xdp, ring->xsk_pool);
- res = __igc_xdp_run_prog(adapter, prog, bi->xdp);
+ res = __igc_xdp_run_prog(adapter, prog, bi->xdp, &lrstats);
switch (res) {
case IGC_XDP_PASS:
igc_dispatch_skb_zc(q_vector, desc, bi->xdp, timestamp);
@@ -2562,6 +2582,7 @@ static int igc_clean_rx_irq_zc(struct igc_q_vector *q_vector, const int budget)
igc_finalize_xdp(adapter, xdp_status);
igc_update_rx_stats(q_vector, total_packets, total_bytes);
+ xdp_update_rx_drv_stats(&ring->xdp_stats->xsk_rx, &lrstats);
if (xsk_uses_need_wakeup(ring->xsk_pool)) {
if (failure || ring->next_to_clean == ring->next_to_use)
@@ -2604,8 +2625,10 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring)
__netif_tx_lock(nq, cpu);
budget = igc_desc_unused(ring);
- if (unlikely(!budget))
+ if (unlikely(!budget)) {
+ xdp_update_tx_drv_full(&ring->xdp_stats->xsk_tx);
goto out_unlock;
+ }
while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) {
u32 cmd_type, olinfo_status;
@@ -2664,9 +2687,10 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
unsigned int budget = q_vector->tx.work_limit;
struct igc_ring *tx_ring = q_vector->tx.ring;
unsigned int i = tx_ring->next_to_clean;
+ u32 xdp_frames = 0, xdp_bytes = 0;
+ u32 xsk_frames = 0, xsk_bytes = 0;
struct igc_tx_buffer *tx_buffer;
union igc_adv_tx_desc *tx_desc;
- u32 xsk_frames = 0;
if (test_bit(__IGC_DOWN, &adapter->state))
return true;
@@ -2698,11 +2722,14 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
switch (tx_buffer->type) {
case IGC_TX_BUFFER_TYPE_XSK:
+ xsk_bytes += tx_buffer->bytecount;
xsk_frames++;
break;
case IGC_TX_BUFFER_TYPE_XDP:
xdp_return_frame(tx_buffer->xdpf);
igc_unmap_tx_buffer(tx_ring->dev, tx_buffer);
+ xdp_bytes += tx_buffer->bytecount;
+ xdp_frames++;
break;
case IGC_TX_BUFFER_TYPE_SKB:
napi_consume_skb(tx_buffer->skb, napi_budget);
@@ -2753,6 +2780,10 @@ static bool igc_clean_tx_irq(struct igc_q_vector *q_vector, int napi_budget)
tx_ring->next_to_clean = i;
igc_update_tx_stats(q_vector, total_packets, total_bytes);
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->xdp_tx, xdp_frames,
+ xdp_bytes);
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->xsk_tx, xsk_frames,
+ xsk_bytes);
if (tx_ring->xsk_pool) {
if (xsk_frames)
@@ -4385,6 +4416,8 @@ static int igc_alloc_q_vector(struct igc_adapter *adapter,
ring->count = adapter->tx_ring_count;
ring->queue_index = txr_idx;
+ ring->xdp_stats = adapter->netdev->xstats + txr_idx;
+
/* assign ring to adapter */
adapter->tx_ring[txr_idx] = ring;
@@ -4407,6 +4440,8 @@ static int igc_alloc_q_vector(struct igc_adapter *adapter,
ring->count = adapter->rx_ring_count;
ring->queue_index = rxr_idx;
+ ring->xdp_stats = adapter->netdev->xstats + rxr_idx;
+
/* assign ring to adapter */
adapter->rx_ring[rxr_idx] = ring;
}
@@ -4515,6 +4550,7 @@ static int igc_sw_init(struct igc_adapter *adapter)
struct net_device *netdev = adapter->netdev;
struct pci_dev *pdev = adapter->pdev;
struct igc_hw *hw = &adapter->hw;
+ u32 i;
pci_read_config_word(pdev, PCI_COMMAND, &hw->bus.pci_cmd_word);
@@ -4544,6 +4580,14 @@ static int igc_sw_init(struct igc_adapter *adapter)
igc_init_queue_configuration(adapter);
+ netdev->xstats = kcalloc(IGC_MAX_QUEUES, sizeof(*netdev->xstats),
+ GFP_KERNEL);
+ if (!netdev->xstats)
+ return -ENOMEM;
+
+ for (i = 0; i < IGC_MAX_QUEUES; i++)
+ xdp_init_drv_stats(netdev->xstats + i);
+
/* This call may decrease the number of queues */
if (igc_init_interrupt_scheme(adapter, true)) {
netdev_err(netdev, "Unable to allocate memory for queues\n");
@@ -6046,11 +6090,25 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
if (flags & XDP_XMIT_FLUSH)
igc_flush_tx_descriptors(ring);
+ if (unlikely(drops))
+ xdp_update_tx_drv_err(&ring->xdp_stats->xdp_tx, drops);
+
__netif_tx_unlock(nq);
return num_frames - drops;
}
+static int igc_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ case IFLA_XDP_XSTATS_TYPE_XSK:
+ return IGC_MAX_QUEUES;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
struct igc_q_vector *q_vector)
{
@@ -6096,6 +6154,8 @@ static const struct net_device_ops igc_netdev_ops = {
.ndo_set_mac_address = igc_set_mac,
.ndo_change_mtu = igc_change_mtu,
.ndo_get_stats64 = igc_get_stats64,
+ .ndo_get_xdp_stats_nch = igc_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = xdp_get_drv_stats_generic,
.ndo_fix_features = igc_fix_features,
.ndo_set_features = igc_set_features,
.ndo_features_check = igc_features_check,
--
2.33.1
Make igb driver collect and provide all generic XDP counters.
Unfortunately, igb has a unified igb_ring structure for both Rx
and Tx, so embedding xdp_drv_rx_stats there would bloat it for no
good reason. Store the XDP stats in a separate array with the
lifetime of a netdev instead.
Unlike the other Intel drivers, igb has no XSK support, so we can't
use the full xdp_drv_stats here. IGB_MAX_ALLOC_QUEUES is introduced
purely for convenience, to avoid hardcoding 16 in several places.
Reuse the previously introduced helpers where possible. The
performance deviation from incrementing a bunch of counters on the
hotpath stays within stddev at [64 ... 1532] byte frame sizes.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Michal Swiatkowski <[email protected]>
---
drivers/net/ethernet/intel/igb/igb.h | 14 ++-
drivers/net/ethernet/intel/igb/igb_main.c | 102 ++++++++++++++++++++--
2 files changed, 105 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 2d3daf022651..a6c5355b82fc 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -303,6 +303,11 @@ struct igb_rx_queue_stats {
u64 alloc_failed;
};
+struct igb_xdp_stats {
+ struct xdp_rx_drv_stats rx;
+ struct xdp_tx_drv_stats tx;
+} ____cacheline_aligned;
+
struct igb_ring_container {
struct igb_ring *ring; /* pointer to linked list of rings */
unsigned int total_bytes; /* total bytes processed this int */
@@ -356,6 +361,7 @@ struct igb_ring {
struct u64_stats_sync rx_syncp;
};
};
+ struct igb_xdp_stats *xdp_stats;
struct xdp_rxq_info xdp_rxq;
} ____cacheline_internodealigned_in_smp;
@@ -531,6 +537,8 @@ struct igb_mac_addr {
#define IGB_MAC_STATE_SRC_ADDR 0x4
#define IGB_MAC_STATE_QUEUE_STEERING 0x8
+#define IGB_MAX_ALLOC_QUEUES 16
+
/* board specific private data structure */
struct igb_adapter {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
@@ -554,11 +562,11 @@ struct igb_adapter {
u16 tx_work_limit;
u32 tx_timeout_count;
int num_tx_queues;
- struct igb_ring *tx_ring[16];
+ struct igb_ring *tx_ring[IGB_MAX_ALLOC_QUEUES];
/* RX */
int num_rx_queues;
- struct igb_ring *rx_ring[16];
+ struct igb_ring *rx_ring[IGB_MAX_ALLOC_QUEUES];
u32 max_frame_size;
u32 min_frame_size;
@@ -664,6 +672,8 @@ struct igb_adapter {
struct igb_mac_addr *mac_table;
struct vf_mac_filter vf_macs;
struct vf_mac_filter *vf_mac_list;
+
+ struct igb_xdp_stats *xdp_stats;
};
/* flags controlling PTP/1588 function */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 18a019a47182..c4e1ea9bc4a8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1266,6 +1266,7 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
u64_stats_init(&ring->tx_syncp);
u64_stats_init(&ring->tx_syncp2);
+ ring->xdp_stats = adapter->xdp_stats + txr_idx;
/* assign ring to adapter */
adapter->tx_ring[txr_idx] = ring;
@@ -1300,6 +1301,7 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
ring->queue_index = rxr_idx;
u64_stats_init(&ring->rx_syncp);
+ ring->xdp_stats = adapter->xdp_stats + rxr_idx;
/* assign ring to adapter */
adapter->rx_ring[rxr_idx] = ring;
@@ -2973,6 +2975,9 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
nxmit++;
}
+ if (unlikely(nxmit < n))
+ xdp_update_tx_drv_err(&tx_ring->xdp_stats->tx, n - nxmit);
+
__netif_tx_unlock(nq);
if (unlikely(flags & XDP_XMIT_FLUSH))
@@ -2981,6 +2986,42 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
return nxmit;
}
+static int igb_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
+{
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ return IGB_MAX_ALLOC_QUEUES;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int igb_get_xdp_stats(const struct net_device *dev, u32 attr_id,
+ void *attr_data)
+{
+ const struct igb_adapter *adapter = netdev_priv(dev);
+ const struct igb_xdp_stats *drv_iter = adapter->xdp_stats;
+ struct ifla_xdp_stats *iter = attr_data;
+ u32 i;
+
+ switch (attr_id) {
+ case IFLA_XDP_XSTATS_TYPE_XDP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ for (i = 0; i < IGB_MAX_ALLOC_QUEUES; i++) {
+ xdp_fetch_rx_drv_stats(iter, &drv_iter->rx);
+ xdp_fetch_tx_drv_stats(iter, &drv_iter->tx);
+
+ drv_iter++;
+ iter++;
+ }
+
+ return 0;
+}
+
static const struct net_device_ops igb_netdev_ops = {
.ndo_open = igb_open,
.ndo_stop = igb_close,
@@ -3007,6 +3048,8 @@ static const struct net_device_ops igb_netdev_ops = {
.ndo_setup_tc = igb_setup_tc,
.ndo_bpf = igb_xdp,
.ndo_xdp_xmit = igb_xdp_xmit,
+ .ndo_get_xdp_stats_nch = igb_get_xdp_stats_nch,
+ .ndo_get_xdp_stats = igb_get_xdp_stats,
};
/**
@@ -3620,6 +3663,7 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (hw->flash_address)
iounmap(hw->flash_address);
err_sw_init:
+ kfree(adapter->xdp_stats);
kfree(adapter->mac_table);
kfree(adapter->shadow_vfta);
igb_clear_interrupt_scheme(adapter);
@@ -3833,6 +3877,7 @@ static void igb_remove(struct pci_dev *pdev)
iounmap(hw->flash_address);
pci_release_mem_regions(pdev);
+ kfree(adapter->xdp_stats);
kfree(adapter->mac_table);
kfree(adapter->shadow_vfta);
free_netdev(netdev);
@@ -3962,6 +4007,7 @@ static int igb_sw_init(struct igb_adapter *adapter)
struct e1000_hw *hw = &adapter->hw;
struct net_device *netdev = adapter->netdev;
struct pci_dev *pdev = adapter->pdev;
+ u32 i;
pci_read_config_word(pdev, PCI_COMMAND, &hw->bus.pci_cmd_word);
@@ -4019,6 +4065,19 @@ static int igb_sw_init(struct igb_adapter *adapter)
if (!adapter->shadow_vfta)
return -ENOMEM;
+ adapter->xdp_stats = kcalloc(IGB_MAX_ALLOC_QUEUES,
+ sizeof(*adapter->xdp_stats),
+ GFP_KERNEL);
+ if (!adapter->xdp_stats)
+ return -ENOMEM;
+
+ for (i = 0; i < IGB_MAX_ALLOC_QUEUES; i++) {
+ struct igb_xdp_stats *xdp_stats = adapter->xdp_stats + i;
+
+ xdp_init_rx_drv_stats(&xdp_stats->rx);
+ xdp_init_tx_drv_stats(&xdp_stats->tx);
+ }
+
/* This call may decrease the number of queues */
if (igb_init_interrupt_scheme(adapter, true)) {
dev_err(&pdev->dev, "Unable to allocate memory for queues\n");
@@ -6264,8 +6323,10 @@ int igb_xmit_xdp_ring(struct igb_adapter *adapter,
len = xdpf->len;
- if (unlikely(!igb_desc_unused(tx_ring)))
+ if (unlikely(!igb_desc_unused(tx_ring))) {
+ xdp_update_tx_drv_full(&tx_ring->xdp_stats->tx);
return IGB_XDP_CONSUMED;
+ }
dma = dma_map_single(tx_ring->dev, xdpf->data, len, DMA_TO_DEVICE);
if (dma_mapping_error(tx_ring->dev, dma))
@@ -8045,6 +8106,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
unsigned int total_bytes = 0, total_packets = 0;
unsigned int budget = q_vector->tx.work_limit;
unsigned int i = tx_ring->next_to_clean;
+ u32 xdp_packets = 0, xdp_bytes = 0;
if (test_bit(__IGB_DOWN, &adapter->state))
return true;
@@ -8075,10 +8137,13 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
total_packets += tx_buffer->gso_segs;
/* free the skb */
- if (tx_buffer->type == IGB_TYPE_SKB)
+ if (tx_buffer->type == IGB_TYPE_SKB) {
napi_consume_skb(tx_buffer->skb, napi_budget);
- else
+ } else {
xdp_return_frame(tx_buffer->xdpf);
+ xdp_bytes += tx_buffer->bytecount;
+ xdp_packets++;
+ }
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -8135,6 +8200,8 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector, int napi_budget)
tx_ring->tx_stats.bytes += total_bytes;
tx_ring->tx_stats.packets += total_packets;
u64_stats_update_end(&tx_ring->tx_syncp);
+ xdp_update_tx_drv_stats(&tx_ring->xdp_stats->tx, xdp_packets,
+ xdp_bytes);
q_vector->tx.total_bytes += total_bytes;
q_vector->tx.total_packets += total_packets;
@@ -8393,7 +8460,8 @@ static struct sk_buff *igb_build_skb(struct igb_ring *rx_ring,
static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
struct igb_ring *rx_ring,
- struct xdp_buff *xdp)
+ struct xdp_buff *xdp,
+ struct xdp_rx_drv_stats_local *lrstats)
{
int err, result = IGB_XDP_PASS;
struct bpf_prog *xdp_prog;
@@ -8404,32 +8472,46 @@ static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
if (!xdp_prog)
goto xdp_out;
+ lrstats->bytes += xdp->data_end - xdp->data;
+ lrstats->packets++;
+
prefetchw(xdp->data_hard_start); /* xdp_frame write */
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
case XDP_PASS:
+ lrstats->pass++;
break;
case XDP_TX:
result = igb_xdp_xmit_back(adapter, xdp);
- if (result == IGB_XDP_CONSUMED)
+ if (result == IGB_XDP_CONSUMED) {
+ lrstats->tx_errors++;
goto out_failure;
+ }
+ lrstats->tx++;
break;
case XDP_REDIRECT:
err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
- if (err)
+ if (err) {
+ lrstats->redirect_errors++;
goto out_failure;
+ }
result = IGB_XDP_REDIR;
+ lrstats->redirect++;
break;
default:
bpf_warn_invalid_xdp_action(act);
- fallthrough;
+ lrstats->invalid++;
+ goto out_failure;
case XDP_ABORTED:
+ lrstats->aborted++;
out_failure:
trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
- fallthrough;
+ result = IGB_XDP_CONSUMED;
+ break;
case XDP_DROP:
result = IGB_XDP_CONSUMED;
+ lrstats->drop++;
break;
}
xdp_out:
@@ -8677,6 +8759,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
{
struct igb_adapter *adapter = q_vector->adapter;
struct igb_ring *rx_ring = q_vector->rx.ring;
+ struct xdp_rx_drv_stats_local lrstats = { };
struct sk_buff *skb = rx_ring->skb;
unsigned int total_bytes = 0, total_packets = 0;
u16 cleaned_count = igb_desc_unused(rx_ring);
@@ -8740,7 +8823,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
/* At larger PAGE_SIZE, frame_sz depend on len size */
xdp.frame_sz = igb_rx_frame_truesize(rx_ring, size);
#endif
- skb = igb_run_xdp(adapter, rx_ring, &xdp);
+ skb = igb_run_xdp(adapter, rx_ring, &xdp, &lrstats);
}
if (IS_ERR(skb)) {
@@ -8814,6 +8897,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
rx_ring->rx_stats.packets += total_packets;
rx_ring->rx_stats.bytes += total_bytes;
u64_stats_update_end(&rx_ring->rx_syncp);
+ xdp_update_rx_drv_stats(&rx_ring->xdp_stats->rx, &lrstats);
q_vector->rx.total_packets += total_packets;
q_vector->rx.total_bytes += total_bytes;
--
2.33.1
On Tue, Nov 23, 2021 at 05:39:34PM +0100, Alexander Lobakin wrote:
> Similarly to dpaa2, enetc stores 5 per-channel counters for XDP.
> Add necessary callbacks to be able to access them using new generic
> XDP stats infra.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Reviewed-by: Jesse Brandeburg <[email protected]>
> ---
Reviewed-by: Vladimir Oltean <[email protected]>
These counters can be dropped from ethtool, nobody depends on having
them there.
Side question: what does "nch" stand for?
Hi Alexander,
On 11/23/21 5:39 PM, Alexander Lobakin wrote:
[...]
Just commenting on ice here as one example (similar applies to other drivers):
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index 1dd7e84f41f8..7dc287bc3a1a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -258,6 +258,8 @@ static void ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
> xdp_ring->next_dd = ICE_TX_THRESH - 1;
> xdp_ring->next_to_clean = ntc;
> ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
> + xdp_update_tx_drv_stats(&xdp_ring->xdp_stats->xdp_tx, total_pkts,
> + total_bytes);
> }
>
> /**
> @@ -277,6 +279,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
> ice_clean_xdp_irq(xdp_ring);
>
> if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
> + xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
> xdp_ring->tx_stats.tx_busy++;
> return ICE_XDP_CONSUMED;
> }
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index ff55cb415b11..62ef47a38d93 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -454,42 +454,58 @@ ice_construct_skb_zc(struct ice_rx_ring *rx_ring, struct xdp_buff **xdp_arr)
> * @xdp: xdp_buff used as input to the XDP program
> * @xdp_prog: XDP program to run
> * @xdp_ring: ring to be used for XDP_TX action
> + * @lrstats: onstack Rx XDP stats
> *
> * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
> */
> static int
> ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> - struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
> + struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
> + struct xdp_rx_drv_stats_local *lrstats)
> {
> int err, result = ICE_XDP_PASS;
> u32 act;
>
> + lrstats->bytes += xdp->data_end - xdp->data;
> + lrstats->packets++;
> +
> act = bpf_prog_run_xdp(xdp_prog, xdp);
>
> if (likely(act == XDP_REDIRECT)) {
> err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> - if (err)
> + if (err) {
> + lrstats->redirect_errors++;
> goto out_failure;
> + }
> + lrstats->redirect++;
> return ICE_XDP_REDIR;
> }
>
> switch (act) {
> case XDP_PASS:
> + lrstats->pass++;
> break;
> case XDP_TX:
> result = ice_xmit_xdp_buff(xdp, xdp_ring);
> - if (result == ICE_XDP_CONSUMED)
> + if (result == ICE_XDP_CONSUMED) {
> + lrstats->tx_errors++;
> goto out_failure;
> + }
> + lrstats->tx++;
> break;
> default:
> bpf_warn_invalid_xdp_action(act);
> - fallthrough;
> + lrstats->invalid++;
> + goto out_failure;
> case XDP_ABORTED:
> + lrstats->aborted++;
> out_failure:
> trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> - fallthrough;
> + result = ICE_XDP_CONSUMED;
> + break;
> case XDP_DROP:
> result = ICE_XDP_CONSUMED;
> + lrstats->drop++;
> break;
> }
Imho, the overall approach is way too bloated. I can see the packets/bytes but now we
have 3 counter updates with return codes included and then the additional sync of the
on-stack counters into the ring counters via xdp_update_rx_drv_stats(). So we now need
ice_update_rx_ring_stats() as well as xdp_update_rx_drv_stats() which syncs 10 different
stat counters via u64_stats_add() into the per ring ones. :/
I'm just taking our XDP L4LB in Cilium as an example: there we already count errors and
export them via per-cpu map that eventually lead to XDP_DROP cases including the /reason/
which caused the XDP_DROP (e.g. Prometheus can then scrape these insights from all the
nodes in the cluster). Given the different action codes are very often application specific,
there's not much debugging that you can do when /only/ looking at `ip link xdpstats` to
gather insight on *why* some of these actions were triggered (e.g. fib lookup failure, etc).
If really of interest, then maybe libxdp could have such per-action counters as opt-in in
its call chain..
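For illustration, that per-cpu drop-reason pattern could look roughly
like the sketch below (libbpf-style BPF C; the names and the reason
codes are hypothetical, not Cilium's actual code):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* one slot per application-specific drop reason */
enum drop_reason {
	DROP_FIB_LOOKUP_FAILED,
	DROP_NO_BACKEND,
	__DROP_REASON_MAX,
};

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, __DROP_REASON_MAX);
	__type(key, __u32);
	__type(value, __u64);
} drop_reasons SEC(".maps");

/* bump the per-cpu counter for @reason, then drop */
static __always_inline int drop_with_reason(__u32 reason)
{
	__u64 *cnt = bpf_map_lookup_elem(&drop_reasons, &reason);

	if (cnt)
		(*cnt)++;
	return XDP_DROP;
}

SEC("xdp")
int xdp_lb(struct xdp_md *ctx)
{
	/* ...LB logic elided; on failure, record *why* we drop */
	return drop_with_reason(DROP_NO_BACKEND);
}

char LICENSE[] SEC("license") = "GPL";

A userspace exporter then sums the per-cpu slots per reason, which is
exactly the kind of insight per-channel driver counters can't give.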
In the case of ice_run_xdp() today, we already bump total_rx_bytes/total_rx_pkts under
XDP and update ice_update_rx_ring_stats(). I do see the case for XDP_TX and XDP_REDIRECT
where we run into driver-specific errors that are /outside of the reach/ of the BPF prog.
For example, we've been running into errors from XDP_TX in ice_xmit_xdp_ring() in the
past during testing, and were able to pinpoint the location as xdp_ring->tx_stats.tx_busy
was increasing. These things are useful and would make sense to standardize for XDP context.
But then it also seems like above in ice_xmit_xdp_ring() we now need to bump counters
twice just for the sake of ethtool vs XDP counters, which sucks a bit; would be nice to
only have to do it once:
> if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
> + xdp_update_tx_drv_full(&xdp_ring->xdp_stats->xdp_tx);
> xdp_ring->tx_stats.tx_busy++;
> return ICE_XDP_CONSUMED;
> }
Anyway, just to reiterate, for troubleshooting I do care about anomalous events that
led to drops in the driver e.g. due to no space in ring or DMA errors (XDP_TX), or more
detailed insights in xdp_do_redirect() when errors occur (XDP_REDIRECT), very much less
about the action code given the prog has the full error context here already.
One more comment/question on the last doc update patch (I presume you only have dummy
numbers in there from testing?):
+For some interfaces, standard XDP statistics are available.
+They can be accessed in the same ways, e.g. `ip`::
+
+ $ ip link xdpstats dev enp178s0
+ 16: enp178s0:
+ xdp-channel0-rx_xdp_packets: 0
+ xdp-channel0-rx_xdp_bytes: 1
+ xdp-channel0-rx_xdp_errors: 2
What are the semantics of xdp_errors? A summary of xdp_redirect_errors, xdp_tx_errors and
xdp_xmit_errors? Or driver-specific?
+ xdp-channel0-rx_xdp_aborted: 3
+ xdp-channel0-rx_xdp_drop: 4
+ xdp-channel0-rx_xdp_invalid: 5
+ xdp-channel0-rx_xdp_pass: 6
[...]
+ xdp-channel0-rx_xdp_redirect: 7
+ xdp-channel0-rx_xdp_redirect_errors: 8
+ xdp-channel0-rx_xdp_tx: 9
+ xdp-channel0-rx_xdp_tx_errors: 10
+ xdp-channel0-tx_xdp_xmit_packets: 11
+ xdp-channel0-tx_xdp_xmit_bytes: 12
+ xdp-channel0-tx_xdp_xmit_errors: 13
+ xdp-channel0-tx_xdp_xmit_full: 14
From a user PoV to avoid confusion, maybe should be made more clear that the latter refers
to xsk.
> [...]
Thanks,
Daniel
On 23/11/2021 16:39, Alexander Lobakin wrote:
> Export 4 per-channel XDP counters for both ef100 and efx drivers
> using generic XDP stats infra.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Reviewed-by: Jesse Brandeburg <[email protected]>
The usual Subject: prefix for these drivers is sfc:
(or occasionally sfc_ef100: for ef100-specific stuff).
> +int efx_get_xdp_stats_nch(const struct net_device *net_dev, u32 attr_id)
> +{
> + const struct efx_nic *efx = netdev_priv(net_dev);
> +
> + switch (attr_id) {
> + case IFLA_XDP_XSTATS_TYPE_XDP:
> + return efx->n_channels;
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +int efx_get_xdp_stats(const struct net_device *net_dev, u32 attr_id,
> + void *attr_data)
> +{
> + struct ifla_xdp_stats *xdp_stats = attr_data;
> + struct efx_nic *efx = netdev_priv(net_dev);
> + const struct efx_channel *channel;
> +
> + switch (attr_id) {
> + case IFLA_XDP_XSTATS_TYPE_XDP:
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + spin_lock_bh(&efx->stats_lock);
> +
> + efx_for_each_channel(channel, efx) {
> + xdp_stats->drop = channel->n_rx_xdp_drops;
> + xdp_stats->errors = channel->n_rx_xdp_bad_drops;
> + xdp_stats->redirect = channel->n_rx_xdp_redirect;
> + xdp_stats->tx = channel->n_rx_xdp_tx;
> +
> + xdp_stats++;
> + }
What guarantees that efx->n_channels won't change between these two
calls, potentially overrunning the buffer?
-ed
On Tue, Nov 23, 2021 at 05:39:37PM +0100, Alexander Lobakin wrote:
> Same as mvneta, mvpp2 stores 7 XDP counters in per-cpu containers.
> Expose them via generic XDP stats infra.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Reviewed-by: Jesse Brandeburg <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Thanks!
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Tue, Nov 23, 2021 at 05:39:37PM +0100, Alexander Lobakin wrote:
> Same as mvneta, mvpp2 stores 7 XDP counters in per-cpu containers.
> Expose them via generic XDP stats infra.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Reviewed-by: Jesse Brandeburg <[email protected]>
> ---
> .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 51 +++++++++++++++++++
> 1 file changed, 51 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> index 97bd2ee8a010..58203cde3b60 100644
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -5131,6 +5131,56 @@ mvpp2_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> stats->tx_dropped = dev->stats.tx_dropped;
> }
>
> +static int mvpp2_get_xdp_stats_ndo(const struct net_device *dev, u32 attr_id,
> + void *attr_data)
> +{
> + const struct mvpp2_port *port = netdev_priv(dev);
> + struct ifla_xdp_stats *xdp_stats = attr_data;
> + u32 cpu, start;
> +
> + switch (attr_id) {
> + case IFLA_XDP_XSTATS_TYPE_XDP:
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + for_each_possible_cpu(cpu) {
> + const struct mvpp2_pcpu_stats *ps;
> + u64 xdp_xmit_err;
> + u64 xdp_redirect;
> + u64 xdp_tx_err;
> + u64 xdp_pass;
> + u64 xdp_drop;
> + u64 xdp_xmit;
> + u64 xdp_tx;
> +
> + ps = per_cpu_ptr(port->stats, cpu);
> +
> + do {
> + start = u64_stats_fetch_begin_irq(&ps->syncp);
> +
> + xdp_redirect = ps->xdp_redirect;
> + xdp_pass = ps->xdp_pass;
> + xdp_drop = ps->xdp_drop;
> + xdp_xmit = ps->xdp_xmit;
> + xdp_xmit_err = ps->xdp_xmit_err;
> + xdp_tx = ps->xdp_tx;
> + xdp_tx_err = ps->xdp_tx_err;
> + } while (u64_stats_fetch_retry_irq(&ps->syncp, start));
> +
> + xdp_stats->redirect += xdp_redirect;
> + xdp_stats->pass += xdp_pass;
> + xdp_stats->drop += xdp_drop;
> + xdp_stats->xmit_packets += xdp_xmit;
> + xdp_stats->xmit_errors += xdp_xmit_err;
> + xdp_stats->tx += xdp_tx;
> + xdp_stats->tx_errors += xdp_tx_err;
> + }
Actually, the only concern I have here is the duplication between this
function and mvpp2_get_xdp_stats(). It looks to me like these two
functions could share a lot of their code. Please submit a patch to
make that happen. Thanks.
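For reference, the shared code could take roughly this shape (a
hypothetical sketch for mvpp2_main.c; the helper and sum-struct names
are illustrative, only the per-cpu fields come from the driver):

struct mvpp2_xdp_sum {
	u64 redirect, pass, drop, tx, tx_err, xmit, xmit_err;
};

static void mvpp2_xdp_stats_sum(const struct mvpp2_port *port,
				struct mvpp2_xdp_sum *sum)
{
	u32 cpu, start;

	memset(sum, 0, sizeof(*sum));

	for_each_possible_cpu(cpu) {
		const struct mvpp2_pcpu_stats *ps;
		struct mvpp2_xdp_sum cur;

		ps = per_cpu_ptr(port->stats, cpu);

		/* snapshot one CPU's counters consistently */
		do {
			start = u64_stats_fetch_begin_irq(&ps->syncp);

			cur.redirect = ps->xdp_redirect;
			cur.pass = ps->xdp_pass;
			cur.drop = ps->xdp_drop;
			cur.tx = ps->xdp_tx;
			cur.tx_err = ps->xdp_tx_err;
			cur.xmit = ps->xdp_xmit;
			cur.xmit_err = ps->xdp_xmit_err;
		} while (u64_stats_fetch_retry_irq(&ps->syncp, start));

		sum->redirect += cur.redirect;
		sum->pass += cur.pass;
		sum->drop += cur.drop;
		sum->tx += cur.tx;
		sum->tx_err += cur.tx_err;
		sum->xmit += cur.xmit;
		sum->xmit_err += cur.xmit_err;
	}
}

Both mvpp2_get_xdp_stats_ndo() and the Ethtool path would then just
copy the summed fields into their respective output structures.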
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
From: Vladimir Oltean <[email protected]>
Date: Tue, 23 Nov 2021 17:09:20 +0000
> On Tue, Nov 23, 2021 at 05:39:34PM +0100, Alexander Lobakin wrote:
> > Similarly to dpaa2, enetc stores 5 per-channel counters for XDP.
> > Add necessary callbacks to be able to access them using new generic
> > XDP stats infra.
> >
> > Signed-off-by: Alexander Lobakin <[email protected]>
> > Reviewed-by: Jesse Brandeburg <[email protected]>
> > ---
>
> Reviewed-by: Vladimir Oltean <[email protected]>
Thanks!
> These counters can be dropped from ethtool, nobody depends on having
> them there.
Got it, thanks. I'll remove them in v3 or, in case v2 gets accepted,
will send follow-up patch(es) to remove the redundant Ethtool
stats.
> Side question: what does "nch" stand for?
"The number of channels". I was looking for an intuitive but short
term, as get_xdp_stats_channels is too long and breaks the Tab
alignment of tons of net_device_ops across the tree.
It was "nqs" /number of queues/ previously, but we usually use the
term "queue" to refer to a one-direction ring; in the case of these
stats and XDP in general, "queue pair" or simply "channel" is more
correct.
Thanks,
Al
On Tue, Nov 23, 2021 at 05:39:36PM +0100, Alexander Lobakin wrote:
> + for_each_possible_cpu(cpu) {
> + const struct mvneta_pcpu_stats *stats;
> + const struct mvneta_stats *ps;
> + u64 xdp_xmit_err;
> + u64 xdp_redirect;
> + u64 xdp_tx_err;
> + u64 xdp_pass;
> + u64 xdp_drop;
> + u64 xdp_xmit;
> + u64 xdp_tx;
> + u32 start;
> +
> + stats = per_cpu_ptr(pp->stats, cpu);
> + ps = &stats->es.ps;
> +
> + do {
> + start = u64_stats_fetch_begin_irq(&stats->syncp);
> +
> + xdp_drop = ps->xdp_drop;
> + xdp_pass = ps->xdp_pass;
> + xdp_redirect = ps->xdp_redirect;
> + xdp_tx = ps->xdp_tx;
> + xdp_tx_err = ps->xdp_tx_err;
> + xdp_xmit = ps->xdp_xmit;
> + xdp_xmit_err = ps->xdp_xmit_err;
> + } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
> +
> + xdp_stats->drop += xdp_drop;
> + xdp_stats->pass += xdp_pass;
> + xdp_stats->redirect += xdp_redirect;
> + xdp_stats->tx += xdp_tx;
> + xdp_stats->tx_errors += xdp_tx_err;
> + xdp_stats->xmit_packets += xdp_xmit;
> + xdp_stats->xmit_errors += xdp_xmit_err;
Same comment as for mvpp2 - this could share a lot of code from
mvneta_ethtool_update_pcpu_stats() (although it means we end up
calculating a little more for the alloc error and refill error
that this API doesn't need) but I think sharing that code would be
a good idea.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Daniel asked me to share my opinion, as Cloudflare has an XDP load
balancer as well.
On Wed, 24 Nov 2021 at 00:53, Daniel Borkmann <[email protected]> wrote:
> I'm just taking our XDP L4LB in Cilium as an example: there we already count errors and
> export them via per-cpu map that eventually lead to XDP_DROP cases including the /reason/
> which caused the XDP_DROP (e.g. Prometheus can then scrape these insights from all the
> nodes in the cluster). Given the different action codes are very often application specific,
> there's not much debugging that you can do when /only/ looking at `ip link xdpstats` to
> gather insight on *why* some of these actions were triggered (e.g. fib lookup failure, etc).
Agreed. For our purposes we often want to know whether a specific
program has been invoked. Per-channel or per-device stats don't help
us much since we have a chain of programs (not using libxdp though).
My colleague Arthur has written xdpcap [1], which gives per-action,
per-program counters. This way we can correlate an action with a
packet and a program.
> If really of interest, then maybe libxdp could have such per-action counters as opt-in in
> its call chain..
We could also make it part of BPF_ENABLE_STATS; it's kind of
coarse-grained though.
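For reference, enabling that from userspace is a one-liner via the
libbpf wrapper (a sketch; stats stay enabled only while the returned
fd is kept open, and show up as run_time_ns/run_cnt in bpf_prog_info):

#include <bpf/bpf.h>

/* opt in to global BPF run-time stats collection */
int enable_bpf_stats(void)
{
	return bpf_enable_stats(BPF_STATS_RUN_TIME);
}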
> In the case of ice_run_xdp() today, we already bump total_rx_bytes/total_rx_pkts under
> XDP and update ice_update_rx_ring_stats(). I do see the case for XDP_TX and XDP_REDIRECT
> where we run into driver-specific errors that are /outside of the reach/ of the BPF prog.
> For example, we've been running into errors from XDP_TX in ice_xmit_xdp_ring() in the
> past during testing, and were able to pinpoint the location as xdp_ring->tx_stats.tx_busy
> was increasing. These things are useful and would make sense to standardize for XDP context.
I'd like to see more tracepoints like trace_xdp_exception, personally.
We can use things like bpftrace for exploration and ebpf_exporter [2]
to generate alerts much more easily than something wired into
iproute2.
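As a rough sketch of that kind of exploration (hypothetical
libbpf-style BPF C; the ctx layout mirrors the xdp_exception
tracepoint format):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct xdp_exception_args {
	__u64 pad;	/* common tracepoint header */
	int prog_id;
	__u32 act;
	int ifindex;
};

/* per-cpu counters indexed by XDP action code */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 16);
	__type(key, __u32);
	__type(value, __u64);
} act_cnt SEC(".maps");

SEC("tracepoint/xdp/xdp_exception")
int count_xdp_exceptions(struct xdp_exception_args *ctx)
{
	__u32 key = ctx->act & 15;	/* bound the index */
	__u64 *val = bpf_map_lookup_elem(&act_cnt, &key);

	if (val)
		(*val)++;
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

No driver changes needed, and it only costs anything while the probe
is attached.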
Best
Lorenz
1: https://github.com/cloudflare/xdpcap
2: https://github.com/cloudflare/ebpf_exporter
--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
http://www.cloudflare.com
Daniel Borkmann <[email protected]> writes:
> Hi Alexander,
>
> On 11/23/21 5:39 PM, Alexander Lobakin wrote:
> [...]
>
> Just commenting on ice here as one example (similar applies to other drivers):
>
>> [...]
>
> Imho, the overall approach is way too bloated. I can see the
> packets/bytes but now we have 3 counter updates with return codes
> included and then the additional sync of the on-stack counters into
> the ring counters via xdp_update_rx_drv_stats(). So we now need
> ice_update_rx_ring_stats() as well as xdp_update_rx_drv_stats() which
> syncs 10 different stat counters via u64_stats_add() into the per ring
> ones. :/
>
> I'm just taking our XDP L4LB in Cilium as an example: there we already
> count errors and export them via per-cpu map that eventually lead to
> XDP_DROP cases including the /reason/ which caused the XDP_DROP (e.g.
> Prometheus can then scrape these insights from all the nodes in the
> cluster). Given the different action codes are very often application
> specific, there's not much debugging that you can do when /only/
> looking at `ip link xdpstats` to gather insight on *why* some of these
> actions were triggered (e.g. fib lookup failure, etc). If really of
> interest, then maybe libxdp could have such per-action counters as
> opt-in in its call chain..
To me, standardising these counters is less about helping people debug
their XDP programs (as you say, you can put your own telemetry into
those), and more about making XDP less "mystical" to the system
administrator (who may not be the same person who wrote the XDP
programs). So at the very least, they need to indicate "where are the
packets going", which means at least counters for DROP, REDIRECT and TX
(+ errors for tx/redirect) in addition to the "processed by XDP" initial
counter. Which in the above means 'pass', 'invalid' and 'aborted' could
be dropped, I guess; but I don't mind terribly keeping them either given
that there's no measurable performance impact.
> But then it also seems like above in ice_xmit_xdp_ring() we now need
> to bump counters twice just for the sake of ethtool vs XDP counters,
> which sucks a bit; would be nice to only have to do it once:
This I agree with, and while I can see the layering argument for putting
them into 'ip' and rtnetlink instead of ethtool, I also worry that these
counters will simply be lost in obscurity, so I do wonder if it wouldn't
be better to accept the "layering violation" and keep them all in the
'ethtool -S' output?
[...]
> + xdp-channel0-rx_xdp_redirect: 7
> + xdp-channel0-rx_xdp_redirect_errors: 8
> + xdp-channel0-rx_xdp_tx: 9
> + xdp-channel0-rx_xdp_tx_errors: 10
> + xdp-channel0-tx_xdp_xmit_packets: 11
> + xdp-channel0-tx_xdp_xmit_bytes: 12
> + xdp-channel0-tx_xdp_xmit_errors: 13
> + xdp-channel0-tx_xdp_xmit_full: 14
>
> From a user PoV to avoid confusion, maybe should be made more clear that the latter refers
> to xsk.
+1, these should probably be xdp-channel0-tx_xsk_* or something like
that...
-Toke
From: Toke Høiland-Jørgensen <[email protected]>
Date: Thu, 25 Nov 2021 12:56:01 +0100
> Daniel Borkmann <[email protected]> writes:
>
> > Hi Alexander,
> >
> > On 11/23/21 5:39 PM, Alexander Lobakin wrote:
> > [...]
> >
> > Just commenting on ice here as one example (similar applies to other drivers):
> >
> >> [...]
> >
> > Imho, the overall approach is way too bloated. I can see the
> > packets/bytes but now we have 3 counter updates with return codes
> > included and then the additional sync of the on-stack counters into
> > the ring counters via xdp_update_rx_drv_stats(). So we now need
> > ice_update_rx_ring_stats() as well as xdp_update_rx_drv_stats() which
> > syncs 10 different stat counters via u64_stats_add() into the per ring
> > ones. :/
> >
> > I'm just taking our XDP L4LB in Cilium as an example: there we already
> > count the errors that eventually lead to XDP_DROP cases and export them
> > via a per-cpu map, including the /reason/ which caused the XDP_DROP (e.g.
> > Prometheus can then scrape these insights from all the nodes in the
> > cluster). Given the different action codes are very often application
> > specific, there's not much debugging that you can do when /only/
> > looking at `ip link xdpstats` to gather insight on *why* some of these
> > actions were triggered (e.g. fib lookup failure, etc). If really of
> > interest, then maybe libxdp could have such per-action counters as
> > opt-in in its call chain...
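For readers unfamiliar with the per-cpu map pattern described above, a
libbpf-style sketch of per-reason drop accounting follows; every name
here is illustrative, not Cilium's actual code.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

enum drop_reason {      /* application-specific, illustrative only */
        DROP_POLICY,
        DROP_PARSE_ERR,
        DROP_REASON_MAX,
};

struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, DROP_REASON_MAX);
        __type(key, __u32);
        __type(value, __u64);
} drop_stats SEC(".maps");

/* Bump the per-cpu counter for 'reason' and return the XDP verdict,
 * so user space (e.g. a Prometheus exporter) can scrape the map.
 */
static __always_inline int drop(__u32 reason)
{
        __u64 *cnt = bpf_map_lookup_elem(&drop_stats, &reason);

        if (cnt)
                (*cnt)++;
        return XDP_DROP;
}

char _license[] SEC("license") = "GPL";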
>
> To me, standardising these counters is less about helping people debug
> their XDP programs (as you say, you can put your own telemetry into
> those), and more about making XDP less "mystical" to the system
> administrator (who may not be the same person who wrote the XDP
> programs). So at the very least, they need to indicate "where are the
> packets going", which means at least counters for DROP, REDIRECT and TX
> (+ errors for tx/redirect) in addition to the "processed by XDP" initial
> counter. Which in the above means 'pass', 'invalid' and 'aborted' could
> be dropped, I guess; but I don't mind terribly keeping them either given
> that there's no measurable performance impact.
Right.
The other reason is that I want to continue the effort of
standardizing widely-implemented statistics. Ethtool private stats
approach is neither scalable (you can't rely on any fields which may
be not exposed in other drivers) nor good for code hygiene (code
duplication, differences in naming and logics etc.).
Let's say if only mlx5 driver has 'cache_waive' stats, then it's
okay to export it using private stats, but if 10 drivers has
'xdp_drop' field it's better to uniform it, isn't it?
> > But then it also seems like above in ice_xmit_xdp_ring() we now need
> > to bump counters twice just for the sake of ethtool vs xdp counters,
> > which sucks a bit; it would be nice to only have to do it once:
We'll remove such duplication in the nearest future, as well as some
of duplications between Ethtool private and this XDP stats. I wanted
this series to be as harmless as possible.
> This I agree with, and while I can see the layering argument for putting
> them into 'ip' and rtnetlink instead of ethtool, I also worry that these
> counters will simply be lost in obscurity, so I do wonder if it wouldn't
> be better to accept the "layering violation" and keeping them all in the
> 'ethtool -S' output?
I don't think we should harm the code and the logic just because
'some users might be relying on something'. We don't control anything
XDP-related via Ethtool at all, but there is some XDP-related
stuff inside the iproute2 code, so to me it's even more intuitive to
have the stats there.
Jakub, maybe you'd like to add something at this point?
> [...]
>
> > + xdp-channel0-rx_xdp_redirect: 7
> > + xdp-channel0-rx_xdp_redirect_errors: 8
> > + xdp-channel0-rx_xdp_tx: 9
> > + xdp-channel0-rx_xdp_tx_errors: 10
> > + xdp-channel0-tx_xdp_xmit_packets: 11
> > + xdp-channel0-tx_xdp_xmit_bytes: 12
> > + xdp-channel0-tx_xdp_xmit_errors: 13
> > + xdp-channel0-tx_xdp_xmit_full: 14
> >
> > From a user PoV, to avoid confusion it should maybe be made clearer
> > that the latter refers to xsk.
>
> +1, these should probably be xdp-channel0-tx_xsk_* or something like
> that...
I think I should expand this example in Docs a bit. For XSK, there's
a separate set of the same counters, and they differ as follows:
xdp-channel0-rx_xdp_packets: 0
xdp-channel0-rx_xdp_bytes: 1
xdp-channel0-rx_xdp_errors: 2
[ ... ]
xsk-channel0-rx_xdp_packets: 256
xsk-channel0-rx_xdp_bytes: 257
xsk-channel0-rx_xdp_errors: 258
[ ... ]
The only semantic difference is that for XDP, 'tx_xdp_xmit' counts the
packets that went through .ndo_xdp_xmit(), while in the XSK case it
counts the packets that went through XSK ZC xmit.
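As an illustration of the split, a driver with separate XDP and XSK sets
would key off the requested type in its callbacks. A sketch only: the
foo_* names are made up, and IFLA_XDP_XSTATS_TYPE_XSK is an assumed name
by analogy with the XDP type used elsewhere in this thread.

static int foo_get_xdp_stats_nch(const struct net_device *dev, u32 attr_id)
{
        const struct foo_priv *priv = netdev_priv(dev);

        switch (attr_id) {
        case IFLA_XDP_XSTATS_TYPE_XDP:
        case IFLA_XDP_XSTATS_TYPE_XSK:  /* assumed name for the XSK set */
                /* both sets are reported per channel on this device */
                return priv->num_channels;
        default:
                return -EOPNOTSUPP;
        }
}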
Thanks,
Al
From: Russell King (Oracle) <[email protected]>
Date: Wed, 24 Nov 2021 11:39:05 +0000
> On Tue, Nov 23, 2021 at 05:39:36PM +0100, Alexander Lobakin wrote:
> > + for_each_possible_cpu(cpu) {
> > + const struct mvneta_pcpu_stats *stats;
> > + const struct mvneta_stats *ps;
> > + u64 xdp_xmit_err;
> > + u64 xdp_redirect;
> > + u64 xdp_tx_err;
> > + u64 xdp_pass;
> > + u64 xdp_drop;
> > + u64 xdp_xmit;
> > + u64 xdp_tx;
> > + u32 start;
> > +
> > + stats = per_cpu_ptr(pp->stats, cpu);
> > + ps = &stats->es.ps;
> > +
> > + do {
> > + start = u64_stats_fetch_begin_irq(&stats->syncp);
> > +
> > + xdp_drop = ps->xdp_drop;
> > + xdp_pass = ps->xdp_pass;
> > + xdp_redirect = ps->xdp_redirect;
> > + xdp_tx = ps->xdp_tx;
> > + xdp_tx_err = ps->xdp_tx_err;
> > + xdp_xmit = ps->xdp_xmit;
> > + xdp_xmit_err = ps->xdp_xmit_err;
> > + } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
> > +
> > + xdp_stats->drop += xdp_drop;
> > + xdp_stats->pass += xdp_pass;
> > + xdp_stats->redirect += xdp_redirect;
> > + xdp_stats->tx += xdp_tx;
> > + xdp_stats->tx_errors += xdp_tx_err;
> > + xdp_stats->xmit_packets += xdp_xmit;
> > + xdp_stats->xmit_errors += xdp_xmit_err;
>
> Same comment as for mvpp2 - this could share a lot of code from
> mvneta_ethtool_update_pcpu_stats() (although it means we end up
> calculating a little more for the alloc error and refill error
> that this API doesn't need) but I think sharing that code would be
> a good idea.
Ah, I didn't do that because in my first series I was removing the
Ethtool counters entirely. In this one, I left them as-is because
some folks hinted that those counters (not specifically on mvpp2
or mvneta, let's say on virtio-net or so) could already be used in
some admin scripts somewhere in the world (but with a TODO to figure
out which drivers I could remove them from, and to do that).
It would be great if you could tell me whether I can remove those
XDP-related Ethtool counters from the Marvell drivers or not.
If so, I'll wipe them; otherwise, I'll just factor out the common
parts to remove the code duplication.
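For the latter option, a sketch of the factoring-out Russell suggests:
snapshot one CPU's counters once under the seqcount retry loop and let
both the Ethtool path and .ndo_get_xdp_stats() consume the snapshot.
The helper name is made up; the types are the ones from the quoted code.

/* Sketch: take a consistent snapshot of one CPU's counters so both
 * the Ethtool and the rtnl-xstats paths can share this loop.
 */
static void mvneta_fetch_pcpu_stats(const struct mvneta_pcpu_stats *stats,
                                    struct mvneta_stats *snap)
{
        u32 start;

        do {
                start = u64_stats_fetch_begin_irq(&stats->syncp);
                /* copies all xdp_* (and alloc/refill error) fields at once */
                *snap = stats->es.ps;
        } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
}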
Thanks,
Al
On Thu, 25 Nov 2021 18:07:08 +0100 Alexander Lobakin wrote:
> > This I agree with, and while I can see the layering argument for putting
> > them into 'ip' and rtnetlink instead of ethtool, I also worry that these
> > counters will simply be lost in obscurity, so I do wonder if it wouldn't
> > be better to accept the "layering violation" and keeping them all in the
> > 'ethtool -S' output?
>
> I don't think we should harm the code and the logic just because
> 'some users might be relying on something'. We don't control anything
> XDP-related via Ethtool at all, but there is some XDP-related
> stuff inside the iproute2 code, so to me it's even more intuitive to
> have the stats there.
> Jakub, maybe you'd like to add something at this point?
TBH I wasn't following this thread too closely since I saw Daniel
nacked it already. I do prefer rtnl xstats, I'd just report them
in -s if they are non-zero. But doesn't sound like we have an agreement
whether they should exist or not.
Can we think of an approach which would make cloudflare and cilium
happy? Feels like we're trying to make the slightly hypothetical
admin happy while ignoring objections of very real users.
> > > + xdp-channel0-rx_xdp_redirect: 7
> > > + xdp-channel0-rx_xdp_redirect_errors: 8
> > > + xdp-channel0-rx_xdp_tx: 9
> > > + xdp-channel0-rx_xdp_tx_errors: 10
> > > + xdp-channel0-tx_xdp_xmit_packets: 11
> > > + xdp-channel0-tx_xdp_xmit_bytes: 12
> > > + xdp-channel0-tx_xdp_xmit_errors: 13
> > > + xdp-channel0-tx_xdp_xmit_full: 14
Please leave the per-channel stats out. They make a precedent for
channel stats which should be an attribute of a channel. Working for
a large XDP user for a couple of years now I can tell you from my own
experience I've not once found them useful. In fact per-queue stats are
a major PITA as they crowd the output.
From: Jakub Kicinski <[email protected]>
Date: Thu, 25 Nov 2021 09:44:40 -0800
> On Thu, 25 Nov 2021 18:07:08 +0100 Alexander Lobakin wrote:
> > > This I agree with, and while I can see the layering argument for putting
> > > them into 'ip' and rtnetlink instead of ethtool, I also worry that these
> > > counters will simply be lost in obscurity, so I do wonder if it wouldn't
> > > be better to accept the "layering violation" and keeping them all in the
> > > 'ethtool -S' output?
> >
> > I don't think we should harm the code and the logic just because
> > 'some users might be relying on something'. We don't control anything
> > XDP-related via Ethtool at all, but there is some XDP-related
> > stuff inside the iproute2 code, so to me it's even more intuitive to
> > have the stats there.
> > Jakub, maybe you'd like to add something at this point?
>
> TBH I wasn't following this thread too closely since I saw Daniel
> nacked it already. I do prefer rtnl xstats, I'd just report them
> in -s if they are non-zero. But doesn't sound like we have an agreement
> whether they should exist or not.
Right, just -s is fine, if we drop the per-channel approach.
> Can we think of an approach which would make cloudflare and cilium
> happy? Feels like we're trying to make the slightly hypothetical
> admin happy while ignoring objections of very real users.
The initial idea was only to make the drivers uniform. But in general
you are right: 10 drivers having something doesn't mean it's
something good.
Maciej, I think you were talking about Cilium asking for those stats
in Intel drivers? Could you maybe provide their exact use cases/needs
so I can orient myself? I certainly remember the ask about XSK Tx
packets and bytes.
And speaking of XSK Tx, we have per-socket stats, isn't that enough?
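For reference, the per-socket stats mentioned here are the ones an
AF_XDP application can already query via the XDP_STATISTICS sockopt;
a minimal user-space sketch (newer kernels extend struct xdp_statistics
with ring-full/empty counters as well):

#include <linux/if_xdp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Dump the existing per-socket XSK counters for an AF_XDP socket fd. */
static void dump_xsk_stats(int xsk_fd)
{
        struct xdp_statistics stats;
        socklen_t len = sizeof(stats);

        if (getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, &stats, &len))
                return;

        printf("rx_dropped: %llu\nrx_invalid_descs: %llu\ntx_invalid_descs: %llu\n",
               (unsigned long long)stats.rx_dropped,
               (unsigned long long)stats.rx_invalid_descs,
               (unsigned long long)stats.tx_invalid_descs);
}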
> > > > + xdp-channel0-rx_xdp_redirect: 7
> > > > + xdp-channel0-rx_xdp_redirect_errors: 8
> > > > + xdp-channel0-rx_xdp_tx: 9
> > > > + xdp-channel0-rx_xdp_tx_errors: 10
> > > > + xdp-channel0-tx_xdp_xmit_packets: 11
> > > > + xdp-channel0-tx_xdp_xmit_bytes: 12
> > > > + xdp-channel0-tx_xdp_xmit_errors: 13
> > > > + xdp-channel0-tx_xdp_xmit_full: 14
>
> Please leave the per-channel stats out. They make a precedent for
> channel stats which should be an attribute of a channel. Working for
> a large XDP user for a couple of years now I can tell you from my own
> experience I've not once found them useful. In fact per-queue stats are
> a major PITA as they crowd the output.
Oh okay. My very first iterations were without this, but then I
found most of the drivers expose their XDP stats per-channel. Since
I didn't plan to degrade the functionality, they went that way.
Al
Alexander Lobakin <[email protected]> writes:
> From: Jakub Kicinski <[email protected]>
> Date: Thu, 25 Nov 2021 09:44:40 -0800
>
>> On Thu, 25 Nov 2021 18:07:08 +0100 Alexander Lobakin wrote:
>> > > This I agree with, and while I can see the layering argument for putting
>> > > them into 'ip' and rtnetlink instead of ethtool, I also worry that these
>> > > counters will simply be lost in obscurity, so I do wonder if it wouldn't
>> > > be better to accept the "layering violation" and keeping them all in the
>> > > 'ethtool -S' output?
>> >
>> > I don't think we should harm the code and the logic just because
>> > 'some users might be relying on something'. We don't control anything
>> > XDP-related via Ethtool at all, but there is some XDP-related
>> > stuff inside the iproute2 code, so to me it's even more intuitive to
>> > have the stats there.
>> > Jakub, maybe you'd like to add something at this point?
>>
>> TBH I wasn't following this thread too closely since I saw Daniel
>> nacked it already. I do prefer rtnl xstats, I'd just report them
>> in -s if they are non-zero. But doesn't sound like we have an agreement
>> whether they should exist or not.
>
> Right, just -s is fine, if we drop the per-channel approach.
I agree that adding them to -s is fine (and that resolves my "no one
will find them" complaint as well). If it crowds the output we could also
default to only outputting a subset, and have the more detailed
statistics hidden behind a verbose switch (or even just in the JSON
output)?
>> Can we think of an approach which would make cloudflare and cilium
>> happy? Feels like we're trying to make the slightly hypothetical
>> admin happy while ignoring objections of very real users.
>
> The initial idea was only to make the drivers uniform. But in general
> you are right: 10 drivers having something doesn't mean it's
> something good.
I don't think it's accurate to call the admin use case "hypothetical".
We're expending a significant effort explaining to people that XDP can
"eat" your packets, and not having any standard statistics makes this
way harder. We should absolutely cater to our "early adopters", but if
we want XDP to see wider adoption, making it "less weird" is critical!
> Maciej, I think you were talking about Cilium asking for those stats
> in Intel drivers? Could you maybe provide their exact use cases/needs
> so I can orient myself? I certainly remember the ask about XSK Tx
> packets and bytes.
> And speaking of XSK Tx, we have per-socket stats, isn't that enough?
IMO, as long as the packets are accounted for in the regular XDP stats,
having a whole separate set of stats only for XSK is less important.
>> Please leave the per-channel stats out. They make a precedent for
>> channel stats which should be an attribute of a channel. Working for
>> a large XDP user for a couple of years now I can tell you from my own
>> experience I've not once found them useful. In fact per-queue stats are
>> a major PITA as they crowd the output.
>
> Oh okay. My very first iterations were without this, but then I
> found most of the drivers expose their XDP stats per-channel. Since
> I didn't plan to degrade the functionality, they went that way.
I personally find the per-channel stats quite useful. One of the primary
reasons for not achieving full performance with XDP is broken
configuration of packet steering to CPUs, and having per-channel stats
is a nice way of seeing this. I can see the point about them being way
too verbose in the default output, though, and I do generally filter the
output as well when viewing them. But see my point above about only
printing a subset of the stats by default; per-channel stats could be
JSON-only, for instance?
-Toke
On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
> >> TBH I wasn't following this thread too closely since I saw Daniel
> >> nacked it already. I do prefer rtnl xstats, I'd just report them
> >> in -s if they are non-zero. But doesn't sound like we have an agreement
> >> whether they should exist or not.
> >
> > Right, just -s is fine, if we drop the per-channel approach.
>
> I agree that adding them to -s is fine (and that resolves my "no one
> will find them" complaint as well). If it crowds the output we could also
> default to only outputting a subset, and have the more detailed
> statistics hidden behind a verbose switch (or even just in the JSON
> output)?
>
> >> Can we think of an approach which would make cloudflare and cilium
> >> happy? Feels like we're trying to make the slightly hypothetical
> >> admin happy while ignoring objections of very real users.
> >
> > The initial idea was only to make the drivers uniform. But in general
> > you are right: 10 drivers having something doesn't mean it's
> > something good.
>
> I don't think it's accurate to call the admin use case "hypothetical".
> We're expending a significant effort explaining to people that XDP can
> "eat" your packets, and not having any standard statistics makes this
> way harder. We should absolutely cater to our "early adopters", but if
> we want XDP to see wider adoption, making it "less weird" is critical!
Fair. In all honesty I said that hoping to push for a more flexible
approach hidden entirely in BPF, and not involving driver changes.
Assuming the XDP program has more fine grained stats we should be able
to extract those instead of double-counting. Hence my vague "let's work
with apps" comment.
For example, to a person familiar with the workload it'd be useful to
know if the program returned XDP_DROP because of configured policy or
failure to parse a packet. I don't think that sort of distinction is
achievable at the level of standard stats.
The information required by the admin is higher level. As you say the
primary concern there is "how many packets did XDP eat".
Speaking of which, one thing that badly needs clarification is our
expectation around XDP packets getting counted towards the interface
stats.
> > Maciej, I think you were talking about Cilium asking for those stats
> > in Intel drivers? Could you maybe provide their exact use cases/needs
> > so I can orient myself? I certainly remember the ask about XSK Tx
> > packets and bytes.
> > And speaking of XSK Tx, we have per-socket stats, isn't that enough?
>
> IMO, as long as the packets are accounted for in the regular XDP stats,
> having a whole separate set of stats only for XSK is less important.
>
> >> Please leave the per-channel stats out. They make a precedent for
> >> channel stats which should be an attribute of a channel. Working for
> >> a large XDP user for a couple of years now I can tell you from my own
> >> experience I've not once found them useful. In fact per-queue stats are
> >> a major PITA as they crowd the output.
> >
> > Oh okay. My very first iterations were without this, but then I
> > found most of the drivers expose their XDP stats per-channel. Since
> > I didn't plan to degrade the functionality, they went that way.
>
> I personally find the per-channel stats quite useful. One of the primary
> reasons for not achieving full performance with XDP is broken
> configuration of packet steering to CPUs, and having per-channel stats
> is a nice way of seeing this.
Right, that's about the only thing I use it for as well. "Is the load
evenly distributed?" But that's not XDP specific and not worth
standardizing for, yet, IMO, because..
> I can see the point about them being way too verbose in the default
> output, though, and I do generally filter the output as well when
> viewing them. But see my point above about only printing a subset of
> the stats by default; per-channel stats could be JSON-only, for
> instance?
we don't even know what constitutes a channel today. And that will
become increasingly problematic as importance of application specific
queues increases (zctap etc). IMO until the ontological gaps around
queues are filled we should leave per-queue stats in ethtool -S.
Jakub Kicinski <[email protected]> writes:
> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>> >> TBH I wasn't following this thread too closely since I saw Daniel
>> >> nacked it already. I do prefer rtnl xstats, I'd just report them
>> >> in -s if they are non-zero. But doesn't sound like we have an agreement
>> >> whether they should exist or not.
>> >
>> > Right, just -s is fine, if we drop the per-channel approach.
>>
>> I agree that adding them to -s is fine (and that resolves my "no one
>> will find them" complaint as well). If it crowds the output we could also
>> default to only outputting a subset, and have the more detailed
>> statistics hidden behind a verbose switch (or even just in the JSON
>> output)?
>>
>> >> Can we think of an approach which would make cloudflare and cilium
>> >> happy? Feels like we're trying to make the slightly hypothetical
>> >> admin happy while ignoring objections of very real users.
>> >
>> > The initial idea was only to make the drivers uniform. But in general
>> > you are right: 10 drivers having something doesn't mean it's
>> > something good.
>>
>> I don't think it's accurate to call the admin use case "hypothetical".
>> We're expending a significant effort explaining to people that XDP can
>> "eat" your packets, and not having any standard statistics makes this
>> way harder. We should absolutely cater to our "early adopters", but if
>> we want XDP to see wider adoption, making it "less weird" is critical!
>
> Fair. In all honesty I said that hoping to push for a more flexible
> approach hidden entirely in BPF, and not involving driver changes.
> Assuming the XDP program has more fine grained stats we should be able
> to extract those instead of double-counting. Hence my vague "let's work
> with apps" comment.
>
> For example, to a person familiar with the workload it'd be useful to
> know if the program returned XDP_DROP because of configured policy or
> failure to parse a packet. I don't think that sort of distinction is
> achievable at the level of standard stats.
>
> The information required by the admin is higher level. As you say the
> primary concern there is "how many packets did XDP eat".
Right, sure, I am also totally fine with having only a somewhat
restricted subset of stats available at the interface level and making
everything else BPF-based. I'm hoping we can converge on a common
understanding of what this "minimal set" should be :)
> Speaking of which, one thing that badly needs clarification is our
> expectation around XDP packets getting counted towards the interface
> stats.
Agreed. My immediate thought is that "XDP packets are interface packets"
but that is certainly not what we do today, so not sure if changing it
at this point would break things?
>> > Maciej, I think you were talking about Cilium asking for those stats
>> > in Intel drivers? Could you maybe provide their exact use cases/needs
>> > so I can orient myself? I certainly remember the ask about XSK Tx
>> > packets and bytes.
>> > And speaking of XSK Tx, we have per-socket stats, isn't that enough?
>>
>> IMO, as long as the packets are accounted for in the regular XDP stats,
>> having a whole separate set of stats only for XSK is less important.
>>
>> >> Please leave the per-channel stats out. They make a precedent for
>> >> channel stats which should be an attribute of a channel. Working for
>> >> a large XDP user for a couple of years now I can tell you from my own
>> >> experience I've not once found them useful. In fact per-queue stats are
>> >> a major PITA as they crowd the output.
>> >
>> > Oh okay. My very first iterations were without this, but then I
>> > found most of the drivers expose their XDP stats per-channel. Since
>> > I didn't plan to degrade the functionality, they went that way.
>>
>> I personally find the per-channel stats quite useful. One of the primary
>> reasons for not achieving full performance with XDP is broken
>> configuration of packet steering to CPUs, and having per-channel stats
>> is a nice way of seeing this.
>
> Right, that's about the only thing I use it for as well. "Is the load
> evenly distributed?" But that's not XDP specific and not worth
> standardizing for, yet, IMO, because..
>
>> I can see the point about them being way too verbose in the default
>> output, though, and I do generally filter the output as well when
>> viewing them. But see my point above about only printing a subset of
>> the stats by default; per-channel stats could be JSON-only, for
>> instance?
>
> we don't even know what constitutes a channel today. And that will
> become increasingly problematic as importance of application specific
> queues increases (zctap etc). IMO until the ontological gaps around
> queues are filled we should leave per-queue stats in ethtool -S.
Hmm, right, I see. I suppose that as long as the XDP packets show up in
one of the interface counters in ethtool -S, it's possible to answer the
load distribution issue, and any further debugging (say, XDP drops on a
certain queue due to CPU-based queue indexing on TX) can be delegated to
BPF-based tools...
-Toke
On Fri, 26 Nov 2021 19:47:17 +0100 Toke Høiland-Jørgensen wrote:
> > Fair. In all honesty I said that hoping to push for a more flexible
> > approach hidden entirely in BPF, and not involving driver changes.
> > Assuming the XDP program has more fine grained stats we should be able
> > to extract those instead of double-counting. Hence my vague "let's work
> > with apps" comment.
> >
> > For example, to a person familiar with the workload it'd be useful to
> > know if the program returned XDP_DROP because of configured policy or
> > failure to parse a packet. I don't think that sort of distinction is
> > achievable at the level of standard stats.
> >
> > The information required by the admin is higher level. As you say the
> > primary concern there is "how many packets did XDP eat".
>
> Right, sure, I am also totally fine with having only a somewhat
> restricted subset of stats available at the interface level and making
> everything else BPF-based. I'm hoping we can converge on a common
> understanding of what this "minimal set" should be :)
>
> > Speaking of which, one thing that badly needs clarification is our
> > expectation around XDP packets getting counted towards the interface
> > stats.
>
> Agreed. My immediate thought is that "XDP packets are interface packets"
> but that is certainly not what we do today, so not sure if changing it
> at this point would break things?
I'd vote for taking the risk and trying to align all the drivers.
On 11/26/21 7:06 PM, Jakub Kicinski wrote:
> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>>>> TBH I wasn't following this thread too closely since I saw Daniel
>>>> nacked it already. I do prefer rtnl xstats, I'd just report them
>>>> in -s if they are non-zero. But doesn't sound like we have an agreement
>>>> whether they should exist or not.
>>>
>>> Right, just -s is fine, if we drop the per-channel approach.
>>
>> I agree that adding them to -s is fine (and that resolves my "no one
>> will find them" complaint as well). If it crowds the output we could also
>> default to only outputting a subset, and have the more detailed
>> statistics hidden behind a verbose switch (or even just in the JSON
>> output)?
>>
>>>> Can we think of an approach which would make cloudflare and cilium
>>>> happy? Feels like we're trying to make the slightly hypothetical
>>>> admin happy while ignoring objections of very real users.
>>>
>>> The initial idea was only to make the drivers uniform. But in general
>>> you are right: 10 drivers having something doesn't mean it's
>>> something good.
>>
>> I don't think it's accurate to call the admin use case "hypothetical".
>> We're expending a significant effort explaining to people that XDP can
>> "eat" your packets, and not having any standard statistics makes this
>> way harder. We should absolutely cater to our "early adopters", but if
>> we want XDP to see wider adoption, making it "less weird" is critical!
>
> Fair. In all honesty I said that hoping to push for a more flexible
> approach hidden entirely in BPF, and not involving driver changes.
> Assuming the XDP program has more fine grained stats we should be able
> to extract those instead of double-counting. Hence my vague "let's work
> with apps" comment.
>
> For example, to a person familiar with the workload it'd be useful to
> know if the program returned XDP_DROP because of configured policy or
> failure to parse a packet. I don't think that sort of distinction is
> achievable at the level of standard stats.
Agree on the additional context. How often have you looked at tc clsact
/dropped/ stats specifically when debugging a more complex BPF program
there?
# tc -s qdisc show clsact dev foo
qdisc clsact ffff: parent ffff:fff1
Sent 6800 bytes 120 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Similarly, XDP_PASS counters may be of limited use as well for the same
reason (and I think we might not even have a tc counter equivalent for it).
> The information required by the admin is higher level. As you say the
> primary concern there is "how many packets did XDP eat".
Agree. Above said, for XDP_DROP I would see one use case where you compare
different drivers or bond vs no bond as we did in the past in [0] when
testing against a packet generator (although I don't see bond driver covered
in this series here yet where it aggregates the XDP stats from all bond slave
devs).
On a higher-level wrt "how many packets did XDP eat", it would make sense
to have the stats for successful XDP_{TX,REDIRECT} given these are out
of reach from a BPF prog PoV - we can only count there how many times we
returned with XDP_TX but not whether the pkt /successfully made it/.
In terms of error cases, could we just standardize all drivers on the behavior
of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT} will
hit the trace_xdp_exception() and then fallthrough to bump a drop counter
(same as we bump in XDP_DROP then). So the drop counter will account for
program drops but also driver-related drops.
At some later point the trace_xdp_exception() could be extended with an error
code that the driver would propagate (given some of them look quite similar
across drivers, fwiw), and then whoever wants to do further processing with
them can do so via bpftrace or other tooling.
So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
bytes & packets counters that XDP sees, (driver-)successful tx & redirect
counters as well as drop counter. Also, XDP bytes & packets counters should
not be counted twice wrt ethtool stats.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e
Thanks,
Daniel
On 11/26/21 11:27 PM, Daniel Borkmann wrote:
> On 11/26/21 7:06 PM, Jakub Kicinski wrote:
[...]
>> The information required by the admin is higher level. As you say the
>> primary concern there is "how many packets did XDP eat".
>
> Agree. Above said, for XDP_DROP I would see one use case where you compare
> different drivers or bond vs no bond as we did in the past in [0] when
> testing against a packet generator (although I don't see bond driver covered
> in this series here yet where it aggregates the XDP stats from all bond slave
> devs).
>
> On a higher-level wrt "how many packets did XDP eat", it would make sense
> to have the stats for successful XDP_{TX,REDIRECT} given these are out
> of reach from a BPF prog PoV - we can only count there how many times we
> returned with XDP_TX but not whether the pkt /successfully made it/.
>
> In terms of error cases, could we just standardize all drivers on the behavior
> of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT} will
> hit the trace_xdp_exception() and then fallthrough to bump a drop counter
> (same as we bump in XDP_DROP then). So the drop counter will account for
> program drops but also driver-related drops.
>
> At some later point the trace_xdp_exception() could be extended with an error
> code that the driver would propagate (given some of them look quite similar
> across drivers, fwiw), and then whoever wants to do further processing with
> them can do so via bpftrace or other tooling.
Just thinking out loud, one straightforward example we could start out with
that is also related to Paolo's series [1] ...

enum xdp_error {
        XDP_UNKNOWN,
        XDP_ACTION_INVALID,
        XDP_ACTION_UNSUPPORTED,
};

... and then bpf_warn_invalid_xdp_action() returns one of the latter two
which we pass to trace_xdp_exception(). Later there could be XDP_DRIVER_*
cases e.g. propagated from XDP_TX error exceptions.

[...]
        default:
                err = bpf_warn_invalid_xdp_action(act);
                fallthrough;
        case XDP_ABORTED:
xdp_abort:
                trace_xdp_exception(rq->netdev, prog, act, err);
                fallthrough;
        case XDP_DROP:
                lrstats->xdp_drop++;
                break;
        }
[...]
[1] https://lore.kernel.org/netdev/[email protected]/
> So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
> tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
> bytes & packets counters that XDP sees, (driver-)successful tx & redirect
> counters as well as drop counter. Also, XDP bytes & packets counters should
> not be counted twice wrt ethtool stats.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e
+Petr, Nik
On Fri, Nov 26, 2021 at 11:14:31AM -0800, Jakub Kicinski wrote:
> On Fri, 26 Nov 2021 19:47:17 +0100 Toke H?iland-J?rgensen wrote:
> > > Fair. In all honesty I said that hoping to push for a more flexible
> > > approach hidden entirely in BPF, and not involving driver changes.
> > > Assuming the XDP program has more fine grained stats we should be able
> > > to extract those instead of double-counting. Hence my vague "let's work
> > > with apps" comment.
> > >
> > > For example, to a person familiar with the workload it'd be useful to
> > > know if the program returned XDP_DROP because of configured policy or
> > > failure to parse a packet. I don't think that sort of distinction is
> > > achievable at the level of standard stats.
> > >
> > > The information required by the admin is higher level. As you say the
> > > primary concern there is "how many packets did XDP eat".
> >
> > Right, sure, I am also totally fine with having only a somewhat
> > restricted subset of stats available at the interface level and making
> > everything else BPF-based. I'm hoping we can converge on a common
> > understanding of what this "minimal set" should be :)
> >
> > > Speaking of which, one thing that badly needs clarification is our
> > > expectation around XDP packets getting counted towards the interface
> > > stats.
> >
> > Agreed. My immediate thought is that "XDP packets are interface packets"
> > but that is certainly not what we do today, so not sure if changing it
> > at this point would break things?
>
> I'd vote for taking the risk and trying to align all the drivers.
I agree. I think IFLA_STATS64 in RTM_NEWLINK should contain statistics
of all the packets seen by the netdev. The breakdown into software /
hardware / XDP should be reported via RTM_NEWSTATS.
Currently, for soft devices such as VLANs, bridges and GRE, user space
only sees statistics of packets forwarded by software, which is quite
useless when forwarding is offloaded from the kernel to hardware.
Petr is working on exposing hardware statistics for such devices via
rtnetlink. Unlike XDP (?), we need to be able to let user space enable /
disable hardware statistics as we have a limited number of hardware
counters and they can also reduce the bandwidth when enabled. We are
thinking of adding a new RTM_SETSTATS for that:
# ip stats set dev swp1 hw_stats on
For query, something like (under discussion):
# ip stats show dev swp1 // all groups
# ip stats show dev swp1 group link
# ip stats show dev swp1 group offload // all sub-groups
# ip stats show dev swp1 group offload sub-group cpu
# ip stats show dev swp1 group offload sub-group hw
Like other iproute2 commands, these follow the nesting of the
RTM_{NEW,GET}STATS uAPI.
Looking at patch #1 [1], I think that whatever you decide to expose for
XDP can be queried via:
# ip stats show dev swp1 group xdp
# ip stats show dev swp1 group xdp sub-group regular
# ip stats show dev swp1 group xdp sub-group xsk
Regardless, the following command should show statistics of all the
packets seen by the netdev:
# ip -s link show dev swp1
There is a PR [2] for node_exporter to use rtnetlink to fetch netdev
statistics instead of the old proc interface. It should be possible to
extend it to use RTM_*STATS for more fine-grained statistics.
[1] https://lore.kernel.org/netdev/[email protected]/
[2] https://github.com/prometheus/node_exporter/pull/2074
On 11/23/21 9:39 AM, Alexander Lobakin wrote:
> This is an almost complete rework of [0].
>
> This series introduces generic XDP statistics infra based on rtnl
> xstats (Ethtool standard stats previously), and wires up the drivers
> which collect appropriate statistics to this new interface. Finally,
> it introduces XDP/XSK statistics to all XDP-capable Intel drivers.
>
> Those counters are:
> * packets: number of frames passed to bpf_prog_run_xdp().
> * bytes: number of bytes went through bpf_prog_run_xdp().
> * errors: number of general XDP errors, if driver has one unified
> counter.
> * aborted: number of XDP_ABORTED returns.
> * drop: number of XDP_DROP returns.
> * invalid: number of returns of unallowed values (i.e. not XDP_*).
> * pass: number of XDP_PASS returns.
> * redirect: number of successfully performed XDP_REDIRECT requests.
> * redirect_errors: number of failed XDP_REDIRECT requests.
> * tx: number of successfully performed XDP_TX requests.
> * tx_errors: number of failed XDP_TX requests.
> * xmit_packets: number of successfully transmitted XDP/XSK frames.
> * xmit_bytes: number of successfully transmitted XDP/XSK frames.
> * xmit_errors: of XDP/XSK frames failed to transmit.
> * xmit_full: number of XDP/XSK queue being full at the moment of
> transmission.
>
> To provide them, developers need to implement .ndo_get_xdp_stats()
> and, if they want to expose stats on a per-channel basis,
> .ndo_get_xdp_stats_nch(). include/net/xdp.h contains some helper
Why the tie to a channel? There are Rx queues and Tx queues and no
requirement to link them into a channel. It would be better (more
flexible) to allow them to be independent. Rather than ask the driver
"how many channels", ask 'how many Rx queues' and 'how many Tx queues'
for which xdp stats are reported.
From there, allow queue numbers or queue IDs to be non-consecutive and
add a queue ID or number as an attribute, e.g.:
[XDP stats]
[ Rx queue N]
counters
[ Tx queue N]
counters
This would allow a follow-on patch set to do something like "Give me XDP
stats for Rx queue N" instead of doing a full dump.
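A rough kernel-side sketch of that layout; every attribute name below is
hypothetical and stands in for whatever patch 1 actually defines, the
point being the nesting with an explicit queue id:

enum {  /* hypothetical attributes for the layout above */
        IFLA_XDP_XSTATS_RXQ,
        IFLA_XDP_XSTATS_QUEUE_ID,
        IFLA_XDP_XSTATS_DROP,
        IFLA_XDP_XSTATS_PAD,
};

static int xdp_xstats_put_rxq(struct sk_buff *skb, u32 qid,
                              const struct ifla_xdp_stats *st)
{
        struct nlattr *nest;

        nest = nla_nest_start(skb, IFLA_XDP_XSTATS_RXQ);
        if (!nest)
                return -EMSGSIZE;

        /* the nest carries its own queue id, so ids may be sparse and
         * a later "give me queue N only" query becomes possible
         */
        if (nla_put_u32(skb, IFLA_XDP_XSTATS_QUEUE_ID, qid) ||
            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_DROP, st->drop,
                              IFLA_XDP_XSTATS_PAD)) {
                nla_nest_cancel(skb, nest);
                return -EMSGSIZE;
        }

        nla_nest_end(skb, nest);
        return 0;
}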
Daniel Borkmann <[email protected]> writes:
> On 11/26/21 7:06 PM, Jakub Kicinski wrote:
>> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>>>>> TBH I wasn't following this thread too closely since I saw Daniel
>>>>> nacked it already. I do prefer rtnl xstats, I'd just report them
>>>>> in -s if they are non-zero. But doesn't sound like we have an agreement
>>>>> whether they should exist or not.
>>>>
>>>> Right, just -s is fine, if we drop the per-channel approach.
>>>
>>> I agree that adding them to -s is fine (and that resolves my "no one
>>> will find them" complaint as well). If it crowds the output we could also
>>> default to only outputting a subset, and have the more detailed
>>> statistics hidden behind a verbose switch (or even just in the JSON
>>> output)?
>>>
>>>>> Can we think of an approach which would make cloudflare and cilium
>>>>> happy? Feels like we're trying to make the slightly hypothetical
>>>>> admin happy while ignoring objections of very real users.
>>>>
>>>> The initial idea was only to make the drivers uniform. But in general
>>>> you are right: 10 drivers having something doesn't mean it's
>>>> something good.
>>>
>>> I don't think it's accurate to call the admin use case "hypothetical".
>>> We're expending a significant effort explaining to people that XDP can
>>> "eat" your packets, and not having any standard statistics makes this
>>> way harder. We should absolutely cater to our "early adopters", but if
>>> we want XDP to see wider adoption, making it "less weird" is critical!
>>
>> Fair. In all honesty I said that hoping to push for a more flexible
>> approach hidden entirely in BPF, and not involving driver changes.
>> Assuming the XDP program has more fine grained stats we should be able
>> to extract those instead of double-counting. Hence my vague "let's work
>> with apps" comment.
>>
>> For example, to a person familiar with the workload it'd be useful to
>> know if the program returned XDP_DROP because of configured policy or
>> failure to parse a packet. I don't think that sort of distinction is
>> achievable at the level of standard stats.
>
> Agree on the additional context. How often have you looked at tc clsact
> /dropped/ stats specifically when debugging a more complex BPF program
> there?
>
> # tc -s qdisc show clsact dev foo
> qdisc clsact ffff: parent ffff:fff1
> Sent 6800 bytes 120 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
> Similarly, XDP_PASS counters may be of limited use as well for the same
> reason (and I think we might not even have a tc counter equivalent for it).
>
>> The information required by the admin is higher level. As you say the
>> primary concern there is "how many packets did XDP eat".
>
> Agree. Above said, for XDP_DROP I would see one use case where you compare
> different drivers or bond vs no bond as we did in the past in [0] when
> testing against a packet generator (although I don't see bond driver covered
> in this series here yet where it aggregates the XDP stats from all bond slave
> devs).
>
> On a higher-level wrt "how many packets did XDP eat", it would make sense
> to have the stats for successful XDP_{TX,REDIRECT} given these are out
> of reach from a BPF prog PoV - we can only count there how many times we
> returned with XDP_TX but not whether the pkt /successfully made it/.
>
> In terms of error cases, could we just standardize all drivers on the behavior
> of e.g. mlx5e_xdp_handle(), meaning, a failure from XDP_{TX,REDIRECT} will
> hit the trace_xdp_exception() and then fallthrough to bump a drop counter
> (same as we bump in XDP_DROP then). So the drop counter will account for
> program drops but also driver-related drops.
>
> At some later point the trace_xdp_exception() could be extended with an error
> code that the driver would propagate (given some of them look quite similar
> across drivers, fwiw), and then whoever wants to do further processing with
> them can do so via bpftrace or other tooling.
>
> So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
> tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
> bytes & packets counters that XDP sees, (driver-)successful tx & redirect
> counters as well as drop counter. Also, XDP bytes & packets counters should
> not be counted twice wrt ethtool stats.
This sounds reasonable to me, and I also like the error code to
tracepoint idea :)
-Toke
Alexander Lobakin <[email protected]> writes:
> ena driver has 6 XDP counters collected per-channel. Add callbacks
> for getting the number of channels and those counters using the
> generic XDP stats infra.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Reviewed-by: Jesse Brandeburg <[email protected]>
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 53 ++++++++++++++++++++
>  1 file changed, 53 insertions(+)
>
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 7d5d885d85d5..83e9b85cc998 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -3313,12 +3313,65 @@ static void ena_get_stats64(struct net_device *netdev,
>  	stats->tx_errors = 0;
>  }
>
> +static int ena_get_xdp_stats_nch(const struct net_device *netdev, u32 attr_id)
> +{
> +	const struct ena_adapter *adapter = netdev_priv(netdev);
> +
> +	switch (attr_id) {
> +	case IFLA_XDP_XSTATS_TYPE_XDP:
> +		return adapter->num_io_queues;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static int ena_get_xdp_stats(const struct net_device *netdev, u32 attr_id,
> +			     void *attr_data)
> +{
> +	const struct ena_adapter *adapter = netdev_priv(netdev);
> +	struct ifla_xdp_stats *xdp_stats = attr_data;
> +	u32 i;
> +
> +	switch (attr_id) {
> +	case IFLA_XDP_XSTATS_TYPE_XDP:
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	for (i = 0; i < adapter->num_io_queues; i++) {
> +		const struct u64_stats_sync *syncp;
> +		const struct ena_stats_rx *stats;
> +		u32 start;
> +
> +		stats = &adapter->rx_ring[i].rx_stats;
> +		syncp = &adapter->rx_ring[i].syncp;
> +
> +		do {
> +			start = u64_stats_fetch_begin_irq(syncp);
> +
> +			xdp_stats->drop = stats->xdp_drop;
> +			xdp_stats->pass = stats->xdp_pass;
> +			xdp_stats->tx = stats->xdp_tx;
> +			xdp_stats->redirect = stats->xdp_redirect;
> +			xdp_stats->aborted = stats->xdp_aborted;
> +			xdp_stats->invalid = stats->xdp_invalid;
> +		} while (u64_stats_fetch_retry_irq(syncp, start));
> +
> +		xdp_stats++;
> +	}
> +
> +	return 0;
> +}
> +
> +
Hi,
thank you for the time you took in adding ENA support. This code
doesn't update the XDP TX queues (which are only available when an
XDP program is loaded).
In theory the following patch should fix it, but I was unable to
compile your version of iproute2 and test the patch properly. Could
you please let me know if I need to do anything special to bring
up your version of iproute2 and test this patch?
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 7d5d885d8..4e89a7d60 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3313,12 +3313,85 @@ static void ena_get_stats64(struct net_device *netdev,
 	stats->tx_errors = 0;
 }

+static int ena_get_xdp_stats_nch(const struct net_device *netdev, u32 attr_id)
+{
+	const struct ena_adapter *adapter = netdev_priv(netdev);
+
+	switch (attr_id) {
+	case IFLA_XDP_XSTATS_TYPE_XDP:
+		return adapter->num_io_queues;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int ena_get_xdp_stats(const struct net_device *netdev, u32 attr_id,
+			     void *attr_data)
+{
+	const struct ena_adapter *adapter = netdev_priv(netdev);
+	struct ifla_xdp_stats *xdp_stats = attr_data;
+	const struct u64_stats_sync *syncp;
+	u32 start;
+	u32 i;
+
+	switch (attr_id) {
+	case IFLA_XDP_XSTATS_TYPE_XDP:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	for (i = 0; i < adapter->num_io_queues; i++) {
+		const struct ena_stats_rx *rx_stats;
+
+		rx_stats = &adapter->rx_ring[i].rx_stats;
+		syncp = &adapter->rx_ring[i].syncp;
+
+		do {
+			start = u64_stats_fetch_begin_irq(syncp);
+
+			xdp_stats->drop = rx_stats->xdp_drop;
+			xdp_stats->pass = rx_stats->xdp_pass;
+			xdp_stats->tx = rx_stats->xdp_tx;
+			xdp_stats->redirect = rx_stats->xdp_redirect;
+			xdp_stats->aborted = rx_stats->xdp_aborted;
+			xdp_stats->invalid = rx_stats->xdp_invalid;
+		} while (u64_stats_fetch_retry_irq(syncp, start));
+
+		xdp_stats++;
+	}
+
+	xdp_stats = attr_data;
+	/* xdp_num_queues can be 0 if an XDP program isn't loaded */
+	for (i = 0; i < adapter->xdp_num_queues; i++) {
+		const struct ena_stats_tx *tx_stats;
+
+		tx_stats = &adapter->rx_ring[i].xdp_ring->tx_stats;
+		syncp = &adapter->rx_ring[i].xdp_ring->syncp;
+
+		do {
+			start = u64_stats_fetch_begin_irq(syncp);
+
+			xdp_stats->xmit_packets = tx_stats->cnt;
+			xdp_stats->xmit_bytes = tx_stats->bytes;
+			xdp_stats->xmit_errors = tx_stats->dma_mapping_err +
+						 tx_stats->prepare_ctx_err;
+		} while (u64_stats_fetch_retry_irq(syncp, start));
+
+		xdp_stats++;
+	}
+
+	return 0;
+}
+
 static const struct net_device_ops ena_netdev_ops = {
 	.ndo_open		= ena_open,
 	.ndo_stop		= ena_close,
 	.ndo_start_xmit		= ena_start_xmit,
 	.ndo_select_queue	= ena_select_queue,
 	.ndo_get_stats64	= ena_get_stats64,
+	.ndo_get_xdp_stats_nch	= ena_get_xdp_stats_nch,
+	.ndo_get_xdp_stats	= ena_get_xdp_stats,
 	.ndo_tx_timeout		= ena_tx_timeout,
 	.ndo_change_mtu		= ena_change_mtu,
 	.ndo_set_mac_address	= NULL,
On 27/11/2021 00.01, Daniel Borkmann wrote:
> On 11/26/21 11:27 PM, Daniel Borkmann wrote:
>> On 11/26/21 7:06 PM, Jakub Kicinski wrote:
> [...]
>>> The information required by the admin is higher level. As you say the
>>> primary concern there is "how many packets did XDP eat".
>>
>> Agree. Above said, for XDP_DROP I would see one use case where you compare
>> different drivers or bond vs no bond as we did in the past in [0] when
>> testing against a packet generator (although I don't see bond driver covered
>> in this series here yet where it aggregates the XDP stats from all bond
>> slave devs).
>>
>> On a higher-level wrt "how many packets did XDP eat", it would make sense
>> to have the stats for successful XDP_{TX,REDIRECT} given these are out
>> of reach from a BPF prog PoV - we can only count there how many times we
>> returned with XDP_TX but not whether the pkt /successfully made it/.
>>
Exactly.
>> In terms of error cases, could we just standardize all drivers on the
>> behavior of e.g. mlx5e_xdp_handle(), meaning, a failure from
>> XDP_{TX,REDIRECT} will hit the trace_xdp_exception() and then fallthrough
>> to bump a drop counter (same as we bump in XDP_DROP then). So the drop
>> counter will account for program drops but also driver-related drops.
>>
Hmm... I don't agree here. IMHO the BPF program's *choice* to drop (via
XDP_DROP) should NOT share the counter with the driver-related drops.
The driver-related drops must be accounted separately.
For the record, I think mlx5e_xdp_handle() does the wrong thing by
accounting everything as XDP_DROP in (rq->stats->xdp_drop++).
Current mlx5 driver stats are highly problematic, actually.
Please don't model stats behavior after this driver.
E.g. if the BPF prog makes the *choice* of XDP_TX, XDP_REDIRECT or
XDP_DROP, then the packet is invisible to "ifconfig" stats. It is as if
the driver never received these packets (which is wrong IMHO). (The
stats are only available via ethtool -S.)
>> At some later point the trace_xdp_exception() could be extended with an
>> error code that the driver would propagate (given some of them look quite
>> similar across drivers, fwiw), and then whoever wants to do further
>> processing with them can do so via bpftrace or other tooling.
I do like that trace_xdp_exception() is invoked in mlx5e_xdp_handle(), but
do notice that xdp_do_redirect() also has a tracepoint that can be used
for troubleshooting. (I usually use xdp_monitor for troubleshooting,
which catches both.)
I like the XDP stats handling in mvneta_run_xdp() better.
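The mvneta pattern amounts to charging the frame to the regular Rx
counters before acting on the verdict, so XDP frames never vanish from
the standard interface statistics. A simplified sketch (foo_* names made
up, not the actual mvneta code):

static u32 foo_run_xdp(struct foo_rx_stats *rxs, struct bpf_prog *prog,
                       struct xdp_buff *xdp)
{
        /* Count the frame as received up front, whatever the verdict,
         * so XDP_TX/REDIRECT/DROP don't hide it from ifconfig-level stats.
         */
        rxs->rx_packets++;
        rxs->rx_bytes += xdp->data_end - xdp->data;

        return bpf_prog_run_xdp(prog, xdp);
}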
> Just thinking out loud, one straightforward example we could start out
> with that is also related to Paolo's series [1] ...
>
> enum xdp_error {
> 	XDP_UNKNOWN,
> 	XDP_ACTION_INVALID,
> 	XDP_ACTION_UNSUPPORTED,
> };
>
> ... and then bpf_warn_invalid_xdp_action() returns one of the latter two
> which we pass to trace_xdp_exception(). Later there could be XDP_DRIVER_*
> cases e.g. propagated from XDP_TX error exceptions.
>
> [...]
> 	default:
> 		err = bpf_warn_invalid_xdp_action(act);
> 		fallthrough;
> 	case XDP_ABORTED:
> xdp_abort:
> 		trace_xdp_exception(rq->netdev, prog, act, err);
> 		fallthrough;
> 	case XDP_DROP:
> 		lrstats->xdp_drop++;
> 		break;
> 	}
> [...]
>
> [1] https://lore.kernel.org/netdev/[email protected]/
>
>> So overall wrt this series: from the lrstats we'd be /dropping/ the pass,
>> tx_errors, redirect_errors, invalid, aborted counters. And we'd be /keeping/
>> bytes & packets counters that XDP sees, (driver-)successful tx & redirect
>> counters as well as drop counter. Also, XDP bytes & packets counters should
>> not be counted twice wrt ethtool stats.
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e2ee5c7e7c35d195e2aa0692a7241d47a433d1e
>>
A concrete example with mlx5:
With most hardware other than mlx5, I experience that XDP_TX creates
push-back on the NIC RX-handling speed. Thus, the XDP_TX stats recorded
by the BPF prog are usually correct.
With mlx5 hardware (tested on a ConnectX-5 Ex MT28800) the RX
packets-per-sec (pps) stats can easily exceed the actually transmitted
XDP_TX frames.
$ sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_TX
[...]
Running XDP on dev:mlx5p1 (ifindex:10) action:XDP_TX options:swapmac
XDP stats CPU pps issue-pps
XDP-RX CPU 1 13,922,430 0
XDP-RX CPU total 13,922,430
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 1:1 13,922,430 0
rx_queue_index 1:sum 13,922,430
The real xmit speed (from the ethtool_stats.pl output below) is:
 9,391,314 pps <= rx1_xdp_tx_xmit /sec
The dropped packets are double accounted as:
 4,552,033 <= rx_xdp_drop /sec
 4,552,033 <= rx_xdp_tx_full /sec
Show adapter(s) (mlx5p1) statistics (ONLY that changed!)
Ethtool(mlx5p1) stat:     217865 (        217,865) <= ch1_poll /sec
Ethtool(mlx5p1) stat:     217864 (        217,864) <= ch_poll /sec
Ethtool(mlx5p1) stat:   13943371 (     13,943,371) <= rx1_cache_reuse /sec
Ethtool(mlx5p1) stat:    4552033 (      4,552,033) <= rx1_xdp_drop /sec
Ethtool(mlx5p1) stat:     146740 (        146,740) <= rx1_xdp_tx_cqes /sec
Ethtool(mlx5p1) stat:    4552033 (      4,552,033) <= rx1_xdp_tx_full /sec
Ethtool(mlx5p1) stat:    9391314 (      9,391,314) <= rx1_xdp_tx_inlnw /sec
Ethtool(mlx5p1) stat:     880436 (        880,436) <= rx1_xdp_tx_mpwqe /sec
Ethtool(mlx5p1) stat:     997833 (        997,833) <= rx1_xdp_tx_nops /sec
Ethtool(mlx5p1) stat:    9391314 (      9,391,314) <= rx1_xdp_tx_xmit /sec
Ethtool(mlx5p1) stat:   45095173 (     45,095,173) <= rx_64_bytes_phy /sec
Ethtool(mlx5p1) stat: 2886090490 (  2,886,090,490) <= rx_bytes_phy /sec
Ethtool(mlx5p1) stat:   13943293 (     13,943,293) <= rx_cache_reuse /sec
Ethtool(mlx5p1) stat:   31151957 (     31,151,957) <= rx_out_of_buffer /sec
Ethtool(mlx5p1) stat:   45095158 (     45,095,158) <= rx_packets_phy /sec
Ethtool(mlx5p1) stat: 2886072350 (  2,886,072,350) <= rx_prio0_bytes /sec
Ethtool(mlx5p1) stat:   45094878 (     45,094,878) <= rx_prio0_packets /sec
Ethtool(mlx5p1) stat: 2705707938 (  2,705,707,938) <= rx_vport_unicast_bytes /sec
Ethtool(mlx5p1) stat:   45095129 (     45,095,129) <= rx_vport_unicast_packets /sec
Ethtool(mlx5p1) stat:    4552033 (      4,552,033) <= rx_xdp_drop /sec
Ethtool(mlx5p1) stat:     146739 (        146,739) <= rx_xdp_tx_cqe /sec
Ethtool(mlx5p1) stat:    4552033 (      4,552,033) <= rx_xdp_tx_full /sec
Ethtool(mlx5p1) stat:    9391319 (      9,391,319) <= rx_xdp_tx_inlnw /sec
Ethtool(mlx5p1) stat:     880436 (        880,436) <= rx_xdp_tx_mpwqe /sec
Ethtool(mlx5p1) stat:     997831 (        997,831) <= rx_xdp_tx_nops /sec
Ethtool(mlx5p1) stat:    9391319 (      9,391,319) <= rx_xdp_tx_xmit /sec
Ethtool(mlx5p1) stat:  601044221 (    601,044,221) <= tx_bytes_phy /sec
Ethtool(mlx5p1) stat:    9391316 (      9,391,316) <= tx_packets_phy /sec
Ethtool(mlx5p1) stat:  601040871 (    601,040,871) <= tx_prio0_bytes /sec
Ethtool(mlx5p1) stat:    9391264 (      9,391,264) <= tx_prio0_packets /sec
Ethtool(mlx5p1) stat:  563478483 (    563,478,483) <= tx_vport_unicast_bytes /sec
Ethtool(mlx5p1) stat:    9391316 (      9,391,316) <= tx_vport_unicast_packets /sec
[1]
https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
The net_device stats say the NIC is processing zero packets:
$ sar -n DEV 2 1000
[...]
Average:  IFACE   rxpck/s  txpck/s  rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s  %ifutil
[...]
Average:  mlx5p1  0,00     0,00     0,00    0,00    0,00     0,00     0,00      0,00
Average:  mlx5p2  0,00     0,00     0,00    0,00    0,00     0,00     0,00      0,00
On Sun, 28 Nov 2021 19:54:53 +0200 Ido Schimmel wrote:
> > > Right, sure, I am also totally fine with having only a somewhat
> > > restricted subset of stats available at the interface level and making
> > > everything else BPF-based. I'm hoping we can converge on a common
> > > understanding of what this "minimal set" should be :)
> > >
> > > Agreed. My immediate thought is that "XDP packets are interface packets"
> > > but that is certainly not what we do today, so not sure if changing it
> > > at this point would break things?
> >
> > I'd vote for taking the risk and trying to align all the drivers.
>
> I agree. I think IFLA_STATS64 in RTM_NEWLINK should contain statistics
> of all the packets seen by the netdev. The breakdown into software /
> hardware / XDP should be reported via RTM_NEWSTATS.
Hm, in the offload case "seen by the netdev" may be unclear. For
the offload case I believe our recommendation was phrased more like
"all packets which would be seen by the netdev if there was no
routing/tc offload", right?
> Currently, for soft devices such as VLANs, bridges and GRE, user space
> only sees statistics of packets forwarded by software, which is quite
> useless when forwarding is offloaded from the kernel to hardware.
>
> Petr is working on exposing hardware statistics for such devices via
> rtnetlink. Unlike XDP (?), we need to be able to let user space enable /
> disable hardware statistics as we have a limited number of hardware
> counters and they can also reduce the bandwidth when enabled. We are
> thinking of adding a new RTM_SETSTATS for that:
>
> # ip stats set dev swp1 hw_stats on
Does it belong on the switch port? Not the netdev we want to track?
> For query, something like (under discussion):
>
> # ip stats show dev swp1 // all groups
> # ip stats show dev swp1 group link
> # ip stats show dev swp1 group offload // all sub-groups
> # ip stats show dev swp1 group offload sub-group cpu
> # ip stats show dev swp1 group offload sub-group hw
>
> Like other iproute2 commands, these follow the nesting of the
> RTM_{NEW,GET}STATS uAPI.
But we do have IFLA_STATS_LINK_OFFLOAD_XSTATS, isn't it effectively
the same use case?
> Looking at patch #1 [1], I think that whatever you decide to expose for
> XDP can be queried via:
>
> # ip stats show dev swp1 group xdp
> # ip stats show dev swp1 group xdp sub-group regular
> # ip stats show dev swp1 group xdp sub-group xsk
>
> Regardless, the following command should show statistics of all the
> packets seen by the netdev:
>
> # ip -s link show dev swp1
>
> There is a PR [2] for node_exporter to use rtnetlink to fetch netdev
> statistics instead of the old proc interface. It should be possible to
> extend it to use RTM_*STATS for more fine-grained statistics.
>
> [1] https://lore.kernel.org/netdev/[email protected]/
> [2] https://github.com/prometheus/node_exporter/pull/2074
Nice!
Jakub Kicinski <[email protected]> writes:
> On Sun, 28 Nov 2021 19:54:53 +0200 Ido Schimmel wrote:
>> > > Right, sure, I am also totally fine with having only a somewhat
>> > > restricted subset of stats available at the interface level and make
>> > > everything else be BPF-based. I'm hoping we can converge on a common
>> > > understanding of what this "minimal set" should be :)
>> > >
>> > > Agreed. My immediate thought is that "XDP packets are interface packets"
>> > > but that is certainly not what we do today, so not sure if changing it
>> > > at this point would break things?
>> >
>> > I'd vote for taking the risk and trying to align all the drivers.
>>
>> I agree. I think IFLA_STATS64 in RTM_NEWLINK should contain statistics
>> of all the packets seen by the netdev. The breakdown into software /
>> hardware / XDP should be reported via RTM_NEWSTATS.
>
> Hm, in the offload case "seen by the netdev" may be unclear. For
> the offload case I believe our recommendation was phrased more like
> "all packets which would be seen by the netdev if there was no
> routing/tc offload", right?
Yes. The idea is to expose to Linux stats about traffic at conceptually
corresponding objects in the HW.
>
>> Currently, for soft devices such as VLANs, bridges and GRE, user space
>> only sees statistics of packets forwarded by software, which is quite
>> useless when forwarding is offloaded from the kernel to hardware.
>>
>> Petr is working on exposing hardware statistics for such devices via
>> rtnetlink. Unlike XDP (?), we need to be able to let user space enable /
>> disable hardware statistics as we have a limited number of hardware
>> counters and they can also reduce the bandwidth when enabled. We are
>> thinking of adding a new RTM_SETSTATS for that:
>>
>> # ip stats set dev swp1 hw_stats on
>
> Does it belong on the switch port? Not the netdev we want to track?
Yes, it does, and is designed that way. That was just muscle memory
typing that "swp1" above :)
You would do e.g. "ip stats set dev swp1.200 hw_stats on" or, "dev br1",
or something like that.
>> For query, something like (under discussion):
>>
>> # ip stats show dev swp1 // all groups
>> # ip stats show dev swp1 group link
>> # ip stats show dev swp1 group offload // all sub-groups
>> # ip stats show dev swp1 group offload sub-group cpu
>> # ip stats show dev swp1 group offload sub-group hw
>>
>> Like other iproute2 commands, these follow the nesting of the
>> RTM_{NEW,GET}STATS uAPI.
>
> But we do have IFLA_STATS_LINK_OFFLOAD_XSTATS, isn't it effectively
> the same use case?
IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest. Currently it carries just
CPU_HIT stats. The idea is to carry HW stats as well in that group.
>> Looking at patch #1 [1], I think that whatever you decide to expose for
>> XDP can be queried via:
>>
>> # ip stats show dev swp1 group xdp
>> # ip stats show dev swp1 group xdp sub-group regular
>> # ip stats show dev swp1 group xdp sub-group xsk
>>
>> Regardless, the following command should show statistics of all the
>> packets seen by the netdev:
>>
>> # ip -s link show dev swp1
>>
>> There is a PR [2] for node_exporter to use rtnetlink to fetch netdev
>> statistics instead of the old proc interface. It should be possible to
>> extend it to use RTM_*STATS for more fine-grained statistics.
>>
>> [1] https://lore.kernel.org/netdev/[email protected]/
>> [2] https://github.com/prometheus/node_exporter/pull/2074
>
> Nice!
Petr Machata <[email protected]> writes:
> Jakub Kicinski <[email protected]> writes:
>
>> On Sun, 28 Nov 2021 19:54:53 +0200 Ido Schimmel wrote:
>>> # ip stats set dev swp1 hw_stats on
>>
>> Does it belong on the switch port? Not the netdev we want to track?
>
> Yes, it does, and is designed that way. That was just muscle memory
> typing that "swp1" above :)
And by "yes, it does", I obviously meant "no, it doesn't". It does
belong to the device that you want counters for.
> You would do e.g. "ip stats set dev swp1.200 hw_stats on" or, "dev br1",
> or something like that.
On Mon, 29 Nov 2021 16:51:02 +0100 Petr Machata wrote:
> Jakub Kicinski <[email protected]> writes:
> > On Sun, 28 Nov 2021 19:54:53 +0200 Ido Schimmel wrote:
> >> I agree. I think IFLA_STATS64 in RTM_NEWLINK should contain statistics
> >> of all the packets seen by the netdev. The breakdown into software /
> >> hardware / XDP should be reported via RTM_NEWSTATS.
> >
> > Hm, in the offload case "seen by the netdev" may be unclear. For
> > the offload case I believe our recommendation was phrased more like
> > "all packets which would be seen by the netdev if there was no
> > routing/tc offload", right?
>
> Yes. The idea is to expose to Linux stats about traffic at conceptually
> corresponding objects in the HW.
Great.
> >> Currently, for soft devices such as VLANs, bridges and GRE, user space
> >> only sees statistics of packets forwarded by software, which is quite
> >> useless when forwarding is offloaded from the kernel to hardware.
> >>
> >> Petr is working on exposing hardware statistics for such devices via
> >> rtnetlink. Unlike XDP (?), we need to be able to let user space enable /
> >> disable hardware statistics as we have a limited number of hardware
> >> counters and they can also reduce the bandwidth when enabled. We are
> >> thinking of adding a new RTM_SETSTATS for that:
> >>
> >> # ip stats set dev swp1 hw_stats on
> >
> > Does it belong on the switch port? Not the netdev we want to track?
>
> Yes, it does, and is designed that way. That was just muscle memory
> typing that "swp1" above :)
>
> You would do e.g. "ip stats set dev swp1.200 hw_stats on" or, "dev br1",
> or something like that.
I see :)
> >> For query, something like (under discussion):
> >>
> >> # ip stats show dev swp1 // all groups
> >> # ip stats show dev swp1 group link
> >> # ip stats show dev swp1 group offload // all sub-groups
> >> # ip stats show dev swp1 group offload sub-group cpu
> >> # ip stats show dev swp1 group offload sub-group hw
> >>
> >> Like other iproute2 commands, these follow the nesting of the
> >> RTM_{NEW,GET}STATS uAPI.
> >
> > But we do have IFLA_STATS_LINK_OFFLOAD_XSTATS, isn't it effectively
> > the same use case?
>
> IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest. Currently it carries just
> CPU_HIT stats. The idea is to carry HW stats as well in that group.
Hm, the expectation was that the HW stats == total - SW. I believe that
still holds true for you, even if HW stats are not "complete" (e.g.
user enabled them after device was already forwarding for a while).
Is the concern about backward compat or such?
Jakub Kicinski <[email protected]> writes:
> On Mon, 29 Nov 2021 16:51:02 +0100 Petr Machata wrote:
>> Jakub Kicinski <[email protected]> writes:
>> > On Sun, 28 Nov 2021 19:54:53 +0200 Ido Schimmel wrote:
>> >> For query, something like (under discussion):
>> >>
>> >> # ip stats show dev swp1 // all groups
>> >> # ip stats show dev swp1 group link
>> >> # ip stats show dev swp1 group offload // all sub-groups
>> >> # ip stats show dev swp1 group offload sub-group cpu
>> >> # ip stats show dev swp1 group offload sub-group hw
>> >>
>> >> Like other iproute2 commands, these follow the nesting of the
>> >> RTM_{NEW,GET}STATS uAPI.
>> >
>> > But we do have IFLA_STATS_LINK_OFFLOAD_XSTATS, isn't it effectively
>> > the same use case?
>>
>> IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest. Currently it carries just
>> CPU_HIT stats. The idea is to carry HW stats as well in that group.
>
> Hm, the expectation was that the HW stats == total - SW. I believe that
> still holds true for you, even if HW stats are not "complete" (e.g.
> user enabled them after device was already forwarding for a while).
> Is the concern about backward compat or such?
I guess you could call it backward compat. But not only. I think a
typical user doing "ip -s l sh", including various scripts, wants to see
the full picture and not worry what's going on where. Physical
netdevices already do that, and by extension bond and team of physical
netdevices. It also makes sense from the point of view of an offloaded
datapath as an implementation detail that you would ideally not notice.
For those who care to know about the offloaded datapath, it would be
nice to have the option to request either just the SW stats, or just the
HW stats. A logical place to put these would be under the OFFLOAD_XSTATS
nest of the RTM_GETSTATS message, but maybe the SW ones should be up
there next to IFLA_STATS_LINK_64. (After all it's going to be
independent from not only offload datapath, but also XDP.)
This way you get the intuitive default behavior, but still have a way to
e.g. request just the SW stats without hitting the HW, or just request
the HW stats if that's what you care about.
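For reference, a minimal sketch of how such a targeted query already
looks on the wire with RTM_GETSTATS, requesting only the OFFLOAD_XSTATS
group (no new SW/HW sub-attributes are assumed here; fd is an
already-opened NETLINK_ROUTE socket):

#include <linux/if_link.h>
#include <linux/rtnetlink.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to fill in only the OFFLOAD_XSTATS nest for one
 * netdev, skipping the groups the caller doesn't care about.
 * Error handling trimmed for brevity.
 */
static int request_offload_xstats(int fd, int ifindex)
{
        struct {
                struct nlmsghdr nlh;
                struct if_stats_msg ifsm;
        } req;

        memset(&req, 0, sizeof(req));
        req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req.ifsm));
        req.nlh.nlmsg_type = RTM_GETSTATS;
        req.nlh.nlmsg_flags = NLM_F_REQUEST;
        req.ifsm.family = AF_UNSPEC;
        req.ifsm.ifindex = ifindex;
        /* pick just the group(s) of interest */
        req.ifsm.filter_mask =
                IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_OFFLOAD_XSTATS);

        if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
                return -1;

        return 0;
}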
On Mon, 29 Nov 2021 14:59:53 +0100 Jesper Dangaard Brouer wrote:
> Hmm... I don't agree here. IMHO the BPF-program's *choice* to drop (via
> XDP_DROP) should NOT share the counter with the driver-related drops.
>
> The driver-related drops must be accounted separate.
+1 FWIW. The Tx stat is a little misleading because it differs from the
definition of our other tx stats which mean _successfully_ transmitted
(and are accounted on the completion path in many drivers).
In the past I've used act_*, e.g. act_tx, to indicate the stat counts
returned actions, not whether the packet made it.
I still wonder whether it makes sense to count the stats per-action or
just have one "XDP consumed it" stat and that's it. The semantics of the
action are not of interest to the admin. A firewall can drop or tx
depending if it wants to send an ICMP reject or TCP RST message in
response. I need to know what the application does to understand the
difference, and if I do I can as well look at app stats. But I'm aware
I'm not going to find much support for this position, so just saying...
;)
On Mon, 29 Nov 2021 18:08:12 +0100 Petr Machata wrote:
> Jakub Kicinski <[email protected]> writes:
> > On Mon, 29 Nov 2021 16:51:02 +0100 Petr Machata wrote:
> >> IFLA_STATS_LINK_OFFLOAD_XSTATS is a nest. Currently it carries just
> >> CPU_HIT stats. The idea is to carry HW stats as well in that group.
> >
> > Hm, the expectation was that the HW stats == total - SW. I believe that
> > still holds true for you, even if HW stats are not "complete" (e.g.
> > user enabled them after device was already forwarding for a while).
> > Is the concern about backward compat or such?
>
> I guess you could call it backward compat. But not only. I think a
> typical user doing "ip -s l sh", including various scripts, wants to see
> the full picture and not worry what's going on where. Physical
> netdevices already do that, and by extension bond and team of physical
> netdevices. It also makes sense from the point of view of an offloaded
> datapath as an implementation detail that you would ideally not notice.
Agreed.
> For those who care to know about the offloaded datapath, it would be
> nice to have the option to request either just the SW stats, or just the
> HW stats. A logical place to put these would be under the OFFLOAD_XSTATS
> nest of the RTM_GETSTATS message, but maybe the SW ones should be up
> there next to IFLA_STATS_LINK_64. (After all it's going to be
> independent from not only offload datapath, but also XDP.)
What I'm getting at is that I thought IFLA_OFFLOAD_XSTATS_CPU_HIT
should be sufficient from uAPI perspective in terms of reporting.
User space can do the simple math to calculate the "SW stats" if
it wants to. We may well be talking about the same thing, so maybe
let's wait for the code?
> This way you get the intuitive default behavior, but still have a way to
> e.g. request just the SW stats without hitting the HW, or just request
> the HW stats if that's what you care about.
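For concreteness, the "simple math" per counter is just the sketch
below, with the total taken from IFLA_STATS_LINK_64 and the CPU_HIT
value from the OFFLOAD_XSTATS nest (variable names are shorthand, not
uAPI):

/* userspace derives the split from counters that are already exposed */
__u64 sw_pkts = cpu_hit_pkts;          /* traffic that went through the kernel */
__u64 hw_pkts = total_pkts - sw_pkts;  /* forwarded purely in hardware */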
On 11/23/21 9:39 AM, Alexander Lobakin wrote:
> +static bool rtnl_get_xdp_stats_xdpxsk(struct sk_buff *skb, u32 ch,
> +                                      const void *attr_data)
> +{
> +        const struct ifla_xdp_stats *xstats = attr_data;
> +
> +        xstats += ch;
> +
> +        if (nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PACKETS, xstats->packets,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_BYTES, xstats->bytes,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_ERRORS, xstats->errors,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_ABORTED, xstats->aborted,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_DROP, xstats->drop,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_INVALID, xstats->invalid,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_PASS, xstats->pass,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_REDIRECT, xstats->redirect,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_REDIRECT_ERRORS,
> +                              xstats->redirect_errors,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_TX, xstats->tx,
> +                              IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_TX_ERRORS,
> +                              xstats->tx_errors, IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_PACKETS,
> +                              xstats->xmit_packets, IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_BYTES,
> +                              xstats->xmit_bytes, IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_ERRORS,
> +                              xstats->xmit_errors, IFLA_XDP_XSTATS_UNSPEC) ||
> +            nla_put_u64_64bit(skb, IFLA_XDP_XSTATS_XMIT_FULL,
> +                              xstats->xmit_full, IFLA_XDP_XSTATS_UNSPEC))
> +                return false;
> +
> +        return true;
> +}
> +
Another thought on this patch: with individual attributes you could save
some overhead by not sending 0 counters to userspace. e.g., define a
helper that does:
static inline int nla_put_u64_if_set(struct sk_buff *skb, int attrtype,
                                     u64 value, int padattr)
{
        if (value)
                return nla_put_u64_64bit(skb, attrtype, value, padattr);

        return 0;
}
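With such a helper, the long put sequence in the patch converts
mechanically, e.g. (a sketch only, reusing the attribute names from the
series):

        if (nla_put_u64_if_set(skb, IFLA_XDP_XSTATS_PACKETS, xstats->packets,
                               IFLA_XDP_XSTATS_UNSPEC) ||
            nla_put_u64_if_set(skb, IFLA_XDP_XSTATS_BYTES, xstats->bytes,
                               IFLA_XDP_XSTATS_UNSPEC) ||
            /* ... likewise for the remaining counters ... */
            nla_put_u64_if_set(skb, IFLA_XDP_XSTATS_XMIT_FULL,
                               xstats->xmit_full, IFLA_XDP_XSTATS_UNSPEC))
                return false;

        return true;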
Jakub Kicinski <[email protected]> writes:
> On Mon, 29 Nov 2021 18:08:12 +0100 Petr Machata wrote:
>> For those who care to know about the offloaded datapath, it would be
>> nice to have the option to request either just the SW stats, or just the
>> HW stats. A logical place to put these would be under the OFFLOAD_XSTATS
>> nest of the RTM_GETSTATS message, but maybe the SW ones should be up
>> there next to IFLA_STATS_LINK_64. (After all it's going to be
>> independent from not only offload datapath, but also XDP.)
>
> What I'm getting at is that I thought IFLA_OFFLOAD_XSTATS_CPU_HIT
> should be sufficient from uAPI perspective in terms of reporting.
> User space can do the simple math to calculate the "SW stats" if
> it wants to. We may well be talking about the same thing, so maybe
> let's wait for the code?
Ha, OK, now I understand. Yeah, CPU_HIT actually does fit the bill for
the traffic that took place in SW. We can reuse it.
I still think it would be better to report HW_STATS explicitly as well
though. One reason is simply convenience. The other is that OK, now we
have SW stats, and XDP stats, and total stats, and I (as a client) don't
necessarily know how it all fits together. But the contract for HW_STATS
is very clear.
On Tue, 30 Nov 2021 12:55:47 +0100 Petr Machata wrote:
> I still think it would be better to report HW_STATS explicitly as well
> though. One reason is simply convenience. The other is that OK, now we
> have SW stats, and XDP stats, and total stats, and I (as a client) don't
> necessarily know how it all fits together. But the contract for HW_STATS
> is very clear.
Would be good to check with Jiri, my recollection is that this argument
was brought up when CPU_HIT stats were added. I don't recall the
reasoning.
<insert xkcd standards>
From: Alexander Lobakin <[email protected]>
Date: Tue, 23 Nov 2021 17:39:29 +0100
Ok, open questions:
1. Channels vs queues vs global.
Jakub: no per-channel.
David (Ahern): it's worth it to separate as Rx/Tx.
Toke is fine with globals at the end I think?
My point was that for most of the systems we have 1:1 Rx:Tx
(usually num_online_cpus()), so asking drivers separately for
the number of RQs and then SQs would end up asking for the same
number twice.
But the main reason TBH was that most of the drivers store stats
on a per-channel basis and I didn't want them to regress in
functionality. I'm fine with reporting only netdev-wide if
everyone else is.
If we keep per-channel: report per-channel only on request and
cumulative globals by default, to avoid flooding the output?
2. Count all errors as "drops" vs separately.
Daniel: account everything as drops, plus errors should be
reported as exceptions for tracing sub.
Jesper: we shouldn't mix drops and errors.
My point: we shouldn't, that's why there are patches for 2 drivers
to give errors a separate counter.
I provided an option either to report all errors together ('errors'
in stats structure) or to provide individual counters for each of
them (so-named counters), but I personally prefer detailed errors. However,
they might "go detailed" under trace_xdp_exception() only, which sounds
fine (OTOH in RTNL stats we have both "general" errors and detailed
error counters).
3. XDP and XSK ctrs separately or not.
My PoV is that those are two quite different worlds.
However, stats for actions on XSK really make little sense since
99% of the time we have an xskmap redirect. So I think it'd be fine to just
expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
the rest (actions, errors) together with XDP.
Rest:
- don't create a separate `ip` command and report under `-s`;
- save some RTNL skb space by skipping zeroed counters.
Also, regarding that I count all on the stack and then add to the
storage once in a polling cycle -- most drivers don't do that and
just increment the values in the storage directly, but this can be
less performant for frequently updated stats (or it's just my
embedded past).
Re u64 vs u64_stats_t -- the latter is more universal and
architecture-friendly, the former is used directly in most of the
drivers primarily because those drivers and the corresponding HW
are being run on 64-bit systems in the vast majority of cases, and
Ethtool stats themselves are not so critical to guard them with
anti-tearing. Anyway, local64_t is cheap on ARM64/x86_64 I guess?
Thanks,
Al
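To illustrate the accumulate-on-stack-then-flush pattern (and the
u64_stats_t flavor of it) described above, here's a minimal sketch with
purely illustrative names, not taken from any driver in the series:

/* Counters are gathered in plain local variables during one NAPI poll
 * and published once at the end, under the u64_stats seqcount, so
 * 32-bit readers never observe torn values.
 */
struct xdp_rq_stats {
        u64_stats_t pass;
        u64_stats_t drop;
        u64_stats_t redirect;
        struct u64_stats_sync syncp;
};

/* called once at the end of the poll loop with the local counts */
static void xdp_rq_stats_flush(struct xdp_rq_stats *s, u32 pass, u32 drop,
                               u32 redirect)
{
        u64_stats_update_begin(&s->syncp);
        u64_stats_add(&s->pass, pass);
        u64_stats_add(&s->drop, drop);
        u64_stats_add(&s->redirect, redirect);
        u64_stats_update_end(&s->syncp);
}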
On Tue, 30 Nov 2021 16:56:12 +0100 Alexander Lobakin wrote:
> 3. XDP and XSK ctrs separately or not.
>
> My PoV is that those are two quite different worlds.
> > However, stats for actions on XSK really make little sense since
> > 99% of the time we have an xskmap redirect. So I think it'd be fine to just
> expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
> the rest (actions, errors) together with XDP.
>
>
> Rest:
> - don't create a separate `ip` command and report under `-s`;
> - save some RTNL skb space by skipping zeroed counters.
Let me ruin this point of clarity for you. I think that stats should
be skipped when they are not collected (see ETHTOOL_STAT_NOT_SET).
If messages get large, the user should use the GETSTATS call and avoid
the problem more effectively.
> Also, regarding that I count all on the stack and then add to the
> storage once in a polling cycle -- most drivers don't do that and
> just increment the values in the storage directly, but this can be
> less performant for frequently updated stats (or it's just my
> embedded past).
> Re u64 vs u64_stats_t -- the latter is more universal and
> architecture-friendly, the former is used directly in most of the
> drivers primarily because those drivers and the corresponding HW
> are being run on 64-bit systems in the vast majority of cases, and
> > Ethtool stats themselves are not so critical to guard them with
> > anti-tearing. Anyway, local64_t is cheap on ARM64/x86_64 I guess?
Alexander Lobakin <[email protected]> writes:
> From: Alexander Lobakin <[email protected]>
> Date: Tue, 23 Nov 2021 17:39:29 +0100
>
> Ok, open questions:
>
> 1. Channels vs queues vs global.
>
> Jakub: no per-channel.
> David (Ahern): it's worth it to separate as Rx/Tx.
> Toke is fine with globals at the end I think?
Well, I don't like throwing data away, so in that sense I do like
per-queue stats, but it's not a very strong preference (i.e., I can live
with either)...
> My point was that for most of the systems we have 1:1 Rx:Tx
> (usually num_online_cpus()), so asking drivers separately for
> the number of RQs and then SQs would end up asking for the same
> number twice.
> But the main reason TBH was that most of the drivers store stats
> on a per-channel basis and I didn't want them to regress in
> functionality. I'm fine with reporting only netdev-wide if
> everyone else is.
>
> If we keep per-channel: report per-channel only on request and
> cumulative globals by default, to avoid flooding the output?
... however if we do go with per-channel stats I do agree that they
shouldn't be in the default output. I guess netlink could still split
them out and iproute2 could just sum them before display?
> 2. Count all errors as "drops" vs separately.
>
> Daniel: account everything as drops, plus errors should be
> reported as exceptions for tracing sub.
> Jesper: we shouldn't mix drops and errors.
>
> My point: we shouldn't, that's why there are patches for 2 drivers
> to give errors a separate counter.
> I provided an option either to report all errors together ('errors'
> in stats structure) or to provide individual counters for each of
> them (so-named counters), but I personally prefer detailed errors. However,
> they might "go detailed" under trace_xdp_exception() only, which sounds
> fine (OTOH in RTNL stats we have both "general" errors and detailed
> error counters).
I agree it would be nice to have a separate error counter, but a single
counter is enough when combined with the tracepoints.
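As a sketch, a driver's verdict accounting combining a single unified
error counter with the existing trace_xdp_exception() tracepoint could
look like this (struct my_xdp_stats and the function name are
illustrative; XDP_TX/XDP_REDIRECT handling omitted):

static void xdp_account_verdict(struct net_device *dev, struct bpf_prog *prog,
                                u32 act, struct my_xdp_stats *stats)
{
        switch (act) {
        case XDP_PASS:
                stats->pass++;
                break;
        case XDP_DROP:
                stats->drop++;          /* the BPF program's own choice */
                break;
        default:
                bpf_warn_invalid_xdp_action(act);
                fallthrough;
        case XDP_ABORTED:
                trace_xdp_exception(dev, prog, act);    /* breakdown via tracing */
                stats->errors++;        /* one driver-side error counter */
                break;
        }
}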
> 3. XDP and XSK ctrs separately or not.
>
> My PoV is that those are two quite different worlds.
> However, stats for actions on XSK really make little sense since
> 99% of the time we have an xskmap redirect. So I think it'd be fine to just
> expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
> the rest (actions, errors) together with XDP.
A whole set of separate counters for XSK is certainly overkill. No
strong preference as to whether they need a separate counter at all...
> Rest:
> - don't create a separate `ip` command and report under `-s`;
> - save some RTNL skb space by skipping zeroed counters.
>
> Also, regarding that I count all on the stack and then add to the
> storage once in a polling cycle -- most drivers don't do that and
> just increment the values in the storage directly, but this can be
> less performant for frequently updated stats (or it's just my
> embedded past).
> Re u64 vs u64_stats_t -- the latter is more universal and
> architecture-friendly, the former is used directly in most of the
> drivers primarily because those drivers and the corresponding HW
> are being run on 64-bit systems in the vast majority of cases, and
> Ethtool stats themselves are not so critical to guard them with
> anti-tearing. Anyway, local64_t is cheap on ARM64/x86_64 I guess?
I'm generally a fan of correctness first, so since you're touching all
the drivers anyway I'd say go for u64_stats_t :)
-Toke
From: Jakub Kicinski <[email protected]>
Date: Tue, 30 Nov 2021 08:12:07 -0800
> On Tue, 30 Nov 2021 16:56:12 +0100 Alexander Lobakin wrote:
> > 3. XDP and XSK ctrs separately or not.
> >
> > My PoV is that those are two quite different worlds.
> > However, stats for actions on XSK really make little sense since
> > 99% of the time we have an xskmap redirect. So I think it'd be fine to just
> > expand stats structure with xsk_{rx,tx}_{packets,bytes} and count
> > the rest (actions, errors) together with XDP.
> >
> >
> > Rest:
> > - don't create a separate `ip` command and report under `-s`;
> > - save some RTNL skb space by skipping zeroed counters.
>
> Let me ruin this point of clarity for you. I think that stats should
> be skipped when they are not collected (see ETHTOOL_STAT_NOT_SET).
> If messages get large, the user should use the GETSTATS call and avoid
> the problem more effectively.
Well, it was Dave's thought here: [0]
> Another thought on this patch: with individual attributes you could save
> some overhead by not sending 0 counters to userspace. e.g., define a
> helper that does:
I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
does it?
GETSTATS is another thing, and I'll use it, thanks.
>
> > Also, regarding that I count all on the stack and then add to the
> > storage once in a polling cycle -- most drivers don't do that and
> > just increment the values in the storage directly, but this can be
> > less performant for frequently updated stats (or it's just my
> > embedded past).
> > Re u64 vs u64_stats_t -- the latter is more universal and
> > architecture-friendly, the former is used directly in most of the
> > drivers primarily because those drivers and the corresponding HW
> > are being run on 64-bit systems in the vast majority of cases, and
> > Ethtool stats themselves are not so critical to guard them with
> > anti-tearing. Anyway, local64_t is cheap on ARM64/x86_64 I guess?
[0] https://lore.kernel.org/netdev/[email protected]/
Al
On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:
> > Another thought on this patch: with individual attributes you could save
> > some overhead by not sending 0 counters to userspace. e.g., define a
> > helper that does:
>
> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
> does it?
Not sure if you're asking me or Dave but no, to my knowledge RTNL does
not use such semantics today. But the reason is mostly because there
weren't many driver stats added there. Knowing if an error counter is
not supported, or supported and 0, is important for monitoring. Even if
XDP stats don't have a counter which may not be supported today it's
not a good precedent to make IMO.
On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:
> > 1. Channels vs queues vs global.
> >
> > Jakub: no per-channel.
> > David (Ahern): it's worth it to separate as Rx/Tx.
> > Toke is fine with globals at the end I think?
>
> Well, I don't like throwing data away, so in that sense I do like
> per-queue stats, but it's not a very strong preference (i.e., I can live
> with either)...
We don't even have a clear definition of a queue in Linux.
As I said, adding this API today without a strong user and letting
drivers diverge in behavior would be a mistake.
On 11/30/21 10:04 AM, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:
>>> Another thought on this patch: with individual attributes you could save
>>> some overhead by not sending 0 counters to userspace. e.g., define a
>>> helper that does:
>>
>> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
>> does it?
>
> Not sure if you're asking me or Dave but no, to my knowledge RTNL does
> not use such semantics today. But the reason is mostly because there
> weren't many driver stats added there. Knowing if an error counter is
> not supported, or supported and 0, is important for monitoring. Even if
> XDP stats don't have a counter which may not be supported today it's
> not a good precedent to make IMO.
>
Today, stats are sent as a struct so skipping stats whose value is 0 is
not an option. When using individual attributes for the counters this
becomes an option. Given there is no value in sending '0' why do it?
Is your pushback that there should be a uapi to opt-in to this behavior?
On 11/30/21 8:56 AM, Alexander Lobakin wrote:
> Rest:
> - don't create a separate `ip` command and report under `-s`;
Reporting XDP stats under 'ip -s' is not going to be scalable from a
readability perspective.
ifstat (misc/ifstat.c) has support for extended stats, which is where you
are adding these.
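For example, with current iproute2 the extended mode already fetches the
OFFLOAD_XSTATS group over RTM_GETSTATS (cpu_hits being the one type
wired up today):

$ ifstat -x cpu_hits

so a new extended-stats type for XDP would slot in there rather than
into `ip -s`.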
On 11/30/21 10:07 AM, Jakub Kicinski wrote:
> On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:
>>> 1. Channels vs queues vs global.
>>>
>>> Jakub: no per-channel.
>>> David (Ahern): it's worth it to separate as Rx/Tx.
>>> Toke is fine with globals at the end I think?
>>
>> Well, I don't like throwing data away, so in that sense I do like
>> per-queue stats, but it's not a very strong preference (i.e., I can live
>> with either)...
>
> We don't even have a clear definition of a queue in Linux.
>
The summary above says "Jakub: no per-channel", and then you have this
comment about a clear definition of a queue. What is your preference
here, Jakub? I think I have gotten lost in all of the comments.
My request was just to not lump Rx and Tx together under a 'channel'
definition as a new API. Proposals like zctap and 'queues as a first
class citizen' are examples of intentions / desires to move towards Rx
and Tx queues beyond what exists today.
From: Shay Agroskin <[email protected]>
Date: Mon, 29 Nov 2021 15:34:19 +0200
> Alexander Lobakin <[email protected]> writes:
>
> > ena driver has 6 XDP counters collected per-channel. Add callbacks
> > for getting the number of channels and those counters using generic
> > XDP stats infra.
> >
> > Signed-off-by: Alexander Lobakin <[email protected]>
> > Reviewed-by: Jesse Brandeburg <[email protected]>
> > ---
> >  drivers/net/ethernet/amazon/ena/ena_netdev.c | 53 ++++++++++++++++++++
> >  1 file changed, 53 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > index 7d5d885d85d5..83e9b85cc998 100644
> > --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> > @@ -3313,12 +3313,65 @@ static void ena_get_stats64(struct net_device *netdev,
> >          stats->tx_errors = 0;
> >  }
> >
> > +static int ena_get_xdp_stats_nch(const struct net_device *netdev, u32 attr_id)
> > +{
> > +        const struct ena_adapter *adapter = netdev_priv(netdev);
> > +
> > +        switch (attr_id) {
> > +        case IFLA_XDP_XSTATS_TYPE_XDP:
> > +                return adapter->num_io_queues;
> > +        default:
> > +                return -EOPNOTSUPP;
> > +        }
> > +}
> > +
> > +static int ena_get_xdp_stats(const struct net_device *netdev, u32 attr_id,
> > +                             void *attr_data)
> > +{
> > +        const struct ena_adapter *adapter = netdev_priv(netdev);
> > +        struct ifla_xdp_stats *xdp_stats = attr_data;
> > +        u32 i;
> > +
> > +        switch (attr_id) {
> > +        case IFLA_XDP_XSTATS_TYPE_XDP:
> > +                break;
> > +        default:
> > +                return -EOPNOTSUPP;
> > +        }
> > +
> > +        for (i = 0; i < adapter->num_io_queues; i++) {
> > +                const struct u64_stats_sync *syncp;
> > +                const struct ena_stats_rx *stats;
> > +                u32 start;
> > +
> > +                stats = &adapter->rx_ring[i].rx_stats;
> > +                syncp = &adapter->rx_ring[i].syncp;
> > +
> > +                do {
> > +                        start = u64_stats_fetch_begin_irq(syncp);
> > +
> > +                        xdp_stats->drop = stats->xdp_drop;
> > +                        xdp_stats->pass = stats->xdp_pass;
> > +                        xdp_stats->tx = stats->xdp_tx;
> > +                        xdp_stats->redirect = stats->xdp_redirect;
> > +                        xdp_stats->aborted = stats->xdp_aborted;
> > +                        xdp_stats->invalid = stats->xdp_invalid;
> > +                } while (u64_stats_fetch_retry_irq(syncp, start));
> > +
> > +                xdp_stats++;
> > +        }
> > +
> > +        return 0;
> > +}
> > +
>
> Hi,
> thank you for the time you took in adding ENA support. This code
> doesn't update the XDP TX queues (which are only available when an XDP
> program is loaded).
>
> In theory the following patch should fix it, but I was unable to
> compile your version of iproute2 and test the patch properly. Can
> you please let me know if I need to do anything special to bring
> up your version of iproute2 and test this patch?
Did you clone the 'xdp_stats' branch? I've just rechecked on a freshly
cloned copy; it works for me.
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 7d5d885d8..4e89a7d60 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -3313,12 +3313,85 @@ static void ena_get_stats64(struct net_device *netdev,
>          stats->tx_errors = 0;
>  }
>
> +static int ena_get_xdp_stats_nch(const struct net_device *netdev, u32 attr_id)
> +{
> +        const struct ena_adapter *adapter = netdev_priv(netdev);
> +
> +        switch (attr_id) {
> +        case IFLA_XDP_XSTATS_TYPE_XDP:
> +                return adapter->num_io_queues;
> +        default:
> +                return -EOPNOTSUPP;
> +        }
> +}
> +
> +static int ena_get_xdp_stats(const struct net_device *netdev, u32 attr_id,
> +                             void *attr_data)
> +{
> +        const struct ena_adapter *adapter = netdev_priv(netdev);
> +        struct ifla_xdp_stats *xdp_stats = attr_data;
> +        const struct u64_stats_sync *syncp;
> +        u32 start;
> +        u32 i;
> +
> +        switch (attr_id) {
> +        case IFLA_XDP_XSTATS_TYPE_XDP:
> +                break;
> +        default:
> +                return -EOPNOTSUPP;
> +        }
> +
> +        for (i = 0; i < adapter->num_io_queues; i++) {
> +                const struct ena_stats_rx *rx_stats;
> +
> +                rx_stats = &adapter->rx_ring[i].rx_stats;
> +                syncp = &adapter->rx_ring[i].syncp;
> +
> +                do {
> +                        start = u64_stats_fetch_begin_irq(syncp);
> +
> +                        xdp_stats->drop = rx_stats->xdp_drop;
> +                        xdp_stats->pass = rx_stats->xdp_pass;
> +                        xdp_stats->tx = rx_stats->xdp_tx;
> +                        xdp_stats->redirect = rx_stats->xdp_redirect;
> +                        xdp_stats->aborted = rx_stats->xdp_aborted;
> +                        xdp_stats->invalid = rx_stats->xdp_invalid;
> +                } while (u64_stats_fetch_retry_irq(syncp, start));
> +
> +                xdp_stats++;
> +        }
> +
> +        xdp_stats = attr_data;
> +        /* xdp_num_queues can be 0 if an XDP program isn't loaded */
> +        for (i = 0; i < adapter->xdp_num_queues; i++) {
> +                const struct ena_stats_tx *tx_stats;
> +
> +                tx_stats = &adapter->rx_ring[i].xdp_ring->tx_stats;
> +                syncp = &adapter->rx_ring[i].xdp_ring->syncp;
> +
> +                do {
> +                        start = u64_stats_fetch_begin_irq(syncp);
> +
> +                        xdp_stats->xmit_packets = tx_stats->cnt;
> +                        xdp_stats->xmit_bytes = tx_stats->bytes;
> +                        xdp_stats->xmit_errors = tx_stats->dma_mapping_err +
> +                                                 tx_stats->prepare_ctx_err;
> +                } while (u64_stats_fetch_retry_irq(syncp, start));
> +
> +                xdp_stats++;
> +        }
> +
> +        return 0;
> +}
> +
>  static const struct net_device_ops ena_netdev_ops = {
>          .ndo_open               = ena_open,
>          .ndo_stop               = ena_close,
>          .ndo_start_xmit         = ena_start_xmit,
>          .ndo_select_queue       = ena_select_queue,
>          .ndo_get_stats64        = ena_get_stats64,
> +        .ndo_get_xdp_stats_nch  = ena_get_xdp_stats_nch,
> +        .ndo_get_xdp_stats      = ena_get_xdp_stats,
>          .ndo_tx_timeout         = ena_tx_timeout,
>          .ndo_change_mtu         = ena_change_mtu,
>          .ndo_set_mac_address    = NULL,
I'll update it in v3 and mention you, thanks!
Al
On Tue, 30 Nov 2021 10:38:14 -0700 David Ahern wrote:
> On 11/30/21 10:04 AM, Jakub Kicinski wrote:
> > On Tue, 30 Nov 2021 17:34:54 +0100 Alexander Lobakin wrote:
> >> I know about ETHTOOL_STAT_NOT_SET, but RTNL xstats doesn't use this,
> >> does it?
> >
> > Not sure if you're asking me or Dave but no, to my knowledge RTNL does
> > not use such semantics today. But the reason is mostly because there
> > weren't many driver stats added there. Knowing if an error counter is
> > not supported, or supported and 0, is important for monitoring. Even if
> > XDP stats don't have a counter which may not be supported today it's
> > not a good precedent to make IMO.
>
> Today, stats are sent as a struct so skipping stats whose value is 0 is
> not an option. When using individual attributes for the counters this
> becomes an option. Given there is no value in sending '0' why do it?
To establish semantics of what it means that the statistic is not
reported. If we need to save space we'd need an extra attr with
a bitmap of "these stats were skipped because they were zero".
Or conversely some way of querying supported stats.
> Is your pushback that there should be a uapi to opt-in to this behavior?
Not where I was going with it, but it is an option. If skipping 0s were
controlled by a flag, a dump without such a flag set would basically serve
as a way to query supported stats.
On Tue, 30 Nov 2021 10:56:26 -0700 David Ahern wrote:
> On 11/30/21 10:07 AM, Jakub Kicinski wrote:
> > On Tue, 30 Nov 2021 17:17:24 +0100 Toke Høiland-Jørgensen wrote:
> >> Well, I don't like throwing data away, so in that sense I do like
> >> per-queue stats, but it's not a very strong preference (i.e., I can live
> >> with either)...
> >
> > We don't even have a clear definition of a queue in Linux.
> >
>
> The summary above says "Jakub: no per-channel", and then you have this
> comment about a clear definition of a queue. What is your preference
> here, Jakub? I think I have gotten lost in all of the comments.
I'm against per-channel and against per-queue stats. I'm not saying "do
one instead of the other". Hope that makes it clear.
> My request was just to not lump Rx and Tx together under a 'channel'
> definition as a new API. Proposals like zctap and 'queues as a first
> class citizen' are examples of intentions / desires to move towards Rx
> and Tx queues beyond what exists today.
Right, and when we have the objects to control those we'll hang the
stats off them. Right now half of the NICs will destroy queue stats
on random reconfiguration requests, others will mix the stats between
queue instantiations... mlx5 does its shadow queue thing. It's a mess.
uAPI which is not portable and not usable in production is pure burden.
On 2021-11-30 12:38, David Ahern wrote:
> Today, stats are sent as a struct so skipping stats whose value is 0 is
> not an option. When using individual attributes for the counters this
> becomes an option. Given there is no value in sending '0' why do it?
>
> Is your pushback that there should be a uapi to opt-in to this behavior?
A filter in the netlink request should help pick what is user-preferred.
You can default to not sending zeros.
cheers,
jamal
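A possible shape of such a filter on the kernel side, as a sketch: the
flag name and helper below are hypothetical, purely to illustrate the
suggestion, and match no existing uAPI:

/* Skip zero counters only when the requester opted in via a
 * (hypothetical) IFLA_XDP_XSTATS_F_SKIP_ZERO filter flag, so a
 * default dump still distinguishes "supported and 0" from
 * "not collected".
 */
static int put_xdp_counter(struct sk_buff *skb, int attrtype, u64 value,
                           int padattr, u32 filter_flags)
{
        if (!value && (filter_flags & IFLA_XDP_XSTATS_F_SKIP_ZERO))
                return 0;

        return nla_put_u64_64bit(skb, attrtype, value, padattr);
}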