To support flow steering in the GVE driver, two adminq changes need
to be made in advance. The first is adding an adminq mutex lock, which
allows the incoming flow steering operations to temporarily drop the
rtnl_lock, reducing the latency of registering flow rules on several
NICs at the same time. The second is adding an extended adminq command
so that we can support larger adminq commands, such as the
configure_flow_rule command. The other three patches implement the
actual flow steering feature.
Jeroen de Borst (4):
gve: Add adminq extended command
gve: Add flow steering device option
gve: Add flow steering adminq commands
gve: Add flow steering ethtool support
Ziwei Xiao (1):
gve: Add adminq mutex lock
drivers/net/ethernet/google/gve/Makefile | 2 +-
drivers/net/ethernet/google/gve/gve.h | 53 +++-
drivers/net/ethernet/google/gve/gve_adminq.c | 228 +++++++++++++-
drivers/net/ethernet/google/gve/gve_adminq.h | 98 ++++++
drivers/net/ethernet/google/gve/gve_ethtool.c | 91 +++++-
.../net/ethernet/google/gve/gve_flow_rule.c | 296 ++++++++++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 86 ++++-
7 files changed, 829 insertions(+), 25 deletions(-)
create mode 100644 drivers/net/ethernet/google/gve/gve_flow_rule.c
--
2.45.0.rc1.225.g2a3ae87e7f-goog
From: Jeroen de Borst <[email protected]>
Add new adminq commands for the driver to configure and query flow
rules that are stored in the device. Flow steering rules are assigned
a location that determines the relative order of the rules.
The number of flow rules can reach the order of millions. At that
scale, storing a full copy of the rules in the driver to serve the
ethtool query is infeasible; querying them from the device is better.
That path needs to be optimized too, so that we don't send a large
number of adminq commands. The solution here is to cache a limited
number of rules/rule ids in the driver. This patch uses a dma_pool to
allocate 4096 bytes, which lets the device write at most 46 flow rules
(4096/88) or 1024 rule ids (4096/4) at a time.
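For reference, the arithmetic behind those limits (a sketch based on
the struct layouts added below, which are naturally aligned so no
extra padding changes the totals):

	sizeof(struct gve_flow_spec) = 16 + 16 + 4 + 1, padded to 40 bytes
	sizeof(struct gve_flow_rule) = 4 + 2 + 2 + 40 + 40 = 88 bytes
	GVE_FLOW_RULES_CACHE_SIZE    = 4096 / 88 = 46 rules
	GVE_FLOW_RULE_IDS_CACHE_SIZE = 4096 / 4  = 1024 rule ids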
For configuring flow rules, there are 3 sub-commands:
- ADD which adds a rule at the location supplied
- DEL which deletes the rule at the location supplied
- RESET which clears all currently active rules in the device
For querying flow rules, there are also 3 sub-commands:
- QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
the allocated cache after querying the device
- QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
rule_ids in the allocated cache after querying the device
- QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
device's current flow rule number and the supported max flow rule
limit
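To make the QUERY_RULES_STATS flavor concrete, the ETHTOOL_GRXCLSRLCNT
handler added later in this series reduces to the following (a sketch,
error handling elided):

	err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_STATS, 0);
	if (!err) {
		cmd->rule_cnt = priv->num_flow_rules;
		cmd->data = priv->max_flow_rules;
	}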
Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/gve.h | 42 ++++++
drivers/net/ethernet/google/gve/gve_adminq.c | 133 ++++++++++++++++++
drivers/net/ethernet/google/gve/gve_adminq.h | 75 ++++++++++
drivers/net/ethernet/google/gve/gve_ethtool.c | 5 +-
drivers/net/ethernet/google/gve/gve_main.c | 54 ++++++-
5 files changed, 307 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 58213c15e084..355ae797eacf 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -60,6 +60,10 @@
#define GVE_DEFAULT_RX_BUFFER_OFFSET 2048
+#define GVE_FLOW_RULES_CACHE_SIZE (GVE_ADMINQ_BUFFER_SIZE / sizeof(struct gve_flow_rule))
+#define GVE_FLOW_RULE_IDS_CACHE_SIZE \
+ (GVE_ADMINQ_BUFFER_SIZE / sizeof(((struct gve_flow_rule *)0)->location))
+
#define GVE_XDP_ACTIONS 5
#define GVE_GQ_TX_MIN_PKT_DESC_BYTES 182
@@ -678,6 +682,39 @@ enum gve_queue_format {
GVE_DQO_QPL_FORMAT = 0x4,
};
+struct gve_flow_spec {
+ __be32 src_ip[4];
+ __be32 dst_ip[4];
+ union {
+ struct {
+ __be16 src_port;
+ __be16 dst_port;
+ };
+ __be32 spi;
+ };
+ union {
+ u8 tos;
+ u8 tclass;
+ };
+};
+
+struct gve_flow_rule {
+ u32 location;
+ u16 flow_type;
+ u16 action;
+ struct gve_flow_spec key;
+ struct gve_flow_spec mask;
+};
+
+struct gve_flow_rules_cache {
+ bool rules_cache_synced; /* False if the driver's rules_cache is outdated */
+ struct gve_flow_rule *rules_cache;
+ u32 *rule_ids_cache;
+ /* The total number of queried rules that are stored in the caches */
+ u32 rules_cache_num;
+ u32 rule_ids_cache_num;
+};
+
struct gve_priv {
struct net_device *dev;
struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
@@ -744,6 +781,8 @@ struct gve_priv {
u32 adminq_report_link_speed_cnt;
u32 adminq_get_ptype_map_cnt;
u32 adminq_verify_driver_compatibility_cnt;
+ u32 adminq_query_flow_rules_cnt;
+ u32 adminq_cfg_flow_rule_cnt;
/* Global stats */
u32 interface_up_cnt; /* count of times interface turned up since last reset */
@@ -788,6 +827,9 @@ struct gve_priv {
bool header_split_enabled; /* True if the header split is enabled by the user */
u32 max_flow_rules;
+ u32 num_flow_rules; /* Current number of flow rules registered in the device */
+
+ struct gve_flow_rules_cache flow_rules_cache;
};
enum gve_service_task_flags_bit {
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 85d0d742ad21..7a929e20cf96 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -287,6 +287,8 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
priv->adminq_report_stats_cnt = 0;
priv->adminq_report_link_speed_cnt = 0;
priv->adminq_get_ptype_map_cnt = 0;
+ priv->adminq_query_flow_rules_cnt = 0;
+ priv->adminq_cfg_flow_rule_cnt = 0;
/* Setup Admin queue with the device */
if (priv->pdev->revision < 0x1) {
@@ -526,6 +528,12 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
case GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY:
priv->adminq_verify_driver_compatibility_cnt++;
break;
+ case GVE_ADMINQ_QUERY_FLOW_RULES:
+ priv->adminq_query_flow_rules_cnt++;
+ break;
+ case GVE_ADMINQ_CONFIGURE_FLOW_RULE:
+ priv->adminq_cfg_flow_rule_cnt++;
+ break;
default:
dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
}
@@ -1188,3 +1196,128 @@ int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
ptype_map_bus);
return err;
}
+
+static int
+gve_adminq_configure_flow_rule(struct gve_priv *priv,
+ struct gve_adminq_configure_flow_rule *flow_rule_cmd)
+{
+ int err = gve_adminq_execute_extended_cmd(priv,
+ GVE_ADMINQ_CONFIGURE_FLOW_RULE,
+ sizeof(struct gve_adminq_configure_flow_rule),
+ flow_rule_cmd);
+
+ if (err) {
+ dev_err(&priv->pdev->dev, "Timeout to configure the flow rule, trigger reset");
+ gve_reset(priv, true);
+ } else {
+ priv->flow_rules_cache.rules_cache_synced = false;
+ }
+
+ return err;
+}
+
+int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc)
+{
+ struct gve_adminq_configure_flow_rule flow_rule_cmd = {
+ .opcode = cpu_to_be16(GVE_RULE_ADD),
+ .location = cpu_to_be32(loc),
+ .rule = *rule,
+ };
+
+ return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
+}
+
+int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc)
+{
+ struct gve_adminq_configure_flow_rule flow_rule_cmd = {
+ .opcode = cpu_to_be16(GVE_RULE_DEL),
+ .location = cpu_to_be32(loc),
+ };
+
+ return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
+}
+
+int gve_adminq_reset_flow_rules(struct gve_priv *priv)
+{
+ struct gve_adminq_configure_flow_rule flow_rule_cmd = {
+ .opcode = cpu_to_be16(GVE_RULE_RESET),
+ };
+
+ return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
+}
+
+/* In the dma memory that the driver allocated for the device to query the flow rules, the device
+ * will first write a struct gve_query_flow_rules_descriptor. Following the descriptor, the device
+ * will write an array of rules or rule ids, with the count specified in the descriptor. For
+ * GVE_FLOW_RULE_QUERY_STATS, the device will only write the descriptor.
+ */
+static int gve_adminq_process_flow_rules_query(struct gve_priv *priv, u16 query_opcode,
+ struct gve_query_flow_rules_descriptor *descriptor)
+{
+ struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
+ u32 num_queried_rules, total_memory_len, rule_info_len;
+ void *descriptor_end, *rule_info;
+
+ total_memory_len = be32_to_cpu(descriptor->total_length);
+ if (total_memory_len > GVE_ADMINQ_BUFFER_SIZE) {
+ dev_err(&priv->dev->dev, "flow rules query is out of memory.\n");
+ return -ENOMEM;
+ }
+
+ num_queried_rules = be32_to_cpu(descriptor->num_queried_rules);
+ descriptor_end = (void *)descriptor + total_memory_len;
+ rule_info = (void *)(descriptor + 1);
+
+ switch (query_opcode) {
+ case GVE_FLOW_RULE_QUERY_RULES:
+ rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rules_cache);
+
+ memcpy(flow_rules_cache->rules_cache, rule_info, rule_info_len);
+ flow_rules_cache->rules_cache_num = num_queried_rules;
+ break;
+ case GVE_FLOW_RULE_QUERY_IDS:
+ rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rule_ids_cache);
+
+ memcpy(flow_rules_cache->rule_ids_cache, rule_info, rule_info_len);
+ flow_rules_cache->rule_ids_cache_num = num_queried_rules;
+ break;
+ case GVE_FLOW_RULE_QUERY_STATS:
+ priv->num_flow_rules = be32_to_cpu(descriptor->num_flow_rules);
+ priv->max_flow_rules = be32_to_cpu(descriptor->max_flow_rules);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc)
+{
+ struct gve_query_flow_rules_descriptor *descriptor;
+ union gve_adminq_command cmd;
+ dma_addr_t descriptor_bus;
+ int err = 0;
+
+ memset(&cmd, 0, sizeof(cmd));
+ descriptor = dma_pool_alloc(priv->adminq_pool, GFP_KERNEL, &descriptor_bus);
+ if (!descriptor)
+ return -ENOMEM;
+
+ cmd.opcode = cpu_to_be32(GVE_ADMINQ_QUERY_FLOW_RULES);
+ cmd.query_flow_rules = (struct gve_adminq_query_flow_rules) {
+ .opcode = cpu_to_be16(query_opcode),
+ .starting_rule_id = cpu_to_be32(starting_loc),
+ .available_length = cpu_to_be64(GVE_ADMINQ_BUFFER_SIZE),
+ .rule_descriptor_addr = cpu_to_be64(descriptor_bus),
+ };
+ err = gve_adminq_execute_cmd(priv, &cmd);
+ if (err)
+ goto out;
+
+ err = gve_adminq_process_flow_rules_query(priv, query_opcode, descriptor);
+
+out:
+ dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
+ return err;
+}
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index e64a0e72e781..9ddb72f92197 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -25,11 +25,21 @@ enum gve_adminq_opcodes {
GVE_ADMINQ_REPORT_LINK_SPEED = 0xD,
GVE_ADMINQ_GET_PTYPE_MAP = 0xE,
GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY = 0xF,
+ GVE_ADMINQ_QUERY_FLOW_RULES = 0x10,
/* For commands that are larger than 56 bytes */
GVE_ADMINQ_EXTENDED_COMMAND = 0xFF,
};
+/* The normal adminq command is restricted to 56 bytes at maximum. A longer
+ * adminq command is wrapped by GVE_ADMINQ_EXTENDED_COMMAND, with an inner
+ * opcode from gve_adminq_extended_cmd_opcodes specified. The inner command
+ * is written to the dma memory allocated by GVE_ADMINQ_EXTENDED_COMMAND.
+ */
+enum gve_adminq_extended_cmd_opcodes {
+ GVE_ADMINQ_CONFIGURE_FLOW_RULE = 0x101,
+};
+
/* Admin queue status codes */
enum gve_adminq_statuses {
GVE_ADMINQ_COMMAND_UNSET = 0x0,
@@ -434,6 +444,66 @@ struct gve_adminq_get_ptype_map {
__be64 ptype_map_addr;
};
+/* Flow-steering related definitions */
+enum gve_adminq_flow_rule_cfg_opcode {
+ GVE_RULE_ADD = 0,
+ GVE_RULE_DEL = 1,
+ GVE_RULE_RESET = 2,
+};
+
+enum gve_adminq_flow_rule_query_opcode {
+ GVE_FLOW_RULE_QUERY_RULES = 0,
+ GVE_FLOW_RULE_QUERY_IDS = 1,
+ GVE_FLOW_RULE_QUERY_STATS = 2,
+};
+
+enum gve_adminq_flow_type {
+ GVE_FLOW_TYPE_TCPV4,
+ GVE_FLOW_TYPE_UDPV4,
+ GVE_FLOW_TYPE_SCTPV4,
+ GVE_FLOW_TYPE_AHV4,
+ GVE_FLOW_TYPE_ESPV4,
+ GVE_FLOW_TYPE_TCPV6,
+ GVE_FLOW_TYPE_UDPV6,
+ GVE_FLOW_TYPE_SCTPV6,
+ GVE_FLOW_TYPE_AHV6,
+ GVE_FLOW_TYPE_ESPV6,
+};
+
+/* Flow-steering command */
+struct gve_adminq_flow_rule {
+ __be16 flow_type;
+ __be16 action; /* RX queue id */
+ struct gve_flow_spec key;
+ struct gve_flow_spec mask;
+};
+
+struct gve_adminq_configure_flow_rule {
+ __be16 opcode;
+ u8 padding[2];
+ struct gve_adminq_flow_rule rule;
+ __be32 location;
+};
+
+static_assert(sizeof(struct gve_adminq_configure_flow_rule) == 92);
+
+struct gve_query_flow_rules_descriptor {
+ __be32 num_flow_rules; /* Current rule counts stored in the device */
+ __be32 max_flow_rules;
+ __be32 num_queried_rules;
+ __be32 total_length; /* The memory length that the device writes */
+};
+
+struct gve_adminq_query_flow_rules {
+ __be16 opcode;
+ u8 padding[2];
+ __be32 starting_rule_id;
+ __be64 available_length; /* The dma memory length that the driver allocated */
+ __be64 rule_descriptor_addr; /* The dma memory address */
+};
+
+static_assert(sizeof(struct gve_adminq_query_flow_rules) == 24);
+
union gve_adminq_command {
struct {
__be32 opcode;
@@ -454,6 +524,7 @@ union gve_adminq_command {
struct gve_adminq_get_ptype_map get_ptype_map;
struct gve_adminq_verify_driver_compatibility
verify_driver_compatibility;
+ struct gve_adminq_query_flow_rules query_flow_rules;
struct gve_adminq_extended_command extended_command;
};
};
@@ -488,6 +559,10 @@ int gve_adminq_verify_driver_compatibility(struct gve_priv *priv,
u64 driver_info_len,
dma_addr_t driver_info_addr);
int gve_adminq_report_link_speed(struct gve_priv *priv);
+int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc);
+int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc);
+int gve_adminq_reset_flow_rules(struct gve_priv *priv);
+int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc);
struct gve_ptype_lut;
int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 156b7e128b53..02cee7e0e229 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -74,7 +74,8 @@ static const char gve_gstrings_adminq_stats[][ETH_GSTRING_LEN] = {
"adminq_create_tx_queue_cnt", "adminq_create_rx_queue_cnt",
"adminq_destroy_tx_queue_cnt", "adminq_destroy_rx_queue_cnt",
"adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
- "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt"
+ "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt",
+ "adminq_query_flow_rules", "adminq_cfg_flow_rule",
};
static const char gve_gstrings_priv_flags[][ETH_GSTRING_LEN] = {
@@ -458,6 +459,8 @@ gve_get_ethtool_stats(struct net_device *netdev,
data[i++] = priv->adminq_report_stats_cnt;
data[i++] = priv->adminq_report_link_speed_cnt;
data[i++] = priv->adminq_get_ptype_map_cnt;
+ data[i++] = priv->adminq_query_flow_rules_cnt;
+ data[i++] = priv->adminq_cfg_flow_rule_cnt;
}
static void gve_get_channels(struct net_device *netdev,
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index cabf7d4bcecb..eb435ccbe98e 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -141,6 +141,52 @@ static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
}
}
+static int gve_alloc_flow_rule_caches(struct gve_priv *priv)
+{
+ struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
+ int err = 0;
+
+ if (!priv->max_flow_rules)
+ return 0;
+
+ flow_rules_cache->rules_cache =
+ kvcalloc(GVE_FLOW_RULES_CACHE_SIZE, sizeof(*flow_rules_cache->rules_cache),
+ GFP_KERNEL);
+ if (!flow_rules_cache->rules_cache) {
+ dev_err(&priv->pdev->dev, "Cannot alloc flow rules cache\n");
+ return -ENOMEM;
+ }
+
+ flow_rules_cache->rule_ids_cache =
+ kvcalloc(GVE_FLOW_RULE_IDS_CACHE_SIZE, sizeof(*flow_rules_cache->rule_ids_cache),
+ GFP_KERNEL);
+ if (!flow_rules_cache->rule_ids_cache) {
+ dev_err(&priv->pdev->dev, "Cannot alloc flow rule ids cache\n");
+ err = -ENOMEM;
+ goto free_rules_cache;
+ }
+
+ return 0;
+
+free_rules_cache:
+ kvfree(flow_rules_cache->rules_cache);
+ flow_rules_cache->rules_cache = NULL;
+ return err;
+}
+
+static void gve_free_flow_rule_caches(struct gve_priv *priv)
+{
+ struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
+
+ if (!priv->max_flow_rules)
+ return;
+
+ kvfree(flow_rules_cache->rule_ids_cache);
+ flow_rules_cache->rule_ids_cache = NULL;
+ kvfree(flow_rules_cache->rules_cache);
+ flow_rules_cache->rules_cache = NULL;
+}
+
static int gve_alloc_counter_array(struct gve_priv *priv)
{
priv->counter_array =
@@ -521,9 +567,12 @@ static int gve_setup_device_resources(struct gve_priv *priv)
{
int err;
- err = gve_alloc_counter_array(priv);
+ err = gve_alloc_flow_rule_caches(priv);
if (err)
return err;
+ err = gve_alloc_counter_array(priv);
+ if (err)
+ goto abort_with_flow_rule_caches;
err = gve_alloc_notify_blocks(priv);
if (err)
goto abort_with_counter;
@@ -575,6 +624,8 @@ static int gve_setup_device_resources(struct gve_priv *priv)
gve_free_notify_blocks(priv);
abort_with_counter:
gve_free_counter_array(priv);
+abort_with_flow_rule_caches:
+ gve_free_flow_rule_caches(priv);
return err;
}
@@ -606,6 +657,7 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
kvfree(priv->ptype_lut_dqo);
priv->ptype_lut_dqo = NULL;
+ gve_free_flow_rule_caches(priv);
gve_free_counter_array(priv);
gve_free_notify_blocks(priv);
gve_free_stats_report(priv);
--
2.45.0.rc1.225.g2a3ae87e7f-goog
We were depending on the rtnl_lock to make sure there is only one
adminq command running at a time. But some commands, such as the
upcoming flow steering operations, may hold the rtnl_lock for too long.
For such operations, the driver can temporarily drop the rtnl_lock and
protect the adminq with a new adminq lock instead, which ensures that
adminq command execution is thread-safe.
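A minimal sketch of the intended pattern, as used by the ethtool
operations later in this series (error handling elided):

	dev_hold(netdev);
	rtnl_unlock();
	/* adminq commands issued here are serialized by adminq_lock */
	err = gve_adminq_add_flow_rule(priv, rule, loc);
	rtnl_lock();
	dev_put(netdev);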
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/gve.h | 1 +
drivers/net/ethernet/google/gve/gve_adminq.c | 22 +++++++++++---------
2 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index ae1e21c9b0a5..ca7fce17f2c0 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -724,6 +724,7 @@ struct gve_priv {
union gve_adminq_command *adminq;
dma_addr_t adminq_bus_addr;
struct dma_pool *adminq_pool;
+ struct mutex adminq_lock; /* Protects adminq command execution */
u32 adminq_mask; /* masks prod_cnt to adminq size */
u32 adminq_prod_cnt; /* free-running count of AQ cmds executed */
u32 adminq_cmd_fail; /* free-running count of AQ cmds failed */
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 3df8243680d9..2c3ec5c3b114 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -284,6 +284,7 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
&priv->reg_bar0->adminq_base_address_lo);
iowrite32be(GVE_DRIVER_STATUS_RUN_MASK, &priv->reg_bar0->driver_status);
}
+ mutex_init(&priv->adminq_lock);
gve_set_admin_queue_ok(priv);
return 0;
}
@@ -511,28 +512,29 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
return 0;
}
-/* This function is not threadsafe - the caller is responsible for any
- * necessary locks.
- * The caller is also responsible for making sure there are no commands
- * waiting to be executed.
- */
static int gve_adminq_execute_cmd(struct gve_priv *priv,
union gve_adminq_command *cmd_orig)
{
u32 tail, head;
int err;
+ mutex_lock(&priv->adminq_lock);
tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
head = priv->adminq_prod_cnt;
- if (tail != head)
- // This is not a valid path
- return -EINVAL;
+ if (tail != head) {
+ err = -EINVAL;
+ goto out;
+ }
err = gve_adminq_issue_cmd(priv, cmd_orig);
if (err)
- return err;
+ goto out;
- return gve_adminq_kick_and_wait(priv);
+ err = gve_adminq_kick_and_wait(priv);
+
+out:
+ mutex_unlock(&priv->adminq_lock);
+ return err;
}
/* The device specifies that the management vector can either be the first irq
--
2.45.0.rc1.225.g2a3ae87e7f-goog
From: Jeroen de Borst <[email protected]>
The adminq command is limited to 64 bytes per entry, with at most 56
bytes available for the command itself. To support larger commands, we
need to allocate separate dma memory, put the command in that memory,
and send the dma memory address instead of the actual command.
This change introduces an extended adminq command that wraps the real
command, with the inner opcode and the allocated dma memory address
specified. Once the device receives it, it can fetch the real command
from the given dma memory address. As designed with the device, all
extended commands use an inner opcode larger than 0xFF.
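Concretely, the wrapper placed in the regular 64-byte adminq entry is
just the following (taken from this patch; inner_cmd_bus is the dma
address of the memory holding the real command):

	cmd.opcode = cpu_to_be32(GVE_ADMINQ_EXTENDED_COMMAND);
	cmd.extended_command = (struct gve_adminq_extended_command) {
		.inner_opcode       = cpu_to_be32(opcode),   /* > 0xFF */
		.inner_length       = cpu_to_be32(cmd_size),
		.inner_command_addr = cpu_to_be64(inner_cmd_bus),
	};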
Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/gve_adminq.c | 31 ++++++++++++++++++++
drivers/net/ethernet/google/gve/gve_adminq.h | 12 ++++++++
2 files changed, 43 insertions(+)
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 2c3ec5c3b114..514641b3ccc7 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -461,6 +461,8 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
opcode = be32_to_cpu(READ_ONCE(cmd->opcode));
+ if (opcode == GVE_ADMINQ_EXTENDED_COMMAND)
+ opcode = be32_to_cpu(cmd->extended_command.inner_opcode);
switch (opcode) {
case GVE_ADMINQ_DESCRIBE_DEVICE:
@@ -537,6 +539,35 @@ static int gve_adminq_execute_cmd(struct gve_priv *priv,
return err;
}
+static int gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
+ size_t cmd_size, void *cmd_orig)
+{
+ union gve_adminq_command cmd;
+ dma_addr_t inner_cmd_bus;
+ void *inner_cmd;
+ int err;
+
+ inner_cmd = dma_alloc_coherent(&priv->pdev->dev, cmd_size,
+ &inner_cmd_bus, GFP_KERNEL);
+ if (!inner_cmd)
+ return -ENOMEM;
+
+ memcpy(inner_cmd, cmd_orig, cmd_size);
+
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.opcode = cpu_to_be32(GVE_ADMINQ_EXTENDED_COMMAND);
+ cmd.extended_command = (struct gve_adminq_extended_command) {
+ .inner_opcode = cpu_to_be32(opcode),
+ .inner_length = cpu_to_be32(cmd_size),
+ .inner_command_addr = cpu_to_be64(inner_cmd_bus),
+ };
+
+ err = gve_adminq_execute_cmd(priv, &cmd);
+
+ dma_free_coherent(&priv->pdev->dev, cmd_size, inner_cmd, inner_cmd_bus);
+ return err;
+}
+
/* The device specifies that the management vector can either be the first irq
* or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
* the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index e64f0dbe744d..e0370ace8397 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -25,6 +25,9 @@ enum gve_adminq_opcodes {
GVE_ADMINQ_REPORT_LINK_SPEED = 0xD,
GVE_ADMINQ_GET_PTYPE_MAP = 0xE,
GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY = 0xF,
+
+ /* For commands that are larger than 56 bytes */
+ GVE_ADMINQ_EXTENDED_COMMAND = 0xFF,
};
/* Admin queue status codes */
@@ -208,6 +211,14 @@ enum gve_driver_capbility {
#define GVE_DRIVER_CAPABILITY_FLAGS3 0x0
#define GVE_DRIVER_CAPABILITY_FLAGS4 0x0
+struct gve_adminq_extended_command {
+ __be32 inner_opcode;
+ __be32 inner_length;
+ __be64 inner_command_addr;
+};
+
+static_assert(sizeof(struct gve_adminq_extended_command) == 16);
+
struct gve_driver_info {
u8 os_type; /* 0x01 = Linux */
u8 driver_major;
@@ -432,6 +443,7 @@ union gve_adminq_command {
struct gve_adminq_get_ptype_map get_ptype_map;
struct gve_adminq_verify_driver_compatibility
verify_driver_compatibility;
+ struct gve_adminq_extended_command extended_command;
};
};
u8 reserved[64];
--
2.45.0.rc1.225.g2a3ae87e7f-goog
From: Jeroen de Borst <[email protected]>
Add a new device option to signal to the driver that the device supports
flow steering. This device option also carries the maximum number of
flow steering rules that the device can store.
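When the option is present and the corresponding feature bit is set,
the driver records the limit; condensed from this patch:

	if (dev_op_flow_steering &&
	    (supported_features_mask & GVE_SUP_FLOW_STEERING_MASK) &&
	    dev_op_flow_steering->max_flow_rules)
		priv->max_flow_rules =
			be32_to_cpu(dev_op_flow_steering->max_flow_rules);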
Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/gve.h | 2 +
drivers/net/ethernet/google/gve/gve_adminq.c | 42 ++++++++++++++++++--
drivers/net/ethernet/google/gve/gve_adminq.h | 11 +++++
3 files changed, 51 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index ca7fce17f2c0..58213c15e084 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -786,6 +786,8 @@ struct gve_priv {
u16 header_buf_size; /* device configured, header-split supported if non-zero */
bool header_split_enabled; /* True if the header split is enabled by the user */
+
+ u32 max_flow_rules;
};
enum gve_service_task_flags_bit {
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 514641b3ccc7..85d0d742ad21 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -44,6 +44,7 @@ void gve_parse_device_option(struct gve_priv *priv,
struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
+ struct gve_device_option_flow_steering **dev_op_flow_steering,
struct gve_device_option_modify_ring **dev_op_modify_ring)
{
u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
@@ -189,6 +190,23 @@ void gve_parse_device_option(struct gve_priv *priv,
if (option_length == GVE_DEVICE_OPTION_NO_MIN_RING_SIZE)
priv->default_min_ring_size = true;
break;
+ case GVE_DEV_OPT_ID_FLOW_STEERING:
+ if (option_length < sizeof(**dev_op_flow_steering) ||
+ req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING) {
+ dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT,
+ "Flow Steering",
+ (int)sizeof(**dev_op_flow_steering),
+ GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING,
+ option_length, req_feat_mask);
+ break;
+ }
+
+ if (option_length > sizeof(**dev_op_flow_steering))
+ dev_warn(&priv->pdev->dev,
+ GVE_DEVICE_OPTION_TOO_BIG_FMT,
+ "Flow Steering");
+ *dev_op_flow_steering = (void *)(option + 1);
+ break;
default:
/* If we don't recognize the option just continue
* without doing anything.
@@ -208,6 +226,7 @@ gve_process_device_options(struct gve_priv *priv,
struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
+ struct gve_device_option_flow_steering **dev_op_flow_steering,
struct gve_device_option_modify_ring **dev_op_modify_ring)
{
const int num_options = be16_to_cpu(descriptor->num_device_options);
@@ -230,7 +249,7 @@ gve_process_device_options(struct gve_priv *priv,
dev_op_gqi_rda, dev_op_gqi_qpl,
dev_op_dqo_rda, dev_op_jumbo_frames,
dev_op_dqo_qpl, dev_op_buffer_sizes,
- dev_op_modify_ring);
+ dev_op_flow_steering, dev_op_modify_ring);
dev_opt = next_opt;
}
@@ -838,6 +857,8 @@ static void gve_enable_supported_features(struct gve_priv *priv,
*dev_op_dqo_qpl,
const struct gve_device_option_buffer_sizes
*dev_op_buffer_sizes,
+ const struct gve_device_option_flow_steering
+ *dev_op_flow_steering,
const struct gve_device_option_modify_ring
*dev_op_modify_ring)
{
@@ -890,10 +911,22 @@ static void gve_enable_supported_features(struct gve_priv *priv,
priv->min_tx_desc_cnt = be16_to_cpu(dev_op_modify_ring->min_tx_ring_size);
}
}
+
+ if (dev_op_flow_steering &&
+ (supported_features_mask & GVE_SUP_FLOW_STEERING_MASK)) {
+ if (dev_op_flow_steering->max_flow_rules) {
+ priv->max_flow_rules =
+ be32_to_cpu(dev_op_flow_steering->max_flow_rules);
+ dev_info(&priv->pdev->dev,
+ "FLOW STEERING device option enabled with max rule limit of %u.\n",
+ priv->max_flow_rules);
+ }
+ }
}
int gve_adminq_describe_device(struct gve_priv *priv)
{
+ struct gve_device_option_flow_steering *dev_op_flow_steering = NULL;
struct gve_device_option_buffer_sizes *dev_op_buffer_sizes = NULL;
struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
struct gve_device_option_modify_ring *dev_op_modify_ring = NULL;
@@ -930,6 +963,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
&dev_op_gqi_qpl, &dev_op_dqo_rda,
&dev_op_jumbo_frames, &dev_op_dqo_qpl,
&dev_op_buffer_sizes,
+ &dev_op_flow_steering,
&dev_op_modify_ring);
if (err)
goto free_device_descriptor;
@@ -969,9 +1003,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
/* set default descriptor counts */
gve_set_default_desc_cnt(priv, descriptor);
- /* DQO supports LRO. */
if (!gve_is_gqi(priv))
- priv->dev->hw_features |= NETIF_F_LRO;
+ priv->dev->hw_features |= NETIF_F_LRO | NETIF_F_NTUPLE;
priv->max_registered_pages =
be64_to_cpu(descriptor->max_registered_pages);
@@ -991,7 +1024,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
gve_enable_supported_features(priv, supported_features_mask,
dev_op_jumbo_frames, dev_op_dqo_qpl,
- dev_op_buffer_sizes, dev_op_modify_ring);
+ dev_op_buffer_sizes, dev_op_flow_steering,
+ dev_op_modify_ring);
free_device_descriptor:
dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index e0370ace8397..e64a0e72e781 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -146,6 +146,14 @@ struct gve_device_option_modify_ring {
static_assert(sizeof(struct gve_device_option_modify_ring) == 12);
+struct gve_device_option_flow_steering {
+ __be32 supported_features_mask;
+ __be32 reserved;
+ __be32 max_flow_rules;
+};
+
+static_assert(sizeof(struct gve_device_option_flow_steering) == 12);
+
/* Terminology:
*
* RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
@@ -163,6 +171,7 @@ enum gve_dev_opt_id {
GVE_DEV_OPT_ID_DQO_QPL = 0x7,
GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
GVE_DEV_OPT_ID_BUFFER_SIZES = 0xa,
+ GVE_DEV_OPT_ID_FLOW_STEERING = 0xb,
};
enum gve_dev_opt_req_feat_mask {
@@ -174,12 +183,14 @@ enum gve_dev_opt_req_feat_mask {
GVE_DEV_OPT_REQ_FEAT_MASK_DQO_QPL = 0x0,
GVE_DEV_OPT_REQ_FEAT_MASK_BUFFER_SIZES = 0x0,
GVE_DEV_OPT_REQ_FEAT_MASK_MODIFY_RING = 0x0,
+ GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING = 0x0,
};
enum gve_sup_feature_mask {
GVE_SUP_MODIFY_RING_MASK = 1 << 0,
GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
GVE_SUP_BUFFER_SIZES_MASK = 1 << 4,
+ GVE_SUP_FLOW_STEERING_MASK = 1 << 5,
};
#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
--
2.45.0.rc1.225.g2a3ae87e7f-goog
From: Jeroen de Borst <[email protected]>
Implement the ethtool commands that can be used to configure and query
flow-steering rules. For these ethtool commands, the driver will
temporarily drop the rtnl lock to reduce the latency of the flow
steering commands on separate NICs. The commands are then protected by
the newly added adminq lock.
A large part of this change consists of translating the ethtool
representation of 'ntuples' to our internal gve_flow_rule and vice
versa in the newly created gve_flow_rule.c.
Given the potentially large number of flow rules, the driver doesn't
store all the rules locally. When the user runs 'ethtool -n <nic>' to
check the registered rules, the driver sends adminq commands to query a
limited number of rules/rule ids (as many as fit in a 4096-byte dma
buffer) at a time, caching them for the ethtool queries. The adminq
query commands are repeated until ethtool has retrieved all the needed
rules.
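For example, the standard ethtool ntuple flow then works as below (a
hypothetical rule; the interface name, addresses, queue and location
are placeholders):

	# ethtool -N eth0 flow-type tcp4 src-ip 192.168.0.1 dst-port 80 action 2 loc 10
	# ethtool -n eth0            # list rules, served from the query cache
	# ethtool -N eth0 delete 10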
Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/Makefile | 2 +-
drivers/net/ethernet/google/gve/gve.h | 8 +-
drivers/net/ethernet/google/gve/gve_ethtool.c | 86 ++++-
.../net/ethernet/google/gve/gve_flow_rule.c | 296 ++++++++++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 32 +-
5 files changed, 415 insertions(+), 9 deletions(-)
create mode 100644 drivers/net/ethernet/google/gve/gve_flow_rule.c
diff --git a/drivers/net/ethernet/google/gve/Makefile b/drivers/net/ethernet/google/gve/Makefile
index b9a6be76531b..9ed07080b38a 100644
--- a/drivers/net/ethernet/google/gve/Makefile
+++ b/drivers/net/ethernet/google/gve/Makefile
@@ -1,4 +1,4 @@
# Makefile for the Google virtual Ethernet (gve) driver
obj-$(CONFIG_GVE) += gve.o
-gve-objs := gve_main.o gve_tx.o gve_tx_dqo.o gve_rx.o gve_rx_dqo.o gve_ethtool.o gve_adminq.o gve_utils.o
+gve-objs := gve_main.o gve_tx.o gve_tx_dqo.o gve_rx.o gve_rx_dqo.o gve_ethtool.o gve_adminq.o gve_utils.o gve_flow_rule.o
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 355ae797eacf..8565f1df9c50 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: (GPL-2.0 OR MIT)
* Google virtual Ethernet (gve) driver
*
- * Copyright (C) 2015-2021 Google, Inc.
+ * Copyright (C) 2015-2024 Google LLC
*/
#ifndef _GVE_H_
@@ -1169,6 +1169,12 @@ int gve_adjust_config(struct gve_priv *priv,
int gve_adjust_queues(struct gve_priv *priv,
struct gve_queue_config new_rx_config,
struct gve_queue_config new_tx_config);
+/* flow steering rule */
+int gve_get_flow_rule_entry(struct gve_priv *priv, struct ethtool_rxnfc *cmd);
+int gve_get_flow_rule_ids(struct gve_priv *priv, struct ethtool_rxnfc *cmd, u32 *rule_locs);
+int gve_add_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd);
+int gve_del_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd);
+int gve_flow_rules_reset(struct gve_priv *priv);
/* report stats handling */
void gve_handle_report_stats(struct gve_priv *priv);
/* exported by ethtool.c */
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index 02cee7e0e229..27166bde5fa0 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
- * Copyright (C) 2015-2021 Google, Inc.
+ * Copyright (C) 2015-2024 Google LLC
*/
#include <linux/rtnetlink.h>
@@ -503,6 +503,12 @@ static int gve_set_channels(struct net_device *netdev,
return -EINVAL;
}
+ if (old_settings.rx_count != new_rx && priv->num_flow_rules) {
+ dev_err(&priv->pdev->dev,
+ "Changing number of RX queues is disabled when flow rules are active");
+ return -EBUSY;
+ }
+
if (!netif_carrier_ok(netdev)) {
priv->tx_cfg.num_queues = new_tx;
priv->rx_cfg.num_queues = new_rx;
@@ -783,6 +789,82 @@ static int gve_set_coalesce(struct net_device *netdev,
return 0;
}
+static int gve_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
+{
+ struct gve_priv *priv = netdev_priv(netdev);
+ int err = 0;
+
+ if (!(netdev->features & NETIF_F_NTUPLE))
+ return -EOPNOTSUPP;
+
+ dev_hold(netdev);
+ rtnl_unlock();
+
+ switch (cmd->cmd) {
+ case ETHTOOL_SRXCLSRLINS:
+ err = gve_add_flow_rule(priv, cmd);
+ break;
+ case ETHTOOL_SRXCLSRLDEL:
+ err = gve_del_flow_rule(priv, cmd);
+ break;
+ case ETHTOOL_SRXFH:
+ err = -EOPNOTSUPP;
+ break;
+ default:
+ err = -EOPNOTSUPP;
+ break;
+ }
+
+ rtnl_lock();
+ dev_put(netdev);
+ return err;
+}
+
+static int gve_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, u32 *rule_locs)
+{
+ struct gve_priv *priv = netdev_priv(netdev);
+ int err = 0;
+
+ dev_hold(netdev);
+ rtnl_unlock();
+
+ switch (cmd->cmd) {
+ case ETHTOOL_GRXRINGS:
+ cmd->data = priv->rx_cfg.num_queues;
+ break;
+ case ETHTOOL_GRXCLSRLCNT:
+ if (!priv->max_flow_rules) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
+ err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_STATS, 0);
+ if (err)
+ goto out;
+
+ cmd->rule_cnt = priv->num_flow_rules;
+ cmd->data = priv->max_flow_rules;
+ break;
+ case ETHTOOL_GRXCLSRULE:
+ err = gve_get_flow_rule_entry(priv, cmd);
+ break;
+ case ETHTOOL_GRXCLSRLALL:
+ err = gve_get_flow_rule_ids(priv, cmd, (u32 *)rule_locs);
+ break;
+ case ETHTOOL_GRXFH:
+ err = -EOPNOTSUPP;
+ break;
+ default:
+ err = -EOPNOTSUPP;
+ break;
+ }
+
+out:
+ rtnl_lock();
+ dev_put(netdev);
+ return err;
+}
+
const struct ethtool_ops gve_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_USECS,
.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
@@ -794,6 +876,8 @@ const struct ethtool_ops gve_ethtool_ops = {
.get_msglevel = gve_get_msglevel,
.set_channels = gve_set_channels,
.get_channels = gve_get_channels,
+ .set_rxnfc = gve_set_rxnfc,
+ .get_rxnfc = gve_get_rxnfc,
.get_link = ethtool_op_get_link,
.get_coalesce = gve_get_coalesce,
.set_coalesce = gve_set_coalesce,
diff --git a/drivers/net/ethernet/google/gve/gve_flow_rule.c b/drivers/net/ethernet/google/gve/gve_flow_rule.c
new file mode 100644
index 000000000000..1cafd520f2db
--- /dev/null
+++ b/drivers/net/ethernet/google/gve/gve_flow_rule.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/* Google virtual Ethernet (gve) driver
+ *
+ * Copyright (C) 2015-2024 Google LLC
+ */
+
+#include "gve.h"
+#include "gve_adminq.h"
+
+static
+int gve_fill_ethtool_flow_spec(struct ethtool_rx_flow_spec *fsp, struct gve_flow_rule *rule)
+{
+ static const u16 flow_type_lut[] = {
+ [GVE_FLOW_TYPE_TCPV4] = TCP_V4_FLOW,
+ [GVE_FLOW_TYPE_UDPV4] = UDP_V4_FLOW,
+ [GVE_FLOW_TYPE_SCTPV4] = SCTP_V4_FLOW,
+ [GVE_FLOW_TYPE_AHV4] = AH_V4_FLOW,
+ [GVE_FLOW_TYPE_ESPV4] = ESP_V4_FLOW,
+ [GVE_FLOW_TYPE_TCPV6] = TCP_V6_FLOW,
+ [GVE_FLOW_TYPE_UDPV6] = UDP_V6_FLOW,
+ [GVE_FLOW_TYPE_SCTPV6] = SCTP_V6_FLOW,
+ [GVE_FLOW_TYPE_AHV6] = AH_V6_FLOW,
+ [GVE_FLOW_TYPE_ESPV6] = ESP_V6_FLOW,
+ };
+
+ if (be16_to_cpu(rule->flow_type) >= ARRAY_SIZE(flow_type_lut))
+ return -EINVAL;
+
+ fsp->flow_type = flow_type_lut[be16_to_cpu(rule->flow_type)];
+
+ memset(&fsp->h_u, 0, sizeof(fsp->h_u));
+ memset(&fsp->h_ext, 0, sizeof(fsp->h_ext));
+ memset(&fsp->m_u, 0, sizeof(fsp->m_u));
+ memset(&fsp->m_ext, 0, sizeof(fsp->m_ext));
+
+ switch (fsp->flow_type) {
+ case TCP_V4_FLOW:
+ case UDP_V4_FLOW:
+ case SCTP_V4_FLOW:
+ fsp->h_u.tcp_ip4_spec.ip4src = rule->key.src_ip[0];
+ fsp->h_u.tcp_ip4_spec.ip4dst = rule->key.dst_ip[0];
+ fsp->h_u.tcp_ip4_spec.psrc = rule->key.src_port;
+ fsp->h_u.tcp_ip4_spec.pdst = rule->key.dst_port;
+ fsp->h_u.tcp_ip4_spec.tos = rule->key.tos;
+ fsp->m_u.tcp_ip4_spec.ip4src = rule->mask.src_ip[0];
+ fsp->m_u.tcp_ip4_spec.ip4dst = rule->mask.dst_ip[0];
+ fsp->m_u.tcp_ip4_spec.psrc = rule->mask.src_port;
+ fsp->m_u.tcp_ip4_spec.pdst = rule->mask.dst_port;
+ fsp->m_u.tcp_ip4_spec.tos = rule->mask.tos;
+ break;
+ case AH_V4_FLOW:
+ case ESP_V4_FLOW:
+ fsp->h_u.ah_ip4_spec.ip4src = rule->key.src_ip[0];
+ fsp->h_u.ah_ip4_spec.ip4dst = rule->key.dst_ip[0];
+ fsp->h_u.ah_ip4_spec.spi = rule->key.spi;
+ fsp->h_u.ah_ip4_spec.tos = rule->key.tos;
+ fsp->m_u.ah_ip4_spec.ip4src = rule->mask.src_ip[0];
+ fsp->m_u.ah_ip4_spec.ip4dst = rule->mask.dst_ip[0];
+ fsp->m_u.ah_ip4_spec.spi = rule->mask.spi;
+ fsp->m_u.ah_ip4_spec.tos = rule->mask.tos;
+ break;
+ case TCP_V6_FLOW:
+ case UDP_V6_FLOW:
+ case SCTP_V6_FLOW:
+ memcpy(fsp->h_u.tcp_ip6_spec.ip6src, &rule->key.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->h_u.tcp_ip6_spec.ip6dst, &rule->key.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->h_u.tcp_ip6_spec.psrc = rule->key.src_port;
+ fsp->h_u.tcp_ip6_spec.pdst = rule->key.dst_port;
+ fsp->h_u.tcp_ip6_spec.tclass = rule->key.tclass;
+ memcpy(fsp->m_u.tcp_ip6_spec.ip6src, &rule->mask.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->m_u.tcp_ip6_spec.ip6dst, &rule->mask.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->m_u.tcp_ip6_spec.psrc = rule->mask.src_port;
+ fsp->m_u.tcp_ip6_spec.pdst = rule->mask.dst_port;
+ fsp->m_u.tcp_ip6_spec.tclass = rule->mask.tclass;
+ break;
+ case AH_V6_FLOW:
+ case ESP_V6_FLOW:
+ memcpy(fsp->h_u.ah_ip6_spec.ip6src, &rule->key.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->h_u.ah_ip6_spec.ip6dst, &rule->key.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->h_u.ah_ip6_spec.spi = rule->key.spi;
+ fsp->h_u.ah_ip6_spec.tclass = rule->key.tclass;
+ memcpy(fsp->m_u.ah_ip6_spec.ip6src, &rule->mask.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->m_u.ah_ip6_spec.ip6dst, &rule->mask.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->m_u.ah_ip6_spec.spi = rule->mask.spi;
+ fsp->m_u.ah_ip6_spec.tclass = rule->mask.tclass;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ fsp->ring_cookie = be16_to_cpu(rule->action);
+
+ return 0;
+}
+
+static int gve_generate_flow_rule(struct gve_priv *priv, struct ethtool_rx_flow_spec *fsp,
+ struct gve_adminq_flow_rule *rule)
+{
+ static const u16 flow_type_lut[] = {
+ [TCP_V4_FLOW] = GVE_FLOW_TYPE_TCPV4,
+ [UDP_V4_FLOW] = GVE_FLOW_TYPE_UDPV4,
+ [SCTP_V4_FLOW] = GVE_FLOW_TYPE_SCTPV4,
+ [AH_V4_FLOW] = GVE_FLOW_TYPE_AHV4,
+ [ESP_V4_FLOW] = GVE_FLOW_TYPE_ESPV4,
+ [TCP_V6_FLOW] = GVE_FLOW_TYPE_TCPV6,
+ [UDP_V6_FLOW] = GVE_FLOW_TYPE_UDPV6,
+ [SCTP_V6_FLOW] = GVE_FLOW_TYPE_SCTPV6,
+ [AH_V6_FLOW] = GVE_FLOW_TYPE_AHV6,
+ [ESP_V6_FLOW] = GVE_FLOW_TYPE_ESPV6,
+ };
+ u32 flow_type;
+
+ if (fsp->ring_cookie == RX_CLS_FLOW_DISC)
+ return -EOPNOTSUPP;
+
+ if (fsp->ring_cookie >= priv->rx_cfg.num_queues)
+ return -EINVAL;
+
+ rule->action = cpu_to_be16(fsp->ring_cookie);
+
+ flow_type = fsp->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT | FLOW_RSS);
+ if (!flow_type || flow_type >= ARRAY_SIZE(flow_type_lut))
+ return -EINVAL;
+
+ rule->flow_type = cpu_to_be16(flow_type_lut[flow_type]);
+
+ switch (flow_type) {
+ case TCP_V4_FLOW:
+ case UDP_V4_FLOW:
+ case SCTP_V4_FLOW:
+ rule->key.src_ip[0] = fsp->h_u.tcp_ip4_spec.ip4src;
+ rule->key.dst_ip[0] = fsp->h_u.tcp_ip4_spec.ip4dst;
+ rule->key.src_port = fsp->h_u.tcp_ip4_spec.psrc;
+ rule->key.dst_port = fsp->h_u.tcp_ip4_spec.pdst;
+ rule->mask.src_ip[0] = fsp->m_u.tcp_ip4_spec.ip4src;
+ rule->mask.dst_ip[0] = fsp->m_u.tcp_ip4_spec.ip4dst;
+ rule->mask.src_port = fsp->m_u.tcp_ip4_spec.psrc;
+ rule->mask.dst_port = fsp->m_u.tcp_ip4_spec.pdst;
+ break;
+ case AH_V4_FLOW:
+ case ESP_V4_FLOW:
+ rule->key.src_ip[0] = fsp->h_u.tcp_ip4_spec.ip4src;
+ rule->key.dst_ip[0] = fsp->h_u.tcp_ip4_spec.ip4dst;
+ rule->key.spi = fsp->h_u.ah_ip4_spec.spi;
+ rule->mask.src_ip[0] = fsp->m_u.tcp_ip4_spec.ip4src;
+ rule->mask.dst_ip[0] = fsp->m_u.tcp_ip4_spec.ip4dst;
+ rule->mask.spi = fsp->m_u.ah_ip4_spec.spi;
+ break;
+ case TCP_V6_FLOW:
+ case UDP_V6_FLOW:
+ case SCTP_V6_FLOW:
+ memcpy(&rule->key.src_ip, fsp->h_u.tcp_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->key.dst_ip, fsp->h_u.tcp_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->key.src_port = fsp->h_u.tcp_ip6_spec.psrc;
+ rule->key.dst_port = fsp->h_u.tcp_ip6_spec.pdst;
+ memcpy(&rule->mask.src_ip, fsp->m_u.tcp_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->mask.dst_ip, fsp->m_u.tcp_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->mask.src_port = fsp->m_u.tcp_ip6_spec.psrc;
+ rule->mask.dst_port = fsp->m_u.tcp_ip6_spec.pdst;
+ break;
+ case AH_V6_FLOW:
+ case ESP_V6_FLOW:
+ memcpy(&rule->key.src_ip, fsp->h_u.usr_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->key.dst_ip, fsp->h_u.usr_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->key.spi = fsp->h_u.ah_ip6_spec.spi;
+ memcpy(&rule->mask.src_ip, fsp->m_u.usr_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->mask.dst_ip, fsp->m_u.usr_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->mask.spi = fsp->m_u.ah_ip6_spec.spi;
+ break;
+ default:
+ /* not doing un-parsed flow types */
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int gve_get_flow_rule_entry(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
+ struct gve_flow_rule *rules_cache = priv->flow_rules_cache.rules_cache;
+ u32 *cache_num = &priv->flow_rules_cache.rules_cache_num;
+ struct gve_flow_rule *rule = NULL;
+ int err = 0;
+ u32 i;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ if (!priv->flow_rules_cache.rules_cache_synced ||
+ fsp->location < be32_to_cpu(rules_cache[0].location) ||
+ fsp->location > be32_to_cpu(rules_cache[*cache_num - 1].location)) {
+ err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_RULES, fsp->location);
+ if (err)
+ return err;
+
+ priv->flow_rules_cache.rules_cache_synced = true;
+ }
+
+ for (i = 0; i < *cache_num; i++) {
+ if (fsp->location == be32_to_cpu(rules_cache[i].location)) {
+ rule = &rules_cache[i];
+ break;
+ }
+ }
+
+ if (!rule)
+ return -EINVAL;
+
+ err = gve_fill_ethtool_flow_spec(fsp, rule);
+
+ return err;
+}
+
+int gve_get_flow_rule_ids(struct gve_priv *priv, struct ethtool_rxnfc *cmd, u32 *rule_locs)
+{
+ u32 *rule_ids_cache = priv->flow_rules_cache.rule_ids_cache;
+ u32 *cache_num = &priv->flow_rules_cache.rule_ids_cache_num;
+ u32 starting_rule_id = 0;
+ u32 i = 0, j = 0;
+ int err = 0;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ do {
+ err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_IDS,
+ starting_rule_id);
+ if (err)
+ return err;
+
+ for (i = 0; i < *cache_num; i++) {
+ if (j >= cmd->rule_cnt)
+ return -EMSGSIZE;
+
+ rule_locs[j++] = be32_to_cpu(rule_ids_cache[i]);
+ starting_rule_id = be32_to_cpu(rule_ids_cache[i]) + 1;
+ }
+ } while (*cache_num != 0);
+ cmd->data = priv->max_flow_rules;
+
+ return err;
+}
+
+int gve_add_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp = &cmd->fs;
+ struct gve_adminq_flow_rule *rule = NULL;
+ int err;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ rule = kvzalloc(sizeof(*rule), GFP_KERNEL);
+ if (!rule)
+ return -ENOMEM;
+
+ err = gve_generate_flow_rule(priv, fsp, rule);
+ if (err)
+ goto out;
+
+ err = gve_adminq_add_flow_rule(priv, rule, fsp->location);
+
+out:
+ kvfree(rule);
+ if (err)
+ dev_err(&priv->pdev->dev, "Failed to add the flow rule: %u", fsp->location);
+
+ return err;
+}
+
+int gve_del_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ return gve_adminq_del_flow_rule(priv, fsp->location);
+}
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index eb435ccbe98e..ac38453327c3 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
- * Copyright (C) 2015-2021 Google, Inc.
+ * Copyright (C) 2015-2024 Google LLC
*/
#include <linux/bpf.h>
@@ -638,6 +638,12 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
/* Tell device its resources are being freed */
if (gve_get_device_resources_ok(priv)) {
+ err = gve_flow_rules_reset(priv);
+ if (err) {
+ dev_err(&priv->pdev->dev,
+ "Failed to reset flow rules: err=%d\n", err);
+ gve_trigger_reset(priv);
+ }
/* detach the stats report */
err = gve_adminq_report_stats(priv, 0, 0x0, GVE_STATS_REPORT_TIMER_PERIOD);
if (err) {
@@ -1782,6 +1788,14 @@ static int gve_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
}
+int gve_flow_rules_reset(struct gve_priv *priv)
+{
+ if (!priv->max_flow_rules)
+ return 0;
+
+ return gve_adminq_reset_flow_rules(priv);
+}
+
int gve_adjust_config(struct gve_priv *priv,
struct gve_tx_alloc_rings_cfg *tx_alloc_cfg,
struct gve_rx_alloc_rings_cfg *rx_alloc_cfg)
@@ -2055,15 +2069,21 @@ static int gve_set_features(struct net_device *netdev,
netdev->features ^= NETIF_F_LRO;
if (netif_carrier_ok(netdev)) {
err = gve_adjust_config(priv, &tx_alloc_cfg, &rx_alloc_cfg);
- if (err) {
- /* Revert the change on error. */
- netdev->features = orig_features;
- return err;
- }
+ if (err)
+ goto revert_features;
}
}
+ if ((netdev->features & NETIF_F_NTUPLE) && !(features & NETIF_F_NTUPLE)) {
+ err = gve_flow_rules_reset(priv);
+ if (err)
+ goto revert_features;
+ }
return 0;
+
+revert_features:
+ netdev->features = orig_features;
+ return err;
}
static const struct net_device_ops gve_netdev_ops = {
--
2.45.0.rc1.225.g2a3ae87e7f-goog
On 2024-05-07 15:59, Ziwei Xiao wrote:
> From: Jeroen de Borst <[email protected]>
>
> Add a new device option to signal to the driver that the device supports
> flow steering. This device option also carries the maximum number of
> flow steering rules that the device can store.
>
> Signed-off-by: Jeroen de Borst <[email protected]>
> Co-developed-by: Ziwei Xiao <[email protected]>
> Signed-off-by: Ziwei Xiao <[email protected]>
> Reviewed-by: Praveen Kaligineedi <[email protected]>
> Reviewed-by: Harshitha Ramamurthy <[email protected]>
> Reviewed-by: Willem de Bruijn <[email protected]>
> ---
> drivers/net/ethernet/google/gve/gve.h | 2 +
> drivers/net/ethernet/google/gve/gve_adminq.c | 42 ++++++++++++++++++--
> drivers/net/ethernet/google/gve/gve_adminq.h | 11 +++++
> 3 files changed, 51 insertions(+), 4 deletions(-)
Think something went wrong here. The title is different but patch is
same as 2/5.
>
> diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> index ca7fce17f2c0..58213c15e084 100644
> --- a/drivers/net/ethernet/google/gve/gve.h
> +++ b/drivers/net/ethernet/google/gve/gve.h
> @@ -786,6 +786,8 @@ struct gve_priv {
>
> u16 header_buf_size; /* device configured, header-split supported if non-zero */
> bool header_split_enabled; /* True if the header split is enabled by the user */
> +
> + u32 max_flow_rules;
> };
>
> enum gve_service_task_flags_bit {
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 514641b3ccc7..85d0d742ad21 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -44,6 +44,7 @@ void gve_parse_device_option(struct gve_priv *priv,
> struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
> struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
> struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
> + struct gve_device_option_flow_steering **dev_op_flow_steering,
> struct gve_device_option_modify_ring **dev_op_modify_ring)
> {
> u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
> @@ -189,6 +190,23 @@ void gve_parse_device_option(struct gve_priv *priv,
> if (option_length == GVE_DEVICE_OPTION_NO_MIN_RING_SIZE)
> priv->default_min_ring_size = true;
> break;
> + case GVE_DEV_OPT_ID_FLOW_STEERING:
> + if (option_length < sizeof(**dev_op_flow_steering) ||
> + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING) {
> + dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT,
> + "Flow Steering",
> + (int)sizeof(**dev_op_flow_steering),
> + GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING,
> + option_length, req_feat_mask);
> + break;
> + }
> +
> + if (option_length > sizeof(**dev_op_flow_steering))
> + dev_warn(&priv->pdev->dev,
> + GVE_DEVICE_OPTION_TOO_BIG_FMT,
> + "Flow Steering");
> + *dev_op_flow_steering = (void *)(option + 1);
> + break;
> default:
> /* If we don't recognize the option just continue
> * without doing anything.
> @@ -208,6 +226,7 @@ gve_process_device_options(struct gve_priv *priv,
> struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
> struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
> struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
> + struct gve_device_option_flow_steering **dev_op_flow_steering,
> struct gve_device_option_modify_ring **dev_op_modify_ring)
> {
> const int num_options = be16_to_cpu(descriptor->num_device_options);
> @@ -230,7 +249,7 @@ gve_process_device_options(struct gve_priv *priv,
> dev_op_gqi_rda, dev_op_gqi_qpl,
> dev_op_dqo_rda, dev_op_jumbo_frames,
> dev_op_dqo_qpl, dev_op_buffer_sizes,
> - dev_op_modify_ring);
> + dev_op_flow_steering, dev_op_modify_ring);
> dev_opt = next_opt;
> }
>
> @@ -838,6 +857,8 @@ static void gve_enable_supported_features(struct gve_priv *priv,
> *dev_op_dqo_qpl,
> const struct gve_device_option_buffer_sizes
> *dev_op_buffer_sizes,
> + const struct gve_device_option_flow_steering
> + *dev_op_flow_steering,
> const struct gve_device_option_modify_ring
> *dev_op_modify_ring)
> {
> @@ -890,10 +911,22 @@ static void gve_enable_supported_features(struct gve_priv *priv,
> priv->min_tx_desc_cnt = be16_to_cpu(dev_op_modify_ring->min_tx_ring_size);
> }
> }
> +
> + if (dev_op_flow_steering &&
> + (supported_features_mask & GVE_SUP_FLOW_STEERING_MASK)) {
> + if (dev_op_flow_steering->max_flow_rules) {
> + priv->max_flow_rules =
> + be32_to_cpu(dev_op_flow_steering->max_flow_rules);
> + dev_info(&priv->pdev->dev,
> + "FLOW STEERING device option enabled with max rule limit of %u.\n",
> + priv->max_flow_rules);
> + }
> + }
> }
>
> int gve_adminq_describe_device(struct gve_priv *priv)
> {
> + struct gve_device_option_flow_steering *dev_op_flow_steering = NULL;
> struct gve_device_option_buffer_sizes *dev_op_buffer_sizes = NULL;
> struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
> struct gve_device_option_modify_ring *dev_op_modify_ring = NULL;
> @@ -930,6 +963,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
> &dev_op_gqi_qpl, &dev_op_dqo_rda,
> &dev_op_jumbo_frames, &dev_op_dqo_qpl,
> &dev_op_buffer_sizes,
> + &dev_op_flow_steering,
> &dev_op_modify_ring);
> if (err)
> goto free_device_descriptor;
> @@ -969,9 +1003,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
> /* set default descriptor counts */
> gve_set_default_desc_cnt(priv, descriptor);
>
> - /* DQO supports LRO. */
> if (!gve_is_gqi(priv))
> - priv->dev->hw_features |= NETIF_F_LRO;
> + priv->dev->hw_features |= NETIF_F_LRO | NETIF_F_NTUPLE;
>
> priv->max_registered_pages =
> be64_to_cpu(descriptor->max_registered_pages);
> @@ -991,7 +1024,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
>
> gve_enable_supported_features(priv, supported_features_mask,
> dev_op_jumbo_frames, dev_op_dqo_qpl,
> - dev_op_buffer_sizes, dev_op_modify_ring);
> + dev_op_buffer_sizes, dev_op_flow_steering,
> + dev_op_modify_ring);
>
> free_device_descriptor:
> dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> index e0370ace8397..e64a0e72e781 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> @@ -146,6 +146,14 @@ struct gve_device_option_modify_ring {
>
> static_assert(sizeof(struct gve_device_option_modify_ring) == 12);
>
> +struct gve_device_option_flow_steering {
> + __be32 supported_features_mask;
> + __be32 reserved;
> + __be32 max_flow_rules;
> +};
> +
> +static_assert(sizeof(struct gve_device_option_flow_steering) == 12);
> +
> /* Terminology:
> *
> * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
> @@ -163,6 +171,7 @@ enum gve_dev_opt_id {
> GVE_DEV_OPT_ID_DQO_QPL = 0x7,
> GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
> GVE_DEV_OPT_ID_BUFFER_SIZES = 0xa,
> + GVE_DEV_OPT_ID_FLOW_STEERING = 0xb,
> };
>
> enum gve_dev_opt_req_feat_mask {
> @@ -174,12 +183,14 @@ enum gve_dev_opt_req_feat_mask {
> GVE_DEV_OPT_REQ_FEAT_MASK_DQO_QPL = 0x0,
> GVE_DEV_OPT_REQ_FEAT_MASK_BUFFER_SIZES = 0x0,
> GVE_DEV_OPT_REQ_FEAT_MASK_MODIFY_RING = 0x0,
> + GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING = 0x0,
> };
>
> enum gve_sup_feature_mask {
> GVE_SUP_MODIFY_RING_MASK = 1 << 0,
> GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
> GVE_SUP_BUFFER_SIZES_MASK = 1 << 4,
> + GVE_SUP_FLOW_STEERING_MASK = 1 << 5,
> };
>
> #define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
On 2024-05-07 15:59, Ziwei Xiao wrote:
> From: Jeroen de Borst <[email protected]>
>
> Add new adminq commands for the driver to configure and query flow
> rules that are stored in the device. Flow steering rules are assigned
> a location that determines the relative order of the rules.
>
> The number of flow rules can reach the order of millions. At that
> scale, storing a full copy of the rules in the driver to serve the
> ethtool query is infeasible; querying them from the device is better.
> That path needs to be optimized too, so that we don't send a large
> number of adminq commands. The solution here is to cache a limited
> number of rules/rule ids in the driver. This patch uses a dma_pool to
> allocate 4096 bytes, which lets the device write at most 46 flow rules
> (4096/88) or 1024 rule ids (4096/4) at a time.
>
> For configuring flow rules, there are 3 sub-commands:
> - ADD which adds a rule at the location supplied
> - DEL which deletes the rule at the location supplied
> - RESET which clears all currently active rules in the device
>
> For querying flow rules, there are also 3 sub-commands:
> - QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
> the allocated cache after querying the device
> - QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
> rule_ids in the allocated cache after querying the device
> - QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
> device's current flow rule number and the supported max flow rule
> limit
>
> Signed-off-by: Jeroen de Borst <[email protected]>
> Co-developed-by: Ziwei Xiao <[email protected]>
> Signed-off-by: Ziwei Xiao <[email protected]>
> Reviewed-by: Praveen Kaligineedi <[email protected]>
> Reviewed-by: Harshitha Ramamurthy <[email protected]>
> Reviewed-by: Willem de Bruijn <[email protected]>
> ---
> drivers/net/ethernet/google/gve/gve.h | 42 ++++++
> drivers/net/ethernet/google/gve/gve_adminq.c | 133 ++++++++++++++++++
> drivers/net/ethernet/google/gve/gve_adminq.h | 75 ++++++++++
> drivers/net/ethernet/google/gve/gve_ethtool.c | 5 +-
> drivers/net/ethernet/google/gve/gve_main.c | 54 ++++++-
> 5 files changed, 307 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> index 58213c15e084..355ae797eacf 100644
> --- a/drivers/net/ethernet/google/gve/gve.h
> +++ b/drivers/net/ethernet/google/gve/gve.h
> @@ -60,6 +60,10 @@
>
> #define GVE_DEFAULT_RX_BUFFER_OFFSET 2048
>
> +#define GVE_FLOW_RULES_CACHE_SIZE (GVE_ADMINQ_BUFFER_SIZE / sizeof(struct gve_flow_rule))
> +#define GVE_FLOW_RULE_IDS_CACHE_SIZE \
> + (GVE_ADMINQ_BUFFER_SIZE / sizeof(((struct gve_flow_rule *)0)->location))
> +
> #define GVE_XDP_ACTIONS 5
>
> #define GVE_GQ_TX_MIN_PKT_DESC_BYTES 182
> @@ -678,6 +682,39 @@ enum gve_queue_format {
> GVE_DQO_QPL_FORMAT = 0x4,
> };
>
> +struct gve_flow_spec {
> + __be32 src_ip[4];
> + __be32 dst_ip[4];
> + union {
> + struct {
> + __be16 src_port;
> + __be16 dst_port;
> + };
> + __be32 spi;
> + };
> + union {
> + u8 tos;
> + u8 tclass;
> + };
> +};
> +
> +struct gve_flow_rule {
> + u32 location;
> + u16 flow_type;
> + u16 action;
> + struct gve_flow_spec key;
> + struct gve_flow_spec mask;
> +};
> +
> +struct gve_flow_rules_cache {
> + bool rules_cache_synced; /* False if the driver's rules_cache is outdated */
> + struct gve_flow_rule *rules_cache;
> + u32 *rule_ids_cache;
> +	/* The total number of queried rules that are stored in the caches */
> + u32 rules_cache_num;
> + u32 rule_ids_cache_num;
> +};
> +
> struct gve_priv {
> struct net_device *dev;
> struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
> @@ -744,6 +781,8 @@ struct gve_priv {
> u32 adminq_report_link_speed_cnt;
> u32 adminq_get_ptype_map_cnt;
> u32 adminq_verify_driver_compatibility_cnt;
> + u32 adminq_query_flow_rules_cnt;
> + u32 adminq_cfg_flow_rule_cnt;
>
> /* Global stats */
> u32 interface_up_cnt; /* count of times interface turned up since last reset */
> @@ -788,6 +827,9 @@ struct gve_priv {
> bool header_split_enabled; /* True if the header split is enabled by the user */
>
> u32 max_flow_rules;
> + u32 num_flow_rules; /* Current number of flow rules registered in the device */
> +
> + struct gve_flow_rules_cache flow_rules_cache;
> };
>
> enum gve_service_task_flags_bit {
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 85d0d742ad21..7a929e20cf96 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -287,6 +287,8 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
> priv->adminq_report_stats_cnt = 0;
> priv->adminq_report_link_speed_cnt = 0;
> priv->adminq_get_ptype_map_cnt = 0;
> + priv->adminq_query_flow_rules_cnt = 0;
> + priv->adminq_cfg_flow_rule_cnt = 0;
>
> /* Setup Admin queue with the device */
> if (priv->pdev->revision < 0x1) {
> @@ -526,6 +528,12 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
> case GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY:
> priv->adminq_verify_driver_compatibility_cnt++;
> break;
> + case GVE_ADMINQ_QUERY_FLOW_RULES:
> + priv->adminq_query_flow_rules_cnt++;
> + break;
> + case GVE_ADMINQ_CONFIGURE_FLOW_RULE:
> + priv->adminq_cfg_flow_rule_cnt++;
> + break;
> default:
> dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
> }
> @@ -1188,3 +1196,128 @@ int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
> ptype_map_bus);
> return err;
> }
> +
> +static int
> +gve_adminq_configure_flow_rule(struct gve_priv *priv,
> + struct gve_adminq_configure_flow_rule *flow_rule_cmd)
> +{
> + int err = gve_adminq_execute_extended_cmd(priv,
> + GVE_ADMINQ_CONFIGURE_FLOW_RULE,
> + sizeof(struct gve_adminq_configure_flow_rule),
> + flow_rule_cmd);
> +
> + if (err) {
> + dev_err(&priv->pdev->dev, "Timeout to configure the flow rule, trigger reset");
> + gve_reset(priv, true);
> + } else {
> + priv->flow_rules_cache.rules_cache_synced = false;
> + }
> +
> + return err;
> +}
> +
> +int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc)
> +{
> + struct gve_adminq_configure_flow_rule flow_rule_cmd = {
> + .opcode = cpu_to_be16(GVE_RULE_ADD),
> + .location = cpu_to_be32(loc),
> + .rule = *rule,
> + };
> +
> + return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
> +}
> +
> +int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc)
> +{
> + struct gve_adminq_configure_flow_rule flow_rule_cmd = {
> + .opcode = cpu_to_be16(GVE_RULE_DEL),
> + .location = cpu_to_be32(loc),
> + };
> +
> + return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
> +}
> +
> +int gve_adminq_reset_flow_rules(struct gve_priv *priv)
> +{
> + struct gve_adminq_configure_flow_rule flow_rule_cmd = {
> + .opcode = cpu_to_be16(GVE_RULE_RESET),
> + };
> +
> + return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
> +}
> +
> +/* In the dma memory that the driver allocated for the device to query the flow rules, the device
> + * will first write it with a struct of gve_query_flow_rules_descriptor. Next to it, the device
> + * will write an array of rules or rule ids with the count that specified in the descriptor.
> + * For GVE_FLOW_RULE_QUERY_STATS, the device will only write the descriptor.
> + */
> +static int gve_adminq_process_flow_rules_query(struct gve_priv *priv, u16 query_opcode,
> + struct gve_query_flow_rules_descriptor *descriptor)
> +{
> + struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
> + u32 num_queried_rules, total_memory_len, rule_info_len;
> + void *descriptor_end, *rule_info;
> +
> + total_memory_len = be32_to_cpu(descriptor->total_length);
> + if (total_memory_len > GVE_ADMINQ_BUFFER_SIZE) {
> + dev_err(&priv->dev->dev, "flow rules query is out of memory.\n");
The error doesn't seem to match the inequality used. Also, how can the
HW write more than GVE_ADMINQ_BUFFER_SIZE?
> + return -ENOMEM;
> + }
> +
> + num_queried_rules = be32_to_cpu(descriptor->num_queried_rules);
> + descriptor_end = (void *)descriptor + total_memory_len;
This isn't being used.
> + rule_info = (void *)(descriptor + 1);
> +
> + switch (query_opcode) {
> + case GVE_FLOW_RULE_QUERY_RULES:
> + rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rules_cache);
Do you need to verify what the HW has written matches your expectations
i.e. is sizeof(*descriptor) + rule_info_len == total_memory_len?
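Something along these lines perhaps, once rule_info_len is known (untested
sketch, reusing the local names from this function):

	if (sizeof(*descriptor) + rule_info_len != total_memory_len) {
		dev_err(&priv->pdev->dev,
			"flow rules query: total_length %u does not match %u queried rules\n",
			total_memory_len, num_queried_rules);
		return -EINVAL;
	}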
> +
> + memcpy(flow_rules_cache->rules_cache, rule_info, rule_info_len);
> + flow_rules_cache->rules_cache_num = num_queried_rules;
> + break;
> + case GVE_FLOW_RULE_QUERY_IDS:
> + rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rule_ids_cache);
> +
> + memcpy(flow_rules_cache->rule_ids_cache, rule_info, rule_info_len);
> + flow_rules_cache->rule_ids_cache_num = num_queried_rules;
> + break;
> + case GVE_FLOW_RULE_QUERY_STATS:
> + priv->num_flow_rules = be32_to_cpu(descriptor->num_flow_rules);
> + priv->max_flow_rules = be32_to_cpu(descriptor->max_flow_rules);
> + return 0;
> + default:
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc)
> +{
> + struct gve_query_flow_rules_descriptor *descriptor;
> + union gve_adminq_command cmd;
> + dma_addr_t descriptor_bus;
> + int err = 0;
> +
> + memset(&cmd, 0, sizeof(cmd));
> + descriptor = dma_pool_alloc(priv->adminq_pool, GFP_KERNEL, &descriptor_bus);
Why is adminq_pool only used for 2 commands?
> + if (!descriptor)
> + return -ENOMEM;
> +
> + cmd.opcode = cpu_to_be32(GVE_ADMINQ_QUERY_FLOW_RULES);
> + cmd.query_flow_rules = (struct gve_adminq_query_flow_rules) {
> + .opcode = cpu_to_be16(query_opcode),
> + .starting_rule_id = cpu_to_be32(starting_loc),
> + .available_length = cpu_to_be64(GVE_ADMINQ_BUFFER_SIZE),
> + .rule_descriptor_addr = cpu_to_be64(descriptor_bus),
> + };
> + err = gve_adminq_execute_cmd(priv, &cmd);
> + if (err)
> + goto out;
> +
> + err = gve_adminq_process_flow_rules_query(priv, query_opcode, descriptor);
> +
> +out:
> + dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
> + return err;
> +}
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> index e64a0e72e781..9ddb72f92197 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> @@ -25,11 +25,21 @@ enum gve_adminq_opcodes {
> GVE_ADMINQ_REPORT_LINK_SPEED = 0xD,
> GVE_ADMINQ_GET_PTYPE_MAP = 0xE,
> GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY = 0xF,
> + GVE_ADMINQ_QUERY_FLOW_RULES = 0x10,
>
> /* For commands that are larger than 56 bytes */
> GVE_ADMINQ_EXTENDED_COMMAND = 0xFF,
> };
>
> +/* The normal adminq command is restricted to be 56 bytes at maximum. For the
> + * longer adminq command, it is wrapped by GVE_ADMINQ_EXTENDED_COMMAND with
> + * inner opcode of gve_adminq_extended_cmd_opcodes specified. The inner command
> + * is written in the dma memory allocated by GVE_ADMINQ_EXTENDED_COMMAND.
> + */
> +enum gve_adminq_extended_cmd_opcodes {
> + GVE_ADMINQ_CONFIGURE_FLOW_RULE = 0x101,
> +};
> +
> /* Admin queue status codes */
> enum gve_adminq_statuses {
> GVE_ADMINQ_COMMAND_UNSET = 0x0,
> @@ -434,6 +444,66 @@ struct gve_adminq_get_ptype_map {
> __be64 ptype_map_addr;
> };
>
> +/* Flow-steering related definitions */
> +enum gve_adminq_flow_rule_cfg_opcode {
> + GVE_RULE_ADD = 0,
> + GVE_RULE_DEL = 1,
> + GVE_RULE_RESET = 2,
> +};
Could these be more descriptive?
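E.g. something like this (purely illustrative naming):

	enum gve_adminq_flow_rule_cfg_opcode {
		GVE_FLOW_RULE_CFG_ADD	= 0,
		GVE_FLOW_RULE_CFG_DEL	= 1,
		GVE_FLOW_RULE_CFG_RESET	= 2,
	};

so the values read as flow-rule configuration opcodes at the call sites.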
> +
> +enum gve_adminq_flow_rule_query_opcode {
> + GVE_FLOW_RULE_QUERY_RULES = 0,
> + GVE_FLOW_RULE_QUERY_IDS = 1,
> + GVE_FLOW_RULE_QUERY_STATS = 2,
> +};
> +
> +enum gve_adminq_flow_type {
> + GVE_FLOW_TYPE_TCPV4,
> + GVE_FLOW_TYPE_UDPV4,
> + GVE_FLOW_TYPE_SCTPV4,
> + GVE_FLOW_TYPE_AHV4,
> + GVE_FLOW_TYPE_ESPV4,
> + GVE_FLOW_TYPE_TCPV6,
> + GVE_FLOW_TYPE_UDPV6,
> + GVE_FLOW_TYPE_SCTPV6,
> + GVE_FLOW_TYPE_AHV6,
> + GVE_FLOW_TYPE_ESPV6,
> +};
> +
> +/* Flow-steering command */
> +struct gve_adminq_flow_rule {
> + __be16 flow_type;
> + __be16 action; /* RX queue id */
> + struct gve_flow_spec key;
> + struct gve_flow_spec mask;
> +};
> +
> +struct gve_adminq_configure_flow_rule {
> + __be16 opcode;
> + u8 padding[2];
> + struct gve_adminq_flow_rule rule;
> + __be32 location;
> +};
> +
> +static_assert(sizeof(struct gve_adminq_configure_flow_rule) == 92);
> +
> +struct gve_query_flow_rules_descriptor {
> + __be32 num_flow_rules; /* Current rule counts stored in the device */
> + __be32 max_flow_rules;
> + __be32 num_queried_rules;
nit: more comments here are appreciated.
> + __be32 total_length; /* The memory length that the device writes */
> +};
> +
> +struct gve_adminq_query_flow_rules {
> + __be16 opcode;
> + u8 padding[2];
> + __be32 starting_rule_id;
> + __be64 available_length; /* The dma memory length that the driver allocated */
> + __be64 rule_descriptor_addr; /* The dma memory address */
> +};
> +
> +static_assert(sizeof(struct gve_adminq_query_flow_rules) == 24);
> +
> union gve_adminq_command {
> struct {
> __be32 opcode;
> @@ -454,6 +524,7 @@ union gve_adminq_command {
> struct gve_adminq_get_ptype_map get_ptype_map;
> struct gve_adminq_verify_driver_compatibility
> verify_driver_compatibility;
> + struct gve_adminq_query_flow_rules query_flow_rules;
> struct gve_adminq_extended_command extended_command;
> };
> };
> @@ -488,6 +559,10 @@ int gve_adminq_verify_driver_compatibility(struct gve_priv *priv,
> u64 driver_info_len,
> dma_addr_t driver_info_addr);
> int gve_adminq_report_link_speed(struct gve_priv *priv);
> +int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc);
> +int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc);
> +int gve_adminq_reset_flow_rules(struct gve_priv *priv);
> +int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc);
>
> struct gve_ptype_lut;
> int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
> diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> index 156b7e128b53..02cee7e0e229 100644
> --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> @@ -74,7 +74,8 @@ static const char gve_gstrings_adminq_stats[][ETH_GSTRING_LEN] = {
> "adminq_create_tx_queue_cnt", "adminq_create_rx_queue_cnt",
> "adminq_destroy_tx_queue_cnt", "adminq_destroy_rx_queue_cnt",
> "adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
> - "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt"
> + "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt",
> + "adminq_query_flow_rules", "adminq_cfg_flow_rule",
> };
>
> static const char gve_gstrings_priv_flags[][ETH_GSTRING_LEN] = {
> @@ -458,6 +459,8 @@ gve_get_ethtool_stats(struct net_device *netdev,
> data[i++] = priv->adminq_report_stats_cnt;
> data[i++] = priv->adminq_report_link_speed_cnt;
> data[i++] = priv->adminq_get_ptype_map_cnt;
> + data[i++] = priv->adminq_query_flow_rules_cnt;
> + data[i++] = priv->adminq_cfg_flow_rule_cnt;
> }
>
> static void gve_get_channels(struct net_device *netdev,
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index cabf7d4bcecb..eb435ccbe98e 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -141,6 +141,52 @@ static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
> }
> }
>
> +static int gve_alloc_flow_rule_caches(struct gve_priv *priv)
> +{
> + struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
> + int err = 0;
> +
> + if (!priv->max_flow_rules)
> + return 0;
> +
> + flow_rules_cache->rules_cache =
> + kvcalloc(GVE_FLOW_RULES_CACHE_SIZE, sizeof(*flow_rules_cache->rules_cache),
> + GFP_KERNEL);
> + if (!flow_rules_cache->rules_cache) {
> + dev_err(&priv->pdev->dev, "Cannot alloc flow rules cache\n");
> + return -ENOMEM;
> + }
> +
> + flow_rules_cache->rule_ids_cache =
> + kvcalloc(GVE_FLOW_RULE_IDS_CACHE_SIZE, sizeof(*flow_rules_cache->rule_ids_cache),
> + GFP_KERNEL);
> + if (!flow_rules_cache->rule_ids_cache) {
> + dev_err(&priv->pdev->dev, "Cannot alloc flow rule ids cache\n");
> + err = -ENOMEM;
> + goto free_rules_cache;
> + }
> +
> + return 0;
> +
> +free_rules_cache:
> + kvfree(flow_rules_cache->rules_cache);
> + flow_rules_cache->rules_cache = NULL;
> + return err;
> +}
> +
> +static void gve_free_flow_rule_caches(struct gve_priv *priv)
> +{
> + struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
> +
> + if (!priv->max_flow_rules)
> + return;
Is this needed? Is kernel style just kvfree() w/o checks?
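kvfree(NULL) is a no-op, so presumably this could shrink to something like
(sketch):

	static void gve_free_flow_rule_caches(struct gve_priv *priv)
	{
		struct gve_flow_rules_cache *cache = &priv->flow_rules_cache;

		kvfree(cache->rule_ids_cache);
		cache->rule_ids_cache = NULL;
		kvfree(cache->rules_cache);
		cache->rules_cache = NULL;
	}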
> +
> + kvfree(flow_rules_cache->rule_ids_cache);
> + flow_rules_cache->rule_ids_cache = NULL;
> + kvfree(flow_rules_cache->rules_cache);
> + flow_rules_cache->rules_cache = NULL;
> +}
> +
> static int gve_alloc_counter_array(struct gve_priv *priv)
> {
> priv->counter_array =
> @@ -521,9 +567,12 @@ static int gve_setup_device_resources(struct gve_priv *priv)
> {
> int err;
>
> - err = gve_alloc_counter_array(priv);
> + err = gve_alloc_flow_rule_caches(priv);
> if (err)
> return err;
> + err = gve_alloc_counter_array(priv);
> + if (err)
> + goto abort_with_flow_rule_caches;
> err = gve_alloc_notify_blocks(priv);
> if (err)
> goto abort_with_counter;
> @@ -575,6 +624,8 @@ static int gve_setup_device_resources(struct gve_priv *priv)
> gve_free_notify_blocks(priv);
> abort_with_counter:
> gve_free_counter_array(priv);
> +abort_with_flow_rule_caches:
> + gve_free_flow_rule_caches(priv);
>
> return err;
> }
> @@ -606,6 +657,7 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
> kvfree(priv->ptype_lut_dqo);
> priv->ptype_lut_dqo = NULL;
>
> + gve_free_flow_rule_caches(priv);
> gve_free_counter_array(priv);
> gve_free_notify_blocks(priv);
> gve_free_stats_report(priv);
On 2024-05-07 15:59, Ziwei Xiao wrote:
> From: Jeroen de Borst <[email protected]>
>
> Add a new device option to signal to the driver that the device supports
> flow steering. This device option also carries the maximum number of
> flow steering rules that the device can store.
Other than superficial style choices, looks good.
>
> Signed-off-by: Jeroen de Borst <[email protected]>
> Co-developed-by: Ziwei Xiao <[email protected]>
> Signed-off-by: Ziwei Xiao <[email protected]>
> Reviewed-by: Praveen Kaligineedi <[email protected]>
> Reviewed-by: Harshitha Ramamurthy <[email protected]>
> Reviewed-by: Willem de Bruijn <[email protected]>
> ---
> drivers/net/ethernet/google/gve/gve.h | 2 +
> drivers/net/ethernet/google/gve/gve_adminq.c | 42 ++++++++++++++++++--
> drivers/net/ethernet/google/gve/gve_adminq.h | 11 +++++
> 3 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> index ca7fce17f2c0..58213c15e084 100644
> --- a/drivers/net/ethernet/google/gve/gve.h
> +++ b/drivers/net/ethernet/google/gve/gve.h
> @@ -786,6 +786,8 @@ struct gve_priv {
>
> u16 header_buf_size; /* device configured, header-split supported if non-zero */
> bool header_split_enabled; /* True if the header split is enabled by the user */
> +
> + u32 max_flow_rules;
nit: this struct is lovingly documented; could we continue that by adding a
one-liner here about how this field is device configured?
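E.g. (sketch):

	u32 max_flow_rules; /* max number of flow rules, device configured */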
> };
>
> enum gve_service_task_flags_bit {
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 514641b3ccc7..85d0d742ad21 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -44,6 +44,7 @@ void gve_parse_device_option(struct gve_priv *priv,
> struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
> struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
> struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
> + struct gve_device_option_flow_steering **dev_op_flow_steering,
nit: getting unwieldy here, is it time to pack into a struct?
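E.g. (illustrative only, assuming the obvious struct names for the
remaining options):

	struct gve_device_options {
		struct gve_device_option_gqi_rda *gqi_rda;
		struct gve_device_option_gqi_qpl *gqi_qpl;
		struct gve_device_option_dqo_rda *dqo_rda;
		struct gve_device_option_dqo_qpl *dqo_qpl;
		struct gve_device_option_jumbo_frames *jumbo_frames;
		struct gve_device_option_buffer_sizes *buffer_sizes;
		struct gve_device_option_flow_steering *flow_steering;
		struct gve_device_option_modify_ring *modify_ring;
	};

Then gve_parse_device_option() and gve_process_device_options() could take
a single pointer instead of one argument per option.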
…
> the rtnl_lock, such as the upcoming flow steering operations. For such
> situations, it can temporarily drop the rtnl_lock, and replace it for
> these operations with a new adminq lock, which ensures that the adminq
> command execution is thread-safe.
Would you like to use imperative wording for an improved change description?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9-rc7#n94
Regards,
Markus
…
> This change introduces an extended adminq command to wrap the real
> command with the inner opcode and the allocated dma memory address
…
Please use imperative wording for an improved change description.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9-rc7#n94
Regards,
Markus
> Adding new adminq commands for the driver to configure and query flow
> rules that are stored in the device. …
Would corresponding imperative wording be desirable for an improved change description?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.9-rc7#n94
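(For example: “Add new adminq commands for the driver to configure and
query flow rules …” rather than “Adding new adminq commands …”.)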
Regards,
Markus
…
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
…
> +static int gve_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, u32 *rule_locs)
> +{
> + struct gve_priv *priv = netdev_priv(netdev);
> + int err = 0;
> +
> + dev_hold(netdev);
> + rtnl_unlock();
…
> +out:
> + rtnl_lock();
> + dev_put(netdev);
> + return err;
> +}
…
What do you think about increasing the application of scope-based resource management
at source code places like these?
Regards,
Markus
Hi Ziwei,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Ziwei-Xiao/gve-Add-adminq-mutex-lock/20240508-071419
base: net-next/main
patch link: https://lore.kernel.org/r/20240507225945.1408516-5-ziweixiao%40google.com
patch subject: [PATCH net-next 4/5] gve: Add flow steering adminq commands
config: arm-allyesconfig (https://download.01.org/0day-ci/archive/20240508/[email protected]/config)
compiler: arm-linux-gnueabi-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240508/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All warnings (new ones prefixed by >>):
drivers/net/ethernet/google/gve/gve_adminq.c: In function 'gve_adminq_process_flow_rules_query':
>> drivers/net/ethernet/google/gve/gve_adminq.c:1259:15: warning: variable 'descriptor_end' set but not used [-Wunused-but-set-variable]
1259 | void *descriptor_end, *rule_info;
| ^~~~~~~~~~~~~~
vim +/descriptor_end +1259 drivers/net/ethernet/google/gve/gve_adminq.c
1248
1249 /* In the dma memory that the driver allocated for the device to query the flow rules, the device
1250 * will first write it with a struct of gve_query_flow_rules_descriptor. Next to it, the device
1251 * will write an array of rules or rule ids with the count that specified in the descriptor.
1252 * For GVE_FLOW_RULE_QUERY_STATS, the device will only write the descriptor.
1253 */
1254 static int gve_adminq_process_flow_rules_query(struct gve_priv *priv, u16 query_opcode,
1255 struct gve_query_flow_rules_descriptor *descriptor)
1256 {
1257 struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
1258 u32 num_queried_rules, total_memory_len, rule_info_len;
> 1259 void *descriptor_end, *rule_info;
1260
1261 total_memory_len = be32_to_cpu(descriptor->total_length);
1262 if (total_memory_len > GVE_ADMINQ_BUFFER_SIZE) {
1263 dev_err(&priv->dev->dev, "flow rules query is out of memory.\n");
1264 return -ENOMEM;
1265 }
1266
1267 num_queried_rules = be32_to_cpu(descriptor->num_queried_rules);
1268 descriptor_end = (void *)descriptor + total_memory_len;
1269 rule_info = (void *)(descriptor + 1);
1270
1271 switch (query_opcode) {
1272 case GVE_FLOW_RULE_QUERY_RULES:
1273 rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rules_cache);
1274
1275 memcpy(flow_rules_cache->rules_cache, rule_info, rule_info_len);
1276 flow_rules_cache->rules_cache_num = num_queried_rules;
1277 break;
1278 case GVE_FLOW_RULE_QUERY_IDS:
1279 rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rule_ids_cache);
1280
1281 memcpy(flow_rules_cache->rule_ids_cache, rule_info, rule_info_len);
1282 flow_rules_cache->rule_ids_cache_num = num_queried_rules;
1283 break;
1284 case GVE_FLOW_RULE_QUERY_STATS:
1285 priv->num_flow_rules = be32_to_cpu(descriptor->num_flow_rules);
1286 priv->max_flow_rules = be32_to_cpu(descriptor->max_flow_rules);
1287 return 0;
1288 default:
1289 return -EINVAL;
1290 }
1291
1292 return 0;
1293 }
1294
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Tue, May 7, 2024 at 11:24 PM David Wei <[email protected]> wrote:
>
> On 2024-05-07 15:59, Ziwei Xiao wrote:
> > From: Jeroen de Borst <[email protected]>
> >
> > Adding new adminq commands for the driver to configure and query flow
> > rules that are stored in the device. Flow steering rules are assigned
> > with a location that determines the relative order of the rules.
> >
> > Flow rules can run up to an order of millions. In such cases, storing
> > a full copy of the rules in the driver to prepare for the ethtool query
> > is infeasible while querying them from the device is better. That needs
> > to be optimized too so that we don't send a lot of adminq commands. The
> > solution here is to store a limited number of rules/rule ids in the
> > driver in a cache. This patch uses dma_pool to allocate 4k bytes which
> > lets device write at most 46 flow rules(4k/88) or 1k rule ids(4096/4)
> > at a time.
> >
> > For configuring flow rules, there are 3 sub-commands:
> > - ADD which adds a rule at the location supplied
> > - DEL which deletes the rule at the location supplied
> > - RESET which clears all currently active rules in the device
> >
> > For querying flow rules, there are also 3 sub-commands:
> > - QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
> > the allocated cache after querying the device
> > - QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
> > rule_ids in the allocated cache after querying the device
> > - QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
> > device's current flow rule number and the supported max flow rule
> > limit
> >
> > Signed-off-by: Jeroen de Borst <[email protected]>
> > Co-developed-by: Ziwei Xiao <[email protected]>
> > Signed-off-by: Ziwei Xiao <[email protected]>
> > Reviewed-by: Praveen Kaligineedi <[email protected]>
> > Reviewed-by: Harshitha Ramamurthy <[email protected]>
> > Reviewed-by: Willem de Bruijn <[email protected]>
> > ---
> > drivers/net/ethernet/google/gve/gve.h | 42 ++++++
> > drivers/net/ethernet/google/gve/gve_adminq.c | 133 ++++++++++++++++++
> > drivers/net/ethernet/google/gve/gve_adminq.h | 75 ++++++++++
> > drivers/net/ethernet/google/gve/gve_ethtool.c | 5 +-
> > drivers/net/ethernet/google/gve/gve_main.c | 54 ++++++-
> > 5 files changed, 307 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> > index 58213c15e084..355ae797eacf 100644
> > --- a/drivers/net/ethernet/google/gve/gve.h
> > +++ b/drivers/net/ethernet/google/gve/gve.h
> > @@ -60,6 +60,10 @@
> >
> > #define GVE_DEFAULT_RX_BUFFER_OFFSET 2048
> >
> > +#define GVE_FLOW_RULES_CACHE_SIZE (GVE_ADMINQ_BUFFER_SIZE / sizeof(struct gve_flow_rule))
> > +#define GVE_FLOW_RULE_IDS_CACHE_SIZE \
> > + (GVE_ADMINQ_BUFFER_SIZE / sizeof(((struct gve_flow_rule *)0)->location))
> > +
> > #define GVE_XDP_ACTIONS 5
> >
> > #define GVE_GQ_TX_MIN_PKT_DESC_BYTES 182
> > @@ -678,6 +682,39 @@ enum gve_queue_format {
> > GVE_DQO_QPL_FORMAT = 0x4,
> > };
> >
> > +struct gve_flow_spec {
> > + __be32 src_ip[4];
> > + __be32 dst_ip[4];
> > + union {
> > + struct {
> > + __be16 src_port;
> > + __be16 dst_port;
> > + };
> > + __be32 spi;
> > + };
> > + union {
> > + u8 tos;
> > + u8 tclass;
> > + };
> > +};
> > +
> > +struct gve_flow_rule {
> > + u32 location;
> > + u16 flow_type;
> > + u16 action;
> > + struct gve_flow_spec key;
> > + struct gve_flow_spec mask;
> > +};
> > +
> > +struct gve_flow_rules_cache {
> > + bool rules_cache_synced; /* False if the driver's rules_cache is outdated */
> > + struct gve_flow_rule *rules_cache;
> > + u32 *rule_ids_cache;
> > +	/* The total number of queried rules that are stored in the caches */
> > + u32 rules_cache_num;
> > + u32 rule_ids_cache_num;
> > +};
> > +
> > struct gve_priv {
> > struct net_device *dev;
> > struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
> > @@ -744,6 +781,8 @@ struct gve_priv {
> > u32 adminq_report_link_speed_cnt;
> > u32 adminq_get_ptype_map_cnt;
> > u32 adminq_verify_driver_compatibility_cnt;
> > + u32 adminq_query_flow_rules_cnt;
> > + u32 adminq_cfg_flow_rule_cnt;
> >
> > /* Global stats */
> > u32 interface_up_cnt; /* count of times interface turned up since last reset */
> > @@ -788,6 +827,9 @@ struct gve_priv {
> > bool header_split_enabled; /* True if the header split is enabled by the user */
> >
> > u32 max_flow_rules;
> > + u32 num_flow_rules; /* Current number of flow rules registered in the device */
> > +
> > + struct gve_flow_rules_cache flow_rules_cache;
> > };
> >
> > enum gve_service_task_flags_bit {
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> > index 85d0d742ad21..7a929e20cf96 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> > @@ -287,6 +287,8 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
> > priv->adminq_report_stats_cnt = 0;
> > priv->adminq_report_link_speed_cnt = 0;
> > priv->adminq_get_ptype_map_cnt = 0;
> > + priv->adminq_query_flow_rules_cnt = 0;
> > + priv->adminq_cfg_flow_rule_cnt = 0;
> >
> > /* Setup Admin queue with the device */
> > if (priv->pdev->revision < 0x1) {
> > @@ -526,6 +528,12 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
> > case GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY:
> > priv->adminq_verify_driver_compatibility_cnt++;
> > break;
> > + case GVE_ADMINQ_QUERY_FLOW_RULES:
> > + priv->adminq_query_flow_rules_cnt++;
> > + break;
> > + case GVE_ADMINQ_CONFIGURE_FLOW_RULE:
> > + priv->adminq_cfg_flow_rule_cnt++;
> > + break;
> > default:
> > dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
> > }
> > @@ -1188,3 +1196,128 @@ int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
> > ptype_map_bus);
> > return err;
> > }
> > +
> > +static int
> > +gve_adminq_configure_flow_rule(struct gve_priv *priv,
> > + struct gve_adminq_configure_flow_rule *flow_rule_cmd)
> > +{
> > + int err = gve_adminq_execute_extended_cmd(priv,
> > + GVE_ADMINQ_CONFIGURE_FLOW_RULE,
> > + sizeof(struct gve_adminq_configure_flow_rule),
> > + flow_rule_cmd);
> > +
> > + if (err) {
> > + dev_err(&priv->pdev->dev, "Timeout to configure the flow rule, trigger reset");
> > + gve_reset(priv, true);
> > + } else {
> > + priv->flow_rules_cache.rules_cache_synced = false;
> > + }
> > +
> > + return err;
> > +}
> > +
> > +int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc)
> > +{
> > + struct gve_adminq_configure_flow_rule flow_rule_cmd = {
> > + .opcode = cpu_to_be16(GVE_RULE_ADD),
> > + .location = cpu_to_be32(loc),
> > + .rule = *rule,
> > + };
> > +
> > + return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
> > +}
> > +
> > +int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc)
> > +{
> > + struct gve_adminq_configure_flow_rule flow_rule_cmd = {
> > + .opcode = cpu_to_be16(GVE_RULE_DEL),
> > + .location = cpu_to_be32(loc),
> > + };
> > +
> > + return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
> > +}
> > +
> > +int gve_adminq_reset_flow_rules(struct gve_priv *priv)
> > +{
> > + struct gve_adminq_configure_flow_rule flow_rule_cmd = {
> > + .opcode = cpu_to_be16(GVE_RULE_RESET),
> > + };
> > +
> > + return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
> > +}
> > +
> > +/* In the dma memory that the driver allocated for the device to query the flow rules, the device
> > + * will first write it with a struct of gve_query_flow_rules_descriptor. Next to it, the device
> > + * will write an array of rules or rule ids with the count that specified in the descriptor.
> > + * For GVE_FLOW_RULE_QUERY_STATS, the device will only write the descriptor.
> > + */
> > +static int gve_adminq_process_flow_rules_query(struct gve_priv *priv, u16 query_opcode,
> > + struct gve_query_flow_rules_descriptor *descriptor)
> > +{
> > + struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
> > + u32 num_queried_rules, total_memory_len, rule_info_len;
> > + void *descriptor_end, *rule_info;
> > +
> > + total_memory_len = be32_to_cpu(descriptor->total_length);
> > + if (total_memory_len > GVE_ADMINQ_BUFFER_SIZE) {
> > + dev_err(&priv->dev->dev, "flow rules query is out of memory.\n");
>
> The error doesn't seem to match the inequality used. Also, how can the
> HW write more than GVE_ADMINQ_BUFFER_SIZE?
Will remove this check and add another check as suggested below.
>
> > + return -ENOMEM;
> > + }
> > +
> > + num_queried_rules = be32_to_cpu(descriptor->num_queried_rules);
> > + descriptor_end = (void *)descriptor + total_memory_len;
>
> This isn't being used.
Will remove it. Thank you!
>
> > + rule_info = (void *)(descriptor + 1);
> > +
> > + switch (query_opcode) {
> > + case GVE_FLOW_RULE_QUERY_RULES:
> > + rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rules_cache);
>
> Do you need to verify what the HW has written matches your expectations
> i.e. is sizeof(*descriptor) + rule_info_len == total_memory_len?
Will add a check here. Thanks for the suggestion!
>
> > +
> > + memcpy(flow_rules_cache->rules_cache, rule_info, rule_info_len);
> > + flow_rules_cache->rules_cache_num = num_queried_rules;
> > + break;
> > + case GVE_FLOW_RULE_QUERY_IDS:
> > + rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rule_ids_cache);
> > +
> > + memcpy(flow_rules_cache->rule_ids_cache, rule_info, rule_info_len);
> > + flow_rules_cache->rule_ids_cache_num = num_queried_rules;
> > + break;
> > + case GVE_FLOW_RULE_QUERY_STATS:
> > + priv->num_flow_rules = be32_to_cpu(descriptor->num_flow_rules);
> > + priv->max_flow_rules = be32_to_cpu(descriptor->max_flow_rules);
> > + return 0;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc)
> > +{
> > + struct gve_query_flow_rules_descriptor *descriptor;
> > + union gve_adminq_command cmd;
> > + dma_addr_t descriptor_bus;
> > + int err = 0;
> > +
> > + memset(&cmd, 0, sizeof(cmd));
> > + descriptor = dma_pool_alloc(priv->adminq_pool, GFP_KERNEL, &descriptor_bus);
>
> Why is adminq_pool only used for 2 commands?
>
The adminq_pool is not newly added; it's also used for the other adminq commands.
> > + if (!descriptor)
> > + return -ENOMEM;
> > +
> > + cmd.opcode = cpu_to_be32(GVE_ADMINQ_QUERY_FLOW_RULES);
> > + cmd.query_flow_rules = (struct gve_adminq_query_flow_rules) {
> > + .opcode = cpu_to_be16(query_opcode),
> > + .starting_rule_id = cpu_to_be32(starting_loc),
> > + .available_length = cpu_to_be64(GVE_ADMINQ_BUFFER_SIZE),
> > + .rule_descriptor_addr = cpu_to_be64(descriptor_bus),
> > + };
> > + err = gve_adminq_execute_cmd(priv, &cmd);
> > + if (err)
> > + goto out;
> > +
> > + err = gve_adminq_process_flow_rules_query(priv, query_opcode, descriptor);
> > +
> > +out:
> > + dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
> > + return err;
> > +}
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> > index e64a0e72e781..9ddb72f92197 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> > @@ -25,11 +25,21 @@ enum gve_adminq_opcodes {
> > GVE_ADMINQ_REPORT_LINK_SPEED = 0xD,
> > GVE_ADMINQ_GET_PTYPE_MAP = 0xE,
> > GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY = 0xF,
> > + GVE_ADMINQ_QUERY_FLOW_RULES = 0x10,
> >
> > /* For commands that are larger than 56 bytes */
> > GVE_ADMINQ_EXTENDED_COMMAND = 0xFF,
> > };
> >
> > +/* The normal adminq command is restricted to be 56 bytes at maximum. For the
> > + * longer adminq command, it is wrapped by GVE_ADMINQ_EXTENDED_COMMAND with
> > + * inner opcode of gve_adminq_extended_cmd_opcodes specified. The inner command
> > + * is written in the dma memory allocated by GVE_ADMINQ_EXTENDED_COMMAND.
> > + */
> > +enum gve_adminq_extended_cmd_opcodes {
> > + GVE_ADMINQ_CONFIGURE_FLOW_RULE = 0x101,
> > +};
> > +
> > /* Admin queue status codes */
> > enum gve_adminq_statuses {
> > GVE_ADMINQ_COMMAND_UNSET = 0x0,
> > @@ -434,6 +444,66 @@ struct gve_adminq_get_ptype_map {
> > __be64 ptype_map_addr;
> > };
> >
> > +/* Flow-steering related definitions */
> > +enum gve_adminq_flow_rule_cfg_opcode {
> > + GVE_RULE_ADD = 0,
> > + GVE_RULE_DEL = 1,
> > + GVE_RULE_RESET = 2,
> > +};
>
> Could these be more descriptive?
>
Could you be more specific about what needs to be improved? Is it the enum
name or the field names?
> > +
> > +enum gve_adminq_flow_rule_query_opcode {
> > + GVE_FLOW_RULE_QUERY_RULES = 0,
> > + GVE_FLOW_RULE_QUERY_IDS = 1,
> > + GVE_FLOW_RULE_QUERY_STATS = 2,
> > +};
> > +
> > +enum gve_adminq_flow_type {
> > + GVE_FLOW_TYPE_TCPV4,
> > + GVE_FLOW_TYPE_UDPV4,
> > + GVE_FLOW_TYPE_SCTPV4,
> > + GVE_FLOW_TYPE_AHV4,
> > + GVE_FLOW_TYPE_ESPV4,
> > + GVE_FLOW_TYPE_TCPV6,
> > + GVE_FLOW_TYPE_UDPV6,
> > + GVE_FLOW_TYPE_SCTPV6,
> > + GVE_FLOW_TYPE_AHV6,
> > + GVE_FLOW_TYPE_ESPV6,
> > +};
> > +
> > +/* Flow-steering command */
> > +struct gve_adminq_flow_rule {
> > + __be16 flow_type;
> > + __be16 action; /* RX queue id */
> > + struct gve_flow_spec key;
> > + struct gve_flow_spec mask;
> > +};
> > +
> > +struct gve_adminq_configure_flow_rule {
> > + __be16 opcode;
> > + u8 padding[2];
> > + struct gve_adminq_flow_rule rule;
> > + __be32 location;
> > +};
> > +
> > +static_assert(sizeof(struct gve_adminq_configure_flow_rule) == 92);
> > +
> > +struct gve_query_flow_rules_descriptor {
> > + __be32 num_flow_rules; /* Current rule counts stored in the device */
> > + __be32 max_flow_rules;
> > + __be32 num_queried_rules;
>
> nit: more comments here are appreciated.
Will add
>
> > + __be32 total_length; /* The memory length that the device writes */
> > +};
> > +
> > +struct gve_adminq_query_flow_rules {
> > + __be16 opcode;
> > + u8 padding[2];
> > + __be32 starting_rule_id;
> > + __be64 available_length; /* The dma memory length that the driver allocated */
> > + __be64 rule_descriptor_addr; /* The dma memory address */
> > +};
> > +
> > +static_assert(sizeof(struct gve_adminq_query_flow_rules) == 24);
> > +
> > union gve_adminq_command {
> > struct {
> > __be32 opcode;
> > @@ -454,6 +524,7 @@ union gve_adminq_command {
> > struct gve_adminq_get_ptype_map get_ptype_map;
> > struct gve_adminq_verify_driver_compatibility
> > verify_driver_compatibility;
> > + struct gve_adminq_query_flow_rules query_flow_rules;
> > struct gve_adminq_extended_command extended_command;
> > };
> > };
> > @@ -488,6 +559,10 @@ int gve_adminq_verify_driver_compatibility(struct gve_priv *priv,
> > u64 driver_info_len,
> > dma_addr_t driver_info_addr);
> > int gve_adminq_report_link_speed(struct gve_priv *priv);
> > +int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc);
> > +int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc);
> > +int gve_adminq_reset_flow_rules(struct gve_priv *priv);
> > +int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc);
> >
> > struct gve_ptype_lut;
> > int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
> > diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> > index 156b7e128b53..02cee7e0e229 100644
> > --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> > +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> > @@ -74,7 +74,8 @@ static const char gve_gstrings_adminq_stats[][ETH_GSTRING_LEN] = {
> > "adminq_create_tx_queue_cnt", "adminq_create_rx_queue_cnt",
> > "adminq_destroy_tx_queue_cnt", "adminq_destroy_rx_queue_cnt",
> > "adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
> > - "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt"
> > + "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt",
> > + "adminq_query_flow_rules", "adminq_cfg_flow_rule",
> > };
> >
> > static const char gve_gstrings_priv_flags[][ETH_GSTRING_LEN] = {
> > @@ -458,6 +459,8 @@ gve_get_ethtool_stats(struct net_device *netdev,
> > data[i++] = priv->adminq_report_stats_cnt;
> > data[i++] = priv->adminq_report_link_speed_cnt;
> > data[i++] = priv->adminq_get_ptype_map_cnt;
> > + data[i++] = priv->adminq_query_flow_rules_cnt;
> > + data[i++] = priv->adminq_cfg_flow_rule_cnt;
> > }
> >
> > static void gve_get_channels(struct net_device *netdev,
> > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> > index cabf7d4bcecb..eb435ccbe98e 100644
> > --- a/drivers/net/ethernet/google/gve/gve_main.c
> > +++ b/drivers/net/ethernet/google/gve/gve_main.c
> > @@ -141,6 +141,52 @@ static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
> > }
> > }
> >
> > +static int gve_alloc_flow_rule_caches(struct gve_priv *priv)
> > +{
> > + struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
> > + int err = 0;
> > +
> > + if (!priv->max_flow_rules)
> > + return 0;
> > +
> > + flow_rules_cache->rules_cache =
> > + kvcalloc(GVE_FLOW_RULES_CACHE_SIZE, sizeof(*flow_rules_cache->rules_cache),
> > + GFP_KERNEL);
> > + if (!flow_rules_cache->rules_cache) {
> > + dev_err(&priv->pdev->dev, "Cannot alloc flow rules cache\n");
> > + return -ENOMEM;
> > + }
> > +
> > + flow_rules_cache->rule_ids_cache =
> > + kvcalloc(GVE_FLOW_RULE_IDS_CACHE_SIZE, sizeof(*flow_rules_cache->rule_ids_cache),
> > + GFP_KERNEL);
> > + if (!flow_rules_cache->rule_ids_cache) {
> > + dev_err(&priv->pdev->dev, "Cannot alloc flow rule ids cache\n");
> > + err = -ENOMEM;
> > + goto free_rules_cache;
> > + }
> > +
> > + return 0;
> > +
> > +free_rules_cache:
> > + kvfree(flow_rules_cache->rules_cache);
> > + flow_rules_cache->rules_cache = NULL;
> > + return err;
> > +}
> > +
> > +static void gve_free_flow_rule_caches(struct gve_priv *priv)
> > +{
> > + struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
> > +
> > + if (!priv->max_flow_rules)
> > + return;
>
> Is this needed? Is kernel style just kvfree() w/o checks?
>
Will remove
> > +
> > + kvfree(flow_rules_cache->rule_ids_cache);
> > + flow_rules_cache->rule_ids_cache = NULL;
> > + kvfree(flow_rules_cache->rules_cache);
> > + flow_rules_cache->rules_cache = NULL;
> > +}
> > +
> > static int gve_alloc_counter_array(struct gve_priv *priv)
> > {
> > priv->counter_array =
> > @@ -521,9 +567,12 @@ static int gve_setup_device_resources(struct gve_priv *priv)
> > {
> > int err;
> >
> > - err = gve_alloc_counter_array(priv);
> > + err = gve_alloc_flow_rule_caches(priv);
> > if (err)
> > return err;
> > + err = gve_alloc_counter_array(priv);
> > + if (err)
> > + goto abort_with_flow_rule_caches;
> > err = gve_alloc_notify_blocks(priv);
> > if (err)
> > goto abort_with_counter;
> > @@ -575,6 +624,8 @@ static int gve_setup_device_resources(struct gve_priv *priv)
> > gve_free_notify_blocks(priv);
> > abort_with_counter:
> > gve_free_counter_array(priv);
> > +abort_with_flow_rule_caches:
> > + gve_free_flow_rule_caches(priv);
> >
> > return err;
> > }
> > @@ -606,6 +657,7 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
> > kvfree(priv->ptype_lut_dqo);
> > priv->ptype_lut_dqo = NULL;
> >
> > + gve_free_flow_rule_caches(priv);
> > gve_free_counter_array(priv);
> > gve_free_notify_blocks(priv);
> > gve_free_stats_report(priv);
On Tue, May 7, 2024 at 10:34 PM David Wei <[email protected]> wrote:
>
> On 2024-05-07 15:59, Ziwei Xiao wrote:
> > From: Jeroen de Borst <[email protected]>
> >
> > Add a new device option to signal to the driver that the device supports
> > flow steering. This device option also carries the maximum number of
> > flow steering rules that the device can store.
>
> Other than superficial style choices, looks good.
>
> >
> > Signed-off-by: Jeroen de Borst <[email protected]>
> > Co-developed-by: Ziwei Xiao <[email protected]>
> > Signed-off-by: Ziwei Xiao <[email protected]>
> > Reviewed-by: Praveen Kaligineedi <[email protected]>
> > Reviewed-by: Harshitha Ramamurthy <[email protected]>
> > Reviewed-by: Willem de Bruijn <[email protected]>
> > ---
> > drivers/net/ethernet/google/gve/gve.h | 2 +
> > drivers/net/ethernet/google/gve/gve_adminq.c | 42 ++++++++++++++++++--
> > drivers/net/ethernet/google/gve/gve_adminq.h | 11 +++++
> > 3 files changed, 51 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> > index ca7fce17f2c0..58213c15e084 100644
> > --- a/drivers/net/ethernet/google/gve/gve.h
> > +++ b/drivers/net/ethernet/google/gve/gve.h
> > @@ -786,6 +786,8 @@ struct gve_priv {
> >
> > u16 header_buf_size; /* device configured, header-split supported if non-zero */
> > bool header_split_enabled; /* True if the header split is enabled by the user */
> > +
> > + u32 max_flow_rules;
>
> nit: this struct is lovingly documented; could we continue that by adding a
> one-liner here about how this field is device configured?
>
Will add.
> > };
> >
> > enum gve_service_task_flags_bit {
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> > index 514641b3ccc7..85d0d742ad21 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> > @@ -44,6 +44,7 @@ void gve_parse_device_option(struct gve_priv *priv,
> > struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
> > struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
> > struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
> > + struct gve_device_option_flow_steering **dev_op_flow_steering,
>
> nit: getting unwieldy here, is it time to pack into a struct?
Thank you for pointing this out! We have plans to improve this device
option part, but that improvement may not be able to land in this patch series.
On Tue, May 7, 2024 at 10:33 PM David Wei <[email protected]> wrote:
>
> On 2024-05-07 15:59, Ziwei Xiao wrote:
> > From: Jeroen de Borst <[email protected]>
> >
> > Add a new device option to signal to the driver that the device supports
> > flow steering. This device option also carries the maximum number of
> > flow steering rules that the device can store.
> >
> > Signed-off-by: Jeroen de Borst <[email protected]>
> > Co-developed-by: Ziwei Xiao <[email protected]>
> > Signed-off-by: Ziwei Xiao <[email protected]>
> > Reviewed-by: Praveen Kaligineedi <[email protected]>
> > Reviewed-by: Harshitha Ramamurthy <[email protected]>
> > Reviewed-by: Willem de Bruijn <[email protected]>
> > ---
> > drivers/net/ethernet/google/gve/gve.h | 2 +
> > drivers/net/ethernet/google/gve/gve_adminq.c | 42 ++++++++++++++++++--
> > drivers/net/ethernet/google/gve/gve_adminq.h | 11 +++++
> > 3 files changed, 51 insertions(+), 4 deletions(-)
>
> I think something went wrong here. The title is different but the patch is
> the same as 2/5.
This is the patch for adding the device option (3/5), while the
previous patch you commented on is actually for adding the extended
adminq command (2/5). I don't see anything wrong with these two patches.
Maybe the reply went to the wrong thread?
>
> >
> > diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
> > index ca7fce17f2c0..58213c15e084 100644
> > --- a/drivers/net/ethernet/google/gve/gve.h
> > +++ b/drivers/net/ethernet/google/gve/gve.h
> > @@ -786,6 +786,8 @@ struct gve_priv {
> >
> > u16 header_buf_size; /* device configured, header-split supported if non-zero */
> > bool header_split_enabled; /* True if the header split is enabled by the user */
> > +
> > + u32 max_flow_rules;
> > };
> >
> > enum gve_service_task_flags_bit {
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> > index 514641b3ccc7..85d0d742ad21 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> > @@ -44,6 +44,7 @@ void gve_parse_device_option(struct gve_priv *priv,
> > struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
> > struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
> > struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
> > + struct gve_device_option_flow_steering **dev_op_flow_steering,
> > struct gve_device_option_modify_ring **dev_op_modify_ring)
> > {
> > u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
> > @@ -189,6 +190,23 @@ void gve_parse_device_option(struct gve_priv *priv,
> > if (option_length == GVE_DEVICE_OPTION_NO_MIN_RING_SIZE)
> > priv->default_min_ring_size = true;
> > break;
> > + case GVE_DEV_OPT_ID_FLOW_STEERING:
> > + if (option_length < sizeof(**dev_op_flow_steering) ||
> > + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING) {
> > + dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT,
> > + "Flow Steering",
> > + (int)sizeof(**dev_op_flow_steering),
> > + GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING,
> > + option_length, req_feat_mask);
> > + break;
> > + }
> > +
> > + if (option_length > sizeof(**dev_op_flow_steering))
> > + dev_warn(&priv->pdev->dev,
> > + GVE_DEVICE_OPTION_TOO_BIG_FMT,
> > + "Flow Steering");
> > + *dev_op_flow_steering = (void *)(option + 1);
> > + break;
> > default:
> > /* If we don't recognize the option just continue
> > * without doing anything.
> > @@ -208,6 +226,7 @@ gve_process_device_options(struct gve_priv *priv,
> > struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
> > struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
> > struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
> > + struct gve_device_option_flow_steering **dev_op_flow_steering,
> > struct gve_device_option_modify_ring **dev_op_modify_ring)
> > {
> > const int num_options = be16_to_cpu(descriptor->num_device_options);
> > @@ -230,7 +249,7 @@ gve_process_device_options(struct gve_priv *priv,
> > dev_op_gqi_rda, dev_op_gqi_qpl,
> > dev_op_dqo_rda, dev_op_jumbo_frames,
> > dev_op_dqo_qpl, dev_op_buffer_sizes,
> > - dev_op_modify_ring);
> > + dev_op_flow_steering, dev_op_modify_ring);
> > dev_opt = next_opt;
> > }
> >
> > @@ -838,6 +857,8 @@ static void gve_enable_supported_features(struct gve_priv *priv,
> > *dev_op_dqo_qpl,
> > const struct gve_device_option_buffer_sizes
> > *dev_op_buffer_sizes,
> > + const struct gve_device_option_flow_steering
> > + *dev_op_flow_steering,
> > const struct gve_device_option_modify_ring
> > *dev_op_modify_ring)
> > {
> > @@ -890,10 +911,22 @@ static void gve_enable_supported_features(struct gve_priv *priv,
> > priv->min_tx_desc_cnt = be16_to_cpu(dev_op_modify_ring->min_tx_ring_size);
> > }
> > }
> > +
> > + if (dev_op_flow_steering &&
> > + (supported_features_mask & GVE_SUP_FLOW_STEERING_MASK)) {
> > + if (dev_op_flow_steering->max_flow_rules) {
> > + priv->max_flow_rules =
> > + be32_to_cpu(dev_op_flow_steering->max_flow_rules);
> > + dev_info(&priv->pdev->dev,
> > + "FLOW STEERING device option enabled with max rule limit of %u.\n",
> > + priv->max_flow_rules);
> > + }
> > + }
> > }
> >
> > int gve_adminq_describe_device(struct gve_priv *priv)
> > {
> > + struct gve_device_option_flow_steering *dev_op_flow_steering = NULL;
> > struct gve_device_option_buffer_sizes *dev_op_buffer_sizes = NULL;
> > struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
> > struct gve_device_option_modify_ring *dev_op_modify_ring = NULL;
> > @@ -930,6 +963,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
> > &dev_op_gqi_qpl, &dev_op_dqo_rda,
> > &dev_op_jumbo_frames, &dev_op_dqo_qpl,
> > &dev_op_buffer_sizes,
> > + &dev_op_flow_steering,
> > &dev_op_modify_ring);
> > if (err)
> > goto free_device_descriptor;
> > @@ -969,9 +1003,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
> > /* set default descriptor counts */
> > gve_set_default_desc_cnt(priv, descriptor);
> >
> > - /* DQO supports LRO. */
> > if (!gve_is_gqi(priv))
> > - priv->dev->hw_features |= NETIF_F_LRO;
> > + priv->dev->hw_features |= NETIF_F_LRO | NETIF_F_NTUPLE;
> >
> > priv->max_registered_pages =
> > be64_to_cpu(descriptor->max_registered_pages);
> > @@ -991,7 +1024,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
> >
> > gve_enable_supported_features(priv, supported_features_mask,
> > dev_op_jumbo_frames, dev_op_dqo_qpl,
> > - dev_op_buffer_sizes, dev_op_modify_ring);
> > + dev_op_buffer_sizes, dev_op_flow_steering,
> > + dev_op_modify_ring);
> >
> > free_device_descriptor:
> > dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
> > diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
> > index e0370ace8397..e64a0e72e781 100644
> > --- a/drivers/net/ethernet/google/gve/gve_adminq.h
> > +++ b/drivers/net/ethernet/google/gve/gve_adminq.h
> > @@ -146,6 +146,14 @@ struct gve_device_option_modify_ring {
> >
> > static_assert(sizeof(struct gve_device_option_modify_ring) == 12);
> >
> > +struct gve_device_option_flow_steering {
> > + __be32 supported_features_mask;
> > + __be32 reserved;
> > + __be32 max_flow_rules;
> > +};
> > +
> > +static_assert(sizeof(struct gve_device_option_flow_steering) == 12);
> > +
> > /* Terminology:
> > *
> > * RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
> > @@ -163,6 +171,7 @@ enum gve_dev_opt_id {
> > GVE_DEV_OPT_ID_DQO_QPL = 0x7,
> > GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
> > GVE_DEV_OPT_ID_BUFFER_SIZES = 0xa,
> > + GVE_DEV_OPT_ID_FLOW_STEERING = 0xb,
> > };
> >
> > enum gve_dev_opt_req_feat_mask {
> > @@ -174,12 +183,14 @@ enum gve_dev_opt_req_feat_mask {
> > GVE_DEV_OPT_REQ_FEAT_MASK_DQO_QPL = 0x0,
> > GVE_DEV_OPT_REQ_FEAT_MASK_BUFFER_SIZES = 0x0,
> > GVE_DEV_OPT_REQ_FEAT_MASK_MODIFY_RING = 0x0,
> > + GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING = 0x0,
> > };
> >
> > enum gve_sup_feature_mask {
> > GVE_SUP_MODIFY_RING_MASK = 1 << 0,
> > GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
> > GVE_SUP_BUFFER_SIZES_MASK = 1 << 4,
> > + GVE_SUP_FLOW_STEERING_MASK = 1 << 5,
> > };
> >
> > #define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
On Wed, May 8, 2024 at 7:09 AM Markus Elfring <[email protected]> wrote:
>
> …
> > +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> …
> > +static int gve_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, u32 *rule_locs)
> > +{
> > + struct gve_priv *priv = netdev_priv(netdev);
> > + int err = 0;
> > +
> > + dev_hold(netdev);
> > + rtnl_unlock();
> …
> > +out:
> > + rtnl_lock();
> > + dev_put(netdev);
> > + return err;
> > +}
> …
>
> What do you think about increasing the application of scope-based resource management
> at source code places like these?
>
Is the suggestion to combine dev_hold(netdev) together with
rtnl_unlock()? If so, I think there might be different usage patterns
for rtnl_unlock. For example, some drivers will call rtnl_unlock
after dev_close(netdev). Please correct me if I'm wrong. Thank you!
> Regards,
> Markus
On Thu, 9 May 2024 17:19:00 -0700 Ziwei Xiao wrote:
> > What do you think about increasing the application of scope-based resource management
> > at source code places like these?
> >
> Is the suggestion to combine dev_hold(netdev) together with
> rtnl_unlock()? If so, I think there might be different usage patterns
> for rtnl_unlock. For example, some drivers will call rtnl_unlock
> after dev_close(netdev). Please correct me if I'm wrong. Thank you!
We are rather cautious about adoption of the scope-based resource
management in networking. Don't let Markus lead you astray.
>> …
>>> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
>> …
>>> +static int gve_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, u32 *rule_locs)
>>> +{
>>> + struct gve_priv *priv = netdev_priv(netdev);
>>> + int err = 0;
>>> +
>>> + dev_hold(netdev);
>>> + rtnl_unlock();
>> …
>>> +out:
>>> + rtnl_lock();
>>> + dev_put(netdev);
>>> + return err;
>>> +}
>> …
>>
>> What do you think about increasing the application of scope-based resource management
>> at source code places like these?
>>
> Is the suggestion to combine dev_hold(netdev) together with
> rtnl_unlock()? If so, I think there might be different usage patterns
> for rtnl_unlock. For example, some drivers will call rtnl_unlock
> after dev_close(netdev). Please correct me if I'm wrong.
How much collateral evolution have you noticed based on information from
the article “Scope-based resource management for the kernel”
by Jonathan Corbet (2023-06-15)?
https://lwn.net/Articles/934679/
Would you be interested in extending the usage of resource guards accordingly?
https://elixir.bootlin.com/linux/v6.9-rc7/source/include/linux/cleanup.h#L124
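A minimal sketch of the idea, assuming only the DEFINE_FREE()/__free()
helpers from the file above (the “netdev_ref” guard name is illustrative,
not an existing kernel definition):

	#include <linux/cleanup.h>
	#include <linux/netdevice.h>

	/* hypothetical: drop the device reference at end of scope */
	DEFINE_FREE(netdev_ref, struct net_device *, if (_T) dev_put(_T))

	static int gve_get_rxnfc(struct net_device *netdev,
				 struct ethtool_rxnfc *cmd, u32 *rule_locs)
	{
		struct net_device *dev __free(netdev_ref) = netdev;
		int err = 0;

		dev_hold(dev);
		rtnl_unlock();
		/* ... flow rule work under the adminq mutex ... */
		rtnl_lock();
		return err;	/* dev_put(dev) runs automatically here */
	}

Note the inverted rtnl_unlock()/rtnl_lock() pair would still have to stay
manual; the guards in cleanup.h only model acquire-then-release scopes.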
Regards,
Markus
On Tue, May 07, 2024 at 10:59:42PM +0000, Ziwei Xiao wrote:
> From: Jeroen de Borst <[email protected]>
>
> The adminq command is limited to 64 bytes per entry, with at most 56
> bytes for the command itself. To support larger commands, we need to
> dma_alloc separate memory, place the command in that memory, and send
> the dma memory address instead of the actual command.
>
> This change introduces an extended adminq command that wraps the real
> command, carrying the inner opcode and the dma address of the allocated
> memory. Once the device receives it, it can fetch the real command from
> the given dma memory address. Per the device interface design, all
> extended commands use inner opcodes larger than 0xFF.
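> For illustration, the wrapper layout is roughly the following (see
> gve_adminq.h in this patch for the authoritative definition):
>
> 	struct gve_adminq_extended_command {
> 		__be32 inner_opcode;	/* real opcode, always > 0xFF */
> 		__be32 inner_length;	/* length of the out-of-line command */
> 		__be64 inner_command_addr; /* dma address of that command */
> 	};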
>
> Signed-off-by: Jeroen de Borst <[email protected]>
> Co-developed-by: Ziwei Xiao <[email protected]>
> Signed-off-by: Ziwei Xiao <[email protected]>
> Reviewed-by: Praveen Kaligineedi <[email protected]>
> Reviewed-by: Harshitha Ramamurthy <[email protected]>
> Reviewed-by: Willem de Bruijn <[email protected]>
> ---
> drivers/net/ethernet/google/gve/gve_adminq.c | 31 ++++++++++++++++++++
> drivers/net/ethernet/google/gve/gve_adminq.h | 12 ++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 2c3ec5c3b114..514641b3ccc7 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -461,6 +461,8 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
>
> memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
> opcode = be32_to_cpu(READ_ONCE(cmd->opcode));
> + if (opcode == GVE_ADMINQ_EXTENDED_COMMAND)
> + opcode = be32_to_cpu(cmd->extended_command.inner_opcode);
>
> switch (opcode) {
> case GVE_ADMINQ_DESCRIBE_DEVICE:
> @@ -537,6 +539,35 @@ static int gve_adminq_execute_cmd(struct gve_priv *priv,
> return err;
> }
>
> +static int gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
> + size_t cmd_size, void *cmd_orig)
Hi Ziwei Xiao and Jeroen,
As of this patch, gve_adminq_execute_extended_cmd() is defined but unused,
which causes an error when compiling with W=1 using gcc-13 or clang-18.
Perhaps it would be better to squash this patch into the patch that
uses gve_adminq_execute_extended_cmd.
..
On Tue, May 07, 2024 at 10:59:45PM +0000, Ziwei Xiao wrote:
> From: Jeroen de Borst <[email protected]>
>
> Implement the ethtool commands that can be used to configure and query
> flow-steering rules. For these ethtool commands, the driver temporarily
> drops the rtnl lock to reduce the latency of flow steering commands
> running on separate NICs; it is then protected by the newly added
> adminq lock.
>
> A large part of this change consists of translating the ethtool
> representation of 'ntuples' to our internal gve_flow_rule and vice versa
> in the newly created gve_flow_rule.c.
>
> Considering the possibly large number of flow rules, the driver doesn't
> store all the rules locally. When the user runs 'ethtool -n <nic>' to
> check the registered rules, the driver sends adminq commands to query a
> limited number of rules/rule ids (filling a 4096-byte dma memory region)
> at a time, as a cache for the ethtool queries. The adminq query commands
> are repeated until ethtool has queried all the needed rules.
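> As a sketch of that flow (helper and constant names assumed from this
> series; the real code is in gve_flow_rule.c):
>
> 	/* cache miss: have the device rewrite the 4k cache starting at
> 	 * the requested rule location, then serve ethtool from it */
> 	err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_RULES,
> 					  fsp->location);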
>
> Signed-off-by: Jeroen de Borst <[email protected]>
> Co-developed-by: Ziwei Xiao <[email protected]>
> Signed-off-by: Ziwei Xiao <[email protected]>
> Reviewed-by: Praveen Kaligineedi <[email protected]>
> Reviewed-by: Harshitha Ramamurthy <[email protected]>
> Reviewed-by: Willem de Bruijn <[email protected]>
..
> diff --git a/drivers/net/ethernet/google/gve/gve_flow_rule.c b/drivers/net/ethernet/google/gve/gve_flow_rule.c
> new file mode 100644
> index 000000000000..1cafd520f2db
> --- /dev/null
> +++ b/drivers/net/ethernet/google/gve/gve_flow_rule.c
> @@ -0,0 +1,296 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +/* Google virtual Ethernet (gve) driver
> + *
> + * Copyright (C) 2015-2024 Google LLC
> + */
> +
> +#include "gve.h"
> +#include "gve_adminq.h"
> +
> +static
> +int gve_fill_ethtool_flow_spec(struct ethtool_rx_flow_spec *fsp, struct gve_flow_rule *rule)
> +{
> + static const u16 flow_type_lut[] = {
> + [GVE_FLOW_TYPE_TCPV4] = TCP_V4_FLOW,
> + [GVE_FLOW_TYPE_UDPV4] = UDP_V4_FLOW,
> + [GVE_FLOW_TYPE_SCTPV4] = SCTP_V4_FLOW,
> + [GVE_FLOW_TYPE_AHV4] = AH_V4_FLOW,
> + [GVE_FLOW_TYPE_ESPV4] = ESP_V4_FLOW,
> + [GVE_FLOW_TYPE_TCPV6] = TCP_V6_FLOW,
> + [GVE_FLOW_TYPE_UDPV6] = UDP_V6_FLOW,
> + [GVE_FLOW_TYPE_SCTPV6] = SCTP_V6_FLOW,
> + [GVE_FLOW_TYPE_AHV6] = AH_V6_FLOW,
> + [GVE_FLOW_TYPE_ESPV6] = ESP_V6_FLOW,
> + };
> +
> + if (be16_to_cpu(rule->flow_type) >= ARRAY_SIZE(flow_type_lut))
The type of rule->flow_type is u16, but be16_to_cpu() expects a 16-bit
big-endian (__be16) value as its argument. This does not seem right.
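For example (sketch), if the field carries wire-format data it should be
declared as __be16 so the conversion is correct:

	__be16 flow_type;	/* matches the be16_to_cpu() call above */

or, if it is really host-endian, the conversion should simply be dropped:

	if (rule->flow_type >= ARRAY_SIZE(flow_type_lut))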
This was flagged by Sparse along with several other problems in this patch.
Please make sure patches don't introduce new Sparse warnings.
Thanks!
..
> +int gve_add_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
> +{
> + struct ethtool_rx_flow_spec *fsp = &cmd->fs;
> + struct gve_adminq_flow_rule *rule = NULL;
> + int err;
> +
> + if (!priv->max_flow_rules)
> + return -EOPNOTSUPP;
> +
> + rule = kvzalloc(sizeof(*rule), GFP_KERNEL);
> + if (!rule)
> + return -ENOMEM;
> +
> + err = gve_generate_flow_rule(priv, fsp, rule);
> + if (err)
> + goto out;
> +
> + err = gve_adminq_add_flow_rule(priv, rule, fsp->location);
> +
> +out:
> + kfree(rule);
rule was allocated using kvzalloc(), so it should be freed using kvfree().
Flagged by Coccinelle.
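I.e., a sketch of the fix:

	out:
		kvfree(rule);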
> + if (err)
> + dev_err(&priv->pdev->dev, "Failed to add the flow rule: %u", fsp->location);
> +
> + return err;
> +}
..
On 2024-05-09 17:18, Ziwei Xiao wrote:
> On Tue, May 7, 2024 at 11:24 PM David Wei <[email protected]> wrote:
>>
>> On 2024-05-07 15:59, Ziwei Xiao wrote:
[...]
>>> +/* Flow-steering related definitions */
>>> +enum gve_adminq_flow_rule_cfg_opcode {
>>> + GVE_RULE_ADD = 0,
>>> + GVE_RULE_DEL = 1,
>>> + GVE_RULE_RESET = 2,
>>> +};
>>
>> Could these be more descriptive?
>>
> Could you be more specific about what needs to be improved? Is it the
> enum name or the field names?
Sorry for the late response.
The enum field names. GVE_RULE_x is too sparse for me; which rule? To
match the rest of the file, maybe something like GVE_FLOW_RULE_CFG_x.
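I.e., something like this (sketch):

	enum gve_adminq_flow_rule_cfg_opcode {
		GVE_FLOW_RULE_CFG_ADD = 0,
		GVE_FLOW_RULE_CFG_DEL = 1,
		GVE_FLOW_RULE_CFG_RESET = 2,
	};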