LinuxLists.cc - [PATCH net-next v2 0/5] gve: Add flow steering support

2024-06-13 01:47:59

Subject: [PATCH net-next v2 0/5] gve: Add flow steering support

To support flow steering in GVE driver, there are two adminq changes
need to be made in advance.

The first one is adding adminq mutex lock, which is to allow the
incoming flow steering operations to be able to temporarily drop the
rtnl_lock to reduce the latency for registering flow rules among
several NICs at the same time.

The second one is to add the extended adminq command so that we can
support larger adminq command such as configure_flow_rule command. In
that patch, there is a new added function called
gve_adminq_execute_extended_cmd with the attribute of __maybe_unused.
That attribute will be removed in the third patch of this series where
it will use the previously unused function.

And the other three patches are needed for the actual flow steering
feature support in driver.

Jeroen de Borst (4):
gve: Add adminq extended command
gve: Add flow steering device option
gve: Add flow steering adminq commands
gve: Add flow steering ethtool support

Ziwei Xiao (1):
gve: Add adminq mutex lock

drivers/net/ethernet/google/gve/Makefile | 2 +-
drivers/net/ethernet/google/gve/gve.h | 54 +++-
drivers/net/ethernet/google/gve/gve_adminq.c | 230 +++++++++++++-
drivers/net/ethernet/google/gve/gve_adminq.h | 103 ++++++
drivers/net/ethernet/google/gve/gve_ethtool.c | 85 ++++-
.../net/ethernet/google/gve/gve_flow_rule.c | 298 ++++++++++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 83 ++++-
7 files changed, 830 insertions(+), 25 deletions(-)
create mode 100644 drivers/net/ethernet/google/gve/gve_flow_rule.c

--
2.45.2.627.g7a2c4fd464-goog

2024-06-13 01:48:15

by Ziwei Xiao

[permalink] [raw]

Subject: [PATCH net-next v2 1/5] gve: Add adminq mutex lock

We were depending on the rtnl_lock to make sure there is only one adminq
command running at a time. But some commands may take too long to hold
the rtnl_lock, such as the upcoming flow steering operations. For such
situations, it can temporarily drop the rtnl_lock, and replace it for
these operations with a new adminq lock, which can ensure the adminq
command execution to be thread-safe.

Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/gve.h | 1 +
drivers/net/ethernet/google/gve/gve_adminq.c | 22 +++++++++++---------
2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index ae1e21c9b0a5..ca7fce17f2c0 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -724,6 +724,7 @@ struct gve_priv {
union gve_adminq_command *adminq;
dma_addr_t adminq_bus_addr;
struct dma_pool *adminq_pool;
+ struct mutex adminq_lock; /* Protects adminq command execution */
u32 adminq_mask; /* masks prod_cnt to adminq size */
u32 adminq_prod_cnt; /* free-running count of AQ cmds executed */
u32 adminq_cmd_fail; /* free-running count of AQ cmds failed */
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 8ca0def176ef..2e0c1eb87b11 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -284,6 +284,7 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
&priv->reg_bar0->adminq_base_address_lo);
iowrite32be(GVE_DRIVER_STATUS_RUN_MASK, &priv->reg_bar0->driver_status);
}
+ mutex_init(&priv->adminq_lock);
gve_set_admin_queue_ok(priv);
return 0;
}
@@ -511,28 +512,29 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
return 0;
}

-/* This function is not threadsafe - the caller is responsible for any
- * necessary locks.
- * The caller is also responsible for making sure there are no commands
- * waiting to be executed.
- */
static int gve_adminq_execute_cmd(struct gve_priv *priv,
union gve_adminq_command *cmd_orig)
{
u32 tail, head;
int err;

+ mutex_lock(&priv->adminq_lock);
tail = ioread32be(&priv->reg_bar0->adminq_event_counter);
head = priv->adminq_prod_cnt;
- if (tail != head)
- // This is not a valid path
- return -EINVAL;
+ if (tail != head) {
+ err = -EINVAL;
+ goto out;
+ }

err = gve_adminq_issue_cmd(priv, cmd_orig);
if (err)
- return err;
+ goto out;

- return gve_adminq_kick_and_wait(priv);
+ err = gve_adminq_kick_and_wait(priv);
+
+out:
+ mutex_unlock(&priv->adminq_lock);
+ return err;
}

/* The device specifies that the management vector can either be the first irq
--
2.45.2.627.g7a2c4fd464-goog

2024-06-13 01:48:33

by Ziwei Xiao

[permalink] [raw]

Subject: [PATCH net-next v2 2/5] gve: Add adminq extended command

From: Jeroen de Borst <[email protected]>

The adminq command is limited to 64 bytes per entry and it's 56 bytes
for the command itself at maximum. To support larger commands, we need
to dma_alloc a separate memory to put the command in that memory and
send the dma memory address instead of the actual command.

Introduce an extended adminq command to wrap the real command with the
inner opcode and the allocated dma memory address specified. Once the
device receives it, it can get the real command from the given dma
memory address. As designed with the device, all the extended commands
will use inner opcode larger than 0xFF.

Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
Changes in v2:
- Update the commit message to use imperative mood
- Add the __maybe_unused attribute for the unused function of
gve_adminq_execute_extended_cmd

drivers/net/ethernet/google/gve/gve_adminq.c | 31 ++++++++++++++++++++
drivers/net/ethernet/google/gve/gve_adminq.h | 12 ++++++++
2 files changed, 43 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 2e0c1eb87b11..5b54ce369eb2 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -461,6 +461,8 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,

memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
opcode = be32_to_cpu(READ_ONCE(cmd->opcode));
+ if (opcode == GVE_ADMINQ_EXTENDED_COMMAND)
+ opcode = be32_to_cpu(cmd->extended_command.inner_opcode);

switch (opcode) {
case GVE_ADMINQ_DESCRIBE_DEVICE:
@@ -537,6 +539,35 @@ static int gve_adminq_execute_cmd(struct gve_priv *priv,
return err;
}

+static int __maybe_unused gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
+ size_t cmd_size, void *cmd_orig)
+{
+ union gve_adminq_command cmd;
+ dma_addr_t inner_cmd_bus;
+ void *inner_cmd;
+ int err;
+
+ inner_cmd = dma_alloc_coherent(&priv->pdev->dev, cmd_size,
+ &inner_cmd_bus, GFP_KERNEL);
+ if (!inner_cmd)
+ return -ENOMEM;
+
+ memcpy(inner_cmd, cmd_orig, cmd_size);
+
+ memset(&cmd, 0, sizeof(cmd));
+ cmd.opcode = cpu_to_be32(GVE_ADMINQ_EXTENDED_COMMAND);
+ cmd.extended_command = (struct gve_adminq_extended_command) {
+ .inner_opcode = cpu_to_be32(opcode),
+ .inner_length = cpu_to_be32(cmd_size),
+ .inner_command_addr = cpu_to_be64(inner_cmd_bus),
+ };
+
+ err = gve_adminq_execute_cmd(priv, &cmd);
+
+ dma_free_coherent(&priv->pdev->dev, cmd_size, inner_cmd, inner_cmd_bus);
+ return err;
+}
+
/* The device specifies that the management vector can either be the first irq
* or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
* the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index e64f0dbe744d..e0370ace8397 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -25,6 +25,9 @@ enum gve_adminq_opcodes {
GVE_ADMINQ_REPORT_LINK_SPEED = 0xD,
GVE_ADMINQ_GET_PTYPE_MAP = 0xE,
GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY = 0xF,
+
+ /* For commands that are larger than 56 bytes */
+ GVE_ADMINQ_EXTENDED_COMMAND = 0xFF,
};

/* Admin queue status codes */
@@ -208,6 +211,14 @@ enum gve_driver_capbility {
#define GVE_DRIVER_CAPABILITY_FLAGS3 0x0
#define GVE_DRIVER_CAPABILITY_FLAGS4 0x0

+struct gve_adminq_extended_command {
+ __be32 inner_opcode;
+ __be32 inner_length;
+ __be64 inner_command_addr;
+};
+
+static_assert(sizeof(struct gve_adminq_extended_command) == 16);
+
struct gve_driver_info {
u8 os_type; /* 0x01 = Linux */
u8 driver_major;
@@ -432,6 +443,7 @@ union gve_adminq_command {
struct gve_adminq_get_ptype_map get_ptype_map;
struct gve_adminq_verify_driver_compatibility
verify_driver_compatibility;
+ struct gve_adminq_extended_command extended_command;
};
};
u8 reserved[64];
--
2.45.2.627.g7a2c4fd464-goog

2024-06-13 01:48:56

by Ziwei Xiao

[permalink] [raw]

Subject: [PATCH net-next v2 3/5] gve: Add flow steering device option

From: Jeroen de Borst <[email protected]>

Add a new device option to signal to the driver that the device supports
flow steering. This device option also carries the maximum number of
flow steering rules that the device can store.

Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
drivers/net/ethernet/google/gve/gve.h | 2 +
drivers/net/ethernet/google/gve/gve_adminq.c | 42 ++++++++++++++++++--
drivers/net/ethernet/google/gve/gve_adminq.h | 11 +++++
3 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index ca7fce17f2c0..58213c15e084 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -786,6 +786,8 @@ struct gve_priv {

u16 header_buf_size; /* device configured, header-split supported if non-zero */
bool header_split_enabled; /* True if the header split is enabled by the user */
+
+ u32 max_flow_rules;
};

enum gve_service_task_flags_bit {
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 5b54ce369eb2..7481661df72d 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -44,6 +44,7 @@ void gve_parse_device_option(struct gve_priv *priv,
struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
+ struct gve_device_option_flow_steering **dev_op_flow_steering,
struct gve_device_option_modify_ring **dev_op_modify_ring)
{
u32 req_feat_mask = be32_to_cpu(option->required_features_mask);
@@ -189,6 +190,23 @@ void gve_parse_device_option(struct gve_priv *priv,
if (option_length == GVE_DEVICE_OPTION_NO_MIN_RING_SIZE)
priv->default_min_ring_size = true;
break;
+ case GVE_DEV_OPT_ID_FLOW_STEERING:
+ if (option_length < sizeof(**dev_op_flow_steering) ||
+ req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING) {
+ dev_warn(&priv->pdev->dev, GVE_DEVICE_OPTION_ERROR_FMT,
+ "Flow Steering",
+ (int)sizeof(**dev_op_flow_steering),
+ GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING,
+ option_length, req_feat_mask);
+ break;
+ }
+
+ if (option_length > sizeof(**dev_op_flow_steering))
+ dev_warn(&priv->pdev->dev,
+ GVE_DEVICE_OPTION_TOO_BIG_FMT,
+ "Flow Steering");
+ *dev_op_flow_steering = (void *)(option + 1);
+ break;
default:
/* If we don't recognize the option just continue
* without doing anything.
@@ -208,6 +226,7 @@ gve_process_device_options(struct gve_priv *priv,
struct gve_device_option_jumbo_frames **dev_op_jumbo_frames,
struct gve_device_option_dqo_qpl **dev_op_dqo_qpl,
struct gve_device_option_buffer_sizes **dev_op_buffer_sizes,
+ struct gve_device_option_flow_steering **dev_op_flow_steering,
struct gve_device_option_modify_ring **dev_op_modify_ring)
{
const int num_options = be16_to_cpu(descriptor->num_device_options);
@@ -230,7 +249,7 @@ gve_process_device_options(struct gve_priv *priv,
dev_op_gqi_rda, dev_op_gqi_qpl,
dev_op_dqo_rda, dev_op_jumbo_frames,
dev_op_dqo_qpl, dev_op_buffer_sizes,
- dev_op_modify_ring);
+ dev_op_flow_steering, dev_op_modify_ring);
dev_opt = next_opt;
}

@@ -838,6 +857,8 @@ static void gve_enable_supported_features(struct gve_priv *priv,
*dev_op_dqo_qpl,
const struct gve_device_option_buffer_sizes
*dev_op_buffer_sizes,
+ const struct gve_device_option_flow_steering
+ *dev_op_flow_steering,
const struct gve_device_option_modify_ring
*dev_op_modify_ring)
{
@@ -890,10 +911,22 @@ static void gve_enable_supported_features(struct gve_priv *priv,
priv->min_tx_desc_cnt = be16_to_cpu(dev_op_modify_ring->min_tx_ring_size);
}
}
+
+ if (dev_op_flow_steering &&
+ (supported_features_mask & GVE_SUP_FLOW_STEERING_MASK)) {
+ if (dev_op_flow_steering->max_flow_rules) {
+ priv->max_flow_rules =
+ be32_to_cpu(dev_op_flow_steering->max_flow_rules);
+ dev_info(&priv->pdev->dev,
+ "FLOW STEERING device option enabled with max rule limit of %u.\n",
+ priv->max_flow_rules);
+ }
+ }
}

int gve_adminq_describe_device(struct gve_priv *priv)
{
+ struct gve_device_option_flow_steering *dev_op_flow_steering = NULL;
struct gve_device_option_buffer_sizes *dev_op_buffer_sizes = NULL;
struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL;
struct gve_device_option_modify_ring *dev_op_modify_ring = NULL;
@@ -930,6 +963,7 @@ int gve_adminq_describe_device(struct gve_priv *priv)
&dev_op_gqi_qpl, &dev_op_dqo_rda,
&dev_op_jumbo_frames, &dev_op_dqo_qpl,
&dev_op_buffer_sizes,
+ &dev_op_flow_steering,
&dev_op_modify_ring);
if (err)
goto free_device_descriptor;
@@ -969,9 +1003,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)
/* set default descriptor counts */
gve_set_default_desc_cnt(priv, descriptor);

- /* DQO supports LRO. */
if (!gve_is_gqi(priv))
- priv->dev->hw_features |= NETIF_F_LRO;
+ priv->dev->hw_features |= NETIF_F_LRO | NETIF_F_NTUPLE;

priv->max_registered_pages =
be64_to_cpu(descriptor->max_registered_pages);
@@ -991,7 +1024,8 @@ int gve_adminq_describe_device(struct gve_priv *priv)

gve_enable_supported_features(priv, supported_features_mask,
dev_op_jumbo_frames, dev_op_dqo_qpl,
- dev_op_buffer_sizes, dev_op_modify_ring);
+ dev_op_buffer_sizes, dev_op_flow_steering,
+ dev_op_modify_ring);

free_device_descriptor:
dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index e0370ace8397..e64a0e72e781 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -146,6 +146,14 @@ struct gve_device_option_modify_ring {

static_assert(sizeof(struct gve_device_option_modify_ring) == 12);

+struct gve_device_option_flow_steering {
+ __be32 supported_features_mask;
+ __be32 reserved;
+ __be32 max_flow_rules;
+};
+
+static_assert(sizeof(struct gve_device_option_flow_steering) == 12);
+
/* Terminology:
*
* RDA - Raw DMA Addressing - Buffers associated with SKBs are directly DMA
@@ -163,6 +171,7 @@ enum gve_dev_opt_id {
GVE_DEV_OPT_ID_DQO_QPL = 0x7,
GVE_DEV_OPT_ID_JUMBO_FRAMES = 0x8,
GVE_DEV_OPT_ID_BUFFER_SIZES = 0xa,
+ GVE_DEV_OPT_ID_FLOW_STEERING = 0xb,
};

enum gve_dev_opt_req_feat_mask {
@@ -174,12 +183,14 @@ enum gve_dev_opt_req_feat_mask {
GVE_DEV_OPT_REQ_FEAT_MASK_DQO_QPL = 0x0,
GVE_DEV_OPT_REQ_FEAT_MASK_BUFFER_SIZES = 0x0,
GVE_DEV_OPT_REQ_FEAT_MASK_MODIFY_RING = 0x0,
+ GVE_DEV_OPT_REQ_FEAT_MASK_FLOW_STEERING = 0x0,
};

enum gve_sup_feature_mask {
GVE_SUP_MODIFY_RING_MASK = 1 << 0,
GVE_SUP_JUMBO_FRAMES_MASK = 1 << 2,
GVE_SUP_BUFFER_SIZES_MASK = 1 << 4,
+ GVE_SUP_FLOW_STEERING_MASK = 1 << 5,
};

#define GVE_DEV_OPT_LEN_GQI_RAW_ADDRESSING 0x0
--
2.45.2.627.g7a2c4fd464-goog

2024-06-13 01:49:08

by Ziwei Xiao

[permalink] [raw]

Subject: [PATCH net-next v2 4/5] gve: Add flow steering adminq commands

From: Jeroen de Borst <[email protected]>

Add new adminq commands for the driver to configure and query flow rules
that are stored in the device. Flow steering rules are assigned with a
location that determines the relative order of the rules.

Flow rules can run up to an order of millions. In such cases, storing
a full copy of the rules in the driver to prepare for the ethtool query
is infeasible while querying them from the device is better. That needs
to be optimized too so that we don't send a lot of adminq commands. The
solution here is to store a limited number of rules/rule ids in the
driver in a cache. Use dma_pool to allocate 4k bytes which lets device
write at most 46 flow rules(4096/88) or 1024 rule ids(4096/4) at a time.

For configuring flow rules, there are 3 sub-commands:
- ADD which adds a rule at the location supplied
- DEL which deletes the rule at the location supplied
- RESET which clears all currently active rules in the device

For querying flow rules, there are also 3 sub-commands:
- QUERY_RULES corresponds to ETHTOOL_GRXCLSRULE. It fills the rules in
the allocated cache after querying the device
- QUERY_RULES_IDS corresponds to ETHTOOL_GRXCLSRLALL. It fills the
rule_ids in the allocated cache after querying the device
- QUERY_RULES_STATS corresponds to ETHTOOL_GRXCLSRLCNT. It queries the
device's current flow rule number and the supported max flow rule
limit

Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
Changes in v2:
- Update the commit message to use imperative mood
- Remove the variable comments of num_flow_rules and struct
gve_query_flow_rules_descriptor since their names are already
self-describing
- Remove the __maybe_unused attribute for
gve_adminq_execute_extended_cmd since it's no longer unused
from this patch
- Add a new struct gve_adminq_queried_flow_rule to store the
queried rule from the device instead of using gve_flow_rule
- Remove the unused variable of descriptor_end
- Remove the check for whether the total length is smaller than
the allocated memory length. Instead, add the check for
whether the queried results length match with the expected
length
- Update the variable names of the struct
gve_adminq_flow_rule_cfg_opcode to be more self-descriptive

drivers/net/ethernet/google/gve/gve.h | 43 ++++++
drivers/net/ethernet/google/gve/gve_adminq.c | 139 +++++++++++++++++-
drivers/net/ethernet/google/gve/gve_adminq.h | 80 ++++++++++
drivers/net/ethernet/google/gve/gve_ethtool.c | 5 +-
drivers/net/ethernet/google/gve/gve_main.c | 51 ++++++-
5 files changed, 314 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 58213c15e084..b9e9dd958f3c 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -60,6 +60,11 @@

#define GVE_DEFAULT_RX_BUFFER_OFFSET 2048

+#define GVE_FLOW_RULES_CACHE_SIZE \
+ (GVE_ADMINQ_BUFFER_SIZE / sizeof(struct gve_adminq_queried_flow_rule))
+#define GVE_FLOW_RULE_IDS_CACHE_SIZE \
+ (GVE_ADMINQ_BUFFER_SIZE / sizeof(((struct gve_adminq_queried_flow_rule *)0)->location))
+
#define GVE_XDP_ACTIONS 5

#define GVE_GQ_TX_MIN_PKT_DESC_BYTES 182
@@ -678,6 +683,39 @@ enum gve_queue_format {
GVE_DQO_QPL_FORMAT = 0x4,
};

+struct gve_flow_spec {
+ __be32 src_ip[4];
+ __be32 dst_ip[4];
+ union {
+ struct {
+ __be16 src_port;
+ __be16 dst_port;
+ };
+ __be32 spi;
+ };
+ union {
+ u8 tos;
+ u8 tclass;
+ };
+};
+
+struct gve_flow_rule {
+ u32 location;
+ u16 flow_type;
+ u16 action;
+ struct gve_flow_spec key;
+ struct gve_flow_spec mask;
+};
+
+struct gve_flow_rules_cache {
+ bool rules_cache_synced; /* False if the driver's rules_cache is outdated */
+ struct gve_adminq_queried_flow_rule *rules_cache;
+ __be32 *rule_ids_cache;
+ /* The total number of queried rules that stored in the caches */
+ u32 rules_cache_num;
+ u32 rule_ids_cache_num;
+};
+
struct gve_priv {
struct net_device *dev;
struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
@@ -744,6 +782,8 @@ struct gve_priv {
u32 adminq_report_link_speed_cnt;
u32 adminq_get_ptype_map_cnt;
u32 adminq_verify_driver_compatibility_cnt;
+ u32 adminq_query_flow_rules_cnt;
+ u32 adminq_cfg_flow_rule_cnt;

/* Global stats */
u32 interface_up_cnt; /* count of times interface turned up since last reset */
@@ -788,6 +828,9 @@ struct gve_priv {
bool header_split_enabled; /* True if the header split is enabled by the user */

u32 max_flow_rules;
+ u32 num_flow_rules;
+
+ struct gve_flow_rules_cache flow_rules_cache;
};

enum gve_service_task_flags_bit {
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
index 7481661df72d..3d37025f6f4c 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.c
+++ b/drivers/net/ethernet/google/gve/gve_adminq.c
@@ -287,6 +287,8 @@ int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
priv->adminq_report_stats_cnt = 0;
priv->adminq_report_link_speed_cnt = 0;
priv->adminq_get_ptype_map_cnt = 0;
+ priv->adminq_query_flow_rules_cnt = 0;
+ priv->adminq_cfg_flow_rule_cnt = 0;

/* Setup Admin queue with the device */
if (priv->pdev->revision < 0x1) {
@@ -526,6 +528,12 @@ static int gve_adminq_issue_cmd(struct gve_priv *priv,
case GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY:
priv->adminq_verify_driver_compatibility_cnt++;
break;
+ case GVE_ADMINQ_QUERY_FLOW_RULES:
+ priv->adminq_query_flow_rules_cnt++;
+ break;
+ case GVE_ADMINQ_CONFIGURE_FLOW_RULE:
+ priv->adminq_cfg_flow_rule_cnt++;
+ break;
default:
dev_err(&priv->pdev->dev, "unknown AQ command opcode %d\n", opcode);
}
@@ -558,8 +566,8 @@ static int gve_adminq_execute_cmd(struct gve_priv *priv,
return err;
}

-static int __maybe_unused gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
- size_t cmd_size, void *cmd_orig)
+static int gve_adminq_execute_extended_cmd(struct gve_priv *priv, u32 opcode,
+ size_t cmd_size, void *cmd_orig)
{
union gve_adminq_command cmd;
dma_addr_t inner_cmd_bus;
@@ -1188,3 +1196,130 @@ int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
ptype_map_bus);
return err;
}
+
+static int
+gve_adminq_configure_flow_rule(struct gve_priv *priv,
+ struct gve_adminq_configure_flow_rule *flow_rule_cmd)
+{
+ int err = gve_adminq_execute_extended_cmd(priv,
+ GVE_ADMINQ_CONFIGURE_FLOW_RULE,
+ sizeof(struct gve_adminq_configure_flow_rule),
+ flow_rule_cmd);
+
+ if (err) {
+ dev_err(&priv->pdev->dev, "Timeout to configure the flow rule, trigger reset");
+ gve_reset(priv, true);
+ } else {
+ priv->flow_rules_cache.rules_cache_synced = false;
+ }
+
+ return err;
+}
+
+int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc)
+{
+ struct gve_adminq_configure_flow_rule flow_rule_cmd = {
+ .opcode = cpu_to_be16(GVE_FLOW_RULE_CFG_ADD),
+ .location = cpu_to_be32(loc),
+ .rule = *rule,
+ };
+
+ return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
+}
+
+int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc)
+{
+ struct gve_adminq_configure_flow_rule flow_rule_cmd = {
+ .opcode = cpu_to_be16(GVE_FLOW_RULE_CFG_DEL),
+ .location = cpu_to_be32(loc),
+ };
+
+ return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
+}
+
+int gve_adminq_reset_flow_rules(struct gve_priv *priv)
+{
+ struct gve_adminq_configure_flow_rule flow_rule_cmd = {
+ .opcode = cpu_to_be16(GVE_FLOW_RULE_CFG_RESET),
+ };
+
+ return gve_adminq_configure_flow_rule(priv, &flow_rule_cmd);
+}
+
+/* In the dma memory that the driver allocated for the device to query the flow rules, the device
+ * will first write it with a struct of gve_query_flow_rules_descriptor. Next to it, the device
+ * will write an array of rules or rule ids with the count that specified in the descriptor.
+ * For GVE_FLOW_RULE_QUERY_STATS, the device will only write the descriptor.
+ */
+static int gve_adminq_process_flow_rules_query(struct gve_priv *priv, u16 query_opcode,
+ struct gve_query_flow_rules_descriptor *descriptor)
+{
+ struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
+ u32 num_queried_rules, total_memory_len, rule_info_len;
+ void *rule_info;
+
+ total_memory_len = be32_to_cpu(descriptor->total_length);
+ num_queried_rules = be32_to_cpu(descriptor->num_queried_rules);
+ rule_info = (void *)(descriptor + 1);
+
+ switch (query_opcode) {
+ case GVE_FLOW_RULE_QUERY_RULES:
+ rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rules_cache);
+ if (sizeof(*descriptor) + rule_info_len != total_memory_len) {
+ dev_err(&priv->dev->dev, "flow rules query is out of memory.\n");
+ return -ENOMEM;
+ }
+
+ memcpy(flow_rules_cache->rules_cache, rule_info, rule_info_len);
+ flow_rules_cache->rules_cache_num = num_queried_rules;
+ break;
+ case GVE_FLOW_RULE_QUERY_IDS:
+ rule_info_len = num_queried_rules * sizeof(*flow_rules_cache->rule_ids_cache);
+ if (sizeof(*descriptor) + rule_info_len != total_memory_len) {
+ dev_err(&priv->dev->dev, "flow rule ids query is out of memory.\n");
+ return -ENOMEM;
+ }
+
+ memcpy(flow_rules_cache->rule_ids_cache, rule_info, rule_info_len);
+ flow_rules_cache->rule_ids_cache_num = num_queried_rules;
+ break;
+ case GVE_FLOW_RULE_QUERY_STATS:
+ priv->num_flow_rules = be32_to_cpu(descriptor->num_flow_rules);
+ priv->max_flow_rules = be32_to_cpu(descriptor->max_flow_rules);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc)
+{
+ struct gve_query_flow_rules_descriptor *descriptor;
+ union gve_adminq_command cmd;
+ dma_addr_t descriptor_bus;
+ int err = 0;
+
+ memset(&cmd, 0, sizeof(cmd));
+ descriptor = dma_pool_alloc(priv->adminq_pool, GFP_KERNEL, &descriptor_bus);
+ if (!descriptor)
+ return -ENOMEM;
+
+ cmd.opcode = cpu_to_be32(GVE_ADMINQ_QUERY_FLOW_RULES);
+ cmd.query_flow_rules = (struct gve_adminq_query_flow_rules) {
+ .opcode = cpu_to_be16(query_opcode),
+ .starting_rule_id = cpu_to_be32(starting_loc),
+ .available_length = cpu_to_be64(GVE_ADMINQ_BUFFER_SIZE),
+ .rule_descriptor_addr = cpu_to_be64(descriptor_bus),
+ };
+ err = gve_adminq_execute_cmd(priv, &cmd);
+ if (err)
+ goto out;
+
+ err = gve_adminq_process_flow_rules_query(priv, query_opcode, descriptor);
+
+out:
+ dma_pool_free(priv->adminq_pool, descriptor, descriptor_bus);
+ return err;
+}
diff --git a/drivers/net/ethernet/google/gve/gve_adminq.h b/drivers/net/ethernet/google/gve/gve_adminq.h
index e64a0e72e781..ed1370c9b197 100644
--- a/drivers/net/ethernet/google/gve/gve_adminq.h
+++ b/drivers/net/ethernet/google/gve/gve_adminq.h
@@ -25,11 +25,21 @@ enum gve_adminq_opcodes {
GVE_ADMINQ_REPORT_LINK_SPEED = 0xD,
GVE_ADMINQ_GET_PTYPE_MAP = 0xE,
GVE_ADMINQ_VERIFY_DRIVER_COMPATIBILITY = 0xF,
+ GVE_ADMINQ_QUERY_FLOW_RULES = 0x10,

/* For commands that are larger than 56 bytes */
GVE_ADMINQ_EXTENDED_COMMAND = 0xFF,
};

+/* The normal adminq command is restricted to be 56 bytes at maximum. For the
+ * longer adminq command, it is wrapped by GVE_ADMINQ_EXTENDED_COMMAND with
+ * inner opcode of gve_adminq_extended_cmd_opcodes specified. The inner command
+ * is written in the dma memory allocated by GVE_ADMINQ_EXTENDED_COMMAND.
+ */
+enum gve_adminq_extended_cmd_opcodes {
+ GVE_ADMINQ_CONFIGURE_FLOW_RULE = 0x101,
+};
+
/* Admin queue status codes */
enum gve_adminq_statuses {
GVE_ADMINQ_COMMAND_UNSET = 0x0,
@@ -434,6 +444,71 @@ struct gve_adminq_get_ptype_map {
__be64 ptype_map_addr;
};

+/* Flow-steering related definitions */
+enum gve_adminq_flow_rule_cfg_opcode {
+ GVE_FLOW_RULE_CFG_ADD = 0,
+ GVE_FLOW_RULE_CFG_DEL = 1,
+ GVE_FLOW_RULE_CFG_RESET = 2,
+};
+
+enum gve_adminq_flow_rule_query_opcode {
+ GVE_FLOW_RULE_QUERY_RULES = 0,
+ GVE_FLOW_RULE_QUERY_IDS = 1,
+ GVE_FLOW_RULE_QUERY_STATS = 2,
+};
+
+enum gve_adminq_flow_type {
+ GVE_FLOW_TYPE_TCPV4,
+ GVE_FLOW_TYPE_UDPV4,
+ GVE_FLOW_TYPE_SCTPV4,
+ GVE_FLOW_TYPE_AHV4,
+ GVE_FLOW_TYPE_ESPV4,
+ GVE_FLOW_TYPE_TCPV6,
+ GVE_FLOW_TYPE_UDPV6,
+ GVE_FLOW_TYPE_SCTPV6,
+ GVE_FLOW_TYPE_AHV6,
+ GVE_FLOW_TYPE_ESPV6,
+};
+
+/* Flow-steering command */
+struct gve_adminq_flow_rule {
+ __be16 flow_type;
+ __be16 action; /* RX queue id */
+ struct gve_flow_spec key;
+ struct gve_flow_spec mask;
+};
+
+struct gve_adminq_configure_flow_rule {
+ __be16 opcode;
+ u8 padding[2];
+ struct gve_adminq_flow_rule rule;
+ __be32 location;
+};
+
+static_assert(sizeof(struct gve_adminq_configure_flow_rule) == 92);
+
+struct gve_query_flow_rules_descriptor {
+ __be32 num_flow_rules;
+ __be32 max_flow_rules;
+ __be32 num_queried_rules;
+ __be32 total_length;
+};
+
+struct gve_adminq_queried_flow_rule {
+ __be32 location;
+ struct gve_adminq_flow_rule flow_rule;
+};
+
+struct gve_adminq_query_flow_rules {
+ __be16 opcode;
+ u8 padding[2];
+ __be32 starting_rule_id;
+ __be64 available_length; /* The dma memory length that the driver allocated */
+ __be64 rule_descriptor_addr; /* The dma memory address */
+};
+
+static_assert(sizeof(struct gve_adminq_query_flow_rules) == 24);
+
union gve_adminq_command {
struct {
__be32 opcode;
@@ -454,6 +529,7 @@ union gve_adminq_command {
struct gve_adminq_get_ptype_map get_ptype_map;
struct gve_adminq_verify_driver_compatibility
verify_driver_compatibility;
+ struct gve_adminq_query_flow_rules query_flow_rules;
struct gve_adminq_extended_command extended_command;
};
};
@@ -488,6 +564,10 @@ int gve_adminq_verify_driver_compatibility(struct gve_priv *priv,
u64 driver_info_len,
dma_addr_t driver_info_addr);
int gve_adminq_report_link_speed(struct gve_priv *priv);
+int gve_adminq_add_flow_rule(struct gve_priv *priv, struct gve_adminq_flow_rule *rule, u32 loc);
+int gve_adminq_del_flow_rule(struct gve_priv *priv, u32 loc);
+int gve_adminq_reset_flow_rules(struct gve_priv *priv);
+int gve_adminq_query_flow_rules(struct gve_priv *priv, u16 query_opcode, u32 starting_loc);

struct gve_ptype_lut;
int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index fe1741d482b4..ffaa878d67bc 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -74,7 +74,8 @@ static const char gve_gstrings_adminq_stats[][ETH_GSTRING_LEN] = {
"adminq_create_tx_queue_cnt", "adminq_create_rx_queue_cnt",
"adminq_destroy_tx_queue_cnt", "adminq_destroy_rx_queue_cnt",
"adminq_dcfg_device_resources_cnt", "adminq_set_driver_parameter_cnt",
- "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt"
+ "adminq_report_stats_cnt", "adminq_report_link_speed_cnt", "adminq_get_ptype_map_cnt",
+ "adminq_query_flow_rules", "adminq_cfg_flow_rule",
};

static const char gve_gstrings_priv_flags[][ETH_GSTRING_LEN] = {
@@ -450,6 +451,8 @@ gve_get_ethtool_stats(struct net_device *netdev,
data[i++] = priv->adminq_report_stats_cnt;
data[i++] = priv->adminq_report_link_speed_cnt;
data[i++] = priv->adminq_get_ptype_map_cnt;
+ data[i++] = priv->adminq_query_flow_rules_cnt;
+ data[i++] = priv->adminq_cfg_flow_rule_cnt;
}

static void gve_get_channels(struct net_device *netdev,
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index cabf7d4bcecb..fc142856e189 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -141,6 +141,49 @@ static void gve_get_stats(struct net_device *dev, struct rtnl_link_stats64 *s)
}
}

+static int gve_alloc_flow_rule_caches(struct gve_priv *priv)
+{
+ struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
+ int err = 0;
+
+ if (!priv->max_flow_rules)
+ return 0;
+
+ flow_rules_cache->rules_cache =
+ kvcalloc(GVE_FLOW_RULES_CACHE_SIZE, sizeof(*flow_rules_cache->rules_cache),
+ GFP_KERNEL);
+ if (!flow_rules_cache->rules_cache) {
+ dev_err(&priv->pdev->dev, "Cannot alloc flow rules cache\n");
+ return -ENOMEM;
+ }
+
+ flow_rules_cache->rule_ids_cache =
+ kvcalloc(GVE_FLOW_RULE_IDS_CACHE_SIZE, sizeof(*flow_rules_cache->rule_ids_cache),
+ GFP_KERNEL);
+ if (!flow_rules_cache->rule_ids_cache) {
+ dev_err(&priv->pdev->dev, "Cannot alloc flow rule ids cache\n");
+ err = -ENOMEM;
+ goto free_rules_cache;
+ }
+
+ return 0;
+
+free_rules_cache:
+ kvfree(flow_rules_cache->rules_cache);
+ flow_rules_cache->rules_cache = NULL;
+ return err;
+}
+
+static void gve_free_flow_rule_caches(struct gve_priv *priv)
+{
+ struct gve_flow_rules_cache *flow_rules_cache = &priv->flow_rules_cache;
+
+ kvfree(flow_rules_cache->rule_ids_cache);
+ flow_rules_cache->rule_ids_cache = NULL;
+ kvfree(flow_rules_cache->rules_cache);
+ flow_rules_cache->rules_cache = NULL;
+}
+
static int gve_alloc_counter_array(struct gve_priv *priv)
{
priv->counter_array =
@@ -521,9 +564,12 @@ static int gve_setup_device_resources(struct gve_priv *priv)
{
int err;

- err = gve_alloc_counter_array(priv);
+ err = gve_alloc_flow_rule_caches(priv);
if (err)
return err;
+ err = gve_alloc_counter_array(priv);
+ if (err)
+ goto abort_with_flow_rule_caches;
err = gve_alloc_notify_blocks(priv);
if (err)
goto abort_with_counter;
@@ -575,6 +621,8 @@ static int gve_setup_device_resources(struct gve_priv *priv)
gve_free_notify_blocks(priv);
abort_with_counter:
gve_free_counter_array(priv);
+abort_with_flow_rule_caches:
+ gve_free_flow_rule_caches(priv);

return err;
}
@@ -606,6 +654,7 @@ static void gve_teardown_device_resources(struct gve_priv *priv)
kvfree(priv->ptype_lut_dqo);
priv->ptype_lut_dqo = NULL;

+ gve_free_flow_rule_caches(priv);
gve_free_counter_array(priv);
gve_free_notify_blocks(priv);
gve_free_stats_report(priv);
--
2.45.2.627.g7a2c4fd464-goog

2024-06-13 01:49:24

by Ziwei Xiao

[permalink] [raw]

Subject: [PATCH net-next v2 5/5] gve: Add flow steering ethtool support

From: Jeroen de Borst <[email protected]>

Implement the ethtool commands that can be used to configure and query
flow-steering rules. For these ethtool commands, the driver will
temporarily drop the rtnl lock to reduce the latency for the flow
steering commands on separate NICs. It will then be protected by the new
added adminq lock.

A large part of this change consists of translating the ethtool
representation of 'ntuples' to our internal gve_flow_rule and vice-versa
in the new created gve_flow_rule.c

Considering the possible large amount of flow rules, the driver doesn't
store all the rules locally. When the user runs 'ethtool -n <nic>' to
check the registered rules, the driver will send adminq command to
query a limited amount of rules/rule ids(that filled in a 4096 bytes dma
memory) at a time as a cache for the ethtool queries. The adminq query
commands will be repeated for several times until the ethtool has
queried all the needed rules.

Signed-off-by: Jeroen de Borst <[email protected]>
Co-developed-by: Ziwei Xiao <[email protected]>
Signed-off-by: Ziwei Xiao <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
Changes in v2:
- Fix the sparse warning of be16_to_cpu by switching to use the
new added structure gve_adminq_queried_flow_rule from the
previous patch to do the be16_to_cpu operations
- Use kvfree instead of kfree for the rule allocated by kvzalloc
- Delete the check that disallows rx queue count changing when
there are rules alive because it already has a check in
net/ethtool/channels.c that ensures rx queue count change to
be failed if requested channel counts are too low for existing
ntuple filter settings

drivers/net/ethernet/google/gve/Makefile | 2 +-
drivers/net/ethernet/google/gve/gve.h | 8 +-
drivers/net/ethernet/google/gve/gve_ethtool.c | 80 ++++-
.../net/ethernet/google/gve/gve_flow_rule.c | 298 ++++++++++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 32 +-
5 files changed, 411 insertions(+), 9 deletions(-)
create mode 100644 drivers/net/ethernet/google/gve/gve_flow_rule.c

diff --git a/drivers/net/ethernet/google/gve/Makefile b/drivers/net/ethernet/google/gve/Makefile
index b9a6be76531b..9ed07080b38a 100644
--- a/drivers/net/ethernet/google/gve/Makefile
+++ b/drivers/net/ethernet/google/gve/Makefile
@@ -1,4 +1,4 @@
# Makefile for the Google virtual Ethernet (gve) driver

obj-$(CONFIG_GVE) += gve.o
-gve-objs := gve_main.o gve_tx.o gve_tx_dqo.o gve_rx.o gve_rx_dqo.o gve_ethtool.o gve_adminq.o gve_utils.o
+gve-objs := gve_main.o gve_tx.o gve_tx_dqo.o gve_rx.o gve_rx_dqo.o gve_ethtool.o gve_adminq.o gve_utils.o gve_flow_rule.o
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index b9e9dd958f3c..84ac004d3953 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: (GPL-2.0 OR MIT)
* Google virtual Ethernet (gve) driver
*
- * Copyright (C) 2015-2021 Google, Inc.
+ * Copyright (C) 2015-2024 Google LLC
*/

#ifndef _GVE_H_
@@ -1170,6 +1170,12 @@ int gve_adjust_config(struct gve_priv *priv,
int gve_adjust_queues(struct gve_priv *priv,
struct gve_queue_config new_rx_config,
struct gve_queue_config new_tx_config);
+/* flow steering rule */
+int gve_get_flow_rule_entry(struct gve_priv *priv, struct ethtool_rxnfc *cmd);
+int gve_get_flow_rule_ids(struct gve_priv *priv, struct ethtool_rxnfc *cmd, u32 *rule_locs);
+int gve_add_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd);
+int gve_del_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd);
+int gve_flow_rules_reset(struct gve_priv *priv);
/* report stats handling */
void gve_handle_report_stats(struct gve_priv *priv);
/* exported by ethtool.c */
diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
index ffaa878d67bc..01ff72be0dca 100644
--- a/drivers/net/ethernet/google/gve/gve_ethtool.c
+++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
- * Copyright (C) 2015-2021 Google, Inc.
+ * Copyright (C) 2015-2024 Google LLC
*/

#include <linux/rtnetlink.h>
@@ -775,6 +775,82 @@ static int gve_set_coalesce(struct net_device *netdev,
return 0;
}

+static int gve_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
+{
+ struct gve_priv *priv = netdev_priv(netdev);
+ int err = 0;
+
+ if (!(netdev->features & NETIF_F_NTUPLE))
+ return -EOPNOTSUPP;
+
+ dev_hold(netdev);
+ rtnl_unlock();
+
+ switch (cmd->cmd) {
+ case ETHTOOL_SRXCLSRLINS:
+ err = gve_add_flow_rule(priv, cmd);
+ break;
+ case ETHTOOL_SRXCLSRLDEL:
+ err = gve_del_flow_rule(priv, cmd);
+ break;
+ case ETHTOOL_SRXFH:
+ err = -EOPNOTSUPP;
+ break;
+ default:
+ err = -EOPNOTSUPP;
+ break;
+ }
+
+ rtnl_lock();
+ dev_put(netdev);
+ return err;
+}
+
+static int gve_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd, u32 *rule_locs)
+{
+ struct gve_priv *priv = netdev_priv(netdev);
+ int err = 0;
+
+ dev_hold(netdev);
+ rtnl_unlock();
+
+ switch (cmd->cmd) {
+ case ETHTOOL_GRXRINGS:
+ cmd->data = priv->rx_cfg.num_queues;
+ break;
+ case ETHTOOL_GRXCLSRLCNT:
+ if (!priv->max_flow_rules) {
+ err = -EOPNOTSUPP;
+ goto out;
+ }
+
+ err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_STATS, 0);
+ if (err)
+ goto out;
+
+ cmd->rule_cnt = priv->num_flow_rules;
+ cmd->data = priv->max_flow_rules;
+ break;
+ case ETHTOOL_GRXCLSRULE:
+ err = gve_get_flow_rule_entry(priv, cmd);
+ break;
+ case ETHTOOL_GRXCLSRLALL:
+ err = gve_get_flow_rule_ids(priv, cmd, (u32 *)rule_locs);
+ break;
+ case ETHTOOL_GRXFH:
+ err = -EOPNOTSUPP;
+ break;
+ default:
+ err = -EOPNOTSUPP;
+ break;
+ }
+
+out:
+ rtnl_lock();
+ dev_put(netdev);
+ return err;
+}
+
const struct ethtool_ops gve_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_USECS,
.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
@@ -786,6 +862,8 @@ const struct ethtool_ops gve_ethtool_ops = {
.get_msglevel = gve_get_msglevel,
.set_channels = gve_set_channels,
.get_channels = gve_get_channels,
+ .set_rxnfc = gve_set_rxnfc,
+ .get_rxnfc = gve_get_rxnfc,
.get_link = ethtool_op_get_link,
.get_coalesce = gve_get_coalesce,
.set_coalesce = gve_set_coalesce,
diff --git a/drivers/net/ethernet/google/gve/gve_flow_rule.c b/drivers/net/ethernet/google/gve/gve_flow_rule.c
new file mode 100644
index 000000000000..0bb8cd1876a3
--- /dev/null
+++ b/drivers/net/ethernet/google/gve/gve_flow_rule.c
@@ -0,0 +1,298 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/* Google virtual Ethernet (gve) driver
+ *
+ * Copyright (C) 2015-2024 Google LLC
+ */
+
+#include "gve.h"
+#include "gve_adminq.h"
+
+static
+int gve_fill_ethtool_flow_spec(struct ethtool_rx_flow_spec *fsp,
+ struct gve_adminq_queried_flow_rule *rule)
+{
+ struct gve_adminq_flow_rule *flow_rule = &rule->flow_rule;
+ static const u16 flow_type_lut[] = {
+ [GVE_FLOW_TYPE_TCPV4] = TCP_V4_FLOW,
+ [GVE_FLOW_TYPE_UDPV4] = UDP_V4_FLOW,
+ [GVE_FLOW_TYPE_SCTPV4] = SCTP_V4_FLOW,
+ [GVE_FLOW_TYPE_AHV4] = AH_V4_FLOW,
+ [GVE_FLOW_TYPE_ESPV4] = ESP_V4_FLOW,
+ [GVE_FLOW_TYPE_TCPV6] = TCP_V6_FLOW,
+ [GVE_FLOW_TYPE_UDPV6] = UDP_V6_FLOW,
+ [GVE_FLOW_TYPE_SCTPV6] = SCTP_V6_FLOW,
+ [GVE_FLOW_TYPE_AHV6] = AH_V6_FLOW,
+ [GVE_FLOW_TYPE_ESPV6] = ESP_V6_FLOW,
+ };
+
+ if (be16_to_cpu(flow_rule->flow_type) >= ARRAY_SIZE(flow_type_lut))
+ return -EINVAL;
+
+ fsp->flow_type = flow_type_lut[be16_to_cpu(flow_rule->flow_type)];
+
+ memset(&fsp->h_u, 0, sizeof(fsp->h_u));
+ memset(&fsp->h_ext, 0, sizeof(fsp->h_ext));
+ memset(&fsp->m_u, 0, sizeof(fsp->m_u));
+ memset(&fsp->m_ext, 0, sizeof(fsp->m_ext));
+
+ switch (fsp->flow_type) {
+ case TCP_V4_FLOW:
+ case UDP_V4_FLOW:
+ case SCTP_V4_FLOW:
+ fsp->h_u.tcp_ip4_spec.ip4src = flow_rule->key.src_ip[0];
+ fsp->h_u.tcp_ip4_spec.ip4dst = flow_rule->key.dst_ip[0];
+ fsp->h_u.tcp_ip4_spec.psrc = flow_rule->key.src_port;
+ fsp->h_u.tcp_ip4_spec.pdst = flow_rule->key.dst_port;
+ fsp->h_u.tcp_ip4_spec.tos = flow_rule->key.tos;
+ fsp->m_u.tcp_ip4_spec.ip4src = flow_rule->mask.src_ip[0];
+ fsp->m_u.tcp_ip4_spec.ip4dst = flow_rule->mask.dst_ip[0];
+ fsp->m_u.tcp_ip4_spec.psrc = flow_rule->mask.src_port;
+ fsp->m_u.tcp_ip4_spec.pdst = flow_rule->mask.dst_port;
+ fsp->m_u.tcp_ip4_spec.tos = flow_rule->mask.tos;
+ break;
+ case AH_V4_FLOW:
+ case ESP_V4_FLOW:
+ fsp->h_u.ah_ip4_spec.ip4src = flow_rule->key.src_ip[0];
+ fsp->h_u.ah_ip4_spec.ip4dst = flow_rule->key.dst_ip[0];
+ fsp->h_u.ah_ip4_spec.spi = flow_rule->key.spi;
+ fsp->h_u.ah_ip4_spec.tos = flow_rule->key.tos;
+ fsp->m_u.ah_ip4_spec.ip4src = flow_rule->mask.src_ip[0];
+ fsp->m_u.ah_ip4_spec.ip4dst = flow_rule->mask.dst_ip[0];
+ fsp->m_u.ah_ip4_spec.spi = flow_rule->mask.spi;
+ fsp->m_u.ah_ip4_spec.tos = flow_rule->mask.tos;
+ break;
+ case TCP_V6_FLOW:
+ case UDP_V6_FLOW:
+ case SCTP_V6_FLOW:
+ memcpy(fsp->h_u.tcp_ip6_spec.ip6src, &flow_rule->key.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->h_u.tcp_ip6_spec.ip6dst, &flow_rule->key.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->h_u.tcp_ip6_spec.psrc = flow_rule->key.src_port;
+ fsp->h_u.tcp_ip6_spec.pdst = flow_rule->key.dst_port;
+ fsp->h_u.tcp_ip6_spec.tclass = flow_rule->key.tclass;
+ memcpy(fsp->m_u.tcp_ip6_spec.ip6src, &flow_rule->mask.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->m_u.tcp_ip6_spec.ip6dst, &flow_rule->mask.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->m_u.tcp_ip6_spec.psrc = flow_rule->mask.src_port;
+ fsp->m_u.tcp_ip6_spec.pdst = flow_rule->mask.dst_port;
+ fsp->m_u.tcp_ip6_spec.tclass = flow_rule->mask.tclass;
+ break;
+ case AH_V6_FLOW:
+ case ESP_V6_FLOW:
+ memcpy(fsp->h_u.ah_ip6_spec.ip6src, &flow_rule->key.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->h_u.ah_ip6_spec.ip6dst, &flow_rule->key.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->h_u.ah_ip6_spec.spi = flow_rule->key.spi;
+ fsp->h_u.ah_ip6_spec.tclass = flow_rule->key.tclass;
+ memcpy(fsp->m_u.ah_ip6_spec.ip6src, &flow_rule->mask.src_ip,
+ sizeof(struct in6_addr));
+ memcpy(fsp->m_u.ah_ip6_spec.ip6dst, &flow_rule->mask.dst_ip,
+ sizeof(struct in6_addr));
+ fsp->m_u.ah_ip6_spec.spi = flow_rule->mask.spi;
+ fsp->m_u.ah_ip6_spec.tclass = flow_rule->mask.tclass;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ fsp->ring_cookie = be16_to_cpu(flow_rule->action);
+
+ return 0;
+}
+
+static int gve_generate_flow_rule(struct gve_priv *priv, struct ethtool_rx_flow_spec *fsp,
+ struct gve_adminq_flow_rule *rule)
+{
+ static const u16 flow_type_lut[] = {
+ [TCP_V4_FLOW] = GVE_FLOW_TYPE_TCPV4,
+ [UDP_V4_FLOW] = GVE_FLOW_TYPE_UDPV4,
+ [SCTP_V4_FLOW] = GVE_FLOW_TYPE_SCTPV4,
+ [AH_V4_FLOW] = GVE_FLOW_TYPE_AHV4,
+ [ESP_V4_FLOW] = GVE_FLOW_TYPE_ESPV4,
+ [TCP_V6_FLOW] = GVE_FLOW_TYPE_TCPV6,
+ [UDP_V6_FLOW] = GVE_FLOW_TYPE_UDPV6,
+ [SCTP_V6_FLOW] = GVE_FLOW_TYPE_SCTPV6,
+ [AH_V6_FLOW] = GVE_FLOW_TYPE_AHV6,
+ [ESP_V6_FLOW] = GVE_FLOW_TYPE_ESPV6,
+ };
+ u32 flow_type;
+
+ if (fsp->ring_cookie == RX_CLS_FLOW_DISC)
+ return -EOPNOTSUPP;
+
+ if (fsp->ring_cookie >= priv->rx_cfg.num_queues)
+ return -EINVAL;
+
+ rule->action = cpu_to_be16(fsp->ring_cookie);
+
+ flow_type = fsp->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT | FLOW_RSS);
+ if (!flow_type || flow_type >= ARRAY_SIZE(flow_type_lut))
+ return -EINVAL;
+
+ rule->flow_type = cpu_to_be16(flow_type_lut[flow_type]);
+
+ switch (flow_type) {
+ case TCP_V4_FLOW:
+ case UDP_V4_FLOW:
+ case SCTP_V4_FLOW:
+ rule->key.src_ip[0] = fsp->h_u.tcp_ip4_spec.ip4src;
+ rule->key.dst_ip[0] = fsp->h_u.tcp_ip4_spec.ip4dst;
+ rule->key.src_port = fsp->h_u.tcp_ip4_spec.psrc;
+ rule->key.dst_port = fsp->h_u.tcp_ip4_spec.pdst;
+ rule->mask.src_ip[0] = fsp->m_u.tcp_ip4_spec.ip4src;
+ rule->mask.dst_ip[0] = fsp->m_u.tcp_ip4_spec.ip4dst;
+ rule->mask.src_port = fsp->m_u.tcp_ip4_spec.psrc;
+ rule->mask.dst_port = fsp->m_u.tcp_ip4_spec.pdst;
+ break;
+ case AH_V4_FLOW:
+ case ESP_V4_FLOW:
+ rule->key.src_ip[0] = fsp->h_u.tcp_ip4_spec.ip4src;
+ rule->key.dst_ip[0] = fsp->h_u.tcp_ip4_spec.ip4dst;
+ rule->key.spi = fsp->h_u.ah_ip4_spec.spi;
+ rule->mask.src_ip[0] = fsp->m_u.tcp_ip4_spec.ip4src;
+ rule->mask.dst_ip[0] = fsp->m_u.tcp_ip4_spec.ip4dst;
+ rule->mask.spi = fsp->m_u.ah_ip4_spec.spi;
+ break;
+ case TCP_V6_FLOW:
+ case UDP_V6_FLOW:
+ case SCTP_V6_FLOW:
+ memcpy(&rule->key.src_ip, fsp->h_u.tcp_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->key.dst_ip, fsp->h_u.tcp_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->key.src_port = fsp->h_u.tcp_ip6_spec.psrc;
+ rule->key.dst_port = fsp->h_u.tcp_ip6_spec.pdst;
+ memcpy(&rule->mask.src_ip, fsp->m_u.tcp_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->mask.dst_ip, fsp->m_u.tcp_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->mask.src_port = fsp->m_u.tcp_ip6_spec.psrc;
+ rule->mask.dst_port = fsp->m_u.tcp_ip6_spec.pdst;
+ break;
+ case AH_V6_FLOW:
+ case ESP_V6_FLOW:
+ memcpy(&rule->key.src_ip, fsp->h_u.usr_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->key.dst_ip, fsp->h_u.usr_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->key.spi = fsp->h_u.ah_ip6_spec.spi;
+ memcpy(&rule->mask.src_ip, fsp->m_u.usr_ip6_spec.ip6src,
+ sizeof(struct in6_addr));
+ memcpy(&rule->mask.dst_ip, fsp->m_u.usr_ip6_spec.ip6dst,
+ sizeof(struct in6_addr));
+ rule->key.spi = fsp->h_u.ah_ip6_spec.spi;
+ break;
+ default:
+ /* not doing un-parsed flow types */
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+int gve_get_flow_rule_entry(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
+{
+ struct gve_adminq_queried_flow_rule *rules_cache = priv->flow_rules_cache.rules_cache;
+ struct ethtool_rx_flow_spec *fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
+ u32 *cache_num = &priv->flow_rules_cache.rules_cache_num;
+ struct gve_adminq_queried_flow_rule *rule = NULL;
+ int err = 0;
+ u32 i;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ if (!priv->flow_rules_cache.rules_cache_synced ||
+ fsp->location < be32_to_cpu(rules_cache[0].location) ||
+ fsp->location > be32_to_cpu(rules_cache[*cache_num - 1].location)) {
+ err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_RULES, fsp->location);
+ if (err)
+ return err;
+
+ priv->flow_rules_cache.rules_cache_synced = true;
+ }
+
+ for (i = 0; i < *cache_num; i++) {
+ if (fsp->location == be32_to_cpu(rules_cache[i].location)) {
+ rule = &rules_cache[i];
+ break;
+ }
+ }
+
+ if (!rule)
+ return -EINVAL;
+
+ err = gve_fill_ethtool_flow_spec(fsp, rule);
+
+ return err;
+}
+
+int gve_get_flow_rule_ids(struct gve_priv *priv, struct ethtool_rxnfc *cmd, u32 *rule_locs)
+{
+ __be32 *rule_ids_cache = priv->flow_rules_cache.rule_ids_cache;
+ u32 *cache_num = &priv->flow_rules_cache.rule_ids_cache_num;
+ u32 starting_rule_id = 0;
+ u32 i = 0, j = 0;
+ int err = 0;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ do {
+ err = gve_adminq_query_flow_rules(priv, GVE_FLOW_RULE_QUERY_IDS,
+ starting_rule_id);
+ if (err)
+ return err;
+
+ for (i = 0; i < *cache_num; i++) {
+ if (j >= cmd->rule_cnt)
+ return -EMSGSIZE;
+
+ rule_locs[j++] = be32_to_cpu(rule_ids_cache[i]);
+ starting_rule_id = be32_to_cpu(rule_ids_cache[i]) + 1;
+ }
+ } while (*cache_num != 0);
+ cmd->data = priv->max_flow_rules;
+
+ return err;
+}
+
+int gve_add_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp = &cmd->fs;
+ struct gve_adminq_flow_rule *rule = NULL;
+ int err;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ rule = kvzalloc(sizeof(*rule), GFP_KERNEL);
+ if (!rule)
+ return -ENOMEM;
+
+ err = gve_generate_flow_rule(priv, fsp, rule);
+ if (err)
+ goto out;
+
+ err = gve_adminq_add_flow_rule(priv, rule, fsp->location);
+
+out:
+ kvfree(rule);
+ if (err)
+ dev_err(&priv->pdev->dev, "Failed to add the flow rule: %u", fsp->location);
+
+ return err;
+}
+
+int gve_del_flow_rule(struct gve_priv *priv, struct ethtool_rxnfc *cmd)
+{
+ struct ethtool_rx_flow_spec *fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
+
+ if (!priv->max_flow_rules)
+ return -EOPNOTSUPP;
+
+ return gve_adminq_del_flow_rule(priv, fsp->location);
+}
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index fc142856e189..9744b426940e 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
- * Copyright (C) 2015-2021 Google, Inc.
+ * Copyright (C) 2015-2024 Google LLC
*/

#include <linux/bpf.h>
@@ -635,6 +635,12 @@ static void gve_teardown_device_resources(struct gve_priv *priv)

/* Tell device its resources are being freed */
if (gve_get_device_resources_ok(priv)) {
+ err = gve_flow_rules_reset(priv);
+ if (err) {
+ dev_err(&priv->pdev->dev,
+ "Failed to reset flow rules: err=%d\n", err);
+ gve_trigger_reset(priv);
+ }
/* detach the stats report */
err = gve_adminq_report_stats(priv, 0, 0x0, GVE_STATS_REPORT_TIMER_PERIOD);
if (err) {
@@ -1779,6 +1785,14 @@ static int gve_xdp(struct net_device *dev, struct netdev_bpf *xdp)
}
}

+int gve_flow_rules_reset(struct gve_priv *priv)
+{
+ if (!priv->max_flow_rules)
+ return 0;
+
+ return gve_adminq_reset_flow_rules(priv);
+}
+
int gve_adjust_config(struct gve_priv *priv,
struct gve_tx_alloc_rings_cfg *tx_alloc_cfg,
struct gve_rx_alloc_rings_cfg *rx_alloc_cfg)
@@ -2052,15 +2066,21 @@ static int gve_set_features(struct net_device *netdev,
netdev->features ^= NETIF_F_LRO;
if (netif_carrier_ok(netdev)) {
err = gve_adjust_config(priv, &tx_alloc_cfg, &rx_alloc_cfg);
- if (err) {
- /* Revert the change on error. */
- netdev->features = orig_features;
- return err;
- }
+ if (err)
+ goto revert_features;
}
}
+ if ((netdev->features & NETIF_F_NTUPLE) && !(features & NETIF_F_NTUPLE)) {
+ err = gve_flow_rules_reset(priv);
+ if (err)
+ goto revert_features;
+ }

return 0;
+
+revert_features:
+ netdev->features = orig_features;
+ return err;
}

static const struct net_device_ops gve_netdev_ops = {
--
2.45.2.627.g7a2c4fd464-goog

2024-06-15 02:27:33

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCH net-next v2 5/5] gve: Add flow steering ethtool support

On Thu, 13 Jun 2024 01:47:44 +0000 Ziwei Xiao wrote:
> + dev_hold(netdev);
> + rtnl_unlock();

If you care about the op being unlocked make the core not take
the lock in the first place. The core itself doesn't need it
either. Add a flag in the ops to tell the core driver doesn't
need the lock.

Disappointing :|

2024-06-15 02:28:28

by Jakub Kicinski

[permalink] [raw]

Subject: Re: [PATCH net-next v2 3/5] gve: Add flow steering device option

On Thu, 13 Jun 2024 01:47:42 +0000 Ziwei Xiao wrote:
> + if (dev_op_flow_steering &&
> + (supported_features_mask & GVE_SUP_FLOW_STEERING_MASK)) {
> + if (dev_op_flow_steering->max_flow_rules) {
> + priv->max_flow_rules =
> + be32_to_cpu(dev_op_flow_steering->max_flow_rules);
> + dev_info(&priv->pdev->dev,
> + "FLOW STEERING device option enabled with max rule limit of %u.\n",
> + priv->max_flow_rules);

There's a print to the kernel logs every time driver loads to tell
the user flow steering is available...

> - /* DQO supports LRO. */
> if (!gve_is_gqi(priv))
> - priv->dev->hw_features |= NETIF_F_LRO;
> + priv->dev->hw_features |= NETIF_F_LRO | NETIF_F_NTUPLE;

Yet the uAPI feature which is supposed to let the user know its
supported appears to be set unconditionally.