2024-05-20 11:57:06

by Shravan Ramani

[permalink] [raw]
Subject: [PATCH v2 0/4] Updates to mlxbf-pmc

This submission contains 4 patches relating to mlxbf-pmc.

Patch 1 adds documentation for the sysfs files created by the driver.
Patches 2 and 3 add specific functionality to the driver for supporting
64-bit ocunters, cycle count and clock_measure performance block.
Patch 4 adds documentation for the newly added sysfs entries.

v1 -> v2
Added patch 4 to document sysfs entries added in patches 2 and 3.

Shravan Kumar Ramani (4):
Documentation/ABI: Add document for Mellanox PMC driver
platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and
cycle count
platform/mellanox: mlxbf-pmc: Add support for clock_measure
performance block
Documentation/ABI: Add new sysfs fields to sysfs-platform-mellanox-pmc

.../ABI/testing/sysfs-platform-mellanox-pmc | 65 +++++++
drivers/platform/mellanox/mlxbf-pmc.c | 180 +++++++++++++++++-
2 files changed, 241 insertions(+), 4 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-platform-mellanox-pmc

--
2.30.1



2024-05-20 11:57:20

by Shravan Ramani

[permalink] [raw]
Subject: [PATCH v2 1/4] Documentation/ABI: Add document for Mellanox PMC driver

The sysfs interface is created for programming and monitoring the
performance counters in various HW blocks of Mellanox BlueField-1,
BlueField-2 and BlueField-3.

Signed-off-by: Shravan Kumar Ramani <[email protected]>
Reviewed-by: David Thompson <[email protected]>
---
.../ABI/testing/sysfs-platform-mellanox-pmc | 49 +++++++++++++++++++
1 file changed, 49 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-platform-mellanox-pmc

diff --git a/Documentation/ABI/testing/sysfs-platform-mellanox-pmc b/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
new file mode 100644
index 000000000000..47094024dbeb
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
@@ -0,0 +1,49 @@
+HID Driver Description
+MLNXBFD0 mlxbf-pmc Performance counters (BlueField-1)
+MLNXBFD1 mlxbf-pmc Performance counters (BlueField-2)
+MLNXBFD2 mlxbf-pmc Performance counters (BlueField-3)
+
+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/event_list
+Date: Dec 2020
+KernelVersion: 5.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ List of events supported by the counters in the specific block.
+ It is used to extract the event number or ID associated with
+ each event.
+
+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/event<N>
+Date: Dec 2020
+KernelVersion: 5.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ Event monitored by corresponding counter. This is used to
+ program or read back the event that should be or is currently
+ being monitored by counter<N>.
+
+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/counter<N>
+Date: Dec 2020
+KernelVersion: 5.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ Counter value of the event being monitored. This is used to
+ read the counter value of the event which was programmed using
+ event<N>. This is also used to clear or reset the counter value.
+
+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/enable
+Date: Dec 2020
+KernelVersion: 5.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ Start or stop counters. This is used to start the counters
+ for monitoring the programmed events and also to stop the
+ counters after the desired duration.
+
+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/<reg>
+Date: Dec 2020
+KernelVersion: 5.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ Value of register. This is used to read or reset the registers
+ where various performance statistics are counted for each block.
+
--
2.30.1


2024-05-20 11:57:31

by Shravan Ramani

[permalink] [raw]
Subject: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count

Add support for programming any counter to monitor the cycle count.
Since counting of cycles using 32-bit ocunters would result in frequent
wraparounds, add the ability to combine 2 adjacent 32-bit counters to
form 1 64-bit counter.
Both these features are supported by BlueField-3 PMC hardware, hence
the required bit-fields are exposed by the driver via sysfs to allow
the user to configure as needed.

Signed-off-by: Shravan Kumar Ramani <[email protected]>
Reviewed-by: David Thompson <[email protected]>
Reviewed-by: Vadim Pasternak <[email protected]>
---
drivers/platform/mellanox/mlxbf-pmc.c | 134 ++++++++++++++++++++++++++
1 file changed, 134 insertions(+)

diff --git a/drivers/platform/mellanox/mlxbf-pmc.c b/drivers/platform/mellanox/mlxbf-pmc.c
index 4ed9c7fd2b62..635ecc3b3845 100644
--- a/drivers/platform/mellanox/mlxbf-pmc.c
+++ b/drivers/platform/mellanox/mlxbf-pmc.c
@@ -88,6 +88,8 @@
#define MLXBF_PMC_CRSPACE_PERFMON_CTL(n) (n * MLXBF_PMC_CRSPACE_PERFMON_REG0_SZ)
#define MLXBF_PMC_CRSPACE_PERFMON_EN BIT(30)
#define MLXBF_PMC_CRSPACE_PERFMON_CLR BIT(28)
+#define MLXBF_PMC_CRSPACE_PERFMON_UOC GENMASK(15, 0)
+#define MLXBF_PMC_CRSPACE_PERFMON_COUNT_CLOCK(n) (MLXBF_PMC_CRSPACE_PERFMON_CTL(n) + 0x4)
#define MLXBF_PMC_CRSPACE_PERFMON_VAL0(n) (MLXBF_PMC_CRSPACE_PERFMON_CTL(n) + 0xc)

/**
@@ -114,6 +116,8 @@ struct mlxbf_pmc_attribute {
* @attr_event: Attributes for "event" sysfs files
* @attr_event_list: Attributes for "event_list" sysfs files
* @attr_enable: Attributes for "enable" sysfs files
+ * @attr_use_odd_counter: Attributes for "use_odd_counter" sysfs files
+ * @attr_count_clock: Attributes for "count_clock" sysfs files
* @block_attr: All attributes needed for the block
* @block_attr_grp: Attribute group for the block
*/
@@ -126,6 +130,8 @@ struct mlxbf_pmc_block_info {
struct mlxbf_pmc_attribute *attr_event;
struct mlxbf_pmc_attribute attr_event_list;
struct mlxbf_pmc_attribute attr_enable;
+ struct mlxbf_pmc_attribute attr_use_odd_counter;
+ struct mlxbf_pmc_attribute attr_count_clock;
struct attribute *block_attr[MLXBF_PMC_MAX_ATTRS];
struct attribute_group block_attr_grp;
};
@@ -1763,6 +1769,103 @@ static ssize_t mlxbf_pmc_enable_store(struct device *dev,
return count;
}

+/* Show function for "use_odd_counter" sysfs files - only for crspace */
+static ssize_t mlxbf_pmc_use_odd_counter_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct mlxbf_pmc_attribute *attr_use_odd_counter = container_of(
+ attr, struct mlxbf_pmc_attribute, dev_attr);
+ unsigned int blk_num;
+ u32 value, reg;
+
+ blk_num = attr_use_odd_counter->nr;
+
+ if (mlxbf_pmc_readl(pmc->block[blk_num].mmio_base +
+ MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
+ &reg))
+ return -EINVAL;
+
+ value = FIELD_GET(MLXBF_PMC_CRSPACE_PERFMON_UOC, reg);
+
+ return sysfs_emit(buf, "%u\n", value);
+}
+
+/* Store function for "use_odd_counter" sysfs files - only for crspace */
+static ssize_t mlxbf_pmc_use_odd_counter_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct mlxbf_pmc_attribute *attr_use_odd_counter = container_of(
+ attr, struct mlxbf_pmc_attribute, dev_attr);
+ unsigned int blk_num;
+ u32 uoc, reg;
+ int err;
+
+ blk_num = attr_use_odd_counter->nr;
+
+ err = kstrtouint(buf, 0, &uoc);
+ if (err < 0)
+ return err;
+
+ err = mlxbf_pmc_readl(pmc->block[blk_num].mmio_base +
+ MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
+ &reg);
+ if (err)
+ return -EINVAL;
+
+ reg &= ~MLXBF_PMC_CRSPACE_PERFMON_UOC;
+ reg |= FIELD_PREP(MLXBF_PMC_CRSPACE_PERFMON_UOC, uoc);
+
+ mlxbf_pmc_write(pmc->block[blk_num].mmio_base +
+ MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
+ MLXBF_PMC_WRITE_REG_32, reg);
+
+ return count;
+}
+
+/* Show function for "count_clock" sysfs files - only for crspace */
+static ssize_t mlxbf_pmc_count_clock_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct mlxbf_pmc_attribute *attr_count_clock = container_of(
+ attr, struct mlxbf_pmc_attribute, dev_attr);
+ unsigned int blk_num;
+ u32 reg;
+
+ blk_num = attr_count_clock->nr;
+
+ if (mlxbf_pmc_readl(pmc->block[blk_num].mmio_base +
+ MLXBF_PMC_CRSPACE_PERFMON_COUNT_CLOCK(pmc->block[blk_num].counters),
+ &reg))
+ return -EINVAL;
+
+ return sysfs_emit(buf, "%u\n", reg);
+}
+
+/* Store function for "count_clock" sysfs files - only for crspace */
+static ssize_t mlxbf_pmc_count_clock_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct mlxbf_pmc_attribute *attr_count_clock = container_of(
+ attr, struct mlxbf_pmc_attribute, dev_attr);
+ unsigned int blk_num;
+ u32 reg;
+ int err;
+
+ blk_num = attr_count_clock->nr;
+
+ err = kstrtouint(buf, 0, &reg);
+ if (err < 0)
+ return err;
+
+ mlxbf_pmc_write(pmc->block[blk_num].mmio_base +
+ MLXBF_PMC_CRSPACE_PERFMON_COUNT_CLOCK(pmc->block[blk_num].counters),
+ MLXBF_PMC_WRITE_REG_32, reg);
+
+ return count;
+}
+
/* Populate attributes for blocks with counters to monitor performance */
static int mlxbf_pmc_init_perftype_counter(struct device *dev, unsigned int blk_num)
{
@@ -1799,6 +1902,37 @@ static int mlxbf_pmc_init_perftype_counter(struct device *dev, unsigned int blk_
attr = NULL;
}

+ if (pmc->block[blk_num].type == MLXBF_PMC_TYPE_CRSPACE) {
+ /*
+ * Couple adjacent odd and even 32-bit counters to form 64-bit counters
+ * using "use_odd_counter" sysfs which has one bit per even counter.
+ */
+ attr = &pmc->block[blk_num].attr_use_odd_counter;
+ attr->dev_attr.attr.mode = 0644;
+ attr->dev_attr.show = mlxbf_pmc_use_odd_counter_show;
+ attr->dev_attr.store = mlxbf_pmc_use_odd_counter_store;
+ attr->nr = blk_num;
+ attr->dev_attr.attr.name = devm_kasprintf(dev, GFP_KERNEL,
+ "use_odd_counter");
+ if (!attr->dev_attr.attr.name)
+ return -ENOMEM;
+ pmc->block[blk_num].block_attr[++i] = &attr->dev_attr.attr;
+ attr = NULL;
+
+ /* Program crspace counters to count clock cycles using "count_clock" sysfs */
+ attr = &pmc->block[blk_num].attr_count_clock;
+ attr->dev_attr.attr.mode = 0644;
+ attr->dev_attr.show = mlxbf_pmc_count_clock_show;
+ attr->dev_attr.store = mlxbf_pmc_count_clock_store;
+ attr->nr = blk_num;
+ attr->dev_attr.attr.name = devm_kasprintf(dev, GFP_KERNEL,
+ "count_clock");
+ if (!attr->dev_attr.attr.name)
+ return -ENOMEM;
+ pmc->block[blk_num].block_attr[++i] = &attr->dev_attr.attr;
+ attr = NULL;
+ }
+
pmc->block[blk_num].attr_counter = devm_kcalloc(
dev, pmc->block[blk_num].counters,
sizeof(struct mlxbf_pmc_attribute), GFP_KERNEL);
--
2.30.1


2024-05-20 11:57:43

by Shravan Ramani

[permalink] [raw]
Subject: [PATCH v2 4/4] Documentation/ABI: Add new sysfs fields to sysfs-platform-mellanox-pmc

Document newly added "count_clock" and "use_odd_counter" sysfs entries
for the Mellanox BlueField PMC driver.

Signed-off-by: Shravan Kumar Ramani <[email protected]>
Reviewed-by: David Thompson <[email protected]>
---
.../ABI/testing/sysfs-platform-mellanox-pmc | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-platform-mellanox-pmc b/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
index 47094024dbeb..b9973ebec2fd 100644
--- a/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
+++ b/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
@@ -47,3 +47,19 @@ Description:
Value of register. This is used to read or reset the registers
where various performance statistics are counted for each block.

+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/count_clock
+Date: May 2024
+KernelVersion: 6.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ Use counter for counting cycles. This is used to program any of
+ the counters in the block to count cycles.
+
+What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/use_odd_counter
+Date: May 2024
+KernelVersion: 6.10
+Contact: "Shravan Kumar Ramani <[email protected]>"
+Description:
+ Form 64-bit counter using 2 32-bit counters. This is used to combine
+ 2 adjacent counters to form a single 64-bit counter.
+
--
2.30.1


2024-05-20 11:58:00

by Shravan Ramani

[permalink] [raw]
Subject: [PATCH v2 3/4] platform/mellanox: mlxbf-pmc: Add support for clock_measure performance block

The HW clock_measure counter info is passed to the driver from ACPI.
Create a new sub-directory for clock_measure events and provide
read access to the user. Writes are blocked since the fields are RO.

Signed-off-by: Shravan Kumar Ramani <[email protected]>
Reviewed-by: David Thompson <[email protected]>
Reviewed-by: Vadim Pasternak <[email protected]>
---
drivers/platform/mellanox/mlxbf-pmc.c | 46 ++++++++++++++++++++++++---
1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/drivers/platform/mellanox/mlxbf-pmc.c b/drivers/platform/mellanox/mlxbf-pmc.c
index 635ecc3b3845..1212a96fb3eb 100644
--- a/drivers/platform/mellanox/mlxbf-pmc.c
+++ b/drivers/platform/mellanox/mlxbf-pmc.c
@@ -865,6 +865,37 @@ static const struct mlxbf_pmc_events mlxbf_pmc_llt_miss_events[] = {
{75, "HISTOGRAM_HISTOGRAM_BIN9"},
};

+static const struct mlxbf_pmc_events mlxbf_pmc_clock_events[] = {
+ { 0x0, "FMON_CLK_LAST_COUNT_PLL_D1_INST0" },
+ { 0x4, "REFERENCE_WINDOW_WIDTH_PLL_D1_INST0" },
+ { 0x8, "FMON_CLK_LAST_COUNT_PLL_D1_INST1" },
+ { 0xc, "REFERENCE_WINDOW_WIDTH_PLL_D1_INST1" },
+ { 0x10, "FMON_CLK_LAST_COUNT_PLL_G1" },
+ { 0x14, "REFERENCE_WINDOW_WIDTH_PLL_G1" },
+ { 0x18, "FMON_CLK_LAST_COUNT_PLL_W1" },
+ { 0x1c, "REFERENCE_WINDOW_WIDTH_PLL_W1" },
+ { 0x20, "FMON_CLK_LAST_COUNT_PLL_T1" },
+ { 0x24, "REFERENCE_WINDOW_WIDTH_PLL_T1" },
+ { 0x28, "FMON_CLK_LAST_COUNT_PLL_A0" },
+ { 0x2c, "REFERENCE_WINDOW_WIDTH_PLL_A0" },
+ { 0x30, "FMON_CLK_LAST_COUNT_PLL_C0" },
+ { 0x34, "REFERENCE_WINDOW_WIDTH_PLL_C0" },
+ { 0x38, "FMON_CLK_LAST_COUNT_PLL_N1" },
+ { 0x3c, "REFERENCE_WINDOW_WIDTH_PLL_N1" },
+ { 0x40, "FMON_CLK_LAST_COUNT_PLL_I1" },
+ { 0x44, "REFERENCE_WINDOW_WIDTH_PLL_I1" },
+ { 0x48, "FMON_CLK_LAST_COUNT_PLL_R1" },
+ { 0x4c, "REFERENCE_WINDOW_WIDTH_PLL_R1" },
+ { 0x50, "FMON_CLK_LAST_COUNT_PLL_P1" },
+ { 0x54, "REFERENCE_WINDOW_WIDTH_PLL_P1" },
+ { 0x58, "FMON_CLK_LAST_COUNT_REF_100_INST0" },
+ { 0x5c, "REFERENCE_WINDOW_WIDTH_REF_100_INST0" },
+ { 0x60, "FMON_CLK_LAST_COUNT_REF_100_INST1" },
+ { 0x64, "REFERENCE_WINDOW_WIDTH_REF_100_INST1" },
+ { 0x68, "FMON_CLK_LAST_COUNT_REF_156" },
+ { 0x6c, "REFERENCE_WINDOW_WIDTH_REF_156" },
+};
+
static struct mlxbf_pmc_context *pmc;

/* UUID used to probe ATF service. */
@@ -1038,6 +1069,9 @@ static const struct mlxbf_pmc_events *mlxbf_pmc_event_list(const char *blk, size
} else if (strstr(blk, "llt")) {
events = mlxbf_pmc_llt_events;
size = ARRAY_SIZE(mlxbf_pmc_llt_events);
+ } else if (strstr(blk, "clock_measure")) {
+ events = mlxbf_pmc_clock_events;
+ size = ARRAY_SIZE(mlxbf_pmc_clock_events);
} else {
events = NULL;
size = 0;
@@ -1472,14 +1506,15 @@ static int mlxbf_pmc_read_event(unsigned int blk_num, u32 cnt_num, bool is_l3, u
/* Method to read a register */
static int mlxbf_pmc_read_reg(unsigned int blk_num, u32 offset, u64 *result)
{
- u32 ecc_out;
+ u32 reg;

- if (strstr(pmc->block_name[blk_num], "ecc")) {
+ if ((strstr(pmc->block_name[blk_num], "ecc")) ||
+ (strstr(pmc->block_name[blk_num], "clock_measure"))) {
if (mlxbf_pmc_readl(pmc->block[blk_num].mmio_base + offset,
- &ecc_out))
+ &reg))
return -EFAULT;

- *result = ecc_out;
+ *result = reg;
return 0;
}

@@ -1493,6 +1528,9 @@ static int mlxbf_pmc_read_reg(unsigned int blk_num, u32 offset, u64 *result)
/* Method to write to a register */
static int mlxbf_pmc_write_reg(unsigned int blk_num, u32 offset, u64 data)
{
+ if (strstr(pmc->block_name[blk_num], "clock_measure"))
+ return -EINVAL;
+
if (strstr(pmc->block_name[blk_num], "ecc")) {
return mlxbf_pmc_write(pmc->block[blk_num].mmio_base + offset,
MLXBF_PMC_WRITE_REG_32, data);
--
2.30.1


2024-05-27 12:34:22

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] Documentation/ABI: Add document for Mellanox PMC driver

On Mon, 20 May 2024, Shravan Kumar Ramani wrote:

> The sysfs interface is created for programming and monitoring the

This patch is not creating interface. I suggest you change the wording to:

"Document the sysfs interface for ...

> performance counters in various HW blocks of Mellanox BlueField-1,
> BlueField-2 and BlueField-3.
>
> Signed-off-by: Shravan Kumar Ramani <[email protected]>
> Reviewed-by: David Thompson <[email protected]>
> ---
> .../ABI/testing/sysfs-platform-mellanox-pmc | 49 +++++++++++++++++++
> 1 file changed, 49 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-platform-mellanox-pmc
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-mellanox-pmc b/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
> new file mode 100644
> index 000000000000..47094024dbeb
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-platform-mellanox-pmc
> @@ -0,0 +1,49 @@
> +HID Driver Description
> +MLNXBFD0 mlxbf-pmc Performance counters (BlueField-1)
> +MLNXBFD1 mlxbf-pmc Performance counters (BlueField-2)
> +MLNXBFD2 mlxbf-pmc Performance counters (BlueField-3)
> +
> +What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/event_list
> +Date: Dec 2020
> +KernelVersion: 5.10
> +Contact: "Shravan Kumar Ramani <[email protected]>"
> +Description:
> + List of events supported by the counters in the specific block.
> + It is used to extract the event number or ID associated with
> + each event.
> +
> +What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/event<N>
> +Date: Dec 2020
> +KernelVersion: 5.10
> +Contact: "Shravan Kumar Ramani <[email protected]>"
> +Description:
> + Event monitored by corresponding counter. This is used to
> + program or read back the event that should be or is currently
> + being monitored by counter<N>.
> +
> +What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/counter<N>
> +Date: Dec 2020
> +KernelVersion: 5.10
> +Contact: "Shravan Kumar Ramani <[email protected]>"
> +Description:
> + Counter value of the event being monitored. This is used to
> + read the counter value of the event which was programmed using
> + event<N>. This is also used to clear or reset the counter value.

Please document how to clear/reset it in more concrete terms (I didn't
check the code but my guess would one writes zero there to reset the
counter?).

> +What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/enable
> +Date: Dec 2020
> +KernelVersion: 5.10
> +Contact: "Shravan Kumar Ramani <[email protected]>"
> +Description:
> + Start or stop counters. This is used to start the counters
> + for monitoring the programmed events and also to stop the
> + counters after the desired duration.
> +
> +What: /sys/bus/platform/devices/<HID>/hwmon/hwmonX/<block>/<reg>
> +Date: Dec 2020
> +KernelVersion: 5.10
> +Contact: "Shravan Kumar Ramani <[email protected]>"
> +Description:
> + Value of register. This is used to read or reset the registers
> + where various performance statistics are counted for each block.

Ditto for the reset part.

--
i.


2024-05-27 12:45:17

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count

On Mon, 20 May 2024, Shravan Kumar Ramani wrote:

> Add support for programming any counter to monitor the cycle count.
> Since counting of cycles using 32-bit ocunters would result in frequent

counters

> wraparounds, add the ability to combine 2 adjacent 32-bit counters to
> form 1 64-bit counter.
> Both these features are supported by BlueField-3 PMC hardware, hence
> the required bit-fields are exposed by the driver via sysfs to allow
> the user to configure as needed.

I'm trying to understand what happens for the other counter, when the
use_odd_counter is enabled? This change also doesn't add code that would
make the other counter -EBUSY, should that be done?

For 64-bit counter, I suppose the userspace is expected to read the full
counter from two sysfs files and combine the value (your documentation
doesn't explain this)? That seems non-optimal, why cannot kernel just
return the full combined 64-value directly in kernel?

Similarly, are these cycle counters occupying the same space as non-cycle
counters (so both can/cannot be used that the same time)? I'm asking this
because you're adding a parallel interface to read the value and if it's
either-or, I don't understand why the value needs to be read from
different file depending on the counter counting in cycles or not.

--
i.

> Signed-off-by: Shravan Kumar Ramani <[email protected]>
> Reviewed-by: David Thompson <[email protected]>
> Reviewed-by: Vadim Pasternak <[email protected]>
> ---
> drivers/platform/mellanox/mlxbf-pmc.c | 134 ++++++++++++++++++++++++++
> 1 file changed, 134 insertions(+)
>
> diff --git a/drivers/platform/mellanox/mlxbf-pmc.c b/drivers/platform/mellanox/mlxbf-pmc.c
> index 4ed9c7fd2b62..635ecc3b3845 100644
> --- a/drivers/platform/mellanox/mlxbf-pmc.c
> +++ b/drivers/platform/mellanox/mlxbf-pmc.c
> @@ -88,6 +88,8 @@
> #define MLXBF_PMC_CRSPACE_PERFMON_CTL(n) (n * MLXBF_PMC_CRSPACE_PERFMON_REG0_SZ)
> #define MLXBF_PMC_CRSPACE_PERFMON_EN BIT(30)
> #define MLXBF_PMC_CRSPACE_PERFMON_CLR BIT(28)
> +#define MLXBF_PMC_CRSPACE_PERFMON_UOC GENMASK(15, 0)
> +#define MLXBF_PMC_CRSPACE_PERFMON_COUNT_CLOCK(n) (MLXBF_PMC_CRSPACE_PERFMON_CTL(n) + 0x4)
> #define MLXBF_PMC_CRSPACE_PERFMON_VAL0(n) (MLXBF_PMC_CRSPACE_PERFMON_CTL(n) + 0xc)
>
> /**
> @@ -114,6 +116,8 @@ struct mlxbf_pmc_attribute {
> * @attr_event: Attributes for "event" sysfs files
> * @attr_event_list: Attributes for "event_list" sysfs files
> * @attr_enable: Attributes for "enable" sysfs files
> + * @attr_use_odd_counter: Attributes for "use_odd_counter" sysfs files
> + * @attr_count_clock: Attributes for "count_clock" sysfs files
> * @block_attr: All attributes needed for the block
> * @block_attr_grp: Attribute group for the block
> */
> @@ -126,6 +130,8 @@ struct mlxbf_pmc_block_info {
> struct mlxbf_pmc_attribute *attr_event;
> struct mlxbf_pmc_attribute attr_event_list;
> struct mlxbf_pmc_attribute attr_enable;
> + struct mlxbf_pmc_attribute attr_use_odd_counter;
> + struct mlxbf_pmc_attribute attr_count_clock;
> struct attribute *block_attr[MLXBF_PMC_MAX_ATTRS];
> struct attribute_group block_attr_grp;
> };
> @@ -1763,6 +1769,103 @@ static ssize_t mlxbf_pmc_enable_store(struct device *dev,
> return count;
> }
>
> +/* Show function for "use_odd_counter" sysfs files - only for crspace */
> +static ssize_t mlxbf_pmc_use_odd_counter_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct mlxbf_pmc_attribute *attr_use_odd_counter = container_of(
> + attr, struct mlxbf_pmc_attribute, dev_attr);
> + unsigned int blk_num;
> + u32 value, reg;
> +
> + blk_num = attr_use_odd_counter->nr;
> +
> + if (mlxbf_pmc_readl(pmc->block[blk_num].mmio_base +
> + MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
> + &reg))
> + return -EINVAL;
> +
> + value = FIELD_GET(MLXBF_PMC_CRSPACE_PERFMON_UOC, reg);
> +
> + return sysfs_emit(buf, "%u\n", value);
> +}
> +
> +/* Store function for "use_odd_counter" sysfs files - only for crspace */
> +static ssize_t mlxbf_pmc_use_odd_counter_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct mlxbf_pmc_attribute *attr_use_odd_counter = container_of(
> + attr, struct mlxbf_pmc_attribute, dev_attr);
> + unsigned int blk_num;
> + u32 uoc, reg;
> + int err;
> +
> + blk_num = attr_use_odd_counter->nr;
> +
> + err = kstrtouint(buf, 0, &uoc);
> + if (err < 0)
> + return err;
> +
> + err = mlxbf_pmc_readl(pmc->block[blk_num].mmio_base +
> + MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
> + &reg);
> + if (err)
> + return -EINVAL;
> +
> + reg &= ~MLXBF_PMC_CRSPACE_PERFMON_UOC;
> + reg |= FIELD_PREP(MLXBF_PMC_CRSPACE_PERFMON_UOC, uoc);
> +
> + mlxbf_pmc_write(pmc->block[blk_num].mmio_base +
> + MLXBF_PMC_CRSPACE_PERFMON_CTL(pmc->block[blk_num].counters),
> + MLXBF_PMC_WRITE_REG_32, reg);
> +
> + return count;
> +}
> +
> +/* Show function for "count_clock" sysfs files - only for crspace */
> +static ssize_t mlxbf_pmc_count_clock_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct mlxbf_pmc_attribute *attr_count_clock = container_of(
> + attr, struct mlxbf_pmc_attribute, dev_attr);
> + unsigned int blk_num;
> + u32 reg;
> +
> + blk_num = attr_count_clock->nr;
> +
> + if (mlxbf_pmc_readl(pmc->block[blk_num].mmio_base +
> + MLXBF_PMC_CRSPACE_PERFMON_COUNT_CLOCK(pmc->block[blk_num].counters),
> + &reg))
> + return -EINVAL;
> +
> + return sysfs_emit(buf, "%u\n", reg);
> +}
> +
> +/* Store function for "count_clock" sysfs files - only for crspace */
> +static ssize_t mlxbf_pmc_count_clock_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct mlxbf_pmc_attribute *attr_count_clock = container_of(
> + attr, struct mlxbf_pmc_attribute, dev_attr);
> + unsigned int blk_num;
> + u32 reg;
> + int err;
> +
> + blk_num = attr_count_clock->nr;
> +
> + err = kstrtouint(buf, 0, &reg);
> + if (err < 0)
> + return err;
> +
> + mlxbf_pmc_write(pmc->block[blk_num].mmio_base +
> + MLXBF_PMC_CRSPACE_PERFMON_COUNT_CLOCK(pmc->block[blk_num].counters),
> + MLXBF_PMC_WRITE_REG_32, reg);
> +
> + return count;
> +}
> +
> /* Populate attributes for blocks with counters to monitor performance */
> static int mlxbf_pmc_init_perftype_counter(struct device *dev, unsigned int blk_num)
> {
> @@ -1799,6 +1902,37 @@ static int mlxbf_pmc_init_perftype_counter(struct device *dev, unsigned int blk_
> attr = NULL;
> }
>
> + if (pmc->block[blk_num].type == MLXBF_PMC_TYPE_CRSPACE) {
> + /*
> + * Couple adjacent odd and even 32-bit counters to form 64-bit counters
> + * using "use_odd_counter" sysfs which has one bit per even counter.
> + */
> + attr = &pmc->block[blk_num].attr_use_odd_counter;
> + attr->dev_attr.attr.mode = 0644;
> + attr->dev_attr.show = mlxbf_pmc_use_odd_counter_show;
> + attr->dev_attr.store = mlxbf_pmc_use_odd_counter_store;
> + attr->nr = blk_num;
> + attr->dev_attr.attr.name = devm_kasprintf(dev, GFP_KERNEL,
> + "use_odd_counter");
> + if (!attr->dev_attr.attr.name)
> + return -ENOMEM;
> + pmc->block[blk_num].block_attr[++i] = &attr->dev_attr.attr;
> + attr = NULL;
> +
> + /* Program crspace counters to count clock cycles using "count_clock" sysfs */
> + attr = &pmc->block[blk_num].attr_count_clock;
> + attr->dev_attr.attr.mode = 0644;
> + attr->dev_attr.show = mlxbf_pmc_count_clock_show;
> + attr->dev_attr.store = mlxbf_pmc_count_clock_store;
> + attr->nr = blk_num;
> + attr->dev_attr.attr.name = devm_kasprintf(dev, GFP_KERNEL,
> + "count_clock");
> + if (!attr->dev_attr.attr.name)
> + return -ENOMEM;
> + pmc->block[blk_num].block_attr[++i] = &attr->dev_attr.attr;
> + attr = NULL;
> + }


2024-06-03 10:30:05

by Shravan Ramani

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count


> > Both these features are supported by BlueField-3 PMC hardware, hence
> > the required bit-fields are exposed by the driver via sysfs to allow
> > the user to configure as needed.
>
> I'm trying to understand what happens for the other counter, when the
> use_odd_counter is enabled? This change also doesn't add code that would
> make the other counter -EBUSY, should that be done?

When 2 32-bit counters are coupled to form a 64-bit counter using this setting,
one counter will hold the lower 32 bits while the other will hold the upper 32.
So the other counter (or syses corresponding to it) also needs to be accessed.

> For 64-bit counter, I suppose the userspace is expected to read the full
> counter from two sysfs files and combine the value (your documentation
> doesn't explain this)? That seems non-optimal, why cannot kernel just
> return the full combined 64-value directly in kernel?

I will add more clear comments for this.
While it is true that the driver could combine the 2 fields and present a
64-bit value via one of the sysfs, the reason for the current approach is that
there are other interfaces which expose the same counters for our platform
and there are tools that are expected to work on top of both interfaces for
the purpose of collecting performance stats. The other interfaces follow this
approach of having lower and upper 32-bits separately in each counter, and
the tools expect the same. Hence the driver follows this approach to keep
things consistent across the BlueField platform.

>
> Similarly, are these cycle counters occupying the same space as non-cycle
> counters (so both can/cannot be used that the same time)? I'm asking this
> because you're adding a parallel interface to read the value and if it's
> either-or, I don't understand why the value needs to be read from
> different file depending on the counter counting in cycles or not.

It is the same file. The count_clock sysfs exposes 16 bits, one for each counter,
to allow the user to dedicate any of the 16 counters to counting cycles. Once
set, the corresponding counter can no longer monitor other events, and the
same sysfs can be accessed to read the cycle count. Again, I will try capture
this better and more elaborately in the documentation.

Thanks,
Shravan

2024-06-11 07:16:53

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count

On Mon, 3 Jun 2024, Shravan Ramani wrote:

> > > Both these features are supported by BlueField-3 PMC hardware, hence
> > > the required bit-fields are exposed by the driver via sysfs to allow
> > > the user to configure as needed.
> >
> > I'm trying to understand what happens for the other counter, when the
> > use_odd_counter is enabled? This change also doesn't add code that would
> > make the other counter -EBUSY, should that be done?
>
> When 2 32-bit counters are coupled to form a 64-bit counter using this setting,
> one counter will hold the lower 32 bits while the other will hold the upper 32.
> So the other counter (or syses corresponding to it) also needs to be accessed.
>
> > For 64-bit counter, I suppose the userspace is expected to read the full
> > counter from two sysfs files and combine the value (your documentation
> > doesn't explain this)? That seems non-optimal, why cannot kernel just
> > return the full combined 64-value directly in kernel?
>
> I will add more clear comments for this.
> While it is true that the driver could combine the 2 fields and present a
> 64-bit value via one of the sysfs, the reason for the current approach is that
> there are other interfaces which expose the same counters for our platform
> and there are tools that are expected to work on top of both interfaces for
> the purpose of collecting performance stats.

> The other interfaces follow this
> approach of having lower and upper 32-bits separately in each counter, and
> the tools expect the same. Hence the driver follows this approach to keep
> things consistent across the BlueField platform.

Hi,

I went to look through the existing arrays in mlxbf-pmc.c but did not find
any entries that would have clearly indicated the counters being hi/lo
parts of the same counter. There were a few 0/1 ones which could be the
same counter although I suspect even they are not parts of the same
counter but two separate entities called 0 and 1 having the same counter.

Could you please elaborate further what you meant with the note about
other interfaces above so I can better assess the claim?

--
i.


2024-06-11 13:48:07

by Shravan Ramani

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count

> > When 2 32-bit counters are coupled to form a 64-bit counter using this setting,
> > one counter will hold the lower 32 bits while the other will hold the upper 32.
> > So the other counter (or syses corresponding to it) also needs to be accessed.
> >
> > > For 64-bit counter, I suppose the userspace is expected to read the full
> > > counter from two sysfs files and combine the value (your documentation
> > > doesn't explain this)? That seems non-optimal, why cannot kernel just
> > > return the full combined 64-value directly in kernel?
> >
> > I will add more clear comments for this.
> > While it is true that the driver could combine the 2 fields and present a
> > 64-bit value via one of the sysfs, the reason for the current approach is that
> > there are other interfaces which expose the same counters for our platform
> > and there are tools that are expected to work on top of both interfaces for
> > the purpose of collecting performance stats.
>
> > The other interfaces follow this
> > approach of having lower and upper 32-bits separately in each counter, and
> > the tools expect the same. Hence the driver follows this approach to keep
> > things consistent across the BlueField platform.
>
> Hi,
>
> I went to look through the existing arrays in mlxbf-pmc.c but did not find
> any entries that would have clearly indicated the counters being hi/lo
> parts of the same counter. There were a few 0/1 ones which could be the
> same counter although I suspect even they are not parts of the same
> counter but two separate entities called 0 and 1 having the same counter.
>
> Could you please elaborate further what you meant with the note about
> other interfaces above so I can better assess the claim?

When combining 2 counters using the "use_odd_counter" setting, the mechanism
of joining them or assigning upper or lower 32 bits to a counter is handled in HW
and not by the driver. For example, if bit0 of "use_odd_counter" is set, counter0
and counter1 (which were originally separate counters) automatically become
the lower and upper bits of one 64-bit value. The user needs to read both these
sysfs separately to get the full 64-bit value. The driver does not do any special
handling for such cases, merely provides access to both counter0 and counter1.

Since the events supported by the blocks are quite HW centric and low-level in
nature, the driver is generally used alongside various tools which work on top of
this driver to collect telemetry info and provide more readable statistics to the
end-user. Similar to this driver, there are other FW interfaces providing access to
these counters (same and other additional ones as well that belong to other HW
blocks). For the sake of consistency and to allow the tools to be compatible with
all interfaces, the counter data needs to be accessible in the same way, ie, as 32-bit
upper and lower values in counter0 and counter1 sysfs as in the above case.

Thanks,
Shravan

2024-06-12 07:28:25

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count

On Tue, 11 Jun 2024, Shravan Ramani wrote:

> > > When 2 32-bit counters are coupled to form a 64-bit counter using this setting,
> > > one counter will hold the lower 32 bits while the other will hold the upper 32.
> > > So the other counter (or syses corresponding to it) also needs to be accessed.
> > >
> > > > For 64-bit counter, I suppose the userspace is expected to read the full
> > > > counter from two sysfs files and combine the value (your documentation
> > > > doesn't explain this)? That seems non-optimal, why cannot kernel just
> > > > return the full combined 64-value directly in kernel?
> > >
> > > I will add more clear comments for this.
> > > While it is true that the driver could combine the 2 fields and present a
> > > 64-bit value via one of the sysfs, the reason for the current approach is that
> > > there are other interfaces which expose the same counters for our platform
> > > and there are tools that are expected to work on top of both interfaces for
> > > the purpose of collecting performance stats.
> >
> > > The other interfaces follow this
> > > approach of having lower and upper 32-bits separately in each counter, and
> > > the tools expect the same. Hence the driver follows this approach to keep
> > > things consistent across the BlueField platform.
> >
> > Hi,
> >
> > I went to look through the existing arrays in mlxbf-pmc.c but did not find
> > any entries that would have clearly indicated the counters being hi/lo
> > parts of the same counter. There were a few 0/1 ones which could be the
> > same counter although I suspect even they are not parts of the same
> > counter but two separate entities called 0 and 1 having the same counter.
> >
> > Could you please elaborate further what you meant with the note about
> > other interfaces above so I can better assess the claim?
>
> When combining 2 counters using the "use_odd_counter" setting, the mechanism
> of joining them or assigning upper or lower 32 bits to a counter is handled in HW
> and not by the driver. For example, if bit0 of "use_odd_counter" is set, counter0
> and counter1 (which were originally separate counters) automatically become
> the lower and upper bits of one 64-bit value. The user needs to read both these
> sysfs separately to get the full 64-bit value. The driver does not do any special
> handling for such cases, merely provides access to both counter0 and counter1.

I know all this by now, but we're discussion here is whether kernel should
do "special handling". Although, it's not really correct to depict
representing 64-bit counter in its entirety as "special handling".

I think the kernel should combine the 64-bit halved and you argumented
it shouldn't. When I went to confirm the claim your argument was based
on, I couldn't find on what basis the claim was made.

> Since the events supported by the blocks are quite HW centric and low-level in
> nature, the driver is generally used alongside various tools which work on top of
> this driver to collect telemetry info and provide more readable statistics to the
> end-user. Similar to this driver, there are other FW interfaces providing access to
> these counters (same and other additional ones as well that belong to other HW
> blocks). For the sake of consistency and to allow the tools to be compatible with
> all interfaces, the counter data needs to be accessible in the same way, ie, as 32-bit
> upper and lower values in counter0 and counter1 sysfs as in the above case.

This does nothing to answer my question. Where in the kernel, there's an
example where a 64-bit counter for BlueField platform is presented as 2
32-bit counters? If there isn't any examples in the kernel, your statement
about consistency within the platform doesn't hold water, quoted (again)
here for clarity what I'm refering to:

"The other interfaces follow this approach of having lower and upper
32-bits separately in each counter, and the tools expect the same.
Hence the driver follows this approach to keep things consistent across
the BlueField platform."

Where I can find those "other interfaces" that already follow this
convention?

--
i.


2024-06-14 11:13:38

by Shravan Ramani

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count


> This does nothing to answer my question. Where in the kernel, there's an
> example where a 64-bit counter for BlueField platform is presented as 2
> 32-bit counters? If there isn't any examples in the kernel, your statement
> about consistency within the platform doesn't hold water, quoted (again)
> here for clarity what I'm refering to:
>
> "The other interfaces follow this approach of having lower and upper
> 32-bits separately in each counter, and the tools expect the same.
> Hence the driver follows this approach to keep things consistent across
> the BlueField platform."
>
> Where I can find those "other interfaces" that already follow this
> convention?

Ah, I think I misunderstood the question and went on elaborating the
same thing, apologies. The other interfaces are not part of the kernel.
They are part of the BlueField Software Package, which also contains
the tools that put together the performance metrics.
My thinking was that since this is a platform driver and is used along
with the BlueField Software Package, consistency with the tools which
were developed following the same convention could be considered,
as long as the driver is not doing something non-standard, of course.
I can change the driver handling to present 64-bit data if you insist.

Thanks,
Shravan

2024-06-14 12:13:41

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] platform/mellanox: mlxbf-pmc: Add support for 64-bit counters and cycle count

On Fri, 14 Jun 2024, Shravan Ramani wrote:

> > This does nothing to answer my question. Where in the kernel, there's an
> > example where a 64-bit counter for BlueField platform is presented as 2
> > 32-bit counters? If there isn't any examples in the kernel, your statement
> > about consistency within the platform doesn't hold water, quoted (again)
> > here for clarity what I'm refering to:
> >
> > "The other interfaces follow this approach of having lower and upper
> > 32-bits separately in each counter, and the tools expect the same.
> > Hence the driver follows this approach to keep things consistent across
> > the BlueField platform."
> >
> > Where I can find those "other interfaces" that already follow this
> > convention?
>
> Ah, I think I misunderstood the question and went on elaborating the
> same thing, apologies. The other interfaces are not part of the kernel.
> They are part of the BlueField Software Package, which also contains
> the tools that put together the performance metrics.
> My thinking was that since this is a platform driver and is used along
> with the BlueField Software Package, consistency with the tools which
> were developed following the same convention could be considered,
> as long as the driver is not doing something non-standard, of course.
> I can change the driver handling to present 64-bit data if you insist.

I'd certainly prefer 64-bit data be presented as such by the kernel.
While you make that change, please make sure the driver correctly handles
the lower dword wraps without returning an inconsistent reading (assuming
the counter parts are read non-atomically, it is a common pitfall).

--
i.