2023-12-22 10:37:59

by Lucas Segarra Fernandez

Subject: [PATCH v2 0/4] crypto: qat - enable telemetry for QAT GEN 4

Expose telemetry data for QAT GEN4 devices through debugfs.

This allows gathering metrics about the performance and the utilization
of a QAT device and/or a group of ring pairs. In particular, it provides
statistics on (1) the utilization of the PCIe channel, (2) address
translation and device TLB, when SVA is enabled, and (3) the internal
engines for crypto and data compression.

The device periodically gathers telemetry data from hardware registers
and writes it into a DMA memory region which is sampled by the driver.
The driver then uses this data to compute basic metrics on the counters
and exposes them through debugfs attributes in the folder
/sys/kernel/debug/qat_<device>_<BDF>/telemetry.

Here is a summary of the changes:
* Patch #1 includes a missing header in the file adf_accel_devices.h so
that the macro GET_DEV() can be used.
* Patch #2 extends the admin interface with the messages that start and
stop telemetry.
* Patch #3 introduces device level telemetry and the associated documentation
in Documentation/ABI.
* Patch #4 extends #3 by introducing ring pair level telemetry and
documentation about it.

This set is based on earlier work done by Wojciech Ziemba.

---
v1 -> v2:
- define avg_array() in the C file where it is used
- set `accel_dev->telemetry` to NULL in adf_tl_free_mem()
- add ring pair service type info to debugfs telemetry/rp_<X>_data output
---

Lucas Segarra Fernandez (4):
crypto: qat - include pci.h for GET_DEV()
crypto: qat - add admin msgs for telemetry
crypto: qat - add support for device telemetry
crypto: qat - add support for ring pair level telemetry

.../ABI/testing/debugfs-driver-qat_telemetry | 228 ++++++
.../intel/qat/qat_420xx/adf_420xx_hw_data.c | 3 +
.../intel/qat/qat_4xxx/adf_4xxx_hw_data.c | 3 +
drivers/crypto/intel/qat/qat_common/Makefile | 3 +
.../intel/qat/qat_common/adf_accel_devices.h | 6 +
.../crypto/intel/qat/qat_common/adf_admin.c | 37 +
.../crypto/intel/qat/qat_common/adf_admin.h | 4 +
.../crypto/intel/qat/qat_common/adf_dbgfs.c | 3 +
.../intel/qat/qat_common/adf_gen4_hw_data.h | 1 +
.../crypto/intel/qat/qat_common/adf_gen4_tl.c | 153 ++++
.../crypto/intel/qat/qat_common/adf_gen4_tl.h | 158 ++++
.../crypto/intel/qat/qat_common/adf_init.c | 12 +
.../intel/qat/qat_common/adf_telemetry.c | 288 +++++++
.../intel/qat/qat_common/adf_telemetry.h | 99 +++
.../intel/qat/qat_common/adf_tl_debugfs.c | 710 ++++++++++++++++++
.../intel/qat/qat_common/adf_tl_debugfs.h | 117 +++
.../qat/qat_common/icp_qat_fw_init_admin.h | 10 +
17 files changed, 1835 insertions(+)
create mode 100644 Documentation/ABI/testing/debugfs-driver-qat_telemetry
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_telemetry.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_telemetry.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h


base-commit: b4719435c14199548ed49f036a7c31040a6b5353
--
2.41.0



2023-12-22 10:38:13

by Lucas Segarra Fernandez

Subject: [PATCH v2 1/4] crypto: qat - include pci.h for GET_DEV()

The expansion of the GET_DEV() macro relies on struct pci_dev being defined.

Include <linux/pci.h> in adf_accel_devices.h.

Signed-off-by: Lucas Segarra Fernandez <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Damian Muszynski <[email protected]>
---
drivers/crypto/intel/qat/qat_common/adf_accel_devices.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
index 7df6336ddd62..fc7786d71e96 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
@@ -6,6 +6,7 @@
#include <linux/module.h>
#include <linux/list.h>
#include <linux/io.h>
+#include <linux/pci.h>
#include <linux/ratelimit.h>
#include <linux/types.h>
#include "adf_cfg_common.h"
--
2.41.0


2023-12-22 10:38:16

by Lucas Segarra Fernandez

Subject: [PATCH v2 2/4] crypto: qat - add admin msgs for telemetry

Extend the admin interface with two new public APIs to enable
and disable the telemetry feature: adf_send_admin_tl_start() and
adf_send_admin_tl_stop().

The first sends the ICP_QAT_FW_TL_START message to the firmware. The
message carries the IO address where the firmware will write telemetry
metrics and a list of ring pairs (maximum 4) to be monitored.
The function returns the number of accelerators of each type supported
by this hardware. After this message is sent, the firmware starts
periodically reporting telemetry data by writing into the DMA buffer
specified as input.

The second sends the admin message ICP_QAT_FW_TL_STOP,
which stops the reporting of telemetry data.
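
As an illustration of the "maximum 4 ring pairs" constraint carried by the
start message, here is a minimal user-space sketch of filling a fixed
four-slot index field. It is not driver code: tl_fill_rp_indexes(),
struct tl_rp_indexes and the TL_RP_UNUSED placeholder are hypothetical
names chosen for this example, and the value used for unused slots is an
assumption, not something this patch defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TL_MAX_RP	4	/* the message carries at most 4 ring pairs */
#define TL_RP_UNUSED	0xFF	/* hypothetical placeholder for empty slots */

/* User-space stand-in for the four one-byte index fields in the request. */
struct tl_rp_indexes {
	uint8_t idx[TL_MAX_RP];
};

/*
 * Copy up to TL_MAX_RP ring pair numbers into the fixed-size field and
 * mark the remaining slots as unused. Returns 0 on success, -1 if the
 * caller asks for more ring pairs than the message can carry.
 */
static int tl_fill_rp_indexes(struct tl_rp_indexes *dst,
			      const uint8_t *rps, size_t n)
{
	assert(dst);

	if (n > TL_MAX_RP)
		return -1;

	memset(dst->idx, TL_RP_UNUSED, sizeof(dst->idx));
	if (n)
		memcpy(dst->idx, rps, n);

	return 0;
}
```

Note that adf_send_admin_tl_start() in this patch copies
sizeof(req.rp_indexes) bytes unconditionally, so its caller has to pass a
buffer of at least four bytes; the sketch above only shows one way such a
buffer could be prepared.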

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Damian Muszynski <[email protected]>
---
.../crypto/intel/qat/qat_common/adf_admin.c | 37 +++++++++++++++++++
.../crypto/intel/qat/qat_common/adf_admin.h | 4 ++
.../qat/qat_common/icp_qat_fw_init_admin.h | 10 +++++
3 files changed, 51 insertions(+)

diff --git a/drivers/crypto/intel/qat/qat_common/adf_admin.c b/drivers/crypto/intel/qat/qat_common/adf_admin.c
index 54b673ec2362..acad526eb741 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_admin.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_admin.c
@@ -498,6 +498,43 @@ int adf_get_cnv_stats(struct adf_accel_dev *accel_dev, u16 ae, u16 *err_cnt,
return ret;
}

+int adf_send_admin_tl_start(struct adf_accel_dev *accel_dev,
+ dma_addr_t tl_dma_addr, size_t layout_sz, u8 *rp_indexes,
+ struct icp_qat_fw_init_admin_slice_cnt *slice_count)
+{
+ u32 ae_mask = GET_HW_DATA(accel_dev)->admin_ae_mask;
+ struct icp_qat_fw_init_admin_resp resp = { };
+ struct icp_qat_fw_init_admin_req req = { };
+ int ret;
+
+ req.cmd_id = ICP_QAT_FW_TL_START;
+ req.init_cfg_ptr = tl_dma_addr;
+ req.init_cfg_sz = layout_sz;
+
+ if (rp_indexes)
+ memcpy(&req.rp_indexes, rp_indexes, sizeof(req.rp_indexes));
+
+ ret = adf_send_admin(accel_dev, &req, &resp, ae_mask);
+ if (ret)
+ return ret;
+
+ memcpy(slice_count, &resp.slices, sizeof(*slice_count));
+
+ return 0;
+}
+
+int adf_send_admin_tl_stop(struct adf_accel_dev *accel_dev)
+{
+ struct adf_hw_device_data *hw_data = GET_HW_DATA(accel_dev);
+ struct icp_qat_fw_init_admin_resp resp = { };
+ struct icp_qat_fw_init_admin_req req = { };
+ u32 ae_mask = hw_data->admin_ae_mask;
+
+ req.cmd_id = ICP_QAT_FW_TL_STOP;
+
+ return adf_send_admin(accel_dev, &req, &resp, ae_mask);
+}
+
int adf_init_admin_comms(struct adf_accel_dev *accel_dev)
{
struct adf_admin_comms *admin;
diff --git a/drivers/crypto/intel/qat/qat_common/adf_admin.h b/drivers/crypto/intel/qat/qat_common/adf_admin.h
index 55cbcbc66c9f..647c8e196752 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_admin.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_admin.h
@@ -23,5 +23,9 @@ int adf_send_admin_rl_delete(struct adf_accel_dev *accel_dev, u16 node_id,
int adf_get_fw_timestamp(struct adf_accel_dev *accel_dev, u64 *timestamp);
int adf_get_pm_info(struct adf_accel_dev *accel_dev, dma_addr_t p_state_addr, size_t buff_size);
int adf_get_cnv_stats(struct adf_accel_dev *accel_dev, u16 ae, u16 *err_cnt, u16 *latest_err);
+int adf_send_admin_tl_start(struct adf_accel_dev *accel_dev,
+ dma_addr_t tl_dma_addr, size_t layout_sz, u8 *rp_indexes,
+ struct icp_qat_fw_init_admin_slice_cnt *slice_count);
+int adf_send_admin_tl_stop(struct adf_accel_dev *accel_dev);

#endif
diff --git a/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h b/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h
index cd418b51d9f3..63cf18e2a4e5 100644
--- a/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h
+++ b/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h
@@ -29,6 +29,8 @@ enum icp_qat_fw_init_admin_cmd_id {
ICP_QAT_FW_RL_ADD = 134,
ICP_QAT_FW_RL_UPDATE = 135,
ICP_QAT_FW_RL_REMOVE = 136,
+ ICP_QAT_FW_TL_START = 137,
+ ICP_QAT_FW_TL_STOP = 138,
};

enum icp_qat_fw_init_admin_resp_status {
@@ -36,6 +38,13 @@ enum icp_qat_fw_init_admin_resp_status {
ICP_QAT_FW_INIT_RESP_STATUS_FAIL
};

+struct icp_qat_fw_init_admin_tl_rp_indexes {
+ __u8 rp_num_index_0;
+ __u8 rp_num_index_1;
+ __u8 rp_num_index_2;
+ __u8 rp_num_index_3;
+};
+
struct icp_qat_fw_init_admin_slice_cnt {
__u8 cpr_cnt;
__u8 xlt_cnt;
@@ -87,6 +96,7 @@ struct icp_qat_fw_init_admin_req {
__u8 rp_count;
};
__u32 idle_filter;
+ struct icp_qat_fw_init_admin_tl_rp_indexes rp_indexes;
};

__u32 resrvd4;
--
2.41.0


2023-12-22 10:38:24

by Lucas Segarra Fernandez

Subject: [PATCH v2 3/4] crypto: qat - add support for device telemetry

Expose device telemetry data for QAT GEN4 devices through debugfs.

This allows gathering metrics about the performance and the utilization
of a device. In particular, it provides statistics on (1) the utilization
of the PCIe channel, (2) address translation, when SVA is enabled, and
(3) the internal engines for crypto and data compression.

If telemetry is supported by the firmware, the driver allocates a DMA
region and a circular buffer. When telemetry is enabled through the
`control` attribute in debugfs, the driver sends the `TL_START` command
to the firmware via the admin interface. This triggers the device to
periodically gather telemetry data from hardware registers and write it
into the DMA memory region. The device writes into the shared region
every second.

Every 500ms, the driver snapshots the shared DMA region into the
circular buffer. The snapshots are then used to compute basic metrics
(min/max/average) for each counter every time the `device_data`
attribute is queried.
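
The snapshot-and-aggregate scheme can be sketched in plain user-space C as
follows. This is a simplified model for a single counter, not the code in
adf_telemetry.c: the names tl_hist, tl_stats and TL_HIST_MAX are
hypothetical, and the bound of 4 is an assumption taken from the range of
values accepted by `control`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TL_HIST_MAX 4	/* assumption: mirrors the 1-4 range of `control` */

/* Circular history of one counter. */
struct tl_hist {
	uint64_t samples[TL_HIST_MAX];
	size_t depth;	/* slots in use, 1..TL_HIST_MAX */
	size_t count;	/* total snapshots stored so far */
};

struct tl_stats {
	uint64_t cur, min, max, avg;
};

static void tl_hist_init(struct tl_hist *h, size_t depth)
{
	assert(depth >= 1 && depth <= TL_HIST_MAX);
	h->depth = depth;
	h->count = 0;
}

/* Store a new snapshot, overwriting the oldest one once the buffer is full. */
static void tl_hist_push(struct tl_hist *h, uint64_t v)
{
	h->samples[h->count % h->depth] = v;
	h->count++;
}

/*
 * Compute current/min/max/average over the stored snapshots.
 * Returns 0 on success, -1 if no snapshot has been taken yet.
 * The sum is not protected against overflow in this sketch.
 */
static int tl_hist_stats(const struct tl_hist *h, struct tl_stats *s)
{
	size_t n = h->count < h->depth ? h->count : h->depth;
	uint64_t sum = 0;
	size_t i;

	if (n == 0)
		return -1;

	s->cur = h->samples[(h->count - 1) % h->depth];
	s->min = s->max = h->samples[0];
	for (i = 0; i < n; i++) {
		uint64_t v = h->samples[i];

		if (v < s->min)
			s->min = v;
		if (v > s->max)
			s->max = v;
		sum += v;
	}
	s->avg = sum / n;

	return 0;
}
```

Computing the statistics only when they are requested, rather than on
every snapshot, keeps the periodic path down to a single store per counter.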

Telemetry counters are exposed through debugfs in the folder
/sys/kernel/debug/qat_<device>_<BDF>/telemetry.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.
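
The ABI document describes two line formats for `device_data`:
`<counter_name> <current>` when `control` is 1, and
`<counter_name> <current> <min> <max> <avg>` when it is 2, 3 or 4. A
user-space tool could parse either form along the lines of the sketch
below. The helper tl_parse_line() and struct tl_line are hypothetical, and
the sketch assumes that fields are whitespace-separated unsigned decimal
integers, which the documentation does not state explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TL_NAME_MAX 32

struct tl_line {
	char name[TL_NAME_MAX];
	unsigned long long val[4];	/* current, min, max, avg */
	int nvals;			/* 1 (current only) or 4 */
};

/*
 * Parse one line of device_data output.
 * Returns 0 on success, -1 if the line matches neither format.
 */
static int tl_parse_line(const char *line, struct tl_line *out)
{
	int n;

	assert(line && out);
	memset(out, 0, sizeof(*out));

	/* %31s leaves room for the terminating NUL in name[TL_NAME_MAX]. */
	n = sscanf(line, "%31s %llu %llu %llu %llu", out->name,
		   &out->val[0], &out->val[1], &out->val[2], &out->val[3]);
	if (n != 2 && n != 5)
		return -1;

	out->nvals = n - 1;
	return 0;
}
```

A caller would typically read the attribute line by line with fgets() and
pass each line to tl_parse_line(), skipping lines that fail to parse.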

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Damian Muszynski <[email protected]>
---
.../ABI/testing/debugfs-driver-qat_telemetry | 103 ++++
.../intel/qat/qat_420xx/adf_420xx_hw_data.c | 2 +
.../intel/qat/qat_4xxx/adf_4xxx_hw_data.c | 2 +
drivers/crypto/intel/qat/qat_common/Makefile | 3 +
.../intel/qat/qat_common/adf_accel_devices.h | 4 +
.../crypto/intel/qat/qat_common/adf_dbgfs.c | 3 +
.../crypto/intel/qat/qat_common/adf_gen4_tl.c | 118 ++++
.../crypto/intel/qat/qat_common/adf_gen4_tl.h | 121 +++++
.../crypto/intel/qat/qat_common/adf_init.c | 12 +
.../intel/qat/qat_common/adf_telemetry.c | 271 ++++++++++
.../intel/qat/qat_common/adf_telemetry.h | 92 ++++
.../intel/qat/qat_common/adf_tl_debugfs.c | 502 ++++++++++++++++++
.../intel/qat/qat_common/adf_tl_debugfs.h | 106 ++++
13 files changed, 1339 insertions(+)
create mode 100644 Documentation/ABI/testing/debugfs-driver-qat_telemetry
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_telemetry.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_telemetry.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h

diff --git a/Documentation/ABI/testing/debugfs-driver-qat_telemetry b/Documentation/ABI/testing/debugfs-driver-qat_telemetry
new file mode 100644
index 000000000000..24532365387c
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-driver-qat_telemetry
@@ -0,0 +1,103 @@
+What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/control
+Date: March 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description: (RW) Enables/disables the reporting of telemetry metrics.
+
+ Allowed values to write:
+ ========================
+ * 0: disable telemetry
+ * 1: enable telemetry
+ * 2, 3, 4: enable telemetry and calculate minimum, maximum
+ and average for each counter over 2, 3 or 4 samples
+
+ Returned values:
+ ================
+ * 1-4: telemetry is enabled and running
+ * 0: telemetry is disabled
+
+ Example.
+
+ Writing '3' to this file starts the collection of
+ telemetry metrics. Samples are collected every second and
+ stored in a circular buffer of size 3. These values are then
+ used to calculate the minimum, maximum and average for each
+ counter. After enabling, counters can be retrieved through
+ the ``device_data`` file::
+
+ echo 3 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
+
+ Writing '0' to this file stops the collection of telemetry
+ metrics::
+
+ echo 0 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/control
+
+ This attribute is only available for qat_4xxx devices.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/device_data
+Date: March 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description: (RO) Reports device telemetry counters.
+ Reads report metrics about performance and utilization of
+ a QAT device:
+
+ ======================= ========================================
+ Field Description
+ ======================= ========================================
+ sample_cnt number of acquisitions of telemetry data
+ from the device. Reads are performed
+ every 1000 ms.
+ pci_trans_cnt number of PCIe partial transactions
+ max_rd_lat maximum logged read latency [ns] (could
+ be any read operation)
+ rd_lat_acc_avg average read latency [ns]
+ max_gp_lat max get to put latency [ns] (only takes
+ samples for AE0)
+ gp_lat_acc_avg average get to put latency [ns]
+ bw_in PCIe, write bandwidth [Mbps]
+ bw_out PCIe, read bandwidth [Mbps]
+ at_page_req_lat_avg Address Translator(AT), average page
+ request latency [ns]
+ at_trans_lat_avg AT, average page translation latency [ns]
+ at_max_tlb_used AT, maximum uTLB used
+ util_cpr<N> utilization of Compression slice N [%]
+ exec_cpr<N> execution count of Compression slice N
+ util_xlt<N> utilization of Translator slice N [%]
+ exec_xlt<N> execution count of Translator slice N
+ util_dcpr<N> utilization of Decompression slice N [%]
+ exec_dcpr<N> execution count of Decompression slice N
+ util_pke<N> utilization of PKE N [%]
+ exec_pke<N> execution count of PKE N
+ util_ucs<N> utilization of UCS slice N [%]
+ exec_ucs<N> execution count of UCS slice N
+ util_wat<N> utilization of Wireless Authentication
+ slice N [%]
+ exec_wat<N> execution count of Wireless Authentication
+ slice N
+ util_wcp<N> utilization of Wireless Cipher slice N [%]
+ exec_wcp<N> execution count of Wireless Cipher slice N
+ util_cph<N> utilization of Cipher slice N [%]
+ exec_cph<N> execution count of Cipher slice N
+ util_ath<N> utilization of Authentication slice N [%]
+ exec_ath<N> execution count of Authentication slice N
+ ======================= ========================================
+
+ The telemetry report file can be read with the following command::
+
+ cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/device_data
+
+ If ``control`` is set to 1, only the current values of the
+ counters are displayed::
+
+ <counter_name> <current>
+
+ If ``control`` is 2, 3 or 4, counters are displayed in the
+ following format::
+
+ <counter_name> <current> <min> <max> <avg>
+
+ If a device lacks a specific accelerator, the corresponding
+ attribute is not reported.
+
+ This attribute is only available for qat_4xxx devices.
diff --git a/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c b/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
index d296eb18db3c..a7730d8057d6 100644
--- a/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
@@ -15,6 +15,7 @@
#include <adf_gen4_pm.h>
#include <adf_gen4_ras.h>
#include <adf_gen4_timer.h>
+#include <adf_gen4_tl.h>
#include "adf_420xx_hw_data.h"
#include "icp_qat_hw.h"

@@ -543,6 +544,7 @@ void adf_init_hw_data_420xx(struct adf_hw_device_data *hw_data, u32 dev_id)
adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen4_init_dc_ops(&hw_data->dc_ops);
adf_gen4_init_ras_ops(&hw_data->ras_ops);
+ adf_gen4_init_tl_data(&hw_data->tl_data);
adf_init_rl_data(&hw_data->rl_data);
}

diff --git a/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c b/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
index f133126932c1..73001b20cbfd 100644
--- a/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
@@ -15,6 +15,7 @@
#include <adf_gen4_pm.h>
#include "adf_gen4_ras.h"
#include <adf_gen4_timer.h>
+#include <adf_gen4_tl.h>
#include "adf_4xxx_hw_data.h"
#include "icp_qat_hw.h"

@@ -453,6 +454,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
adf_gen4_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen4_init_dc_ops(&hw_data->dc_ops);
adf_gen4_init_ras_ops(&hw_data->ras_ops);
+ adf_gen4_init_tl_data(&hw_data->tl_data);
adf_init_rl_data(&hw_data->rl_data);
}

diff --git a/drivers/crypto/intel/qat/qat_common/Makefile b/drivers/crypto/intel/qat/qat_common/Makefile
index 928de6997155..6908727bff3b 100644
--- a/drivers/crypto/intel/qat/qat_common/Makefile
+++ b/drivers/crypto/intel/qat/qat_common/Makefile
@@ -41,9 +41,12 @@ intel_qat-$(CONFIG_DEBUG_FS) += adf_transport_debug.o \
adf_fw_counters.o \
adf_cnv_dbgfs.o \
adf_gen4_pm_debugfs.o \
+ adf_gen4_tl.o \
adf_heartbeat.o \
adf_heartbeat_dbgfs.o \
adf_pm_dbgfs.o \
+ adf_telemetry.o \
+ adf_tl_debugfs.o \
adf_dbgfs.o

intel_qat-$(CONFIG_PCI_IOV) += adf_sriov.o adf_vf_isr.o adf_pfvf_utils.o \
diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
index fc7786d71e96..b274ebc799c9 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include "adf_cfg_common.h"
#include "adf_rl.h"
+#include "adf_telemetry.h"
#include "adf_pfvf_msg.h"

#define ADF_DH895XCC_DEVICE_NAME "dh895xcc"
@@ -254,6 +255,7 @@ struct adf_hw_device_data {
struct adf_ras_ops ras_ops;
struct adf_dev_err_mask dev_err_mask;
struct adf_rl_hw_data rl_data;
+ struct adf_tl_hw_data tl_data;
const char *fw_name;
const char *fw_mmp_name;
u32 fuses;
@@ -308,6 +310,7 @@ struct adf_hw_device_data {
#define GET_CSR_OPS(accel_dev) (&(accel_dev)->hw_device->csr_ops)
#define GET_PFVF_OPS(accel_dev) (&(accel_dev)->hw_device->pfvf_ops)
#define GET_DC_OPS(accel_dev) (&(accel_dev)->hw_device->dc_ops)
+#define GET_TL_DATA(accel_dev) GET_HW_DATA(accel_dev)->tl_data
#define accel_to_pci_dev(accel_ptr) accel_ptr->accel_pci_dev.pci_dev

struct adf_admin_comms;
@@ -356,6 +359,7 @@ struct adf_accel_dev {
struct adf_cfg_device_data *cfg;
struct adf_fw_loader_data *fw_loader;
struct adf_admin_comms *admin;
+ struct adf_telemetry *telemetry;
struct adf_dc_data *dc_data;
struct adf_pm power_management;
struct list_head crypto_list;
diff --git a/drivers/crypto/intel/qat/qat_common/adf_dbgfs.c b/drivers/crypto/intel/qat/qat_common/adf_dbgfs.c
index 477efcc81a16..c42f5c25aabd 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_dbgfs.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_dbgfs.c
@@ -10,6 +10,7 @@
#include "adf_fw_counters.h"
#include "adf_heartbeat_dbgfs.h"
#include "adf_pm_dbgfs.h"
+#include "adf_tl_debugfs.h"

/**
* adf_dbgfs_init() - add persistent debugfs entries
@@ -66,6 +67,7 @@ void adf_dbgfs_add(struct adf_accel_dev *accel_dev)
adf_heartbeat_dbgfs_add(accel_dev);
adf_pm_dbgfs_add(accel_dev);
adf_cnv_dbgfs_add(accel_dev);
+ adf_tl_dbgfs_add(accel_dev);
}
}

@@ -79,6 +81,7 @@ void adf_dbgfs_rm(struct adf_accel_dev *accel_dev)
return;

if (!accel_dev->is_vf) {
+ adf_tl_dbgfs_rm(accel_dev);
adf_cnv_dbgfs_rm(accel_dev);
adf_pm_dbgfs_rm(accel_dev);
adf_heartbeat_dbgfs_rm(accel_dev);
diff --git a/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
new file mode 100644
index 000000000000..4efbe6bc651c
--- /dev/null
+++ b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Intel Corporation. */
+#include <linux/export.h>
+#include <linux/kernel.h>
+
+#include "adf_gen4_tl.h"
+#include "adf_telemetry.h"
+#include "adf_tl_debugfs.h"
+
+#define ADF_GEN4_TL_DEV_REG_OFF(reg) ADF_TL_DEV_REG_OFF(reg, gen4)
+
+#define ADF_GEN4_TL_SL_UTIL_COUNTER(_name) \
+ ADF_TL_COUNTER("util_" #_name, \
+ ADF_TL_SIMPLE_COUNT, \
+ ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_util, gen4))
+
+#define ADF_GEN4_TL_SL_EXEC_COUNTER(_name) \
+ ADF_TL_COUNTER("exec_" #_name, \
+ ADF_TL_SIMPLE_COUNT, \
+ ADF_TL_SLICE_REG_OFF(_name, reg_tm_slice_exec_cnt, gen4))
+
+/* Device level counters. */
+static const struct adf_tl_dbg_counter dev_counters[] = {
+ /* PCIe partial transactions. */
+ ADF_TL_COUNTER(PCI_TRANS_CNT_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_pci_trans_cnt)),
+ /* Max read latency[ns]. */
+ ADF_TL_COUNTER(MAX_RD_LAT_NAME, ADF_TL_COUNTER_NS,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_max)),
+ /* Read latency average[ns]. */
+ ADF_TL_COUNTER_LATENCY(RD_LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_lat_acc),
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_rd_cmpl_cnt)),
+ /* Max get to put latency[ns]. */
+ ADF_TL_COUNTER(MAX_LAT_NAME, ADF_TL_COUNTER_NS,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_max)),
+ /* Get to put latency average[ns]. */
+ ADF_TL_COUNTER_LATENCY(LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_gp_lat_acc),
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_ae_put_cnt)),
+ /* PCIe write bandwidth[Mbps]. */
+ ADF_TL_COUNTER(BW_IN_NAME, ADF_TL_COUNTER_MBPS,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_in)),
+ /* PCIe read bandwidth[Mbps]. */
+ ADF_TL_COUNTER(BW_OUT_NAME, ADF_TL_COUNTER_MBPS,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_bw_out)),
+ /* Page request latency average[ns]. */
+ ADF_TL_COUNTER_LATENCY(PAGE_REQ_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_lat_acc),
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_page_req_cnt)),
+ /* Page translation latency average[ns]. */
+ ADF_TL_COUNTER_LATENCY(AT_TRANS_LAT_NAME, ADF_TL_COUNTER_NS_AVG,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_acc),
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_trans_lat_cnt)),
+ /* Maximum uTLB used. */
+ ADF_TL_COUNTER(AT_MAX_UTLB_USED_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_DEV_REG_OFF(reg_tl_at_max_tlb_used)),
+};
+
+/* Slice utilization counters. */
+static const struct adf_tl_dbg_counter sl_util_counters[ADF_TL_SL_CNT_COUNT] = {
+ /* Compression slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(cpr),
+ /* Translator slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(xlt),
+ /* Decompression slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(dcpr),
+ /* PKE utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(pke),
+ /* Wireless Authentication slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(wat),
+ /* Wireless Cipher slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(wcp),
+ /* UCS slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(ucs),
+ /* Cipher slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(cph),
+ /* Authentication slice utilization. */
+ ADF_GEN4_TL_SL_UTIL_COUNTER(ath),
+};
+
+/* Slice execution counters. */
+static const struct adf_tl_dbg_counter sl_exec_counters[ADF_TL_SL_CNT_COUNT] = {
+ /* Compression slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(cpr),
+ /* Translator slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(xlt),
+ /* Decompression slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(dcpr),
+ /* PKE execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(pke),
+ /* Wireless Authentication slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(wat),
+ /* Wireless Cipher slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(wcp),
+ /* UCS slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(ucs),
+ /* Cipher slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(cph),
+ /* Authentication slice execution count. */
+ ADF_GEN4_TL_SL_EXEC_COUNTER(ath),
+};
+
+void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
+{
+ tl_data->layout_sz = ADF_GEN4_TL_LAYOUT_SZ;
+ tl_data->slice_reg_sz = ADF_GEN4_TL_SLICE_REG_SZ;
+ tl_data->num_hbuff = ADF_GEN4_TL_NUM_HIST_BUFFS;
+ tl_data->msg_cnt_off = ADF_GEN4_TL_MSG_CNT_OFF;
+ tl_data->cpp_ns_per_cycle = ADF_GEN4_CPP_NS_PER_CYCLE;
+ tl_data->bw_units_to_bytes = ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES;
+
+ tl_data->dev_counters = dev_counters;
+ tl_data->num_dev_counters = ARRAY_SIZE(dev_counters);
+ tl_data->sl_util_counters = sl_util_counters;
+ tl_data->sl_exec_counters = sl_exec_counters;
+}
+EXPORT_SYMBOL_GPL(adf_gen4_init_tl_data);
diff --git a/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
new file mode 100644
index 000000000000..feb2eecf24cf
--- /dev/null
+++ b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2023 Intel Corporation. */
+#ifndef ADF_GEN4_TL_H
+#define ADF_GEN4_TL_H
+
+#include <linux/stddef.h>
+#include <linux/types.h>
+
+struct adf_tl_hw_data;
+
+/* Computation constants. */
+#define ADF_GEN4_CPP_NS_PER_CYCLE 2
+#define ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES 64
+
+/* Maximum aggregation time. Value in milliseconds. */
+#define ADF_GEN4_TL_MAX_AGGR_TIME_MS 4000
+/* Num of buffers to store historic values. */
+#define ADF_GEN4_TL_NUM_HIST_BUFFS \
+ (ADF_GEN4_TL_MAX_AGGR_TIME_MS / ADF_TL_DATA_WR_INTERVAL_MS)
+
+/* Max number of HW resources of one type. */
+#define ADF_GEN4_TL_MAX_SLICES_PER_TYPE 24
+
+/**
+ * struct adf_gen4_tl_slice_data_regs - HW slice data as populated by FW.
+ * @reg_tm_slice_exec_cnt: Slice execution count.
+ * @reg_tm_slice_util: Slice utilization.
+ */
+struct adf_gen4_tl_slice_data_regs {
+ __u32 reg_tm_slice_exec_cnt;
+ __u32 reg_tm_slice_util;
+};
+
+#define ADF_GEN4_TL_SLICE_REG_SZ sizeof(struct adf_gen4_tl_slice_data_regs)
+
+/**
+ * struct adf_gen4_tl_device_data_regs - This structure stores device telemetry
+ * counter values as are being populated periodically by device.
+ * @reg_tl_rd_lat_acc: read latency accumulator
+ * @reg_tl_gp_lat_acc: get-put latency accumulator
+ * @reg_tl_at_page_req_lat_acc: AT/DevTLB page request latency accumulator
+ * @reg_tl_at_trans_lat_acc: DevTLB transaction latency accumulator
+ * @reg_tl_re_acc: accumulated ring empty time
+ * @reg_tl_pci_trans_cnt: PCIe partial transactions
+ * @reg_tl_rd_lat_max: maximum logged read latency
+ * @reg_tl_rd_cmpl_cnt: read requests completed count
+ * @reg_tl_gp_lat_max: maximum logged get to put latency
+ * @reg_tl_ae_put_cnt: Accelerator Engine put counts across all rings
+ * @reg_tl_bw_in: PCIe write bandwidth
+ * @reg_tl_bw_out: PCIe read bandwidth
+ * @reg_tl_at_page_req_cnt: DevTLB page requests count
+ * @reg_tl_at_trans_lat_cnt: DevTLB transaction latency samples count
+ * @reg_tl_at_max_tlb_used: maximum uTLB used
+ * @reg_tl_re_cnt: ring empty time samples count
+ * @reserved: reserved
+ * @ath_slices: array of Authentication slices utilization registers
+ * @cph_slices: array of Cipher slices utilization registers
+ * @cpr_slices: array of Compression slices utilization registers
+ * @xlt_slices: array of Translator slices utilization registers
+ * @dcpr_slices: array of Decompression slices utilization registers
+ * @pke_slices: array of PKE slices utilization registers
+ * @ucs_slices: array of UCS slices utilization registers
+ * @wat_slices: array of Wireless Authentication slices utilization registers
+ * @wcp_slices: array of Wireless Cipher slices utilization registers
+ */
+struct adf_gen4_tl_device_data_regs {
+ __u64 reg_tl_rd_lat_acc;
+ __u64 reg_tl_gp_lat_acc;
+ __u64 reg_tl_at_page_req_lat_acc;
+ __u64 reg_tl_at_trans_lat_acc;
+ __u64 reg_tl_re_acc;
+ __u32 reg_tl_pci_trans_cnt;
+ __u32 reg_tl_rd_lat_max;
+ __u32 reg_tl_rd_cmpl_cnt;
+ __u32 reg_tl_gp_lat_max;
+ __u32 reg_tl_ae_put_cnt;
+ __u32 reg_tl_bw_in;
+ __u32 reg_tl_bw_out;
+ __u32 reg_tl_at_page_req_cnt;
+ __u32 reg_tl_at_trans_lat_cnt;
+ __u32 reg_tl_at_max_tlb_used;
+ __u32 reg_tl_re_cnt;
+ __u32 reserved;
+ struct adf_gen4_tl_slice_data_regs ath_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs cph_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs cpr_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs xlt_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs dcpr_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs pke_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs ucs_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs wat_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+ struct adf_gen4_tl_slice_data_regs wcp_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
+};
+
+/**
+ * struct adf_gen4_tl_layout - This structure represents entire telemetry
+ * counters data: Device + 4 Ring Pairs as are being populated periodically
+ * by device.
+ * @tl_device_data_regs: structure of device telemetry registers
+ * @reserved1: reserved
+ * @reg_tl_msg_cnt: telemetry messages counter
+ * @reserved: reserved
+ */
+struct adf_gen4_tl_layout {
+ struct adf_gen4_tl_device_data_regs tl_device_data_regs;
+ __u32 reserved1[14];
+ __u32 reg_tl_msg_cnt;
+ __u32 reserved;
+};
+
+#define ADF_GEN4_TL_LAYOUT_SZ sizeof(struct adf_gen4_tl_layout)
+#define ADF_GEN4_TL_MSG_CNT_OFF offsetof(struct adf_gen4_tl_layout, reg_tl_msg_cnt)
+
+#ifdef CONFIG_DEBUG_FS
+void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data);
+#else
+static inline void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
+#endif /* ADF_GEN4_TL_H */
diff --git a/drivers/crypto/intel/qat/qat_common/adf_init.c b/drivers/crypto/intel/qat/qat_common/adf_init.c
index 81c39f3d07e1..f43ae9111553 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_init.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_init.c
@@ -11,6 +11,7 @@
#include "adf_heartbeat.h"
#include "adf_rl.h"
#include "adf_sysfs_ras_counters.h"
+#include "adf_telemetry.h"

static LIST_HEAD(service_table);
static DEFINE_MUTEX(service_lock);
@@ -142,6 +143,10 @@ static int adf_dev_init(struct adf_accel_dev *accel_dev)
if (ret && ret != -EOPNOTSUPP)
return ret;

+ ret = adf_tl_init(accel_dev);
+ if (ret && ret != -EOPNOTSUPP)
+ return ret;
+
/*
* Subservice initialisation is divided into two stages: init and start.
* This is to facilitate any ordering dependencies between services
@@ -220,6 +225,10 @@ static int adf_dev_start(struct adf_accel_dev *accel_dev)
if (ret && ret != -EOPNOTSUPP)
return ret;

+ ret = adf_tl_start(accel_dev);
+ if (ret && ret != -EOPNOTSUPP)
+ return ret;
+
list_for_each_entry(service, &service_table, list) {
if (service->event_hld(accel_dev, ADF_EVENT_START)) {
dev_err(&GET_DEV(accel_dev),
@@ -279,6 +288,7 @@ static void adf_dev_stop(struct adf_accel_dev *accel_dev)
!test_bit(ADF_STATUS_STARTING, &accel_dev->status))
return;

+ adf_tl_stop(accel_dev);
adf_rl_stop(accel_dev);
adf_dbgfs_rm(accel_dev);
adf_sysfs_stop_ras(accel_dev);
@@ -374,6 +384,8 @@ static void adf_dev_shutdown(struct adf_accel_dev *accel_dev)

adf_heartbeat_shutdown(accel_dev);

+ adf_tl_shutdown(accel_dev);
+
hw_data->disable_iov(accel_dev);

if (test_bit(ADF_STATUS_IRQ_ALLOCATED, &accel_dev->status)) {
diff --git a/drivers/crypto/intel/qat/qat_common/adf_telemetry.c b/drivers/crypto/intel/qat/qat_common/adf_telemetry.c
new file mode 100644
index 000000000000..05c476d58895
--- /dev/null
+++ b/drivers/crypto/intel/qat/qat_common/adf_telemetry.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Intel Corporation. */
+#define dev_fmt(fmt) "Telemetry: " fmt
+
+#include <asm/errno.h>
+#include <linux/atomic.h>
+#include <linux/device.h>
+#include <linux/dev_printk.h>
+#include <linux/dma-mapping.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/workqueue.h>
+
+#include "adf_admin.h"
+#include "adf_accel_devices.h"
+#include "adf_common_drv.h"
+#include "adf_telemetry.h"
+
+#define TL_IS_ZERO(input) ((input) == 0)
+
+static bool is_tl_supported(struct adf_accel_dev *accel_dev)
+{
+ u16 fw_caps = GET_HW_DATA(accel_dev)->fw_capabilities;
+
+ return fw_caps & TL_CAPABILITY_BIT;
+}
+
+static int validate_tl_data(struct adf_tl_hw_data *tl_data)
+{
+ if (!tl_data->dev_counters ||
+ TL_IS_ZERO(tl_data->num_dev_counters) ||
+ !tl_data->sl_util_counters ||
+ !tl_data->sl_exec_counters)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static int adf_tl_alloc_mem(struct adf_accel_dev *accel_dev)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct device *dev = &GET_DEV(accel_dev);
+ size_t regs_sz = tl_data->layout_sz;
+ struct adf_telemetry *telemetry;
+ int node = dev_to_node(dev);
+ void *tl_data_regs;
+ unsigned int i;
+
+ telemetry = kzalloc_node(sizeof(*telemetry), GFP_KERNEL, node);
+ if (!telemetry)
+ return -ENOMEM;
+
+ telemetry->regs_hist_buff = kmalloc_array(tl_data->num_hbuff,
+ sizeof(*telemetry->regs_hist_buff),
+ GFP_KERNEL);
+ if (!telemetry->regs_hist_buff)
+ goto err_free_tl;
+
+ telemetry->regs_data = dma_alloc_coherent(dev, regs_sz,
+ &telemetry->regs_data_p,
+ GFP_KERNEL);
+ if (!telemetry->regs_data)
+ goto err_free_regs_hist_buff;
+
+ for (i = 0; i < tl_data->num_hbuff; i++) {
+ tl_data_regs = kzalloc_node(regs_sz, GFP_KERNEL, node);
+ if (!tl_data_regs)
+ goto err_free_dma;
+
+ telemetry->regs_hist_buff[i] = tl_data_regs;
+ }
+
+ accel_dev->telemetry = telemetry;
+
+ return 0;
+
+err_free_dma:
+ dma_free_coherent(dev, regs_sz, telemetry->regs_data,
+ telemetry->regs_data_p);
+
+ while (i--)
+ kfree(telemetry->regs_hist_buff[i]);
+
+err_free_regs_hist_buff:
+ kfree(telemetry->regs_hist_buff);
+err_free_tl:
+ kfree(telemetry);
+
+ return -ENOMEM;
+}
+
+static void adf_tl_free_mem(struct adf_accel_dev *accel_dev)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ struct device *dev = &GET_DEV(accel_dev);
+ size_t regs_sz = tl_data->layout_sz;
+ unsigned int i;
+
+ for (i = 0; i < tl_data->num_hbuff; i++)
+ kfree(telemetry->regs_hist_buff[i]);
+
+ dma_free_coherent(dev, regs_sz, telemetry->regs_data,
+ telemetry->regs_data_p);
+
+ kfree(telemetry->regs_hist_buff);
+ kfree(telemetry);
+ accel_dev->telemetry = NULL;
+}
+
+static unsigned long get_next_timeout(void)
+{
+ return msecs_to_jiffies(ADF_TL_TIMER_INT_MS);
+}
+
+static void snapshot_regs(struct adf_telemetry *telemetry, size_t size)
+{
+ void *dst = telemetry->regs_hist_buff[telemetry->hb_num];
+ void *src = telemetry->regs_data;
+
+ memcpy(dst, src, size);
+}
+
+static void tl_work_handler(struct work_struct *work)
+{
+ struct delayed_work *delayed_work;
+ struct adf_telemetry *telemetry;
+ struct adf_tl_hw_data *tl_data;
+ u32 msg_cnt, old_msg_cnt;
+ size_t layout_sz;
+ u32 *regs_data;
+ size_t id;
+
+ delayed_work = to_delayed_work(work);
+ telemetry = container_of(delayed_work, struct adf_telemetry, work_ctx);
+ tl_data = &GET_TL_DATA(telemetry->accel_dev);
+ regs_data = telemetry->regs_data;
+
+ id = tl_data->msg_cnt_off / sizeof(*regs_data);
+ layout_sz = tl_data->layout_sz;
+
+ if (!atomic_read(&telemetry->state)) {
+ cancel_delayed_work_sync(&telemetry->work_ctx);
+ return;
+ }
+
+ msg_cnt = regs_data[id];
+ old_msg_cnt = msg_cnt;
+ if (msg_cnt == telemetry->msg_cnt)
+ goto out;
+
+ mutex_lock(&telemetry->regs_hist_lock);
+
+ snapshot_regs(telemetry, layout_sz);
+
+ /* Check if data changed while updating it */
+ msg_cnt = regs_data[id];
+ if (old_msg_cnt != msg_cnt)
+ snapshot_regs(telemetry, layout_sz);
+
+ telemetry->msg_cnt = msg_cnt;
+ telemetry->hb_num++;
+ telemetry->hb_num %= telemetry->hbuffs;
+
+ mutex_unlock(&telemetry->regs_hist_lock);
+
+out:
+ adf_misc_wq_queue_delayed_work(&telemetry->work_ctx, get_next_timeout());
+}
+
+int adf_tl_halt(struct adf_accel_dev *accel_dev)
+{
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ struct device *dev = &GET_DEV(accel_dev);
+ int ret;
+
+ cancel_delayed_work_sync(&telemetry->work_ctx);
+ atomic_set(&telemetry->state, 0);
+
+ ret = adf_send_admin_tl_stop(accel_dev);
+ if (ret)
+ dev_err(dev, "failed to stop telemetry\n");
+
+ return ret;
+}
+
+int adf_tl_run(struct adf_accel_dev *accel_dev, int state)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ struct device *dev = &GET_DEV(accel_dev);
+ size_t layout_sz = tl_data->layout_sz;
+ int ret;
+
+ ret = adf_send_admin_tl_start(accel_dev, telemetry->regs_data_p,
+ layout_sz, NULL, &telemetry->slice_cnt);
+ if (ret) {
+ dev_err(dev, "failed to start telemetry\n");
+ return ret;
+ }
+
+ telemetry->hbuffs = state;
+ atomic_set(&telemetry->state, state);
+
+ adf_misc_wq_queue_delayed_work(&telemetry->work_ctx, get_next_timeout());
+
+ return 0;
+}
+
+int adf_tl_init(struct adf_accel_dev *accel_dev)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct device *dev = &GET_DEV(accel_dev);
+ struct adf_telemetry *telemetry;
+ int ret;
+
+ ret = validate_tl_data(tl_data);
+ if (ret)
+ return ret;
+
+ ret = adf_tl_alloc_mem(accel_dev);
+ if (ret) {
+ dev_err(dev, "failed to initialize: %d\n", ret);
+ return ret;
+ }
+
+ telemetry = accel_dev->telemetry;
+ telemetry->accel_dev = accel_dev;
+
+ mutex_init(&telemetry->wr_lock);
+ mutex_init(&telemetry->regs_hist_lock);
+ INIT_DELAYED_WORK(&telemetry->work_ctx, tl_work_handler);
+
+ return 0;
+}
+
+int adf_tl_start(struct adf_accel_dev *accel_dev)
+{
+ struct device *dev = &GET_DEV(accel_dev);
+
+ if (!accel_dev->telemetry)
+ return -EOPNOTSUPP;
+
+ if (!is_tl_supported(accel_dev)) {
+ dev_info(dev, "feature not supported by FW\n");
+ adf_tl_free_mem(accel_dev);
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+void adf_tl_stop(struct adf_accel_dev *accel_dev)
+{
+ if (!accel_dev->telemetry)
+ return;
+
+ if (atomic_read(&accel_dev->telemetry->state))
+ adf_tl_halt(accel_dev);
+}
+
+void adf_tl_shutdown(struct adf_accel_dev *accel_dev)
+{
+ if (!accel_dev->telemetry)
+ return;
+
+ adf_tl_free_mem(accel_dev);
+}
diff --git a/drivers/crypto/intel/qat/qat_common/adf_telemetry.h b/drivers/crypto/intel/qat/qat_common/adf_telemetry.h
new file mode 100644
index 000000000000..08de17621467
--- /dev/null
+++ b/drivers/crypto/intel/qat/qat_common/adf_telemetry.h
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2023 Intel Corporation. */
+#ifndef ADF_TELEMETRY_H
+#define ADF_TELEMETRY_H
+
+#include <linux/bits.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+
+#include "icp_qat_fw_init_admin.h"
+
+struct adf_accel_dev;
+struct adf_tl_dbg_counter;
+struct dentry;
+
+#define ADF_TL_SL_CNT_COUNT \
+ (sizeof(struct icp_qat_fw_init_admin_slice_cnt) / sizeof(__u8))
+
+#define TL_CAPABILITY_BIT BIT(1)
+/* Interval at which the device writes data to the DMA region, in ms. */
+#define ADF_TL_DATA_WR_INTERVAL_MS 1000
+/* Interval at which the driver samples the DMA region, in ms. */
+#define ADF_TL_TIMER_INT_MS (ADF_TL_DATA_WR_INTERVAL_MS / 2)
+
+struct adf_tl_hw_data {
+ size_t layout_sz;
+ size_t slice_reg_sz;
+ size_t msg_cnt_off;
+ const struct adf_tl_dbg_counter *dev_counters;
+ const struct adf_tl_dbg_counter *sl_util_counters;
+ const struct adf_tl_dbg_counter *sl_exec_counters;
+ u8 num_hbuff;
+ u8 cpp_ns_per_cycle;
+ u8 bw_units_to_bytes;
+ u8 num_dev_counters;
+};
+
+struct adf_telemetry {
+ struct adf_accel_dev *accel_dev;
+ atomic_t state;
+ u32 hbuffs;
+ int hb_num;
+ u32 msg_cnt;
+ dma_addr_t regs_data_p; /* bus address for DMA mapping */
+ void *regs_data; /* virtual address for DMA mapping */
+ /**
+ * @regs_hist_buff: array of pointers to copies of the last @hbuffs
+ * values of @regs_data
+ */
+ void **regs_hist_buff;
+ struct dentry *dbg_dir;
+ /**
+ * @regs_hist_lock: protects from race conditions between write and read
+ * to the copies referenced by @regs_hist_buff
+ */
+ struct mutex regs_hist_lock;
+ /**
+ * @wr_lock: protects from concurrent writes to debugfs telemetry files
+ */
+ struct mutex wr_lock;
+ struct delayed_work work_ctx;
+ struct icp_qat_fw_init_admin_slice_cnt slice_cnt;
+};
+
+#ifdef CONFIG_DEBUG_FS
+int adf_tl_init(struct adf_accel_dev *accel_dev);
+int adf_tl_start(struct adf_accel_dev *accel_dev);
+void adf_tl_stop(struct adf_accel_dev *accel_dev);
+void adf_tl_shutdown(struct adf_accel_dev *accel_dev);
+int adf_tl_run(struct adf_accel_dev *accel_dev, int state);
+int adf_tl_halt(struct adf_accel_dev *accel_dev);
+#else
+static inline int adf_tl_init(struct adf_accel_dev *accel_dev)
+{
+ return 0;
+}
+
+static inline int adf_tl_start(struct adf_accel_dev *accel_dev)
+{
+ return 0;
+}
+
+static inline void adf_tl_stop(struct adf_accel_dev *accel_dev)
+{
+}
+
+static inline void adf_tl_shutdown(struct adf_accel_dev *accel_dev)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
+#endif /* ADF_TELEMETRY_H */
diff --git a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
new file mode 100644
index 000000000000..accb46d6ea3c
--- /dev/null
+++ b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
@@ -0,0 +1,502 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2023 Intel Corporation. */
+#define dev_fmt(fmt) "Telemetry debugfs: " fmt
+
+#include <linux/atomic.h>
+#include <linux/debugfs.h>
+#include <linux/dev_printk.h>
+#include <linux/dcache.h>
+#include <linux/kernel.h>
+#include <linux/math64.h>
+#include <linux/mutex.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/units.h>
+
+#include "adf_accel_devices.h"
+#include "adf_telemetry.h"
+#include "adf_tl_debugfs.h"
+
+#define TL_VALUE_MIN_PADDING 20
+#define TL_KEY_MIN_PADDING 23
+
+static int tl_collect_values_u32(struct adf_telemetry *telemetry,
+ size_t counter_offset, u64 *arr)
+{
+ unsigned int samples, hb_idx, i;
+ u32 *regs_hist_buff;
+ u32 counter_val;
+
+ samples = min(telemetry->msg_cnt, telemetry->hbuffs);
+ hb_idx = telemetry->hb_num + telemetry->hbuffs - samples;
+
+ mutex_lock(&telemetry->regs_hist_lock);
+
+ for (i = 0; i < samples; i++) {
+ regs_hist_buff = telemetry->regs_hist_buff[hb_idx % telemetry->hbuffs];
+ counter_val = regs_hist_buff[counter_offset / sizeof(counter_val)];
+ arr[i] = counter_val;
+ hb_idx++;
+ }
+
+ mutex_unlock(&telemetry->regs_hist_lock);
+
+ return samples;
+}
+
+static int tl_collect_values_u64(struct adf_telemetry *telemetry,
+ size_t counter_offset, u64 *arr)
+{
+ unsigned int samples, hb_idx, i;
+ u64 *regs_hist_buff;
+ u64 counter_val;
+
+ samples = min(telemetry->msg_cnt, telemetry->hbuffs);
+ hb_idx = telemetry->hb_num + telemetry->hbuffs - samples;
+
+ mutex_lock(&telemetry->regs_hist_lock);
+
+ for (i = 0; i < samples; i++) {
+ regs_hist_buff = telemetry->regs_hist_buff[hb_idx % telemetry->hbuffs];
+ counter_val = regs_hist_buff[counter_offset / sizeof(counter_val)];
+ arr[i] = counter_val;
+ hb_idx++;
+ }
+
+ mutex_unlock(&telemetry->regs_hist_lock);
+
+ return samples;
+}
+
+/**
+ * avg_array() - Return average of values within an array.
+ * @array: Array of values.
+ * @len: Number of elements.
+ *
+ * Computes the average of an array without overflowing an intermediate sum.
+ *
+ * Return: average of values.
+ */
+#define avg_array(array, len) ( \
+{ \
+ typeof(&(array)[0]) _array = (array); \
+ __unqual_scalar_typeof(_array[0]) _x = 0; \
+ __unqual_scalar_typeof(_array[0]) _y = 0; \
+ __unqual_scalar_typeof(_array[0]) _a, _b; \
+ typeof(len) _len = (len); \
+ size_t _i; \
+ \
+ for (_i = 0; _i < _len; _i++) { \
+ _a = _array[_i]; \
+ _b = do_div(_a, _len); \
+ _x += _a; \
+ if (_y >= _len - _b) { \
+ _x++; \
+ _y -= _len - _b; \
+ } else { \
+ _y += _b; \
+ } \
+ } \
+ do_div(_y, _len); \
+ (_x + _y); \
+})
+
+/* Calculation function for simple counter. */
+static int tl_calc_count(struct adf_telemetry *telemetry,
+ const struct adf_tl_dbg_counter *ctr,
+ struct adf_tl_dbg_aggr_values *vals)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(telemetry->accel_dev);
+ u64 *hist_vals;
+ int sample_cnt;
+ int ret = 0;
+
+ hist_vals = kmalloc_array(tl_data->num_hbuff, sizeof(*hist_vals),
+ GFP_KERNEL);
+ if (!hist_vals)
+ return -ENOMEM;
+
+ memset(vals, 0, sizeof(*vals));
+ sample_cnt = tl_collect_values_u32(telemetry, ctr->offset1, hist_vals);
+ if (!sample_cnt)
+ goto out_free_hist_vals;
+
+ vals->curr = hist_vals[sample_cnt - 1];
+ vals->min = min_array(hist_vals, sample_cnt);
+ vals->max = max_array(hist_vals, sample_cnt);
+ vals->avg = avg_array(hist_vals, sample_cnt);
+
+out_free_hist_vals:
+ kfree(hist_vals);
+ return ret;
+}
+
+/* Convert CPP bus cycles to ns. */
+static int tl_cycles_to_ns(struct adf_telemetry *telemetry,
+ const struct adf_tl_dbg_counter *ctr,
+ struct adf_tl_dbg_aggr_values *vals)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(telemetry->accel_dev);
+ u8 cpp_ns_per_cycle = tl_data->cpp_ns_per_cycle;
+ int ret;
+
+ ret = tl_calc_count(telemetry, ctr, vals);
+ if (ret)
+ return ret;
+
+ vals->curr *= cpp_ns_per_cycle;
+ vals->min *= cpp_ns_per_cycle;
+ vals->max *= cpp_ns_per_cycle;
+ vals->avg *= cpp_ns_per_cycle;
+
+ return 0;
+}
+
+/*
+ * Compute latency cumulative average with division of accumulated value
+ * by sample count. Returned value is in ns.
+ */
+static int tl_lat_acc_avg(struct adf_telemetry *telemetry,
+ const struct adf_tl_dbg_counter *ctr,
+ struct adf_tl_dbg_aggr_values *vals)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(telemetry->accel_dev);
+ u8 cpp_ns_per_cycle = tl_data->cpp_ns_per_cycle;
+ u8 num_hbuff = tl_data->num_hbuff;
+ int sample_cnt, i;
+ u64 *hist_vals;
+ u64 *hist_cnt;
+ int ret = 0;
+
+ hist_vals = kmalloc_array(num_hbuff, sizeof(*hist_vals), GFP_KERNEL);
+ if (!hist_vals)
+ return -ENOMEM;
+
+ hist_cnt = kmalloc_array(num_hbuff, sizeof(*hist_cnt), GFP_KERNEL);
+ if (!hist_cnt) {
+ ret = -ENOMEM;
+ goto out_free_hist_vals;
+ }
+
+ memset(vals, 0, sizeof(*vals));
+ sample_cnt = tl_collect_values_u64(telemetry, ctr->offset1, hist_vals);
+ if (!sample_cnt)
+ goto out_free_hist_cnt;
+
+ tl_collect_values_u32(telemetry, ctr->offset2, hist_cnt);
+
+ for (i = 0; i < sample_cnt; i++) {
+ /* Avoid division by 0 if count is 0. */
+ if (hist_cnt[i])
+ hist_vals[i] = div_u64(hist_vals[i] * cpp_ns_per_cycle,
+ hist_cnt[i]);
+ else
+ hist_vals[i] = 0;
+ }
+
+ vals->curr = hist_vals[sample_cnt - 1];
+ vals->min = min_array(hist_vals, sample_cnt);
+ vals->max = max_array(hist_vals, sample_cnt);
+ vals->avg = avg_array(hist_vals, sample_cnt);
+
+out_free_hist_cnt:
+ kfree(hist_cnt);
+out_free_hist_vals:
+ kfree(hist_vals);
+ return ret;
+}
+
+/* Convert HW raw bandwidth units to Mbps. */
+static int tl_bw_hw_units_to_mbps(struct adf_telemetry *telemetry,
+ const struct adf_tl_dbg_counter *ctr,
+ struct adf_tl_dbg_aggr_values *vals)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(telemetry->accel_dev);
+ u16 bw_hw_2_bits = tl_data->bw_units_to_bytes * BITS_PER_BYTE;
+ u64 *hist_vals;
+ int sample_cnt;
+ int ret = 0;
+
+ hist_vals = kmalloc_array(tl_data->num_hbuff, sizeof(*hist_vals),
+ GFP_KERNEL);
+ if (!hist_vals)
+ return -ENOMEM;
+
+ memset(vals, 0, sizeof(*vals));
+ sample_cnt = tl_collect_values_u32(telemetry, ctr->offset1, hist_vals);
+ if (!sample_cnt)
+ goto out_free_hist_vals;
+
+ vals->curr = div_u64(hist_vals[sample_cnt - 1] * bw_hw_2_bits, MEGA);
+ vals->min = div_u64(min_array(hist_vals, sample_cnt) * bw_hw_2_bits, MEGA);
+ vals->max = div_u64(max_array(hist_vals, sample_cnt) * bw_hw_2_bits, MEGA);
+ vals->avg = div_u64(avg_array(hist_vals, sample_cnt) * bw_hw_2_bits, MEGA);
+
+out_free_hist_vals:
+ kfree(hist_vals);
+ return ret;
+}
+
+static void tl_seq_printf_counter(struct adf_telemetry *telemetry,
+ struct seq_file *s, const char *name,
+ struct adf_tl_dbg_aggr_values *vals)
+{
+ seq_printf(s, "%-*s", TL_KEY_MIN_PADDING, name);
+ seq_printf(s, "%*llu", TL_VALUE_MIN_PADDING, vals->curr);
+ if (atomic_read(&telemetry->state) > 1) {
+ seq_printf(s, "%*llu", TL_VALUE_MIN_PADDING, vals->min);
+ seq_printf(s, "%*llu", TL_VALUE_MIN_PADDING, vals->max);
+ seq_printf(s, "%*llu", TL_VALUE_MIN_PADDING, vals->avg);
+ }
+ seq_puts(s, "\n");
+}
+
+static int tl_calc_and_print_counter(struct adf_telemetry *telemetry,
+ struct seq_file *s,
+ const struct adf_tl_dbg_counter *ctr,
+ const char *name)
+{
+ const char *counter_name = name ? name : ctr->name;
+ enum adf_tl_counter_type type = ctr->type;
+ struct adf_tl_dbg_aggr_values vals;
+ int ret;
+
+ switch (type) {
+ case ADF_TL_SIMPLE_COUNT:
+ ret = tl_calc_count(telemetry, ctr, &vals);
+ break;
+ case ADF_TL_COUNTER_NS:
+ ret = tl_cycles_to_ns(telemetry, ctr, &vals);
+ break;
+ case ADF_TL_COUNTER_NS_AVG:
+ ret = tl_lat_acc_avg(telemetry, ctr, &vals);
+ break;
+ case ADF_TL_COUNTER_MBPS:
+ ret = tl_bw_hw_units_to_mbps(telemetry, ctr, &vals);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (ret)
+ return ret;
+
+ tl_seq_printf_counter(telemetry, s, counter_name, &vals);
+
+ return 0;
+}
+
+static int tl_print_sl_counter(struct adf_telemetry *telemetry,
+ const struct adf_tl_dbg_counter *ctr,
+ struct seq_file *s, u8 cnt_id)
+{
+ size_t sl_regs_sz = GET_TL_DATA(telemetry->accel_dev).slice_reg_sz;
+ struct adf_tl_dbg_counter slice_ctr;
+ size_t offset_inc = cnt_id * sl_regs_sz;
+ char cnt_name[MAX_COUNT_NAME_SIZE];
+
+ snprintf(cnt_name, MAX_COUNT_NAME_SIZE, "%s%d", ctr->name, cnt_id);
+ slice_ctr = *ctr;
+ slice_ctr.offset1 += offset_inc;
+
+ return tl_calc_and_print_counter(telemetry, s, &slice_ctr, cnt_name);
+}
+
+static int tl_calc_and_print_sl_counters(struct adf_accel_dev *accel_dev,
+ struct seq_file *s, u8 cnt_type, u8 cnt_id)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ const struct adf_tl_dbg_counter *sl_tl_util_counters;
+ const struct adf_tl_dbg_counter *sl_tl_exec_counters;
+ const struct adf_tl_dbg_counter *ctr;
+ int ret;
+
+ sl_tl_util_counters = tl_data->sl_util_counters;
+ sl_tl_exec_counters = tl_data->sl_exec_counters;
+
+ ctr = &sl_tl_util_counters[cnt_type];
+
+ ret = tl_print_sl_counter(telemetry, ctr, s, cnt_id);
+ if (ret) {
+ dev_notice(&GET_DEV(accel_dev),
+ "invalid slice utilization counter type\n");
+ return ret;
+ }
+
+ ctr = &sl_tl_exec_counters[cnt_type];
+
+ ret = tl_print_sl_counter(telemetry, ctr, s, cnt_id);
+ if (ret) {
+ dev_notice(&GET_DEV(accel_dev),
+ "invalid slice execution counter type\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static void tl_print_msg_cnt(struct seq_file *s, u32 msg_cnt)
+{
+ seq_printf(s, "%-*s", TL_KEY_MIN_PADDING, SNAPSHOT_CNT_MSG);
+ seq_printf(s, "%*u\n", TL_VALUE_MIN_PADDING, msg_cnt);
+}
+
+static int tl_print_dev_data(struct adf_accel_dev *accel_dev,
+ struct seq_file *s)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ const struct adf_tl_dbg_counter *dev_tl_counters;
+ u8 num_dev_counters = tl_data->num_dev_counters;
+ u8 *sl_cnt = (u8 *)&telemetry->slice_cnt;
+ const struct adf_tl_dbg_counter *ctr;
+ unsigned int i;
+ int ret;
+ u8 j;
+
+ if (!atomic_read(&telemetry->state)) {
+ dev_info(&GET_DEV(accel_dev), "not enabled\n");
+ return -EPERM;
+ }
+
+ dev_tl_counters = tl_data->dev_counters;
+
+ tl_print_msg_cnt(s, telemetry->msg_cnt);
+
+ /* Print device level telemetry. */
+ for (i = 0; i < num_dev_counters; i++) {
+ ctr = &dev_tl_counters[i];
+ ret = tl_calc_and_print_counter(telemetry, s, ctr, NULL);
+ if (ret) {
+ dev_notice(&GET_DEV(accel_dev),
+ "invalid counter type\n");
+ return ret;
+ }
+ }
+
+ /* Print per slice telemetry. */
+ for (i = 0; i < ADF_TL_SL_CNT_COUNT; i++) {
+ for (j = 0; j < sl_cnt[i]; j++) {
+ ret = tl_calc_and_print_sl_counters(accel_dev, s, i, j);
+ if (ret)
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static int tl_dev_data_show(struct seq_file *s, void *unused)
+{
+ struct adf_accel_dev *accel_dev = s->private;
+
+ if (!accel_dev)
+ return -EINVAL;
+
+ return tl_print_dev_data(accel_dev, s);
+}
+DEFINE_SHOW_ATTRIBUTE(tl_dev_data);
+
+static int tl_control_show(struct seq_file *s, void *unused)
+{
+ struct adf_accel_dev *accel_dev = s->private;
+
+ if (!accel_dev)
+ return -EINVAL;
+
+ seq_printf(s, "%d\n", atomic_read(&accel_dev->telemetry->state));
+
+ return 0;
+}
+
+static ssize_t tl_control_write(struct file *file, const char __user *userbuf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *seq_f = file->private_data;
+ struct adf_accel_dev *accel_dev;
+ struct adf_telemetry *telemetry;
+ struct adf_tl_hw_data *tl_data;
+ struct device *dev;
+ u32 input;
+ int ret;
+
+ accel_dev = seq_f->private;
+ if (!accel_dev)
+ return -EINVAL;
+
+ tl_data = &GET_TL_DATA(accel_dev);
+ telemetry = accel_dev->telemetry;
+ dev = &GET_DEV(accel_dev);
+
+ mutex_lock(&telemetry->wr_lock);
+
+ ret = kstrtou32_from_user(userbuf, count, 10, &input);
+ if (ret)
+ goto unlock_and_exit;
+
+ if (input > tl_data->num_hbuff) {
+ dev_info(dev, "invalid control input\n");
+ ret = -EINVAL;
+ goto unlock_and_exit;
+ }
+
+ /* If input is 0, just stop telemetry. */
+ if (!input) {
+ ret = adf_tl_halt(accel_dev);
+ if (!ret)
+ ret = count;
+
+ goto unlock_and_exit;
+ }
+
+ /* If TL is already enabled, stop it. */
+ if (atomic_read(&telemetry->state)) {
+ dev_info(dev, "already enabled, restarting.\n");
+ ret = adf_tl_halt(accel_dev);
+ if (ret)
+ goto unlock_and_exit;
+ }
+
+ ret = adf_tl_run(accel_dev, input);
+ if (ret)
+ goto unlock_and_exit;
+
+ ret = count;
+
+unlock_and_exit:
+ mutex_unlock(&telemetry->wr_lock);
+ return ret;
+}
+DEFINE_SHOW_STORE_ATTRIBUTE(tl_control);
+
+void adf_tl_dbgfs_add(struct adf_accel_dev *accel_dev)
+{
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ struct dentry *parent = accel_dev->debugfs_dir;
+ struct dentry *dir;
+
+ if (!telemetry)
+ return;
+
+ dir = debugfs_create_dir("telemetry", parent);
+ accel_dev->telemetry->dbg_dir = dir;
+ debugfs_create_file("device_data", 0444, dir, accel_dev, &tl_dev_data_fops);
+ debugfs_create_file("control", 0644, dir, accel_dev, &tl_control_fops);
+}
+
+void adf_tl_dbgfs_rm(struct adf_accel_dev *accel_dev)
+{
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ struct dentry *dbg_dir;
+
+ if (!telemetry)
+ return;
+
+ dbg_dir = telemetry->dbg_dir;
+
+ debugfs_remove_recursive(dbg_dir);
+
+ if (atomic_read(&telemetry->state))
+ adf_tl_halt(accel_dev);
+}
diff --git a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h
new file mode 100644
index 000000000000..b2e8f1912c16
--- /dev/null
+++ b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2023 Intel Corporation. */
+#ifndef ADF_TL_DEBUGFS_H
+#define ADF_TL_DEBUGFS_H
+
+#include <linux/types.h>
+
+struct adf_accel_dev;
+
+#define MAX_COUNT_NAME_SIZE 32
+#define SNAPSHOT_CNT_MSG "sample_cnt"
+#define RP_NUM_INDEX "rp_num"
+#define PCI_TRANS_CNT_NAME "pci_trans_cnt"
+#define MAX_RD_LAT_NAME "max_rd_lat"
+#define RD_LAT_ACC_NAME "rd_lat_acc_avg"
+#define MAX_LAT_NAME "max_gp_lat"
+#define LAT_ACC_NAME "gp_lat_acc_avg"
+#define BW_IN_NAME "bw_in"
+#define BW_OUT_NAME "bw_out"
+#define PAGE_REQ_LAT_NAME "at_page_req_lat_avg"
+#define AT_TRANS_LAT_NAME "at_trans_lat_avg"
+#define AT_MAX_UTLB_USED_NAME "at_max_tlb_used"
+#define AT_GLOB_DTLB_HIT_NAME "at_glob_devtlb_hit"
+#define AT_GLOB_DTLB_MISS_NAME "at_glob_devtlb_miss"
+#define AT_PAYLD_DTLB_HIT_NAME "tl_at_payld_devtlb_hit"
+#define AT_PAYLD_DTLB_MISS_NAME "tl_at_payld_devtlb_miss"
+
+#define ADF_TL_DATA_REG_OFF(reg, qat_gen) \
+ offsetof(struct adf_##qat_gen##_tl_layout, reg)
+
+#define ADF_TL_DEV_REG_OFF(reg, qat_gen) \
+ (ADF_TL_DATA_REG_OFF(tl_device_data_regs, qat_gen) + \
+ offsetof(struct adf_##qat_gen##_tl_device_data_regs, reg))
+
+#define ADF_TL_SLICE_REG_OFF(slice, reg, qat_gen) \
+ (ADF_TL_DEV_REG_OFF(slice##_slices[0], qat_gen) + \
+ offsetof(struct adf_##qat_gen##_tl_slice_data_regs, reg))
+
+/**
+ * enum adf_tl_counter_type - telemetry counter types
+ * @ADF_TL_COUNTER_UNSUPPORTED: unsupported counter
+ * @ADF_TL_SIMPLE_COUNT: simple counter
+ * @ADF_TL_COUNTER_NS: latency counter, value in ns
+ * @ADF_TL_COUNTER_NS_AVG: accumulated average latency counter, value in ns
+ * @ADF_TL_COUNTER_MBPS: bandwidth, value in Mbps
+ */
+enum adf_tl_counter_type {
+ ADF_TL_COUNTER_UNSUPPORTED,
+ ADF_TL_SIMPLE_COUNT,
+ ADF_TL_COUNTER_NS,
+ ADF_TL_COUNTER_NS_AVG,
+ ADF_TL_COUNTER_MBPS,
+};
+
+/**
+ * struct adf_tl_dbg_counter - telemetry counter definition
+ * @name: name of the counter as printed in the report
+ * @type: type of the counter
+ * @offset1: offset of 1st register
+ * @offset2: offset of 2nd optional register
+ */
+struct adf_tl_dbg_counter {
+ const char *name;
+ enum adf_tl_counter_type type;
+ size_t offset1;
+ size_t offset2;
+};
+
+#define ADF_TL_COUNTER(_name, _type, _offset) \
+{ .name = _name, \
+ .type = _type, \
+ .offset1 = _offset \
+}
+
+#define ADF_TL_COUNTER_LATENCY(_name, _type, _offset1, _offset2) \
+{ .name = _name, \
+ .type = _type, \
+ .offset1 = _offset1, \
+ .offset2 = _offset2 \
+}
+
+/* Telemetry counter aggregated values. */
+struct adf_tl_dbg_aggr_values {
+ u64 curr;
+ u64 min;
+ u64 max;
+ u64 avg;
+};
+
+/**
+ * adf_tl_dbgfs_add() - Add telemetry's debug fs entries.
+ * @accel_dev: Pointer to acceleration device.
+ *
+ * Creates telemetry's debug fs folder and attributes in QAT debug fs root.
+ */
+void adf_tl_dbgfs_add(struct adf_accel_dev *accel_dev);
+
+/**
+ * adf_tl_dbgfs_rm() - Remove telemetry's debug fs entries.
+ * @accel_dev: Pointer to acceleration device.
+ *
+ * Removes telemetry's debug fs folder and attributes from QAT debug fs root.
+ */
+void adf_tl_dbgfs_rm(struct adf_accel_dev *accel_dev);
+
+#endif /* ADF_TL_DEBUGFS_H */
--
2.41.0
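
Note on avg_array() in adf_tl_debugfs.c above: the macro averages the
history samples without ever forming their full sum, by accumulating the
per-element quotients and carrying the remainders separately. The
following is a minimal userspace sketch of the same idea, not the
driver's code. The name avg_u64() and the use of plain / and % in place
of do_div() are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's avg_array() macro.
 * Returns floor(sum(vals) / len) without computing sum(vals),
 * so it cannot overflow even when every element is UINT64_MAX.
 */
uint64_t avg_u64(const uint64_t *vals, size_t len)
{
	uint64_t quot_sum = 0;	/* running sum of vals[i] / len */
	uint64_t rem_sum = 0;	/* running sum of vals[i] % len, kept below len */
	size_t i;

	if (!len)
		return 0;

	for (i = 0; i < len; i++) {
		uint64_t q = vals[i] / len;
		uint64_t r = vals[i] % len;

		quot_sum += q;
		if (rem_sum >= len - r) {
			/* rem_sum + r reached len: carry one unit over */
			quot_sum++;
			rem_sum -= len - r;
		} else {
			rem_sum += r;
		}
	}

	/* The leftover rem_sum is below len and contributes nothing. */
	return quot_sum;
}
```

Because rem_sum always stays below len, the comparison against len - r
detects the carry without computing rem_sum + r, which is the step that
keeps every intermediate value in range.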


2023-12-22 10:38:28

by Lucas Segarra Fernandez

[permalink] [raw]
Subject: [PATCH v2 4/4] crypto: qat - add support for ring pair level telemetry

Expose through debugfs ring pair telemetry data for QAT GEN4 devices.

This allows gathering metrics about the PCIe channel and device TLB for
a selected ring pair. At most 4 ring pairs can be monitored at a time
per device.

For details, refer to debugfs-driver-qat_telemetry in Documentation/ABI.
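
As an illustration of the VF to ring pair table in that document, the
mapping can be computed from the device and function numbers of the VF's
BDF. This is only a sketch derived from the table (4 RPs per VF, VFs
numbered from function 0.1 up to 2.0); the helper name vf_first_rp() and
its constants are hypothetical and are not part of the driver.

```c
#define RPS_PER_VF		4	/* assumption: read off the documented table */
#define FUNCS_PER_PCI_DEV	8	/* PCI function numbers run from 0 to 7 */
#define LAST_VF_DEVFN		16	/* assumption: the table ends at BDF x:2.0 */

/*
 * Hypothetical helper: return the first host ring pair number owned by
 * the VF at <bus>:<dev>.<fn>, or -1 if the BDF is not in the table.
 * The VF then owns RPs first .. first + 3.
 */
int vf_first_rp(unsigned int dev, unsigned int fn)
{
	unsigned int devfn = dev * FUNCS_PER_PCI_DEV + fn;

	/* devfn 0 (x:0.0) is the PF itself, not a VF. */
	if (fn >= FUNCS_PER_PCI_DEV || devfn == 0 || devfn > LAST_VF_DEVFN)
		return -1;

	return (devfn - 1) * RPS_PER_VF;
}
```

For example, the VF at 0000:6b:1.1 yields 32, which is the ring pair
written to rp_C_data in the documentation's example.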

This patch is based on earlier work done by Wojciech Ziemba.

Signed-off-by: Lucas Segarra Fernandez <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Damian Muszynski <[email protected]>
---
.../ABI/testing/debugfs-driver-qat_telemetry | 125 +++++++++++
.../intel/qat/qat_420xx/adf_420xx_hw_data.c | 1 +
.../intel/qat/qat_4xxx/adf_4xxx_hw_data.c | 1 +
.../intel/qat/qat_common/adf_accel_devices.h | 1 +
.../intel/qat/qat_common/adf_gen4_hw_data.h | 1 +
.../crypto/intel/qat/qat_common/adf_gen4_tl.c | 35 +++
.../crypto/intel/qat/qat_common/adf_gen4_tl.h | 41 +++-
.../intel/qat/qat_common/adf_telemetry.c | 23 +-
.../intel/qat/qat_common/adf_telemetry.h | 7 +
.../intel/qat/qat_common/adf_tl_debugfs.c | 208 ++++++++++++++++++
.../intel/qat/qat_common/adf_tl_debugfs.h | 11 +
11 files changed, 449 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/debugfs-driver-qat_telemetry b/Documentation/ABI/testing/debugfs-driver-qat_telemetry
index 24532365387c..eacee2072088 100644
--- a/Documentation/ABI/testing/debugfs-driver-qat_telemetry
+++ b/Documentation/ABI/testing/debugfs-driver-qat_telemetry
@@ -101,3 +101,128 @@ Description: (RO) Reports device telemetry counters.
attribute is not reported.

This attribute is only available for qat_4xxx devices.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/telemetry/rp_<A/B/C/D>_data
+Date: March 2024
+KernelVersion: 6.8
+Contact: [email protected]
+Description: (RW) Selects up to 4 Ring Pairs (RP) to monitor, one per file,
+ and reports telemetry counters related to each.
+
+ Allowed values to write:
+ ========================
+ * 0 to ``<num_rps - 1>``:
+ Ring pair to be monitored. The value of ``num_rps`` can be
+ retrieved through ``/sys/bus/pci/devices/<BDF>/qat/num_rps``.
+ See Documentation/ABI/testing/sysfs-driver-qat.
+
+ Reads report metrics about performance and utilization of
+ the selected RP:
+
+ ======================= ========================================
+ Field Description
+ ======================= ========================================
+ sample_cnt number of acquisitions of telemetry data
+ from the device. Reads are performed
+ every 1000 ms
+ rp_num RP number associated with slot <A/B/C/D>
+ service_type service associated with the RP
+ pci_trans_cnt number of PCIe partial transactions
+ gp_lat_acc_avg average get to put latency [ns]
+ bw_in PCIe, write bandwidth [Mbps]
+ bw_out PCIe, read bandwidth [Mbps]
+ at_glob_devtlb_hit Message descriptor DevTLB hit rate
+ at_glob_devtlb_miss Message descriptor DevTLB miss rate
+ tl_at_payld_devtlb_hit Payload DevTLB hit rate
+ tl_at_payld_devtlb_miss Payload DevTLB miss rate
+ ======================= ========================================
+
+ Example.
+
+ Writing the value '32' to the file ``rp_C_data`` starts the
+ collection of telemetry metrics for ring pair 32::
+
+ echo 32 > /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/rp_C_data
+
+ Once a ring pair is selected, statistics can be read accessing
+ the file::
+
+ cat /sys/kernel/debug/qat_4xxx_0000:6b:00.0/telemetry/rp_C_data
+
+ If ``control`` is set to 1, only the current values of the
+ counters are displayed::
+
+ <counter_name> <current>
+
+ If ``control`` is 2, 3 or 4, counters are displayed in the
+ following format::
+
+ <counter_name> <current> <min> <max> <avg>
+
+
+ On QAT GEN4 devices there are 64 RPs on a PF, so the allowed
+ values are 0..63. This number is absolute to the device.
+ If Virtual Functions (VF) are used, the ring pair number can
+ be derived from the Bus, Device, Function of the VF:
+
+ ============ ====== ====== ====== ======
+ PCI BDF/VF RP0 RP1 RP2 RP3
+ ============ ====== ====== ====== ======
+ 0000:6b:0.1 RP 0 RP 1 RP 2 RP 3
+ 0000:6b:0.2 RP 4 RP 5 RP 6 RP 7
+ 0000:6b:0.3 RP 8 RP 9 RP 10 RP 11
+ 0000:6b:0.4 RP 12 RP 13 RP 14 RP 15
+ 0000:6b:0.5 RP 16 RP 17 RP 18 RP 19
+ 0000:6b:0.6 RP 20 RP 21 RP 22 RP 23
+ 0000:6b:0.7 RP 24 RP 25 RP 26 RP 27
+ 0000:6b:1.0 RP 28 RP 29 RP 30 RP 31
+ 0000:6b:1.1 RP 32 RP 33 RP 34 RP 35
+ 0000:6b:1.2 RP 36 RP 37 RP 38 RP 39
+ 0000:6b:1.3 RP 40 RP 41 RP 42 RP 43
+ 0000:6b:1.4 RP 44 RP 45 RP 46 RP 47
+ 0000:6b:1.5 RP 48 RP 49 RP 50 RP 51
+ 0000:6b:1.6 RP 52 RP 53 RP 54 RP 55
+ 0000:6b:1.7 RP 56 RP 57 RP 58 RP 59
+ 0000:6b:2.0 RP 60 RP 61 RP 62 RP 63
+ ============ ====== ====== ====== ======
+
+ The mapping is only valid for the BDFs of VFs on the host.
+
+
+ The service provided on a ring-pair varies depending on the
+ configuration. The configuration for a given device can be
+ queried and set using ``cfg_services``.
+ See Documentation/ABI/testing/sysfs-driver-qat for details.
+
+ The following table reports how ring pairs are mapped to VFs
+ on the PF 0000:6b:0.0 configured for `sym;asym` or `asym;sym`:
+
+ =========== ============ =========== ============ ===========
+ PCI BDF/VF RP0/service RP1/service RP2/service RP3/service
+ =========== ============ =========== ============ ===========
+ 0000:6b:0.1 RP 0 asym RP 1 sym RP 2 asym RP 3 sym
+ 0000:6b:0.2 RP 4 asym RP 5 sym RP 6 asym RP 7 sym
+ 0000:6b:0.3 RP 8 asym RP 9 sym RP10 asym RP11 sym
+ ... ... ... ... ...
+ =========== ============ =========== ============ ===========
+
+ All VFs follow the same pattern.
+
+
+ The following table reports how ring pairs are mapped to VFs on
+ the PF 0000:6b:0.0 configured for `dc`:
+
+ =========== ============ =========== ============ ===========
+ PCI BDF/VF RP0/service RP1/service RP2/service RP3/service
+ =========== ============ =========== ============ ===========
+ 0000:6b:0.1 RP 0 dc RP 1 dc RP 2 dc RP 3 dc
+ 0000:6b:0.2 RP 4 dc RP 5 dc RP 6 dc RP 7 dc
+ 0000:6b:0.3 RP 8 dc RP 9 dc RP10 dc RP11 dc
+ ... ... ... ... ...
+ =========== ============ =========== ============ ===========
+
+ The mapping of a RP to a service can be retrieved using
+ ``rp2srv`` from sysfs.
+ See Documentation/ABI/testing/sysfs-driver-qat for details.
+
+ This attribute is only available for qat_4xxx devices.
diff --git a/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c b/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
index a7730d8057d6..5edce27db864 100644
--- a/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_420xx/adf_420xx_hw_data.c
@@ -520,6 +520,7 @@ void adf_init_hw_data_420xx(struct adf_hw_device_data *hw_data, u32 dev_id)
hw_data->init_device = adf_gen4_init_device;
hw_data->reset_device = adf_reset_flr;
hw_data->admin_ae_mask = ADF_420XX_ADMIN_AE_MASK;
+ hw_data->num_rps = ADF_GEN4_MAX_RPS;
hw_data->fw_name = ADF_420XX_FW;
hw_data->fw_mmp_name = ADF_420XX_MMP;
hw_data->uof_get_name = uof_get_name_420xx;
diff --git a/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c b/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
index 73001b20cbfd..558caefd71b9 100644
--- a/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
@@ -421,6 +421,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
hw_data->init_device = adf_gen4_init_device;
hw_data->reset_device = adf_reset_flr;
hw_data->admin_ae_mask = ADF_4XXX_ADMIN_AE_MASK;
+ hw_data->num_rps = ADF_GEN4_MAX_RPS;
switch (dev_id) {
case ADF_402XX_PCI_DEVICE_ID:
hw_data->fw_name = ADF_402XX_FW;
diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
index b274ebc799c9..db671879b1f8 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
@@ -278,6 +278,7 @@ struct adf_hw_device_data {
u8 num_logical_accel;
u8 num_engines;
u32 num_hb_ctrs;
+ u8 num_rps;
};

/* CSR write macro */
diff --git a/drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h b/drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h
index 051ad20581a6..46a782ba456f 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_gen4_hw_data.h
@@ -36,6 +36,7 @@
#define ADF_GEN4_MSIX_RTTABLE_OFFSET(i) (0x409000 + ((i) * 0x04))

/* Bank and ring configuration */
+#define ADF_GEN4_MAX_RPS 64
#define ADF_GEN4_NUM_RINGS_PER_BANK 2
#define ADF_GEN4_NUM_BANKS_PER_VF 4
#define ADF_GEN4_ETR_MAX_BANKS 64
diff --git a/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
index 4efbe6bc651c..7fc7a77f6aed 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
@@ -9,6 +9,8 @@

#define ADF_GEN4_TL_DEV_REG_OFF(reg) ADF_TL_DEV_REG_OFF(reg, gen4)

+#define ADF_GEN4_TL_RP_REG_OFF(reg) ADF_TL_RP_REG_OFF(reg, gen4)
+
#define ADF_GEN4_TL_SL_UTIL_COUNTER(_name) \
ADF_TL_COUNTER("util_" #_name, \
ADF_TL_SIMPLE_COUNT, \
@@ -101,11 +103,42 @@ static const struct adf_tl_dbg_counter sl_exec_counters[ADF_TL_SL_CNT_COUNT] = {
ADF_GEN4_TL_SL_EXEC_COUNTER(ath),
};

+/* Ring pair counters. */
+static const struct adf_tl_dbg_counter rp_counters[] = {
+ /* PCIe partial transactions. */
+ ADF_TL_COUNTER(PCI_TRANS_CNT_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_pci_trans_cnt)),
+ /* Get to put latency average[ns]. */
+ ADF_TL_COUNTER_LATENCY(LAT_ACC_NAME, ADF_TL_COUNTER_NS_AVG,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_gp_lat_acc),
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_ae_put_cnt)),
+ /* PCIe write bandwidth[Mbps]. */
+ ADF_TL_COUNTER(BW_IN_NAME, ADF_TL_COUNTER_MBPS,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_bw_in)),
+ /* PCIe read bandwidth[Mbps]. */
+ ADF_TL_COUNTER(BW_OUT_NAME, ADF_TL_COUNTER_MBPS,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_bw_out)),
+ /* Message descriptor DevTLB hit rate. */
+ ADF_TL_COUNTER(AT_GLOB_DTLB_HIT_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_glob_devtlb_hit)),
+ /* Message descriptor DevTLB miss rate. */
+ ADF_TL_COUNTER(AT_GLOB_DTLB_MISS_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_glob_devtlb_miss)),
+ /* Payload DevTLB hit rate. */
+ ADF_TL_COUNTER(AT_PAYLD_DTLB_HIT_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_payld_devtlb_hit)),
+ /* Payload DevTLB miss rate. */
+ ADF_TL_COUNTER(AT_PAYLD_DTLB_MISS_NAME, ADF_TL_SIMPLE_COUNT,
+ ADF_GEN4_TL_RP_REG_OFF(reg_tl_at_payld_devtlb_miss)),
+};
+
void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
{
tl_data->layout_sz = ADF_GEN4_TL_LAYOUT_SZ;
tl_data->slice_reg_sz = ADF_GEN4_TL_SLICE_REG_SZ;
+ tl_data->rp_reg_sz = ADF_GEN4_TL_RP_REG_SZ;
tl_data->num_hbuff = ADF_GEN4_TL_NUM_HIST_BUFFS;
+ tl_data->max_rp = ADF_GEN4_TL_MAX_RP_NUM;
tl_data->msg_cnt_off = ADF_GEN4_TL_MSG_CNT_OFF;
tl_data->cpp_ns_per_cycle = ADF_GEN4_CPP_NS_PER_CYCLE;
tl_data->bw_units_to_bytes = ADF_GEN4_TL_BW_HW_UNITS_TO_BYTES;
@@ -114,5 +147,7 @@ void adf_gen4_init_tl_data(struct adf_tl_hw_data *tl_data)
tl_data->num_dev_counters = ARRAY_SIZE(dev_counters);
tl_data->sl_util_counters = sl_util_counters;
tl_data->sl_exec_counters = sl_exec_counters;
+ tl_data->rp_counters = rp_counters;
+ tl_data->num_rp_counters = ARRAY_SIZE(rp_counters);
}
EXPORT_SYMBOL_GPL(adf_gen4_init_tl_data);
diff --git a/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
index feb2eecf24cf..32df4163beb9 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
@@ -21,6 +21,9 @@ struct adf_tl_hw_data;
/* Max number of HW resources of one type. */
#define ADF_GEN4_TL_MAX_SLICES_PER_TYPE 24

+/* Max number of simultaneously monitored ring pairs. */
+#define ADF_GEN4_TL_MAX_RP_NUM 4
+
/**
* struct adf_gen4_tl_slice_data_regs - HW slice data as populated by FW.
* @reg_tm_slice_exec_cnt: Slice execution count.
@@ -92,18 +95,52 @@ struct adf_gen4_tl_device_data_regs {
struct adf_gen4_tl_slice_data_regs wcp_slices[ADF_GEN4_TL_MAX_SLICES_PER_TYPE];
};

+/**
+ * struct adf_gen4_tl_ring_pair_data_regs - Ring Pair telemetry counter
+ * values as periodically populated by the device.
+ * @reg_tl_gp_lat_acc: get-put latency accumulator
+ * @reserved: reserved
+ * @reg_tl_pci_trans_cnt: PCIe partial transactions
+ * @reg_tl_ae_put_cnt: Accelerator Engine put counts across all rings
+ * @reg_tl_bw_in: PCIe write bandwidth
+ * @reg_tl_bw_out: PCIe read bandwidth
+ * @reg_tl_at_glob_devtlb_hit: Message descriptor DevTLB hit rate
+ * @reg_tl_at_glob_devtlb_miss: Message descriptor DevTLB miss rate
+ * @reg_tl_at_payld_devtlb_hit: Payload DevTLB hit rate
+ * @reg_tl_at_payld_devtlb_miss: Payload DevTLB miss rate
+ * @reg_tl_re_cnt: ring empty time samples count
+ * @reserved1: reserved
+ */
+struct adf_gen4_tl_ring_pair_data_regs {
+ __u64 reg_tl_gp_lat_acc;
+ __u64 reserved;
+ __u32 reg_tl_pci_trans_cnt;
+ __u32 reg_tl_ae_put_cnt;
+ __u32 reg_tl_bw_in;
+ __u32 reg_tl_bw_out;
+ __u32 reg_tl_at_glob_devtlb_hit;
+ __u32 reg_tl_at_glob_devtlb_miss;
+ __u32 reg_tl_at_payld_devtlb_hit;
+ __u32 reg_tl_at_payld_devtlb_miss;
+ __u32 reg_tl_re_cnt;
+ __u32 reserved1;
+};
+
+#define ADF_GEN4_TL_RP_REG_SZ sizeof(struct adf_gen4_tl_ring_pair_data_regs)
+
/**
* struct adf_gen4_tl_layout - This structure represents entire telemetry
* counters data: Device + 4 Ring Pairs as are being populated periodically
* by device.
* @tl_device_data_regs: structure of device telemetry registers
- * @reserved1: reserved
+ * @tl_ring_pairs_data_regs: array of ring pairs telemetry registers
* @reg_tl_msg_cnt: telemetry messages counter
* @reserved: reserved
*/
struct adf_gen4_tl_layout {
struct adf_gen4_tl_device_data_regs tl_device_data_regs;
- __u32 reserved1[14];
+ struct adf_gen4_tl_ring_pair_data_regs
+ tl_ring_pairs_data_regs[ADF_GEN4_TL_MAX_RP_NUM];
__u32 reg_tl_msg_cnt;
__u32 reserved;
};
diff --git a/drivers/crypto/intel/qat/qat_common/adf_telemetry.c b/drivers/crypto/intel/qat/qat_common/adf_telemetry.c
index 05c476d58895..2ff714d11bd2 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_telemetry.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_telemetry.c
@@ -33,7 +33,9 @@ static int validate_tl_data(struct adf_tl_hw_data *tl_data)
if (!tl_data->dev_counters ||
TL_IS_ZERO(tl_data->num_dev_counters) ||
!tl_data->sl_util_counters ||
- !tl_data->sl_exec_counters)
+ !tl_data->sl_exec_counters ||
+ !tl_data->rp_counters ||
+ TL_IS_ZERO(tl_data->num_rp_counters))
return -EOPNOTSUPP;

return 0;
@@ -53,11 +55,17 @@ static int adf_tl_alloc_mem(struct adf_accel_dev *accel_dev)
if (!telemetry)
return -ENOMEM;

+ telemetry->rp_num_indexes = kmalloc_array(tl_data->max_rp,
+ sizeof(*telemetry->rp_num_indexes),
+ GFP_KERNEL);
+ if (!telemetry->rp_num_indexes)
+ goto err_free_tl;
+
telemetry->regs_hist_buff = kmalloc_array(tl_data->num_hbuff,
sizeof(*telemetry->regs_hist_buff),
GFP_KERNEL);
if (!telemetry->regs_hist_buff)
- goto err_free_tl;
+ goto err_free_rp_indexes;

telemetry->regs_data = dma_alloc_coherent(dev, regs_sz,
&telemetry->regs_data_p,
@@ -86,6 +94,8 @@ static int adf_tl_alloc_mem(struct adf_accel_dev *accel_dev)

err_free_regs_hist_buff:
kfree(telemetry->regs_hist_buff);
+err_free_rp_indexes:
+ kfree(telemetry->rp_num_indexes);
err_free_tl:
kfree(telemetry);

@@ -107,6 +117,7 @@ static void adf_tl_free_mem(struct adf_accel_dev *accel_dev)
telemetry->regs_data_p);

kfree(telemetry->regs_hist_buff);
+ kfree(telemetry->rp_num_indexes);
kfree(telemetry);
accel_dev->telemetry = NULL;
}
@@ -196,7 +207,8 @@ int adf_tl_run(struct adf_accel_dev *accel_dev, int state)
int ret;

ret = adf_send_admin_tl_start(accel_dev, telemetry->regs_data_p,
- layout_sz, NULL, &telemetry->slice_cnt);
+ layout_sz, telemetry->rp_num_indexes,
+ &telemetry->slice_cnt);
if (ret) {
dev_err(dev, "failed to start telemetry\n");
return ret;
@@ -213,8 +225,10 @@ int adf_tl_run(struct adf_accel_dev *accel_dev, int state)
int adf_tl_init(struct adf_accel_dev *accel_dev)
{
struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ u8 max_rp = GET_TL_DATA(accel_dev).max_rp;
struct device *dev = &GET_DEV(accel_dev);
struct adf_telemetry *telemetry;
+ unsigned int i;
int ret;

ret = validate_tl_data(tl_data);
@@ -234,6 +248,9 @@ int adf_tl_init(struct adf_accel_dev *accel_dev)
mutex_init(&telemetry->regs_hist_lock);
INIT_DELAYED_WORK(&telemetry->work_ctx, tl_work_handler);

+ for (i = 0; i < max_rp; i++)
+ telemetry->rp_num_indexes[i] = ADF_TL_RP_REGS_DISABLED;
+
return 0;
}

diff --git a/drivers/crypto/intel/qat/qat_common/adf_telemetry.h b/drivers/crypto/intel/qat/qat_common/adf_telemetry.h
index 08de17621467..9be81cd3b886 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_telemetry.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_telemetry.h
@@ -23,17 +23,23 @@ struct dentry;
/* Interval within timer interrupt should be handled. Value in milliseconds. */
#define ADF_TL_TIMER_INT_MS (ADF_TL_DATA_WR_INTERVAL_MS / 2)

+#define ADF_TL_RP_REGS_DISABLED (0xff)
+
struct adf_tl_hw_data {
size_t layout_sz;
size_t slice_reg_sz;
+ size_t rp_reg_sz;
size_t msg_cnt_off;
const struct adf_tl_dbg_counter *dev_counters;
const struct adf_tl_dbg_counter *sl_util_counters;
const struct adf_tl_dbg_counter *sl_exec_counters;
+ const struct adf_tl_dbg_counter *rp_counters;
u8 num_hbuff;
u8 cpp_ns_per_cycle;
u8 bw_units_to_bytes;
u8 num_dev_counters;
+ u8 num_rp_counters;
+ u8 max_rp;
};

struct adf_telemetry {
@@ -50,6 +56,7 @@ struct adf_telemetry {
*/
void **regs_hist_buff;
struct dentry *dbg_dir;
+ u8 *rp_num_indexes;
/**
* @regs_hist_lock: protects from race conditions between write and read
* to the copies referenced by @regs_hist_buff
diff --git a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
index accb46d6ea3c..c8241f5a0a26 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
@@ -6,6 +6,7 @@
#include <linux/debugfs.h>
#include <linux/dev_printk.h>
#include <linux/dcache.h>
+#include <linux/file.h>
#include <linux/kernel.h>
#include <linux/math64.h>
#include <linux/mutex.h>
@@ -14,11 +15,13 @@
#include <linux/units.h>

#include "adf_accel_devices.h"
+#include "adf_cfg_strings.h"
#include "adf_telemetry.h"
#include "adf_tl_debugfs.h"

#define TL_VALUE_MIN_PADDING 20
#define TL_KEY_MIN_PADDING 23
+#define TL_RP_SRV_UNKNOWN "Unknown"

static int tl_collect_values_u32(struct adf_telemetry *telemetry,
size_t counter_offset, u64 *arr)
@@ -470,11 +473,210 @@ static ssize_t tl_control_write(struct file *file, const char __user *userbuf,
}
DEFINE_SHOW_STORE_ATTRIBUTE(tl_control);

+static int get_rp_index_from_file(const struct file *f, u8 *rp_id, u8 rp_num)
+{
+ char alpha;
+ u8 index;
+ int ret;
+
+ ret = sscanf(f->f_path.dentry->d_name.name, ADF_TL_RP_REGS_FNAME, &alpha);
+ if (ret != 1)
+ return -EINVAL;
+
+ index = ADF_TL_DBG_RP_INDEX_ALPHA(alpha);
+ *rp_id = index;
+
+ return 0;
+}
+
+static int adf_tl_dbg_change_rp_index(struct adf_accel_dev *accel_dev,
+ unsigned int new_rp_num,
+ unsigned int rp_regs_index)
+{
+ struct adf_hw_device_data *hw_data = GET_HW_DATA(accel_dev);
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ struct device *dev = &GET_DEV(accel_dev);
+ unsigned int i;
+ u8 curr_state;
+ int ret;
+
+ if (new_rp_num >= hw_data->num_rps) {
+ dev_info(dev, "invalid Ring Pair number selected\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i < hw_data->tl_data.max_rp; i++) {
+ if (telemetry->rp_num_indexes[i] == new_rp_num) {
+ dev_info(dev, "RP nr: %u is already selected in slot rp_%c_data\n",
+ new_rp_num, ADF_TL_DBG_RP_ALPHA_INDEX(i));
+ return 0;
+ }
+ }
+
+ dev_dbg(dev, "selecting RP nr %u into slot rp_%c_data\n",
+ new_rp_num, ADF_TL_DBG_RP_ALPHA_INDEX(rp_regs_index));
+
+ curr_state = atomic_read(&telemetry->state);
+
+ if (curr_state) {
+ ret = adf_tl_halt(accel_dev);
+ if (ret)
+ return ret;
+
+ telemetry->rp_num_indexes[rp_regs_index] = new_rp_num;
+
+ ret = adf_tl_run(accel_dev, curr_state);
+ if (ret)
+ return ret;
+ } else {
+ telemetry->rp_num_indexes[rp_regs_index] = new_rp_num;
+ }
+
+ return 0;
+}
+
+static void tl_print_rp_srv(struct adf_accel_dev *accel_dev, struct seq_file *s,
+ u8 rp_idx)
+{
+ u32 banks_per_vf = GET_HW_DATA(accel_dev)->num_banks_per_vf;
+ enum adf_cfg_service_type svc;
+
+ seq_printf(s, "%-*s", TL_KEY_MIN_PADDING, RP_SERVICE_TYPE);
+
+ svc = GET_SRV_TYPE(accel_dev, rp_idx % banks_per_vf);
+ switch (svc) {
+ case COMP:
+ seq_printf(s, "%*s\n", TL_VALUE_MIN_PADDING, ADF_CFG_DC);
+ break;
+ case SYM:
+ seq_printf(s, "%*s\n", TL_VALUE_MIN_PADDING, ADF_CFG_SYM);
+ break;
+ case ASYM:
+ seq_printf(s, "%*s\n", TL_VALUE_MIN_PADDING, ADF_CFG_ASYM);
+ break;
+ default:
+ seq_printf(s, "%*s\n", TL_VALUE_MIN_PADDING, TL_RP_SRV_UNKNOWN);
+ break;
+ }
+}
+
+static int tl_print_rp_data(struct adf_accel_dev *accel_dev, struct seq_file *s,
+ u8 rp_regs_index)
+{
+ struct adf_tl_hw_data *tl_data = &GET_TL_DATA(accel_dev);
+ struct adf_telemetry *telemetry = accel_dev->telemetry;
+ const struct adf_tl_dbg_counter *rp_tl_counters;
+ u8 num_rp_counters = tl_data->num_rp_counters;
+ size_t rp_regs_sz = tl_data->rp_reg_sz;
+ struct adf_tl_dbg_counter ctr;
+ unsigned int i;
+ u8 rp_idx;
+ int ret;
+
+ if (!atomic_read(&telemetry->state)) {
+ dev_info(&GET_DEV(accel_dev), "not enabled\n");
+ return -EPERM;
+ }
+
+ rp_tl_counters = tl_data->rp_counters;
+ rp_idx = telemetry->rp_num_indexes[rp_regs_index];
+
+ if (rp_idx == ADF_TL_RP_REGS_DISABLED) {
+ dev_info(&GET_DEV(accel_dev), "no RP number selected in rp_%c_data\n",
+ ADF_TL_DBG_RP_ALPHA_INDEX(rp_regs_index));
+ return -EPERM;
+ }
+
+ tl_print_msg_cnt(s, telemetry->msg_cnt);
+ seq_printf(s, "%-*s", TL_KEY_MIN_PADDING, RP_NUM_INDEX);
+ seq_printf(s, "%*d\n", TL_VALUE_MIN_PADDING, rp_idx);
+ tl_print_rp_srv(accel_dev, s, rp_idx);
+
+ for (i = 0; i < num_rp_counters; i++) {
+ ctr = rp_tl_counters[i];
+ ctr.offset1 += rp_regs_sz * rp_regs_index;
+ ctr.offset2 += rp_regs_sz * rp_regs_index;
+ ret = tl_calc_and_print_counter(telemetry, s, &ctr, NULL);
+ if (ret) {
+ dev_dbg(&GET_DEV(accel_dev),
+ "invalid RP counter type\n");
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+static int tl_rp_data_show(struct seq_file *s, void *unused)
+{
+ struct adf_accel_dev *accel_dev = s->private;
+ u8 rp_regs_index;
+ u8 max_rp;
+ int ret;
+
+ if (!accel_dev)
+ return -EINVAL;
+
+ max_rp = GET_TL_DATA(accel_dev).max_rp;
+ ret = get_rp_index_from_file(s->file, &rp_regs_index, max_rp);
+ if (ret) {
+ dev_dbg(&GET_DEV(accel_dev), "invalid RP data file name\n");
+ return ret;
+ }
+
+ return tl_print_rp_data(accel_dev, s, rp_regs_index);
+}
+
+static ssize_t tl_rp_data_write(struct file *file, const char __user *userbuf,
+ size_t count, loff_t *ppos)
+{
+ struct seq_file *seq_f = file->private_data;
+ struct adf_accel_dev *accel_dev;
+ struct adf_telemetry *telemetry;
+ unsigned int new_rp_num;
+ u8 rp_regs_index;
+ u8 max_rp;
+ int ret;
+
+ accel_dev = seq_f->private;
+ if (!accel_dev)
+ return -EINVAL;
+
+ telemetry = accel_dev->telemetry;
+ max_rp = GET_TL_DATA(accel_dev).max_rp;
+
+ mutex_lock(&telemetry->wr_lock);
+
+ ret = get_rp_index_from_file(file, &rp_regs_index, max_rp);
+ if (ret) {
+ dev_dbg(&GET_DEV(accel_dev), "invalid RP data file name\n");
+ goto unlock_and_exit;
+ }
+
+ ret = kstrtou32_from_user(userbuf, count, 10, &new_rp_num);
+ if (ret)
+ goto unlock_and_exit;
+
+ ret = adf_tl_dbg_change_rp_index(accel_dev, new_rp_num, rp_regs_index);
+ if (ret)
+ goto unlock_and_exit;
+
+ ret = count;
+
+unlock_and_exit:
+ mutex_unlock(&telemetry->wr_lock);
+ return ret;
+}
+DEFINE_SHOW_STORE_ATTRIBUTE(tl_rp_data);
+
void adf_tl_dbgfs_add(struct adf_accel_dev *accel_dev)
{
struct adf_telemetry *telemetry = accel_dev->telemetry;
struct dentry *parent = accel_dev->debugfs_dir;
+ u8 max_rp = GET_TL_DATA(accel_dev).max_rp;
+ char name[ADF_TL_RP_REGS_FNAME_SIZE];
struct dentry *dir;
+ unsigned int i;

if (!telemetry)
return;
@@ -483,6 +685,12 @@ void adf_tl_dbgfs_add(struct adf_accel_dev *accel_dev)
accel_dev->telemetry->dbg_dir = dir;
debugfs_create_file("device_data", 0444, dir, accel_dev, &tl_dev_data_fops);
debugfs_create_file("control", 0644, dir, accel_dev, &tl_control_fops);
+
+ for (i = 0; i < max_rp; i++) {
+ snprintf(name, sizeof(name), ADF_TL_RP_REGS_FNAME,
+ ADF_TL_DBG_RP_ALPHA_INDEX(i));
+ debugfs_create_file(name, 0644, dir, accel_dev, &tl_rp_data_fops);
+ }
}

void adf_tl_dbgfs_rm(struct adf_accel_dev *accel_dev)
diff --git a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h
index b2e8f1912c16..11cc9eae19b3 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h
@@ -24,6 +24,13 @@ struct adf_accel_dev;
#define AT_GLOB_DTLB_MISS_NAME "at_glob_devtlb_miss"
#define AT_PAYLD_DTLB_HIT_NAME "tl_at_payld_devtlb_hit"
#define AT_PAYLD_DTLB_MISS_NAME "tl_at_payld_devtlb_miss"
+#define RP_SERVICE_TYPE "service_type"
+
+#define ADF_TL_DBG_RP_ALPHA_INDEX(index) ((index) + 'A')
+#define ADF_TL_DBG_RP_INDEX_ALPHA(alpha) ((alpha) - 'A')
+
+#define ADF_TL_RP_REGS_FNAME "rp_%c_data"
+#define ADF_TL_RP_REGS_FNAME_SIZE 16

#define ADF_TL_DATA_REG_OFF(reg, qat_gen) \
offsetof(struct adf_##qat_gen##_tl_layout, reg)
@@ -36,6 +43,10 @@ struct adf_accel_dev;
(ADF_TL_DEV_REG_OFF(slice##_slices[0], qat_gen) + \
offsetof(struct adf_##qat_gen##_tl_slice_data_regs, reg))

+#define ADF_TL_RP_REG_OFF(reg, qat_gen) \
+ (ADF_TL_DATA_REG_OFF(tl_ring_pairs_data_regs[0], qat_gen) + \
+ offsetof(struct adf_##qat_gen##_tl_ring_pair_data_regs, reg))
+
/**
* enum adf_tl_counter_type - telemetry counter types
* @ADF_TL_COUNTER_UNSUPPORTED: unsupported counter
--
2.41.0
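
The VF-to-ring-pair table in the ABI documentation of this patch follows a regular pattern: each VF owns four consecutive ring pairs, and the VFs are numbered in BDF order starting at function 1 of device 0. The following userspace sketch reproduces that table arithmetically. The helper name vf_first_rp() and the two constants are hypothetical and not part of the driver; the formula is inferred from the table only and assumes the contiguous VF numbering shown there for the PF at 0000:6b:0.0.

```c
#include <assert.h>
#include <stdio.h>

#define GEN4_MAX_RPS    64 /* mirrors ADF_GEN4_MAX_RPS from the patch */
#define GEN4_RPS_PER_VF  4 /* four ring pairs per VF, as in the table */

/*
 * Hypothetical helper, not a driver function.
 *
 * Returns the first absolute ring pair number owned by the VF whose host
 * BDF has PCI device number `dev` and function number `fn`, or -1 if the
 * pair does not name one of the 16 VFs listed in the table.
 */
static int vf_first_rp(unsigned int dev, unsigned int fn)
{
	unsigned int vf_idx;

	assert(GEN4_MAX_RPS % GEN4_RPS_PER_VF == 0);

	/* PCI allows 32 devices and 8 functions; 0.0 is the PF itself. */
	if (dev > 31 || fn > 7 || (dev == 0 && fn == 0))
		return -1;

	/* VFs are numbered from 0 starting at function 1 of device 0. */
	vf_idx = dev * 8 + fn - 1;
	if (vf_idx >= GEN4_MAX_RPS / GEN4_RPS_PER_VF)
		return -1;

	return (int)(vf_idx * GEN4_RPS_PER_VF);
}

/* Print the ring pairs of one VF in the style of the documentation table. */
static void print_vf_rps(unsigned int dev, unsigned int fn)
{
	int first = vf_first_rp(dev, fn);
	int i;

	if (first < 0) {
		printf("%x.%u: not a VF in the table\n", dev, fn);
		return;
	}
	printf("%x.%u:", dev, fn);
	for (i = 0; i < GEN4_RPS_PER_VF; i++)
		printf(" RP %d", first + i);
	printf("\n");
}
```

Calling print_vf_rps(1, 0) prints the ring pairs 28 to 31, matching the 0000:6b:1.0 row. As the documentation notes, the mapping only holds for the BDFs of VFs as seen on the host.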

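The debugfs files rp_A_data to rp_D_data are named by converting a slot index to a letter, and get_rp_index_from_file() converts the letter back with sscanf(). A minimal userspace sketch of that round trip follows. The helper names are hypothetical, and the range check against the slot count is an addition of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdio.h>

#define RP_FNAME_FMT "rp_%c_data" /* same format as ADF_TL_RP_REGS_FNAME */
#define MAX_RP_SLOTS 4            /* mirrors ADF_GEN4_TL_MAX_RP_NUM */

/*
 * Hypothetical helper: parse a file name such as "rp_B_data" and return
 * the slot index (0 for 'A', 1 for 'B', ...), or -1 if the name does not
 * match or the letter is outside the supported slots.
 *
 * Note that sscanf() stops counting after the %c conversion, so trailing
 * text after the letter is not validated here.
 */
static int rp_slot_from_name(const char *name)
{
	char alpha;
	int idx;

	if (sscanf(name, RP_FNAME_FMT, &alpha) != 1)
		return -1;

	idx = alpha - 'A';
	if (idx < 0 || idx >= MAX_RP_SLOTS)
		return -1;

	return idx;
}

/*
 * Hypothetical helper: build the file name for a slot index.
 * Returns 0 on success, -1 if the index is out of range.
 */
static int rp_name_from_slot(unsigned int idx, char *buf, size_t len)
{
	assert(buf != NULL);

	if (idx >= MAX_RP_SLOTS)
		return -1;

	snprintf(buf, len, RP_FNAME_FMT, (char)('A' + idx));
	return 0;
}
```

With these helpers, slot 2 maps to the name rp_C_data and back to index 2, while a name such as rp_E_data is rejected because only four slots exist.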

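tl_print_rp_data() locates a counter for a given slot by taking the offset of the field within the first ring pair block and adding rp_reg_sz multiplied by the slot index. The sketch below shows the same arithmetic with offsetof() in userspace. The struct names are hypothetical, the field layout is copied from adf_gen4_tl_ring_pair_data_regs in the patch, and the size of the device data area (DEV_DATA_SZ) is a placeholder, since the real device register block is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RP_SLOTS 4  /* mirrors ADF_GEN4_TL_MAX_RP_NUM */
#define DEV_DATA_SZ  64 /* placeholder, NOT the real device data size */

/* Same field order and widths as adf_gen4_tl_ring_pair_data_regs. */
struct rp_regs {
	uint64_t gp_lat_acc;
	uint64_t reserved;
	uint32_t pci_trans_cnt;
	uint32_t ae_put_cnt;
	uint32_t bw_in;
	uint32_t bw_out;
	uint32_t at_glob_devtlb_hit;
	uint32_t at_glob_devtlb_miss;
	uint32_t at_payld_devtlb_hit;
	uint32_t at_payld_devtlb_miss;
	uint32_t re_cnt;
	uint32_t reserved1;
};

/* Two 64-bit fields plus ten 32-bit fields: 56 bytes, no padding. */
static_assert(sizeof(struct rp_regs) == 56, "unexpected RP block size");

/* Simplified stand-in for adf_gen4_tl_layout. */
struct tl_layout {
	uint8_t device_data[DEV_DATA_SZ];
	struct rp_regs rp[MAX_RP_SLOTS];
	uint32_t msg_cnt;
	uint32_t reserved;
};

/*
 * Hypothetical helper: offset of a ring pair counter inside the layout.
 * `field_off` is the offset of the field within struct rp_regs and `slot`
 * selects one of the monitored ring pair blocks.
 */
static size_t rp_counter_off(size_t field_off, unsigned int slot)
{
	return offsetof(struct tl_layout, rp) + field_off +
	       sizeof(struct rp_regs) * slot;
}
```

Because the blocks are laid out as a plain array, rp_counter_off(offsetof(struct rp_regs, bw_in), 2) yields the same value as offsetof(struct tl_layout, rp[2].bw_in), which is the reason a single counter table can serve every slot.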
2023-12-29 03:29:41

by Herbert Xu

Subject: Re: [PATCH v2 0/4] crypto: qat - enable telemetry for QAT GEN 4

On Fri, Dec 22, 2023 at 11:35:04AM +0100, Lucas Segarra Fernandez wrote:
> Expose through debugfs telemetry data for QAT GEN4 devices.
>
> This allows to gather metrics about the performance and the utilization
> of a QAT device and/or a group of ring pairs. In particular, statistics
> on (1) the utilization of the PCIe channel, (2) address translation and
> device TLB, when SVA is enabled and (3) the internal engines for crypto
> and data compression.
>
> The device periodically gathers telemetry data from hardware registers
> and writes it into a DMA memory region which is sampled by the driver.
> The driver then uses this data to compute basic metrics on the counters
> and exposes them through debugfs attributes in the folder
> /sys/kernel/debug/qat_<device>_<BDF>/telemetry.
>
> Here is a summary of the changes:
> * Patch #1 adds an helper function to math.h to compute the average of
> values within an array.
> * Patch #2 includes a missing header in the file adf_accel_devices.h to
> allow to use the macro GET_DEV().
> * Patch #3 introduces device level telemetry and the associated documentation
> in /Documentation/ABI.
> * Patch #4 extends #3 by introducing ring pair level telemetry and
> documentation about it.
>
> This set is based on earlier work done by Wojciech Ziemba.
>
> ---
> v1 -> v2:
> - define avg_array() in the C file where it is used
> - set `accel_dev->telemetry` to NULL in adf_tl_free_mem()
> - add ring pair service type info to debugfs telemetry/rp_<X>_data output
> ---
>
> Lucas Segarra Fernandez (4):
> crypto: qat - include pci.h for GET_DEV()
> crypto: qat - add admin msgs for telemetry
> crypto: qat - add support for device telemetry
> crypto: qat - add support for ring pair level telemetry
>
> .../ABI/testing/debugfs-driver-qat_telemetry | 228 ++++++
> .../intel/qat/qat_420xx/adf_420xx_hw_data.c | 3 +
> .../intel/qat/qat_4xxx/adf_4xxx_hw_data.c | 3 +
> drivers/crypto/intel/qat/qat_common/Makefile | 3 +
> .../intel/qat/qat_common/adf_accel_devices.h | 6 +
> .../crypto/intel/qat/qat_common/adf_admin.c | 37 +
> .../crypto/intel/qat/qat_common/adf_admin.h | 4 +
> .../crypto/intel/qat/qat_common/adf_dbgfs.c | 3 +
> .../intel/qat/qat_common/adf_gen4_hw_data.h | 1 +
> .../crypto/intel/qat/qat_common/adf_gen4_tl.c | 153 ++++
> .../crypto/intel/qat/qat_common/adf_gen4_tl.h | 158 ++++
> .../crypto/intel/qat/qat_common/adf_init.c | 12 +
> .../intel/qat/qat_common/adf_telemetry.c | 288 +++++++
> .../intel/qat/qat_common/adf_telemetry.h | 99 +++
> .../intel/qat/qat_common/adf_tl_debugfs.c | 710 ++++++++++++++++++
> .../intel/qat/qat_common/adf_tl_debugfs.h | 117 +++
> .../qat/qat_common/icp_qat_fw_init_admin.h | 10 +
> 17 files changed, 1835 insertions(+)
> create mode 100644 Documentation/ABI/testing/debugfs-driver-qat_telemetry
> create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_tl.c
> create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_tl.h
> create mode 100644 drivers/crypto/intel/qat/qat_common/adf_telemetry.c
> create mode 100644 drivers/crypto/intel/qat/qat_common/adf_telemetry.h
> create mode 100644 drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.c
> create mode 100644 drivers/crypto/intel/qat/qat_common/adf_tl_debugfs.h
>
>
> base-commit: b4719435c14199548ed49f036a7c31040a6b5353
> --
> 2.41.0

All applied. Thanks.
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt