2023-06-20 13:39:54

by Damian Muszynski

Subject: [PATCH v2 0/5] crypto: qat - add heartbeat feature

This set introduces support for the QAT heartbeat feature. It allows
detecting when the device firmware or an acceleration unit hangs.
We are adding this feature to give our clients a tool with which
they can verify that all of the Quick Assist hardware resources are
healthy and operational.

QAT device firmware periodically writes counters to a specified physical
memory location. A pair of counters per thread is incremented at
the start and end of the main processing loop within the firmware.
A heartbeat check consists of validating the pair of counter values
for each thread. Stagnant counters indicate a firmware hang.
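The stagnation rule can be shown with a minimal userspace sketch in C. The struct `hb_pair_sketch`, its fields `start_cnt`/`end_cnt` and the function `count_stagnant_threads()` are hypothetical names, not the driver's. The sketch models only the "counters did not move between two checks" criterion and does not reproduce any additional per-pair validity checks the driver may perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: one pair of 16-bit counters per firmware thread.
 * start_cnt is bumped at the start of a main-loop iteration,
 * end_cnt at its end.
 */
struct hb_pair_sketch {
	uint16_t start_cnt;
	uint16_t end_cnt;
};

/*
 * Return how many threads have counters identical to the previous
 * snapshot. prev[] is then refreshed with the current values so the
 * next call compares against this one.
 */
static size_t count_stagnant_threads(const struct hb_pair_sketch *curr,
				     struct hb_pair_sketch *prev,
				     size_t nthreads)
{
	size_t stuck = 0;

	assert(curr != NULL && prev != NULL);
	for (size_t i = 0; i < nthreads; i++) {
		bool moved = curr[i].start_cnt != prev[i].start_cnt ||
			     curr[i].end_cnt != prev[i].end_cnt;

		if (!moved)
			stuck++;
		prev[i] = curr[i];
	}
	return stuck;
}
```

Comparing with `!=` rather than `>` keeps the check correct when a 16-bit counter wraps from 0xFFFF back to 0.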

The first patch adds timestamp synchronization with the firmware.
The second patch removes historical, never-used HB definitions.
The third patch implements the hardware clock frequency measuring
interface.
The fourth introduces the main heartbeat implementation with the debugfs
interface.
The last patch implements an algorithm that allows the code to detect
which version of the heartbeat API is used by the currently loaded firmware.

Signed-off-by: Damian Muszynski <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>

Changes since v1:
- fixed build errors on a few architectures by replacing the macro
DIV_ROUND_CLOSEST with DIV_ROUND_CLOSEST_ULL
- included the prerequisite patch "add internal timer for qat 4xxx", which was
initially sent separately while this patch set was still in development
- reworked the timer patch to use delayed work, as suggested by Herbert Xu
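For reference, the rounding that DIV_ROUND_CLOSEST_ULL provides can be modelled in userspace as `(n + d / 2) / d`. The function name `div_round_closest_u64()` below is hypothetical and this is not the kernel macro; as we understand it, the kernel helper performs the division via do_div() so that 32-bit builds do not need the libgcc helper `__udivdi3`, which a plain 64-bit `/` would otherwise pull in.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide n by d and round to the nearest integer (exact halves round up).
 * Adding d / 2 before the truncating division moves the cut-off point
 * from x.0 to x.5.
 * Caller must ensure n + d / 2 does not overflow uint64_t.
 */
static uint64_t div_round_closest_u64(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d / 2) / d;
}
```

For example, `div_round_closest_u64(10, 4)` yields 3 (2.5 rounded up) and `div_round_closest_u64(9, 4)` yields 2 (2.25 rounded down).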

Damian Muszynski (5):
crypto: qat - add internal timer for qat 4xxx
crypto: qat - drop obsolete heartbeat interface
crypto: qat - add measure clock frequency
crypto: qat - add heartbeat feature
crypto: qat - add heartbeat counters check

Documentation/ABI/testing/debugfs-driver-qat | 51 +++
.../intel/qat/qat_4xxx/adf_4xxx_hw_data.c | 14 +
.../intel/qat/qat_4xxx/adf_4xxx_hw_data.h | 4 +
drivers/crypto/intel/qat/qat_4xxx/adf_drv.c | 3 +
.../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c | 28 ++
.../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.h | 7 +
.../intel/qat/qat_c62x/adf_c62x_hw_data.c | 28 ++
.../intel/qat/qat_c62x/adf_c62x_hw_data.h | 7 +
drivers/crypto/intel/qat/qat_common/Makefile | 4 +
.../intel/qat/qat_common/adf_accel_devices.h | 13 +
.../crypto/intel/qat/qat_common/adf_admin.c | 43 +++
.../intel/qat/qat_common/adf_cfg_strings.h | 2 +
.../crypto/intel/qat/qat_common/adf_clock.c | 127 +++++++
.../crypto/intel/qat/qat_common/adf_clock.h | 14 +
.../intel/qat/qat_common/adf_common_drv.h | 5 +
.../crypto/intel/qat/qat_common/adf_dbgfs.c | 9 +-
.../intel/qat/qat_common/adf_gen2_config.c | 7 +
.../intel/qat/qat_common/adf_gen2_hw_data.h | 3 +
.../intel/qat/qat_common/adf_gen4_hw_data.h | 3 +
.../intel/qat/qat_common/adf_gen4_timer.c | 70 ++++
.../intel/qat/qat_common/adf_gen4_timer.h | 21 ++
.../intel/qat/qat_common/adf_heartbeat.c | 336 ++++++++++++++++++
.../intel/qat/qat_common/adf_heartbeat.h | 79 ++++
.../qat/qat_common/adf_heartbeat_dbgfs.c | 194 ++++++++++
.../qat/qat_common/adf_heartbeat_dbgfs.h | 12 +
.../crypto/intel/qat/qat_common/adf_init.c | 28 ++
drivers/crypto/intel/qat/qat_common/adf_isr.c | 6 +
.../qat/qat_common/icp_qat_fw_init_admin.h | 23 +-
.../qat/qat_dh895xcc/adf_dh895xcc_hw_data.c | 13 +
.../qat/qat_dh895xcc/adf_dh895xcc_hw_data.h | 5 +
30 files changed, 1143 insertions(+), 16 deletions(-)
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_clock.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_clock.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_timer.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_timer.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat.h
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c
create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.h

--
2.40.1



2023-06-20 13:40:17

by Damian Muszynski

Subject: [PATCH v2 5/5] crypto: qat - add heartbeat counters check

A firmware update for QAT GEN2 changed the format of a data
structure used to report the heartbeat counters.

To support all firmware versions, extend the heartbeat logic
with an algorithm that detects the number of counters returned
by the firmware. The algorithm determines the number of counters to
use (and the size of the corresponding data structure) by
comparing the expected size of the data in memory with the data
actually written by the firmware.
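The detection can be sketched in userspace C as follows. The helper names `hb_fill_pattern()` and `hb_fw_uses_extra_ctrs()` and their word-based parameters are hypothetical. The sentinel value matches ADF_HB_EMPTY_SIG in the patch, and the rule "count the leading non-sentinel words past the assumed layout, then compare with the expected count" follows validate_hb_ctrs_cnt(). The driver itself fills the buffer with memset32() and derives the sizes from the number of engines and counters per engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as ADF_HB_EMPTY_SIG in the patch. */
#define HB_EMPTY_SIG 0xA5A5A5A5u

/* Pre-fill the stats buffer with the sentinel before firmware writes to it. */
static void hb_fill_pattern(uint32_t *buf, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		buf[i] = HB_EMPTY_SIG;
}

/*
 * Decide whether firmware writes more counters than currently assumed.
 * expected_words: words covered by the currently assumed layout.
 * extra_words:    words the additional counters would occupy.
 * limit:          maximum number of words to scan past expected_words.
 * Returns true if exactly extra_words leading words past the assumed
 * layout no longer hold the sentinel.
 */
static bool hb_fw_uses_extra_ctrs(const uint32_t *buf, size_t expected_words,
				  size_t extra_words, size_t limit)
{
	const uint32_t *tail;
	size_t written = 0;

	assert(buf != NULL);
	tail = buf + expected_words;

	/* Stop at the first word the firmware left untouched. */
	while (written < limit && tail[written] != HB_EMPTY_SIG)
		written++;

	return written != 0 && written == extra_words;
}
```

One limitation of any sentinel scheme is that a firmware write which happens to equal the pattern (here, two 16-bit counters both holding 0xA5A5) looks like untouched memory. The scan then stops early, and the check reports no extra counters.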

Firmware detection is done once, during the first read of the heartbeat
debugfs file, to avoid increasing the time needed to load the module.

Signed-off-by: Damian Muszynski <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
---
.../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c | 2 +
.../intel/qat/qat_c62x/adf_c62x_hw_data.c | 2 +
.../intel/qat/qat_common/adf_accel_devices.h | 1 +
.../intel/qat/qat_common/adf_heartbeat.c | 68 +++++++++++++++++++
.../intel/qat/qat_common/adf_heartbeat.h | 6 ++
.../qat/qat_dh895xcc/adf_dh895xcc_hw_data.c | 2 +
6 files changed, 81 insertions(+)

diff --git a/drivers/crypto/intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c b/drivers/crypto/intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c
index e81d11409426..9c00c441b602 100644
--- a/drivers/crypto/intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c
@@ -8,6 +8,7 @@
#include <adf_gen2_hw_data.h>
#include <adf_gen2_pfvf.h>
#include "adf_c3xxx_hw_data.h"
+#include "adf_heartbeat.h"
#include "icp_qat_hw.h"

/* Worker thread to service arbiter mappings */
@@ -153,6 +154,7 @@ void adf_init_hw_data_c3xxx(struct adf_hw_device_data *hw_data)
hw_data->measure_clock = measure_clock;
hw_data->get_hb_clock = get_ts_clock;
hw_data->num_hb_ctrs = ADF_NUM_HB_CNT_PER_AE;
+ hw_data->check_hb_ctrs = adf_heartbeat_check_ctrs;

adf_gen2_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen2_init_hw_csr_ops(&hw_data->csr_ops);
diff --git a/drivers/crypto/intel/qat/qat_c62x/adf_c62x_hw_data.c b/drivers/crypto/intel/qat/qat_c62x/adf_c62x_hw_data.c
index 1a8c8e3a48e9..355a781693eb 100644
--- a/drivers/crypto/intel/qat/qat_c62x/adf_c62x_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_c62x/adf_c62x_hw_data.c
@@ -8,6 +8,7 @@
#include <adf_gen2_hw_data.h>
#include <adf_gen2_pfvf.h>
#include "adf_c62x_hw_data.h"
+#include "adf_heartbeat.h"
#include "icp_qat_hw.h"

/* Worker thread to service arbiter mappings */
@@ -155,6 +156,7 @@ void adf_init_hw_data_c62x(struct adf_hw_device_data *hw_data)
hw_data->measure_clock = measure_clock;
hw_data->get_hb_clock = get_ts_clock;
hw_data->num_hb_ctrs = ADF_NUM_HB_CNT_PER_AE;
+ hw_data->check_hb_ctrs = adf_heartbeat_check_ctrs;

adf_gen2_init_pf_pfvf_ops(&hw_data->pfvf_ops);
adf_gen2_init_hw_csr_ops(&hw_data->csr_ops);
diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
index ab897e1717e0..e57abde66f4f 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h
@@ -190,6 +190,7 @@ struct adf_hw_device_data {
int (*send_admin_init)(struct adf_accel_dev *accel_dev);
int (*start_timer)(struct adf_accel_dev *accel_dev);
void (*stop_timer)(struct adf_accel_dev *accel_dev);
+ void (*check_hb_ctrs)(struct adf_accel_dev *accel_dev);
uint32_t (*get_hb_clock)(struct adf_hw_device_data *self);
int (*measure_clock)(struct adf_accel_dev *accel_dev);
int (*init_arb)(struct adf_accel_dev *accel_dev);
diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c
index 7358aac8e56d..beef9a5f6c75 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c
@@ -20,6 +20,8 @@
#include "adf_transport_internal.h"
#include "icp_qat_fw_init_admin.h"

+#define ADF_HB_EMPTY_SIG 0xA5A5A5A5
+
/* Heartbeat counter pair */
struct hb_cnt_pair {
__u16 resp_heartbeat_cnt;
@@ -42,6 +44,57 @@ static int adf_hb_check_polling_freq(struct adf_accel_dev *accel_dev)
return 0;
}

+/**
+ * validate_hb_ctrs_cnt() - checks if the number of heartbeat counters should
+ * be updated by one to support the currently loaded firmware.
+ * @accel_dev: Pointer to acceleration device.
+ *
+ * Return:
+ * * true - hb_ctrs must be increased by ADF_NUM_PKE_STRAND
+ * * false - no changes needed
+ */
+static bool validate_hb_ctrs_cnt(struct adf_accel_dev *accel_dev)
+{
+ const size_t hb_ctrs = accel_dev->hw_device->num_hb_ctrs;
+ const size_t max_aes = accel_dev->hw_device->num_engines;
+ const size_t hb_struct_size = sizeof(struct hb_cnt_pair);
+ const size_t exp_diff_size = array3_size(ADF_NUM_PKE_STRAND, max_aes,
+ hb_struct_size);
+ const size_t dev_ctrs = size_mul(max_aes, hb_ctrs);
+ const size_t stats_size = size_mul(dev_ctrs, hb_struct_size);
+ const u32 exp_diff_cnt = exp_diff_size / sizeof(u32);
+ const u32 stats_el_cnt = stats_size / sizeof(u32);
+ struct hb_cnt_pair *hb_stats = accel_dev->heartbeat->dma.virt_addr;
+ const u32 *mem_to_chk = (u32 *)(hb_stats + dev_ctrs);
+ u32 el_diff_cnt = 0;
+ int i;
+
+ /* count leading 32-bit words that differ from the pattern */
+ for (i = 0; i < stats_el_cnt; i++) {
+ if (mem_to_chk[i] == ADF_HB_EMPTY_SIG)
+ break;
+
+ el_diff_cnt++;
+ }
+
+ return el_diff_cnt && el_diff_cnt == exp_diff_cnt;
+}
+
+void adf_heartbeat_check_ctrs(struct adf_accel_dev *accel_dev)
+{
+ struct hb_cnt_pair *hb_stats = accel_dev->heartbeat->dma.virt_addr;
+ const size_t hb_ctrs = accel_dev->hw_device->num_hb_ctrs;
+ const size_t max_aes = accel_dev->hw_device->num_engines;
+ const size_t dev_ctrs = size_mul(max_aes, hb_ctrs);
+ const size_t stats_size = size_mul(dev_ctrs, sizeof(struct hb_cnt_pair));
+ const size_t mem_items_to_fill = size_mul(stats_size, 2) / sizeof(u32);
+
+ /* fill hb stats memory with pattern */
+ memset32((uint32_t *)hb_stats, ADF_HB_EMPTY_SIG, mem_items_to_fill);
+ accel_dev->heartbeat->ctrs_cnt_checked = false;
+}
+EXPORT_SYMBOL_GPL(adf_heartbeat_check_ctrs);
+
static int get_timer_ticks(struct adf_accel_dev *accel_dev, unsigned int *value)
{
char timer_str[ADF_CFG_MAX_VAL_LEN_IN_BYTES] = { };
@@ -123,6 +176,13 @@ static int adf_hb_get_status(struct adf_accel_dev *accel_dev)
size_t ae = 0;
int ret = 0;

+ if (!accel_dev->heartbeat->ctrs_cnt_checked) {
+ if (validate_hb_ctrs_cnt(accel_dev))
+ hw_device->num_hb_ctrs += ADF_NUM_PKE_STRAND;
+
+ accel_dev->heartbeat->ctrs_cnt_checked = true;
+ }
+
live_stats = accel_dev->heartbeat->dma.virt_addr;
last_stats = live_stats + dev_ctrs;
count_fails = (u16 *)(last_stats + dev_ctrs);
@@ -221,6 +281,11 @@ int adf_heartbeat_init(struct adf_accel_dev *accel_dev)
if (!hb->dma.virt_addr)
goto err_free;

+ /*
+ * Set this flag to true by default to avoid unnecessary checks;
+ * it is reset on platforms that need such a check
+ */
+ hb->ctrs_cnt_checked = true;
accel_dev->heartbeat = hb;

return 0;
@@ -241,6 +306,9 @@ int adf_heartbeat_start(struct adf_accel_dev *accel_dev)
return -EFAULT;
}

+ if (accel_dev->hw_device->check_hb_ctrs)
+ accel_dev->hw_device->check_hb_ctrs(accel_dev);
+
ret = get_timer_ticks(accel_dev, &timer_ticks);
if (ret)
return ret;
diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h
index 297147f44150..b22e3cb29798 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h
+++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h
@@ -24,6 +24,7 @@ struct adf_heartbeat {
unsigned int hb_failed_counter;
unsigned int hb_timer;
u64 last_hb_check_time;
+ bool ctrs_cnt_checked;
struct hb_dma_addr {
dma_addr_t phy_addr;
void *virt_addr;
@@ -48,6 +49,7 @@ int adf_heartbeat_save_cfg_param(struct adf_accel_dev *accel_dev,
unsigned int timer_ms);
void adf_heartbeat_status(struct adf_accel_dev *accel_dev,
enum adf_device_heartbeat_status *hb_status);
+void adf_heartbeat_check_ctrs(struct adf_accel_dev *accel_dev);

#else
static inline int adf_heartbeat_init(struct adf_accel_dev *accel_dev)
@@ -69,5 +71,9 @@ static inline int adf_heartbeat_save_cfg_param(struct adf_accel_dev *accel_dev,
{
return 0;
}
+
+static inline void adf_heartbeat_check_ctrs(struct adf_accel_dev *accel_dev)
+{
+}
#endif
#endif /* ADF_HEARTBEAT_H_ */
diff --git a/drivers/crypto/intel/qat/qat_dh895xcc/adf_dh895xcc_hw_data.c b/drivers/crypto/intel/qat/qat_dh895xcc/adf_dh895xcc_hw_data.c
index 8fbab905c5cc..09551f949126 100644
--- a/drivers/crypto/intel/qat/qat_dh895xcc/adf_dh895xcc_hw_data.c
+++ b/drivers/crypto/intel/qat/qat_dh895xcc/adf_dh895xcc_hw_data.c
@@ -7,6 +7,7 @@
#include <adf_gen2_hw_data.h>
#include <adf_gen2_pfvf.h>
#include "adf_dh895xcc_hw_data.h"
+#include "adf_heartbeat.h"
#include "icp_qat_hw.h"

#define ADF_DH895XCC_VF_MSK 0xFFFFFFFF
@@ -248,6 +249,7 @@ void adf_init_hw_data_dh895xcc(struct adf_hw_device_data *hw_data)
hw_data->clock_frequency = ADF_DH895X_AE_FREQ;
hw_data->get_hb_clock = get_ts_clock;
hw_data->num_hb_ctrs = ADF_NUM_HB_CNT_PER_AE;
+ hw_data->check_hb_ctrs = adf_heartbeat_check_ctrs;

adf_gen2_init_pf_pfvf_ops(&hw_data->pfvf_ops);
hw_data->pfvf_ops.enable_vf2pf_interrupts = enable_vf2pf_interrupts;


2023-06-20 13:41:54

by Damian Muszynski

Subject: [PATCH v2 2/5] crypto: qat - drop obsolete heartbeat interface

Drop the legacy heartbeat interface from the FW API, as it is no longer used.

Signed-off-by: Damian Muszynski <[email protected]>
Reviewed-by: Giovanni Cabiddu <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
---
.../intel/qat/qat_common/icp_qat_fw_init_admin.h | 15 ---------------
1 file changed, 15 deletions(-)

diff --git a/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h b/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h
index d853c9242acf..6ea19b4fb0ce 100644
--- a/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h
+++ b/drivers/crypto/intel/qat/qat_common/icp_qat_fw_init_admin.h
@@ -102,19 +102,4 @@ struct icp_qat_fw_init_admin_resp {

#define ICP_QAT_FW_SYNC ICP_QAT_FW_HEARTBEAT_SYNC

-#define ICP_QAT_FW_COMN_HEARTBEAT_OK 0
-#define ICP_QAT_FW_COMN_HEARTBEAT_BLOCKED 1
-#define ICP_QAT_FW_COMN_HEARTBEAT_FLAG_BITPOS 0
-#define ICP_QAT_FW_COMN_HEARTBEAT_FLAG_MASK 0x1
-#define ICP_QAT_FW_COMN_STATUS_RESRVD_FLD_MASK 0xFE
-#define ICP_QAT_FW_COMN_HEARTBEAT_HDR_FLAG_GET(hdr_t) \
- ICP_QAT_FW_COMN_HEARTBEAT_FLAG_GET(hdr_t.flags)
-
-#define ICP_QAT_FW_COMN_HEARTBEAT_HDR_FLAG_SET(hdr_t, val) \
- ICP_QAT_FW_COMN_HEARTBEAT_FLAG_SET(hdr_t, val)
-
-#define ICP_QAT_FW_COMN_HEARTBEAT_FLAG_GET(flags) \
- QAT_FIELD_GET(flags, \
- ICP_QAT_FW_COMN_HEARTBEAT_FLAG_BITPOS, \
- ICP_QAT_FW_COMN_HEARTBEAT_FLAG_MASK)
#endif


2023-06-22 11:30:59

by Damian Muszynski

Subject: Re: [PATCH v2 0/5] crypto: qat - add heartbeat feature

Hi Herbert,

On 2023-06-20 at 15:08:18 +0200, Damian Muszynski wrote:

Unfortunately, with this set we found another build issue on 32-bit architectures.
Please ignore this version. I will send the fixed one today.

>> ld: drivers/crypto/intel/qat/qat_common/adf_clock.o: in function `measure_clock':
>> drivers/crypto/intel/qat/qat_common/adf_clock.c:87: undefined reference to `__udivdi3'

Sorry for the noise.

---
Best Regards,
Damian Muszynski