2020-09-04 18:49:03

by Oded Gabbay

[permalink] [raw]
Subject: [PATCH 1/6] habanalabs: rename ArmCP to CPU-CP

There were a couple of comments where the name ArmCP was still used. Rename
it to CPU-CP.

In addition, rename ArmCP or ARM in log messages to "device CPU".

Signed-off-by: Oded Gabbay <[email protected]>
---
drivers/misc/habanalabs/common/firmware_if.c | 17 ++++++++-------
drivers/misc/habanalabs/common/habanalabs.h | 22 ++++++++++----------
drivers/misc/habanalabs/gaudi/gaudi.c | 4 ++--
drivers/misc/habanalabs/goya/goya.c | 2 +-
4 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c
index f2a38e95359a..4409962d30ae 100644
--- a/drivers/misc/habanalabs/common/firmware_if.c
+++ b/drivers/misc/habanalabs/common/firmware_if.c
@@ -175,7 +175,7 @@ int hl_fw_unmask_irq_arr(struct hl_device *hdev, const u32 *irq_arr,
total_pkt_size = sizeof(struct cpucp_unmask_irq_arr_packet) +
irq_arr_size;

- /* data should be aligned to 8 bytes in order to ArmCP to copy it */
+ /* data should be aligned to 8 bytes in order to CPU-CP to copy it */
total_pkt_size = (total_pkt_size + 0x7) & ~0x7;

/* total_pkt_size is casted to u16 later on */
@@ -283,7 +283,7 @@ int hl_fw_cpucp_info_get(struct hl_device *hdev)
&cpucp_info_dma_addr);
if (!cpucp_info_cpu_addr) {
dev_err(hdev->dev,
- "Failed to allocate DMA memory for ArmCP info packet\n");
+ "Failed to allocate DMA memory for CPU-CP info packet\n");
return -ENOMEM;
}

@@ -298,7 +298,7 @@ int hl_fw_cpucp_info_get(struct hl_device *hdev)
HL_CPUCP_INFO_TIMEOUT_USEC, &result);
if (rc) {
dev_err(hdev->dev,
- "Failed to handle ArmCP info pkt, error %d\n", rc);
+ "Failed to handle CPU-CP info pkt, error %d\n", rc);
goto out;
}

@@ -333,7 +333,7 @@ int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)
max_size, &eeprom_info_dma_addr);
if (!eeprom_info_cpu_addr) {
dev_err(hdev->dev,
- "Failed to allocate DMA memory for ArmCP EEPROM packet\n");
+ "Failed to allocate DMA memory for CPU-CP EEPROM packet\n");
return -ENOMEM;
}

@@ -349,7 +349,8 @@ int hl_fw_get_eeprom_data(struct hl_device *hdev, void *data, size_t max_size)

if (rc) {
dev_err(hdev->dev,
- "Failed to handle ArmCP EEPROM packet, error %d\n", rc);
+ "Failed to handle CPU-CP EEPROM packet, error %d\n",
+ rc);
goto out;
}

@@ -379,7 +380,7 @@ int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
HL_CPUCP_INFO_TIMEOUT_USEC, &result);
if (rc) {
dev_err(hdev->dev,
- "Failed to handle ArmCP PCI info pkt, error %d\n", rc);
+ "Failed to handle CPU-CP PCI info pkt, error %d\n", rc);
return rc;
}
counters->rx_throughput = result;
@@ -390,7 +391,7 @@ int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
HL_CPUCP_INFO_TIMEOUT_USEC, &result);
if (rc) {
dev_err(hdev->dev,
- "Failed to handle ArmCP PCI info pkt, error %d\n", rc);
+ "Failed to handle CPU-CP PCI info pkt, error %d\n", rc);
return rc;
}
counters->tx_throughput = result;
@@ -403,7 +404,7 @@ int hl_fw_cpucp_pci_counters_get(struct hl_device *hdev,
HL_CPUCP_INFO_TIMEOUT_USEC, &result);
if (rc) {
dev_err(hdev->dev,
- "Failed to handle ArmCP PCI info pkt, error %d\n", rc);
+ "Failed to handle CPU-CP PCI info pkt, error %d\n", rc);
return rc;
}
counters->replay_cnt = (u32) result;
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index 6ea8ae616cf4..b56410d75065 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -466,7 +466,7 @@ struct hl_cs_job;
#define HL_EQ_LENGTH 64
#define HL_EQ_SIZE_IN_BYTES (HL_EQ_LENGTH * HL_EQ_ENTRY_SIZE)

-/* Host <-> ArmCP shared memory size */
+/* Host <-> CPU-CP shared memory size */
#define HL_CPU_ACCESSIBLE_MEM_SIZE SZ_2M

/**
@@ -648,7 +648,7 @@ enum div_select_defs {
* @debugfs_read32: debug interface for reading u32 from DRAM/SRAM.
* @debugfs_write32: debug interface for writing u32 to DRAM/SRAM.
* @add_device_attr: add ASIC specific device attributes.
- * @handle_eqe: handle event queue entry (IRQ) from ArmCP.
+ * @handle_eqe: handle event queue entry (IRQ) from CPU-CP.
* @set_pll_profile: change PLL profile (manual/automatic).
* @get_events_stat: retrieve event queue entries histogram.
* @read_pte: read MMU page table entry from DRAM.
@@ -657,7 +657,7 @@ enum div_select_defs {
* (L1 only) or hard (L0 & L1) flush.
* @mmu_invalidate_cache_range: flush specific MMU STLB cache lines with
* ASID-VA-size mask.
- * @send_heartbeat: send is-alive packet to ArmCP and verify response.
+ * @send_heartbeat: send is-alive packet to CPU-CP and verify response.
* @set_clock_gating: enable/disable clock gating per engine according to
* clock gating mask in hdev
* @disable_clock_gating: disable clock gating completely
@@ -1438,8 +1438,8 @@ struct hl_device_idle_busy_ts {
* @dev: related kernel basic device structure.
* @dev_ctrl: related kernel device structure for the control device
* @work_freq: delayed work to lower device frequency if possible.
- * @work_heartbeat: delayed work for ArmCP is-alive check.
- * @asic_name: ASIC specific nmae.
+ * @work_heartbeat: delayed work for CPU-CP is-alive check.
+ * @asic_name: ASIC specific name.
* @asic_type: ASIC specific type.
* @completion_queue: array of hl_cq.
* @cq_wq: work queues of completion queues for executing work in process
@@ -1450,14 +1450,14 @@ struct hl_device_idle_busy_ts {
* @hw_queues_mirror_list: CS mirror list for TDR.
* @hw_queues_mirror_lock: protects hw_queues_mirror_list.
* @kernel_cb_mgr: command buffer manager for creating/destroying/handling CGs.
- * @event_queue: event queue for IRQ from ArmCP.
+ * @event_queue: event queue for IRQ from CPU-CP.
* @dma_pool: DMA pool for small allocations.
- * @cpu_accessible_dma_mem: Host <-> ArmCP shared memory CPU address.
- * @cpu_accessible_dma_address: Host <-> ArmCP shared memory DMA address.
- * @cpu_accessible_dma_pool: Host <-> ArmCP shared memory pool.
+ * @cpu_accessible_dma_mem: Host <-> CPU-CP shared memory CPU address.
+ * @cpu_accessible_dma_address: Host <-> CPU-CP shared memory DMA address.
+ * @cpu_accessible_dma_pool: Host <-> CPU-CP shared memory pool.
* @asid_bitmap: holds used/available ASIDs.
* @asid_mutex: protects asid_bitmap.
- * @send_cpu_message_lock: enforces only one message in Host <-> ArmCP queue.
+ * @send_cpu_message_lock: enforces only one message in Host <-> CPU-CP queue.
* @debug_lock: protects critical section of setting debug mode for device
* @asic_prop: ASIC specific immutable properties.
* @asic_funcs: ASIC specific functions.
@@ -1511,7 +1511,7 @@ struct hl_device_idle_busy_ts {
* @late_init_done: is late init stage was done during initialization.
* @hwmon_initialized: is H/W monitor sensors was initialized.
* @hard_reset_pending: is there a hard reset work pending.
- * @heartbeat: is heartbeat sanity check towards ArmCP enabled.
+ * @heartbeat: is heartbeat sanity check towards CPU-CP enabled.
* @reset_on_lockup: true if a reset should be done in case of stuck CS, false
* otherwise.
* @dram_supports_virtual_memory: is MMU enabled towards DRAM.
diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index bc7e7e3ba3a8..7946179632fb 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -2854,7 +2854,7 @@ static int gaudi_init_cpu_queues(struct hl_device *hdev, u32 cpu_timeout)

if (err) {
dev_err(hdev->dev,
- "Failed to communicate with ARM CPU (ArmCP timeout)\n");
+ "Failed to communicate with Device CPU (CPU-CP timeout)\n");
return -EIO;
}

@@ -5616,7 +5616,7 @@ static bool gaudi_tpc_read_interrupts(struct hl_device *hdev, u8 tpc_id,
bool soft_reset_required = false;

/* Accessing the TPC_INTR_CAUSE registers requires disabling the clock
- * gating, and thus cannot be done in ArmCP and should be done instead
+ * gating, and thus cannot be done in CPU-CP and should be done instead
* by the driver.
*/

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 94b7958ba5fb..fb7e4f50c198 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -4509,7 +4509,7 @@ static int goya_unmask_irq_arr(struct hl_device *hdev, u32 *irq_arr,
total_pkt_size = sizeof(struct cpucp_unmask_irq_arr_packet) +
irq_arr_size;

- /* data should be aligned to 8 bytes in order to ArmCP to copy it */
+ /* data should be aligned to 8 bytes in order to CPU-CP to copy it */
total_pkt_size = (total_pkt_size + 0x7) & ~0x7;

/* total_pkt_size is casted to u16 later on */
--
2.17.1


2020-09-04 18:50:40

by Oded Gabbay

[permalink] [raw]
Subject: [PATCH 4/6] habanalabs: Fix alignment issue in cpucp_info structure

From: Ofir Bitton <[email protected]>

Because the device CPU compiler aligns structures to 8 bytes,
struct cpucp_info has an alignment issue as some parts
in the structure are not aligned to 8 bytes.
It is preferred that we explicitly insert placeholders inside
the structure to avoid confusion

in order to validate this scenario, we printed both pointers:

__u8 cpucp_version[VERSION_MAX_LEN]; (0xffff899c67ed4cbc)
__le64 dram_size; (0xffff899c67ed4d40)

we see difference of 132 bytes although the first array
is only 128 bytes long, Meaning compiler added a 4 byte padding.

Signed-off-by: Ofir Bitton <[email protected]>
Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
---
drivers/misc/habanalabs/include/common/cpucp_if.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/misc/habanalabs/include/common/cpucp_if.h b/drivers/misc/habanalabs/include/common/cpucp_if.h
index 1e8480e978e2..dcde440427b4 100644
--- a/drivers/misc/habanalabs/include/common/cpucp_if.h
+++ b/drivers/misc/habanalabs/include/common/cpucp_if.h
@@ -410,6 +410,7 @@ struct cpucp_info {
__u8 fuse_version[VERSION_MAX_LEN];
__u8 thermal_version[VERSION_MAX_LEN];
__u8 cpucp_version[VERSION_MAX_LEN];
+ __le32 reserved2;
__le64 dram_size;
char card_name[CARD_NAME_MAX_LEN];
};
--
2.17.1

2020-09-04 18:51:05

by Oded Gabbay

[permalink] [raw]
Subject: [PATCH 6/6] habanalabs: check flag before reset because of f/w event

For consistency with GAUDI code, add check of the relevant flag in the
device structure before resetting the GOYA device in case of firmware
event.

Signed-off-by: Oded Gabbay <[email protected]>
---
drivers/misc/habanalabs/goya/goya.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 804c83eb12b7..46a900fb3ef8 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -4644,7 +4644,8 @@ void goya_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_entry)
case GOYA_ASYNC_EVENT_ID_L2_RAM_ECC:
case GOYA_ASYNC_EVENT_ID_PSOC_GPIO_05_SW_RESET:
goya_print_irq_info(hdev, event_type, false);
- hl_device_reset(hdev, true, false);
+ if (hdev->hard_reset_on_fw_events)
+ hl_device_reset(hdev, true, false);
break;

case GOYA_ASYNC_EVENT_ID_PCIE_DEC:
--
2.17.1

2020-09-07 15:28:22

by Tomer Tayar

[permalink] [raw]
Subject: RE: [PATCH 6/6] habanalabs: check flag before reset because of f/w event

On Fri, Sep 4, 2020 at 21:46 Oded Gabbay <[email protected]> wrote:
> For consistency with GAUDI code, add check of the relevant flag in the
> device structure before resetting the GOYA device in case of firmware
> event.
>
> Signed-off-by: Oded Gabbay <[email protected]>

This patch-set is:
Reviewed-by: Tomer Tayar <[email protected]>