Update the documentation to describe the sysfs node that lets user space
configure the isolation strategy, and the sysfs node that reports the
device's isolated state.

Signed-off-by: Kai Ye <[email protected]>
---
Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
index 08f2591138af..50737c897ba3 100644
--- a/Documentation/ABI/testing/sysfs-driver-uacce
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -19,6 +19,33 @@ Contact: [email protected]
Description: Available instances left of the device
Return -ENODEV if uacce_ops get_available_instances is not provided
+What: /sys/class/uacce/<dev_name>/isolate_strategy
+Date: Oct 2022
+KernelVersion: 6.1
+Contact: [email protected]
+Description: (RW) Configure the frequency size for the hardware error
+ isolation strategy. This unit is the number of times. Number
+ of occurrences in a period, also means threshold. If the number
+ of device PCI AER errors exceeds the threshold in a time window,
+ the device is isolated. This size is a configured integer value.
+ The default is 0. The maximum value is 65535.
+
+ In the HiSilicon accelerator engine, every slot AER error is
+ time-stamped first. The AER error log is then checked
+ whenever a device AER error occurs. If the device slot AER error
+ count exceeds the preset number of times in one hour, the
+ isolated state is set to true and the device is
+ isolated. AER error log entries older than one hour are
+ cleared.
+
+What: /sys/class/uacce/<dev_name>/isolate
+Date: Oct 2022
+KernelVersion: 6.1
+Contact: [email protected]
+Description: (R) A sysfs node that reports the device isolated state. A value
+ of 1 means the device is unavailable. A value of 0 means the
+ device is available.
+
What: /sys/class/uacce/<dev_name>/algorithms
Date: Feb 2020
KernelVersion: 5.7
--
2.17.1
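
The one-hour policy described in the isolate_strategy entry above can be
modelled in user space as a sliding window over error timestamps. The
following is only a rough sketch of that description, not the driver code:
the class name IsolationTracker, the method record_error, and the choice to
treat a threshold of 0 as "isolation disabled" are assumptions made for
illustration, and "exceeds" is read here as strictly greater than.

```python
from collections import deque

WINDOW_SECONDS = 3600   # "one hour", as stated in the ABI description
MAX_THRESHOLD = 65535   # maximum value stated in the ABI description


class IsolationTracker:
    """Hypothetical user-space model of the described isolation policy."""

    def __init__(self, threshold=0):
        if not 0 <= threshold <= MAX_THRESHOLD:
            raise ValueError("threshold must be in the range 0..65535")
        self.threshold = threshold
        self.isolated = False
        self._timestamps = deque()  # time stamps of errors still in the window

    def record_error(self, now):
        """Time-stamp one AER error and return the resulting isolated state."""
        # Clear log entries that are older than one hour.
        while self._timestamps and now - self._timestamps[0] > WINDOW_SECONDS:
            self._timestamps.popleft()
        self._timestamps.append(now)
        # Assumption: a threshold of 0 (the default) never isolates the device.
        if self.threshold > 0 and len(self._timestamps) > self.threshold:
            self.isolated = True
        return self.isolated
```

With a threshold of 2, a third error inside the same hour sets the isolated
state, while errors spaced more than an hour apart never accumulate. In this
sketch the state stays set once it has been reached; the description does
not say how it is cleared.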
On Tue, Oct 25, 2022 at 12:39:30PM +0000, Kai Ye wrote:
> Update the documentation to describe the sysfs node that lets user space
> configure the isolation strategy, and the sysfs node that reports the
> device's isolated state.
>
> Signed-off-by: Kai Ye <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-driver-uacce | 27 ++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
> index 08f2591138af..50737c897ba3 100644
> --- a/Documentation/ABI/testing/sysfs-driver-uacce
> +++ b/Documentation/ABI/testing/sysfs-driver-uacce
> @@ -19,6 +19,33 @@ Contact: [email protected]
> Description: Available instances left of the device
> Return -ENODEV if uacce_ops get_available_instances is not provided
>
> +What: /sys/class/uacce/<dev_name>/isolate_strategy
> +Date: Oct 2022
> +KernelVersion: 6.1
> +Contact: [email protected]
> +Description: (RW) Configure the frequency size for the hardware error
> + isolation strategy. This unit is the number of times. Number
Number of times what?
> + of occurrences in a period, also means threshold. If the number
> + of device PCI AER errors exceeds the threshold in a time window,
What is the time window?
> + the device is isolated. This size is a configured integer value.
> + The default is 0. The maximum value is 65535.
> +
> + In the HiSilicon accelerator engine, every slot AER error is
> + time-stamped first. The AER error log is then checked
> + whenever a device AER error occurs. If the device slot AER error
> + count exceeds the preset number of times in one hour, the
> + isolated state is set to true and the device is
> + isolated. AER error log entries older than one hour are
> + cleared.
This seems like a very hardware-specific implementation here. And this
is supposed to be a generic class?
I feel this is getting really messy :(
thanks,
greg k-h