This patchset adds support for more features to the FPGA Device Feature
List (DFL) drivers, including PR enhancements, virtualization support based
on PCIe SRIOV, private features for the Port and the FME, and
enhancements to the DFL framework. See the list below for details.
Patch 1: fix a bug in the current dfl-fme-mgr driver.
Patch 2-3: add 512bit data width PR support.
Patch 4-6: add virtualization support based on PCIe SRIOV.
Patch 7-8: add new AFU state and userclock related sysfs to dfl-afu.
Patch 9-10: enhance the DFL framework to support id_table.
Patch 11: add error reporting private feature support to dfl-afu.
Patch 12: add STP (SignalTap) private feature support to dfl-afu.
Patch 13: add capability sysfs interfaces to dfl-fme.
Patch 14: add thermal management private feature support to dfl-fme.
Patch 15: add power management private feature support to dfl-fme.
Patch 16: add global error reporting private feature support to dfl-fme.
Patch 17: add performance reporting support to dfl-fme.
Wu Hao (17):
fpga: dfl-fme-mgr: fix FME_PR_INTFC_ID register address.
fpga: dfl: fme: align PR buffer size per PR datawidth
fpga: dfl: fme: support 512bit data width PR
Documentation: fpga: dfl: add descriptions for virtualization and new
interfaces.
fpga: dfl: fme: add DFL_FPGA_FME_PORT_RELEASE/ASSIGN ioctl support.
fpga: dfl: pci: enable SRIOV support.
fpga: dfl: afu: add AFU state related sysfs interfaces
fpga: dfl: afu: add userclock sysfs interfaces.
fpga: dfl: add id_table for dfl private feature driver
fpga: dfl: afu: export __port_enable/disable function.
fpga: dfl: afu: add error reporting support.
fpga: dfl: afu: add STP (SignalTap) support
fpga: dfl: fme: add capability sysfs interfaces
fpga: dfl: fme: add thermal management support
fpga: dfl: fme: add power management support
fpga: dfl: fme: add global error reporting support
fpga: dfl: fme: add performance reporting support
Documentation/ABI/testing/sysfs-platform-dfl-fme | 279 +++++++
Documentation/ABI/testing/sysfs-platform-dfl-port | 94 +++
Documentation/fpga/dfl.txt | 115 +++
drivers/fpga/Makefile | 4 +-
drivers/fpga/dfl-afu-error.c | 225 +++++
drivers/fpga/dfl-afu-main.c | 335 +++++++-
drivers/fpga/dfl-afu.h | 7 +
drivers/fpga/dfl-fme-error.c | 390 +++++++++
drivers/fpga/dfl-fme-main.c | 583 ++++++++++++-
drivers/fpga/dfl-fme-mgr.c | 79 +-
drivers/fpga/dfl-fme-perf.c | 950 ++++++++++++++++++++++
drivers/fpga/dfl-fme-pr.c | 64 +-
drivers/fpga/dfl-fme.h | 9 +-
drivers/fpga/dfl-pci.c | 40 +
drivers/fpga/dfl.c | 170 +++-
drivers/fpga/dfl.h | 56 +-
include/uapi/linux/fpga-dfl.h | 32 +
17 files changed, 3355 insertions(+), 77 deletions(-)
create mode 100644 drivers/fpga/dfl-afu-error.c
create mode 100644 drivers/fpga/dfl-fme-error.c
create mode 100644 drivers/fpga/dfl-fme-perf.c
--
2.7.4
FME_PR_INTFC_ID is used as the compat_id for the fpga manager and region,
but the high 64 bits and low 64 bits of the compat_id are swapped by
mistake. This patch fixes the problem by correcting the register addresses.
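For illustration, the effect of the two offsets can be modelled in plain
userspace C. This is a minimal sketch, not driver code: struct compat_id,
fake_readq() and read_compat_id() are hypothetical stand-ins, and the
register block is simulated with a byte array. Only the two offsets are
taken from this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets as defined after this patch. */
#define FME_PR_INTFC_ID_L	0xA8
#define FME_PR_INTFC_ID_H	0xB0

/* Hypothetical stand-in for a 128-bit compat_id split in two halves. */
struct compat_id {
	uint64_t id_h;
	uint64_t id_l;
};

/* Hypothetical stand-in for readq() on a simulated register block. */
static uint64_t fake_readq(const uint8_t *base, size_t offset)
{
	uint64_t v;

	memcpy(&v, base + offset, sizeof(v));
	return v;
}

/* Read both halves of the interface id from the simulated PR registers. */
static struct compat_id read_compat_id(const uint8_t *pr_base)
{
	struct compat_id id;

	id.id_h = fake_readq(pr_base, FME_PR_INTFC_ID_H);
	id.id_l = fake_readq(pr_base, FME_PR_INTFC_ID_L);
	return id;
}
```

With the pre-patch definitions the two offsets are exchanged, so the same
function would return the two halves swapped.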
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-fme-mgr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index 76f3770..b3f7eee 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -30,8 +30,8 @@
#define FME_PR_STS 0x10
#define FME_PR_DATA 0x18
#define FME_PR_ERR 0x20
-#define FME_PR_INTFC_ID_H 0xA8
-#define FME_PR_INTFC_ID_L 0xB0
+#define FME_PR_INTFC_ID_L 0xA8
+#define FME_PR_INTFC_ID_H 0xB0
/* FME PR Control Register Bitfield */
#define FME_PR_CTRL_PR_RST BIT_ULL(0) /* Reset PR engine */
--
2.7.4
The current driver checks whether the input bitstream file size is
aligned to the PR data width (32 bits by default). This forces end users
to take an additional step when generating the bitstream file: padding
it with extra zeros to align its size to the PR data width. Users should
not have to do this, as the hardware drops extra padding bytes
automatically.
To simplify the user steps, this patch aligns the PR buffer size to
the PR data width in the driver, so that users can pass bitstream files
of unaligned size to the driver.
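The rounding involved can be sketched in userspace C as follows. This is
an illustration under stated assumptions, not the driver code: align_up()
mirrors what a power-of-two round-up does, and pad_bitstream() is a
hypothetical helper. Zero-filling the tail is a choice made for this
sketch; the patch itself relies on the hardware ignoring the padding.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round size up to the next multiple of width; width must be a power of two. */
static size_t align_up(size_t size, size_t width)
{
	return (size + width - 1) & ~(width - 1);
}

/*
 * Hypothetical helper: copy an unaligned bitstream into a newly allocated
 * buffer whose length is a multiple of width. The padding bytes are zero.
 * The caller frees the returned buffer.
 */
static unsigned char *pad_bitstream(const void *src, size_t size,
				    size_t width, size_t *padded_len)
{
	size_t len = align_up(size, width);
	unsigned char *buf = calloc(len ? len : 1, 1);

	if (!buf)
		return NULL;

	memcpy(buf, src, size);
	*padded_len = len;
	return buf;
}
```

For a 4-byte (32-bit) data width, a 5-byte input is padded to 8 bytes, and
an input that is already a multiple of 4 bytes keeps its length.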
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-fme-pr.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index d9ca955..c1fb1fe 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -74,6 +74,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
struct dfl_fme *fme;
unsigned long minsz;
void *buf = NULL;
+ size_t length;
int ret = 0;
u64 v;
@@ -85,9 +86,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
if (port_pr.argsz < minsz || port_pr.flags)
return -EINVAL;
- if (!IS_ALIGNED(port_pr.buffer_size, 4))
- return -EINVAL;
-
/* get fme header region */
fme_hdr = dfl_get_feature_ioaddr_by_id(&pdev->dev,
FME_FEATURE_ID_HEADER);
@@ -103,7 +101,13 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
port_pr.buffer_size))
return -EFAULT;
- buf = vmalloc(port_pr.buffer_size);
+ /*
+ * align PR buffer per PR data width, as HW ignores the extra padding
+ * data automatically.
+ */
+ length = ALIGN(port_pr.buffer_size, 4);
+
+ buf = vmalloc(length);
if (!buf)
return -ENOMEM;
@@ -140,7 +144,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
fpga_image_info_free(region->info);
info->buf = buf;
- info->count = port_pr.buffer_size;
+ info->count = length;
info->region_id = port_pr.port_id;
region->info = info;
--
2.7.4
This patch adds a description of virtualization support for DFL based
FPGA devices (based on PCIe SRIOV), and introduces the new
interfaces added by the new dfl private features.
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
Documentation/fpga/dfl.txt | 115 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)
diff --git a/Documentation/fpga/dfl.txt b/Documentation/fpga/dfl.txt
index 6df4621..360c1d9 100644
--- a/Documentation/fpga/dfl.txt
+++ b/Documentation/fpga/dfl.txt
@@ -84,6 +84,8 @@ The following functions are exposed through ioctls:
Get driver API version (DFL_FPGA_GET_API_VERSION)
Check for extensions (DFL_FPGA_CHECK_EXTENSION)
Program bitstream (DFL_FPGA_FME_PORT_PR)
+ Assign port to PF (DFL_FPGA_FME_PORT_ASSIGN)
+ Release port from PF (DFL_FPGA_FME_PORT_RELEASE)
More functions are exposed through sysfs
(/sys/class/fpga_region/regionX/dfl-fme.n/):
@@ -99,6 +101,24 @@ More functions are exposed through sysfs
one FPGA device may have more than one port, this sysfs interface indicates
how many ports the FPGA device has.
+ Power management (power_mgmt/)
+ power management sysfs interfaces allow user to read power management
+ information (power consumption, power limits, throttling thresholds,
+ thresholds status, etc) and configure power thresholds for different
+ throttling levels.
+
+ Thermal management (thermal_mgmt/)
+ thermal management sysfs interfaces allow user to read temperature,
+ thresholds, thresholds status and other thermal related information.
+
+ Global error reporting management (errors/)
+ error reporting sysfs interfaces allow user to read errors detected by the
+ hardware, and clear the logged errors.
+
+ Performance counters (perf/)
+ performance counters sysfs interfaces allow user to use different counters
+ to get performance data.
+
FIU - PORT
==========
@@ -139,6 +159,10 @@ More functions are exposed through sysfs:
Read Accelerator GUID (afu_id)
afu_id indicates which PR bitstream is programmed to this AFU.
+ Error reporting (errors/)
+ error reporting sysfs interfaces allow user to read port/afu errors
+ detected by the hardware, and clear the logged errors.
+
DFL Framework Overview
======================
@@ -212,6 +236,97 @@ the compat_id exposed by the target FPGA region. This check is usually done by
userspace before calling the reconfiguration IOCTL.
+FPGA virtualization - PCIe SRIOV
+================================
+This section describes the virtualization support on DFL based FPGA device to
+enable accessing an accelerator from applications running in a virtual machine
+(VM). This section only describes the PCIe based FPGA device with SRIOV support.
+
+Features supported by the particular FPGA device are exposed through Device
+Feature Lists, as illustrated below:
+
+  +-------------------------------+  +-------------+
+  |              PF               |  |     VF      |
+  +-------------------------------+  +-------------+
+      ^            ^         ^              ^
+      |            |         |              |
++-----|------------|---------|--------------|-------+
+|     |            |         |              |       |
+|  +-----+     +-------+ +-------+      +-------+   |
+|  | FME |     | Port0 | | Port1 |      | Port2 |   |
+|  +-----+     +-------+ +-------+      +-------+   |
+|                  ^         ^              ^       |
+|                  |         |              |       |
+|              +-------+ +------+       +-------+   |
+|              |  AFU  | |  AFU |       |  AFU  |   |
+|              +-------+ +------+       +-------+   |
+|                                                   |
+|            DFL based FPGA PCIe Device             |
++---------------------------------------------------+
+
+FME is always accessed through the physical function (PF).
+
+Ports (and related AFUs) are accessed via PF by default, but could be exposed
+through virtual function (VF) devices via PCIe SRIOV. Each VF only contains
+1 Port and 1 AFU for isolation. Users could assign individual VFs (accelerators)
+created via PCIe SRIOV interface, to virtual machines.
+
+The driver organization in virtualization case is illustrated below:
+
+    +-------++------++------+              |
+    | FME   || FME  || FME  |              |
+    | FPGA  || FPGA || FPGA |              |
+    |Manager||Bridge||Region|              |
+    +-------++------++------+              |
+    +-----------------------+   +--------+ |              +--------+
+    |          FME          |   |  AFU   | |              |  AFU   |
+    |        Module         |   | Module | |              | Module |
+    +-----------------------+   +--------+ |              +--------+
+          +-----------------------+        |      +-----------------------+
+          | FPGA Container Device |        |      | FPGA Container Device |
+          |  (FPGA Base Region)   |        |      |  (FPGA Base Region)   |
+          +-----------------------+        |      +-----------------------+
+            +------------------+           |         +------------------+
+            | FPGA PCIE Module |           | Virtual | FPGA PCIE Module |
+            +------------------+   Host    | Machine +------------------+
+    -------------------------------------- | ------------------------------
+             +---------------+             |          +---------------+
+             | PCI PF Device |             |          | PCI VF Device |
+             +---------------+             |          +---------------+
+
+The FPGA PCIe device driver is always loaded first once an FPGA PCIe PF or VF
+device is detected. It:
+
+ a) finishes enumeration on both FPGA PCIe PF and VF devices using common
+ interfaces from DFL framework.
+ b) supports SRIOV.
+
+The FME device driver plays a management role in this driver architecture: it
+provides ioctls to release a Port from the PF and assign a Port to the PF. After
+a port is released from the PF, it is safe to expose it through a VF via the
+PCIe SRIOV sysfs interface.
+
+To enable accessing an accelerator from applications running in a VM, the
+respective AFU's port needs to be assigned to a VF using the following steps:
+
+ a) The PF owns all AFU ports by default. Any port that needs to be
+ reassigned to a VF must first be released through the
+ DFL_FPGA_FME_PORT_RELEASE ioctl on the FME device.
+
+ b) Once N ports are released from the PF, the user can use the command
+ below to enable SRIOV and VFs. Each VF owns only one Port with AFU.
+
+ echo N > $PCI_DEVICE_PATH/sriov_numvfs
+
+ c) Pass through the VFs to VMs
+
+ d) The AFU under VF is accessible from applications in VM (using the
+ same driver inside the VF).
+
+Note that an FME can't be assigned to a VF, thus PR and other management
+functions are only available via the PF.
+
+
Device enumeration
==================
This section introduces how applications enumerate the fpga device from
--
2.7.4
This patch adds an id_table to each dfl private feature driver, which
allows the same private feature driver to be reused to match and support
multiple dfl private features.
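The matching scheme can be sketched in standalone C as below. This is a
simplified model, not the kernel code: struct feature_id, id_table_match()
and the ids 0x10 and 0x11 are hypothetical. As in dfl_feature_drv_match()
in this patch, the table ends with an entry whose id is 0, so 0 cannot be
used as a real feature id. That appears to be why this patch also moves
FEATURE_ID_FIU_HEADER from 0x0 to 0xfe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dfl_feature_id. */
struct feature_id {
	uint64_t id;
};

/* Walk a zero-terminated id table and report whether feature_id is listed. */
static bool id_table_match(const struct feature_id *ids, uint64_t feature_id)
{
	if (!ids)
		return false;

	for (; ids->id; ids++) {
		if (ids->id == feature_id)
			return true;
	}
	return false;
}

/* One driver claiming two hypothetical feature ids. */
static const struct feature_id example_id_table[] = {
	{ .id = 0x10 },
	{ .id = 0x11 },
	{ 0 }
};
```

A single driver entry pointing at example_id_table would then be bound to
features 0x10 and 0x11, and to nothing else.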
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-afu-main.c | 14 ++++++++++++--
drivers/fpga/dfl-fme-main.c | 11 ++++++++---
drivers/fpga/dfl-fme-pr.c | 7 ++++++-
drivers/fpga/dfl-fme.h | 3 ++-
drivers/fpga/dfl.c | 21 +++++++++++++++++++--
drivers/fpga/dfl.h | 21 +++++++++++++++------
6 files changed, 62 insertions(+), 15 deletions(-)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 82fd80a..2916876 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -440,6 +440,11 @@ port_hdr_ioctl(struct platform_device *pdev, struct dfl_feature *feature,
return ret;
}
+static const struct dfl_feature_id port_hdr_id_table[] = {
+ {.id = PORT_FEATURE_ID_HEADER,},
+ {0,}
+};
+
static const struct dfl_feature_ops port_hdr_ops = {
.init = port_hdr_init,
.uinit = port_hdr_uinit,
@@ -500,6 +505,11 @@ static void port_afu_uinit(struct platform_device *pdev,
sysfs_remove_files(&pdev->dev.kobj, port_afu_attrs);
}
+static const struct dfl_feature_id port_afu_id_table[] = {
+ {.id = PORT_FEATURE_ID_AFU,},
+ {0,}
+};
+
static const struct dfl_feature_ops port_afu_ops = {
.init = port_afu_init,
.uinit = port_afu_uinit,
@@ -507,11 +517,11 @@ static const struct dfl_feature_ops port_afu_ops = {
static struct dfl_feature_driver port_feature_drvs[] = {
{
- .id = PORT_FEATURE_ID_HEADER,
+ .id_table = port_hdr_id_table,
.ops = &port_hdr_ops,
},
{
- .id = PORT_FEATURE_ID_AFU,
+ .id_table = port_afu_id_table,
.ops = &port_afu_ops,
},
{
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 8b2a337..38c6342 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -158,6 +158,11 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
return -ENODEV;
}
+static const struct dfl_feature_id fme_hdr_id_table[] = {
+ {.id = FME_FEATURE_ID_HEADER,},
+ {0,}
+};
+
static const struct dfl_feature_ops fme_hdr_ops = {
.init = fme_hdr_init,
.uinit = fme_hdr_uinit,
@@ -166,12 +171,12 @@ static const struct dfl_feature_ops fme_hdr_ops = {
static struct dfl_feature_driver fme_feature_drvs[] = {
{
- .id = FME_FEATURE_ID_HEADER,
+ .id_table = fme_hdr_id_table,
.ops = &fme_hdr_ops,
},
{
- .id = FME_FEATURE_ID_PR_MGMT,
- .ops = &pr_mgmt_ops,
+ .id_table = fme_pr_mgmt_id_table,
+ .ops = &fme_pr_mgmt_ops,
},
{
.ops = NULL,
diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index 8a0e46a..b054ac6 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -482,7 +482,12 @@ static long fme_pr_ioctl(struct platform_device *pdev,
return ret;
}
-const struct dfl_feature_ops pr_mgmt_ops = {
+const struct dfl_feature_id fme_pr_mgmt_id_table[] = {
+ {.id = FME_FEATURE_ID_PR_MGMT,},
+ {0}
+};
+
+const struct dfl_feature_ops fme_pr_mgmt_ops = {
.init = pr_mgmt_init,
.uinit = pr_mgmt_uinit,
.ioctl = fme_pr_ioctl,
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index de20755..7a021c4 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -35,6 +35,7 @@ struct dfl_fme {
struct dfl_feature_platform_data *pdata;
};
-extern const struct dfl_feature_ops pr_mgmt_ops;
+extern const struct dfl_feature_ops fme_pr_mgmt_ops;
+extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
#endif /* __DFL_FME_H */
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index c5aa287..65f91ef 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -14,6 +14,8 @@
#include "dfl.h"
+#define DRV_VERSION "0.8"
+
static DEFINE_MUTEX(dfl_id_mutex);
/*
@@ -274,6 +276,21 @@ static int dfl_feature_instance_init(struct platform_device *pdev,
return ret;
}
+static bool dfl_feature_drv_match(struct dfl_feature *feature,
+ struct dfl_feature_driver *driver)
+{
+ const struct dfl_feature_id *ids = driver->id_table;
+
+ if (ids) {
+ while (ids->id) {
+ if (ids->id == feature->id)
+ return true;
+ ids++;
+ }
+ }
+ return false;
+}
+
/**
* dfl_fpga_dev_feature_init - init for sub features of dfl feature device
* @pdev: feature device.
@@ -294,8 +311,7 @@ int dfl_fpga_dev_feature_init(struct platform_device *pdev,
while (drv->ops) {
dfl_fpga_dev_for_each_feature(pdata, feature) {
- /* match feature and drv using id */
- if (feature->id == drv->id) {
+ if (dfl_feature_drv_match(feature, drv)) {
ret = dfl_feature_instance_init(pdev, pdata,
feature, drv);
if (ret)
@@ -1164,3 +1180,4 @@ module_exit(dfl_fpga_exit);
MODULE_DESCRIPTION("FPGA Device Feature List (DFL) Support");
MODULE_AUTHOR("Intel Corporation");
MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 3c5dc3a..fbc57f0 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -30,8 +30,8 @@
/* plus one for fme device */
#define MAX_DFL_FEATURE_DEV_NUM (MAX_DFL_FPGA_PORT_NUM + 1)
-/* Reserved 0x0 for Header Group Register and 0xff for AFU */
-#define FEATURE_ID_FIU_HEADER 0x0
+/* Reserved 0xfe for Header Group Register and 0xff for AFU */
+#define FEATURE_ID_FIU_HEADER 0xfe
#define FEATURE_ID_AFU 0xff
#define FME_FEATURE_ID_HEADER FEATURE_ID_FIU_HEADER
@@ -169,13 +169,22 @@ void dfl_fpga_port_ops_put(struct dfl_fpga_port_ops *ops);
int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id);
/**
- * struct dfl_feature_driver - sub feature's driver
+ * struct dfl_feature_id - dfl private feature id
*
- * @id: sub feature id.
- * @ops: ops of this sub feature.
+ * @id: unique dfl private feature id.
*/
-struct dfl_feature_driver {
+struct dfl_feature_id {
u64 id;
+};
+
+/**
+ * struct dfl_feature_driver - dfl private feature driver
+ *
+ * @id_table: id_table for dfl private features supported by this driver.
+ * @ops: ops of this dfl private feature driver.
+ */
+struct dfl_feature_driver {
+ const struct dfl_feature_id *id_table;
const struct dfl_feature_ops *ops;
};
--
2.7.4
STP (SignalTap) is one of the private features under the port, used for
debugging. This patch adds private feature driver support for it
to allow userspace applications to mmap the related mmio region and
provide the STP service.
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-afu-main.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 754729e..14970a4 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -518,6 +518,36 @@ static const struct dfl_feature_ops port_afu_ops = {
.uinit = port_afu_uinit,
};
+static int port_stp_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct resource *res = &pdev->resource[feature->resource_index];
+
+ dev_dbg(&pdev->dev, "PORT STP Init.\n");
+
+ return afu_mmio_region_add(dev_get_platdata(&pdev->dev),
+ DFL_PORT_REGION_INDEX_STP,
+ resource_size(res), res->start,
+ DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ |
+ DFL_PORT_REGION_WRITE);
+}
+
+static void port_stp_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ dev_dbg(&pdev->dev, "PORT STP UInit.\n");
+}
+
+static const struct dfl_feature_id port_stp_id_table[] = {
+ {.id = PORT_FEATURE_ID_STP,},
+ {0,}
+};
+
+static const struct dfl_feature_ops port_stp_ops = {
+ .init = port_stp_init,
+ .uinit = port_stp_uinit,
+};
+
static struct dfl_feature_driver port_feature_drvs[] = {
{
.id_table = port_hdr_id_table,
@@ -532,6 +562,10 @@ static struct dfl_feature_driver port_feature_drvs[] = {
.ops = &port_err_ops,
},
{
+ .id_table = port_stp_id_table,
+ .ops = &port_stp_ops,
+ },
+ {
.ops = NULL,
}
};
--
2.7.4
This patch adds support for the performance reporting private feature
of the FPGA Management Engine (FME). It supports four categories of
performance counters: 'clock', 'cache', 'iommu' and 'fabric'. Users
can read the performance counters via the exposed sysfs interfaces.
Please refer to the sysfs documentation for more details.
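As a usage illustration, a userspace program could read one counter as
sketched below. parse_counter() and read_counter() are hypothetical
helpers. The sketch assumes the file holds a single number followed by a
newline, which matches the "0x%llx\n" format that clock_show() uses in this
patch. The format of the other counters is an assumption here.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse one counter value; base 0 accepts both "0x..." and decimal text. */
static int parse_counter(const char *text, unsigned long long *out)
{
	unsigned long long v;
	char *end;

	errno = 0;
	v = strtoull(text, &end, 0);
	if (errno || end == text)
		return -1;

	*out = v;
	return 0;
}

/* Read the first line of a sysfs counter file and parse it. */
static int read_counter(const char *path, unsigned long long *out)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_counter(buf, out);
}
```

A caller would pass a path such as
/sys/bus/platform/devices/dfl-fme.0/perf/clock and check the return value
before using the result.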
Signed-off-by: Luwei Kang <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
Documentation/ABI/testing/sysfs-platform-dfl-fme | 86 ++
drivers/fpga/Makefile | 1 +
drivers/fpga/dfl-fme-main.c | 4 +
drivers/fpga/dfl-fme-perf.c | 950 +++++++++++++++++++++++
drivers/fpga/dfl-fme.h | 2 +
drivers/fpga/dfl.c | 1 +
drivers/fpga/dfl.h | 2 +
7 files changed, 1046 insertions(+)
create mode 100644 drivers/fpga/dfl-fme-perf.c
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 38f9cdd..12f9449 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -214,3 +214,89 @@ KernelVersion: 5.2
Contact: Wu Hao <[email protected]>
Description: Read-Write. Write this file to inject errors for testing
purpose. Read this file to check errors injected.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/clock
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Only. Read for Accelerator Function Unit (AFU) clock
+ counter.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/cache/freeze
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read and Write this file to freeze or unfreeze
+ the 'cache' category performance counters.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/cache/<counter>
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Only. Read 'cache' category performance counters:
+ read_hit, read_miss, write_hit, write_miss, hold_request,
+ data_write_port_contention, tag_write_port_contention,
+ tx_req_stall, rx_req_stall and rx_eviction.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/iommu/freeze
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read and Write this file to freeze or unfreeze
+ the 'iommu' category performance counters.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/iommu/<sip_counter>
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Only. Read 'iommu' category 'sip' sub category
+ performance counters: iotlb_4k_hit, iotlb_2m_hit,
+ iotlb_1g_hit, slpwc_l3_hit, slpwc_l4_hit, rcc_hit,
+ rcc_miss, iotlb_4k_miss, iotlb_2m_miss, iotlb_1g_miss,
+ slpwc_l3_miss and slpwc_l4_miss.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/iommu/afu0/<counter>
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Only. Read 'iommu' category 'afuX' sub category
+ performance counters: read_transaction, write_transaction,
+ devtlb_read_hit, devtlb_write_hit, devtlb_4k_fill,
+ devtlb_2m_fill and devtlb_1g_fill.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/fabric/freeze
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read and Write this file to freeze or unfreeze
+ the 'fabric' category performance counters.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/fabric/<counter>
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Only. Read 'fabric' category performance counters:
+ pcie0_read, pcie0_write, pcie1_read, pcie1_write,
+ upi_read, upi_write and mmio_read.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/fabric/enable
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read and write this file to enable device
+ level fabric counters sysfs interfaces in the same folder.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/fabric/port0/<counter>
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Only. Read 'fabric' category "portX" sub category
+ performance counters: pcie0_read, pcie0_write, pcie1_read,
+ pcie1_write, upi_read, upi_write and mmio_read.
+
+What: /sys/bus/platform/devices/dfl-fme.0/perf/fabric/port0/enable
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read and write this file to enable port level
+ fabric counters sysfs interfaces in the same folder.
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index 1a9fa3d..7df3971 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o
obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o
+dfl-fme-objs += dfl-fme-perf.o
dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
dfl-afu-objs += dfl-afu-error.o
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 76cb112..3abcb56 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -690,6 +690,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
.ops = &fme_global_err_ops,
},
{
+ .id_table = fme_perf_id_table,
+ .ops = &fme_perf_ops,
+ },
+ {
.ops = NULL,
},
};
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
new file mode 100644
index 0000000..035bb68
--- /dev/null
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -0,0 +1,950 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for FPGA Management Engine (FME) Global Performance Reporting
+ *
+ * Copyright 2019 Intel Corporation, Inc.
+ *
+ * Authors:
+ * Kang Luwei <[email protected]>
+ * Xiao Guangrong <[email protected]>
+ * Wu Hao <[email protected]>
+ * Joseph Grecco <[email protected]>
+ * Enno Luebbers <[email protected]>
+ * Tim Whisonant <[email protected]>
+ * Ananda Ravuri <[email protected]>
+ * Mitchel, Henry <[email protected]>
+ */
+
+#include "dfl.h"
+#include "dfl-fme.h"
+
+/*
+ * Performance Counter Registers for Cache.
+ *
+ * Cache Events are listed below as CACHE_EVNT_*.
+ */
+#define CACHE_CTRL 0x8
+#define CACHE_RESET_CNTR BIT_ULL(0)
+#define CACHE_FREEZE_CNTR BIT_ULL(8)
+#define CACHE_CTRL_EVNT GENMASK_ULL(19, 16)
+#define CACHE_EVNT_RD_HIT 0x0
+#define CACHE_EVNT_WR_HIT 0x1
+#define CACHE_EVNT_RD_MISS 0x2
+#define CACHE_EVNT_WR_MISS 0x3
+#define CACHE_EVNT_RSVD 0x4
+#define CACHE_EVNT_HOLD_REQ 0x5
+#define CACHE_EVNT_DATA_WR_PORT_CONTEN 0x6
+#define CACHE_EVNT_TAG_WR_PORT_CONTEN 0x7
+#define CACHE_EVNT_TX_REQ_STALL 0x8
+#define CACHE_EVNT_RX_REQ_STALL 0x9
+#define CACHE_EVNT_EVICTIONS 0xa
+#define CACHE_EVNT_MAX CACHE_EVNT_EVICTIONS
+#define CACHE_CHANNEL_SEL BIT_ULL(20)
+#define CACHE_CHANNEL_RD 0
+#define CACHE_CHANNEL_WR 1
+#define CACHE_CHANNEL_MAX 2
+#define CACHE_CNTR0 0x10
+#define CACHE_CNTR1 0x18
+#define CACHE_CNTR_EVNT_CNTR GENMASK_ULL(47, 0)
+#define CACHE_CNTR_EVNT GENMASK_ULL(63, 60)
+
+/*
+ * Performance Counter Registers for Fabric.
+ *
+ * Fabric Events are listed below as FAB_EVNT_*
+ */
+#define FAB_CTRL 0x20
+#define FAB_RESET_CNTR BIT_ULL(0)
+#define FAB_FREEZE_CNTR BIT_ULL(8)
+#define FAB_CTRL_EVNT GENMASK_ULL(19, 16)
+#define FAB_EVNT_PCIE0_RD 0x0
+#define FAB_EVNT_PCIE0_WR 0x1
+#define FAB_EVNT_PCIE1_RD 0x2
+#define FAB_EVNT_PCIE1_WR 0x3
+#define FAB_EVNT_UPI_RD 0x4
+#define FAB_EVNT_UPI_WR 0x5
+#define FAB_EVNT_MMIO_RD 0x6
+#define FAB_EVNT_MMIO_WR 0x7
+#define FAB_EVNT_MAX FAB_EVNT_MMIO_WR
+#define FAB_PORT_ID GENMASK_ULL(21, 20)
+#define FAB_PORT_FILTER BIT_ULL(23)
+#define FAB_PORT_FILTER_DISABLE 0
+#define FAB_PORT_FILTER_ENABLE 1
+#define FAB_CNTR 0x28
+#define FAB_CNTR_EVNT_CNTR GENMASK_ULL(59, 0)
+#define FAB_CNTR_EVNT GENMASK_ULL(63, 60)
+
+/*
+ * Performance Counter Registers for Clock.
+ *
+ * Clock Counter can't be reset or frozen by SW.
+ */
+#define CLK_CNTR 0x30
+
+/*
+ * Performance Counter Registers for IOMMU / VT-D.
+ *
+ * VT-D Events are listed below as VTD_EVNT_* and VTD_SIP_EVNT_*
+ */
+#define VTD_CTRL 0x38
+#define VTD_RESET_CNTR BIT_ULL(0)
+#define VTD_FREEZE_CNTR BIT_ULL(8)
+#define VTD_CTRL_EVNT GENMASK_ULL(19, 16)
+#define VTD_EVNT_AFU_MEM_RD_TRANS 0x0
+#define VTD_EVNT_AFU_MEM_WR_TRANS 0x1
+#define VTD_EVNT_AFU_DEVTLB_RD_HIT 0x2
+#define VTD_EVNT_AFU_DEVTLB_WR_HIT 0x3
+#define VTD_EVNT_DEVTLB_4K_FILL 0x4
+#define VTD_EVNT_DEVTLB_2M_FILL 0x5
+#define VTD_EVNT_DEVTLB_1G_FILL 0x6
+#define VTD_EVNT_MAX VTD_EVNT_DEVTLB_1G_FILL
+#define VTD_CNTR 0x40
+#define VTD_CNTR_EVNT GENMASK_ULL(63, 60)
+#define VTD_CNTR_EVNT_CNTR GENMASK_ULL(47, 0)
+#define VTD_SIP_CTRL 0x48
+#define VTD_SIP_RESET_CNTR BIT_ULL(0)
+#define VTD_SIP_FREEZE_CNTR BIT_ULL(8)
+#define VTD_SIP_CTRL_EVNT GENMASK_ULL(19, 16)
+#define VTD_SIP_EVNT_IOTLB_4K_HIT 0x0
+#define VTD_SIP_EVNT_IOTLB_2M_HIT 0x1
+#define VTD_SIP_EVNT_IOTLB_1G_HIT 0x2
+#define VTD_SIP_EVNT_SLPWC_L3_HIT 0x3
+#define VTD_SIP_EVNT_SLPWC_L4_HIT 0x4
+#define VTD_SIP_EVNT_RCC_HIT 0x5
+#define VTD_SIP_EVNT_IOTLB_4K_MISS 0x6
+#define VTD_SIP_EVNT_IOTLB_2M_MISS 0x7
+#define VTD_SIP_EVNT_IOTLB_1G_MISS 0x8
+#define VTD_SIP_EVNT_SLPWC_L3_MISS 0x9
+#define VTD_SIP_EVNT_SLPWC_L4_MISS 0xa
+#define VTD_SIP_EVNT_RCC_MISS 0xb
+#define VTD_SIP_EVNT_MAX VTD_SIP_EVNT_RCC_MISS
+#define VTD_SIP_CNTR 0x50
+#define VTD_SIP_CNTR_EVNT GENMASK_ULL(63, 60)
+#define VTD_SIP_CNTR_EVNT_CNTR GENMASK_ULL(47, 0)
+
+#define PERF_OBJ_ROOT_ID (~0)
+
+#define PERF_TIMEOUT 30
+
+/**
+ * struct perf_object - object of performance counter
+ *
+ * @id: instance id. PERF_OBJ_ROOT_ID indicates it is a parent object which
+ * counts performance counters for all instances.
+ * @attr_groups: the sysfs files are associated with this object.
+ * @feature: pointer to related private feature.
+ * @node: used to link itself to parent's children list.
+ * @children: used to link its children objects together.
+ * @kobj: generic kobject interface.
+ *
+ * 'node' and 'children' are used to construct parent-children hierarchy.
+ */
+struct perf_object {
+ int id;
+ const struct attribute_group **attr_groups;
+ struct dfl_feature *feature;
+
+ struct list_head node;
+ struct list_head children;
+ struct kobject kobj;
+};
+
+/**
+ * struct perf_obj_attribute - attribute of perf object
+ *
+ * @attr: attribute of this perf object.
+ * @show: show callback for sysfs attribute.
+ * @store: store callback for sysfs attribute.
+ */
+struct perf_obj_attribute {
+ struct attribute attr;
+ ssize_t (*show)(struct perf_object *pobj, char *buf);
+ ssize_t (*store)(struct perf_object *pobj,
+ const char *buf, size_t n);
+};
+
+#define to_perf_obj_attr(_attr) \
+ container_of(_attr, struct perf_obj_attribute, attr)
+#define to_perf_obj(_kobj) \
+ container_of(_kobj, struct perf_object, kobj)
+
+#define PERF_OBJ_ATTR(_name, _filename, _mode, _show, _store) \
+struct perf_obj_attribute perf_obj_attr_##_name = \
+ __ATTR(_filename, _mode, _show, _store)
+
+#define PERF_OBJ_ATTR_RW(_name) \
+ struct perf_obj_attribute perf_obj_attr_##_name = __ATTR_RW(_name)
+#define PERF_OBJ_ATTR_RO(_name) \
+ struct perf_obj_attribute perf_obj_attr_##_name = __ATTR_RO(_name)
+#define PERF_OBJ_ATTR_WO(_name) \
+ struct perf_obj_attribute perf_obj_attr_##_name = __ATTR_WO(_name)
+
+static ssize_t perf_obj_attr_show(struct kobject *kobj,
+ struct attribute *__attr, char *buf)
+{
+ struct perf_obj_attribute *attr = to_perf_obj_attr(__attr);
+ struct perf_object *pobj = to_perf_obj(kobj);
+ ssize_t ret = -EIO;
+
+ if (attr->show)
+ ret = attr->show(pobj, buf);
+ return ret;
+}
+
+static ssize_t perf_obj_attr_store(struct kobject *kobj,
+ struct attribute *__attr,
+ const char *buf, size_t n)
+{
+ struct perf_obj_attribute *attr = to_perf_obj_attr(__attr);
+ struct perf_object *pobj = to_perf_obj(kobj);
+ ssize_t ret = -EIO;
+
+ if (attr->store)
+ ret = attr->store(pobj, buf, n);
+ return ret;
+}
+
+static const struct sysfs_ops perf_obj_sysfs_ops = {
+ .show = perf_obj_attr_show,
+ .store = perf_obj_attr_store,
+};
+
+static void perf_obj_release(struct kobject *kobj)
+{
+ kfree(to_perf_obj(kobj));
+}
+
+static struct kobj_type perf_obj_ktype = {
+ .sysfs_ops = &perf_obj_sysfs_ops,
+ .release = perf_obj_release,
+};
+
+static struct perf_object *
+create_perf_obj(struct dfl_feature *feature, struct kobject *parent, int id,
+ const struct attribute_group **groups, const char *name)
+{
+ struct perf_object *pobj;
+ int ret;
+
+ pobj = kzalloc(sizeof(*pobj), GFP_KERNEL);
+ if (!pobj)
+ return ERR_PTR(-ENOMEM);
+
+ pobj->id = id;
+ pobj->feature = feature;
+ pobj->attr_groups = groups;
+ INIT_LIST_HEAD(&pobj->node);
+ INIT_LIST_HEAD(&pobj->children);
+
+ if (id != PERF_OBJ_ROOT_ID)
+ ret = kobject_init_and_add(&pobj->kobj, &perf_obj_ktype,
+ parent, "%s%d", name, id);
+ else
+ ret = kobject_init_and_add(&pobj->kobj, &perf_obj_ktype,
+ parent, "%s", name);
+ if (ret)
+ goto put_exit;
+
+ if (pobj->attr_groups) {
+ ret = sysfs_create_groups(&pobj->kobj, pobj->attr_groups);
+ if (ret)
+ goto del_exit;
+ }
+
+ return pobj;
+
+del_exit:
+ kobject_del(&pobj->kobj);
+put_exit:
+ kobject_put(&pobj->kobj);
+ return ERR_PTR(ret);
+}
+
+/*
+ * Counter Sysfs Interface for Clock.
+ */
+static ssize_t clock_show(struct perf_object *pobj, char *buf)
+{
+ void __iomem *base = pobj->feature->ioaddr;
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + CLK_CNTR));
+}
+static PERF_OBJ_ATTR_RO(clock);
+
+static struct attribute *clock_attrs[] = {
+ &perf_obj_attr_clock.attr,
+ NULL,
+};
+
+static struct attribute_group clock_attr_group = {
+ .attrs = clock_attrs,
+};
+
+static const struct attribute_group *perf_dev_attr_groups[] = {
+ &clock_attr_group,
+ NULL,
+};
+
+static void destroy_perf_obj(struct perf_object *pobj)
+{
+ struct perf_object *obj, *obj_tmp;
+
+ list_for_each_entry_safe(obj, obj_tmp, &pobj->children, node)
+ destroy_perf_obj(obj);
+
+ list_del(&pobj->node);
+ if (pobj->attr_groups)
+ sysfs_remove_groups(&pobj->kobj, pobj->attr_groups);
+ kobject_put(&pobj->kobj);
+}
+
+static struct perf_object *create_perf_dev(struct dfl_feature *feature)
+{
+ struct platform_device *pdev = feature->pdev;
+
+ return create_perf_obj(feature, &pdev->dev.kobj, PERF_OBJ_ROOT_ID,
+ perf_dev_attr_groups, "perf");
+}
+
+/*
+ * Counter Sysfs Interfaces for Cache.
+ */
+static ssize_t cache_freeze_show(struct perf_object *pobj, char *buf)
+{
+ void __iomem *base = pobj->feature->ioaddr;
+ u64 v;
+
+ v = readq(base + CACHE_CTRL);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(CACHE_FREEZE_CNTR, v));
+}
+
+static ssize_t cache_freeze_store(struct perf_object *pobj,
+ const char *buf, size_t n)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ bool state;
+ u64 v;
+
+ if (strtobool(buf, &state))
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + CACHE_CTRL);
+ v &= ~CACHE_FREEZE_CNTR;
+ v |= FIELD_PREP(CACHE_FREEZE_CNTR, state ? 1 : 0);
+ writeq(v, base + CACHE_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return n;
+}
+static PERF_OBJ_ATTR(cache_freeze, freeze, 0644,
+ cache_freeze_show, cache_freeze_store);
+
+static ssize_t read_cache_counter(struct perf_object *pobj, char *buf,
+ u8 channel, u8 event)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ u64 v, count;
+
+ if (event > CACHE_EVNT_MAX || channel > CACHE_CHANNEL_MAX)
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ /* set channel access type and cache event code. */
+ v = readq(base + CACHE_CTRL);
+ v &= ~(CACHE_CHANNEL_SEL | CACHE_CTRL_EVNT);
+ v |= FIELD_PREP(CACHE_CHANNEL_SEL, channel);
+ v |= FIELD_PREP(CACHE_CTRL_EVNT, event);
+ writeq(v, base + CACHE_CTRL);
+
+ if (readq_poll_timeout(base + CACHE_CNTR0, v,
+ FIELD_GET(CACHE_CNTR_EVNT, v) == event,
+ 1, PERF_TIMEOUT)) {
+ dev_err(&feature->pdev->dev, "timeout, unmatched cache event type in counter registers.\n");
+ mutex_unlock(&pdata->lock);
+ return -ETIMEDOUT;
+ }
+
+ v = readq(base + CACHE_CNTR0);
+ count = FIELD_GET(CACHE_CNTR_EVNT_CNTR, v);
+ v = readq(base + CACHE_CNTR1);
+ count += FIELD_GET(CACHE_CNTR_EVNT_CNTR, v);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)count);
+}
+
+#define CACHE_SHOW(name, type, event) \
+static ssize_t name##_show(struct perf_object *pobj, char *buf) \
+{ \
+ return read_cache_counter(pobj, buf, type, event); \
+} \
+static PERF_OBJ_ATTR_RO(name)
+
+CACHE_SHOW(read_hit, CACHE_CHANNEL_RD, CACHE_EVNT_RD_HIT);
+CACHE_SHOW(read_miss, CACHE_CHANNEL_RD, CACHE_EVNT_RD_MISS);
+CACHE_SHOW(write_hit, CACHE_CHANNEL_WR, CACHE_EVNT_WR_HIT);
+CACHE_SHOW(write_miss, CACHE_CHANNEL_WR, CACHE_EVNT_WR_MISS);
+CACHE_SHOW(hold_request, CACHE_CHANNEL_RD, CACHE_EVNT_HOLD_REQ);
+CACHE_SHOW(tx_req_stall, CACHE_CHANNEL_RD, CACHE_EVNT_TX_REQ_STALL);
+CACHE_SHOW(rx_req_stall, CACHE_CHANNEL_RD, CACHE_EVNT_RX_REQ_STALL);
+CACHE_SHOW(rx_eviction, CACHE_CHANNEL_RD, CACHE_EVNT_EVICTIONS);
+CACHE_SHOW(data_write_port_contention, CACHE_CHANNEL_WR,
+ CACHE_EVNT_DATA_WR_PORT_CONTEN);
+CACHE_SHOW(tag_write_port_contention, CACHE_CHANNEL_WR,
+ CACHE_EVNT_TAG_WR_PORT_CONTEN);
+
+static struct attribute *cache_attrs[] = {
+ &perf_obj_attr_read_hit.attr,
+ &perf_obj_attr_read_miss.attr,
+ &perf_obj_attr_write_hit.attr,
+ &perf_obj_attr_write_miss.attr,
+ &perf_obj_attr_hold_request.attr,
+ &perf_obj_attr_data_write_port_contention.attr,
+ &perf_obj_attr_tag_write_port_contention.attr,
+ &perf_obj_attr_tx_req_stall.attr,
+ &perf_obj_attr_rx_req_stall.attr,
+ &perf_obj_attr_rx_eviction.attr,
+ &perf_obj_attr_cache_freeze.attr,
+ NULL,
+};
+
+static struct attribute_group cache_attr_group = {
+ .attrs = cache_attrs,
+};
+
+static const struct attribute_group *cache_attr_groups[] = {
+ &cache_attr_group,
+ NULL,
+};
+
+static int create_perf_cache_obj(struct perf_object *perf_dev)
+{
+ struct perf_object *pobj;
+
+ pobj = create_perf_obj(perf_dev->feature, &perf_dev->kobj,
+ PERF_OBJ_ROOT_ID, cache_attr_groups, "cache");
+ if (IS_ERR(pobj))
+ return PTR_ERR(pobj);
+
+ list_add(&pobj->node, &perf_dev->children);
+
+ return 0;
+}
+
+/*
+ * Counter Sysfs Interfaces for VT-D / IOMMU.
+ */
+static ssize_t vtd_freeze_show(struct perf_object *pobj, char *buf)
+{
+ void __iomem *base = pobj->feature->ioaddr;
+ u64 v;
+
+ v = readq(base + VTD_CTRL);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(VTD_FREEZE_CNTR, v));
+}
+
+static ssize_t vtd_freeze_store(struct perf_object *pobj,
+ const char *buf, size_t n)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ bool state;
+ u64 v;
+
+ if (strtobool(buf, &state))
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + VTD_CTRL);
+ v &= ~VTD_FREEZE_CNTR;
+ v |= FIELD_PREP(VTD_FREEZE_CNTR, state ? 1 : 0);
+ writeq(v, base + VTD_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return n;
+}
+static PERF_OBJ_ATTR(vtd_freeze, freeze, 0644,
+ vtd_freeze_show, vtd_freeze_store);
+
+static struct attribute *iommu_top_attrs[] = {
+ &perf_obj_attr_vtd_freeze.attr,
+ NULL,
+};
+
+static struct attribute_group iommu_top_attr_group = {
+ .attrs = iommu_top_attrs,
+};
+
+static ssize_t read_iommu_sip_counter(struct perf_object *pobj,
+ u8 event, char *buf)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ u64 v, count;
+
+ if (event > VTD_SIP_EVNT_MAX)
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + VTD_SIP_CTRL);
+ v &= ~VTD_SIP_CTRL_EVNT;
+ v |= FIELD_PREP(VTD_SIP_CTRL_EVNT, event);
+ writeq(v, base + VTD_SIP_CTRL);
+
+ if (readq_poll_timeout(base + VTD_SIP_CNTR, v,
+ FIELD_GET(VTD_SIP_CNTR_EVNT, v) == event,
+ 1, PERF_TIMEOUT)) {
+ dev_err(&feature->pdev->dev, "timeout, unmatched VTd SIP event type in counter registers\n");
+ mutex_unlock(&pdata->lock);
+ return -ETIMEDOUT;
+ }
+
+ v = readq(base + VTD_SIP_CNTR);
+ count = FIELD_GET(VTD_SIP_CNTR_EVNT_CNTR, v);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)count);
+}
+
+#define VTD_SIP_SHOW(name, event) \
+static ssize_t name##_show(struct perf_object *pobj, char *buf) \
+{ \
+ return read_iommu_sip_counter(pobj, event, buf); \
+} \
+static PERF_OBJ_ATTR_RO(name)
+
+VTD_SIP_SHOW(iotlb_4k_hit, VTD_SIP_EVNT_IOTLB_4K_HIT);
+VTD_SIP_SHOW(iotlb_2m_hit, VTD_SIP_EVNT_IOTLB_2M_HIT);
+VTD_SIP_SHOW(iotlb_1g_hit, VTD_SIP_EVNT_IOTLB_1G_HIT);
+VTD_SIP_SHOW(slpwc_l3_hit, VTD_SIP_EVNT_SLPWC_L3_HIT);
+VTD_SIP_SHOW(slpwc_l4_hit, VTD_SIP_EVNT_SLPWC_L4_HIT);
+VTD_SIP_SHOW(rcc_hit, VTD_SIP_EVNT_RCC_HIT);
+VTD_SIP_SHOW(iotlb_4k_miss, VTD_SIP_EVNT_IOTLB_4K_MISS);
+VTD_SIP_SHOW(iotlb_2m_miss, VTD_SIP_EVNT_IOTLB_2M_MISS);
+VTD_SIP_SHOW(iotlb_1g_miss, VTD_SIP_EVNT_IOTLB_1G_MISS);
+VTD_SIP_SHOW(slpwc_l3_miss, VTD_SIP_EVNT_SLPWC_L3_MISS);
+VTD_SIP_SHOW(slpwc_l4_miss, VTD_SIP_EVNT_SLPWC_L4_MISS);
+VTD_SIP_SHOW(rcc_miss, VTD_SIP_EVNT_RCC_MISS);
+
+static struct attribute *iommu_sip_attrs[] = {
+ &perf_obj_attr_iotlb_4k_hit.attr,
+ &perf_obj_attr_iotlb_2m_hit.attr,
+ &perf_obj_attr_iotlb_1g_hit.attr,
+ &perf_obj_attr_slpwc_l3_hit.attr,
+ &perf_obj_attr_slpwc_l4_hit.attr,
+ &perf_obj_attr_rcc_hit.attr,
+ &perf_obj_attr_iotlb_4k_miss.attr,
+ &perf_obj_attr_iotlb_2m_miss.attr,
+ &perf_obj_attr_iotlb_1g_miss.attr,
+ &perf_obj_attr_slpwc_l3_miss.attr,
+ &perf_obj_attr_slpwc_l4_miss.attr,
+ &perf_obj_attr_rcc_miss.attr,
+ NULL,
+};
+
+static struct attribute_group iommu_sip_attr_group = {
+ .attrs = iommu_sip_attrs,
+};
+
+static const struct attribute_group *iommu_top_attr_groups[] = {
+ &iommu_top_attr_group,
+ &iommu_sip_attr_group,
+ NULL,
+};
+
+static ssize_t read_iommu_counter(struct perf_object *pobj, u8 event, char *buf)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ u64 v, count;
+
+ if (event > VTD_EVNT_MAX)
+ return -EINVAL;
+
+ event += pobj->id;
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + VTD_CTRL);
+ v &= ~VTD_CTRL_EVNT;
+ v |= FIELD_PREP(VTD_CTRL_EVNT, event);
+ writeq(v, base + VTD_CTRL);
+
+ if (readq_poll_timeout(base + VTD_CNTR, v,
+ FIELD_GET(VTD_CNTR_EVNT, v) == event, 1,
+ PERF_TIMEOUT)) {
+ dev_err(&feature->pdev->dev, "timeout, unmatched VTd event type in counter registers\n");
+ mutex_unlock(&pdata->lock);
+ return -ETIMEDOUT;
+ }
+
+ v = readq(base + VTD_CNTR);
+ count = FIELD_GET(VTD_CNTR_EVNT_CNTR, v);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)count);
+}
+
+#define VTD_SHOW(name, base_event) \
+static ssize_t name##_show(struct perf_object *pobj, char *buf) \
+{ \
+ return read_iommu_counter(pobj, base_event, buf); \
+} \
+static PERF_OBJ_ATTR_RO(name)
+
+VTD_SHOW(read_transaction, VTD_EVNT_AFU_MEM_RD_TRANS);
+VTD_SHOW(write_transaction, VTD_EVNT_AFU_MEM_WR_TRANS);
+VTD_SHOW(devtlb_read_hit, VTD_EVNT_AFU_DEVTLB_RD_HIT);
+VTD_SHOW(devtlb_write_hit, VTD_EVNT_AFU_DEVTLB_WR_HIT);
+VTD_SHOW(devtlb_4k_fill, VTD_EVNT_DEVTLB_4K_FILL);
+VTD_SHOW(devtlb_2m_fill, VTD_EVNT_DEVTLB_2M_FILL);
+VTD_SHOW(devtlb_1g_fill, VTD_EVNT_DEVTLB_1G_FILL);
+
+static struct attribute *iommu_attrs[] = {
+ &perf_obj_attr_read_transaction.attr,
+ &perf_obj_attr_write_transaction.attr,
+ &perf_obj_attr_devtlb_read_hit.attr,
+ &perf_obj_attr_devtlb_write_hit.attr,
+ &perf_obj_attr_devtlb_4k_fill.attr,
+ &perf_obj_attr_devtlb_2m_fill.attr,
+ &perf_obj_attr_devtlb_1g_fill.attr,
+ NULL,
+};
+
+static struct attribute_group iommu_attr_group = {
+ .attrs = iommu_attrs,
+};
+
+static const struct attribute_group *iommu_attr_groups[] = {
+ &iommu_attr_group,
+ NULL,
+};
+
+#define PERF_MAX_PORT_NUM 1
+
+static int create_perf_iommu_obj(struct perf_object *perf_dev)
+{
+ struct dfl_feature *feature = perf_dev->feature;
+ struct device *dev = &feature->pdev->dev;
+ struct perf_object *pobj, *obj;
+ void __iomem *base;
+ u64 v;
+ int i;
+
+ /* nothing to create if iommu is not supported on this device. */
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+ v = readq(base + FME_HDR_CAP);
+ if (!FIELD_GET(FME_CAP_IOMMU_AVL, v))
+ return 0;
+
+ pobj = create_perf_obj(feature, &perf_dev->kobj, PERF_OBJ_ROOT_ID,
+ iommu_top_attr_groups, "iommu");
+ if (IS_ERR(pobj))
+ return PTR_ERR(pobj);
+
+ list_add(&pobj->node, &perf_dev->children);
+
+ for (i = 0; i < PERF_MAX_PORT_NUM; i++) {
+ obj = create_perf_obj(feature, &pobj->kobj, i,
+ iommu_attr_groups, "afu");
+ if (IS_ERR(obj))
+ return PTR_ERR(obj);
+
+ list_add(&obj->node, &pobj->children);
+ }
+
+ return 0;
+}
+
+/*
+ * Counter Sysfs Interfaces for Fabric
+ */
+static bool fabric_pobj_is_enabled(struct perf_object *pobj)
+{
+ struct dfl_feature *feature = pobj->feature;
+ void __iomem *base = feature->ioaddr;
+ u64 v;
+
+ v = readq(base + FAB_CTRL);
+
+ if (FIELD_GET(FAB_PORT_FILTER, v) == FAB_PORT_FILTER_DISABLE)
+ return pobj->id == PERF_OBJ_ROOT_ID;
+
+ return pobj->id == FIELD_GET(FAB_PORT_ID, v);
+}
+
+static ssize_t read_fabric_counter(struct perf_object *pobj,
+ u8 event, char *buf)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ u64 v, count = 0;
+
+ if (event > FAB_EVNT_MAX)
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ /* if it is disabled, force the counter to return zero. */
+ if (!fabric_pobj_is_enabled(pobj))
+ goto exit;
+
+ v = readq(base + FAB_CTRL);
+ v &= ~FAB_CTRL_EVNT;
+ v |= FIELD_PREP(FAB_CTRL_EVNT, event);
+ writeq(v, base + FAB_CTRL);
+
+ if (readq_poll_timeout(base + FAB_CNTR, v,
+ FIELD_GET(FAB_CNTR_EVNT, v) == event,
+ 1, PERF_TIMEOUT)) {
+ dev_err(&feature->pdev->dev, "timeout, unmatched fab event type in counter registers.\n");
+ mutex_unlock(&pdata->lock);
+ return -ETIMEDOUT;
+ }
+
+ v = readq(base + FAB_CNTR);
+ count = FIELD_GET(FAB_CNTR_EVNT_CNTR, v);
+exit:
+ mutex_unlock(&pdata->lock);
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)count);
+}
+
+#define FAB_SHOW(name, event) \
+static ssize_t name##_show(struct perf_object *pobj, char *buf) \
+{ \
+ return read_fabric_counter(pobj, event, buf); \
+} \
+static PERF_OBJ_ATTR_RO(name)
+
+FAB_SHOW(pcie0_read, FAB_EVNT_PCIE0_RD);
+FAB_SHOW(pcie0_write, FAB_EVNT_PCIE0_WR);
+FAB_SHOW(pcie1_read, FAB_EVNT_PCIE1_RD);
+FAB_SHOW(pcie1_write, FAB_EVNT_PCIE1_WR);
+FAB_SHOW(upi_read, FAB_EVNT_UPI_RD);
+FAB_SHOW(upi_write, FAB_EVNT_UPI_WR);
+FAB_SHOW(mmio_read, FAB_EVNT_MMIO_RD);
+FAB_SHOW(mmio_write, FAB_EVNT_MMIO_WR);
+
+static ssize_t fab_enable_show(struct perf_object *pobj, char *buf)
+{
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)!!fabric_pobj_is_enabled(pobj));
+}
+
+/*
+ * Enabling the event counter for one port (or for all ports) automatically
+ * disables whichever fabric event counter was enabled before.
+ */
+static ssize_t fab_enable_store(struct perf_object *pobj,
+ const char *buf, size_t n)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ bool state;
+ u64 v;
+
+ if (strtobool(buf, &state) || !state)
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ /* if it is already enabled. */
+ if (fabric_pobj_is_enabled(pobj))
+ return n;
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + FAB_CTRL);
+ v &= ~(FAB_PORT_FILTER | FAB_PORT_ID);
+
+ if (pobj->id == PERF_OBJ_ROOT_ID) {
+ v |= FIELD_PREP(FAB_PORT_FILTER, FAB_PORT_FILTER_DISABLE);
+ } else {
+ v |= FIELD_PREP(FAB_PORT_FILTER, FAB_PORT_FILTER_ENABLE);
+ v |= FIELD_PREP(FAB_PORT_ID, pobj->id);
+ }
+ writeq(v, base + FAB_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return n;
+}
+static PERF_OBJ_ATTR(fab_enable, enable, 0644,
+ fab_enable_show, fab_enable_store);
+
+static struct attribute *fabric_attrs[] = {
+ &perf_obj_attr_pcie0_read.attr,
+ &perf_obj_attr_pcie0_write.attr,
+ &perf_obj_attr_pcie1_read.attr,
+ &perf_obj_attr_pcie1_write.attr,
+ &perf_obj_attr_upi_read.attr,
+ &perf_obj_attr_upi_write.attr,
+ &perf_obj_attr_mmio_read.attr,
+ &perf_obj_attr_mmio_write.attr,
+ &perf_obj_attr_fab_enable.attr,
+ NULL,
+};
+
+static struct attribute_group fabric_attr_group = {
+ .attrs = fabric_attrs,
+};
+
+static const struct attribute_group *fabric_attr_groups[] = {
+ &fabric_attr_group,
+ NULL,
+};
+
+static ssize_t fab_freeze_show(struct perf_object *pobj, char *buf)
+{
+ void __iomem *base = pobj->feature->ioaddr;
+ u64 v;
+
+ v = readq(base + FAB_CTRL);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(FAB_FREEZE_CNTR, v));
+}
+
+static ssize_t fab_freeze_store(struct perf_object *pobj,
+ const char *buf, size_t n)
+{
+ struct dfl_feature *feature = pobj->feature;
+ struct dfl_feature_platform_data *pdata;
+ void __iomem *base = feature->ioaddr;
+ bool state;
+ u64 v;
+
+ if (strtobool(buf, &state))
+ return -EINVAL;
+
+ pdata = dev_get_platdata(&feature->pdev->dev);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + FAB_CTRL);
+ v &= ~FAB_FREEZE_CNTR;
+ v |= FIELD_PREP(FAB_FREEZE_CNTR, state ? 1 : 0);
+ writeq(v, base + FAB_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return n;
+}
+static PERF_OBJ_ATTR(fab_freeze, freeze, 0644,
+ fab_freeze_show, fab_freeze_store);
+
+static struct attribute *fabric_top_attrs[] = {
+ &perf_obj_attr_fab_freeze.attr,
+ NULL,
+};
+
+static struct attribute_group fabric_top_attr_group = {
+ .attrs = fabric_top_attrs,
+};
+
+static const struct attribute_group *fabric_top_attr_groups[] = {
+ &fabric_attr_group,
+ &fabric_top_attr_group,
+ NULL,
+};
+
+static int create_perf_fabric_obj(struct perf_object *perf_dev)
+{
+ struct perf_object *pobj, *obj;
+ int i;
+
+ pobj = create_perf_obj(perf_dev->feature, &perf_dev->kobj,
+ PERF_OBJ_ROOT_ID, fabric_top_attr_groups,
+ "fabric");
+ if (IS_ERR(pobj))
+ return PTR_ERR(pobj);
+
+ list_add(&pobj->node, &perf_dev->children);
+
+ for (i = 0; i < PERF_MAX_PORT_NUM; i++) {
+ obj = create_perf_obj(perf_dev->feature, &pobj->kobj, i,
+ fabric_attr_groups, "port");
+ if (IS_ERR(obj))
+ return PTR_ERR(obj);
+
+ list_add(&obj->node, &pobj->children);
+ }
+
+ return 0;
+}
+
+static int fme_perf_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct perf_object *perf_dev;
+ int ret;
+
+ perf_dev = create_perf_dev(feature);
+ if (IS_ERR(perf_dev))
+ return PTR_ERR(perf_dev);
+
+ ret = create_perf_fabric_obj(perf_dev);
+ if (ret)
+ goto done;
+
+ if (feature->id == FME_FEATURE_ID_GLOBAL_IPERF) {
+ /*
+ * Cache and IOMMU(VT-D) performance counters are not supported
+ * on discrete solutions, e.g. the Intel Programmable Acceleration
+ * Card based on PCIe.
+ */
+ ret = create_perf_cache_obj(perf_dev);
+ if (ret)
+ goto done;
+
+ ret = create_perf_iommu_obj(perf_dev);
+ if (ret)
+ goto done;
+ }
+
+ feature->priv = perf_dev;
+ return 0;
+
+done:
+ destroy_perf_obj(perf_dev);
+ return ret;
+}
+
+static void fme_perf_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct perf_object *perf_dev = feature->priv;
+
+ destroy_perf_obj(perf_dev);
+}
+
+const struct dfl_feature_id fme_perf_id_table[] = {
+ {.id = FME_FEATURE_ID_GLOBAL_IPERF,},
+ {.id = FME_FEATURE_ID_GLOBAL_DPERF,},
+ {0,}
+};
+
+const struct dfl_feature_ops fme_perf_ops = {
+ .init = fme_perf_init,
+ .uinit = fme_perf_uinit,
+};
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index 5fbe3f5..dc71048 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -39,5 +39,7 @@ extern const struct dfl_feature_ops fme_pr_mgmt_ops;
extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
extern const struct dfl_feature_ops fme_global_err_ops;
extern const struct dfl_feature_id fme_global_err_id_table[];
+extern const struct dfl_feature_ops fme_perf_ops;
+extern const struct dfl_feature_id fme_perf_id_table[];
#endif /* __DFL_FME_H */
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 65f91ef..637692a 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -507,6 +507,7 @@ static int build_info_commit_dev(struct build_feature_devs_info *binfo)
struct dfl_feature *feature = &pdata->features[index];
/* save resource information for each feature */
+ feature->pdev = fdev;
feature->id = finfo->fid;
feature->resource_index = index;
feature->ioaddr = finfo->ioaddr;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 6c32080..bf23436 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -191,6 +191,7 @@ struct dfl_feature_driver {
/**
* struct dfl_feature - sub feature of the feature devices
*
+ * @pdev: parent platform device.
* @id: sub feature id.
* @resource_index: each sub feature has one mmio resource for its registers.
* this index is used to find its mmio resource from the
@@ -200,6 +201,7 @@ struct dfl_feature_driver {
* @priv: priv data of this feature.
*/
struct dfl_feature {
+ struct platform_device *pdev;
u64 id;
int resource_index;
void __iomem *ioaddr;
--
2.7.4
This patch adds three read-only sysfs interfaces to the FPGA Management
Engine (FME) block that expose its capabilities: cache_size,
fabric_version and socket_id.
Signed-off-by: Luwei Kang <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
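A minimal userspace sketch of how the three attributes described above
could be consumed. It assumes only what this patch documents: the files
live under /sys/bus/platform/devices/dfl-fme.0/ and each one prints a
single unsigned decimal value. The helper name read_fme_attr() is
hypothetical and is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of this patch): read one numeric sysfs
 * attribute such as cache_size. Returns 0 on success or a negative errno.
 */
static int read_fme_attr(const char *path, unsigned long long *val)
{
	char line[64];
	char *end;
	FILE *f;

	assert(path && val);

	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	/* The three attributes are printed with "%u\n", so base 10 is enough. */
	errno = 0;
	*val = strtoull(line, &end, 10);
	if (errno || end == line)
		return -EINVAL;

	return 0;
}
```

A caller would pass, for example,
"/sys/bus/platform/devices/dfl-fme.0/socket_id" as the path and treat a
negative return value as "attribute not available on this device".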
Documentation/ABI/testing/sysfs-platform-dfl-fme | 23 ++++++++++++
drivers/fpga/dfl-fme-main.c | 48 ++++++++++++++++++++++++
2 files changed, 71 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 8fa4feb..b8327e9 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -21,3 +21,26 @@ Contact: Wu Hao <[email protected]>
Description: Read-only. It returns Bitstream (static FPGA region) meta
data, which includes the synthesis date, seed and other
information of this static FPGA region.
+
+What: /sys/bus/platform/devices/dfl-fme.0/cache_size
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns the cache size of this FPGA device.
+
+What: /sys/bus/platform/devices/dfl-fme.0/fabric_version
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns the fabric version of this FPGA device.
+ Userspace applications use this information to select the
+ best data channels for a given fabric design.
+
+What: /sys/bus/platform/devices/dfl-fme.0/socket_id
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns the socket_id, which indicates which
+ socket this FPGA belongs to. Only valid for integrated
+ solutions. Users need this information only when the standard
+ NUMA node can't provide correct information.
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 38c6342..8339ee8 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -75,10 +75,58 @@ static ssize_t bitstream_metadata_show(struct device *dev,
}
static DEVICE_ATTR_RO(bitstream_metadata);
+static ssize_t cache_size_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_CAP);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(FME_CAP_CACHE_SIZE, v));
+}
+static DEVICE_ATTR_RO(cache_size);
+
+static ssize_t fabric_version_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_CAP);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(FME_CAP_FABRIC_VERID, v));
+}
+static DEVICE_ATTR_RO(fabric_version);
+
+static ssize_t socket_id_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_CAP);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(FME_CAP_SOCKET_ID, v));
+}
+static DEVICE_ATTR_RO(socket_id);
+
static const struct attribute *fme_hdr_attrs[] = {
&dev_attr_ports_num.attr,
&dev_attr_bitstream_id.attr,
&dev_attr_bitstream_metadata.attr,
+ &dev_attr_cache_size.attr,
+ &dev_attr_fabric_version.attr,
+ &dev_attr_socket_id.attr,
NULL,
};
--
2.7.4
This patch introduces userclock sysfs interfaces for the AFU, which users
can use to configure the AFU clock.
Please note that this only works for the port header feature with
revision 0. In later revisions, the userclock setting is moved to a
separate private feature, so a revision sysfs interface is also exposed
to userspace applications for this purpose.
Signed-off-by: Ananda Ravuri <[email protected]>
Signed-off-by: Russ Weight <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
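A minimal userspace sketch of the flow described above, under stated
assumptions: the caller checks the revision attribute first, writes a raw
64-bit command word to userclk_freqcmd, then reads the raw status word
from userclk_freqsts. The helper names are hypothetical, and the encoding
of the command and status words is not defined by this patch, so they are
treated as opaque values here. Note that revision is printed with "%x"
(hex without a prefix), while the status files are printed as "0x%llx".

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one attribute as a number in the given base; returns 0 or -errno. */
static int attr_read(const char *path, int base, unsigned long long *val)
{
	char line[64], *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	*val = strtoull(line, &end, base);
	return (errno || end == line) ? -EINVAL : 0;
}

/* Write a raw command word; the driver parses it with kstrtou64(buf, 0, ...). */
static int attr_write(const char *path, unsigned long long cmd)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -errno;
	if (fprintf(f, "0x%llx\n", cmd) < 0)
		ret = -EIO;
	if (fclose(f) != 0 && !ret)
		ret = -errno;
	return ret;
}

/*
 * Hypothetical flow: refuse unless the port header revision is 0, then
 * write the command and read the raw status word back.
 */
static int userclk_issue(const char *rev_path, const char *cmd_path,
			 const char *sts_path, unsigned long long cmd,
			 unsigned long long *sts)
{
	unsigned long long rev;
	int ret;

	assert(sts != NULL);

	ret = attr_read(rev_path, 16, &rev); /* "%x": hex, no prefix */
	if (ret)
		return ret;
	if (rev != 0)
		return -ENOTSUP;

	ret = attr_write(cmd_path, cmd);
	if (ret)
		return ret;

	return attr_read(sts_path, 0, sts); /* base 0 accepts "0x..." */
}
```

On real hardware the three paths would be the revision, userclk_freqcmd
and userclk_freqsts files under /sys/bus/platform/devices/dfl-port.0/;
the same pattern would apply to the userclk_freqcntrcmd and
userclk_freqcntrsts pair.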
Documentation/ABI/testing/sysfs-platform-dfl-port | 35 +++++++
drivers/fpga/dfl-afu-main.c | 114 +++++++++++++++++++++-
drivers/fpga/dfl.h | 4 +
3 files changed, 152 insertions(+), 1 deletion(-)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
index f0c4e92..f611e47 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -44,3 +44,38 @@ Contact: Wu Hao <[email protected]>
Description: Read-write. Read and set AFU latency tolerance reporting value.
Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
to 0 if it is latency sensitive.
+
+What: /sys/bus/platform/devices/dfl-port.0/revision
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the revision of port header
+ feature.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcmd
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Write-only. Write a command to this interface to set the
+ userclock of the AFU.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqsts
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the status of issued command
+ to userclk_freqcmd.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrcmd
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Write-only. Write a command to this interface to set the
+ userclock counter.
+
+What: /sys/bus/platform/devices/dfl-port.0/userclk_freqcntrsts
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the status of issued command
+ to userclk_freqcntrcmd.
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 2ffec06..82fd80a 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -144,6 +144,17 @@ id_show(struct device *dev, struct device_attribute *attr, char *buf)
static DEVICE_ATTR_RO(id);
static ssize_t
+revision_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ return scnprintf(buf, PAGE_SIZE, "%x\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
+static ssize_t
ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
{
struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
@@ -282,6 +293,7 @@ static DEVICE_ATTR_RO(power_state);
static const struct attribute *port_hdr_attrs[] = {
&dev_attr_id.attr,
+ &dev_attr_revision.attr,
&dev_attr_ltr.attr,
&dev_attr_ap1_event.attr,
&dev_attr_ap2_event.attr,
@@ -289,14 +301,113 @@ static const struct attribute *port_hdr_attrs[] = {
NULL,
};
+static ssize_t
+userclk_freqcmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ u64 userclk_freq_cmd;
+ void __iomem *base;
+
+ if (kstrtou64(buf, 0, &userclk_freq_cmd))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(userclk_freq_cmd, base + PORT_HDR_USRCLK_CMD0);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_WO(userclk_freqcmd);
+
+static ssize_t
+userclk_freqcntrcmd_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ u64 userclk_freqcntr_cmd;
+ void __iomem *base;
+
+ if (kstrtou64(buf, 0, &userclk_freqcntr_cmd))
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(userclk_freqcntr_cmd, base + PORT_HDR_USRCLK_CMD1);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_WO(userclk_freqcntrcmd);
+
+static ssize_t
+userclk_freqsts_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ u64 userclk_freqsts;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ userclk_freqsts = readq(base + PORT_HDR_USRCLK_STS0);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)userclk_freqsts);
+}
+static DEVICE_ATTR_RO(userclk_freqsts);
+
+static ssize_t
+userclk_freqcntrsts_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ u64 userclk_freqcntrsts;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ userclk_freqcntrsts = readq(base + PORT_HDR_USRCLK_STS1);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)userclk_freqcntrsts);
+}
+static DEVICE_ATTR_RO(userclk_freqcntrsts);
+
+static const struct attribute *port_hdr_userclk_attrs[] = {
+ &dev_attr_userclk_freqcmd.attr,
+ &dev_attr_userclk_freqcntrcmd.attr,
+ &dev_attr_userclk_freqsts.attr,
+ &dev_attr_userclk_freqcntrsts.attr,
+ NULL,
+};
+
static int port_hdr_init(struct platform_device *pdev,
struct dfl_feature *feature)
{
+ int ret;
+
dev_dbg(&pdev->dev, "PORT HDR Init.\n");
port_reset(pdev);
- return sysfs_create_files(&pdev->dev.kobj, port_hdr_attrs);
+ ret = sysfs_create_files(&pdev->dev.kobj, port_hdr_attrs);
+ if (ret)
+ return ret;
+
+ /*
+ * if revision > 0, the userclock is moved from the port hdr register
+ * region to a separate private feature.
+ */
+ if (dfl_feature_revision(feature->ioaddr) > 0)
+ return 0;
+
+ ret = sysfs_create_files(&pdev->dev.kobj, port_hdr_userclk_attrs);
+ if (ret)
+ sysfs_remove_files(&pdev->dev.kobj, port_hdr_attrs);
+
+ return ret;
}
static void port_hdr_uinit(struct platform_device *pdev,
@@ -304,6 +415,7 @@ static void port_hdr_uinit(struct platform_device *pdev,
{
dev_dbg(&pdev->dev, "PORT HDR UInit.\n");
+ sysfs_remove_files(&pdev->dev.kobj, port_hdr_userclk_attrs);
sysfs_remove_files(&pdev->dev.kobj, port_hdr_attrs);
}
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 1525098..3c5dc3a 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -120,6 +120,10 @@
#define PORT_HDR_CAP 0x30
#define PORT_HDR_CTRL 0x38
#define PORT_HDR_STS 0x40
+#define PORT_HDR_USRCLK_CMD0 0x50
+#define PORT_HDR_USRCLK_CMD1 0x58
+#define PORT_HDR_USRCLK_STS0 0x60
+#define PORT_HDR_USRCLK_STS1 0x68
/* Port Capability Register Bitfield */
#define PORT_CAP_PORT_NUM GENMASK_ULL(1, 0) /* ID of this port */
--
2.7.4
To support virtualization via PCIe SRIOV, this patch adds two ioctls
under the FPGA Management Engine (FME) to release a port device and to
assign it back. To safely turn a Port from PF into VF and enable PCIe
SRIOV, the user must first invoke the PORT_RELEASE ioctl, which releases
the port to remove its userspace interfaces and then configures the
PF/VF access register in FME. After disabling SRIOV, the user must
invoke the PORT_ASSIGN ioctl to attach the port back to the PF.
Ioctl interfaces:
* DFL_FPGA_FME_PORT_RELEASE
Release the platform device of the given port. It deletes the port
platform device to remove the related userspace interfaces on the PF,
then configures the PF/VF access mode to VF.
* DFL_FPGA_FME_PORT_ASSIGN
Assign the platform device of the given port back to the PF. It
configures the PF/VF access mode to PF, then adds the port platform
device back to re-enable the related userspace interfaces on the PF.
Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-fme-main.c | 54 +++++++++++++++++++++
drivers/fpga/dfl.c | 107 +++++++++++++++++++++++++++++++++++++-----
drivers/fpga/dfl.h | 10 ++++
include/uapi/linux/fpga-dfl.h | 32 +++++++++++++
4 files changed, 191 insertions(+), 12 deletions(-)
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 076d74f..8b2a337 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -16,6 +16,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
+#include <linux/uaccess.h>
#include <linux/fpga-dfl.h>
#include "dfl.h"
@@ -105,9 +106,62 @@ static void fme_hdr_uinit(struct platform_device *pdev,
sysfs_remove_files(&pdev->dev.kobj, fme_hdr_attrs);
}
+static long fme_hdr_ioctl_release_port(struct dfl_feature_platform_data *pdata,
+ void __user *arg)
+{
+ struct dfl_fpga_cdev *cdev = pdata->dfl_cdev;
+ struct dfl_fpga_fme_port_release release;
+ unsigned long minsz;
+
+ minsz = offsetofend(struct dfl_fpga_fme_port_release, port_id);
+
+ if (copy_from_user(&release, arg, minsz))
+ return -EFAULT;
+
+ if (release.argsz < minsz || release.flags)
+ return -EINVAL;
+
+ return dfl_fpga_cdev_config_port(cdev, release.port_id, true);
+}
+
+static long fme_hdr_ioctl_assign_port(struct dfl_feature_platform_data *pdata,
+ void __user *arg)
+{
+ struct dfl_fpga_cdev *cdev = pdata->dfl_cdev;
+ struct dfl_fpga_fme_port_assign assign;
+ unsigned long minsz;
+
+ minsz = offsetofend(struct dfl_fpga_fme_port_assign, port_id);
+
+ if (copy_from_user(&assign, arg, minsz))
+ return -EFAULT;
+
+ if (assign.argsz < minsz || assign.flags)
+ return -EINVAL;
+
+ return dfl_fpga_cdev_config_port(cdev, assign.port_id, false);
+}
+
+static long fme_hdr_ioctl(struct platform_device *pdev,
+ struct dfl_feature *feature,
+ unsigned int cmd, unsigned long arg)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+
+ switch (cmd) {
+ case DFL_FPGA_FME_PORT_RELEASE:
+ return fme_hdr_ioctl_release_port(pdata, (void __user *)arg);
+ case DFL_FPGA_FME_PORT_ASSIGN:
+ return fme_hdr_ioctl_assign_port(pdata, (void __user *)arg);
+ }
+
+ return -ENODEV;
+}
+
static const struct dfl_feature_ops fme_hdr_ops = {
.init = fme_hdr_init,
.uinit = fme_hdr_uinit,
+ .ioctl = fme_hdr_ioctl,
};
static struct dfl_feature_driver fme_feature_drvs[] = {
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index 2c09e50..a6b6d38 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -224,16 +224,20 @@ EXPORT_SYMBOL_GPL(dfl_fpga_port_ops_del);
*/
int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id)
{
- struct dfl_fpga_port_ops *port_ops = dfl_fpga_port_ops_get(pdev);
- int port_id;
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+ struct dfl_fpga_port_ops *port_ops;
+
+ if (pdata->id != FEATURE_DEV_ID_UNUSED)
+ return pdata->id == *(int *)pport_id;
+ port_ops = dfl_fpga_port_ops_get(pdev);
if (!port_ops || !port_ops->get_id)
return 0;
- port_id = port_ops->get_id(pdev);
+ pdata->id = port_ops->get_id(pdev);
dfl_fpga_port_ops_put(port_ops);
- return port_id == *(int *)pport_id;
+ return pdata->id == *(int *)pport_id;
}
EXPORT_SYMBOL_GPL(dfl_fpga_check_port_id);
@@ -462,6 +466,7 @@ static int build_info_commit_dev(struct build_feature_devs_info *binfo)
pdata->dev = fdev;
pdata->num = binfo->feature_num;
pdata->dfl_cdev = binfo->cdev;
+ pdata->id = FEATURE_DEV_ID_UNUSED;
mutex_init(&pdata->lock);
/*
@@ -959,25 +964,27 @@ void dfl_fpga_feature_devs_remove(struct dfl_fpga_cdev *cdev)
{
struct dfl_feature_platform_data *pdata, *ptmp;
- remove_feature_devs(cdev);
-
mutex_lock(&cdev->lock);
- if (cdev->fme_dev) {
- /* the fme should be unregistered. */
- WARN_ON(device_is_registered(cdev->fme_dev));
+ if (cdev->fme_dev)
put_device(cdev->fme_dev);
- }
list_for_each_entry_safe(pdata, ptmp, &cdev->port_dev_list, node) {
struct platform_device *port_dev = pdata->dev;
- /* the port should be unregistered. */
- WARN_ON(device_is_registered(&port_dev->dev));
+ /* remove released ports */
+ if (!device_is_registered(&port_dev->dev)) {
+ dfl_id_free(feature_dev_id_type(port_dev),
+ port_dev->id);
+ platform_device_put(port_dev);
+ }
+
list_del(&pdata->node);
put_device(&port_dev->dev);
}
mutex_unlock(&cdev->lock);
+ remove_feature_devs(cdev);
+
fpga_region_unregister(cdev->region);
devm_kfree(cdev->parent, cdev);
}
@@ -1015,6 +1022,82 @@ __dfl_fpga_cdev_find_port(struct dfl_fpga_cdev *cdev, void *data,
}
EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_find_port);
+static int attach_port_dev(struct dfl_fpga_cdev *cdev, u32 port_id)
+{
+ struct platform_device *port_pdev;
+ int ret = -ENODEV;
+
+ mutex_lock(&cdev->lock);
+ port_pdev = __dfl_fpga_cdev_find_port(cdev, &port_id,
+ dfl_fpga_check_port_id);
+ if (!port_pdev)
+ goto unlock_exit;
+
+ if (device_is_registered(&port_pdev->dev)) {
+ ret = -EBUSY;
+ goto put_dev_exit;
+ }
+
+ ret = platform_device_add(port_pdev);
+ if (ret)
+ goto put_dev_exit;
+
+ dfl_feature_dev_use_end(dev_get_platdata(&port_pdev->dev));
+ cdev->released_port_num--;
+put_dev_exit:
+ put_device(&port_pdev->dev);
+unlock_exit:
+ mutex_unlock(&cdev->lock);
+ return ret;
+}
+
+static int detach_port_dev(struct dfl_fpga_cdev *cdev, u32 port_id)
+{
+ struct platform_device *port_pdev;
+ int ret = -ENODEV;
+
+ mutex_lock(&cdev->lock);
+ port_pdev = __dfl_fpga_cdev_find_port(cdev, &port_id,
+ dfl_fpga_check_port_id);
+ if (!port_pdev)
+ goto unlock_exit;
+
+ if (!device_is_registered(&port_pdev->dev)) {
+ ret = -EBUSY;
+ goto put_dev_exit;
+ }
+
+ ret = dfl_feature_dev_use_begin(dev_get_platdata(&port_pdev->dev));
+ if (ret)
+ goto put_dev_exit;
+
+ platform_device_del(port_pdev);
+ cdev->released_port_num++;
+put_dev_exit:
+ put_device(&port_pdev->dev);
+unlock_exit:
+ mutex_unlock(&cdev->lock);
+ return ret;
+}
+
+/**
+ * dfl_fpga_cdev_config_port - configure a port feature dev
+ * @cdev: parent container device.
+ * @port_id: id of the port feature device.
+ * @release: release port or assign port back.
+ *
+ * This function allows user to release port platform device or assign it back.
+ * e.g. to safely turn one port from PF into VF for PCI device SRIOV support,
+ * release port platform device is one necessary step.
+ */
+int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
+ u32 port_id, bool release)
+{
+ return release ? detach_port_dev(cdev, port_id) :
+ attach_port_dev(cdev, port_id);
+}
+EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
+
static int __init dfl_fpga_init(void)
{
int ret;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 8851c6c..63f39ab 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -183,6 +183,8 @@ struct dfl_feature {
#define DEV_STATUS_IN_USE 0
+#define FEATURE_DEV_ID_UNUSED (-1)
+
/**
* struct dfl_feature_platform_data - platform data for feature devices
*
@@ -191,6 +193,7 @@ struct dfl_feature {
* @cdev: cdev of feature dev.
* @dev: ptr to platform device linked with this platform data.
* @dfl_cdev: ptr to container device.
+ * @id: id used for this feature device.
* @disable_count: count for port disable.
* @num: number for sub features.
* @dev_status: dev status (e.g. DEV_STATUS_IN_USE).
@@ -203,6 +206,7 @@ struct dfl_feature_platform_data {
struct cdev cdev;
struct platform_device *dev;
struct dfl_fpga_cdev *dfl_cdev;
+ int id;
unsigned int disable_count;
unsigned long dev_status;
void *private;
@@ -378,6 +382,7 @@ void dfl_fpga_enum_info_free(struct dfl_fpga_enum_info *info);
* @fme_dev: FME feature device under this container device.
* @lock: mutex lock to protect the port device list.
* @port_dev_list: list of all port feature devices under this container device.
+ * @released_port_num: released port number under this container device.
*/
struct dfl_fpga_cdev {
struct device *parent;
@@ -385,6 +390,7 @@ struct dfl_fpga_cdev {
struct device *fme_dev;
struct mutex lock;
struct list_head port_dev_list;
+ int released_port_num;
};
struct dfl_fpga_cdev *
@@ -412,4 +418,8 @@ dfl_fpga_cdev_find_port(struct dfl_fpga_cdev *cdev, void *data,
return pdev;
}
+
+int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
+ u32 port_id, bool release);
+
#endif /* __FPGA_DFL_H */
diff --git a/include/uapi/linux/fpga-dfl.h b/include/uapi/linux/fpga-dfl.h
index 2e324e5..e9a00e0 100644
--- a/include/uapi/linux/fpga-dfl.h
+++ b/include/uapi/linux/fpga-dfl.h
@@ -176,4 +176,36 @@ struct dfl_fpga_fme_port_pr {
#define DFL_FPGA_FME_PORT_PR _IO(DFL_FPGA_MAGIC, DFL_FME_BASE + 0)
+/**
+ * DFL_FPGA_FME_PORT_RELEASE - _IOW(DFL_FPGA_MAGIC, DFL_FME_BASE + 1,
+ * struct dfl_fpga_fme_port_release)
+ *
+ * Driver releases the port per Port ID provided by caller.
+ * Return: 0 on success, -errno on failure.
+ */
+struct dfl_fpga_fme_port_release {
+ /* Input */
+ __u32 argsz; /* Structure length */
+ __u32 flags; /* Zero for now */
+ __u32 port_id;
+};
+
+#define DFL_FPGA_FME_PORT_RELEASE _IO(DFL_FPGA_MAGIC, DFL_FME_BASE + 1)
+
+/**
+ * DFL_FPGA_FME_PORT_ASSIGN - _IOW(DFL_FPGA_MAGIC, DFL_FME_BASE + 2,
+ * struct dfl_fpga_fme_port_assign)
+ *
+ * Driver assigns the port back per Port ID provided by caller.
+ * Return: 0 on success, -errno on failure.
+ */
+struct dfl_fpga_fme_port_assign {
+ /* Input */
+ __u32 argsz; /* Structure length */
+ __u32 flags; /* Zero for now */
+ __u32 port_id;
+};
+
+#define DFL_FPGA_FME_PORT_ASSIGN _IO(DFL_FPGA_MAGIC, DFL_FME_BASE + 2)
+
#endif /* _UAPI_LINUX_FPGA_DFL_H */
--
2.7.4
This patch introduces more sysfs interfaces for the Accelerated
Function Unit (AFU). These interfaces allow users to read the
current AFU Power State (APx), read and clear the sticky AFU Power
(APx) events that record a transient APx state, and manage the
AFU's LTR (latency tolerance reporting).
Signed-off-by: Ananda Ravuri <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
Documentation/ABI/testing/sysfs-platform-dfl-port | 30 +++++
drivers/fpga/dfl-afu-main.c | 144 ++++++++++++++++++++++
drivers/fpga/dfl.h | 11 ++
3 files changed, 185 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
index 6a92dda..f0c4e92 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -14,3 +14,33 @@ Description: Read-only. User can program different PR bitstreams to FPGA
Accelerator Function Unit (AFU) for different functions. It
returns uuid which could be used to identify which PR bitstream
is programmed in this AFU.
+
+What: /sys/bus/platform/devices/dfl-port.0/power_state
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It reports the APx (AFU Power) state. Each APx
+ state means a different throttling level. Reading this file
+ returns "0" - Normal / "1" - AP1 / "2" - AP2 / "6" - AP6.
+
+What: /sys/bus/platform/devices/dfl-port.0/ap1_event
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-write. Read to get, or write 1 to clear, the AP1 (AFU
+ Power State 1) event. It indicates a transient AP1 state.
+
+What: /sys/bus/platform/devices/dfl-port.0/ap2_event
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-write. Read to get, or write 1 to clear, the AP2 (AFU
+ Power State 2) event. It indicates a transient AP2 state.
+
+What: /sys/bus/platform/devices/dfl-port.0/ltr
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-write. Read and set AFU latency tolerance reporting value.
+ Set ltr to 1 if the AFU can tolerate latency >= 40us or set it
+ to 0 if it is latency sensitive.
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 02baa6a..2ffec06 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -21,6 +21,8 @@
#include "dfl-afu.h"
+#define DRV_VERSION "0.8"
+
/**
* port_enable - enable a port
* @pdev: port platform device.
@@ -141,8 +143,149 @@ id_show(struct device *dev, struct device_attribute *attr, char *buf)
}
static DEVICE_ATTR_RO(id);
+static ssize_t
+ltr_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "%x\n",
+ (u8)FIELD_GET(PORT_CTRL_LATENCY, v));
+}
+
+static ssize_t
+ltr_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 ltr;
+ u64 v;
+
+ if (kstrtou8(buf, 0, &ltr) || ltr > 1)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_CTRL);
+ v &= ~PORT_CTRL_LATENCY;
+ v |= FIELD_PREP(PORT_CTRL_LATENCY, ltr);
+ writeq(v, base + PORT_HDR_CTRL);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(ltr);
+
+static ssize_t
+ap1_event_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "%x\n",
+ (u8)FIELD_GET(PORT_STS_AP1_EVT, v));
+}
+
+static ssize_t
+ap1_event_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 ap1_event;
+
+ if (kstrtou8(buf, 0, &ap1_event) || ap1_event != 1)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(PORT_STS_AP1_EVT, base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(ap1_event);
+
+static ssize_t
+ap2_event_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "%x\n",
+ (u8)FIELD_GET(PORT_STS_AP2_EVT, v));
+}
+
+static ssize_t
+ap2_event_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 ap2_event;
+
+ if (kstrtou8(buf, 0, &ap2_event) || ap2_event != 1)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ writeq(PORT_STS_AP2_EVT, base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(ap2_event);
+
+static ssize_t
+power_state_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + PORT_HDR_STS);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%x\n",
+ (u8)FIELD_GET(PORT_STS_PWR_STATE, v));
+}
+static DEVICE_ATTR_RO(power_state);
+
static const struct attribute *port_hdr_attrs[] = {
&dev_attr_id.attr,
+ &dev_attr_ltr.attr,
+ &dev_attr_ap1_event.attr,
+ &dev_attr_ap2_event.attr,
+ &dev_attr_power_state.attr,
NULL,
};
@@ -634,3 +777,4 @@ MODULE_DESCRIPTION("FPGA Accelerated Function Unit driver");
MODULE_AUTHOR("Intel Corporation");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("platform:dfl-port");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 1350e8e..1525098 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -119,6 +119,7 @@
#define PORT_HDR_NEXT_AFU NEXT_AFU
#define PORT_HDR_CAP 0x30
#define PORT_HDR_CTRL 0x38
+#define PORT_HDR_STS 0x40
/* Port Capability Register Bitfield */
#define PORT_CAP_PORT_NUM GENMASK_ULL(1, 0) /* ID of this port */
@@ -130,6 +131,16 @@
/* Latency tolerance reporting. '1' >= 40us, '0' < 40us.*/
#define PORT_CTRL_LATENCY BIT_ULL(2)
#define PORT_CTRL_SFTRST_ACK BIT_ULL(4) /* HW ack for reset */
+
+/* Port Status Register Bitfield */
+#define PORT_STS_AP2_EVT BIT_ULL(13) /* AP2 event detected */
+#define PORT_STS_AP1_EVT BIT_ULL(12) /* AP1 event detected */
+#define PORT_STS_PWR_STATE GENMASK_ULL(11, 8) /* AFU power states */
+#define PORT_STS_PWR_STATE_NORM 0
+#define PORT_STS_PWR_STATE_AP1 1 /* 50% throttling */
+#define PORT_STS_PWR_STATE_AP2 2 /* 90% throttling */
+#define PORT_STS_PWR_STATE_AP6 6 /* 100% throttling */
+
/**
* struct dfl_fpga_port_ops - port ops
*
--
2.7.4
This patch enables standard SRIOV support. It allows the user to
enable SRIOV (and VFs), and then either pass the accelerators (VFs)
through to a virtual machine or use the VFs directly on the host.
Signed-off-by: Zhang Yi Z <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-pci.c | 40 ++++++++++++++++++++++++++++++++++++++++
drivers/fpga/dfl.c | 41 +++++++++++++++++++++++++++++++++++++++++
drivers/fpga/dfl.h | 1 +
3 files changed, 82 insertions(+)
diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c
index 66b5720..2fa571b 100644
--- a/drivers/fpga/dfl-pci.c
+++ b/drivers/fpga/dfl-pci.c
@@ -223,8 +223,46 @@ int cci_pci_probe(struct pci_dev *pcidev, const struct pci_device_id *pcidevid)
return ret;
}
+static int cci_pci_sriov_configure(struct pci_dev *pcidev, int num_vfs)
+{
+ struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
+ struct dfl_fpga_cdev *cdev = drvdata->cdev;
+ int ret = 0;
+
+ mutex_lock(&cdev->lock);
+
+ if (!num_vfs) {
+ /*
+ * disable SRIOV and then put released ports back to default
+ * PF access mode.
+ */
+ pci_disable_sriov(pcidev);
+
+ __dfl_fpga_cdev_config_port_vf(cdev, false);
+
+ } else if (cdev->released_port_num == num_vfs) {
+ /*
+ * only enable SRIOV if the number of released ports matches
+ * num_vfs, and put released ports into VF access mode first.
+ */
+ __dfl_fpga_cdev_config_port_vf(cdev, true);
+
+ ret = pci_enable_sriov(pcidev, num_vfs);
+ if (ret)
+ __dfl_fpga_cdev_config_port_vf(cdev, false);
+ } else {
+ ret = -EINVAL;
+ }
+
+ mutex_unlock(&cdev->lock);
+ return ret;
+}
+
static void cci_pci_remove(struct pci_dev *pcidev)
{
+ if (dev_is_pf(&pcidev->dev))
+ cci_pci_sriov_configure(pcidev, 0);
+
cci_remove_feature_devs(pcidev);
pci_disable_pcie_error_reporting(pcidev);
}
@@ -234,6 +272,7 @@ static struct pci_driver cci_pci_driver = {
.id_table = cci_pcie_id_tbl,
.probe = cci_pci_probe,
.remove = cci_pci_remove,
+ .sriov_configure = cci_pci_sriov_configure,
};
module_pci_driver(cci_pci_driver);
@@ -241,3 +280,4 @@ module_pci_driver(cci_pci_driver);
MODULE_DESCRIPTION("FPGA DFL PCIe Device Driver");
MODULE_AUTHOR("Intel Corporation");
MODULE_LICENSE("GPL v2");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
index a6b6d38..c5aa287 100644
--- a/drivers/fpga/dfl.c
+++ b/drivers/fpga/dfl.c
@@ -1098,6 +1098,47 @@ int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
}
EXPORT_SYMBOL_GPL(dfl_fpga_cdev_config_port);
+static void config_port_vf(struct device *fme_dev, int port_id, bool is_vf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(fme_dev, FME_FEATURE_ID_HEADER);
+
+ v = readq(base + FME_HDR_PORT_OFST(port_id));
+
+ v &= ~FME_PORT_OFST_ACC_CTRL;
+ v |= FIELD_PREP(FME_PORT_OFST_ACC_CTRL,
+ is_vf ? FME_PORT_OFST_ACC_VF : FME_PORT_OFST_ACC_PF);
+
+ writeq(v, base + FME_HDR_PORT_OFST(port_id));
+}
+
+/**
+ * __dfl_fpga_cdev_config_port_vf - configure port to VF access mode
+ *
+ * @cdev: parent container device.
+ * @is_vf: true for VF access mode, and false for PF access mode
+ *
+ * Return: 0 on success, negative error code otherwise.
+ *
+ * This function is needed in the SRIOV configuration routine. It can be used
+ * to configure the access mode of released ports to VF or PF.
+ * The caller needs to hold the lock for protection.
+ */
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf)
+{
+ struct dfl_feature_platform_data *pdata;
+
+ list_for_each_entry(pdata, &cdev->port_dev_list, node) {
+ if (device_is_registered(&pdata->dev->dev))
+ continue;
+
+ config_port_vf(cdev->fme_dev, pdata->id, is_vf);
+ }
+}
+EXPORT_SYMBOL_GPL(__dfl_fpga_cdev_config_port_vf);
+
static int __init dfl_fpga_init(void)
{
int ret;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index 63f39ab..1350e8e 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -421,5 +421,6 @@ dfl_fpga_cdev_find_port(struct dfl_fpga_cdev *cdev, void *data,
int dfl_fpga_cdev_config_port(struct dfl_fpga_cdev *cdev,
u32 port_id, bool release);
+void __dfl_fpga_cdev_config_port_vf(struct dfl_fpga_cdev *cdev, bool is_vf);
#endif /* __FPGA_DFL_H */
--
2.7.4
This patch adds support for the power management private feature
under the FPGA Management Engine (FME). It introduces sysfs
interfaces for the different power management functions. Users can
use these interfaces to read the current power consumption, the
throttling thresholds, the threshold status and other information,
and to configure the throttling thresholds.
Signed-off-by: Luwei Kang <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
2 files changed, 313 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index d3aeb88..4b6448f 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
threshold1. It only supports two value (policy):
0 - AP2 state (90% throttling)
1 - AP1 state (50% throttling)
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns current power consumed by FPGA.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read/Write this file to get/set current power
+ threshold1 in Watts.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read/Write this file to get/set current power
+ threshold2 in Watts.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns 1 if power consumption reaches the
+ threshold1, otherwise 0.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns 1 if power consumption reaches the
+ threshold2, otherwise 0.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get current Latency Tolerance
+ Reporting (ltr) value, it's only valid for integrated
+ solution as it blocks CPU on low power state.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get power limit for xeon, it
+ is only valid for integrated solution.
+
+What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get power limit for fpga, it
+ is only valid for integrated solution.
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 449a17d..dafa6580 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -415,6 +415,259 @@ static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
.uinit = fme_thermal_mgmt_uinit,
};
+#define FME_PWR_STATUS 0x8
+#define FME_LATENCY_TOLERANCE BIT_ULL(18)
+#define PWR_CONSUMED GENMASK_ULL(17, 0)
+
+#define FME_PWR_THRESHOLD 0x10
+#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
+#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
+#define PWR_THRESHOLD_MAX 0x7f
+#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
+#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
+
+#define FME_PWR_XEON_LIMIT 0x18
+#define XEON_PWR_LIMIT GENMASK_ULL(14, 0)
+#define XEON_PWR_EN BIT_ULL(15)
+#define FME_PWR_FPGA_LIMIT 0x20
+#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0)
+#define FPGA_PWR_EN BIT_ULL(15)
+
+#define POWER_ATTR(_name, _mode, _show, _store) \
+struct device_attribute power_attr_##_name = \
+ __ATTR(_name, _mode, _show, _store)
+
+#define POWER_ATTR_RO(_name, _show) \
+ POWER_ATTR(_name, 0444, _show, NULL)
+
+#define POWER_ATTR_RW(_name, _show, _store) \
+ POWER_ATTR(_name, 0644, _show, _store)
+
+static ssize_t pwr_consumed_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_STATUS);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(PWR_CONSUMED, v));
+}
+static POWER_ATTR_RO(consumed, pwr_consumed_show);
+
+static ssize_t pwr_threshold1_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(PWR_THRESHOLD1, v));
+}
+
+static ssize_t pwr_threshold1_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 threshold;
+ int ret;
+ u64 v;
+
+ ret = kstrtou8(buf, 0, &threshold);
+ if (ret)
+ return ret;
+
+ if (threshold > PWR_THRESHOLD_MAX)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + FME_PWR_THRESHOLD);
+ v &= ~PWR_THRESHOLD1;
+ v |= FIELD_PREP(PWR_THRESHOLD1, threshold);
+ writeq(v, base + FME_PWR_THRESHOLD);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static POWER_ATTR_RW(threshold1, pwr_threshold1_show, pwr_threshold1_store);
+
+static ssize_t pwr_threshold2_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(PWR_THRESHOLD2, v));
+}
+
+static ssize_t pwr_threshold2_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u8 threshold;
+ int ret;
+ u64 v;
+
+ ret = kstrtou8(buf, 0, &threshold);
+ if (ret)
+ return ret;
+
+ if (threshold > PWR_THRESHOLD_MAX)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + FME_PWR_THRESHOLD);
+ v &= ~PWR_THRESHOLD2;
+ v |= FIELD_PREP(PWR_THRESHOLD2, threshold);
+ writeq(v, base + FME_PWR_THRESHOLD);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static POWER_ATTR_RW(threshold2, pwr_threshold2_show, pwr_threshold2_store);
+
+static ssize_t pwr_threshold1_status_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
+}
+static POWER_ATTR_RO(threshold1_status, pwr_threshold1_status_show);
+
+static ssize_t pwr_threshold2_status_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
+}
+static POWER_ATTR_RO(threshold2_status, pwr_threshold2_status_show);
+
+static ssize_t ltr_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_STATUS);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
+}
+static POWER_ATTR_RO(ltr, ltr_show);
+
+static ssize_t xeon_limit_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u16 xeon_limit = 0;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_XEON_LIMIT);
+
+ if (FIELD_GET(XEON_PWR_EN, v))
+ xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit);
+}
+static POWER_ATTR_RO(xeon_limit, xeon_limit_show);
+
+static ssize_t fpga_limit_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u16 fpga_limit = 0;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
+
+ v = readq(base + FME_PWR_FPGA_LIMIT);
+
+ if (FIELD_GET(FPGA_PWR_EN, v))
+ fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit);
+}
+static POWER_ATTR_RO(fpga_limit, fpga_limit_show);
+
+static struct attribute *power_mgmt_attrs[] = {
+ &power_attr_consumed.attr,
+ &power_attr_threshold1.attr,
+ &power_attr_threshold2.attr,
+ &power_attr_threshold1_status.attr,
+ &power_attr_threshold2_status.attr,
+ &power_attr_xeon_limit.attr,
+ &power_attr_fpga_limit.attr,
+ &power_attr_ltr.attr,
+ NULL,
+};
+
+static struct attribute_group power_mgmt_attr_group = {
+ .attrs = power_mgmt_attrs,
+ .name = "power_mgmt",
+};
+
+static int fme_power_mgmt_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ return sysfs_create_group(&pdev->dev.kobj, &power_mgmt_attr_group);
+}
+
+static void fme_power_mgmt_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ sysfs_remove_group(&pdev->dev.kobj, &power_mgmt_attr_group);
+}
+
+static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
+ {.id = FME_FEATURE_ID_POWER_MGMT,},
+ {0,}
+};
+
+static const struct dfl_feature_ops fme_power_mgmt_ops = {
+ .init = fme_power_mgmt_init,
+ .uinit = fme_power_mgmt_uinit,
+};
+
static struct dfl_feature_driver fme_feature_drvs[] = {
{
.id_table = fme_hdr_id_table,
@@ -429,6 +682,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
.ops = &fme_thermal_mgmt_ops,
},
{
+ .id_table = fme_power_mgmt_id_table,
+ .ops = &fme_power_mgmt_ops,
+ },
+ {
.ops = NULL,
},
};
--
2.7.4
This patch adds support for global error reporting for the FPGA
Management Engine (FME). It introduces sysfs interfaces to report
the different errors detected by the hardware, and allows the user
to clear errors or inject errors for testing purposes.
Signed-off-by: Luwei Kang <[email protected]>
Signed-off-by: Ananda Ravuri <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
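Note (illustration only, not part of the patch): the read-modify-write that
inject_error_store() performs on the 3-bit inject field can be sketched in
plain user-space C. The register is modelled as a uint64_t variable and
inject_field_update() is a hypothetical name; only the mask value
GENMASK_ULL(2, 0) and the -EINVAL range check are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as GENMASK_ULL(2, 0) in the patch: bits 2..0. */
#define INJECT_ERROR_MASK 0x7ULL

/*
 * Hypothetical user-space model of inject_error_store(): 'reg' stands in
 * for the RAS_ERROR_INJECT register. Returns 0 on success, or -EINVAL if
 * 'inject' has bits set outside the 3-bit field.
 */
static int inject_field_update(uint64_t *reg, uint8_t inject)
{
	assert(reg);

	if (inject & ~INJECT_ERROR_MASK)
		return -EINVAL;

	*reg &= ~INJECT_ERROR_MASK;	/* drop the old field value */
	*reg |= (uint64_t)inject;	/* field starts at bit 0, so no shift */
	return 0;
}
```

Bits outside the field are preserved, which is why the driver reads the
register before writing it back instead of writing the new value directly.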
Documentation/ABI/testing/sysfs-platform-dfl-fme | 58 ++++
drivers/fpga/Makefile | 2 +-
drivers/fpga/dfl-fme-error.c | 390 +++++++++++++++++++++++
drivers/fpga/dfl-fme-main.c | 4 +
drivers/fpga/dfl-fme.h | 2 +
drivers/fpga/dfl.h | 2 +
6 files changed, 457 insertions(+), 1 deletion(-)
create mode 100644 drivers/fpga/dfl-fme-error.c
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index 4b6448f..38f9cdd 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -156,3 +156,61 @@ KernelVersion: 5.2
Contact: Wu Hao <[email protected]>
Description: Read-only. Read this file to get power limit for fpga, it
is only valid for integrated solution.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get errors detected by hardware.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the first error detected by
+ hardware.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the second error detected by
+ hardware.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Write-only. Write error code to this file to clear errors. If
+ the input error code doesn't match, it returns -EBUSY.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read to get pcie0 link errors; write error code to clear.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Read to get pcie1 link errors; write error code to clear.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns non-fatal errors detected.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns catastrophic and fatal errors detected.
+
+What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_error
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-Write. Write this file to inject errors for testing
+ purposes. Read this file to check the errors injected.
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index f1f0af7..1a9fa3d 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -38,7 +38,7 @@ obj-$(CONFIG_FPGA_DFL_FME_BRIDGE) += dfl-fme-br.o
obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o
obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
-dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
+dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o
dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
dfl-afu-objs += dfl-afu-error.o
diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
new file mode 100644
index 0000000..f2bd5f8
--- /dev/null
+++ b/drivers/fpga/dfl-fme-error.c
@@ -0,0 +1,390 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for FPGA Management Engine Error Management
+ *
+ * Copyright 2019 Intel Corporation, Inc.
+ *
+ * Authors:
+ * Kang Luwei <[email protected]>
+ * Xiao Guangrong <[email protected]>
+ * Wu Hao <[email protected]>
+ * Joseph Grecco <[email protected]>
+ * Enno Luebbers <[email protected]>
+ * Tim Whisonant <[email protected]>
+ * Ananda Ravuri <[email protected]>
+ * Mitchel, Henry <[email protected]>
+ */
+
+#include <linux/uaccess.h>
+
+#include "dfl.h"
+#include "dfl-fme.h"
+
+#define FME_ERROR_MASK 0x8
+#define FME_ERROR 0x10
+#define MBP_ERROR BIT_ULL(6)
+#define PCIE0_ERROR_MASK 0x18
+#define PCIE0_ERROR 0x20
+#define PCIE1_ERROR_MASK 0x28
+#define PCIE1_ERROR 0x30
+#define FME_FIRST_ERROR 0x38
+#define FME_NEXT_ERROR 0x40
+#define RAS_NONFAT_ERROR_MASK 0x48
+#define RAS_NONFAT_ERROR 0x50
+#define RAS_CATFAT_ERROR_MASK 0x58
+#define RAS_CATFAT_ERROR 0x60
+#define RAS_ERROR_INJECT 0x68
+#define INJECT_ERROR_MASK GENMASK_ULL(2, 0)
+
+static ssize_t errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + FME_ERROR));
+}
+static DEVICE_ATTR_RO(errors);
+
+static ssize_t first_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + FME_FIRST_ERROR));
+}
+static DEVICE_ATTR_RO(first_error);
+
+static ssize_t next_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + FME_NEXT_ERROR));
+}
+static DEVICE_ATTR_RO(next_error);
+
+static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ struct dfl_feature *feature;
+ void __iomem *base;
+ u64 v, val;
+ int ret;
+
+ ret = kstrtou64(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ feature = dfl_get_feature_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+ base = feature->ioaddr;
+
+ mutex_lock(&pdata->lock);
+ writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK);
+
+ v = readq(base + FME_ERROR);
+ if (val != v) {
+ ret = -EBUSY;
+ goto done;
+ }
+
+ writeq(v, base + FME_ERROR);
+ v = readq(base + FME_FIRST_ERROR);
+ writeq(v, base + FME_FIRST_ERROR);
+ v = readq(base + FME_NEXT_ERROR);
+ writeq(v, base + FME_NEXT_ERROR);
+
+done:
+ /* Workaround: disable MBP_ERROR if feature revision is 0 */
+ writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
+ base + FME_ERROR_MASK);
+ mutex_unlock(&pdata->lock);
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_WO(clear);
+
+static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
+static ssize_t pcie0_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + PCIE0_ERROR));
+}
+
+static ssize_t pcie0_errors_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u64 v, val;
+ int ret;
+
+ ret = kstrtou64(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK);
+
+ v = readq(base + PCIE0_ERROR);
+ if (val != v)
+ ret = -EBUSY;
+ else
+ writeq(v, base + PCIE0_ERROR);
+
+ writeq(0ULL, base + PCIE0_ERROR_MASK);
+ mutex_unlock(&pdata->lock);
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_RW(pcie0_errors);
+
+static ssize_t pcie1_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + PCIE1_ERROR));
+}
+
+static ssize_t pcie1_errors_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u64 v, val;
+ int ret;
+
+ ret = kstrtou64(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK);
+
+ v = readq(base + PCIE1_ERROR);
+ if (val != v)
+ ret = -EBUSY;
+ else
+ writeq(v, base + PCIE1_ERROR);
+
+ writeq(0ULL, base + PCIE1_ERROR_MASK);
+ mutex_unlock(&pdata->lock);
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_RW(pcie1_errors);
+
+static ssize_t nonfatal_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + RAS_NONFAT_ERROR));
+}
+static DEVICE_ATTR_RO(nonfatal_errors);
+
+static ssize_t catfatal_errors_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)readq(base + RAS_CATFAT_ERROR));
+}
+static DEVICE_ATTR_RO(catfatal_errors);
+
+static ssize_t inject_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ v = readq(base + RAS_ERROR_INJECT);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
+ (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
+}
+
+static ssize_t inject_error_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
+ struct device *err_dev = dev->parent;
+ void __iomem *base;
+ u8 inject_error;
+ int ret;
+ u64 v;
+
+ ret = kstrtou8(buf, 0, &inject_error);
+ if (ret)
+ return ret;
+
+ if (inject_error & ~INJECT_ERROR_MASK)
+ return -EINVAL;
+
+ base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
+
+ mutex_lock(&pdata->lock);
+ v = readq(base + RAS_ERROR_INJECT);
+ v &= ~INJECT_ERROR_MASK;
+ v |= FIELD_PREP(INJECT_ERROR_MASK, inject_error);
+ writeq(v, base + RAS_ERROR_INJECT);
+ mutex_unlock(&pdata->lock);
+
+ return count;
+}
+static DEVICE_ATTR_RW(inject_error);
+
+static struct attribute *fme_errors_attrs[] = {
+ &dev_attr_errors.attr,
+ &dev_attr_first_error.attr,
+ &dev_attr_next_error.attr,
+ &dev_attr_clear.attr,
+ NULL,
+};
+
+static struct attribute_group fme_errors_attr_group = {
+ .attrs = fme_errors_attrs,
+ .name = "fme-errors",
+};
+
+static struct attribute *errors_attrs[] = {
+ &dev_attr_revision.attr,
+ &dev_attr_pcie0_errors.attr,
+ &dev_attr_pcie1_errors.attr,
+ &dev_attr_nonfatal_errors.attr,
+ &dev_attr_catfatal_errors.attr,
+ &dev_attr_inject_error.attr,
+ NULL,
+};
+
+static struct attribute_group errors_attr_group = {
+ .attrs = errors_attrs,
+};
+
+static const struct attribute_group *error_groups[] = {
+ &fme_errors_attr_group,
+ &errors_attr_group,
+ NULL
+};
+
+static void fme_error_enable(struct dfl_feature *feature)
+{
+ void __iomem *base = feature->ioaddr;
+
+ /* Workaround: disable MBP_ERROR if revision is 0 */
+ writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
+ base + FME_ERROR_MASK);
+ writeq(0ULL, base + PCIE0_ERROR_MASK);
+ writeq(0ULL, base + PCIE1_ERROR_MASK);
+ writeq(0ULL, base + RAS_NONFAT_ERROR_MASK);
+ writeq(0ULL, base + RAS_CATFAT_ERROR_MASK);
+}
+
+static void err_dev_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static int fme_global_err_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct device *dev;
+ int ret = 0;
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+ return -ENOMEM;
+
+ dev->parent = &pdev->dev;
+ dev->release = err_dev_release;
+ dev_set_name(dev, "errors");
+
+ fme_error_enable(feature);
+
+ ret = device_register(dev);
+ if (ret) {
+ put_device(dev);
+ return ret;
+ }
+
+ ret = sysfs_create_groups(&dev->kobj, error_groups);
+ if (ret) {
+ device_unregister(dev);
+ return ret;
+ }
+
+ feature->priv = dev;
+
+ return ret;
+}
+
+static void fme_global_err_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct device *dev = feature->priv;
+
+ sysfs_remove_groups(&dev->kobj, error_groups);
+ device_unregister(dev);
+}
+
+const struct dfl_feature_id fme_global_err_id_table[] = {
+ {.id = FME_FEATURE_ID_GLOBAL_ERR,},
+ {0,}
+};
+
+const struct dfl_feature_ops fme_global_err_ops = {
+ .init = fme_global_err_init,
+ .uinit = fme_global_err_uinit,
+};
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index dafa6580..76cb112 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -686,6 +686,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
.ops = &fme_power_mgmt_ops,
},
{
+ .id_table = fme_global_err_id_table,
+ .ops = &fme_global_err_ops,
+ },
+ {
.ops = NULL,
},
};
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index 7a021c4..5fbe3f5 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -37,5 +37,7 @@ struct dfl_fme {
extern const struct dfl_feature_ops fme_pr_mgmt_ops;
extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
+extern const struct dfl_feature_ops fme_global_err_ops;
+extern const struct dfl_feature_id fme_global_err_id_table[];
#endif /* __DFL_FME_H */
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index fbc57f0..6c32080 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -197,12 +197,14 @@ struct dfl_feature_driver {
* feature dev (platform device)'s reources.
* @ioaddr: mapped mmio resource address.
* @ops: ops of this sub feature.
+ * @priv: priv data of this feature.
*/
struct dfl_feature {
u64 id;
int resource_index;
void __iomem *ioaddr;
const struct dfl_feature_ops *ops;
+ void *priv;
};
#define DEV_STATUS_IN_USE 0
--
2.7.4
This patch adds support for the thermal management private feature of
the DFL FPGA Management Engine (FME). As thermal throttling is handled
automatically by hardware according to pre-defined thresholds, this
private feature driver only provides read-only sysfs interfaces for
users to read the temperature, thresholds, threshold policy and other info.
Signed-off-by: Luwei Kang <[email protected]>
Signed-off-by: Russ Weight <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
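Note (illustration only, not part of the patch): the sysfs files below all
decode fields of the single FME_THERM_THRESHOLD register. A minimal
user-space sketch of that decoding follows. The field positions are copied
from the patch; GENMASK64(), field_get(), struct therm_threshold and
therm_decode() are hypothetical stand-ins for the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK_ULL(h, l). */
#define GENMASK64(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Field layout of FME_THERM_THRESHOLD, copied from the patch. */
#define TEMP_THRESHOLD1		GENMASK64(6, 0)
#define TEMP_THRESHOLD2		GENMASK64(14, 8)
#define TRIP_THRESHOLD		GENMASK64(30, 24)
#define TEMP_THRESHOLD1_STATUS	GENMASK64(32, 32)
#define TEMP_THRESHOLD2_STATUS	GENMASK64(33, 33)
#define TEMP_THRESHOLD1_POLICY	GENMASK64(44, 44)

/* Hypothetical equivalent of FIELD_GET(): mask, then shift down to bit 0. */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);
	/* mask & (~mask + 1) isolates the lowest set bit of the mask. */
	return (reg & mask) / (mask & (~mask + 1));
}

struct therm_threshold {
	unsigned int threshold1, threshold2, trip;
	unsigned int t1_status, t2_status, t1_policy;
};

static struct therm_threshold therm_decode(uint64_t v)
{
	struct therm_threshold t = {
		.threshold1 = (unsigned int)field_get(TEMP_THRESHOLD1, v),
		.threshold2 = (unsigned int)field_get(TEMP_THRESHOLD2, v),
		.trip       = (unsigned int)field_get(TRIP_THRESHOLD, v),
		.t1_status  = (unsigned int)field_get(TEMP_THRESHOLD1_STATUS, v),
		.t2_status  = (unsigned int)field_get(TEMP_THRESHOLD2_STATUS, v),
		.t1_policy  = (unsigned int)field_get(TEMP_THRESHOLD1_POLICY, v),
	};

	return t;
}
```

Each sysfs attribute in the driver corresponds to one member of the struct
above, read from a fresh readq() of the register.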
Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++++
drivers/fpga/dfl-fme-main.c | 202 +++++++++++++++++++++++
2 files changed, 258 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
index b8327e9..d3aeb88 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
@@ -44,3 +44,59 @@ Description: Read-only. It returns socket_id to indicate which socket
this FPGA belongs to, only valid for integrated solution.
User only needs this information, in case standard numa node
can't provide correct information.
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/temperature
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns temperature (in Celsius) of this FPGA
+ device.
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the temperature threshold1
+ (in Celsius).
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold2
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the temperature threshold2
+ (in Celsius).
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/trip_threshold
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns trip threshold (in Celsius). Once FPGA
+ temperature reaches trip threshold, it triggers a fatal event
+ to board management controller (BMC) to shut down FPGA.
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1_status
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns 1 if temperature reaches threshold1,
+ otherwise 0. Once temperature reaches threshold1, hardware
+ will automatically enter throttling state (AP1 - 50%
+ or AP2 - 90% throttling, see 'threshold1_policy').
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold2_status
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. It returns 1 if temperature reaches threshold2,
+ otherwise 0. Once temperature reaches threshold2, hardware
+ will automatically enter the deepest throttling state (AP6
+ - 100% throttling).
+
+What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1_policy
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the policy of temperature
+ threshold1. It only supports two values (policies):
+ 0 - AP2 state (90% throttling)
+ 1 - AP1 state (50% throttling)
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 8339ee8..449a17d 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -18,6 +18,7 @@
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/fpga-dfl.h>
+#include <linux/sysfs.h>
#include "dfl.h"
#include "dfl-fme.h"
@@ -217,6 +218,203 @@ static const struct dfl_feature_ops fme_hdr_ops = {
.ioctl = fme_hdr_ioctl,
};
+#define FME_THERM_THRESHOLD 0x8
+#define TEMP_THRESHOLD1 GENMASK_ULL(6, 0)
+#define TEMP_THRESHOLD1_EN BIT_ULL(7)
+#define TEMP_THRESHOLD2 GENMASK_ULL(14, 8)
+#define TEMP_THRESHOLD2_EN BIT_ULL(15)
+#define TRIP_THRESHOLD GENMASK_ULL(30, 24)
+#define TEMP_THRESHOLD1_STATUS BIT_ULL(32) /* threshold1 reached */
+#define TEMP_THRESHOLD2_STATUS BIT_ULL(33) /* threshold2 reached */
+/* threshold1 policy: 0 - AP2 (90% throttle) / 1 - AP1 (50% throttle) */
+#define TEMP_THRESHOLD1_POLICY BIT_ULL(44)
+
+#define FME_THERM_RDSENSOR_FMT1 0x10
+#define FPGA_TEMPERATURE GENMASK_ULL(6, 0)
+
+#define FME_THERM_CAP 0x20
+#define TEMP_THRESHOLD_DISABLE BIT_ULL(0)
+
+#define THERMAL_ATTR(_name, _mode, _show, _store) \
+struct device_attribute thermal_attr_##_name = \
+ __ATTR(_name, _mode, _show, _store)
+
+#define THERMAL_ATTR_RO(_name, _show) \
+ THERMAL_ATTR(_name, 0444, _show, NULL)
+
+static ssize_t temperature_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_RDSENSOR_FMT1);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(FPGA_TEMPERATURE, v));
+}
+static THERMAL_ATTR_RO(temperature, temperature_show);
+
+static struct attribute *thermal_mgmt_attrs[] = {
+ &thermal_attr_temperature.attr,
+ NULL,
+};
+
+static struct attribute_group thermal_mgmt_attr_group = {
+ .name = "thermal_mgmt",
+ .attrs = thermal_mgmt_attrs,
+};
+
+static ssize_t temp_threshold1_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(TEMP_THRESHOLD1, v));
+}
+static THERMAL_ATTR_RO(threshold1, temp_threshold1_show);
+
+static ssize_t temp_threshold2_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(TEMP_THRESHOLD2, v));
+}
+static THERMAL_ATTR_RO(threshold2, temp_threshold2_show);
+
+static ssize_t temp_trip_threshold_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(TRIP_THRESHOLD, v));
+}
+static THERMAL_ATTR_RO(trip_threshold, temp_trip_threshold_show);
+
+static ssize_t temp_threshold1_status_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(TEMP_THRESHOLD1_STATUS, v));
+}
+static THERMAL_ATTR_RO(threshold1_status, temp_threshold1_status_show);
+
+static ssize_t temp_threshold2_status_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(TEMP_THRESHOLD2_STATUS, v));
+}
+static THERMAL_ATTR_RO(threshold2_status, temp_threshold2_status_show);
+
+static ssize_t temp_threshold1_policy_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+ u64 v;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_THERMAL_MGMT);
+
+ v = readq(base + FME_THERM_THRESHOLD);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n",
+ (unsigned int)FIELD_GET(TEMP_THRESHOLD1_POLICY, v));
+}
+static THERMAL_ATTR_RO(threshold1_policy, temp_threshold1_policy_show);
+
+static struct attribute *thermal_threshold_attrs[] = {
+ &thermal_attr_threshold1.attr,
+ &thermal_attr_threshold2.attr,
+ &thermal_attr_trip_threshold.attr,
+ &thermal_attr_threshold1_status.attr,
+ &thermal_attr_threshold2_status.attr,
+ &thermal_attr_threshold1_policy.attr,
+ NULL,
+};
+
+static struct attribute_group thermal_threshold_attr_group = {
+ .name = "thermal_mgmt",
+ .attrs = thermal_threshold_attrs,
+};
+
+static int fme_thermal_mgmt_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ void __iomem *base = feature->ioaddr;
+ int ret;
+ u64 v;
+
+ ret = sysfs_create_group(&pdev->dev.kobj, &thermal_mgmt_attr_group);
+ if (ret)
+ return ret;
+
+ v = readq(base + FME_THERM_CAP);
+ if (FIELD_GET(TEMP_THRESHOLD_DISABLE, v))
+ return 0;
+
+ ret = sysfs_merge_group(&pdev->dev.kobj, &thermal_threshold_attr_group);
+ if (ret)
+ sysfs_remove_group(&pdev->dev.kobj, &thermal_mgmt_attr_group);
+
+ return ret;
+}
+
+static void fme_thermal_mgmt_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ sysfs_unmerge_group(&pdev->dev.kobj, &thermal_threshold_attr_group);
+ sysfs_remove_group(&pdev->dev.kobj, &thermal_mgmt_attr_group);
+}
+
+static const struct dfl_feature_id fme_thermal_mgmt_id_table[] = {
+ {.id = FME_FEATURE_ID_THERMAL_MGMT,},
+ {0,}
+};
+
+static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
+ .init = fme_thermal_mgmt_init,
+ .uinit = fme_thermal_mgmt_uinit,
+};
+
static struct dfl_feature_driver fme_feature_drvs[] = {
{
.id_table = fme_hdr_id_table,
@@ -227,6 +425,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
.ops = &fme_pr_mgmt_ops,
},
{
+ .id_table = fme_thermal_mgmt_id_table,
+ .ops = &fme_thermal_mgmt_ops,
+ },
+ {
.ops = NULL,
},
};
--
2.7.4
Export __port_enable() and __port_disable(), as these two functions are
used by other private features. For example, the error reporting private
feature needs to check port status and reset the port to clear errors.
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
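Note (illustration only, not part of the patch): the "__" prefix signals
that the caller must already hold the lock, and locked wrappers such as
port_enable_set() take it on the caller's behalf. A toy user-space sketch of
that convention follows. All names are hypothetical, a bool stands in for
the mutex, and a "_locked" suffix replaces the "__" prefix because leading
double underscores are reserved in user-space C.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the port and its pdata->lock. */
struct toy_port {
	bool lock_held;	/* models whether the mutex is currently held */
	bool in_reset;	/* models the port soft reset bit */
};

static void toy_lock(struct toy_port *p)
{
	assert(!p->lock_held);
	p->lock_held = true;
}

static void toy_unlock(struct toy_port *p)
{
	assert(p->lock_held);
	p->lock_held = false;
}

/* Caller must hold the lock (the role of __port_disable()). */
static int toy_port_disable_locked(struct toy_port *p)
{
	assert(p->lock_held);
	p->in_reset = true;
	return 0;
}

/* Caller must hold the lock (the role of __port_enable()). */
static void toy_port_enable_locked(struct toy_port *p)
{
	assert(p->lock_held);
	p->in_reset = false;
}

/* Locked wrapper, shaped like port_enable_set() in the patch. */
static int toy_port_enable_set(struct toy_port *p, bool enable)
{
	int ret = 0;

	toy_lock(p);
	if (enable)
		toy_port_enable_locked(p);
	else
		ret = toy_port_disable_locked(p);
	toy_unlock(p);

	return ret;
}
```

Exporting only the lock-free helpers lets another private feature combine
several steps (disable, clear, enable) under one lock acquisition.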
drivers/fpga/dfl-afu-main.c | 25 ++++++++++++++-----------
drivers/fpga/dfl-afu.h | 3 +++
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index 2916876..e727d9b 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -24,14 +24,16 @@
#define DRV_VERSION "0.8"
/**
- * port_enable - enable a port
+ * __port_enable - enable a port
* @pdev: port platform device.
*
* Enable Port by clear the port soft reset bit, which is set by default.
* The AFU is unable to respond to any MMIO access while in reset.
- * port_enable function should only be used after port_disable function.
+ * __port_enable function should only be used after __port_disable function.
+ *
+ * The caller needs to hold lock for protection.
*/
-static void port_enable(struct platform_device *pdev)
+void __port_enable(struct platform_device *pdev)
{
struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
void __iomem *base;
@@ -54,13 +56,14 @@ static void port_enable(struct platform_device *pdev)
#define RST_POLL_TIMEOUT 1000 /* us */
/**
- * port_disable - disable a port
+ * __port_disable - disable a port
* @pdev: port platform device.
*
- * Disable Port by setting the port soft reset bit, it puts the port into
- * reset.
+ * Disable Port by setting the port soft reset bit, it puts the port into reset.
+ *
+ * The caller needs to hold lock for protection.
*/
-static int port_disable(struct platform_device *pdev)
+int __port_disable(struct platform_device *pdev)
{
struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
void __iomem *base;
@@ -106,9 +109,9 @@ static int __port_reset(struct platform_device *pdev)
{
int ret;
- ret = port_disable(pdev);
+ ret = __port_disable(pdev);
if (!ret)
- port_enable(pdev);
+ __port_enable(pdev);
return ret;
}
@@ -810,9 +813,9 @@ static int port_enable_set(struct platform_device *pdev, bool enable)
mutex_lock(&pdata->lock);
if (enable)
- port_enable(pdev);
+ __port_enable(pdev);
else
- ret = port_disable(pdev);
+ ret = __port_disable(pdev);
mutex_unlock(&pdata->lock);
return ret;
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 0c7630a..35e60c5 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -79,6 +79,9 @@ struct dfl_afu {
struct dfl_feature_platform_data *pdata;
};
+void __port_enable(struct platform_device *pdev);
+int __port_disable(struct platform_device *pdev);
+
void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
int afu_mmio_region_add(struct dfl_feature_platform_data *pdata,
u32 region_index, u64 region_size, u64 phys, u32 flags);
--
2.7.4
Error reporting is an important private feature that reports errors
detected on the port and accelerated function unit (AFU). This patch
introduces several sysfs interfaces to allow userspace to check and
clear errors detected by hardware.
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
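Note (illustration only, not part of the patch): the clearing sequence in
__port_err_clear() can be modelled in user-space C as below. The struct and
function names are hypothetical, and the error registers are assumed to be
write-1-to-clear, which is inferred from the driver writing back the value
it has just read.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state touched by __port_err_clear(). */
struct toy_port_err {
	bool     in_ap6;	/* port power state is AP6 */
	bool     in_reset;	/* port soft reset asserted */
	uint64_t error_mask;	/* models PORT_ERROR_MASK */
	uint64_t error;		/* models PORT_ERROR */
	uint64_t first_error;	/* models PORT_FIRST_ERROR */
};

/* Assumed write-1-to-clear semantics: set bits in 'val' clear 'reg'. */
static void toy_w1c(uint64_t *reg, uint64_t val)
{
	*reg &= ~val;
}

static int toy_port_err_clear(struct toy_port_err *p, uint64_t err)
{
	int ret = 0;

	assert(p);

	/* Errors cannot be cleared while the device is in AP6. */
	if (p->in_ap6)
		return -EBUSY;

	p->in_reset = true;		/* halt the port */
	p->error_mask = ~0ULL;		/* mask all errors */

	if (p->error == err) {
		toy_w1c(&p->error, p->error);
		toy_w1c(&p->first_error, p->first_error);
	} else {
		ret = -EBUSY;		/* input does not match */
	}

	p->error_mask = 0;		/* unmask, capture new errors */
	p->in_reset = false;		/* take the port out of reset */

	return ret;
}
```

On a mismatch the port is still unmasked and re-enabled before the error
code is returned, so a failed clear leaves the port running.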
Documentation/ABI/testing/sysfs-platform-dfl-port | 29 +++
drivers/fpga/Makefile | 1 +
drivers/fpga/dfl-afu-error.c | 225 ++++++++++++++++++++++
drivers/fpga/dfl-afu-main.c | 4 +
drivers/fpga/dfl-afu.h | 4 +
5 files changed, 263 insertions(+)
create mode 100644 drivers/fpga/dfl-afu-error.c
diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
index f611e47..e6140aa 100644
--- a/Documentation/ABI/testing/sysfs-platform-dfl-port
+++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
@@ -79,3 +79,32 @@ KernelVersion: 5.2
Contact: Wu Hao <[email protected]>
Description: Read-only. Read this file to get the status of issued command
to userclck_freqcntrcmd.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/errors
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get errors detected on port and
+ Accelerated Function Unit (AFU).
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/first_error
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the first error detected by
+ hardware.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Read-only. Read this file to get the first malformed request
+ captured by hardware.
+
+What: /sys/bus/platform/devices/dfl-port.0/errors/clear
+Date: March 2019
+KernelVersion: 5.2
+Contact: Wu Hao <[email protected]>
+Description: Write-only. Write error code to this file to clear errors. If
+ the input error code doesn't match, it returns -EBUSY error
+ code.
diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
index c0dd4c8..f1f0af7 100644
--- a/drivers/fpga/Makefile
+++ b/drivers/fpga/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
+dfl-afu-objs += dfl-afu-error.o
# Drivers for FPGAs which implement DFL
obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o
diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
new file mode 100644
index 0000000..b66bd4a
--- /dev/null
+++ b/drivers/fpga/dfl-afu-error.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting
+ *
+ * Copyright 2019 Intel Corporation, Inc.
+ *
+ * Authors:
+ * Wu Hao <[email protected]>
+ * Xiao Guangrong <[email protected]>
+ * Joseph Grecco <[email protected]>
+ * Enno Luebbers <[email protected]>
+ * Tim Whisonant <[email protected]>
+ * Ananda Ravuri <[email protected]>
+ * Mitchel Henry <[email protected]>
+ */
+
+#include <linux/uaccess.h>
+
+#include "dfl-afu.h"
+
+#define PORT_ERROR_MASK 0x8
+#define PORT_ERROR 0x10
+#define PORT_FIRST_ERROR 0x18
+#define PORT_MALFORMED_REQ0 0x20
+#define PORT_MALFORMED_REQ1 0x28
+
+#define ERROR_MASK GENMASK_ULL(63, 0)
+
+/* mask or unmask port errors by the error mask register. */
+static void __port_err_mask(struct device *dev, bool mask)
+{
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK);
+}
+
+/* clear port errors. */
+static int __port_err_clear(struct device *dev, u64 err)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ void __iomem *base_err, *base_hdr;
+ int ret;
+ u64 v;
+
+ base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+ base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
+
+ /*
+ * clear Port Errors
+ *
+ * - Check for AP6 State
+ * - Halt Port by keeping Port in reset
+ * - Set PORT Error mask to all 1 to mask errors
+ * - Clear all errors
+ * - Set Port mask to all 0 to enable errors
+ * - All errors start capturing new errors
+ * - Enable Port by pulling the port out of reset
+ */
+
+ /* if device is still in AP6 power state, can not clear any error. */
+ v = readq(base_hdr + PORT_HDR_STS);
+ if (FIELD_GET(PORT_STS_PWR_STATE, v) == PORT_STS_PWR_STATE_AP6) {
+ dev_err(dev, "Could not clear errors, device in AP6 state.\n");
+ return -EBUSY;
+ }
+
+ /* Halt Port by keeping Port in reset */
+ ret = __port_disable(pdev);
+ if (ret)
+ return ret;
+
+ /* Mask all errors */
+ __port_err_mask(dev, true);
+
+ /* Clear errors if err input matches with current port errors.*/
+ v = readq(base_err + PORT_ERROR);
+
+ if (v == err) {
+ writeq(v, base_err + PORT_ERROR);
+
+ v = readq(base_err + PORT_FIRST_ERROR);
+ writeq(v, base_err + PORT_FIRST_ERROR);
+ } else {
+ ret = -EBUSY;
+ }
+
+ /* Clear mask */
+ __port_err_mask(dev, false);
+
+ /* Enable the Port by clearing the reset */
+ __port_enable(pdev);
+
+ return ret;
+}
+
+static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ void __iomem *base;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ return scnprintf(buf, PAGE_SIZE, "%u\n", dfl_feature_revision(base));
+}
+static DEVICE_ATTR_RO(revision);
+
+static ssize_t errors_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 error;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ mutex_lock(&pdata->lock);
+ error = readq(base + PORT_ERROR);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)error);
+}
+static DEVICE_ATTR_RO(errors);
+
+static ssize_t first_error_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 error;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ mutex_lock(&pdata->lock);
+ error = readq(base + PORT_FIRST_ERROR);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)error);
+}
+static DEVICE_ATTR_RO(first_error);
+
+static ssize_t first_malformed_req_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ void __iomem *base;
+ u64 req0, req1;
+
+ base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
+
+ mutex_lock(&pdata->lock);
+ req0 = readq(base + PORT_MALFORMED_REQ0);
+ req1 = readq(base + PORT_MALFORMED_REQ1);
+ mutex_unlock(&pdata->lock);
+
+ return scnprintf(buf, PAGE_SIZE, "0x%016llx%016llx\n",
+ (unsigned long long)req1, (unsigned long long)req0);
+}
+static DEVICE_ATTR_RO(first_malformed_req);
+
+static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
+ const char *buff, size_t count)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
+ u64 value;
+ int ret;
+
+ if (kstrtou64(buff, 0, &value))
+ return -EINVAL;
+
+ mutex_lock(&pdata->lock);
+ ret = __port_err_clear(dev, value);
+ mutex_unlock(&pdata->lock);
+
+ return ret ? ret : count;
+}
+static DEVICE_ATTR_WO(clear);
+
+static struct attribute *port_err_attrs[] = {
+ &dev_attr_revision.attr,
+ &dev_attr_errors.attr,
+ &dev_attr_first_error.attr,
+ &dev_attr_first_malformed_req.attr,
+ &dev_attr_clear.attr,
+ NULL,
+};
+
+static struct attribute_group port_err_attr_group = {
+ .attrs = port_err_attrs,
+ .name = "errors",
+};
+
+static int port_err_init(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
+
+ dev_dbg(&pdev->dev, "PORT ERR Init.\n");
+
+ mutex_lock(&pdata->lock);
+ __port_err_mask(&pdev->dev, false);
+ mutex_unlock(&pdata->lock);
+
+ return sysfs_create_group(&pdev->dev.kobj, &port_err_attr_group);
+}
+
+static void port_err_uinit(struct platform_device *pdev,
+ struct dfl_feature *feature)
+{
+ dev_dbg(&pdev->dev, "PORT ERR UInit.\n");
+
+ sysfs_remove_group(&pdev->dev.kobj, &port_err_attr_group);
+}
+
+const struct dfl_feature_id port_err_id_table[] = {
+ {.id = PORT_FEATURE_ID_ERROR,},
+ {0,}
+};
+
+const struct dfl_feature_ops port_err_ops = {
+ .init = port_err_init,
+ .uinit = port_err_uinit,
+};
diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
index e727d9b..754729e 100644
--- a/drivers/fpga/dfl-afu-main.c
+++ b/drivers/fpga/dfl-afu-main.c
@@ -528,6 +528,10 @@ static struct dfl_feature_driver port_feature_drvs[] = {
.ops = &port_afu_ops,
},
{
+ .id_table = port_err_id_table,
+ .ops = &port_err_ops,
+ },
+ {
.ops = NULL,
}
};
diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
index 35e60c5..c3182a2 100644
--- a/drivers/fpga/dfl-afu.h
+++ b/drivers/fpga/dfl-afu.h
@@ -100,4 +100,8 @@ int afu_dma_unmap_region(struct dfl_feature_platform_data *pdata, u64 iova);
struct dfl_afu_dma_region *
afu_dma_region_find(struct dfl_feature_platform_data *pdata,
u64 iova, u64 size);
+
+extern const struct dfl_feature_ops port_err_ops;
+extern const struct dfl_feature_id port_err_id_table[];
+
#endif /* __DFL_AFU_H */
--
2.7.4
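The ordering of steps in __port_err_clear() above can be modeled in userspace as a minimal sketch. The struct fake_port type, its fields, and the write-1-to-clear register behavior are assumptions made for illustration only; they are not part of the patch and are not a statement about the hardware.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the port registers; not the real MMIO layout. */
struct fake_port {
	bool in_ap6;          /* models PORT_STS_PWR_STATE == AP6 */
	bool in_reset;        /* models __port_disable()/__port_enable() */
	uint64_t error_mask;  /* models PORT_ERROR_MASK */
	uint64_t error;       /* models PORT_ERROR */
	uint64_t first_error; /* models PORT_FIRST_ERROR */
};

/* Assumption: writing a 1 bit clears the matching error bit. */
static void w1c(uint64_t *reg, uint64_t val)
{
	*reg &= ~val;
}

/* Follows the same order of steps as the patch's __port_err_clear(). */
static int fake_port_err_clear(struct fake_port *p, uint64_t err)
{
	int ret = 0;

	if (p->in_ap6)
		return -EBUSY;          /* cannot clear while in AP6 */

	p->in_reset = true;             /* halt the port */
	p->error_mask = UINT64_MAX;     /* mask all errors */

	if (p->error == err) {
		w1c(&p->error, p->error);
		w1c(&p->first_error, p->first_error);
	} else {
		ret = -EBUSY;           /* caller's value no longer matches */
	}

	p->error_mask = 0;              /* unmask errors */
	p->in_reset = false;            /* bring the port out of reset */

	return ret;
}
```

As in the patch, a mismatch between the value passed in and the current error register leaves the errors untouched and returns -EBUSY, so a caller would be expected to re-read the errors and retry.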
Early revisions of the partial reconfiguration (PR) private
feature only support a 32bit data width when writing data to
hardware for PR. 512bit data width PR support is an important
optimization for some specific solutions (e.g. XEON with FPGA
integrated): it allows the driver to use AVX512 instructions to
improve the performance of partial reconfiguration. For example,
programming one 100MB bitstream image via the 512bit data width
PR hardware only takes ~300ms, while the 32bit revision requires
~3s, per test results.
Please note that this optimization is currently only done for
revision 2 of this PR private feature, which is only used in
integrated solutions where AVX512 is always supported.
Signed-off-by: Ananda Ravuri <[email protected]>
Signed-off-by: Xu Yilun <[email protected]>
Signed-off-by: Wu Hao <[email protected]>
---
drivers/fpga/dfl-fme-main.c | 3 ++
drivers/fpga/dfl-fme-mgr.c | 75 +++++++++++++++++++++++++++++++++++++--------
drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
drivers/fpga/dfl-fme.h | 2 ++
drivers/fpga/dfl.h | 5 +++
5 files changed, 99 insertions(+), 31 deletions(-)
diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
index 086ad24..076d74f 100644
--- a/drivers/fpga/dfl-fme-main.c
+++ b/drivers/fpga/dfl-fme-main.c
@@ -21,6 +21,8 @@
#include "dfl.h"
#include "dfl-fme.h"
+#define DRV_VERSION "0.8"
+
static ssize_t ports_num_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -277,3 +279,4 @@ MODULE_DESCRIPTION("FPGA Management Engine driver");
MODULE_AUTHOR("Intel Corporation");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("platform:dfl-fme");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
index b3f7eee..027d457 100644
--- a/drivers/fpga/dfl-fme-mgr.c
+++ b/drivers/fpga/dfl-fme-mgr.c
@@ -22,14 +22,18 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/fpga/fpga-mgr.h>
+#include "dfl.h"
#include "dfl-fme-pr.h"
+#define DRV_VERSION "0.8"
+
/* FME Partial Reconfiguration Sub Feature Register Set */
#define FME_PR_DFH 0x0
#define FME_PR_CTRL 0x8
#define FME_PR_STS 0x10
#define FME_PR_DATA 0x18
#define FME_PR_ERR 0x20
+#define FME_PR_512_DATA 0x40 /* Data Register for 512bit datawidth PR */
#define FME_PR_INTFC_ID_L 0xA8
#define FME_PR_INTFC_ID_H 0xB0
@@ -67,8 +71,31 @@
#define PR_WAIT_TIMEOUT 8000000
#define PR_HOST_STATUS_IDLE 0
+#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
+
+#include <asm/fpu/api.h>
+
+static inline void copy512(void *src, void __iomem *dst)
+{
+ kernel_fpu_begin();
+
+ asm volatile("vmovdqu64 (%0), %%zmm0;"
+ "vmovntdq %%zmm0, (%1);"
+ :
+ : "r"(src), "r"(dst));
+
+ kernel_fpu_end();
+}
+#else
+static inline void copy512(void *src, void __iomem *dst)
+{
+ WARN_ON_ONCE(1);
+}
+#endif
+
struct fme_mgr_priv {
void __iomem *ioaddr;
+ unsigned int pr_datawidth;
u64 pr_error;
};
@@ -169,7 +196,7 @@ static int fme_mgr_write(struct fpga_manager *mgr,
struct fme_mgr_priv *priv = mgr->priv;
void __iomem *fme_pr = priv->ioaddr;
u64 pr_ctrl, pr_status, pr_data;
- int delay = 0, pr_credit, i = 0;
+ int ret = 0, delay = 0, pr_credit;
dev_dbg(dev, "start request\n");
@@ -181,9 +208,9 @@ static int fme_mgr_write(struct fpga_manager *mgr,
/*
* driver can push data to PR hardware using PR_DATA register once HW
- * has enough pr_credit (> 1), pr_credit reduces one for every 32bit
- * pr data write to PR_DATA register. If pr_credit <= 1, driver needs
- * to wait for enough pr_credit from hardware by polling.
+ * has enough pr_credit (> 1), pr_credit reduces one for every pr data
+ * width write to PR_DATA register. If pr_credit <= 1, driver needs to
+ * wait for enough pr_credit from hardware by polling.
*/
pr_status = readq(fme_pr + FME_PR_STS);
pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
@@ -192,7 +219,8 @@ static int fme_mgr_write(struct fpga_manager *mgr,
while (pr_credit <= 1) {
if (delay++ > PR_WAIT_TIMEOUT) {
dev_err(dev, "PR_CREDIT timeout\n");
- return -ETIMEDOUT;
+ ret = -ETIMEDOUT;
+ goto done;
}
udelay(1);
@@ -200,21 +228,32 @@ static int fme_mgr_write(struct fpga_manager *mgr,
pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT, pr_status);
}
- if (count < 4) {
+ if (count < priv->pr_datawidth) {
dev_err(dev, "Invalid PR bitstream size\n");
return -EINVAL;
}
- pr_data = 0;
- pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
- *(((u32 *)buf) + i));
- writeq(pr_data, fme_pr + FME_PR_DATA);
- count -= 4;
+ switch (priv->pr_datawidth) {
+ case 4:
+ pr_data = 0;
+ pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
+ *((u32 *)buf));
+ writeq(pr_data, fme_pr + FME_PR_DATA);
+ break;
+ case 64:
+ copy512((void *)buf, fme_pr + FME_PR_512_DATA);
+ break;
+ default:
+ ret = -EFAULT;
+ goto done;
+ }
+ buf += priv->pr_datawidth;
+ count -= priv->pr_datawidth;
pr_credit--;
- i++;
}
- return 0;
+done:
+ return ret;
}
static int fme_mgr_write_complete(struct fpga_manager *mgr,
@@ -302,6 +341,15 @@ static int fme_mgr_probe(struct platform_device *pdev)
return PTR_ERR(priv->ioaddr);
}
+ /*
+ * Only revision 2 supports 512bit datawidth for better performance,
+ * other revisions use default 32bit datawidth.
+ */
+ if (dfl_feature_revision(priv->ioaddr) == 2)
+ priv->pr_datawidth = 64;
+ else
+ priv->pr_datawidth = 4;
+
compat_id = devm_kzalloc(dev, sizeof(*compat_id), GFP_KERNEL);
if (!compat_id)
return -ENOMEM;
@@ -342,3 +390,4 @@ MODULE_DESCRIPTION("FPGA Manager for DFL FPGA Management Engine");
MODULE_AUTHOR("Intel Corporation");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("platform:dfl-fme-mgr");
+MODULE_VERSION(DRV_VERSION);
diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
index c1fb1fe..8a0e46a 100644
--- a/drivers/fpga/dfl-fme-pr.c
+++ b/drivers/fpga/dfl-fme-pr.c
@@ -83,7 +83,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
if (copy_from_user(&port_pr, argp, minsz))
return -EFAULT;
- if (port_pr.argsz < minsz || port_pr.flags)
+ if (port_pr.argsz < minsz || port_pr.flags || !port_pr.buffer_size)
return -EINVAL;
/* get fme header region */
@@ -101,15 +101,25 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
port_pr.buffer_size))
return -EFAULT;
+ mutex_lock(&pdata->lock);
+ fme = dfl_fpga_pdata_get_private(pdata);
+ /* fme device has been unregistered. */
+ if (!fme) {
+ ret = -EINVAL;
+ goto unlock_exit;
+ }
+
/*
* align PR buffer per PR bandwidth, as HW ignores the extra padding
* data automatically.
*/
- length = ALIGN(port_pr.buffer_size, 4);
+ length = ALIGN(port_pr.buffer_size, fme->pr_datawidth);
buf = vmalloc(length);
- if (!buf)
- return -ENOMEM;
+ if (!buf) {
+ ret = -ENOMEM;
+ goto unlock_exit;
+ }
if (copy_from_user(buf,
(void __user *)(unsigned long)port_pr.buffer_address,
@@ -127,18 +137,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
info->flags |= FPGA_MGR_PARTIAL_RECONFIG;
- mutex_lock(&pdata->lock);
- fme = dfl_fpga_pdata_get_private(pdata);
- /* fme device has been unregistered. */
- if (!fme) {
- ret = -EINVAL;
- goto unlock_exit;
- }
-
region = dfl_fme_region_find(fme, port_pr.port_id);
if (!region) {
ret = -EINVAL;
- goto unlock_exit;
+ goto free_exit;
}
fpga_image_info_free(region->info);
@@ -159,13 +161,10 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
 fpga_bridges_put(&region->bridge_list);
 put_device(&region->dev);
-unlock_exit:
- mutex_unlock(&pdata->lock);
free_exit:
vfree(buf);
- if (copy_to_user((void __user *)arg, &port_pr, minsz))
- return -EFAULT;
-
+unlock_exit:
+ mutex_unlock(&pdata->lock);
return ret;
}
@@ -391,6 +390,16 @@ static int pr_mgmt_init(struct platform_device *pdev,
mutex_lock(&pdata->lock);
priv = dfl_fpga_pdata_get_private(pdata);
+ /*
+ * Initialize PR data width.
+ * Only revision 2 supports 512bit datawidth for better performance,
+ * other revisions use default 32bit datawidth.
+ */
+ if (dfl_feature_revision(feature->ioaddr) == 2)
+ priv->pr_datawidth = 64;
+ else
+ priv->pr_datawidth = 4;
+
/* Initialize the region and bridge sub device list */
INIT_LIST_HEAD(&priv->region_list);
INIT_LIST_HEAD(&priv->bridge_list);
diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
index 5394a21..de20755 100644
--- a/drivers/fpga/dfl-fme.h
+++ b/drivers/fpga/dfl-fme.h
@@ -21,12 +21,14 @@
/**
* struct dfl_fme - dfl fme private data
*
+ * @pr_datawidth: data width for partial reconfiguration.
* @mgr: FME's FPGA manager platform device.
* @region_list: linked list of FME's FPGA regions.
* @bridge_list: linked list of FME's FPGA bridges.
* @pdata: fme platform device's pdata.
*/
struct dfl_fme {
+ int pr_datawidth;
struct platform_device *mgr;
struct list_head region_list;
struct list_head bridge_list;
diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
index a8b869e..8851c6c 100644
--- a/drivers/fpga/dfl.h
+++ b/drivers/fpga/dfl.h
@@ -331,6 +331,11 @@ static inline bool dfl_feature_is_port(void __iomem *base)
(FIELD_GET(DFH_ID, v) == DFH_ID_FIU_PORT);
}
+static inline u8 dfl_feature_revision(void __iomem *base)
+{
+ return (u8)FIELD_GET(DFH_REVISION, readq(base + DFH));
+}
+
/**
* struct dfl_fpga_enum_info - DFL FPGA enumeration information
*
--
2.7.4
On Sun, Mar 24, 2019 at 10:23 PM Wu Hao <[email protected]> wrote:
Hi Hao,
>
> FME_PR_INTFC_ID is used as compat_id for fpga manager and region,
> but high 64 bits and low 64 bits of the compat_id are swapped by
> mistake. This patch fixes this problem by fixing register address.
>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
> ---
> drivers/fpga/dfl-fme-mgr.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> index 76f3770..b3f7eee 100644
> --- a/drivers/fpga/dfl-fme-mgr.c
> +++ b/drivers/fpga/dfl-fme-mgr.c
> @@ -30,8 +30,8 @@
> #define FME_PR_STS 0x10
> #define FME_PR_DATA 0x18
> #define FME_PR_ERR 0x20
> -#define FME_PR_INTFC_ID_H 0xA8
> -#define FME_PR_INTFC_ID_L 0xB0
> +#define FME_PR_INTFC_ID_L 0xA8
> +#define FME_PR_INTFC_ID_H 0xB0
>
> /* FME PR Control Register Bitfield */
> #define FME_PR_CTRL_PR_RST BIT_ULL(0) /* Reset PR engine */
> --
> 2.7.4
>
On Sun, Mar 24, 2019 at 10:23 PM Wu Hao <[email protected]> wrote:
Hi Hao,
Looks good, one question below.
>
> Current driver checks if input bitstream file size is aligned or
> not per PR data width (default 32bits). It requires one additional
> step for end user when they generate the bitstream file, padding
> extra zeros to bitstream file to align its size per PR data width,
> but they don't have to as hardware will drop extra padding bytes
> automatically.
>
> In order to simplify the user steps, this patch aligns PR buffer
> size per PR data width in driver, to allow user to pass unaligned
> size bitstream files to driver.
>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> drivers/fpga/dfl-fme-pr.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
> index d9ca955..c1fb1fe 100644
> --- a/drivers/fpga/dfl-fme-pr.c
> +++ b/drivers/fpga/dfl-fme-pr.c
> @@ -74,6 +74,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> struct dfl_fme *fme;
> unsigned long minsz;
> void *buf = NULL;
> + size_t length;
> int ret = 0;
> u64 v;
>
> @@ -85,9 +86,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> if (port_pr.argsz < minsz || port_pr.flags)
> return -EINVAL;
>
> - if (!IS_ALIGNED(port_pr.buffer_size, 4))
> - return -EINVAL;
> -
> /* get fme header region */
> fme_hdr = dfl_get_feature_ioaddr_by_id(&pdev->dev,
> FME_FEATURE_ID_HEADER);
> @@ -103,7 +101,13 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> port_pr.buffer_size))
> return -EFAULT;
>
> - buf = vmalloc(port_pr.buffer_size);
> + /*
> + * align PR buffer per PR bandwidth, as HW ignores the extra padding
> + * data automatically.
> + */
> + length = ALIGN(port_pr.buffer_size, 4);
> +
> + buf = vmalloc(length);
Since it may not be completely filled, would it be worthwhile to alloc
a zero'ed buff?
Alan
> if (!buf)
> return -ENOMEM;
>
> @@ -140,7 +144,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> fpga_image_info_free(region->info);
>
> info->buf = buf;
> - info->count = port_pr.buffer_size;
> + info->count = length;
> info->region_id = port_pr.port_id;
> region->info = info;
>
> --
> 2.7.4
>
On Sun, Mar 24, 2019 at 10:23 PM Wu Hao <[email protected]> wrote:
Hi Hao,
This looks fine.
>
> In early partial reconfiguration private feature, it only
> supports 32bit data width when writing data to hardware for
> PR. 512bit data width PR support is an important optimization
> for some specific solutions (e.g. XEON with FPGA integrated),
> it allows driver to use AVX512 instruction to improve the
> performance of partial reconfiguration. e.g. programming one
> 100MB bitstream image via this 512bit data width PR hardware
> only takes ~300ms, but 32bit revision requires ~3s per test
> result.
>
> Please note now this optimization is only done on revision 2
> of this PR private feature which is only used in integrated
> solution that AVX512 is always supported.
>
> Signed-off-by: Ananda Ravuri <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
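As a rough sanity check on the numbers in the commit message, the following sketch counts data-register writes for each data width. It assumes that "100MB" means 100 * 1024 * 1024 bytes, and pr_write_count() is a hypothetical helper, not a function from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of data-register writes needed to push
 * `size` bytes, rounding up to whole `width`-byte transfers.
 * `width` must be nonzero.
 */
static uint64_t pr_write_count(uint64_t size, unsigned int width)
{
	return (size + width - 1) / width;
}
```

Under that assumption, a 32bit data width needs 26214400 writes and a 512bit data width needs 1638400, i.e. 16 times fewer. The reported speedup is roughly 10x (~3s versus ~300ms), which suggests that the per-write cost is not the only factor.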
On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> In early partial reconfiguration private feature, it only
> supports 32bit data width when writing data to hardware for
> PR. 512bit data width PR support is an important optimization
> for some specific solutions (e.g. XEON with FPGA integrated),
> it allows driver to use AVX512 instruction to improve the
> performance of partial reconfiguration. e.g. programming one
> 100MB bitstream image via this 512bit data width PR hardware
> only takes ~300ms, but 32bit revision requires ~3s per test
> result.
>
> Please note now this optimization is only done on revision 2
> of this PR private feature which is only used in integrated
> solution that AVX512 is always supported.
>
> Signed-off-by: Ananda Ravuri <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> drivers/fpga/dfl-fme-main.c | 3 ++
> drivers/fpga/dfl-fme-mgr.c | 75 +++++++++++++++++++++++++++++++++++++---
> -----
> drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
> drivers/fpga/dfl-fme.h | 2 ++
> drivers/fpga/dfl.h | 5 +++
> 5 files changed, 99 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> index 086ad24..076d74f 100644
> --- a/drivers/fpga/dfl-fme-main.c
> +++ b/drivers/fpga/dfl-fme-main.c
> @@ -21,6 +21,8 @@
> #include "dfl.h"
> #include "dfl-fme.h"
>
> +#define DRV_VERSION "0.8"
What is this going to be used for? Under what circumstances will the
driver version be bumped? What does it have to do with 512-bit writes?
> +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> +
> +#include <asm/fpu/api.h>
> +
> +static inline void copy512(void *src, void __iomem *dst)
> +{
> + kernel_fpu_begin();
> +
> + asm volatile("vmovdqu64 (%0), %%zmm0;"
> + "vmovntdq %%zmm0, (%1);"
> + :
> + : "r"(src), "r"(dst));
> +
> + kernel_fpu_end();
> +}
Shouldn't there be some sort of check that AVX512 is actually supported
on the running system?
Also, src should be const, and the asm statement should have a memory
clobber.
> +#else
> +static inline void copy512(void *src, void __iomem *dst)
> +{
> + WARN_ON_ONCE(1);
> +}
> +#endif
Likewise, this will be called if a revision 2 device is used on non-x86
(or on x86 with an old binutils). The driver should fall back to 32-bit
in such cases.
> @@ -200,21 +228,32 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT,
> pr_status);
> }
>
> - if (count < 4) {
> + if (count < priv->pr_datawidth) {
> dev_err(dev, "Invalid PR bitstream size\n");
> return -EINVAL;
Shouldn't this have become a WARN_ON in patch 2 given that the kernel
already pads the buffer?
> }
>
> - pr_data = 0;
> - pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> - *(((u32 *)buf) + i));
> - writeq(pr_data, fme_pr + FME_PR_DATA);
> - count -= 4;
> + switch (priv->pr_datawidth) {
> + case 4:
> + pr_data = 0;
> + pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> + *((u32 *)buf));
I know it's not new, but why not just "pr_data = FIELD..."? Const should
also be preserved in the cast, and you can drop one set of parentheses.
> + writeq(pr_data, fme_pr + FME_PR_DATA);
> + break;
> + case 64:
> + copy512((void *)buf, fme_pr + FME_PR_512_DATA);
> + break;
Unnecessary cast.
> + default:
> + ret = -EFAULT;
> + goto done;
How is it EFAULT? Any other value for pr_datawidth should be WARN_ON
since it's set by kernel code.
> @@ -159,13 +161,10 @@ static int fme_pr(struct platform_device *pdev,
> unsigned long arg)
> fpga_bridges_put(&region->bridge_list);
>
> put_device(&region->dev);
> -unlock_exit:
> - mutex_unlock(&pdata->lock);
> free_exit:
> vfree(buf);
> - if (copy_to_user((void __user *)arg, &port_pr, minsz))
> - return -EFAULT;
> -
Why is the copy_to_user being removed?
-Scott
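Some of the review points above (keeping const on the source buffer, assigning the data word directly, and not using -EFAULT for an unexpected width) can be illustrated with a userspace model of the write loop. This is only a sketch: struct fake_pr, fake_pr_write(), and the choice of -EINVAL are assumptions for illustration, and memcpy() stands in for the MMIO writes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sink standing in for the PR data registers. */
struct fake_pr {
	uint64_t data32;      /* last value "written" to the 32bit register */
	uint8_t data512[64];  /* last block "written" to the 512bit register */
	size_t writes;        /* number of register writes issued */
};

/* Push count bytes in width-sized units; buf stays const throughout. */
static int fake_pr_write(struct fake_pr *pr, const uint8_t *buf,
			 size_t count, unsigned int width)
{
	if (width != 4 && width != 64)
		return -EINVAL;   /* width is chosen by the driver, not the user */
	if (count % width)
		return -EINVAL;   /* caller is expected to pad to the width */

	while (count) {
		if (width == 4) {
			uint32_t word;

			memcpy(&word, buf, sizeof(word)); /* no const-dropping cast */
			pr->data32 = word;                /* direct assignment */
		} else {
			memcpy(pr->data512, buf, sizeof(pr->data512));
		}
		pr->writes++;
		buf += width;
		count -= width;
	}

	return 0;
}
```

In kernel code the invalid-width branch would more likely be a WARN_ON, as suggested in the review; the model returns an error code only so that it can be exercised in userspace.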
On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > In early partial reconfiguration private feature, it only
> > supports 32bit data width when writing data to hardware for
> > PR. 512bit data width PR support is an important optimization
> > for some specific solutions (e.g. XEON with FPGA integrated),
> > it allows driver to use AVX512 instruction to improve the
> > performance of partial reconfiguration. e.g. programming one
> > 100MB bitstream image via this 512bit data width PR hardware
> > only takes ~300ms, but 32bit revision requires ~3s per test
> > result.
> >
> > Please note now this optimization is only done on revision 2
> > of this PR private feature which is only used in integrated
> > solution that AVX512 is always supported.
> >
> > Signed-off-by: Ananda Ravuri <[email protected]>
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > drivers/fpga/dfl-fme-main.c | 3 ++
> > drivers/fpga/dfl-fme-mgr.c | 75 +++++++++++++++++++++++++++++++++++++-
> > --
> > -----
> > drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
> > drivers/fpga/dfl-fme.h | 2 ++
> > drivers/fpga/dfl.h | 5 +++
> > 5 files changed, 99 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 086ad24..076d74f 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -21,6 +21,8 @@
> > #include "dfl.h"
> > #include "dfl-fme.h"
> >
> > +#define DRV_VERSION "0.8"
>
> What is this going to be used for? Under what circumstances will the
> driver version be bumped? What does it have to do with 512-bit writes?
>
> > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > +
> > +#include <asm/fpu/api.h>
> > +
> > +static inline void copy512(void *src, void __iomem *dst)
> > +{
> > + kernel_fpu_begin();
> > +
> > + asm volatile("vmovdqu64 (%0), %%zmm0;"
> > + "vmovntdq %%zmm0, (%1);"
> > + :
> > + : "r"(src), "r"(dst));
> > +
> > + kernel_fpu_end();
> > +}
>
> Shouldn't there be some sort of check that AVX512 is actually supported
> on the running system?
>
> Also, src should be const, and the asm statement should have a memory
> clobber.
>
> > +#else
> > +static inline void copy512(void *src, void __iomem *dst)
> > +{
> > + WARN_ON_ONCE(1);
> > +}
> > +#endif
>
> Likewise, this will be called if a revision 2 device is used on non-x86
> (or on x86 with an old binutils). The driver should fall back to 32-bit
> in such cases.
Sorry, I missed the comment about revision 2 only being on integrated
devices -- but will that always be the case? Seems worthwhile to check for
AVX512 support anyway. And there's still the possibility of being built
with an old binutils such that CONFIG_AS_AVX512 is not set, or running on a
kernel where avx512 was disabled via a boot option.
What about future revisions >= 2? Currently the driver will treat them as
if they were revision < 2. Is that intended?
-Scott
On Mon, Mar 25, 2019 at 12:50:40PM -0500, Alan Tull wrote:
> On Sun, Mar 24, 2019 at 10:23 PM Wu Hao <[email protected]> wrote:
>
> Hi Hao,
>
> Looks good, one question below.
>
> >
> > Current driver checks if input bitstream file size is aligned or
> > not per PR data width (default 32bits). It requires one additional
> > step for end user when they generate the bitstream file, padding
> > extra zeros to bitstream file to align its size per PR data width,
> > but they don't have to as hardware will drop extra padding bytes
> > automatically.
> >
> > In order to simplify the user steps, this patch aligns PR buffer
> > size per PR data width in driver, to allow user to pass unaligned
> > size bitstream files to driver.
> >
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > drivers/fpga/dfl-fme-pr.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
> > index d9ca955..c1fb1fe 100644
> > --- a/drivers/fpga/dfl-fme-pr.c
> > +++ b/drivers/fpga/dfl-fme-pr.c
> > @@ -74,6 +74,7 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > struct dfl_fme *fme;
> > unsigned long minsz;
> > void *buf = NULL;
> > + size_t length;
> > int ret = 0;
> > u64 v;
> >
> > @@ -85,9 +86,6 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > if (port_pr.argsz < minsz || port_pr.flags)
> > return -EINVAL;
> >
> > - if (!IS_ALIGNED(port_pr.buffer_size, 4))
> > - return -EINVAL;
> > -
> > /* get fme header region */
> > fme_hdr = dfl_get_feature_ioaddr_by_id(&pdev->dev,
> > FME_FEATURE_ID_HEADER);
> > @@ -103,7 +101,13 @@ static int fme_pr(struct platform_device *pdev, unsigned long arg)
> > port_pr.buffer_size))
> > return -EFAULT;
> >
> > - buf = vmalloc(port_pr.buffer_size);
> > + /*
> > + * align PR buffer per PR bandwidth, as HW ignores the extra padding
> > + * data automatically.
> > + */
> > + length = ALIGN(port_pr.buffer_size, 4);
> > +
> > + buf = vmalloc(length);
>
> Since it may not be completely filled, would it be worthwhile to alloc
> a zero'ed buff?
>
Hi Alan,
Thanks for the review. Actually, per the spec, the hardware doesn't care
about the extra padding data, so for now I guess we don't need this.
Thanks
Hao
> Alan
>
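To make the padding question above concrete, here is a minimal userspace sketch of rounding a bitstream size up to the PR data width and zeroing the padding bytes. align_up() computes what the kernel's ALIGN() macro computes for power-of-two alignments; pad_bitstream() is a hypothetical helper, and calloc() stands in for a zeroing allocator such as vzalloc(). Whether zeroing is needed at all depends on the hardware behavior described in the reply above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: copy `size` bytes (size > 0) into a zeroed buffer
 * whose length is padded to `width`. The caller frees the result.
 */
static unsigned char *pad_bitstream(const unsigned char *src, size_t size,
				    size_t width, size_t *padded)
{
	unsigned char *buf;

	*padded = align_up(size, width);
	buf = calloc(*padded, 1);       /* padding bytes start out as zero */
	if (!buf)
		return NULL;

	memcpy(buf, src, size);
	return buf;
}
```

For example, a 5-byte input with a 4-byte data width yields an 8-byte buffer whose last three bytes are zero, and a 65-byte input with a 64-byte data width yields 128 bytes.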
On Mon, Mar 25, 2019 at 5:58 PM Scott Wood <[email protected]> wrote:
Hi Scott,
>
> On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > In early partial reconfiguration private feature, it only
> > > supports 32bit data width when writing data to hardware for
> > > PR. 512bit data width PR support is an important optimization
> > > for some specific solutions (e.g. XEON with FPGA integrated),
> > > it allows driver to use AVX512 instruction to improve the
> > > performance of partial reconfiguration. e.g. programming one
> > > 100MB bitstream image via this 512bit data width PR hardware
> > > only takes ~300ms, but 32bit revision requires ~3s per test
> > > result.
> > >
> > > Please note now this optimization is only done on revision 2
> > > of this PR private feature which is only used in integrated
> > > solution that AVX512 is always supported.
> > >
> > > Signed-off-by: Ananda Ravuri <[email protected]>
> > > Signed-off-by: Xu Yilun <[email protected]>
> > > Signed-off-by: Wu Hao <[email protected]>
> > > ---
> > > drivers/fpga/dfl-fme-main.c | 3 ++
> > > drivers/fpga/dfl-fme-mgr.c | 75 +++++++++++++++++++++++++++++++++++++-
> > > --
> > > -----
> > > drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
> > > drivers/fpga/dfl-fme.h | 2 ++
> > > drivers/fpga/dfl.h | 5 +++
> > > 5 files changed, 99 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > > index 086ad24..076d74f 100644
> > > --- a/drivers/fpga/dfl-fme-main.c
> > > +++ b/drivers/fpga/dfl-fme-main.c
> > > @@ -21,6 +21,8 @@
> > > #include "dfl.h"
> > > #include "dfl-fme.h"
> > >
> > > +#define DRV_VERSION "0.8"
> >
> > What is this going to be used for? Under what circumstances will the
> > driver version be bumped? What does it have to do with 512-bit writes?
> >
> > > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > > +
> > > +#include <asm/fpu/api.h>
> > > +
> > > +static inline void copy512(void *src, void __iomem *dst)
> > > +{
> > > + kernel_fpu_begin();
> > > +
> > > + asm volatile("vmovdqu64 (%0), %%zmm0;"
> > > + "vmovntdq %%zmm0, (%1);"
> > > + :
> > > + : "r"(src), "r"(dst));
> > > +
> > > + kernel_fpu_end();
> > > +}
> >
> > Shouldn't there be some sort of check that AVX512 is actually supported
> > on the running system?
> >
> > Also, src should be const, and the asm statement should have a memory
> > clobber.
> >
> > > +#else
> > > +static inline void copy512(void *src, void __iomem *dst)
> > > +{
> > > + WARN_ON_ONCE(1);
> > > +}
> > > +#endif
> >
> > Likewise, this will be called if a revision 2 device is used on non-x86
> > (or on x86 with an old binutils). The driver should fall back to 32-bit
> > in such cases.
>
> Sorry, I missed the comment about revision 2 only being on integrated
> devices -- but will that always be the case? Seems worthwhile to check for
> AVX512 support anyway. And there's still the possibility of being built
> with an old binutils such that CONFIG_AS_AVX512 is not set, or running on a
> kernel where avx512 was disabled via a boot option.
The code checks for CONFIG_AS_AVX512 above.
What boot option are you referring to?
Alan
>
> What about future revisions >= 2? Currently the driver will treat them as
> if they were revision < 2. Is that intended?
>
> -Scott
>
>
On Tue, 2019-03-26 at 14:33 -0500, Alan Tull wrote:
> On Mon, Mar 25, 2019 at 5:58 PM Scott Wood <[email protected]> wrote:
>
> Hi Scott,
>
> > On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > > > +
> > > > +#include <asm/fpu/api.h>
> > > > +
> > > > +static inline void copy512(void *src, void __iomem *dst)
> > > > +{
> > > > + kernel_fpu_begin();
> > > > +
> > > > + asm volatile("vmovdqu64 (%0), %%zmm0;"
> > > > + "vmovntdq %%zmm0, (%1);"
> > > > + :
> > > > + : "r"(src), "r"(dst));
> > > > +
> > > > + kernel_fpu_end();
> > > > +}
> > >
> > > Shouldn't there be some sort of check that AVX512 is actually
> > > supported
> > > on the running system?
> > >
> > > Also, src should be const, and the asm statement should have a memory
> > > clobber.
> > >
> > > > +#else
> > > > +static inline void copy512(void *src, void __iomem *dst)
> > > > +{
> > > > + WARN_ON_ONCE(1);
> > > > +}
> > > > +#endif
> > >
> > > Likewise, this will be called if a revision 2 device is used on non-
> > > x86
> > > (or on x86 with an old binutils). The driver should fall back to 32-
> > > bit
> > > in such cases.
> >
> > Sorry, I missed the comment about revision 2 only being on integrated
> > devices -- but will that always be the case? Seems worthwhile to check
> > for
> > AVX512 support anyway. And there's still the possibility of being built
> > with an old binutils such that CONFIG_AS_AVX512 is not set, or running
> > on a
> > kernel where avx512 was disabled via a boot option.
>
> The code checks for CONFIG_AS_AVX512 above.
That just indicates that binutils supports it. Plus, the code does not
check for CONFIG_AS_AVX512 when deciding whether to set pr_datawidth to 64
(and thus call copy512), so you'll get a WARN_ON rather than falling back to
32-bit.
> What boot option are you referring to?
clearcpuid=304
-Scott
On Tue, Mar 26, 2019 at 04:22:34PM -0500, Scott Wood wrote:
> On Tue, 2019-03-26 at 14:33 -0500, Alan Tull wrote:
> > On Mon, Mar 25, 2019 at 5:58 PM Scott Wood <[email protected]> wrote:
> >
> > Hi Scott,
> >
> > > On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > > > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > > > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > > > > +
> > > > > +#include <asm/fpu/api.h>
> > > > > +
> > > > > +static inline void copy512(void *src, void __iomem *dst)
> > > > > +{
> > > > > + kernel_fpu_begin();
> > > > > +
> > > > > + asm volatile("vmovdqu64 (%0), %%zmm0;"
> > > > > + "vmovntdq %%zmm0, (%1);"
> > > > > + :
> > > > > + : "r"(src), "r"(dst));
> > > > > +
> > > > > + kernel_fpu_end();
> > > > > +}
> > > >
> > > > Shouldn't there be some sort of check that AVX512 is actually
> > > > supported
> > > > on the running system?
> > > >
> > > > Also, src should be const, and the asm statement should have a memory
> > > > clobber.
Yes, I will fix this in the next version.
> > > >
> > > > > +#else
> > > > > +static inline void copy512(void *src, void __iomem *dst)
> > > > > +{
> > > > > + WARN_ON_ONCE(1);
> > > > > +}
> > > > > +#endif
> > > >
> > > > Likewise, this will be called if a revision 2 device is used on non-
> > > > x86
> > > > (or on x86 with an old binutils). The driver should fall back to 32-
> > > > bit
> > > > in such cases.
Unfortunately, revision 2 is only used in the integrated FPGA solution, and it
doesn't support any fallback (the original 32-bit partial reconfiguration data
path is no longer supported), so the driver has to WARN on that path.
> > >
> > > Sorry, I missed the comment about revision 2 only being on integrated
> > > devices -- but will that always be the case? Seems worthwhile to check
> > > for
> > > AVX512 support anyway. And there's still the possibility of being built
> > > with an old binutils such that CONFIG_AS_AVX512 is not set, or running
> > > on a
> > > kernel where avx512 was disabled via a boot option.
> >
> > The code checks for CONFIG_AS_AVX512 above.
>
> That just indicates that binutils supports it. Plus, the code does not
> check for CONFIG_AS_AVX512 when deciding whether to set pr_datawidth to 64
> (and thus call copy512), so you'll get a WARN_ON rather than falling back to
> 32-bit.
>
> > What boot option are you referring to?
>
> clearcpuid=304
Just tried it; my system went down after running the AVX512 code above with
this option. I agree that we need to add a check to make sure it's safe to run
such instructions. I will add a cpu_feature_enabled() check in the next
version.
Thanks a lot for the review and comments.
Hao
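A minimal user-space sketch of the kind of check discussed above, where the 512-bit copy is refused with an error when AVX512 is not usable. The function name pr_copy512_checked() and the avx512_usable flag are hypothetical. In the driver the flag would presumably combine the CONFIG_AS_AVX512 build-time test with a runtime test such as cpu_feature_enabled(X86_FEATURE_AVX512F), and memcpy() here only stands in for the vmovdqu64/vmovntdq pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define PR_512_BYTES 64	/* one 512-bit PR data beat */

/*
 * Hypothetical checked wrapper around the 512-bit copy.
 *
 * avx512_usable is an assumption: it stands for "built with an assembler
 * that knows AVX512, and the running CPU still advertises AVX512F".
 */
static int pr_copy512_checked(const void *src, void *dst, bool avx512_usable)
{
	assert(src && dst);

	if (!avx512_usable)
		return -ENODEV;	/* report an error instead of executing AVX512 */

	/* Stand-in for the vmovdqu64/vmovntdq pair used by copy512(). */
	memcpy(dst, src, PR_512_BYTES);
	return 0;
}
```

With this shape the caller can propagate the error to userspace, so booting with the feature bit cleared results in a failed PR request rather than a crash.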
>
> -Scott
>
On Mon, Mar 25, 2019 at 05:58:36PM -0500, Scott Wood wrote:
> On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > In early partial reconfiguration private feature, it only
> > > supports 32bit data width when writing data to hardware for
> > > PR. 512bit data width PR support is an important optimization
> > > for some specific solutions (e.g. XEON with FPGA integrated),
> > > it allows driver to use AVX512 instruction to improve the
> > > performance of partial reconfiguration. e.g. programming one
> > > 100MB bitstream image via this 512bit data width PR hardware
> > > only takes ~300ms, but 32bit revision requires ~3s per test
> > > result.
> > >
> > > Please note now this optimization is only done on revision 2
> > > of this PR private feature which is only used in integrated
> > > solution that AVX512 is always supported.
> > >
> > > Signed-off-by: Ananda Ravuri <[email protected]>
> > > Signed-off-by: Xu Yilun <[email protected]>
> > > Signed-off-by: Wu Hao <[email protected]>
> > > ---
> > > drivers/fpga/dfl-fme-main.c | 3 ++
> > > drivers/fpga/dfl-fme-mgr.c | 75 +++++++++++++++++++++++++++++++++++++-
> > > --
> > > -----
> > > drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
> > > drivers/fpga/dfl-fme.h | 2 ++
> > > drivers/fpga/dfl.h | 5 +++
> > > 5 files changed, 99 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > > index 086ad24..076d74f 100644
> > > --- a/drivers/fpga/dfl-fme-main.c
> > > +++ b/drivers/fpga/dfl-fme-main.c
> > > @@ -21,6 +21,8 @@
> > > #include "dfl.h"
> > > #include "dfl-fme.h"
> > >
> > > +#define DRV_VERSION "0.8"
> >
> > What is this going to be used for? Under what circumstances will the
> > driver version be bumped? What does it have to do with 512-bit writes?
This patchset adds more features to this driver, so I would like to add
a DRV_VERSION there as an initial one. In the future, if new features or
extensions to existing features (e.g. a new revision of a private feature)
are added, we will need to bump this version.
> >
> > > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > > +
> > > +#include <asm/fpu/api.h>
> > > +
> > > +static inline void copy512(void *src, void __iomem *dst)
> > > +{
> > > + kernel_fpu_begin();
> > > +
> > > + asm volatile("vmovdqu64 (%0), %%zmm0;"
> > > + "vmovntdq %%zmm0, (%1);"
> > > + :
> > > + : "r"(src), "r"(dst));
> > > +
> > > + kernel_fpu_end();
> > > +}
> >
> > Shouldn't there be some sort of check that AVX512 is actually supported
> > on the running system?
> >
> > Also, src should be const, and the asm statement should have a memory
> > clobber.
> >
> > > +#else
> > > +static inline void copy512(void *src, void __iomem *dst)
> > > +{
> > > + WARN_ON_ONCE(1);
> > > +}
> > > +#endif
> >
> > Likewise, this will be called if a revision 2 device is used on non-x86
> > (or on x86 with an old binutils). The driver should fall back to 32-bit
> > in such cases.
>
> Sorry, I missed the comment about revision 2 only being on integrated
> devices -- but will that always be the case? Seems worthwhile to check for
> AVX512 support anyway. And there's still the possibility of being built
> with an old binutils such that CONFIG_AS_AVX512 is not set, or running on a
> kernel where avx512 was disabled via a boot option.
>
> What about future revisions >= 2? Currently the driver will treat them as
> if they were revision < 2. Is that intended?
Yes, it's intended. Currently we don't have any hardware with revisions > 2,
and supporting new revisions may need new code. :) E.g. the revision is
currently used to tell 32-bit from 512-bit PR, but future revisions may have
new capability registers for this purpose.
Thanks
Hao
>
> -Scott
>
On Mon, Mar 25, 2019 at 05:53:50PM -0500, Scott Wood wrote:
> On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > @@ -200,21 +228,32 @@ static int fme_mgr_write(struct fpga_manager *mgr,
> > pr_credit = FIELD_GET(FME_PR_STS_PR_CREDIT,
> > pr_status);
> > }
> >
> > - if (count < 4) {
> > + if (count < priv->pr_datawidth) {
> > dev_err(dev, "Invalid PR bitstream size\n");
> > return -EINVAL;
>
> Shouldn't this have become a WARN_ON in patch 2 given that the kernel
> already pads the buffer?
Thanks a lot for the review and comments.
I agree. It's better to use WARN_ON in this place.
>
> > }
> >
> > - pr_data = 0;
> > - pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> > - *(((u32 *)buf) + i));
> > - writeq(pr_data, fme_pr + FME_PR_DATA);
> > - count -= 4;
> > + switch (priv->pr_datawidth) {
> > + case 4:
> > + pr_data = 0;
> > + pr_data |= FIELD_PREP(FME_PR_DATA_PR_DATA_RAW,
> > + *((u32 *)buf));
>
> I know it's not new, but why not just "pr_data = FIELD..."? Const should
> also be preserved in the cast, and you can drop one set of parentheses.
Yes, agree, will fix this.
>
> > + writeq(pr_data, fme_pr + FME_PR_DATA);
> > + break;
> > + case 64:
> > + copy512((void *)buf, fme_pr + FME_PR_512_DATA);
> > + break;
>
> Unnecessary cast.
Will fix this.
>
> > + default:
> > + ret = -EFAULT;
> > + goto done;
>
> How is it EFAULT? Any other value for pr_datawidth should be WARN_ON
> since it's set by kernel code.
Agree, will fix this in the next version.
>
> > @@ -159,13 +161,10 @@ static int fme_pr(struct platform_device *pdev,
> > unsigned long arg)
> > fpga_bridges_put(&region->bridge_list);
> >
> > put_device(&region->dev);
> > -unlock_exit:
> > - mutex_unlock(&pdata->lock);
> > free_exit:
> > vfree(buf);
> > - if (copy_to_user((void __user *)arg, &port_pr, minsz))
> > - return -EFAULT;
> > -
>
> Why is the copy_to_user being removed?
I think this code is not needed at all and was added by mistake.
Sorry, I should move this change into a separate patch with a proper
description to avoid confusion.
Thanks
Hao
>
> -Scott
On Wed, 2019-03-27 at 12:37 +0800, Wu Hao wrote:
> On Tue, Mar 26, 2019 at 04:22:34PM -0500, Scott Wood wrote:
> > On Tue, 2019-03-26 at 14:33 -0500, Alan Tull wrote:
> > > On Mon, Mar 25, 2019 at 5:58 PM Scott Wood <[email protected]> wrote:
> > > > >
> > > Hi Scott,
> > >
> > > > On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > > > > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > > > > +#else
> > > > > > +static inline void copy512(void *src, void __iomem *dst)
> > > > > > +{
> > > > > > + WARN_ON_ONCE(1);
> > > > > > +}
> > > > > > +#endif
> > > > >
> > > > > > Likewise, this will be called if a revision 2 device is used on
> > > > > > non-x86 (or on x86 with an old binutils). The driver should fall
> > > > > > back to 32-bit in such cases.
>
> Unfortunately revision 2 is only for integrated FPGA solution, and it
> doesn't
> support any fallback solution (original 32bit data partial reconfiguration
> is
> not supported any more), so driver has to WARN in such path.
From the commit message it seemed like this was just an optimization, not
something necessary to support revision 2.
If there's no way to program the device without AVX512, then printing an
error message and returning an error to userspace would be better than
WARN_ON, since it's not actually a kernel bug.
-Scott
On Wed, Mar 27, 2019 at 01:10:31AM -0500, Scott Wood wrote:
> On Wed, 2019-03-27 at 12:37 +0800, Wu Hao wrote:
> > On Tue, Mar 26, 2019 at 04:22:34PM -0500, Scott Wood wrote:
> > > On Tue, 2019-03-26 at 14:33 -0500, Alan Tull wrote:
> > > > On Mon, Mar 25, 2019 at 5:58 PM Scott Wood <[email protected]> wrote:
> > > > > >
> > > > Hi Scott,
> > > >
> > > > > On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > > > > > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > > > > > +#else
> > > > > > > +static inline void copy512(void *src, void __iomem *dst)
> > > > > > > +{
> > > > > > > + WARN_ON_ONCE(1);
> > > > > > > +}
> > > > > > > +#endif
> > > > > >
> > > > > > > Likewise, this will be called if a revision 2 device is used on
> > > > > > > non-x86 (or on x86 with an old binutils). The driver should fall
> > > > > > > back to 32-bit in such cases.
> >
> > Unfortunately revision 2 is only for integrated FPGA solution, and it
> > doesn't
> > support any fallback solution (original 32bit data partial reconfiguration
> > is
> > not supported any more), so driver has to WARN in such path.
>
> From the commit message it seemed like this was just an optimization, not
> something necessary to support revision 2.
>
> If there's no way to program the device without AVX512, then printing an
> error message and returning an error to userspace would be better than
> WARN_ON, since it's not actually a kernel bug.
Fair enough. Will do. Thanks for the suggestion.
Hao
>
> -Scott
>
On Wed, 2019-03-27 at 13:10 +0800, Wu Hao wrote:
> On Mon, Mar 25, 2019 at 05:58:36PM -0500, Scott Wood wrote:
> > On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > > In early partial reconfiguration private feature, it only
> > > > supports 32bit data width when writing data to hardware for
> > > > PR. 512bit data width PR support is an important optimization
> > > > for some specific solutions (e.g. XEON with FPGA integrated),
> > > > it allows driver to use AVX512 instruction to improve the
> > > > performance of partial reconfiguration. e.g. programming one
> > > > 100MB bitstream image via this 512bit data width PR hardware
> > > > only takes ~300ms, but 32bit revision requires ~3s per test
> > > > result.
> > > >
> > > > Please note now this optimization is only done on revision 2
> > > > of this PR private feature which is only used in integrated
> > > > solution that AVX512 is always supported.
> > > >
> > > > Signed-off-by: Ananda Ravuri <[email protected]>
> > > > Signed-off-by: Xu Yilun <[email protected]>
> > > > Signed-off-by: Wu Hao <[email protected]>
> > > > ---
> > > > drivers/fpga/dfl-fme-main.c | 3 ++
> > > > drivers/fpga/dfl-fme-mgr.c | 75
> > > > +++++++++++++++++++++++++++++++++++++-
> > > > --
> > > > -----
> > > > drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
> > > > drivers/fpga/dfl-fme.h | 2 ++
> > > > drivers/fpga/dfl.h | 5 +++
> > > > 5 files changed, 99 insertions(+), 31 deletions(-)
> > > >
> > > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-
> > > > main.c
> > > > index 086ad24..076d74f 100644
> > > > --- a/drivers/fpga/dfl-fme-main.c
> > > > +++ b/drivers/fpga/dfl-fme-main.c
> > > > @@ -21,6 +21,8 @@
> > > > #include "dfl.h"
> > > > #include "dfl-fme.h"
> > > >
> > > > +#define DRV_VERSION "0.8"
> > >
> > > What is this going to be used for? Under what circumstances will the
> > > driver version be bumped? What does it have to do with 512-bit
> > > writes?
>
> This patchset adds more features to this driver, so i would like to add
> a DRV_VERSION there as an initial one. In the future, if some new features
> or extensions for existing features (e.g. new revision of a private
> feature)
> are added we need to bump this version.
This doesn't seem like a good way of advertising API availability... Besides
being awkward to query, what happens if a distro kernel has backported some
features but not others that came before? What does it advertise?
I'd suggest some sort of feature flag mechanism that can be queried via
ioctl (e.g. along the lines of KVM capabilities), if "try the API and fall
back if it fails" is unsatisfactory.
Plus, if it's about new APIs being exposed, this doesn't seem like the right
patch for it to be in...
> > Sorry, I missed the comment about revision 2 only being on integrated
> > devices -- but will that always be the case? Seems worthwhile to check
> > for
> > AVX512 support anyway. And there's still the possibility of being built
> > with an old binutils such that CONFIG_AS_AVX512 is not set, or running
> > on a
> > kernel where avx512 was disabled via a boot option.
> >
> > What about future revisions >= 2? Currently the driver will treat them
> > as
> > if they were revision < 2. Is that intended?
>
> Yes, it's intended. Currently we don't have any hardware with revisions >
> 2,
> and support new revisions may need new code. :) e.g. currently revision
> is
> used to tell 32bit vs 512bit PR, but in future revisions, it may have new
> capability registers for this purpose.
The driver should refuse to bind to unrecognized revisions, if they're not
expected to be compatible.
-Scott
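To illustrate the "refuse unrecognized revisions" point, here is a small sketch of an explicit revision table. The names pr_rev_info, pr_known_revs and pr_datawidth_for_revision() are hypothetical, and the assumption that revisions 0 and 1 are the 32-bit ones is inferred only from the "revision < 2" wording in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mapping from PR feature revision to data width in bytes. */
struct pr_rev_info {
	unsigned int revision;
	unsigned int datawidth;
};

/* Assumption: revisions below 2 are 32-bit, revision 2 is 512-bit. */
static const struct pr_rev_info pr_known_revs[] = {
	{ 0, 4 },
	{ 1, 4 },
	{ 2, 64 },
};

/*
 * Look up the data width for a revision. Unknown revisions are rejected
 * with -ENODEV instead of being silently treated as 32-bit hardware.
 */
static int pr_datawidth_for_revision(unsigned int revision,
				     unsigned int *datawidth)
{
	size_t i;

	assert(datawidth);

	for (i = 0; i < sizeof(pr_known_revs) / sizeof(pr_known_revs[0]); i++) {
		if (pr_known_revs[i].revision == revision) {
			*datawidth = pr_known_revs[i].datawidth;
			return 0;
		}
	}

	return -ENODEV;
}
```

A probe or init path could call this once and fail early, so a future revision 3 device is never programmed through a path it may not support.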
On Wed, Mar 27, 2019 at 01:19:29AM -0500, Scott Wood wrote:
> On Wed, 2019-03-27 at 13:10 +0800, Wu Hao wrote:
> > On Mon, Mar 25, 2019 at 05:58:36PM -0500, Scott Wood wrote:
> > > On Mon, 2019-03-25 at 17:53 -0500, Scott Wood wrote:
> > > > On Mon, 2019-03-25 at 11:07 +0800, Wu Hao wrote:
> > > > > In early partial reconfiguration private feature, it only
> > > > > supports 32bit data width when writing data to hardware for
> > > > > PR. 512bit data width PR support is an important optimization
> > > > > for some specific solutions (e.g. XEON with FPGA integrated),
> > > > > it allows driver to use AVX512 instruction to improve the
> > > > > performance of partial reconfiguration. e.g. programming one
> > > > > 100MB bitstream image via this 512bit data width PR hardware
> > > > > only takes ~300ms, but 32bit revision requires ~3s per test
> > > > > result.
> > > > >
> > > > > Please note now this optimization is only done on revision 2
> > > > > of this PR private feature which is only used in integrated
> > > > > solution that AVX512 is always supported.
> > > > >
> > > > > Signed-off-by: Ananda Ravuri <[email protected]>
> > > > > Signed-off-by: Xu Yilun <[email protected]>
> > > > > Signed-off-by: Wu Hao <[email protected]>
> > > > > ---
> > > > > drivers/fpga/dfl-fme-main.c | 3 ++
> > > > > drivers/fpga/dfl-fme-mgr.c | 75
> > > > > +++++++++++++++++++++++++++++++++++++-
> > > > > --
> > > > > -----
> > > > > drivers/fpga/dfl-fme-pr.c | 45 ++++++++++++++++-----------
> > > > > drivers/fpga/dfl-fme.h | 2 ++
> > > > > drivers/fpga/dfl.h | 5 +++
> > > > > 5 files changed, 99 insertions(+), 31 deletions(-)
> > > > >
> > > > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-
> > > > > main.c
> > > > > index 086ad24..076d74f 100644
> > > > > --- a/drivers/fpga/dfl-fme-main.c
> > > > > +++ b/drivers/fpga/dfl-fme-main.c
> > > > > @@ -21,6 +21,8 @@
> > > > > #include "dfl.h"
> > > > > #include "dfl-fme.h"
> > > > >
> > > > > +#define DRV_VERSION "0.8"
> > > >
> > > > What is this going to be used for? Under what circumstances will the
> > > > driver version be bumped? What does it have to do with 512-bit
> > > > writes?
> >
> > This patchset adds more features to this driver, so i would like to add
> > a DRV_VERSION there as an initial one. In the future, if some new features
> > or extensions for existing features (e.g. new revision of a private
> > feature)
> > are added we need to bump this version.
>
> This doesn't seem like a good way of advertising API availability... Besides
> being awkward to query, what happens if a distro kernel has backported some
> features but not others that came before? What does it advertise?
DRV_VERSION here is not used for API availability. :)
> I'd suggest some sort of feature flag mechanism that can be queried via
> ioctl (e.g. along the lines of KVM capabilities), if "try the API and fall
> back if it fails" is unsatisfactory.
>
> Plus, if it's about new APIs being exposed, this doesn't seem like the right
> patch for it to be in...
Actually this patch doesn't introduce new APIs; I am trying to make this
transparent to end users. That means users don't need to know whether it's a
32-bit PR or a faster 512-bit one; they still use the same ioctl interface for
PR. The API_VERSION and CHECK_EXTENSION ioctls have already been defined, but I
think at least we don't need to bump them for this change. What do you think?
>
> > > Sorry, I missed the comment about revision 2 only being on integrated
> > > devices -- but will that always be the case? Seems worthwhile to check
> > > for
> > > AVX512 support anyway. And there's still the possibility of being built
> > > with an old binutils such that CONFIG_AS_AVX512 is not set, or running
> > > on a
> > > kernel where avx512 was disabled via a boot option.
> > >
> > > What about future revisions >= 2? Currently the driver will treat them
> > > as
> > > if they were revision < 2. Is that intended?
> >
> > Yes, it's intended. Currently we don't have any hardware with revisions >
> > 2,
> > and support new revisions may need new code. :) e.g. currently revision
> > is
> > used to tell 32bit vs 512bit PR, but in future revisions, it may have new
> > capability registers for this purpose.
>
> The driver should refuse to bind to unrecognized revisions, if they're not
> expected to be compatible.
Yes, agree.
Thanks
Hao
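For reference, a sketch of the KVM-style capability query suggested earlier in the thread. Everything here is hypothetical: the capability numbers, the fme_check_extension() helper and the bitmask representation are invented for illustration and are not the existing DFL ioctl ABI.

```c
#include <assert.h>

/* Hypothetical capability numbers; not part of the real DFL uapi. */
enum fme_cap {
	FME_CAP_PR_512BIT = 0,
	FME_CAP_PORT_RELEASE = 1,
	FME_CAP_MAX
};

/*
 * Return 1 if the capability is present in the supported mask and 0
 * otherwise. Unknown capability numbers report 0, which mirrors how a
 * CHECK_EXTENSION style query treats anything it does not recognize.
 */
static int fme_check_extension(unsigned long supported, unsigned int cap)
{
	if (cap >= FME_CAP_MAX)
		return 0;

	return (supported >> cap) & 1UL ? 1 : 0;
}
```

Because each feature has its own bit, a distro kernel that backports one feature but not another can still advertise exactly what it implements, which a single version string cannot express.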
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
Hi Hao,
>
> This patch introduces more sysfs interfaces for Accelerated
> Function Unit (AFU). These interfaces allow users to read
> current AFU Power State (APx), read / clear AFU Power (APx)
> events which are sticky to identify transient APx state,
> and manage AFU's LTR (latency tolerance reporting).
>
> Signed-off-by: Ananda Ravuri <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
On Mon, Mar 25, 2019 at 7:44 PM Wu Hao <[email protected]> wrote:
>
> On Mon, Mar 25, 2019 at 12:50:40PM -0500, Alan Tull wrote:
> > On Sun, Mar 24, 2019 at 10:23 PM Wu Hao <[email protected]> wrote:
> >
> > Hi Hao,
> >
> > Looks good, one question below.
> >
> > >
> > > Current driver checks if input bitstream file size is aligned or
> > > not per PR data width (default 32bits). It requires one additional
> > > step for end user when they generate the bitstream file, padding
> > > extra zeros to bitstream file to align its size per PR data width,
> > > but they don't have to as hardware will drop extra padding bytes
> > > automatically.
> > >
> > > In order to simplify the user steps, this patch aligns PR buffer
> > > size per PR data width in driver, to allow user to pass unaligned
> > > size bitstream files to driver.
> > >
> > > Signed-off-by: Xu Yilun <[email protected]>
> > > Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
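The alignment described in the commit message can be sketched in user space as below. The helper names pr_aligned_len() and pr_pad_bitstream() are hypothetical, and zero-filling the padding is an assumption of this sketch (the commit message only says the hardware drops the extra padding bytes).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Round len up to the next multiple of the PR data width (in bytes). */
static size_t pr_aligned_len(size_t len, size_t width)
{
	assert(width != 0);
	return (len + width - 1) / width * width;
}

/*
 * Copy an unaligned bitstream into a freshly allocated buffer whose size
 * is aligned to the PR data width. The tail is zero-filled by calloc().
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
static void *pr_pad_bitstream(const void *src, size_t len, size_t width,
			      size_t *padded_len)
{
	size_t n = pr_aligned_len(len, width);
	unsigned char *buf = calloc(1, n ? n : 1);

	if (!buf)
		return NULL;

	if (len)
		memcpy(buf, src, len);

	*padded_len = n;
	return buf;
}
```

For example, a 5-byte image with a 4-byte data width becomes an 8-byte buffer, and a 65-byte image with a 64-byte data width becomes a 128-byte buffer.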
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
>
> This patch enables the standard sriov support. It allows user to
> enable SRIOV (and VFs), then user could pass through accelerators
> (VFs) into virtual machine or use VFs directly in host.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
On Sun, Mar 24, 2019 at 10:23 PM Wu Hao <[email protected]> wrote:
>
> In order to support virtualization usage via PCIe SRIOV, this patch
> adds two ioctls under FPGA Management Engine (FME) to release and
> assign back the port device. In order to safely turn Port from PF
> into VF and enable PCIe SRIOV, it requires user to invoke this
> PORT_RELEASE ioctl to release port firstly to remove userspace
> interfaces, and then configure the PF/VF access register in FME.
> After disable SRIOV, it requires user to invoke this PORT_ASSIGN
> ioctl to attach the port back to PF.
>
> Ioctl interfaces:
> * DFL_FPGA_FME_PORT_RELEASE
> Release platform device of given port, it deletes port platform
> device to remove related userspace interfaces on PF, then
> configures PF/VF access mode to VF.
>
> * DFL_FPGA_FME_PORT_ASSIGN
> Assign platform device of given port back to PF, it configures
> PF/VF access mode to PF, then adds port platform device back to
> re-enable related userspace interfaces on PF.
>
> Signed-off-by: Zhang Yi Z <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Hi Wu,
On Mon, Mar 25, 2019 at 11:07:28AM +0800, Wu Hao wrote:
> FME_PR_INTFC_ID is used as compat_id for fpga manager and region,
> but high 64 bits and low 64 bits of the compat_id are swapped by
> mistake. This patch fixes this problem by fixing register address.
>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> drivers/fpga/dfl-fme-mgr.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> index 76f3770..b3f7eee 100644
> --- a/drivers/fpga/dfl-fme-mgr.c
> +++ b/drivers/fpga/dfl-fme-mgr.c
> @@ -30,8 +30,8 @@
> #define FME_PR_STS 0x10
> #define FME_PR_DATA 0x18
> #define FME_PR_ERR 0x20
> -#define FME_PR_INTFC_ID_H 0xA8
> -#define FME_PR_INTFC_ID_L 0xB0
> +#define FME_PR_INTFC_ID_L 0xA8
> +#define FME_PR_INTFC_ID_H 0xB0
Does this handle endianness correctly?
>
> /* FME PR Control Register Bitfield */
> #define FME_PR_CTRL_PR_RST BIT_ULL(0) /* Reset PR engine */
> --
> 2.7.4
>
Cheers,
Moritz
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
Hi Hao,
Looks fine.
>
> This patch introduces userclock sysfs interfaces for AFU, user
> could use these interfaces for clock setting to AFU.
>
> Please note that, this is only working for port header feature
> with revision 0, for later revisions, userclock setting is moved
> to a separated private feature, so one revision sysfs interface
> is exposed to userspace application for this purpose too.
>
> Signed-off-by: Ananda Ravuri <[email protected]>
> Signed-off-by: Russ Weight <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
On Mon, Apr 01, 2019 at 12:54:47PM -0700, Moritz Fischer wrote:
> Hi Wu,
>
> On Mon, Mar 25, 2019 at 11:07:28AM +0800, Wu Hao wrote:
> > FME_PR_INTFC_ID is used as compat_id for fpga manager and region,
> > but high 64 bits and low 64 bits of the compat_id are swapped by
> > mistake. This patch fixes this problem by fixing register address.
> >
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > drivers/fpga/dfl-fme-mgr.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> > index 76f3770..b3f7eee 100644
> > --- a/drivers/fpga/dfl-fme-mgr.c
> > +++ b/drivers/fpga/dfl-fme-mgr.c
> > @@ -30,8 +30,8 @@
> > #define FME_PR_STS 0x10
> > #define FME_PR_DATA 0x18
> > #define FME_PR_ERR 0x20
> > -#define FME_PR_INTFC_ID_H 0xA8
> > -#define FME_PR_INTFC_ID_L 0xB0
> > +#define FME_PR_INTFC_ID_L 0xA8
> > +#define FME_PR_INTFC_ID_H 0xB0
>
> Does this handle endianess correct?
Hi Moritz,
This is just a bug fix for the wrong offsets given to these 2 registers,
according to the spec. I think this is not endianness related, and per my
understanding we don't need more endianness handling code, as that
should already be done inside the readq function. :)
Thanks
Hao
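A small user-space sketch of the point above: with the corrected offsets, the low half is read from 0xA8 and the high half from 0xB0, and the byte order is handled by the 64-bit read helper rather than by the offsets. The reg_read64_le() helper and the byte-array register file are stand-ins invented for this sketch; in the driver, readq() plays that role.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FME_PR_INTFC_ID_L 0xA8	/* low 64 bits, offset from the fix */
#define FME_PR_INTFC_ID_H 0xB0	/* high 64 bits, offset from the fix */

/* Mirrors the id_h/id_l split of a 128-bit compat id. */
struct compat_id {
	uint64_t id_h;
	uint64_t id_l;
};

/*
 * Stand-in for readq(): assemble a little-endian 64-bit register value
 * byte by byte, so the result is the same on any host byte order.
 */
static uint64_t reg_read64_le(const uint8_t *base, size_t off)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | base[off + (size_t)i];

	return v;
}

/* Read both halves of the interface id from a simulated register file. */
static struct compat_id read_compat_id(const uint8_t *regs)
{
	struct compat_id id;

	id.id_l = reg_read64_le(regs, FME_PR_INTFC_ID_L);
	id.id_h = reg_read64_le(regs, FME_PR_INTFC_ID_H);
	return id;
}
```

Swapping the two offset macros, as in the code before the fix, would exchange id_h and id_l without changing the bytes inside either half.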
Hi Wu,
On Tue, Apr 02, 2019 at 12:38:45PM +0800, Wu Hao wrote:
> On Mon, Apr 01, 2019 at 12:54:47PM -0700, Moritz Fischer wrote:
> > Hi Wu,
> >
> > On Mon, Mar 25, 2019 at 11:07:28AM +0800, Wu Hao wrote:
> > > FME_PR_INTFC_ID is used as compat_id for fpga manager and region,
> > > but high 64 bits and low 64 bits of the compat_id are swapped by
> > > mistake. This patch fixes this problem by fixing register address.
> > >
> > > Signed-off-by: Wu Hao <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
> > > ---
> > > drivers/fpga/dfl-fme-mgr.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> > > index 76f3770..b3f7eee 100644
> > > --- a/drivers/fpga/dfl-fme-mgr.c
> > > +++ b/drivers/fpga/dfl-fme-mgr.c
> > > @@ -30,8 +30,8 @@
> > > #define FME_PR_STS 0x10
> > > #define FME_PR_DATA 0x18
> > > #define FME_PR_ERR 0x20
> > > -#define FME_PR_INTFC_ID_H 0xA8
> > > -#define FME_PR_INTFC_ID_L 0xB0
> > > +#define FME_PR_INTFC_ID_L 0xA8
> > > +#define FME_PR_INTFC_ID_H 0xB0
> >
> > Does this handle endianess correct?
>
> Hi Moritz,
>
> This is just a bug fixing for wrong offsets given to these 2 registers
> according to spec. I think this is not endianess related, and per my
> understanding we don't need more code on endianess handling as that
> should be done inside the readq function already. :)
>
> Thanks
> Hao
Thanks for clarifying,
Moritz
Hi Wu,
On Mon, Mar 25, 2019 at 11:07:41AM +0800, Wu Hao wrote:
> This patch adds support to thermal management private feature for DFL
> FPGA Management Engine (FME). As thermal throttling is handled by
> hardware automatically per pre-defined thresholds, this private
> feature driver only provides read-only sysfs interfaces for user
> to read temperature, thresholds, threshold policy and other info.
>
> Signed-off-by: Luwei Kang <[email protected]>
> Signed-off-by: Russ Weight <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++++
> drivers/fpga/dfl-fme-main.c | 202 +++++++++++++++++++++++
> 2 files changed, 258 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> index b8327e9..d3aeb88 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> @@ -44,3 +44,59 @@ Description: Read-only. It returns socket_id to indicate which socket
> this FPGA belongs to, only valid for integrated solution.
> User only needs this information, in case standard numa node
> can't provide correct information.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/temperature
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns temperature (in Celsius) of this FPGA
> + device.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the temperature threshold1
> + (in Celsius).
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold2
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the temperature threshold2
> + (in Celsius).
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/trip_threshold
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns trip threshold (in Celsius), once FPGA
> + temperature reaches trip threshold, it triggers a fatal event
> + to board management controller (BMC) to shutdown FPGA.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1_status
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns 1 if temperature reaches threshold1,
> + otherwise 0. Once temperature reaches threshold1, hardware
> + will automatically enter throttling state (AP1 - 50%
> + or AP2 - 90% throttling, see 'threshold1_policy').
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold2_status
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns 1 if temperature reaches threshold2,
> + otherwise 0. Once temperature reaches threshold2, hardware
> + will automatically enter the deepest throttling state (AP6
> + - 100% throttling).
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1_policy
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the policy of temperature
> + threshold1. It only supports two value (policy):
> + 0 - AP2 state (90% throttling)
> + 1 - AP1 state (50% throttling)
These look like they could directly map to the linux thermal framework,
any reason you can't use the thermal framework?
The trip stuff literally maps 1:1 to what a thermal driver does, I think
that's something you'd wanna consider.
Cheers,
Moritz
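As a rough illustration of the 1:1 mapping mentioned above, the three thresholds could be expressed as a trip-point table. This is only a sketch: the trip_kind values merely echo the passive/hot/critical trip types of the thermal framework, the choice of which threshold maps to which type is an assumption, and fme_build_trips() is a hypothetical helper rather than a real thermal API call.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for thermal trip types; not the kernel's enum. */
enum trip_kind {
	TRIP_PASSIVE,	/* throttling starts (threshold1, AP1 or AP2) */
	TRIP_HOT,	/* deepest throttling (threshold2, AP6) */
	TRIP_CRITICAL	/* fatal event to the BMC (trip_threshold) */
};

struct trip_point {
	enum trip_kind kind;
	int temp_mc;	/* millidegrees Celsius */
};

/*
 * Build a three-entry trip table from the values exposed by the sysfs
 * files in this patch (all given in degrees Celsius). Returns the number
 * of entries written.
 */
static size_t fme_build_trips(int threshold1_c, int threshold2_c, int trip_c,
			      struct trip_point out[3])
{
	out[0] = (struct trip_point){ TRIP_PASSIVE, threshold1_c * 1000 };
	out[1] = (struct trip_point){ TRIP_HOT, threshold2_c * 1000 };
	out[2] = (struct trip_point){ TRIP_CRITICAL, trip_c * 1000 };
	return 3;
}
```

The conversion to millidegrees reflects the unit the thermal framework uses for temperatures, whereas the sysfs files proposed in the patch report plain degrees Celsius.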
Hi Wu,
On Mon, Mar 25, 2019 at 11:07:36AM +0800, Wu Hao wrote:
> This patch adds id_table for each dfl private feature driver,
> it allows to reuse same private feature driver to match and support
> multiple dfl private features.
>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
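For readers following along, a minimal sketch of how a zero-terminated id_table is typically matched. The struct and function names here (feature_id, feature_driver, feature_drv_match) and the id values used in the example are hypothetical simplifications, not the definitions from dfl.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the dfl feature id and driver structures. */
struct feature_id {
	uint64_t id;
};

struct feature_driver {
	const struct feature_id *id_table;	/* terminated by an entry with id 0 */
	const char *name;
};

/*
 * Return true if feature_id appears in the driver's id_table. Because an
 * id of 0 ends the table, a feature whose id is 0 can never be matched
 * with this scheme.
 */
static bool feature_drv_match(uint64_t feature_id,
			      const struct feature_driver *drv)
{
	const struct feature_id *ids = drv->id_table;

	if (!ids)
		return false;

	for (; ids->id; ids++) {
		if (ids->id == feature_id)
			return true;
	}

	return false;
}
```

A single driver can then list several ids in one table, which is what allows one private feature driver to serve multiple features.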
> ---
> drivers/fpga/dfl-afu-main.c | 14 ++++++++++++--
> drivers/fpga/dfl-fme-main.c | 11 ++++++++---
> drivers/fpga/dfl-fme-pr.c | 7 ++++++-
> drivers/fpga/dfl-fme.h | 3 ++-
> drivers/fpga/dfl.c | 21 +++++++++++++++++++--
> drivers/fpga/dfl.h | 21 +++++++++++++++------
> 6 files changed, 62 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 82fd80a..2916876 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -440,6 +440,11 @@ port_hdr_ioctl(struct platform_device *pdev, struct dfl_feature *feature,
> return ret;
> }
>
> +static const struct dfl_feature_id port_hdr_id_table[] = {
> + {.id = PORT_FEATURE_ID_HEADER,},
> + {0,}
> +};
> +
> static const struct dfl_feature_ops port_hdr_ops = {
> .init = port_hdr_init,
> .uinit = port_hdr_uinit,
> @@ -500,6 +505,11 @@ static void port_afu_uinit(struct platform_device *pdev,
> sysfs_remove_files(&pdev->dev.kobj, port_afu_attrs);
> }
>
> +static const struct dfl_feature_id port_afu_id_table[] = {
> + {.id = PORT_FEATURE_ID_AFU,},
> + {0,}
> +};
> +
> static const struct dfl_feature_ops port_afu_ops = {
> .init = port_afu_init,
> .uinit = port_afu_uinit,
> @@ -507,11 +517,11 @@ static const struct dfl_feature_ops port_afu_ops = {
>
> static struct dfl_feature_driver port_feature_drvs[] = {
> {
> - .id = PORT_FEATURE_ID_HEADER,
> + .id_table = port_hdr_id_table,
> .ops = &port_hdr_ops,
> },
> {
> - .id = PORT_FEATURE_ID_AFU,
> + .id_table = port_afu_id_table,
> .ops = &port_afu_ops,
> },
> {
> diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> index 8b2a337..38c6342 100644
> --- a/drivers/fpga/dfl-fme-main.c
> +++ b/drivers/fpga/dfl-fme-main.c
> @@ -158,6 +158,11 @@ static long fme_hdr_ioctl(struct platform_device *pdev,
> return -ENODEV;
> }
>
> +static const struct dfl_feature_id fme_hdr_id_table[] = {
> + {.id = FME_FEATURE_ID_HEADER,},
> + {0,}
> +};
> +
> static const struct dfl_feature_ops fme_hdr_ops = {
> .init = fme_hdr_init,
> .uinit = fme_hdr_uinit,
> @@ -166,12 +171,12 @@ static const struct dfl_feature_ops fme_hdr_ops = {
>
> static struct dfl_feature_driver fme_feature_drvs[] = {
> {
> - .id = FME_FEATURE_ID_HEADER,
> + .id_table = fme_hdr_id_table,
> .ops = &fme_hdr_ops,
> },
> {
> - .id = FME_FEATURE_ID_PR_MGMT,
> - .ops = &pr_mgmt_ops,
> + .id_table = fme_pr_mgmt_id_table,
> + .ops = &fme_pr_mgmt_ops,
> },
> {
> .ops = NULL,
> diff --git a/drivers/fpga/dfl-fme-pr.c b/drivers/fpga/dfl-fme-pr.c
> index 8a0e46a..b054ac6 100644
> --- a/drivers/fpga/dfl-fme-pr.c
> +++ b/drivers/fpga/dfl-fme-pr.c
> @@ -482,7 +482,12 @@ static long fme_pr_ioctl(struct platform_device *pdev,
> return ret;
> }
>
> -const struct dfl_feature_ops pr_mgmt_ops = {
> +const struct dfl_feature_id fme_pr_mgmt_id_table[] = {
> + {.id = FME_FEATURE_ID_PR_MGMT,},
> + {0}
> +};
> +
> +const struct dfl_feature_ops fme_pr_mgmt_ops = {
> .init = pr_mgmt_init,
> .uinit = pr_mgmt_uinit,
> .ioctl = fme_pr_ioctl,
> diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
> index de20755..7a021c4 100644
> --- a/drivers/fpga/dfl-fme.h
> +++ b/drivers/fpga/dfl-fme.h
> @@ -35,6 +35,7 @@ struct dfl_fme {
> struct dfl_feature_platform_data *pdata;
> };
>
> -extern const struct dfl_feature_ops pr_mgmt_ops;
> +extern const struct dfl_feature_ops fme_pr_mgmt_ops;
> +extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
>
> #endif /* __DFL_FME_H */
> diff --git a/drivers/fpga/dfl.c b/drivers/fpga/dfl.c
> index c5aa287..65f91ef 100644
> --- a/drivers/fpga/dfl.c
> +++ b/drivers/fpga/dfl.c
> @@ -14,6 +14,8 @@
>
> #include "dfl.h"
>
> +#define DRV_VERSION "0.8"
> +
> static DEFINE_MUTEX(dfl_id_mutex);
>
> /*
> @@ -274,6 +276,21 @@ static int dfl_feature_instance_init(struct platform_device *pdev,
> return ret;
> }
>
> +static bool dfl_feature_drv_match(struct dfl_feature *feature,
> + struct dfl_feature_driver *driver)
> +{
> + const struct dfl_feature_id *ids = driver->id_table;
> +
> + if (ids) {
> + while (ids->id) {
> + if (ids->id == feature->id)
> + return true;
> + ids++;
> + }
> + }
> + return false;
> +}
> +
> /**
> * dfl_fpga_dev_feature_init - init for sub features of dfl feature device
> * @pdev: feature device.
> @@ -294,8 +311,7 @@ int dfl_fpga_dev_feature_init(struct platform_device *pdev,
>
> while (drv->ops) {
> dfl_fpga_dev_for_each_feature(pdata, feature) {
> - /* match feature and drv using id */
> - if (feature->id == drv->id) {
> + if (dfl_feature_drv_match(feature, drv)) {
> ret = dfl_feature_instance_init(pdev, pdata,
> feature, drv);
> if (ret)
> @@ -1164,3 +1180,4 @@ module_exit(dfl_fpga_exit);
> MODULE_DESCRIPTION("FPGA Device Feature List (DFL) Support");
> MODULE_AUTHOR("Intel Corporation");
> MODULE_LICENSE("GPL v2");
> +MODULE_VERSION(DRV_VERSION);
> diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
> index 3c5dc3a..fbc57f0 100644
> --- a/drivers/fpga/dfl.h
> +++ b/drivers/fpga/dfl.h
> @@ -30,8 +30,8 @@
> /* plus one for fme device */
> #define MAX_DFL_FEATURE_DEV_NUM (MAX_DFL_FPGA_PORT_NUM + 1)
>
> -/* Reserved 0x0 for Header Group Register and 0xff for AFU */
> -#define FEATURE_ID_FIU_HEADER 0x0
> +/* Reserved 0xfe for Header Group Register and 0xff for AFU */
> +#define FEATURE_ID_FIU_HEADER 0xfe
> #define FEATURE_ID_AFU 0xff
>
> #define FME_FEATURE_ID_HEADER FEATURE_ID_FIU_HEADER
> @@ -169,13 +169,22 @@ void dfl_fpga_port_ops_put(struct dfl_fpga_port_ops *ops);
> int dfl_fpga_check_port_id(struct platform_device *pdev, void *pport_id);
>
> /**
> - * struct dfl_feature_driver - sub feature's driver
> + * struct dfl_feature_id - dfl private feature id
> *
> - * @id: sub feature id.
> - * @ops: ops of this sub feature.
> + * @id: unique dfl private feature id.
> */
> -struct dfl_feature_driver {
> +struct dfl_feature_id {
> u64 id;
> +};
> +
> +/**
> + * struct dfl_feature_driver - dfl private feature driver
> + *
> + * @id_table: id_table for dfl private features supported by this driver.
> + * @ops: ops of this dfl private feature driver.
> + */
> +struct dfl_feature_driver {
> + const struct dfl_feature_id *id_table;
> const struct dfl_feature_ops *ops;
> };
>
> --
> 2.7.4
>
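For readers following the id_table change, the matching loop can be modeled
in plain userspace C. This is a sketch only: the type names are simplified
stand-ins for the dfl structures, and the IDs 0x10 and 0x11 are hypothetical
values chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct dfl_feature_id. */
struct feature_id {
	uint64_t id;
};

/* Simplified stand-in for struct dfl_feature_driver. */
struct feature_driver {
	const struct feature_id *id_table;	/* terminated by an entry with id == 0 */
	const char *name;
};

/* Walk the zero-terminated table and report whether feature_id is listed. */
bool feature_drv_match(uint64_t feature_id, const struct feature_driver *drv)
{
	const struct feature_id *ids = drv->id_table;

	if (!ids)
		return false;

	for (; ids->id; ids++) {
		if (ids->id == feature_id)
			return true;
	}
	return false;
}

/* One driver serving two hypothetical feature IDs. */
static const struct feature_id demo_ids[] = {
	{ .id = 0x10 },
	{ .id = 0x11 },
	{ 0 }
};

const struct feature_driver demo_drv = {
	.id_table = demo_ids,
	.name = "demo",
};
```

Because an id of 0 ends the table, a feature whose id is 0 can never match.
That is presumably why the same patch moves FEATURE_ID_FIU_HEADER from 0x0
to 0xfe.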
On Mon, Mar 25, 2019 at 11:07:37AM +0800, Wu Hao wrote:
> As these two functions are used by other private features. e.g.
> in error reporting private feature, it requires to check port status
> and reset port for error clearing.
>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
> ---
> drivers/fpga/dfl-afu-main.c | 25 ++++++++++++++-----------
> drivers/fpga/dfl-afu.h | 3 +++
> 2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 2916876..e727d9b 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -24,14 +24,16 @@
> #define DRV_VERSION "0.8"
>
> /**
> - * port_enable - enable a port
> + * __port_enable - enable a port
> * @pdev: port platform device.
> *
> * Enable Port by clear the port soft reset bit, which is set by default.
> * The AFU is unable to respond to any MMIO access while in reset.
> - * port_enable function should only be used after port_disable function.
> + * __port_enable function should only be used after __port_disable function.
> + *
> + * The caller needs to hold lock for protection.
> */
> -static void port_enable(struct platform_device *pdev)
> +void __port_enable(struct platform_device *pdev)
> {
> struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> void __iomem *base;
> @@ -54,13 +56,14 @@ static void port_enable(struct platform_device *pdev)
> #define RST_POLL_TIMEOUT 1000 /* us */
>
> /**
> - * port_disable - disable a port
> + * __port_disable - disable a port
> * @pdev: port platform device.
> *
> - * Disable Port by setting the port soft reset bit, it puts the port into
> - * reset.
> + * Disable Port by setting the port soft reset bit, it puts the port into reset.
> + *
> + * The caller needs to hold lock for protection.
> */
> -static int port_disable(struct platform_device *pdev)
> +int __port_disable(struct platform_device *pdev)
> {
> struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> void __iomem *base;
> @@ -106,9 +109,9 @@ static int __port_reset(struct platform_device *pdev)
> {
> int ret;
>
> - ret = port_disable(pdev);
> + ret = __port_disable(pdev);
> if (!ret)
> - port_enable(pdev);
> + __port_enable(pdev);
>
> return ret;
> }
> @@ -810,9 +813,9 @@ static int port_enable_set(struct platform_device *pdev, bool enable)
>
> mutex_lock(&pdata->lock);
> if (enable)
> - port_enable(pdev);
> + __port_enable(pdev);
> else
> - ret = port_disable(pdev);
> + ret = __port_disable(pdev);
> mutex_unlock(&pdata->lock);
>
> return ret;
> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> index 0c7630a..35e60c5 100644
> --- a/drivers/fpga/dfl-afu.h
> +++ b/drivers/fpga/dfl-afu.h
> @@ -79,6 +79,9 @@ struct dfl_afu {
> struct dfl_feature_platform_data *pdata;
> };
>
> +void __port_enable(struct platform_device *pdev);
> +int __port_disable(struct platform_device *pdev);
> +
> void afu_mmio_region_init(struct dfl_feature_platform_data *pdata);
> int afu_mmio_region_add(struct dfl_feature_platform_data *pdata,
> u32 region_index, u64 region_size, u64 phys, u32 flags);
> --
> 2.7.4
>
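The "caller needs to hold lock" convention added to the kernel-doc above can
be illustrated with a small userspace model. Everything here is an
assumption made for illustration: the lock is a plain flag rather than a
mutex, the names are hypothetical, and the "__" prefix is replaced with a
"_locked" suffix because leading double underscores are reserved in
userspace C. In kernel code such a precondition is typically checked with
lockdep_assert_held().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a port: a lock flag and the soft reset bit. */
struct port {
	bool lock_held;	/* stand-in for pdata->lock */
	bool in_reset;	/* stand-in for the port soft reset bit */
};

void port_lock(struct port *p)
{
	assert(!p->lock_held);
	p->lock_held = true;
}

void port_unlock(struct port *p)
{
	assert(p->lock_held);
	p->lock_held = false;
}

/* Models __port_enable(): the caller must already hold the lock. */
void port_enable_locked(struct port *p)
{
	assert(p->lock_held);
	p->in_reset = false;
}

/* Models __port_disable(): the caller must already hold the lock. */
int port_disable_locked(struct port *p)
{
	assert(p->lock_held);
	p->in_reset = true;
	return 0;
}

/* Models port_enable_set(): takes the lock, then calls the locked helpers. */
int port_enable_set_model(struct port *p, bool enable)
{
	int ret = 0;

	port_lock(p);
	if (enable)
		port_enable_locked(p);
	else
		ret = port_disable_locked(p);
	port_unlock(p);

	return ret;
}
```

Other private features can then call the locked helpers from a section
where they already hold the lock, instead of taking it a second time.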
Hi Wu,
On Mon, Mar 25, 2019 at 11:07:39AM +0800, Wu Hao wrote:
> STP (SignalTap) is one of the private features under the port for
> debugging. This patch adds private feature driver support for it
> to allow userspace applications to mmap related mmio region and
> provide STP service.
>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Moritz Fischer <[email protected]>
> ---
> drivers/fpga/dfl-afu-main.c | 34 ++++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index 754729e..14970a4 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -518,6 +518,36 @@ static const struct dfl_feature_ops port_afu_ops = {
> .uinit = port_afu_uinit,
> };
>
> +static int port_stp_init(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + struct resource *res = &pdev->resource[feature->resource_index];
> +
> + dev_dbg(&pdev->dev, "PORT STP Init.\n");
> +
> + return afu_mmio_region_add(dev_get_platdata(&pdev->dev),
> + DFL_PORT_REGION_INDEX_STP,
> + resource_size(res), res->start,
> + DFL_PORT_REGION_MMAP | DFL_PORT_REGION_READ |
> + DFL_PORT_REGION_WRITE);
> +}
> +
> +static void port_stp_uinit(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + dev_dbg(&pdev->dev, "PORT STP UInit.\n");
> +}
> +
> +static const struct dfl_feature_id port_stp_id_table[] = {
> + {.id = PORT_FEATURE_ID_STP,},
> + {0,}
> +};
> +
> +static const struct dfl_feature_ops port_stp_ops = {
> + .init = port_stp_init,
> + .uinit = port_stp_uinit,
> +};
> +
> static struct dfl_feature_driver port_feature_drvs[] = {
> {
> .id_table = port_hdr_id_table,
> @@ -532,6 +562,10 @@ static struct dfl_feature_driver port_feature_drvs[] = {
> .ops = &port_err_ops,
> },
> {
> + .id_table = port_stp_id_table,
> + .ops = &port_stp_ops,
> + },
> + {
> .ops = NULL,
> }
> };
> --
> 2.7.4
>
Thanks,
Moritz
On Tue, Apr 02, 2019 at 07:59:25AM -0700, Moritz Fischer wrote:
> Hi Wu,
>
> On Mon, Mar 25, 2019 at 11:07:41AM +0800, Wu Hao wrote:
> > This patch adds support to thermal management private feature for DFL
> > FPGA Management Engine (FME). As thermal throttling is handled by
> > hardware automatically per pre-defined thresholds, this private
> > feature driver only provides read-only sysfs interfaces for user
> > to read temperature, thresholds, threshold policy and other info.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > Signed-off-by: Russ Weight <[email protected]>
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++++
> > drivers/fpga/dfl-fme-main.c | 202 +++++++++++++++++++++++
> > 2 files changed, 258 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index b8327e9..d3aeb88 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -44,3 +44,59 @@ Description: Read-only. It returns socket_id to indicate which socket
> > this FPGA belongs to, only valid for integrated solution.
> > User only needs this information, in case standard numa node
> > can't provide correct information.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/temperature
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns temperature (in Celsius) of this FPGA
> > + device.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the temperature threshold1
> > + (in Celsius).
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold2
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the temperature threshold2
> > + (in Celsius).
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/trip_threshold
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns trip threshold (in Celsius), once FPGA
> > + temperature reaches trip threshold, it triggers a fatal event
> > + to board management controller (BMC) to shutdown FPGA.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1_status
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns 1 if temperature reaches threshold1,
> > + otherwise 0. Once temperature reaches threshold1, hardware
> > + will automatically enter throttling state (AP1 - 50%
> > + or AP2 - 90% throttling, see 'threshold1_policy').
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold2_status
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns 1 if temperature reaches threshold2,
> > + otherwise 0. Once temperature reaches threshold2, hardware
> > + will automatically enter the deepest throttling state (AP6
> > + - 100% throttling).
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/thermal_mgmt/threshold1_policy
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the policy of temperature
> > + threshold1. It only supports two value (policy):
> > + 0 - AP2 state (90% throttling)
> > + 1 - AP1 state (50% throttling)
>
> These look like they could directly map to the linux thermal framework,
> any reason you can't use the thermal framework?
>
> The trip stuff literally maps 1:1 to what a thermal driver does, I think
> that's something you'd wanna consider.
>
Hi Moritz,
Thanks a lot for the suggestion. My understanding is that trip points in a
thermal zone indicate cooling actions that thermal software, either in the
kernel or in userspace, is required to take. In this case the FPGA hardware
handles cooling automatically (yes, the driver only exposes read-only sysfs
files for information), so software doesn't need to take care of this at all.
For that reason, it seems that we don't have to expose these thresholds as
trip points. Also, per my understanding, people who use such an FPGA device
may need to know the current hardware throttling behavior, e.g. 50% vs 90%.
This information can't be provided by the standard thermal zone sysfs, so
users need these sysfs interfaces anyway. That said, it seems that we could
still create a thermal zone without trip points. That would help users who
want to connect external cooling devices via a userspace thermal daemon: they
can define whatever trip points they like to activate the external cooling
device. I will consider this further and come up with a new patch in the
v2 patchset.
Thanks
Hao
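The userspace-daemon idea in the reply above could look roughly like the
sketch below. It assumes the temperature attribute reads as a plain decimal
integer in Celsius followed by a newline. That format, and both function
names, are assumptions, not something the patch states.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse text as read from an attribute such as thermal_mgmt/temperature.
 * Returns 0 and stores the value on success, -1 on malformed input.
 */
int parse_temperature(const char *text, long *out)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(text, &end, 10);
	if (errno || end == text)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*out = v;
	return 0;
}

/*
 * User-defined trip points, sorted in ascending order. The return value is
 * the number of trip points reached, used here as the cooling level for an
 * external cooling device.
 */
size_t cooling_level(long temp, const long *trips, size_t n)
{
	size_t level = 0;

	while (level < n && temp >= trips[level])
		level++;
	return level;
}
```

With trips of 60, 75 and 90, a reading of 72 gives level 1. The hardware
throttling thresholds stay independent of whatever trip points the user
picks.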
> Cheers,
> Moritz
Hi Hao,
On Thu, Apr 04, 2019 at 12:31:47AM +0800, Wu Hao wrote:
> Hi Moritz,
>
> Thanks a lot for the suggestion, actually I feel that the trip points in thermal
> zone are used to indicate cooling actions required for thermal software either
> in kernel or userspace. But in this case, such FPGA hardware handles cooling
> automatically (yes, driver only expose Read-only sysfs for information), so
> software doesn't need to take care of this at all. For this purpose, it seems
> that we don't have to put these thresholds as trip points. And per my
> understanding, if people use such FPGA device, then they may need to know
> what's the current hardware throttling behavior, e.g. 50% vs 90%. These
> information can't be provided by standard thermal zone sysfs, so anyway user
> needs these sysfs interfaces to know it. But it seems that we still could
> create a thermal zone without trip points, it could help if user wants to
> connect some external cooling devices via userspace thermal daemon, they can
> define whatever trip points they like to activate the external cooling
> device. I will consider this further more and come up with a new patch in
> v2 patchset.
Generally speaking, extending an existing framework with the
functionality you want is preferable to rolling 100% your own.
So please look into this.
Thanks,
Moritz
On Wed, Apr 03, 2019 at 11:09:09AM -0700, Moritz Fischer wrote:
> Hi Hao,
>
> On Thu, Apr 04, 2019 at 12:31:47AM +0800, Wu Hao wrote:
> > Hi Moritz,
> >
> > Thanks a lot for the suggestion, actually I feel that the trip points in thermal
> > zone are used to indicate cooling actions required for thermal software either
> > in kernel or userspace. But in this case, such FPGA hardware handles cooling
> > automatically (yes, driver only expose Read-only sysfs for information), so
> > software doesn't need to take care of this at all. For this purpose, it seems
> > that we don't have to put these thresholds as trip points. And per my
> > understanding, if people use such FPGA device, then they may need to know
> > what's the current hardware throttling behavior, e.g. 50% vs 90%. These
> > information can't be provided by standard thermal zone sysfs, so anyway user
> > needs these sysfs interfaces to know it. But it seems that we still could
> > create a thermal zone without trip points, it could help if user wants to
> > connect some external cooling devices via userspace thermal daemon, they can
> > define whatever trip points they like to activate the external cooling
> > device. I will consider this further more and come up with a new patch in
> > v2 patchset.
>
> Generally speaking extending an existing framework with the
> functionality you want is preferable over rolling 100% your own.
>
> So please look into this.
Yes, agreed. I will look into this and try to address it in the next version.
Thanks for the comments.
Hao
>
> Thanks,
> Moritz
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
Hi Hao,
>
> Error reporting is one important private feature, it reports error
> detected on port and accelerated function unit (AFU). It introduces
> several sysfs interfaces to allow userspace to check and clear
> errors detected by hardware.
>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-platform-dfl-port | 29 +++
> drivers/fpga/Makefile | 1 +
> drivers/fpga/dfl-afu-error.c | 225 ++++++++++++++++++++++
> drivers/fpga/dfl-afu-main.c | 4 +
> drivers/fpga/dfl-afu.h | 4 +
> 5 files changed, 263 insertions(+)
> create mode 100644 drivers/fpga/dfl-afu-error.c
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
> index f611e47..e6140aa 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-port
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
> @@ -79,3 +79,32 @@ KernelVersion: 5.2
> Contact: Wu Hao <[email protected]>
> Description: Read-only. Read this file to get the status of issued command
> to userclck_freqcntrcmd.
> +
> +What: /sys/bus/platform/devices/dfl-port.0/errors/errors
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get errors detected on port and
> + Accelerated Function Unit (AFU).
> +
> +What: /sys/bus/platform/devices/dfl-port.0/errors/first_error
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the first error detected by
> + hardware.
> +
> +What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the first malformed request
> + captured by hardware.
> +
> +What: /sys/bus/platform/devices/dfl-port.0/errors/clear
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Write-only. Write error code to this file to clear errors. If
> + the input error code doesn't match, it returns -EBUSY error
> + code.
I understand how -EBUSY could be the right error code for when the
hardware is in a state where the error can't be cleared. But if the
input error code doesn't match, shouldn't the code be -EINVAL? Also
as noted below, the way this is currently coded, -ETIMEDOUT could get
returned.
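One way to read the suggestion above is as a small decision table with a
distinct error code per failure. The function below is a hypothetical
userspace sketch of that proposal, not the posted driver logic, which
returns -EBUSY for a mismatch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Proposed result of a write to errors/clear:
 *   -EBUSY     hardware is in AP6 state, errors cannot be cleared now
 *   -ETIMEDOUT halting the port timed out
 *   -EINVAL    the written error code does not match the current errors
 *   0          errors cleared
 */
int clear_result(bool in_ap6, bool disable_timed_out,
		 uint64_t current_errors, uint64_t requested)
{
	if (in_ap6)
		return -EBUSY;
	if (disable_timed_out)
		return -ETIMEDOUT;
	if (requested != current_errors)
		return -EINVAL;
	return 0;
}
```

Whichever codes are chosen, the sysfs document would then list each of them
next to its cause.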
> diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> index c0dd4c8..f1f0af7 100644
> --- a/drivers/fpga/Makefile
> +++ b/drivers/fpga/Makefile
> @@ -40,6 +40,7 @@ obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
>
> dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
> dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
> +dfl-afu-objs += dfl-afu-error.o
>
> # Drivers for FPGAs which implement DFL
> obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o
> diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> new file mode 100644
> index 0000000..b66bd4a
> --- /dev/null
> +++ b/drivers/fpga/dfl-afu-error.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting
> + *
> + * Copyright 2019 Intel Corporation, Inc.
> + *
> + * Authors:
> + * Wu Hao <[email protected]>
> + * Xiao Guangrong <[email protected]>
> + * Joseph Grecco <[email protected]>
> + * Enno Luebbers <[email protected]>
> + * Tim Whisonant <[email protected]>
> + * Ananda Ravuri <[email protected]>
> + * Mitchel Henry <[email protected]>
> + */
> +
> +#include <linux/uaccess.h>
> +
> +#include "dfl-afu.h"
> +
> +#define PORT_ERROR_MASK 0x8
> +#define PORT_ERROR 0x10
> +#define PORT_FIRST_ERROR 0x18
> +#define PORT_MALFORMED_REQ0 0x20
> +#define PORT_MALFORMED_REQ1 0x28
> +
> +#define ERROR_MASK GENMASK_ULL(63, 0)
> +
> +/* mask or unmask port errors by the error mask register. */
> +static void __port_err_mask(struct device *dev, bool mask)
> +{
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> +
> + writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK);
> +}
> +
> +/* clear port errors. */
> +static int __port_err_clear(struct device *dev, u64 err)
> +{
> + struct platform_device *pdev = to_platform_device(dev);
> + void __iomem *base_err, *base_hdr;
> + int ret;
> + u64 v;
> +
> + base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> + base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> +
> + /*
> + * clear Port Errors
> + *
> + * - Check for AP6 State
> + * - Halt Port by keeping Port in reset
> + * - Set PORT Error mask to all 1 to mask errors
> + * - Clear all errors
> + * - Set Port mask to all 0 to enable errors
> + * - All errors start capturing new errors
> + * - Enable Port by pulling the port out of reset
> + */
> +
> + /* if device is still in AP6 power state, can not clear any error. */
> + v = readq(base_hdr + PORT_HDR_STS);
> + if (FIELD_GET(PORT_STS_PWR_STATE, v) == PORT_STS_PWR_STATE_AP6) {
> + dev_err(dev, "Could not clear errors, device in AP6 state.\n");
> + return -EBUSY;
> + }
> +
> + /* Halt Port by keeping Port in reset */
> + ret = __port_disable(pdev);
> + if (ret)
> + return ret;
__port_disable() can return -ETIMEDOUT, which clear_store() then passes
straight back to userspace. The sysfs doc only mentions -EBUSY. Either
document -ETIMEDOUT in the sysfs doc, or change the code to remap the
returned error code.
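If you go with remapping, something along these lines is what I have in mind. This is only a userspace sketch of the idea, not driver code: struct fake_port, fake_port_disable() and model_port_err_clear() are made-up names standing in for the port and __port_disable(), and folding the timeout into -EBUSY is just one possible choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the port; not a real driver structure. */
struct fake_port {
	int disable_result;	/* what the modelled __port_disable() returns */
};

/* Models __port_disable(): 0 on success, -ETIMEDOUT if reset never acks. */
static int fake_port_disable(const struct fake_port *port)
{
	return port->disable_result;
}

/*
 * Models the start of the clear path: a reset timeout is reported as
 * -EBUSY so userspace only sees the code the sysfs doc already lists.
 */
static int model_port_err_clear(const struct fake_port *port)
{
	int ret;

	assert(port != NULL);

	ret = fake_port_disable(port);
	if (ret == -ETIMEDOUT)
		return -EBUSY;
	return ret;
}
```

The alternative (leave the code alone and list -ETIMEDOUT in the doc) keeps more information for the caller, so either is fine with me as long as code and doc agree.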
> +
> + /* Mask all errors */
> + __port_err_mask(dev, true);
> +
> + /* Clear errors if err input matches with current port errors.*/
> + v = readq(base_err + PORT_ERROR);
> +
> + if (v == err) {
> + writeq(v, base_err + PORT_ERROR);
> +
> + v = readq(base_err + PORT_FIRST_ERROR);
> + writeq(v, base_err + PORT_FIRST_ERROR);
> + } else {
> + ret = -EBUSY;
> + }
> +
> + /* Clear mask */
> + __port_err_mask(dev, false);
> +
> + /* Enable the Port by clear the reset */
> + __port_enable(pdev);
> +
> + return ret;
> +}
> +
> +static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n", dfl_feature_revision(base));
> +}
> +static DEVICE_ATTR_RO(revision);
This appears to be adding a
/sys/bus/platform/devices/dfl-port.0/errors/revision attribute that
isn't documented in the sysfs document.
> +
> +static ssize_t errors_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> + void __iomem *base;
> + u64 error;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> +
> + mutex_lock(&pdata->lock);
> + error = readq(base + PORT_ERROR);
> + mutex_unlock(&pdata->lock);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)error);
> +}
> +static DEVICE_ATTR_RO(errors);
> +
> +static ssize_t first_error_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> + void __iomem *base;
> + u64 error;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> +
> + mutex_lock(&pdata->lock);
> + error = readq(base + PORT_FIRST_ERROR);
> + mutex_unlock(&pdata->lock);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n", (unsigned long long)error);
> +}
> +static DEVICE_ATTR_RO(first_error);
> +
> +static ssize_t first_malformed_req_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> + void __iomem *base;
> + u64 req0, req1;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> +
> + mutex_lock(&pdata->lock);
> + req0 = readq(base + PORT_MALFORMED_REQ0);
> + req1 = readq(base + PORT_MALFORMED_REQ1);
> + mutex_unlock(&pdata->lock);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%016llx%016llx\n",
> + (unsigned long long)req1, (unsigned long long)req0);
> +}
> +static DEVICE_ATTR_RO(first_malformed_req);
> +
> +static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
> + const char *buff, size_t count)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> + u64 value;
> + int ret;
> +
> + if (kstrtou64(buff, 0, &value))
> + return -EINVAL;
> +
> + mutex_lock(&pdata->lock);
> + ret = __port_err_clear(dev, value);
> + mutex_unlock(&pdata->lock);
> +
> + return ret ? ret : count;
> +}
> +static DEVICE_ATTR_WO(clear);
> +
> +static struct attribute *port_err_attrs[] = {
> + &dev_attr_revision.attr,
> + &dev_attr_errors.attr,
> + &dev_attr_first_error.attr,
> + &dev_attr_first_malformed_req.attr,
> + &dev_attr_clear.attr,
> + NULL,
> +};
> +
> +static struct attribute_group port_err_attr_group = {
> + .attrs = port_err_attrs,
> + .name = "errors",
> +};
> +
> +static int port_err_init(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(&pdev->dev);
> +
> + dev_dbg(&pdev->dev, "PORT ERR Init.\n");
> +
> + mutex_lock(&pdata->lock);
> + __port_err_mask(&pdev->dev, false);
> + mutex_unlock(&pdata->lock);
> +
> + return sysfs_create_group(&pdev->dev.kobj, &port_err_attr_group);
> +}
> +
> +static void port_err_uinit(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + dev_dbg(&pdev->dev, "PORT ERR UInit.\n");
> +
> + sysfs_remove_group(&pdev->dev.kobj, &port_err_attr_group);
> +}
> +
> +const struct dfl_feature_id port_err_id_table[] = {
> + {.id = PORT_FEATURE_ID_ERROR,},
> + {0,}
> +};
> +
> +const struct dfl_feature_ops port_err_ops = {
> + .init = port_err_init,
> + .uinit = port_err_uinit,
> +};
> diff --git a/drivers/fpga/dfl-afu-main.c b/drivers/fpga/dfl-afu-main.c
> index e727d9b..754729e 100644
> --- a/drivers/fpga/dfl-afu-main.c
> +++ b/drivers/fpga/dfl-afu-main.c
> @@ -528,6 +528,10 @@ static struct dfl_feature_driver port_feature_drvs[] = {
> .ops = &port_afu_ops,
> },
> {
> + .id_table = port_err_id_table,
> + .ops = &port_err_ops,
> + },
> + {
> .ops = NULL,
> }
> };
> diff --git a/drivers/fpga/dfl-afu.h b/drivers/fpga/dfl-afu.h
> index 35e60c5..c3182a2 100644
> --- a/drivers/fpga/dfl-afu.h
> +++ b/drivers/fpga/dfl-afu.h
> @@ -100,4 +100,8 @@ int afu_dma_unmap_region(struct dfl_feature_platform_data *pdata, u64 iova);
> struct dfl_afu_dma_region *
> afu_dma_region_find(struct dfl_feature_platform_data *pdata,
> u64 iova, u64 size);
> +
> +extern const struct dfl_feature_ops port_err_ops;
> +extern const struct dfl_feature_id port_err_id_table[];
> +
> #endif /* __DFL_AFU_H */
> --
> 2.7.4
>
Thanks,
Alan
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
Hi Hao,
Looks good...
>
> This patch adds 3 read-only sysfs interfaces for FPGA Management Engine
> (FME) block for capabilities including cache_size, fabric_version and
> socket_id.
>
> Signed-off-by: Luwei Kang <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
> ---
> Documentation/ABI/testing/sysfs-platform-dfl-fme | 23 ++++++++++++
> drivers/fpga/dfl-fme-main.c | 48 ++++++++++++++++++++++++
> 2 files changed, 71 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> index 8fa4feb..b8327e9 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> @@ -21,3 +21,26 @@ Contact: Wu Hao <[email protected]>
> Description: Read-only. It returns Bitstream (static FPGA region) meta
> data, which includes the synthesis date, seed and other
> information of this static FPGA region.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/cache_size
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns cache size of this FPGA device.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/fabric_version
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns fabric version of this FPGA device.
> + Userspace applications need this information to select
> + best data channels per different fabric design.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/socket_id
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns socket_id to indicate which socket
> + this FPGA belongs to, only valid for integrated solution.
> + User only needs this information, in case standard numa node
> + can't provide correct information.
> diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> index 38c6342..8339ee8 100644
> --- a/drivers/fpga/dfl-fme-main.c
> +++ b/drivers/fpga/dfl-fme-main.c
> @@ -75,10 +75,58 @@ static ssize_t bitstream_metadata_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(bitstream_metadata);
>
> +static ssize_t cache_size_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
> +
> + v = readq(base + FME_HDR_CAP);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(FME_CAP_CACHE_SIZE, v));
> +}
> +static DEVICE_ATTR_RO(cache_size);
> +
> +static ssize_t fabric_version_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
> +
> + v = readq(base + FME_HDR_CAP);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(FME_CAP_FABRIC_VERID, v));
> +}
> +static DEVICE_ATTR_RO(fabric_version);
> +
> +static ssize_t socket_id_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_HEADER);
> +
> + v = readq(base + FME_HDR_CAP);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(FME_CAP_SOCKET_ID, v));
> +}
> +static DEVICE_ATTR_RO(socket_id);
> +
> static const struct attribute *fme_hdr_attrs[] = {
> &dev_attr_ports_num.attr,
> &dev_attr_bitstream_id.attr,
> &dev_attr_bitstream_metadata.attr,
> + &dev_attr_cache_size.attr,
> + &dev_attr_fabric_version.attr,
> + &dev_attr_socket_id.attr,
> NULL,
> };
>
> --
> 2.7.4
>
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
Hi Hao,
>
> This patch adds support for global error reporting for FPGA
> Management Engine (FME), it introduces sysfs interfaces to
> report different error detected by the hardware, and allow
> user to clear errors or inject error for testing purpose.
>
> Signed-off-by: Luwei Kang <[email protected]>
> Signed-off-by: Ananda Ravuri <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-platform-dfl-fme | 58 ++++
> drivers/fpga/Makefile | 2 +-
> drivers/fpga/dfl-fme-error.c | 390 +++++++++++++++++++++++
> drivers/fpga/dfl-fme-main.c | 4 +
> drivers/fpga/dfl-fme.h | 2 +
> drivers/fpga/dfl.h | 2 +
> 6 files changed, 457 insertions(+), 1 deletion(-)
> create mode 100644 drivers/fpga/dfl-fme-error.c
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> index 4b6448f..38f9cdd 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> @@ -156,3 +156,61 @@ KernelVersion: 5.2
> Contact: Wu Hao <[email protected]>
> Description: Read-only. Read this file to get power limit for fpga, it
> is only valid for integrated solution.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get errors detected by hardware.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the first error detected by
> + hardware.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get the second error detected by
> + hardware.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Write-only. Write error code to this file to clear errors. If
> + the input error code doesn't match, it returns -EBUSY.
As with the AFU errors patch, -EINVAL seems like a better fit here.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns errors detected on pcie0 link.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns errors detected on pcie1 link.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns non-fatal errors detected.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns catastrophic and fatal errors detected.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_error
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-Write. Write this file to inject errors for testing
> + purpose. Read this file to check errors injected.
> diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> index f1f0af7..1a9fa3d 100644
> --- a/drivers/fpga/Makefile
> +++ b/drivers/fpga/Makefile
> @@ -38,7 +38,7 @@ obj-$(CONFIG_FPGA_DFL_FME_BRIDGE) += dfl-fme-br.o
> obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o
> obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
>
> -dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
> +dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o
> dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
> dfl-afu-objs += dfl-afu-error.o
>
> diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
> new file mode 100644
> index 0000000..f2bd5f8
> --- /dev/null
> +++ b/drivers/fpga/dfl-fme-error.c
> @@ -0,0 +1,390 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Driver for FPGA Management Engine Error Management
> + *
> + * Copyright 2019 Intel Corporation, Inc.
> + *
> + * Authors:
> + * Kang Luwei <[email protected]>
> + * Xiao Guangrong <[email protected]>
> + * Wu Hao <[email protected]>
> + * Joseph Grecco <[email protected]>
> + * Enno Luebbers <[email protected]>
> + * Tim Whisonant <[email protected]>
> + * Ananda Ravuri <[email protected]>
> + * Mitchel, Henry <[email protected]>
> + */
> +
> +#include <linux/uaccess.h>
> +
> +#include "dfl.h"
> +#include "dfl-fme.h"
> +
> +#define FME_ERROR_MASK 0x8
> +#define FME_ERROR 0x10
> +#define MBP_ERROR BIT_ULL(6)
> +#define PCIE0_ERROR_MASK 0x18
> +#define PCIE0_ERROR 0x20
> +#define PCIE1_ERROR_MASK 0x28
> +#define PCIE1_ERROR 0x30
> +#define FME_FIRST_ERROR 0x38
> +#define FME_NEXT_ERROR 0x40
> +#define RAS_NONFAT_ERROR_MASK 0x48
> +#define RAS_NONFAT_ERROR 0x50
> +#define RAS_CATFAT_ERROR_MASK 0x58
> +#define RAS_CATFAT_ERROR 0x60
> +#define RAS_ERROR_INJECT 0x68
> +#define INJECT_ERROR_MASK GENMASK_ULL(2, 0)
> +
> +static ssize_t errors_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + FME_ERROR));
> +}
> +static DEVICE_ATTR_RO(errors);
> +
> +static ssize_t first_error_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + FME_FIRST_ERROR));
> +}
> +static DEVICE_ATTR_RO(first_error);
> +
> +static ssize_t next_error_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + FME_NEXT_ERROR));
> +}
> +static DEVICE_ATTR_RO(next_error);
> +
> +static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> + struct device *err_dev = dev->parent;
> + struct dfl_feature *feature;
> + void __iomem *base;
> + u64 v, val;
> + int ret;
> +
> + ret = kstrtou64(buf, 0, &val);
> + if (ret)
> + return ret;
from kstrtox.c:
* Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
Your sysfs doc could be updated to explain these return codes.
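To make those two codes concrete, here is a rough userspace approximation of the contract quoted above. It is an assumption-laden model built on strtoull(), not the kernel implementation: model_kstrtou64() is a made-up name, and it simplifies by rejecting any input that does not start with a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace model of the documented kstrtou64() contract:
 * 0 on success, -ERANGE on overflow, -EINVAL on parsing error.
 * A single trailing newline is tolerated, as sysfs writes usually have one.
 */
static int model_kstrtou64(const char *s, uint64_t *res)
{
	unsigned long long v;
	char *end;

	assert(s != NULL && res != NULL);

	/* Simplification: no sign and no leading whitespace accepted. */
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	v = strtoull(s, &end, 0);	/* base 0: accepts 0x.., 0.., decimal */
	if (end == s)
		return -EINVAL;
	if (errno == ERANGE)
		return -ERANGE;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	*res = v;
	return 0;
}
```

So a write such as "0x10\n" parses, "abc" gives -EINVAL, and a value wider than 64 bits gives -ERANGE. Both error codes reach userspace through clear_store(), which is why the doc should mention them.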
> +
> + feature = dfl_get_feature_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> + base = feature->ioaddr;
> +
> + mutex_lock(&pdata->lock);
> + writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK);
> +
> + v = readq(base + FME_ERROR);
> + if (val != v) {
> + ret = -EINVAL;
Oh wait, that's what I thought it should be ;) so the doc just needs to change.
> + goto done;
It would be easy to avoid using 'goto' here.
> + }
> +
> + writeq(v, base + FME_ERROR);
> + v = readq(base + FME_FIRST_ERROR);
> + writeq(v, base + FME_FIRST_ERROR);
> + v = readq(base + FME_NEXT_ERROR);
> + writeq(v, base + FME_NEXT_ERROR);
> +
> +done:
> + /* Workaround: disable MBP_ERROR if feature revision is 0 */
> + writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
> + base + FME_ERROR_MASK);
> + mutex_unlock(&pdata->lock);
> + return ret ? ret : count;
> +}
> +static DEVICE_ATTR_WO(clear);
> +
> +static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n", dfl_feature_revision(base));
> +}
> +static DEVICE_ATTR_RO(revision);
The revision attr needs to be documented in the sysfs doc.
> +
> +static ssize_t pcie0_errors_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + PCIE0_ERROR));
> +}
> +
> +static ssize_t pcie0_errors_store(struct device *dev,
The sysfs doc shows pcie0_errors as read-only, but this adds a store handler.
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> + u64 v, val;
> + int ret;
> +
> + ret = kstrtou64(buf, 0, &val);
> + if (ret)
> + return ret;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + mutex_lock(&pdata->lock);
> + writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK);
> +
> + v = readq(base + PCIE0_ERROR);
> + if (val != v)
> + ret = -EBUSY;
> + else
> + writeq(v, base + PCIE0_ERROR);
> +
> + writeq(0ULL, base + PCIE0_ERROR_MASK);
> + mutex_unlock(&pdata->lock);
> + return ret ? ret : count;
> +}
> +static DEVICE_ATTR_RW(pcie0_errors);
> +
> +static ssize_t pcie1_errors_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + PCIE1_ERROR));
> +}
> +
> +static ssize_t pcie1_errors_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
Same here, documentation needs to show as R/W and explain what happens
when written.
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> + u64 v, val;
> + int ret;
> +
> + ret = kstrtou64(buf, 0, &val);
> + if (ret)
> + return ret;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + mutex_lock(&pdata->lock);
> + writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK);
> +
> + v = readq(base + PCIE1_ERROR);
> + if (val != v)
> + ret = -EBUSY;
> + else
> + writeq(v, base + PCIE1_ERROR);
> +
> + writeq(0ULL, base + PCIE1_ERROR_MASK);
> + mutex_unlock(&pdata->lock);
> + return ret ? ret : count;
> +}
> +static DEVICE_ATTR_RW(pcie1_errors);
> +
> +static ssize_t nonfatal_errors_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + RAS_NONFAT_ERROR));
> +}
> +static DEVICE_ATTR_RO(nonfatal_errors);
> +
> +static ssize_t catfatal_errors_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)readq(base + RAS_CATFAT_ERROR));
> +}
> +static DEVICE_ATTR_RO(catfatal_errors);
> +
> +static ssize_t inject_error_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + v = readq(base + RAS_ERROR_INJECT);
> +
> + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> + (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
> +}
> +
> +static ssize_t inject_error_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> + struct device *err_dev = dev->parent;
> + void __iomem *base;
> + u8 inject_error;
> + int ret;
> + u64 v;
> +
> + ret = kstrtou8(buf, 0, &inject_error);
> + if (ret)
> + return ret;
> +
> + if (inject_error & ~INJECT_ERROR_MASK)
> + return -EINVAL;
> +
> + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> +
> + mutex_lock(&pdata->lock);
> + v = readq(base + RAS_ERROR_INJECT);
> + v &= ~INJECT_ERROR_MASK;
> + v |= FIELD_PREP(INJECT_ERROR_MASK, inject_error);
> + writeq(v, base + RAS_ERROR_INJECT);
> + mutex_unlock(&pdata->lock);
> +
> + return count;
> +}
> +static DEVICE_ATTR_RW(inject_error);
> +
> +static struct attribute *fme_errors_attrs[] = {
> + &dev_attr_errors.attr,
> + &dev_attr_first_error.attr,
> + &dev_attr_next_error.attr,
> + &dev_attr_clear.attr,
> + NULL,
> +};
> +
> +static struct attribute_group fme_errors_attr_group = {
> + .attrs = fme_errors_attrs,
> + .name = "fme-errors",
> +};
> +
> +static struct attribute *errors_attrs[] = {
> + &dev_attr_revision.attr,
> + &dev_attr_pcie0_errors.attr,
> + &dev_attr_pcie1_errors.attr,
> + &dev_attr_nonfatal_errors.attr,
> + &dev_attr_catfatal_errors.attr,
> + &dev_attr_inject_error.attr,
> + NULL,
> +};
> +
> +static struct attribute_group errors_attr_group = {
> + .attrs = errors_attrs,
> +};
> +
> +static const struct attribute_group *error_groups[] = {
> + &fme_errors_attr_group,
> + &errors_attr_group,
> + NULL
> +};
> +
> +static void fme_error_enable(struct dfl_feature *feature)
> +{
> + void __iomem *base = feature->ioaddr;
> +
> + /* Workaround: disable MBP_ERROR if revision is 0 */
> + writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
> + base + FME_ERROR_MASK);
> + writeq(0ULL, base + PCIE0_ERROR_MASK);
> + writeq(0ULL, base + PCIE1_ERROR_MASK);
> + writeq(0ULL, base + RAS_NONFAT_ERROR_MASK);
> + writeq(0ULL, base + RAS_CATFAT_ERROR_MASK);
> +}
> +
> +static void err_dev_release(struct device *dev)
> +{
> + kfree(dev);
> +}
> +
> +static int fme_global_err_init(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + struct device *dev;
> + int ret = 0;
> +
> + dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> + if (!dev)
> + return -ENOMEM;
> +
> + dev->parent = &pdev->dev;
> + dev->release = err_dev_release;
> + dev_set_name(dev, "errors");
> +
> + fme_error_enable(feature);
> +
> + ret = device_register(dev);
> + if (ret) {
> + put_device(dev);
> + return ret;
> + }
> +
> + ret = sysfs_create_groups(&dev->kobj, error_groups);
> + if (ret) {
> + device_unregister(dev);
> + return ret;
> + }
> +
> + feature->priv = dev;
> +
> + return ret;
> +}
> +
> +static void fme_global_err_uinit(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + struct device *dev = feature->priv;
> +
> + sysfs_remove_groups(&dev->kobj, error_groups);
> + device_unregister(dev);
> +}
> +
> +const struct dfl_feature_id fme_global_err_id_table[] = {
> + {.id = FME_FEATURE_ID_GLOBAL_ERR,},
> + {0,}
> +};
> +
> +const struct dfl_feature_ops fme_global_err_ops = {
> + .init = fme_global_err_init,
> + .uinit = fme_global_err_uinit,
> +};
> diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> index dafa6580..76cb112 100644
> --- a/drivers/fpga/dfl-fme-main.c
> +++ b/drivers/fpga/dfl-fme-main.c
> @@ -686,6 +686,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> .ops = &fme_power_mgmt_ops,
> },
> {
> + .id_table = fme_global_err_id_table,
> + .ops = &fme_global_err_ops,
> + },
> + {
> .ops = NULL,
> },
> };
> diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
> index 7a021c4..5fbe3f5 100644
> --- a/drivers/fpga/dfl-fme.h
> +++ b/drivers/fpga/dfl-fme.h
> @@ -37,5 +37,7 @@ struct dfl_fme {
>
> extern const struct dfl_feature_ops fme_pr_mgmt_ops;
> extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
> +extern const struct dfl_feature_ops fme_global_err_ops;
> +extern const struct dfl_feature_id fme_global_err_id_table[];
>
> #endif /* __DFL_FME_H */
> diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
> index fbc57f0..6c32080 100644
> --- a/drivers/fpga/dfl.h
> +++ b/drivers/fpga/dfl.h
> @@ -197,12 +197,14 @@ struct dfl_feature_driver {
> * feature dev (platform device)'s reources.
> * @ioaddr: mapped mmio resource address.
> * @ops: ops of this sub feature.
> + * @priv: priv data of this feature.
> */
> struct dfl_feature {
> u64 id;
> int resource_index;
> void __iomem *ioaddr;
> const struct dfl_feature_ops *ops;
> + void *priv;
> };
>
> #define DEV_STATUS_IN_USE 0
> --
> 2.7.4
>
Thanks,
Alan
On Tue, Apr 09, 2019 at 04:35:25PM -0500, Alan Tull wrote:
> On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
>
> Hi Hao,
>
> >
> > This patch adds support for global error reporting for FPGA
> > Management Engine (FME), it introduces sysfs interfaces to
> > report different error detected by the hardware, and allow
> > user to clear errors or inject error for testing purpose.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > Signed-off-by: Ananda Ravuri <[email protected]>
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 58 ++++
> > drivers/fpga/Makefile | 2 +-
> > drivers/fpga/dfl-fme-error.c | 390 +++++++++++++++++++++++
> > drivers/fpga/dfl-fme-main.c | 4 +
> > drivers/fpga/dfl-fme.h | 2 +
> > drivers/fpga/dfl.h | 2 +
> > 6 files changed, 457 insertions(+), 1 deletion(-)
> > create mode 100644 drivers/fpga/dfl-fme-error.c
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index 4b6448f..38f9cdd 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -156,3 +156,61 @@ KernelVersion: 5.2
> > Contact: Wu Hao <[email protected]>
> > Description: Read-only. Read this file to get power limit for fpga, it
> > is only valid for integrated solution.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/errors
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get errors detected by hardware.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/first_error
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the first error detected by
> > + hardware.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/next_error
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the second error detected by
> > + hardware.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/fme-errors/clear
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Write-only. Write error code to this file to clear errors. If
> > + the input error code doesn't match, it returns -EBUSY.
>
> As with the afu errors patch, seems like -EINVAL would be better.
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie0_errors
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns errors detected on pcie0 link.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/pcie1_errors
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns errors detected on pcie1 link.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/nonfatal_errors
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns non-fatal errors detected.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/catfatal_errors
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns catastrophic and fatal errors detected.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/errors/inject_error
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-Write. Write this file to inject errors for testing
> > + purpose. Read this file to check errors injected.
> > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> > index f1f0af7..1a9fa3d 100644
> > --- a/drivers/fpga/Makefile
> > +++ b/drivers/fpga/Makefile
> > @@ -38,7 +38,7 @@ obj-$(CONFIG_FPGA_DFL_FME_BRIDGE) += dfl-fme-br.o
> > obj-$(CONFIG_FPGA_DFL_FME_REGION) += dfl-fme-region.o
> > obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
> >
> > -dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
> > +dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o dfl-fme-error.o
> > dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
> > dfl-afu-objs += dfl-afu-error.o
> >
> > diff --git a/drivers/fpga/dfl-fme-error.c b/drivers/fpga/dfl-fme-error.c
> > new file mode 100644
> > index 0000000..f2bd5f8
> > --- /dev/null
> > +++ b/drivers/fpga/dfl-fme-error.c
> > @@ -0,0 +1,390 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Driver for FPGA Management Engine Error Management
> > + *
> > + * Copyright 2019 Intel Corporation, Inc.
> > + *
> > + * Authors:
> > + * Kang Luwei <[email protected]>
> > + * Xiao Guangrong <[email protected]>
> > + * Wu Hao <[email protected]>
> > + * Joseph Grecco <[email protected]>
> > + * Enno Luebbers <[email protected]>
> > + * Tim Whisonant <[email protected]>
> > + * Ananda Ravuri <[email protected]>
> > + * Mitchel, Henry <[email protected]>
> > + */
> > +
> > +#include <linux/uaccess.h>
> > +
> > +#include "dfl.h"
> > +#include "dfl-fme.h"
> > +
> > +#define FME_ERROR_MASK 0x8
> > +#define FME_ERROR 0x10
> > +#define MBP_ERROR BIT_ULL(6)
> > +#define PCIE0_ERROR_MASK 0x18
> > +#define PCIE0_ERROR 0x20
> > +#define PCIE1_ERROR_MASK 0x28
> > +#define PCIE1_ERROR 0x30
> > +#define FME_FIRST_ERROR 0x38
> > +#define FME_NEXT_ERROR 0x40
> > +#define RAS_NONFAT_ERROR_MASK 0x48
> > +#define RAS_NONFAT_ERROR 0x50
> > +#define RAS_CATFAT_ERROR_MASK 0x58
> > +#define RAS_CATFAT_ERROR 0x60
> > +#define RAS_ERROR_INJECT 0x68
> > +#define INJECT_ERROR_MASK GENMASK_ULL(2, 0)
> > +
> > +static ssize_t errors_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + FME_ERROR));
> > +}
> > +static DEVICE_ATTR_RO(errors);
> > +
> > +static ssize_t first_error_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + FME_FIRST_ERROR));
> > +}
> > +static DEVICE_ATTR_RO(first_error);
> > +
> > +static ssize_t next_error_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + FME_NEXT_ERROR));
> > +}
> > +static DEVICE_ATTR_RO(next_error);
> > +
> > +static ssize_t clear_store(struct device *dev, struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> > + struct device *err_dev = dev->parent;
> > + struct dfl_feature *feature;
> > + void __iomem *base;
> > + u64 v, val;
> > + int ret;
> > +
> > + ret = kstrtou64(buf, 0, &val);
> > + if (ret)
> > + return ret;
>
> from kstrtox.c:
> * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
>
> Your sysfs doc could be updated to explain these return codes.
Sure, will fix this.
>
> > +
> > + feature = dfl_get_feature_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > + base = feature->ioaddr;
> > +
> > + mutex_lock(&pdata->lock);
> > + writeq(GENMASK_ULL(63, 0), base + FME_ERROR_MASK);
> > +
> > + v = readq(base + FME_ERROR);
> > + if (val != v) {
> > + ret = -EINVAL;
>
> Oh wait, that's what I thought it should be ;) so the doc just needs to change.
Sure, will capture the detailed return values in doc.
>
> > + goto done;
>
> It would be easy to avoid using 'goto' here.
Yes, agree.
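For illustration, the mismatch check can be turned into an if/else so that
the mask restore stays on a single exit path. This is only a userspace
sketch under stated assumptions, not the driver code: regs[] stands in for
the MMIO region, and model_readq(), model_writeq() and model_w1c() are
hypothetical stand-ins for readq()/writeq(). The error registers are
assumed to be write-1-to-clear, which the read-then-write-back pattern in
the patch suggests but does not prove.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Register offsets copied from the patch. */
#define FME_ERROR_MASK	0x8
#define FME_ERROR	0x10
#define FME_FIRST_ERROR	0x38
#define FME_NEXT_ERROR	0x40
#define MBP_ERROR	(1ULL << 6)

/* Hypothetical model of readq(): regs[] replaces the MMIO region. */
static uint64_t model_readq(const uint64_t *regs, unsigned int off)
{
	assert(off % 8 == 0);	/* 64-bit registers are 8-byte aligned */
	return regs[off / 8];
}

/* Hypothetical model of writeq() to an ordinary read/write register. */
static void model_writeq(uint64_t *regs, uint64_t v, unsigned int off)
{
	assert(off % 8 == 0);
	regs[off / 8] = v;
}

/* Hypothetical model of writeq() to a write-1-to-clear register (assumed). */
static void model_w1c(uint64_t *regs, uint64_t v, unsigned int off)
{
	assert(off % 8 == 0);
	regs[off / 8] &= ~v;
}

/* Same sequence as clear_store(), without the goto. */
static int model_fme_err_clear(uint64_t *regs, uint64_t val,
			       unsigned int revision)
{
	int ret = 0;
	uint64_t v;

	/* Mask all errors while the registers are inspected. */
	model_writeq(regs, ~0ULL, FME_ERROR_MASK);

	v = model_readq(regs, FME_ERROR);
	if (val != v) {
		ret = -EINVAL;
	} else {
		model_w1c(regs, v, FME_ERROR);
		v = model_readq(regs, FME_FIRST_ERROR);
		model_w1c(regs, v, FME_FIRST_ERROR);
		v = model_readq(regs, FME_NEXT_ERROR);
		model_w1c(regs, v, FME_NEXT_ERROR);
	}

	/* Workaround from the patch: keep MBP_ERROR masked on revision 0. */
	model_writeq(regs, revision ? 0ULL : MBP_ERROR, FME_ERROR_MASK);

	return ret;
}
```

The locking is left out of the sketch; in the driver the whole sequence
would still run under pdata->lock.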
>
> > + }
> > +
> > + writeq(v, base + FME_ERROR);
> > + v = readq(base + FME_FIRST_ERROR);
> > + writeq(v, base + FME_FIRST_ERROR);
> > + v = readq(base + FME_NEXT_ERROR);
> > + writeq(v, base + FME_NEXT_ERROR);
> > +
> > +done:
> > + /* Workaround: disable MBP_ERROR if feature revision is 0 */
> > + writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
> > + base + FME_ERROR_MASK);
> > + mutex_unlock(&pdata->lock);
> > + return ret ? ret : count;
> > +}
> > +static DEVICE_ATTR_WO(clear);
> > +
> > +static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", dfl_feature_revision(base));
> > +}
> > +static DEVICE_ATTR_RO(revision);
>
> The revision attr needs to be documented in the sysfs doc.
Will fix this.
>
> > +
> > +static ssize_t pcie0_errors_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + PCIE0_ERROR));
> > +}
> > +
> > +static ssize_t pcie0_errors_store(struct device *dev,
>
> The sysfs doc shows this as read-only.
>
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > + u64 v, val;
> > + int ret;
> > +
> > + ret = kstrtou64(buf, 0, &val);
> > + if (ret)
> > + return ret;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + mutex_lock(&pdata->lock);
> > + writeq(GENMASK_ULL(63, 0), base + PCIE0_ERROR_MASK);
> > +
> > + v = readq(base + PCIE0_ERROR);
> > + if (val != v)
> > + ret = -EBUSY;
> > + else
> > + writeq(v, base + PCIE0_ERROR);
> > +
> > + writeq(0ULL, base + PCIE0_ERROR_MASK);
> > + mutex_unlock(&pdata->lock);
> > + return ret ? ret : count;
> > +}
> > +static DEVICE_ATTR_RW(pcie0_errors);
> > +
> > +static ssize_t pcie1_errors_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + PCIE1_ERROR));
> > +}
> > +
> > +static ssize_t pcie1_errors_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
>
> Same here, documentation needs to show as R/W and explain what happens
> when written.
Sorry, will fix them in doc.
>
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > + u64 v, val;
> > + int ret;
> > +
> > + ret = kstrtou64(buf, 0, &val);
> > + if (ret)
> > + return ret;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + mutex_lock(&pdata->lock);
> > + writeq(GENMASK_ULL(63, 0), base + PCIE1_ERROR_MASK);
> > +
> > + v = readq(base + PCIE1_ERROR);
> > + if (val != v)
> > + ret = -EBUSY;
> > + else
> > + writeq(v, base + PCIE1_ERROR);
> > +
> > + writeq(0ULL, base + PCIE1_ERROR_MASK);
> > + mutex_unlock(&pdata->lock);
> > + return ret ? ret : count;
> > +}
> > +static DEVICE_ATTR_RW(pcie1_errors);
> > +
> > +static ssize_t nonfatal_errors_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + RAS_NONFAT_ERROR));
> > +}
> > +static DEVICE_ATTR_RO(nonfatal_errors);
> > +
> > +static ssize_t catfatal_errors_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)readq(base + RAS_CATFAT_ERROR));
> > +}
> > +static DEVICE_ATTR_RO(catfatal_errors);
> > +
> > +static ssize_t inject_error_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + v = readq(base + RAS_ERROR_INJECT);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "0x%llx\n",
> > + (unsigned long long)FIELD_GET(INJECT_ERROR_MASK, v));
> > +}
> > +
> > +static ssize_t inject_error_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev->parent);
> > + struct device *err_dev = dev->parent;
> > + void __iomem *base;
> > + u8 inject_error;
> > + int ret;
> > + u64 v;
> > +
> > + ret = kstrtou8(buf, 0, &inject_error);
> > + if (ret)
> > + return ret;
> > +
> > + if (inject_error & ~INJECT_ERROR_MASK)
> > + return -EINVAL;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(err_dev, FME_FEATURE_ID_GLOBAL_ERR);
> > +
> > + mutex_lock(&pdata->lock);
> > + v = readq(base + RAS_ERROR_INJECT);
> > + v &= ~INJECT_ERROR_MASK;
> > + v |= FIELD_PREP(INJECT_ERROR_MASK, inject_error);
> > + writeq(v, base + RAS_ERROR_INJECT);
> > + mutex_unlock(&pdata->lock);
> > +
> > + return count;
> > +}
> > +static DEVICE_ATTR_RW(inject_error);
> > +
> > +static struct attribute *fme_errors_attrs[] = {
> > + &dev_attr_errors.attr,
> > + &dev_attr_first_error.attr,
> > + &dev_attr_next_error.attr,
> > + &dev_attr_clear.attr,
> > + NULL,
> > +};
> > +
> > +static struct attribute_group fme_errors_attr_group = {
> > + .attrs = fme_errors_attrs,
> > + .name = "fme-errors",
> > +};
> > +
> > +static struct attribute *errors_attrs[] = {
> > + &dev_attr_revision.attr,
> > + &dev_attr_pcie0_errors.attr,
> > + &dev_attr_pcie1_errors.attr,
> > + &dev_attr_nonfatal_errors.attr,
> > + &dev_attr_catfatal_errors.attr,
> > + &dev_attr_inject_error.attr,
> > + NULL,
> > +};
> > +
> > +static struct attribute_group errors_attr_group = {
> > + .attrs = errors_attrs,
> > +};
> > +
> > +static const struct attribute_group *error_groups[] = {
> > + &fme_errors_attr_group,
> > + &errors_attr_group,
> > + NULL
> > +};
> > +
> > +static void fme_error_enable(struct dfl_feature *feature)
> > +{
> > + void __iomem *base = feature->ioaddr;
> > +
> > + /* Workaround: disable MBP_ERROR if revision is 0 */
> > + writeq(dfl_feature_revision(feature->ioaddr) ? 0ULL : MBP_ERROR,
> > + base + FME_ERROR_MASK);
> > + writeq(0ULL, base + PCIE0_ERROR_MASK);
> > + writeq(0ULL, base + PCIE1_ERROR_MASK);
> > + writeq(0ULL, base + RAS_NONFAT_ERROR_MASK);
> > + writeq(0ULL, base + RAS_CATFAT_ERROR_MASK);
> > +}
> > +
> > +static void err_dev_release(struct device *dev)
> > +{
> > + kfree(dev);
> > +}
> > +
> > +static int fme_global_err_init(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + struct device *dev;
> > + int ret = 0;
> > +
> > + dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > + if (!dev)
> > + return -ENOMEM;
> > +
> > + dev->parent = &pdev->dev;
> > + dev->release = err_dev_release;
> > + dev_set_name(dev, "errors");
> > +
> > + fme_error_enable(feature);
> > +
> > + ret = device_register(dev);
> > + if (ret) {
> > + put_device(dev);
> > + return ret;
> > + }
> > +
> > + ret = sysfs_create_groups(&dev->kobj, error_groups);
> > + if (ret) {
> > + device_unregister(dev);
> > + return ret;
> > + }
> > +
> > + feature->priv = dev;
> > +
> > + return ret;
> > +}
> > +
> > +static void fme_global_err_uinit(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + struct device *dev = feature->priv;
> > +
> > + sysfs_remove_groups(&dev->kobj, error_groups);
> > + device_unregister(dev);
> > +}
> > +
> > +const struct dfl_feature_id fme_global_err_id_table[] = {
> > + {.id = FME_FEATURE_ID_GLOBAL_ERR,},
> > + {0,}
> > +};
> > +
> > +const struct dfl_feature_ops fme_global_err_ops = {
> > + .init = fme_global_err_init,
> > + .uinit = fme_global_err_uinit,
> > +};
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index dafa6580..76cb112 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -686,6 +686,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> > .ops = &fme_power_mgmt_ops,
> > },
> > {
> > + .id_table = fme_global_err_id_table,
> > + .ops = &fme_global_err_ops,
> > + },
> > + {
> > .ops = NULL,
> > },
> > };
> > diff --git a/drivers/fpga/dfl-fme.h b/drivers/fpga/dfl-fme.h
> > index 7a021c4..5fbe3f5 100644
> > --- a/drivers/fpga/dfl-fme.h
> > +++ b/drivers/fpga/dfl-fme.h
> > @@ -37,5 +37,7 @@ struct dfl_fme {
> >
> > extern const struct dfl_feature_ops fme_pr_mgmt_ops;
> > extern const struct dfl_feature_id fme_pr_mgmt_id_table[];
> > +extern const struct dfl_feature_ops fme_global_err_ops;
> > +extern const struct dfl_feature_id fme_global_err_id_table[];
> >
> > #endif /* __DFL_FME_H */
> > diff --git a/drivers/fpga/dfl.h b/drivers/fpga/dfl.h
> > index fbc57f0..6c32080 100644
> > --- a/drivers/fpga/dfl.h
> > +++ b/drivers/fpga/dfl.h
> > @@ -197,12 +197,14 @@ struct dfl_feature_driver {
> > * feature dev (platform device)'s reources.
> > * @ioaddr: mapped mmio resource address.
> > * @ops: ops of this sub feature.
> > + * @priv: priv data of this feature.
> > */
> > struct dfl_feature {
> > u64 id;
> > int resource_index;
> > void __iomem *ioaddr;
> > const struct dfl_feature_ops *ops;
> > + void *priv;
> > };
> >
> > #define DEV_STATUS_IN_USE 0
> > --
> > 2.7.4
> >
Thanks for the review, will fix these things in the next version.
Hao
>
> Thanks,
> Alan
On Tue, Apr 09, 2019 at 03:57:37PM -0500, Alan Tull wrote:
> On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
>
> Hi Hao,
>
> >
> > Error reporting is one important private feature, it reports error
> > detected on port and accelerated function unit (AFU). It introduces
> > several sysfs interfaces to allow userspace to check and clear
> > errors detected by hardware.
> >
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-port | 29 +++
> > drivers/fpga/Makefile | 1 +
> > drivers/fpga/dfl-afu-error.c | 225 ++++++++++++++++++++++
> > drivers/fpga/dfl-afu-main.c | 4 +
> > drivers/fpga/dfl-afu.h | 4 +
> > 5 files changed, 263 insertions(+)
> > create mode 100644 drivers/fpga/dfl-afu-error.c
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-port b/Documentation/ABI/testing/sysfs-platform-dfl-port
> > index f611e47..e6140aa 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-port
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-port
> > @@ -79,3 +79,32 @@ KernelVersion: 5.2
> > Contact: Wu Hao <[email protected]>
> > Description: Read-only. Read this file to get the status of issued command
> > to userclck_freqcntrcmd.
> > +
> > +What: /sys/bus/platform/devices/dfl-port.0/errors/errors
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get errors detected on port and
> > + Accelerated Function Unit (AFU).
> > +
> > +What: /sys/bus/platform/devices/dfl-port.0/errors/first_error
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the first error detected by
> > + hardware.
> > +
> > +What: /sys/bus/platform/devices/dfl-port.0/errors/first_malformed_req
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get the first malformed request
> > + captured by hardware.
> > +
> > +What: /sys/bus/platform/devices/dfl-port.0/errors/clear
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Write-only. Write error code to this file to clear errors. If
> > + the input error code doesn't match, it returns -EBUSY error
> > + code.
>
> I understand how -EBUSY could be the right error code for when the
> hardware is in a state where the error can't be cleared. But if the
> input error code doesn't match, shouldn't the code be -EINVAL? Also
> as noted below, the way this is currently coded, -ETIMEDOUT could get
> returned.
Thanks for the comments, let me try to capture all possible error return
values in doc in the next version to avoid confusion.
>
> > diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> > index c0dd4c8..f1f0af7 100644
> > --- a/drivers/fpga/Makefile
> > +++ b/drivers/fpga/Makefile
> > @@ -40,6 +40,7 @@ obj-$(CONFIG_FPGA_DFL_AFU) += dfl-afu.o
> >
> > dfl-fme-objs := dfl-fme-main.o dfl-fme-pr.o
> > dfl-afu-objs := dfl-afu-main.o dfl-afu-region.o dfl-afu-dma-region.o
> > +dfl-afu-objs += dfl-afu-error.o
> >
> > # Drivers for FPGAs which implement DFL
> > obj-$(CONFIG_FPGA_DFL_PCI) += dfl-pci.o
> > diff --git a/drivers/fpga/dfl-afu-error.c b/drivers/fpga/dfl-afu-error.c
> > new file mode 100644
> > index 0000000..b66bd4a
> > --- /dev/null
> > +++ b/drivers/fpga/dfl-afu-error.c
> > @@ -0,0 +1,225 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Driver for FPGA Accelerated Function Unit (AFU) Error Reporting
> > + *
> > + * Copyright 2019 Intel Corporation, Inc.
> > + *
> > + * Authors:
> > + * Wu Hao <[email protected]>
> > + * Xiao Guangrong <[email protected]>
> > + * Joseph Grecco <[email protected]>
> > + * Enno Luebbers <[email protected]>
> > + * Tim Whisonant <[email protected]>
> > + * Ananda Ravuri <[email protected]>
> > + * Mitchel Henry <[email protected]>
> > + */
> > +
> > +#include <linux/uaccess.h>
> > +
> > +#include "dfl-afu.h"
> > +
> > +#define PORT_ERROR_MASK 0x8
> > +#define PORT_ERROR 0x10
> > +#define PORT_FIRST_ERROR 0x18
> > +#define PORT_MALFORMED_REQ0 0x20
> > +#define PORT_MALFORMED_REQ1 0x28
> > +
> > +#define ERROR_MASK GENMASK_ULL(63, 0)
> > +
> > +/* mask or unmask port errors by the error mask register. */
> > +static void __port_err_mask(struct device *dev, bool mask)
> > +{
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > + writeq(mask ? ERROR_MASK : 0, base + PORT_ERROR_MASK);
> > +}
> > +
> > +/* clear port errors. */
> > +static int __port_err_clear(struct device *dev, u64 err)
> > +{
> > + struct platform_device *pdev = to_platform_device(dev);
> > + void __iomem *base_err, *base_hdr;
> > + int ret;
> > + u64 v;
> > +
> > + base_err = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > + base_hdr = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_HEADER);
> > +
> > + /*
> > + * clear Port Errors
> > + *
> > + * - Check for AP6 State
> > + * - Halt Port by keeping Port in reset
> > + * - Set PORT Error mask to all 1 to mask errors
> > + * - Clear all errors
> > + * - Set Port mask to all 0 to enable errors
> > + * - All errors start capturing new errors
> > + * - Enable Port by pulling the port out of reset
> > + */
> > +
> > + /* if device is still in AP6 power state, can not clear any error. */
> > + v = readq(base_hdr + PORT_HDR_STS);
> > + if (FIELD_GET(PORT_STS_PWR_STATE, v) == PORT_STS_PWR_STATE_AP6) {
> > + dev_err(dev, "Could not clear errors, device in AP6 state.\n");
> > + return -EBUSY;
> > + }
> > +
> > + /* Halt Port by keeping Port in reset */
> > + ret = __port_disable(pdev);
> > + if (ret)
> > + return ret;
>
> __port_disable can return -ETIMEDOUT which will then get returned from
> clear_store. The sysfs document only talks about -EBUSY. You could
> either document -ETIMEDOUT in the sysfs doc or you could change the
> code to adjust the returned error code.
Yes, agree.
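If the second option is preferred, the translation could live in one place
at the sysfs boundary. A minimal sketch, assuming -ETIMEDOUT from
__port_disable() should be reported to userspace as -EBUSY; that mapping is
only an assumption for illustration and has not been decided in this
thread, and port_clear_errno() is a hypothetical name:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: fold the internal return code of the clear path
 * into the codes the sysfs document would list.
 */
static int port_clear_errno(int ret)
{
	/* Kernel convention: 0 on success, negative errno on failure. */
	assert(ret <= 0);

	/* Assumed policy: a port that did not enter reset in time is "busy". */
	if (ret == -ETIMEDOUT)
		return -EBUSY;

	return ret;
}
```

Documenting -ETIMEDOUT instead would need no code change at all.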
>
> > +
> > + /* Mask all errors */
> > + __port_err_mask(dev, true);
> > +
> > + /* Clear errors if err input matches with current port errors.*/
> > + v = readq(base_err + PORT_ERROR);
> > +
> > + if (v == err) {
> > + writeq(v, base_err + PORT_ERROR);
> > +
> > + v = readq(base_err + PORT_FIRST_ERROR);
> > + writeq(v, base_err + PORT_FIRST_ERROR);
> > + } else {
> > + ret = -EBUSY;
> > + }
> > +
> > + /* Clear mask */
> > + __port_err_mask(dev, false);
> > +
> > + /* Enable the Port by clear the reset */
> > + __port_enable(pdev);
> > +
> > + return ret;
> > +}
> > +
> > +static ssize_t revision_show(struct device *dev, struct device_attribute *attr,
> > + char *buf)
> > +{
> > + void __iomem *base;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, PORT_FEATURE_ID_ERROR);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", dfl_feature_revision(base));
> > +}
> > +static DEVICE_ATTR_RO(revision);
>
> This appears to be adding a
> /sys/bus/platform/devices/dfl-port.0/errors/revision attribute that
> isn't documented in the sysfs document.
Sorry, will fix all above issues in the next version.
Thanks again for the code review and comments.
Hao
On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
Hi Hao,
>
> This patch adds support for power management private feature under
> FPGA Management Engine (FME), sysfs interfaces are introduced for
> different power management functions, users could use these sysfs
> interface to get current number of consumed power, throttling
How about
s/number/measurement/
?
> thresholds, threshold status and other information, and configure
> different value for throttling thresholds too.
>
> Signed-off-by: Luwei Kang <[email protected]>
> Signed-off-by: Xu Yilun <[email protected]>
> Signed-off-by: Wu Hao <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
> drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
> 2 files changed, 313 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> index d3aeb88..4b6448f 100644
> --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
> threshold1. It only supports two value (policy):
> 0 - AP2 state (90% throttling)
> 1 - AP1 state (50% throttling)
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns current power consumed by FPGA.
What are the units?
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-Write. Read/Write this file to get/set current power
> + threshold1 in Watts.
Perhaps document error codes here and for threshold2 below.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-Write. Read/Write this file to get/set current power
> + threshold2 in Watts.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns 1 if power consumption reaches the
> + threshold1, otherwise 0.
I'm used to things like this requiring the user to reset the status, so it
may be worth making it explicit that it will return to zero if
consumption drops below the threshold, if that's what's happening here.
If so, perhaps just say something like 'returns 1 if
power consumption is currently at or above threshold1, otherwise 0'.
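For reference, the driver below only reports bits 16 and 17 of
FME_PWR_THRESHOLD as they are, so whether the status is live or latched is
decided by the hardware. A userspace sketch of that register layout, using
the field positions from the patch; struct pwr_threshold,
decode_pwr_threshold() and set_threshold1() are hypothetical names, and
plain shifts stand in for FIELD_GET()/FIELD_PREP():

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of FME_PWR_THRESHOLD, copied from the patch. */
#define PWR_THRESHOLD1_SHIFT	0	/* bits 6:0, in Watts */
#define PWR_THRESHOLD2_SHIFT	8	/* bits 14:8, in Watts */
#define PWR_THRESHOLD_MAX	0x7f
#define PWR_THRESHOLD1_STATUS	(1ULL << 16)
#define PWR_THRESHOLD2_STATUS	(1ULL << 17)

/* Hypothetical container for the decoded register value. */
struct pwr_threshold {
	unsigned int threshold1;	/* Watts */
	unsigned int threshold2;	/* Watts */
	unsigned int status1;		/* raw bit 16 */
	unsigned int status2;		/* raw bit 17 */
};

/* Split a raw register value into its four fields. */
static struct pwr_threshold decode_pwr_threshold(uint64_t v)
{
	struct pwr_threshold t;

	t.threshold1 = (v >> PWR_THRESHOLD1_SHIFT) & PWR_THRESHOLD_MAX;
	t.threshold2 = (v >> PWR_THRESHOLD2_SHIFT) & PWR_THRESHOLD_MAX;
	t.status1 = (v & PWR_THRESHOLD1_STATUS) ? 1 : 0;
	t.status2 = (v & PWR_THRESHOLD2_STATUS) ? 1 : 0;

	return t;
}

/* Read-modify-write of threshold1 only, as pwr_threshold1_store() does. */
static uint64_t set_threshold1(uint64_t v, unsigned int watts)
{
	assert(watts <= PWR_THRESHOLD_MAX);

	v &= ~((uint64_t)PWR_THRESHOLD_MAX << PWR_THRESHOLD1_SHIFT);
	v |= (uint64_t)watts << PWR_THRESHOLD1_SHIFT;

	return v;
}
```

Nothing in the sketch clears a status bit, which matches the driver: there
is no store function for threshold1_status or threshold2_status.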
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. It returns 1 if power consumption reaches the
> + threshold2, otherwise 0.
Same here.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get current Latency Tolerance
> + Reporting (ltr) value, it's only valid for integrated
> + solution as it blocks CPU on low power state.
If we're not on the integrated solution, does this still return a value,
just one that isn't meaningful?
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get power limit for xeon, it
> + is only valid for integrated solution.
> +
> +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
> +Date: March 2019
> +KernelVersion: 5.2
> +Contact: Wu Hao <[email protected]>
> +Description: Read-only. Read this file to get power limit for fpga, it
> + is only valid for integrated solution.
> diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> index 449a17d..dafa6580 100644
> --- a/drivers/fpga/dfl-fme-main.c
> +++ b/drivers/fpga/dfl-fme-main.c
> @@ -415,6 +415,259 @@ static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> .uinit = fme_thermal_mgmt_uinit,
> };
>
> +#define FME_PWR_STATUS 0x8
> +#define FME_LATENCY_TOLERANCE BIT_ULL(18)
> +#define PWR_CONSUMED GENMASK_ULL(17, 0)
> +
> +#define FME_PWR_THRESHOLD 0x10
> +#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
> +#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
> +#define PWR_THRESHOLD_MAX 0x7f
> +#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
> +#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
> +
> +#define FME_PWR_XEON_LIMIT 0x18
> +#define XEON_PWR_LIMIT GENMASK_ULL(14, 0)
> +#define XEON_PWR_EN BIT_ULL(15)
> +#define FME_PWR_FPGA_LIMIT 0x20
> +#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0)
> +#define FPGA_PWR_EN BIT_ULL(15)
> +
> +#define POWER_ATTR(_name, _mode, _show, _store) \
> +struct device_attribute power_attr_##_name = \
> + __ATTR(_name, _mode, _show, _store)
> +
> +#define POWER_ATTR_RO(_name, _show) \
> + POWER_ATTR(_name, 0444, _show, NULL)
> +
> +#define POWER_ATTR_RW(_name, _show, _store) \
> + POWER_ATTR(_name, 0644, _show, _store)
Are these #defines necessary? Seems like you could just use DEVICE_ATTR*
> +
> +static ssize_t pwr_consumed_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_STATUS);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(PWR_CONSUMED, v));
> +}
> +static POWER_ATTR_RO(consumed, pwr_consumed_show);
> +
> +static ssize_t pwr_threshold1_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_THRESHOLD);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(PWR_THRESHOLD1, v));
> +}
> +
> +static ssize_t pwr_threshold1_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> + void __iomem *base;
> + u8 threshold;
> + int ret;
> + u64 v;
> +
> + ret = kstrtou8(buf, 0, &threshold);
> + if (ret)
> + return ret;
> +
> + if (threshold > PWR_THRESHOLD_MAX)
> + return -EINVAL;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + mutex_lock(&pdata->lock);
> + v = readq(base + FME_PWR_THRESHOLD);
> + v &= ~PWR_THRESHOLD1;
> + v |= FIELD_PREP(PWR_THRESHOLD1, threshold);
> + writeq(v, base + FME_PWR_THRESHOLD);
> + mutex_unlock(&pdata->lock);
> +
> + return count;
> +}
> +static POWER_ATTR_RW(threshold1, pwr_threshold1_show, pwr_threshold1_store);
> +
> +static ssize_t pwr_threshold2_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_THRESHOLD);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(PWR_THRESHOLD2, v));
> +}
> +
> +static ssize_t pwr_threshold2_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> + void __iomem *base;
> + u8 threshold;
> + int ret;
> + u64 v;
> +
> + ret = kstrtou8(buf, 0, &threshold);
> + if (ret)
> + return ret;
> +
> + if (threshold > PWR_THRESHOLD_MAX)
> + return -EINVAL;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + mutex_lock(&pdata->lock);
> + v = readq(base + FME_PWR_THRESHOLD);
> + v &= ~PWR_THRESHOLD2;
> + v |= FIELD_PREP(PWR_THRESHOLD2, threshold);
> + writeq(v, base + FME_PWR_THRESHOLD);
> + mutex_unlock(&pdata->lock);
> +
> + return count;
> +}
> +static POWER_ATTR_RW(threshold2, pwr_threshold2_show, pwr_threshold2_store);
> +
> +static ssize_t pwr_threshold1_status_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_THRESHOLD);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
> +}
> +static POWER_ATTR_RO(threshold1_status, pwr_threshold1_status_show);
> +
> +static ssize_t pwr_threshold2_status_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_THRESHOLD);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
> +}
> +static POWER_ATTR_RO(threshold2_status, pwr_threshold2_status_show);
> +
> +static ssize_t ltr_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_STATUS);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n",
> + (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
> +}
> +static POWER_ATTR_RO(ltr, ltr_show);
> +
> +static ssize_t xeon_limit_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u16 xeon_limit = 0;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_XEON_LIMIT);
> +
> + if (FIELD_GET(XEON_PWR_EN, v))
> + xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit);
> +}
> +static POWER_ATTR_RO(xeon_limit, xeon_limit_show);
> +
> +static ssize_t fpga_limit_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + void __iomem *base;
> + u16 fpga_limit = 0;
> + u64 v;
> +
> + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> +
> + v = readq(base + FME_PWR_FPGA_LIMIT);
> +
> + if (FIELD_GET(FPGA_PWR_EN, v))
> + fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
> +
> + return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit);
> +}
> +static POWER_ATTR_RO(fpga_limit, fpga_limit_show);
> +
> +static struct attribute *power_mgmt_attrs[] = {
> + &power_attr_consumed.attr,
> + &power_attr_threshold1.attr,
> + &power_attr_threshold2.attr,
> + &power_attr_threshold1_status.attr,
> + &power_attr_threshold2_status.attr,
> + &power_attr_xeon_limit.attr,
> + &power_attr_fpga_limit.attr,
> + &power_attr_ltr.attr,
This is a nit, but I would expect to see these listed in the same
order as their show/store functions above. So power_attr_ltr would come
between power_attr_threshold2_status and power_attr_xeon_limit.
> + NULL,
> +};
> +
> +static struct attribute_group power_mgmt_attr_group = {
> + .attrs = power_mgmt_attrs,
> + .name = "power_mgmt",
> +};
> +
> +static int fme_power_mgmt_init(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + return sysfs_create_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> +}
> +
> +static void fme_power_mgmt_uinit(struct platform_device *pdev,
> + struct dfl_feature *feature)
> +{
> + sysfs_remove_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> +}
> +
> +static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
> + {.id = FME_FEATURE_ID_POWER_MGMT,},
> + {0,}
> +};
> +
> +static const struct dfl_feature_ops fme_power_mgmt_ops = {
> + .init = fme_power_mgmt_init,
> + .uinit = fme_power_mgmt_uinit,
> +};
> +
> static struct dfl_feature_driver fme_feature_drvs[] = {
> {
> .id_table = fme_hdr_id_table,
> @@ -429,6 +682,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> .ops = &fme_thermal_mgmt_ops,
> },
> {
> + .id_table = fme_power_mgmt_id_table,
> + .ops = &fme_power_mgmt_ops,
> + },
> + {
> .ops = NULL,
> },
> };
> --
> 2.7.4
>
Thanks,
Alan
On Tue, Apr 2, 2019 at 10:07 AM Moritz Fischer <[email protected]> wrote:
>
> Hi Wu,
>
> On Mon, Mar 25, 2019 at 11:07:39AM +0800, Wu Hao wrote:
> > STP (SignalTap) is one of the private features under the port for
> > debugging. This patch adds private feature driver support for it
> > to allow userspace applications to mmap related mmio region and
> > provide STP service.
> >
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> Acked-by: Moritz Fischer <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
On Tue, Apr 2, 2019 at 10:50 AM Moritz Fischer <[email protected]> wrote:
> Hi Hao,
>
> On Mon, Mar 25, 2019 at 11:07:37AM +0800, Wu Hao wrote:
> > As these two functions are used by other private features. e.g.
> > in error reporting private feature, it requires to check port status
> > and reset port for error clearing.
> >
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> Acked-by: Moritz Fischer <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
On Tue, Apr 2, 2019 at 10:09 AM Moritz Fischer <[email protected]> wrote:
>
> Hi Wu,
>
> On Mon, Mar 25, 2019 at 11:07:36AM +0800, Wu Hao wrote:
> > This patch adds id_table for each dfl private feature driver,
> > it allows to reuse same private feature driver to match and support
> > multiple dfl private features.
> >
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> Acked-by: Moritz Fischer <[email protected]>
Acked-by: Alan Tull <[email protected]>
Thanks,
Alan
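The id_table idea in the commit message quoted above (one private feature driver matching several feature IDs) can be modelled in userspace roughly as follows. This is only a sketch under assumptions: `struct feature_id`, `struct feature_driver`, `match_feature_driver()` and the numeric IDs are hypothetical stand-ins, not the DFL framework's actual definitions. A `name` string stands in for the `.ops` pointer, and a zero ID or NULL name terminates each table, in the same spirit as the `{0,}` and `.ops = NULL` sentinels in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirrors of the structures named in the patch. */
struct feature_id {
	uint64_t id;           /* 0 terminates the table, like {0,} */
};

struct feature_driver {
	const struct feature_id *id_table;
	const char *name;      /* stands in for .ops; NULL terminates the list */
};

/* Return the first driver whose id_table lists feature_id, or NULL. */
const struct feature_driver *
match_feature_driver(const struct feature_driver *drvs, uint64_t feature_id)
{
	const struct feature_driver *drv;
	const struct feature_id *ids;

	for (drv = drvs; drv->name; drv++)
		for (ids = drv->id_table; ids && ids->id; ids++)
			if (ids->id == feature_id)
				return drv;
	return NULL;
}

/* Example tables: one driver serving two (made-up) feature IDs. */
static const struct feature_id shared_ids[] = { {0x10}, {0x11}, {0} };
static const struct feature_id power_ids[]  = { {0x2}, {0} };

const struct feature_driver example_drvs[] = {
	{ shared_ids, "shared" },
	{ power_ids,  "power"  },
	{ NULL,       NULL     },
};
```

With these tables, looking up 0x10 or 0x11 yields the same "shared" entry, which is the reuse the commit message describes; an ID that appears in no table yields NULL, so no sysfs files would be created for it.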
On Thu, Apr 11, 2019 at 03:07:35PM -0500, Alan Tull wrote:
> On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
>
> Hi Hao,
>
> >
> > This patch adds support for power management private feature under
> > FPGA Management Engine (FME), sysfs interfaces are introduced for
> > different power management functions, users could use these sysfs
> > interface to get current number of consumed power, throttling
>
> How about
> s/number/measurement/
> ?
Sounds better. : )
>
> > thresholds, threshold status and other information, and configure
> > different value for throttling thresholds too.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
> > drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
> > 2 files changed, 313 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index d3aeb88..4b6448f 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
> > threshold1. It only supports two value (policy):
> > 0 - AP2 state (90% throttling)
> > 1 - AP1 state (50% throttling)
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns current power consumed by FPGA.
>
> What are the units?
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-Write. Read/Write this file to get/set current power
> > + threshold1 in Watts.
>
> Perhaps document error codes here and for threshold2 below.
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-Write. Read/Write this file to get/set current power
> > + threshold2 in Watts.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns 1 if power consumption reaches the
> > + threshold1, otherwise 0.
>
> I'm used to things like this requiring user to reset the status, so it
> may be worth making it explicit that it will return to zero if
> consumption drops below threshold if that's what's happening here.
> If it's correct, perhaps could just say something like 'returns 1 if
> power consumption is currently at or above threshold1, otherwise 0'
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns 1 if power consumption reaches the
> > + threshold2, otherwise 0.
>
> Same here.
Sure, will fix all above comments in this sysfs doc.
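As a starting point for documenting the error codes, the checks in the threshold store handlers quoted in this patch can be modelled in userspace. The sketch below is not the driver code: `parse_threshold()` and `update_threshold1()` are hypothetical names, `strtoul()` stands in for the kernel's `kstrtou8()` (it is slightly stricter, e.g. it rejects a leading '+'), and the mask is written out by hand rather than with `GENMASK_ULL()`/`FIELD_PREP()`. It follows the handler's logic: -ERANGE when the value does not fit in a u8, -EINVAL when the input is not a number or exceeds PWR_THRESHOLD_MAX (0x7f).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define PWR_THRESHOLD_MAX   0x7f
#define PWR_THRESHOLD1_MASK 0x7fULL      /* bits 6:0, as in the patch */

/*
 * Hypothetical userspace model of the checks in pwr_threshold1_store().
 * Returns 0 and fills *out on success, or a negative errno.
 */
int parse_threshold(const char *buf, uint8_t *out)
{
	char *end;
	unsigned long val;

	if (!isdigit((unsigned char)buf[0]))
		return -EINVAL;          /* not a number at all */

	errno = 0;
	val = strtoul(buf, &end, 0);     /* base 0: decimal, 0x.., 0.. */
	if (*end == '\n')
		end++;                   /* sysfs writes usually end in '\n' */
	if (*end != '\0')
		return -EINVAL;          /* trailing garbage */
	if (errno == ERANGE || val > UINT8_MAX)
		return -ERANGE;          /* does not fit in a u8 */
	if (val > PWR_THRESHOLD_MAX)
		return -EINVAL;          /* fits in a u8, but not in 7 bits */

	*out = (uint8_t)val;
	return 0;
}

/* Read-modify-write of the threshold1 field; all other bits are preserved. */
uint64_t update_threshold1(uint64_t reg, uint8_t threshold)
{
	assert(threshold <= PWR_THRESHOLD_MAX);
	reg &= ~PWR_THRESHOLD1_MASK;
	reg |= (uint64_t)threshold & PWR_THRESHOLD1_MASK;
	return reg;
}
```

So for the ABI text, a write of "128" through "255" would fail with -EINVAL, while "256" and above would fail with -ERANGE, assuming the handler stays as posted.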
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get current Latency Tolerance
> > + Reporting (ltr) value, it's only valid for integrated
> > + solution as it blocks CPU on low power state.
>
> If we're not on the integrated solution, it returns a value but it is
> not really real?
Currently only the integrated solution implements this private feature; other
devices, e.g. the Intel PAC card, do not use it, so users will not see these
sysfs interfaces at all.
If in the future, other devices want
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get power limit for xeon, it
> > + is only valid for integrated solution.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get power limit for fpga, it
> > + is only valid for integrated solution.
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 449a17d..dafa6580 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -415,6 +415,259 @@ static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > .uinit = fme_thermal_mgmt_uinit,
> > };
> >
> > +#define FME_PWR_STATUS 0x8
> > +#define FME_LATENCY_TOLERANCE BIT_ULL(18)
> > +#define PWR_CONSUMED GENMASK_ULL(17, 0)
> > +
> > +#define FME_PWR_THRESHOLD 0x10
> > +#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
> > +#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
> > +#define PWR_THRESHOLD_MAX 0x7f
> > +#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
> > +#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
> > +
> > +#define FME_PWR_XEON_LIMIT 0x18
> > +#define XEON_PWR_LIMIT GENMASK_ULL(14, 0)
> > +#define XEON_PWR_EN BIT_ULL(15)
> > +#define FME_PWR_FPGA_LIMIT 0x20
> > +#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0)
> > +#define FPGA_PWR_EN BIT_ULL(15)
> > +
> > +#define POWER_ATTR(_name, _mode, _show, _store) \
> > +struct device_attribute power_attr_##_name = \
> > + __ATTR(_name, _mode, _show, _store)
> > +
> > +#define POWER_ATTR_RO(_name, _show) \
> > + POWER_ATTR(_name, 0444, _show, NULL)
> > +
> > +#define POWER_ATTR_RW(_name, _show, _store) \
> > + POWER_ATTR(_name, 0644, _show, _store)
>
> Are these #defines necessary? Seems like you could just use DEVICE_ATTR*
Actually these macros add a power_attr_ prefix to the attribute variables, to
avoid name conflicts with attributes from other private features, e.g. the
thermal thresholds.
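To illustrate the naming point: `DEVICE_ATTR_RW(threshold1)` always defines a variable called `dev_attr_threshold1`, so two features in the same source file cannot both use it for a file named "threshold1". The userspace sketch below mimics the shape of the patch's `POWER_ATTR()` macro. `struct mock_attr`, `PREFIXED_ATTR()` and the `thermal_attr_` prefix are assumptions made for illustration only, not names taken from the driver.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for struct device_attribute (assumption for illustration). */
struct mock_attr {
	const char *name;      /* file name as it would appear in sysfs */
	unsigned int mode;
};

/*
 * Same shape as the patch's POWER_ATTR(): the C identifier gets a
 * per-feature prefix, the sysfs file name does not.
 */
#define PREFIXED_ATTR(_prefix, _name, _mode) \
	struct mock_attr _prefix##_attr_##_name = { #_name, _mode }

/* Two features in one source file, both wanting a file called "threshold1". */
PREFIXED_ATTR(thermal, threshold1, 0644);   /* hypothetical thermal counterpart */
PREFIXED_ATTR(power, threshold1, 0644);

/* Returns 1 when both attributes share a file name but are distinct objects. */
int same_name_distinct_symbols(void)
{
	return strcmp(thermal_attr_threshold1.name,
		      power_attr_threshold1.name) == 0 &&
	       &thermal_attr_threshold1 != &power_attr_threshold1;
}
```

Both files can carry the same name because each attribute lives in its own named group (here "power_mgmt"); only the C symbols need to differ.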
>
> > +
> > +static ssize_t pwr_consumed_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_STATUS);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_CONSUMED, v));
> > +}
> > +static POWER_ATTR_RO(consumed, pwr_consumed_show);
> > +
> > +static ssize_t pwr_threshold1_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD1, v));
> > +}
> > +
> > +static ssize_t pwr_threshold1_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > + void __iomem *base;
> > + u8 threshold;
> > + int ret;
> > + u64 v;
> > +
> > + ret = kstrtou8(buf, 0, &threshold);
> > + if (ret)
> > + return ret;
> > +
> > + if (threshold > PWR_THRESHOLD_MAX)
> > + return -EINVAL;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + mutex_lock(&pdata->lock);
> > + v = readq(base + FME_PWR_THRESHOLD);
> > + v &= ~PWR_THRESHOLD1;
> > + v |= FIELD_PREP(PWR_THRESHOLD1, threshold);
> > + writeq(v, base + FME_PWR_THRESHOLD);
> > + mutex_unlock(&pdata->lock);
> > +
> > + return count;
> > +}
> > +static POWER_ATTR_RW(threshold1, pwr_threshold1_show, pwr_threshold1_store);
> > +
> > +static ssize_t pwr_threshold2_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD2, v));
> > +}
> > +
> > +static ssize_t pwr_threshold2_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > + void __iomem *base;
> > + u8 threshold;
> > + int ret;
> > + u64 v;
> > +
> > + ret = kstrtou8(buf, 0, &threshold);
> > + if (ret)
> > + return ret;
> > +
> > + if (threshold > PWR_THRESHOLD_MAX)
> > + return -EINVAL;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + mutex_lock(&pdata->lock);
> > + v = readq(base + FME_PWR_THRESHOLD);
> > + v &= ~PWR_THRESHOLD2;
> > + v |= FIELD_PREP(PWR_THRESHOLD2, threshold);
> > + writeq(v, base + FME_PWR_THRESHOLD);
> > + mutex_unlock(&pdata->lock);
> > +
> > + return count;
> > +}
> > +static POWER_ATTR_RW(threshold2, pwr_threshold2_show, pwr_threshold2_store);
> > +
> > +static ssize_t pwr_threshold1_status_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
> > +}
> > +static POWER_ATTR_RO(threshold1_status, pwr_threshold1_status_show);
> > +
> > +static ssize_t pwr_threshold2_status_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
> > +}
> > +static POWER_ATTR_RO(threshold2_status, pwr_threshold2_status_show);
> > +
> > +static ssize_t ltr_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_STATUS);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
> > +}
> > +static POWER_ATTR_RO(ltr, ltr_show);
> > +
> > +static ssize_t xeon_limit_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u16 xeon_limit = 0;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_XEON_LIMIT);
> > +
> > + if (FIELD_GET(XEON_PWR_EN, v))
> > + xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit);
> > +}
> > +static POWER_ATTR_RO(xeon_limit, xeon_limit_show);
> > +
> > +static ssize_t fpga_limit_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u16 fpga_limit = 0;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_FPGA_LIMIT);
> > +
> > + if (FIELD_GET(FPGA_PWR_EN, v))
> > + fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit);
> > +}
> > +static POWER_ATTR_RO(fpga_limit, fpga_limit_show);
> > +
> > +static struct attribute *power_mgmt_attrs[] = {
> > + &power_attr_consumed.attr,
> > + &power_attr_threshold1.attr,
> > + &power_attr_threshold2.attr,
> > + &power_attr_threshold1_status.attr,
> > + &power_attr_threshold2_status.attr,
> > + &power_attr_xeon_limit.attr,
> > + &power_attr_fpga_limit.attr,
> > + &power_attr_ltr.attr,
>
> This is a nit, but I would expect to see these listed in the same
> order as their show/store functions above. So ltr_attr would come
> between threshold2_status_attr and xeon_limit_attr.
Sure, it does make sense.
>
> > + NULL,
> > +};
> > +
> > +static struct attribute_group power_mgmt_attr_group = {
> > + .attrs = power_mgmt_attrs,
> > + .name = "power_mgmt",
> > +};
> > +
> > +static int fme_power_mgmt_init(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + return sysfs_create_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > +}
> > +
> > +static void fme_power_mgmt_uinit(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + sysfs_remove_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > +}
> > +
> > +static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
> > + {.id = FME_FEATURE_ID_POWER_MGMT,},
> > + {0,}
> > +};
> > +
> > +static const struct dfl_feature_ops fme_power_mgmt_ops = {
> > + .init = fme_power_mgmt_init,
> > + .uinit = fme_power_mgmt_uinit,
> > +};
> > +
> > static struct dfl_feature_driver fme_feature_drvs[] = {
> > {
> > .id_table = fme_hdr_id_table,
> > @@ -429,6 +682,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> > .ops = &fme_thermal_mgmt_ops,
> > },
> > {
> > + .id_table = fme_power_mgmt_id_table,
> > + .ops = &fme_power_mgmt_ops,
> > + },
> > + {
> > .ops = NULL,
> > },
> > };
> > --
> > 2.7.4
> >
Thanks a lot for the review and comments. :)
Hao
>
> Thanks,
> Alan
Hi Hao,
this looks suspiciously like a hwmon driver ;-)
https://www.kernel.org/doc/Documentation/hwmon/hwmon-kernel-api.txt
Cheers,
Moritz
On Thu, Apr 11, 2019 at 1:08 PM Alan Tull <[email protected]> wrote:
>
> On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
>
> Hi Hao,
>
> >
> > This patch adds support for power management private feature under
> > FPGA Management Engine (FME), sysfs interfaces are introduced for
> > different power management functions, users could use these sysfs
> > interface to get current number of consumed power, throttling
>
> How about
> s/number/measurement/
> ?
>
> > thresholds, threshold status and other information, and configure
> > different value for throttling thresholds too.
> >
> > Signed-off-by: Luwei Kang <[email protected]>
> > Signed-off-by: Xu Yilun <[email protected]>
> > Signed-off-by: Wu Hao <[email protected]>
> > ---
> > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
> > drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
> > 2 files changed, 313 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > index d3aeb88..4b6448f 100644
> > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
> > threshold1. It only supports two value (policy):
> > 0 - AP2 state (90% throttling)
> > 1 - AP1 state (50% throttling)
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns current power consumed by FPGA.
>
> What are the units?
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-Write. Read/Write this file to get/set current power
> > + threshold1 in Watts.
>
> Perhaps document error codes here and for threshold2 below.
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-Write. Read/Write this file to get/set current power
> > + threshold2 in Watts.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns 1 if power consumption reaches the
> > + threshold1, otherwise 0.
>
> I'm used to things like this requiring user to reset the status, so it
> may be worth making it explicit that it will return to zero if
> consumption drops below threshold if that's what's happening here.
> If it's correct, perhaps could just say something like 'returns 1 if
> power consumption is currently at or above threshold1, otherwise 0'
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. It returns 1 if power consumption reaches the
> > + threshold2, otherwise 0.
>
> Same here.
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get current Latency Tolerance
> > + Reporting (ltr) value, it's only valid for integrated
> > + solution as it blocks CPU on low power state.
>
> If we're not on the integrated solution, it returns a value but it is
> not really real?
>
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get power limit for xeon, it
> > + is only valid for integrated solution.
> > +
> > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
> > +Date: March 2019
> > +KernelVersion: 5.2
> > +Contact: Wu Hao <[email protected]>
> > +Description: Read-only. Read this file to get power limit for fpga, it
> > + is only valid for integrated solution.
> > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > index 449a17d..dafa6580 100644
> > --- a/drivers/fpga/dfl-fme-main.c
> > +++ b/drivers/fpga/dfl-fme-main.c
> > @@ -415,6 +415,259 @@ static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > .uinit = fme_thermal_mgmt_uinit,
> > };
> >
> > +#define FME_PWR_STATUS 0x8
> > +#define FME_LATENCY_TOLERANCE BIT_ULL(18)
> > +#define PWR_CONSUMED GENMASK_ULL(17, 0)
> > +
> > +#define FME_PWR_THRESHOLD 0x10
> > +#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
> > +#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
> > +#define PWR_THRESHOLD_MAX 0x7f
> > +#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
> > +#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
> > +
> > +#define FME_PWR_XEON_LIMIT 0x18
> > +#define XEON_PWR_LIMIT GENMASK_ULL(14, 0)
> > +#define XEON_PWR_EN BIT_ULL(15)
> > +#define FME_PWR_FPGA_LIMIT 0x20
> > +#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0)
> > +#define FPGA_PWR_EN BIT_ULL(15)
> > +
> > +#define POWER_ATTR(_name, _mode, _show, _store) \
> > +struct device_attribute power_attr_##_name = \
> > + __ATTR(_name, _mode, _show, _store)
> > +
> > +#define POWER_ATTR_RO(_name, _show) \
> > + POWER_ATTR(_name, 0444, _show, NULL)
> > +
> > +#define POWER_ATTR_RW(_name, _show, _store) \
> > + POWER_ATTR(_name, 0644, _show, _store)
>
> Are these #defines necessary? Seems like you could just use DEVICE_ATTR*
>
> > +
> > +static ssize_t pwr_consumed_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_STATUS);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_CONSUMED, v));
> > +}
> > +static POWER_ATTR_RO(consumed, pwr_consumed_show);
> > +
> > +static ssize_t pwr_threshold1_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD1, v));
> > +}
> > +
> > +static ssize_t pwr_threshold1_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > + void __iomem *base;
> > + u8 threshold;
> > + int ret;
> > + u64 v;
> > +
> > + ret = kstrtou8(buf, 0, &threshold);
> > + if (ret)
> > + return ret;
> > +
> > + if (threshold > PWR_THRESHOLD_MAX)
> > + return -EINVAL;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + mutex_lock(&pdata->lock);
> > + v = readq(base + FME_PWR_THRESHOLD);
> > + v &= ~PWR_THRESHOLD1;
> > + v |= FIELD_PREP(PWR_THRESHOLD1, threshold);
> > + writeq(v, base + FME_PWR_THRESHOLD);
> > + mutex_unlock(&pdata->lock);
> > +
> > + return count;
> > +}
> > +static POWER_ATTR_RW(threshold1, pwr_threshold1_show, pwr_threshold1_store);
> > +
> > +static ssize_t pwr_threshold2_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD2, v));
> > +}
> > +
> > +static ssize_t pwr_threshold2_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > + void __iomem *base;
> > + u8 threshold;
> > + int ret;
> > + u64 v;
> > +
> > + ret = kstrtou8(buf, 0, &threshold);
> > + if (ret)
> > + return ret;
> > +
> > + if (threshold > PWR_THRESHOLD_MAX)
> > + return -EINVAL;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + mutex_lock(&pdata->lock);
> > + v = readq(base + FME_PWR_THRESHOLD);
> > + v &= ~PWR_THRESHOLD2;
> > + v |= FIELD_PREP(PWR_THRESHOLD2, threshold);
> > + writeq(v, base + FME_PWR_THRESHOLD);
> > + mutex_unlock(&pdata->lock);
> > +
> > + return count;
> > +}
> > +static POWER_ATTR_RW(threshold2, pwr_threshold2_show, pwr_threshold2_store);
> > +
> > +static ssize_t pwr_threshold1_status_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
> > +}
> > +static POWER_ATTR_RO(threshold1_status, pwr_threshold1_status_show);
> > +
> > +static ssize_t pwr_threshold2_status_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_THRESHOLD);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
> > +}
> > +static POWER_ATTR_RO(threshold2_status, pwr_threshold2_status_show);
> > +
> > +static ssize_t ltr_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_STATUS);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > + (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
> > +}
> > +static POWER_ATTR_RO(ltr, ltr_show);
> > +
> > +static ssize_t xeon_limit_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u16 xeon_limit = 0;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_XEON_LIMIT);
> > +
> > + if (FIELD_GET(XEON_PWR_EN, v))
> > + xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit);
> > +}
> > +static POWER_ATTR_RO(xeon_limit, xeon_limit_show);
> > +
> > +static ssize_t fpga_limit_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + void __iomem *base;
> > + u16 fpga_limit = 0;
> > + u64 v;
> > +
> > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > +
> > + v = readq(base + FME_PWR_FPGA_LIMIT);
> > +
> > + if (FIELD_GET(FPGA_PWR_EN, v))
> > + fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
> > +
> > + return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit);
> > +}
> > +static POWER_ATTR_RO(fpga_limit, fpga_limit_show);
> > +
> > +static struct attribute *power_mgmt_attrs[] = {
> > + &power_attr_consumed.attr,
> > + &power_attr_threshold1.attr,
> > + &power_attr_threshold2.attr,
> > + &power_attr_threshold1_status.attr,
> > + &power_attr_threshold2_status.attr,
> > + &power_attr_xeon_limit.attr,
> > + &power_attr_fpga_limit.attr,
> > + &power_attr_ltr.attr,
>
> This is a nit, but I would expect to see these listed in the same
> order as their show/store functions above. So ltr_attr would come
> between threshold2_status_attr and xeon_limit_attr.
>
> > + NULL,
> > +};
> > +
> > +static struct attribute_group power_mgmt_attr_group = {
> > + .attrs = power_mgmt_attrs,
> > + .name = "power_mgmt",
> > +};
> > +
> > +static int fme_power_mgmt_init(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + return sysfs_create_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > +}
> > +
> > +static void fme_power_mgmt_uinit(struct platform_device *pdev,
> > + struct dfl_feature *feature)
> > +{
> > + sysfs_remove_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > +}
> > +
> > +static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
> > + {.id = FME_FEATURE_ID_POWER_MGMT,},
> > + {0,}
> > +};
> > +
> > +static const struct dfl_feature_ops fme_power_mgmt_ops = {
> > + .init = fme_power_mgmt_init,
> > + .uinit = fme_power_mgmt_uinit,
> > +};
> > +
> > static struct dfl_feature_driver fme_feature_drvs[] = {
> > {
> > .id_table = fme_hdr_id_table,
> > @@ -429,6 +682,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> > .ops = &fme_thermal_mgmt_ops,
> > },
> > {
> > + .id_table = fme_power_mgmt_id_table,
> > + .ops = &fme_power_mgmt_ops,
> > + },
> > + {
> > .ops = NULL,
> > },
> > };
> > --
> > 2.7.4
> >
>
> Thanks,
> Alan
On Thu, Apr 11, 2019 at 10:06 PM Wu Hao <[email protected]> wrote:
>
> On Thu, Apr 11, 2019 at 03:07:35PM -0500, Alan Tull wrote:
> > On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
> >
> > Hi Hao,
> >
> > >
> > > This patch adds support for power management private feature under
> > > FPGA Management Engine (FME), sysfs interfaces are introduced for
> > > different power management functions, users could use these sysfs
> > > interface to get current number of consumed power, throttling
> >
> > How about
> > s/number/measurement/
> > ?
>
> Sounds better. : )
>
> >
> > > thresholds, threshold status and other information, and configure
> > > different value for throttling thresholds too.
> > >
> > > Signed-off-by: Luwei Kang <[email protected]>
> > > Signed-off-by: Xu Yilun <[email protected]>
> > > Signed-off-by: Wu Hao <[email protected]>
> > > ---
> > > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
> > > drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
> > > 2 files changed, 313 insertions(+)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > index d3aeb88..4b6448f 100644
> > > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
> > > threshold1. It only supports two value (policy):
> > > 0 - AP2 state (90% throttling)
> > > 1 - AP1 state (50% throttling)
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. It returns current power consumed by FPGA.
> >
> > What are the units?
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-Write. Read/Write this file to get/set current power
> > > + threshold1 in Watts.
> >
> > Perhaps document error codes here and for threshold2 below.
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-Write. Read/Write this file to get/set current power
> > > + threshold2 in Watts.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. It returns 1 if power consumption reaches the
> > > + threshold1, otherwise 0.
> >
> > I'm used to things like this requiring user to reset the status, so it
> > may be worth making it explicit that it will return to zero if
> > consumption drops below threshold if that's what's happening here.
> > If it's correct, perhaps could just say something like 'returns 1 if
> > power consumption is currently at or above threshold1, otherwise 0'
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. It returns 1 if power consumption reaches the
> > > + threshold2, otherwise 0.
> >
> > Same here.
>
> Sure, will fix all above comments in this sysfs doc.
>
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. Read this file to get current Latency Tolerance
> > > + Reporting (ltr) value, it's only valid for integrated
> > > + solution as it blocks CPU on low power state.
> >
> > If we're not on the integrated solution, it returns a value but it is
> > not really real?
>
> Currently only integrated solution is implementing this private feature, other
> devices e.g. Intel PAC card is not using this private feature, so user will
> not see these sysfs interfaces at all.
OK then perhaps the "it's only valid for integrated solution as it
blocks CPU on low power state" explanation doesn't need to be here and
can lead to confusion.
>
> If in the future, other devices want
>
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. Read this file to get power limit for xeon, it
> > > + is only valid for integrated solution.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. Read this file to get power limit for fpga, it
> > > + is only valid for integrated solution.
> > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > > index 449a17d..dafa6580 100644
> > > --- a/drivers/fpga/dfl-fme-main.c
> > > +++ b/drivers/fpga/dfl-fme-main.c
> > > @@ -415,6 +415,259 @@ static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > > .uinit = fme_thermal_mgmt_uinit,
> > > };
> > >
> > > +#define FME_PWR_STATUS 0x8
> > > +#define FME_LATENCY_TOLERANCE BIT_ULL(18)
> > > +#define PWR_CONSUMED GENMASK_ULL(17, 0)
> > > +
> > > +#define FME_PWR_THRESHOLD 0x10
> > > +#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
> > > +#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
> > > +#define PWR_THRESHOLD_MAX 0x7f
> > > +#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
> > > +#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
> > > +
> > > +#define FME_PWR_XEON_LIMIT 0x18
> > > +#define XEON_PWR_LIMIT GENMASK_ULL(14, 0)
> > > +#define XEON_PWR_EN BIT_ULL(15)
> > > +#define FME_PWR_FPGA_LIMIT 0x20
> > > +#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0)
> > > +#define FPGA_PWR_EN BIT_ULL(15)
> > > +
> > > +#define POWER_ATTR(_name, _mode, _show, _store) \
> > > +struct device_attribute power_attr_##_name = \
> > > + __ATTR(_name, _mode, _show, _store)
> > > +
> > > +#define POWER_ATTR_RO(_name, _show) \
> > > + POWER_ATTR(_name, 0444, _show, NULL)
> > > +
> > > +#define POWER_ATTR_RW(_name, _show, _store) \
> > > + POWER_ATTR(_name, 0644, _show, _store)
> >
> > Are these #defines necessary? Seems like you could just use DEVICE_ATTR*
>
> Actually it adds a prefix power_attr_xxx there to avoid name conflicts with
> other ones from different private features, e.g. for the thermal threshold.
Ah yes, I see it now, thanks.
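
For anyone following along, the naming point can be sketched in plain userspace C. This is only an illustration: `struct fake_attr` and `THERMAL_ATTR` are hypothetical stand-ins, not names from the patch. DEVICE_ATTR always produces a variable called dev_attr_<name>, so two features that both expose a "threshold1" file in the same source file would collide, while a per-feature prefix keeps them apart.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device_attribute. */
struct fake_attr {
	const char *name;	/* sysfs file name */
	unsigned int mode;	/* file permissions */
};

/* Same token-pasting idea as POWER_ATTR in the patch: prefix the variable. */
#define POWER_ATTR(_name, _mode) \
	struct fake_attr power_attr_##_name = { #_name, _mode }

/* Hypothetical thermal counterpart using a different prefix. */
#define THERMAL_ATTR(_name, _mode) \
	struct fake_attr thermal_attr_##_name = { #_name, _mode }

/* Both expose a file called "threshold1", yet the C symbols differ. */
static POWER_ATTR(threshold1, 0644);
static THERMAL_ATTR(threshold1, 0444);

/* The two attributes coexist in one translation unit. */
static const struct fake_attr *all_attrs[] = {
	&power_attr_threshold1,
	&thermal_attr_threshold1,
	NULL,
};
```

Had both lines used one shared macro that always emitted dev_attr_threshold1, the second definition would fail to compile as a redefinition.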
>
> >
> > > +
> > > +static ssize_t pwr_consumed_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_STATUS);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_CONSUMED, v));
> > > +}
> > > +static POWER_ATTR_RO(consumed, pwr_consumed_show);
> > > +
> > > +static ssize_t pwr_threshold1_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD1, v));
> > > +}
> > > +
> > > +static ssize_t pwr_threshold1_store(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf, size_t count)
> > > +{
> > > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > > + void __iomem *base;
> > > + u8 threshold;
> > > + int ret;
> > > + u64 v;
> > > +
> > > + ret = kstrtou8(buf, 0, &threshold);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (threshold > PWR_THRESHOLD_MAX)
> > > + return -EINVAL;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + mutex_lock(&pdata->lock);
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > + v &= ~PWR_THRESHOLD1;
> > > + v |= FIELD_PREP(PWR_THRESHOLD1, threshold);
> > > + writeq(v, base + FME_PWR_THRESHOLD);
> > > + mutex_unlock(&pdata->lock);
> > > +
> > > + return count;
> > > +}
> > > +static POWER_ATTR_RW(threshold1, pwr_threshold1_show, pwr_threshold1_store);
> > > +
> > > +static ssize_t pwr_threshold2_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD2, v));
> > > +}
> > > +
> > > +static ssize_t pwr_threshold2_store(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf, size_t count)
> > > +{
> > > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > > + void __iomem *base;
> > > + u8 threshold;
> > > + int ret;
> > > + u64 v;
> > > +
> > > + ret = kstrtou8(buf, 0, &threshold);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (threshold > PWR_THRESHOLD_MAX)
> > > + return -EINVAL;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + mutex_lock(&pdata->lock);
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > + v &= ~PWR_THRESHOLD2;
> > > + v |= FIELD_PREP(PWR_THRESHOLD2, threshold);
> > > + writeq(v, base + FME_PWR_THRESHOLD);
> > > + mutex_unlock(&pdata->lock);
> > > +
> > > + return count;
> > > +}
> > > +static POWER_ATTR_RW(threshold2, pwr_threshold2_show, pwr_threshold2_store);
> > > +
> > > +static ssize_t pwr_threshold1_status_show(struct device *dev,
> > > + struct device_attribute *attr,
> > > + char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
> > > +}
> > > +static POWER_ATTR_RO(threshold1_status, pwr_threshold1_status_show);
> > > +
> > > +static ssize_t pwr_threshold2_status_show(struct device *dev,
> > > + struct device_attribute *attr,
> > > + char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
> > > +}
> > > +static POWER_ATTR_RO(threshold2_status, pwr_threshold2_status_show);
> > > +
> > > +static ssize_t ltr_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_STATUS);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
> > > +}
> > > +static POWER_ATTR_RO(ltr, ltr_show);
> > > +
> > > +static ssize_t xeon_limit_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u16 xeon_limit = 0;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_XEON_LIMIT);
> > > +
> > > + if (FIELD_GET(XEON_PWR_EN, v))
> > > + xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit);
> > > +}
> > > +static POWER_ATTR_RO(xeon_limit, xeon_limit_show);
> > > +
> > > +static ssize_t fpga_limit_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u16 fpga_limit = 0;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_FPGA_LIMIT);
> > > +
> > > + if (FIELD_GET(FPGA_PWR_EN, v))
> > > + fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit);
> > > +}
> > > +static POWER_ATTR_RO(fpga_limit, fpga_limit_show);
> > > +
> > > +static struct attribute *power_mgmt_attrs[] = {
> > > + &power_attr_consumed.attr,
> > > + &power_attr_threshold1.attr,
> > > + &power_attr_threshold2.attr,
> > > + &power_attr_threshold1_status.attr,
> > > + &power_attr_threshold2_status.attr,
> > > + &power_attr_xeon_limit.attr,
> > > + &power_attr_fpga_limit.attr,
> > > + &power_attr_ltr.attr,
> >
> > This is a nit, but I would expect to see these listed in the same
> > order as their show/store functions above. So ltr_attr would come
> > between threshold2_status_attr and xeon_limit_attr.
>
> Sure, it does make sense.
>
> >
> > > + NULL,
> > > +};
> > > +
> > > +static struct attribute_group power_mgmt_attr_group = {
> > > + .attrs = power_mgmt_attrs,
> > > + .name = "power_mgmt",
> > > +};
> > > +
> > > +static int fme_power_mgmt_init(struct platform_device *pdev,
> > > + struct dfl_feature *feature)
> > > +{
> > > + return sysfs_create_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > > +}
> > > +
> > > +static void fme_power_mgmt_uinit(struct platform_device *pdev,
> > > + struct dfl_feature *feature)
> > > +{
> > > + sysfs_remove_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > > +}
> > > +
> > > +static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
> > > + {.id = FME_FEATURE_ID_POWER_MGMT,},
> > > + {0,}
> > > +};
> > > +
> > > +static const struct dfl_feature_ops fme_power_mgmt_ops = {
> > > + .init = fme_power_mgmt_init,
> > > + .uinit = fme_power_mgmt_uinit,
> > > +};
> > > +
> > > static struct dfl_feature_driver fme_feature_drvs[] = {
> > > {
> > > .id_table = fme_hdr_id_table,
> > > @@ -429,6 +682,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> > > .ops = &fme_thermal_mgmt_ops,
> > > },
> > > {
> > > + .id_table = fme_power_mgmt_id_table,
> > > + .ops = &fme_power_mgmt_ops,
> > > + },
> > > + {
> > > .ops = NULL,
> > > },
> > > };
> > > --
> > > 2.7.4
> > >
>
> Thanks a lot for the review and comments. :)
>
> Hao
>
> >
> > Thanks,
> > Alan
On Fri, Apr 12, 2019 at 02:05:21PM -0700, Moritz Fischer wrote:
> Hi Hao,
>
> this looks suspiciously like a hwmon driver ;-)
>
> https://www.kernel.org/doc/Documentation/hwmon/hwmon-kernel-api.txt
Hi Moritz,
Thanks a lot for the suggestion. Yes, I agree, and the thermal management
patch should be a similar case too. Let me see if I can convert the
thermal / power management code to hwmon in the next version. :)
Hao
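
One detail a hwmon conversion would probably have to handle is units. The threshold fields in this patch are in Watts, whereas the hwmon sysfs interface documentation specifies microwatts for power attributes. A minimal userspace sketch of that conversion follows; the helper names are hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption made here, not something decided in this thread.

```c
#include <errno.h>
#include <stdint.h>

#define PWR_THRESHOLD_MAX	0x7f		/* 7-bit field, in Watts (from the patch) */
#define MICROWATTS_PER_WATT	1000000LL	/* hwmon power unit is microwatt */

/* Register value in Watts -> value to report through a hwmon power attribute. */
static int64_t watts_to_microwatts(unsigned int watts)
{
	return (int64_t)watts * MICROWATTS_PER_WATT;
}

/*
 * Value written by the user in microwatts -> Watts for the register field.
 * Fractions of a Watt are truncated; values that do not fit the 7-bit
 * field are rejected with -EINVAL (an assumption of this sketch).
 */
static int microwatts_to_watts(int64_t microwatts, unsigned int *watts)
{
	int64_t w;

	if (microwatts < 0)
		return -EINVAL;

	w = microwatts / MICROWATTS_PER_WATT;
	if (w > PWR_THRESHOLD_MAX)
		return -EINVAL;

	*watts = (unsigned int)w;
	return 0;
}
```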
>
> Cheers,
> Moritz
>
>
> On Thu, Apr 11, 2019 at 1:08 PM Alan Tull <[email protected]> wrote:
> >
> > On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
> >
> > Hi Hao,
> >
> > >
> > > This patch adds support for power management private feature under
> > > FPGA Management Engine (FME), sysfs interfaces are introduced for
> > > different power management functions, users could use these sysfs
> > > interface to get current number of consumed power, throttling
> >
> > How about
> > s/number/measurement/
> > ?
> >
> > > thresholds, threshold status and other information, and configure
> > > different value for throttling thresholds too.
> > >
> > > Signed-off-by: Luwei Kang <[email protected]>
> > > Signed-off-by: Xu Yilun <[email protected]>
> > > Signed-off-by: Wu Hao <[email protected]>
> > > ---
> > > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
> > > drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
> > > 2 files changed, 313 insertions(+)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > index d3aeb88..4b6448f 100644
> > > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
> > > threshold1. It only supports two value (policy):
> > > 0 - AP2 state (90% throttling)
> > > 1 - AP1 state (50% throttling)
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. It returns current power consumed by FPGA.
> >
> > What are the units?
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-Write. Read/Write this file to get/set current power
> > > + threshold1 in Watts.
> >
> > Perhaps document error codes here and for threshold2 below.
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-Write. Read/Write this file to get/set current power
> > > + threshold2 in Watts.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. It returns 1 if power consumption reaches the
> > > + threshold1, otherwise 0.
> >
> > I'm used to things like this requiring user to reset the status, so it
> > may be worth making it explicit that it will return to zero if
> > consumption drops below threshold if that's what's happening here.
> > If it's correct, perhaps could just say something like 'returns 1 if
> > power consumption is currently at or above threshold1, otherwise 0'
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. It returns 1 if power consumption reaches the
> > > + threshold2, otherwise 0.
> >
> > Same here.
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. Read this file to get current Latency Tolerance
> > > + Reporting (ltr) value, it's only valid for integrated
> > > + solution as it blocks CPU on low power state.
> >
> > If we're not on the integrated solution, it returns a value but it is
> > not really real?
> >
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/xeon_limit
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. Read this file to get power limit for xeon, it
> > > + is only valid for integrated solution.
> > > +
> > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/fpga_limit
> > > +Date: March 2019
> > > +KernelVersion: 5.2
> > > +Contact: Wu Hao <[email protected]>
> > > +Description: Read-only. Read this file to get power limit for fpga, it
> > > + is only valid for integrated solution.
> > > diff --git a/drivers/fpga/dfl-fme-main.c b/drivers/fpga/dfl-fme-main.c
> > > index 449a17d..dafa6580 100644
> > > --- a/drivers/fpga/dfl-fme-main.c
> > > +++ b/drivers/fpga/dfl-fme-main.c
> > > @@ -415,6 +415,259 @@ static const struct dfl_feature_ops fme_thermal_mgmt_ops = {
> > > .uinit = fme_thermal_mgmt_uinit,
> > > };
> > >
> > > +#define FME_PWR_STATUS 0x8
> > > +#define FME_LATENCY_TOLERANCE BIT_ULL(18)
> > > +#define PWR_CONSUMED GENMASK_ULL(17, 0)
> > > +
> > > +#define FME_PWR_THRESHOLD 0x10
> > > +#define PWR_THRESHOLD1 GENMASK_ULL(6, 0) /* in Watts */
> > > +#define PWR_THRESHOLD2 GENMASK_ULL(14, 8) /* in Watts */
> > > +#define PWR_THRESHOLD_MAX 0x7f
> > > +#define PWR_THRESHOLD1_STATUS BIT_ULL(16)
> > > +#define PWR_THRESHOLD2_STATUS BIT_ULL(17)
> > > +
> > > +#define FME_PWR_XEON_LIMIT 0x18
> > > +#define XEON_PWR_LIMIT GENMASK_ULL(14, 0)
> > > +#define XEON_PWR_EN BIT_ULL(15)
> > > +#define FME_PWR_FPGA_LIMIT 0x20
> > > +#define FPGA_PWR_LIMIT GENMASK_ULL(14, 0)
> > > +#define FPGA_PWR_EN BIT_ULL(15)
> > > +
> > > +#define POWER_ATTR(_name, _mode, _show, _store) \
> > > +struct device_attribute power_attr_##_name = \
> > > + __ATTR(_name, _mode, _show, _store)
> > > +
> > > +#define POWER_ATTR_RO(_name, _show) \
> > > + POWER_ATTR(_name, 0444, _show, NULL)
> > > +
> > > +#define POWER_ATTR_RW(_name, _show, _store) \
> > > + POWER_ATTR(_name, 0644, _show, _store)
> >
> > Are these #defines necessary? Seems like you could just use DEVICE_ATTR*
> >
> > > +
> > > +static ssize_t pwr_consumed_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_STATUS);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_CONSUMED, v));
> > > +}
> > > +static POWER_ATTR_RO(consumed, pwr_consumed_show);
> > > +
> > > +static ssize_t pwr_threshold1_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD1, v));
> > > +}
> > > +
> > > +static ssize_t pwr_threshold1_store(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf, size_t count)
> > > +{
> > > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > > + void __iomem *base;
> > > + u8 threshold;
> > > + int ret;
> > > + u64 v;
> > > +
> > > + ret = kstrtou8(buf, 0, &threshold);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (threshold > PWR_THRESHOLD_MAX)
> > > + return -EINVAL;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + mutex_lock(&pdata->lock);
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > + v &= ~PWR_THRESHOLD1;
> > > + v |= FIELD_PREP(PWR_THRESHOLD1, threshold);
> > > + writeq(v, base + FME_PWR_THRESHOLD);
> > > + mutex_unlock(&pdata->lock);
> > > +
> > > + return count;
> > > +}
> > > +static POWER_ATTR_RW(threshold1, pwr_threshold1_show, pwr_threshold1_store);
> > > +
> > > +static ssize_t pwr_threshold2_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD2, v));
> > > +}
> > > +
> > > +static ssize_t pwr_threshold2_store(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf, size_t count)
> > > +{
> > > + struct dfl_feature_platform_data *pdata = dev_get_platdata(dev);
> > > + void __iomem *base;
> > > + u8 threshold;
> > > + int ret;
> > > + u64 v;
> > > +
> > > + ret = kstrtou8(buf, 0, &threshold);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (threshold > PWR_THRESHOLD_MAX)
> > > + return -EINVAL;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + mutex_lock(&pdata->lock);
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > + v &= ~PWR_THRESHOLD2;
> > > + v |= FIELD_PREP(PWR_THRESHOLD2, threshold);
> > > + writeq(v, base + FME_PWR_THRESHOLD);
> > > + mutex_unlock(&pdata->lock);
> > > +
> > > + return count;
> > > +}
> > > +static POWER_ATTR_RW(threshold2, pwr_threshold2_show, pwr_threshold2_store);
> > > +
> > > +static ssize_t pwr_threshold1_status_show(struct device *dev,
> > > + struct device_attribute *attr,
> > > + char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD1_STATUS, v));
> > > +}
> > > +static POWER_ATTR_RO(threshold1_status, pwr_threshold1_status_show);
> > > +
> > > +static ssize_t pwr_threshold2_status_show(struct device *dev,
> > > + struct device_attribute *attr,
> > > + char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_THRESHOLD);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(PWR_THRESHOLD2_STATUS, v));
> > > +}
> > > +static POWER_ATTR_RO(threshold2_status, pwr_threshold2_status_show);
> > > +
> > > +static ssize_t ltr_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_STATUS);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n",
> > > + (unsigned int)FIELD_GET(FME_LATENCY_TOLERANCE, v));
> > > +}
> > > +static POWER_ATTR_RO(ltr, ltr_show);
> > > +
> > > +static ssize_t xeon_limit_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u16 xeon_limit = 0;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_XEON_LIMIT);
> > > +
> > > + if (FIELD_GET(XEON_PWR_EN, v))
> > > + xeon_limit = FIELD_GET(XEON_PWR_LIMIT, v);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n", xeon_limit);
> > > +}
> > > +static POWER_ATTR_RO(xeon_limit, xeon_limit_show);
> > > +
> > > +static ssize_t fpga_limit_show(struct device *dev,
> > > + struct device_attribute *attr, char *buf)
> > > +{
> > > + void __iomem *base;
> > > + u16 fpga_limit = 0;
> > > + u64 v;
> > > +
> > > + base = dfl_get_feature_ioaddr_by_id(dev, FME_FEATURE_ID_POWER_MGMT);
> > > +
> > > + v = readq(base + FME_PWR_FPGA_LIMIT);
> > > +
> > > + if (FIELD_GET(FPGA_PWR_EN, v))
> > > + fpga_limit = FIELD_GET(FPGA_PWR_LIMIT, v);
> > > +
> > > + return scnprintf(buf, PAGE_SIZE, "%u\n", fpga_limit);
> > > +}
> > > +static POWER_ATTR_RO(fpga_limit, fpga_limit_show);
> > > +
> > > +static struct attribute *power_mgmt_attrs[] = {
> > > + &power_attr_consumed.attr,
> > > + &power_attr_threshold1.attr,
> > > + &power_attr_threshold2.attr,
> > > + &power_attr_threshold1_status.attr,
> > > + &power_attr_threshold2_status.attr,
> > > + &power_attr_xeon_limit.attr,
> > > + &power_attr_fpga_limit.attr,
> > > + &power_attr_ltr.attr,
> >
> > This is a nit, but I would expect to see these listed in the same
> > order as their show/store functions above. So ltr_attr would come
> > between threshold2_status_attr and xeon_limit_attr.
> >
> > > + NULL,
> > > +};
> > > +
> > > +static struct attribute_group power_mgmt_attr_group = {
> > > + .attrs = power_mgmt_attrs,
> > > + .name = "power_mgmt",
> > > +};
> > > +
> > > +static int fme_power_mgmt_init(struct platform_device *pdev,
> > > + struct dfl_feature *feature)
> > > +{
> > > + return sysfs_create_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > > +}
> > > +
> > > +static void fme_power_mgmt_uinit(struct platform_device *pdev,
> > > + struct dfl_feature *feature)
> > > +{
> > > + sysfs_remove_group(&pdev->dev.kobj, &power_mgmt_attr_group);
> > > +}
> > > +
> > > +static const struct dfl_feature_id fme_power_mgmt_id_table[] = {
> > > + {.id = FME_FEATURE_ID_POWER_MGMT,},
> > > + {0,}
> > > +};
> > > +
> > > +static const struct dfl_feature_ops fme_power_mgmt_ops = {
> > > + .init = fme_power_mgmt_init,
> > > + .uinit = fme_power_mgmt_uinit,
> > > +};
> > > +
> > > static struct dfl_feature_driver fme_feature_drvs[] = {
> > > {
> > > .id_table = fme_hdr_id_table,
> > > @@ -429,6 +682,10 @@ static struct dfl_feature_driver fme_feature_drvs[] = {
> > > .ops = &fme_thermal_mgmt_ops,
> > > },
> > > {
> > > + .id_table = fme_power_mgmt_id_table,
> > > + .ops = &fme_power_mgmt_ops,
> > > + },
> > > + {
> > > .ops = NULL,
> > > },
> > > };
> > > --
> > > 2.7.4
> > >
> >
> > Thanks,
> > Alan
On Mon, Apr 15, 2019 at 04:17:48PM -0500, Alan Tull wrote:
> On Thu, Apr 11, 2019 at 10:06 PM Wu Hao <[email protected]> wrote:
> >
> > On Thu, Apr 11, 2019 at 03:07:35PM -0500, Alan Tull wrote:
> > > On Sun, Mar 24, 2019 at 10:24 PM Wu Hao <[email protected]> wrote:
> > >
> > > Hi Hao,
> > >
> > > >
> > > > This patch adds support for power management private feature under
> > > > FPGA Management Engine (FME), sysfs interfaces are introduced for
> > > > different power management functions, users could use these sysfs
> > > > interface to get current number of consumed power, throttling
> > >
> > > How about
> > > s/number/measurement/
> > > ?
> >
> > Sounds better. : )
> >
> > >
> > > > thresholds, threshold status and other information, and configure
> > > > different value for throttling thresholds too.
> > > >
> > > > Signed-off-by: Luwei Kang <[email protected]>
> > > > Signed-off-by: Xu Yilun <[email protected]>
> > > > Signed-off-by: Wu Hao <[email protected]>
> > > > ---
> > > > Documentation/ABI/testing/sysfs-platform-dfl-fme | 56 +++++
> > > > drivers/fpga/dfl-fme-main.c | 257 +++++++++++++++++++++++
> > > > 2 files changed, 313 insertions(+)
> > > >
> > > > diff --git a/Documentation/ABI/testing/sysfs-platform-dfl-fme b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > > index d3aeb88..4b6448f 100644
> > > > --- a/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > > +++ b/Documentation/ABI/testing/sysfs-platform-dfl-fme
> > > > @@ -100,3 +100,59 @@ Description: Read-only. Read this file to get the policy of temperature
> > > > threshold1. It only supports two value (policy):
> > > > 0 - AP2 state (90% throttling)
> > > > 1 - AP1 state (50% throttling)
> > > > +
> > > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/consumed
> > > > +Date: March 2019
> > > > +KernelVersion: 5.2
> > > > +Contact: Wu Hao <[email protected]>
> > > > +Description: Read-only. It returns current power consumed by FPGA.
> > >
> > > What are the units?
> > >
> > > > +
> > > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1
> > > > +Date: March 2019
> > > > +KernelVersion: 5.2
> > > > +Contact: Wu Hao <[email protected]>
> > > > +Description: Read-Write. Read/Write this file to get/set current power
> > > > + threshold1 in Watts.
> > >
> > > Perhaps document error codes here and for threshold2 below.
> > >
> > > > +
> > > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2
> > > > +Date: March 2019
> > > > +KernelVersion: 5.2
> > > > +Contact: Wu Hao <[email protected]>
> > > > +Description: Read-Write. Read/Write this file to get/set current power
> > > > + threshold2 in Watts.
> > > > +
> > > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold1_status
> > > > +Date: March 2019
> > > > +KernelVersion: 5.2
> > > > +Contact: Wu Hao <[email protected]>
> > > > +Description: Read-only. It returns 1 if power consumption reaches the
> > > > + threshold1, otherwise 0.
> > >
> > > I'm used to things like this requiring user to reset the status, so it
> > > may be worth making it explicit that it will return to zero if
> > > consumption drops below threshold if that's what's happening here.
> > > If it's correct, perhaps could just say something like 'returns 1 if
> > > power consumption is currently at or above threshold1, otherwise 0'
> > >
> > > > +
> > > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/threshold2_status
> > > > +Date: March 2019
> > > > +KernelVersion: 5.2
> > > > +Contact: Wu Hao <[email protected]>
> > > > +Description: Read-only. It returns 1 if power consumption reaches the
> > > > + threshold2, otherwise 0.
> > >
> > > Same here.
> >
> > Sure, will fix all above comments in this sysfs doc.
> >
> > >
> > > > +
> > > > +What: /sys/bus/platform/devices/dfl-fme.0/power_mgmt/ltr
> > > > +Date: March 2019
> > > > +KernelVersion: 5.2
> > > > +Contact: Wu Hao <[email protected]>
> > > > +Description: Read-only. Read this file to get current Latency Tolerance
> > > > + Reporting (ltr) value, it's only valid for integrated
> > > > + solution as it blocks CPU on low power state.
> > >
> > > If we're not on the integrated solution, it returns a value but it is
> > > not really real?
> >
> > Currently only integrated solution is implementing this private feature, other
> > devices e.g. Intel PAC card is not using this private feature, so user will
> > not see these sysfs interfaces at all.
>
> OK then perhaps the "it's only valid for integrated solution as it
> blocks CPU on low power state" explanation doesn't need to be here and
> can lead to confusion.
>
Sure, will fix it in the next version. Thanks!
Hao
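
As a reference for the error codes to be documented for threshold1/threshold2, the store path quoted above can be modelled in userspace roughly as below. This is a sketch only: the register is a plain variable instead of an MMIO address accessed with readq()/writeq(), the masks are written out by hand instead of using GENMASK_ULL/FIELD_PREP, and the helper names are hypothetical. In the real store functions kstrtou8() can additionally fail before the range check (-EINVAL for unparsable input, -ERANGE for values that do not fit in a u8).

```c
#include <errno.h>
#include <stdint.h>

#define PWR_THRESHOLD1_SHIFT	0
#define PWR_THRESHOLD1_MASK	(0x7fULL << PWR_THRESHOLD1_SHIFT)	/* bits 6:0 */
#define PWR_THRESHOLD2_SHIFT	8
#define PWR_THRESHOLD2_MASK	(0x7fULL << PWR_THRESHOLD2_SHIFT)	/* bits 14:8 */
#define PWR_THRESHOLD_MAX	0x7f

/*
 * Read-modify-write of one threshold field. Only the bits covered by
 * 'mask' change; the other threshold and the status bits are preserved.
 */
static int set_threshold(uint64_t *reg, uint64_t mask, unsigned int shift,
			 unsigned int watts)
{
	if (watts > PWR_THRESHOLD_MAX)
		return -EINVAL;

	*reg = (*reg & ~mask) | (((uint64_t)watts << shift) & mask);
	return 0;
}

/* Extract one threshold field, as the show functions do with FIELD_GET. */
static unsigned int get_threshold(uint64_t reg, uint64_t mask,
				  unsigned int shift)
{
	return (unsigned int)((reg & mask) >> shift);
}
```

In the driver the read-modify-write is done under pdata->lock, so that concurrent writers to threshold1 and threshold2 cannot overwrite each other's update to the shared register.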