Hi Greg,
I applied the fixups to sysfs ABI docs, I would appreciate if PECI
could go through your tree into v5.18.
Here is the usual cover letter from the previous revision:
The Platform Environment Control Interface (PECI) is a communication
interface between Intel processors and management controllers (e.g.
Baseboard Management Controller, BMC).
This series adds a PECI subsystem and introduces drivers which run in
the Linux instance on the management controller (not the main Intel
processor) and is intended to be used by the OpenBMC [1], a Linux
distribution for BMC devices.
The information exposed over PECI (like processor and DIMM
temperature) refers to the Intel processor and can be consumed by
daemons running on the BMC to, for example, display the processor
temperature in its web interface.
The PECI bus is collection of code that provides interface support
between PECI devices (that actually represent processors) and PECI
controllers (such as the "peci-aspeed" controller) that allow to
access physical PECI interface. PECI devices are bound to PECI
drivers that provides access to PECI services. This series introduces
a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
and "dimmtemp" using the auxiliary bus.
Exposing "raw" PECI to userspace, either to write userspace drivers or
for debug/testing purpose was left out of this series to encourage
writing kernel drivers instead, but may be pursued in the future.
Introducing PECI to upstream Linux was already attempted before [2].
Since it's been over a year since last revision, and the series
changed quite a bit in the meantime, I've decided to start from v1.
I would also like to give credit to everyone who helped me with
different aspects of preliminary review:
- Pierre-Louis Bossart,
- Tony Luck,
- Andy Shevchenko,
- Dave Hansen.
[1] https://github.com/openbmc/openbmc
[2] https://lore.kernel.org/openbmc/[email protected]/
Changes v7 -> v8:
* Updated "KernelVersion" in sysfs ABI docs (Greg)
Changes v6 -> v7:
* Fixed Kconfig warnings ([email protected])
Changes v5 -> v6:
* Added missing COMMON_CLK selection ([email protected])
* Fixed WARN_ON always evaluated to true ([email protected])
* Clean interrupt status unconditionally (Joel)
* Replaced memcpy_toio()/memcpy_fromio() with writel()/readl() to
avoid issues when submitting unaligned PECI commands
Changes v4 -> v5:
* Added clk_aspeed_peci to express controller programming using common
clock framework (Billy)
* Modified peci-aspeed DTS schema to match clock changes (Billy)
* Added workaround for peci-aspeed controller hang (Billy)
* Removed unnecessary "else after return" (Guenter)
Changes v3 -> v4:
* Fixed an issue where peci doesn't work after host shutdown (Zev)
* Replaced kill_device() with peci_device_del_lock (Greg)
* Fixed dts_valid() parameter type (Guenter)
* Removed Jae from MAINTAINERS file (Jae)
Changes v2 -> v3:
* Dropped x86/cpu patches (Boris)
* Dropped pr_fmt() for PECI module (Dan)
* Fixed releasing peci controller device flow (Dan)
* Improved peci-aspeed commit-msg and Kconfig help (Dan)
* Fixed aspeed_peci_xfer() to use the proper spin_lock function (Dan)
* Wrapped print_hex_dump_bytes() in CONFIG_DYNAMIC_DEBUG (Dan)
* Removed debug status logs from aspeed_peci_irq_handler() (Dan)
* Renamed functions using devres to start with "devm" (Dan)
* Changed request to be allocated on stack in peci_detect (Dan)
* Removed redundant WARN_ON on invalid PECI addr (Dan)
* Changed peci_device_create() to use device_initialize() + device_add() pattern (Dan)
* Fixed peci_device_destroy() to use kill_device() avoiding double-free (Dan)
* Renamed functions that perform xfer using "peci_xfer_*" prefix (Dan)
* Renamed peci_request_data_dib(temp) -> peci_request_dib(temp)_read (Dan)
* Fixed thermal margin readings for older Intel processors (Zev)
* Misc hwmon simplifications (Guenter)
* Used BIT_PER_TYPE to verify macro value constrains (Guenter)
* Improved WARN_ON message to print chan_rank_max and idx_dimm_max (Guenter)
* Improved dimmtemp to not reattempt probe if no dimms are populated
Changes v1 -> v2:
Biggest changes when it comes to diffstat are locking in HWMON
(I decided to clean things up a bit while adding it), switching to
devres usage in more places and exposing sysfs interface in separate patch.
* Moved extending X86 ARCHITECTURE MAINTAINERS earlier in series (Dan)
* Removed "default n" for GENERIC_LIB_X86 (Dan)
* Added vendor prefix for peci-aspeed specific properties (Rob)
* Refactored PECI to use devres consistently (Dan)
* Added missing sysfs documentation and excluded adding peci-sysfs to
separate patch (Dan)
* Used module_init() instead of subsys_init() for peci module initialization (Dan)
* Removed redundant struct peci_device member (Dan)
* Improved PECI Kconfig help (Randy/Dan)
* Fixed/removed log messages (Dan, Guenter)
* Refactored peci-cputemp and peci-dimmtemp and added missing locks (Guenter)
* Removed unused dev_set_drvdata() in peci-cputemp and peci-dimmtemp (Guenter)
* Fixed used types, names, fixed broken and added additional comments
to peci-hwmon (Guenter, Zev)
* Refactored peci-dimmtemp to not return -ETIMEDOUT (Guenter)
* Added sanity check for min_peci_revision in peci-hwmon drivers (Zev)
* Added assert for DIMM_NUMS_MAX and additional warning in peci-dimmtemp (Zev)
* Fixed macro names in peci-aspeed (Zev)
* Refactored peci-aspeed sanitizing properties to a single helper function (Zev)
* Fixed peci_cpu_device_ids definition for Broadwell Xeon D (David)
* Refactor peci_request to use a single allocation (Zev)
* Used min_t() to improve code readability (Zev)
* Added macro for PECI_RDENDPTCFG_MMIO_WR_LEN_BASE and fixed adev type
array name to more descriptive (Zev)
* Fixed peci-hwmon commit-msg and documentation (Zev)
Thanks
-Iwona
Iwona Winiarska (11):
dt-bindings: Add generic bindings for PECI
dt-bindings: Add bindings for peci-aspeed
ARM: dts: aspeed: Add PECI controller nodes
peci: Add core infrastructure
peci: Add device detection
peci: Add sysfs interface for PECI bus
peci: Add support for PECI device drivers
peci: Add peci-cpu driver
hwmon: peci: Add cputemp driver
hwmon: peci: Add dimmtemp driver
docs: Add PECI documentation
Jae Hyun Yoo (2):
peci: Add peci-aspeed controller driver
docs: hwmon: Document PECI drivers
Documentation/ABI/testing/sysfs-bus-peci | 16 +
.../devicetree/bindings/peci/peci-aspeed.yaml | 72 ++
.../bindings/peci/peci-controller.yaml | 33 +
Documentation/hwmon/index.rst | 2 +
Documentation/hwmon/peci-cputemp.rst | 90 +++
Documentation/hwmon/peci-dimmtemp.rst | 57 ++
Documentation/index.rst | 1 +
Documentation/peci/index.rst | 16 +
Documentation/peci/peci.rst | 51 ++
MAINTAINERS | 26 +
arch/arm/boot/dts/aspeed-g4.dtsi | 11 +
arch/arm/boot/dts/aspeed-g5.dtsi | 11 +
arch/arm/boot/dts/aspeed-g6.dtsi | 11 +
drivers/Kconfig | 3 +
drivers/Makefile | 1 +
drivers/hwmon/Kconfig | 2 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/peci/Kconfig | 31 +
drivers/hwmon/peci/Makefile | 7 +
drivers/hwmon/peci/common.h | 58 ++
drivers/hwmon/peci/cputemp.c | 592 ++++++++++++++++
drivers/hwmon/peci/dimmtemp.c | 630 ++++++++++++++++++
drivers/peci/Kconfig | 36 +
drivers/peci/Makefile | 10 +
drivers/peci/controller/Kconfig | 18 +
drivers/peci/controller/Makefile | 3 +
drivers/peci/controller/peci-aspeed.c | 599 +++++++++++++++++
drivers/peci/core.c | 236 +++++++
drivers/peci/cpu.c | 343 ++++++++++
drivers/peci/device.c | 252 +++++++
drivers/peci/internal.h | 136 ++++
drivers/peci/request.c | 482 ++++++++++++++
drivers/peci/sysfs.c | 82 +++
include/linux/peci-cpu.h | 40 ++
include/linux/peci.h | 112 ++++
35 files changed, 4071 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-bus-peci
create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
create mode 100644 Documentation/hwmon/peci-cputemp.rst
create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
create mode 100644 Documentation/peci/index.rst
create mode 100644 Documentation/peci/peci.rst
create mode 100644 drivers/hwmon/peci/Kconfig
create mode 100644 drivers/hwmon/peci/Makefile
create mode 100644 drivers/hwmon/peci/common.h
create mode 100644 drivers/hwmon/peci/cputemp.c
create mode 100644 drivers/hwmon/peci/dimmtemp.c
create mode 100644 drivers/peci/Kconfig
create mode 100644 drivers/peci/Makefile
create mode 100644 drivers/peci/controller/Kconfig
create mode 100644 drivers/peci/controller/Makefile
create mode 100644 drivers/peci/controller/peci-aspeed.c
create mode 100644 drivers/peci/core.c
create mode 100644 drivers/peci/cpu.c
create mode 100644 drivers/peci/device.c
create mode 100644 drivers/peci/internal.h
create mode 100644 drivers/peci/request.c
create mode 100644 drivers/peci/sysfs.c
create mode 100644 include/linux/peci-cpu.h
create mode 100644 include/linux/peci.h
--
2.34.1
Add PECI controller nodes with all required information.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
---
arch/arm/boot/dts/aspeed-g4.dtsi | 11 +++++++++++
arch/arm/boot/dts/aspeed-g5.dtsi | 11 +++++++++++
arch/arm/boot/dts/aspeed-g6.dtsi | 11 +++++++++++
3 files changed, 33 insertions(+)
diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed-g4.dtsi
index f14dace34c5a..fa8b581c3d6c 100644
--- a/arch/arm/boot/dts/aspeed-g4.dtsi
+++ b/arch/arm/boot/dts/aspeed-g4.dtsi
@@ -392,6 +392,17 @@ uart_routing: uart-routing@9c {
};
};
+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2400-peci";
+ reg = <0x1e78b000 0x60>;
+ interrupts = <15>;
+ clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ clock-frequency = <1000000>;
+ status = "disabled";
+ };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed-g5.dtsi
index 7495f93c5069..4147b397c883 100644
--- a/arch/arm/boot/dts/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed-g5.dtsi
@@ -516,6 +516,17 @@ ibt: ibt@140 {
};
};
+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2500-peci";
+ reg = <0x1e78b000 0x60>;
+ interrupts = <15>;
+ clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ clock-frequency = <1000000>;
+ status = "disabled";
+ };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
diff --git a/arch/arm/boot/dts/aspeed-g6.dtsi b/arch/arm/boot/dts/aspeed-g6.dtsi
index c32e87fad4dc..3d5ce9da42c3 100644
--- a/arch/arm/boot/dts/aspeed-g6.dtsi
+++ b/arch/arm/boot/dts/aspeed-g6.dtsi
@@ -512,6 +512,17 @@ wdt4: watchdog@1e7850c0 {
status = "disabled";
};
+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2600-peci";
+ reg = <0x1e78b000 0x100>;
+ interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ cmd-timeout-ms = <1000>;
+ clock-frequency = <1000000>;
+ status = "disabled";
+ };
+
lpc: lpc@1e789000 {
compatible = "aspeed,ast2600-lpc-v2", "simple-mfd", "syscon";
reg = <0x1e789000 0x1000>;
--
2.34.1
Add peci-dimmtemp driver for Temperature Sensor on DIMM readings that
are accessible via the processor PECI interface.
The main use case for the driver (and PECI interface) is out-of-band
management, where we're able to obtain thermal readings from an external
entity connected with PECI, e.g. BMC on server platforms.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
drivers/hwmon/peci/Kconfig | 13 +
drivers/hwmon/peci/Makefile | 2 +
drivers/hwmon/peci/dimmtemp.c | 630 ++++++++++++++++++++++++++++++++++
3 files changed, 645 insertions(+)
create mode 100644 drivers/hwmon/peci/dimmtemp.c
diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
index e10eed68d70a..9d32a57badfe 100644
--- a/drivers/hwmon/peci/Kconfig
+++ b/drivers/hwmon/peci/Kconfig
@@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
This driver can also be built as a module. If so, the module
will be called peci-cputemp.
+config SENSORS_PECI_DIMMTEMP
+ tristate "PECI DIMM temperature monitoring client"
+ depends on PECI
+ select SENSORS_PECI
+ select PECI_CPU
+ help
+ If you say yes here you get support for the generic Intel PECI hwmon
+ driver which provides Temperature Sensor on DIMM readings that are
+ accessible via the processor PECI interface.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-dimmtemp.
+
config SENSORS_PECI
tristate
diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
index e8a0ada5ab1f..191cfa0227f3 100644
--- a/drivers/hwmon/peci/Makefile
+++ b/drivers/hwmon/peci/Makefile
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
peci-cputemp-y := cputemp.o
+peci-dimmtemp-y := dimmtemp.o
obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
+obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
new file mode 100644
index 000000000000..c8222354c005
--- /dev/null
+++ b/drivers/hwmon/peci/dimmtemp.c
@@ -0,0 +1,630 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/units.h>
+#include <linux/workqueue.h>
+
+#include "common.h"
+
+#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
+
+/* Max number of channel ranks and DIMM index per channel */
+#define CHAN_RANK_MAX_ON_HSX 8
+#define DIMM_IDX_MAX_ON_HSX 3
+#define CHAN_RANK_MAX_ON_BDX 4
+#define DIMM_IDX_MAX_ON_BDX 3
+#define CHAN_RANK_MAX_ON_BDXD 2
+#define DIMM_IDX_MAX_ON_BDXD 2
+#define CHAN_RANK_MAX_ON_SKX 6
+#define DIMM_IDX_MAX_ON_SKX 2
+#define CHAN_RANK_MAX_ON_ICX 8
+#define DIMM_IDX_MAX_ON_ICX 2
+#define CHAN_RANK_MAX_ON_ICXD 4
+#define DIMM_IDX_MAX_ON_ICXD 2
+
+#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
+#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
+#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
+
+#define CPU_SEG_MASK GENMASK(23, 16)
+#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
+#define CPU_BUS_MASK GENMASK(7, 0)
+#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
+
+#define DIMM_TEMP_MAX GENMASK(15, 8)
+#define DIMM_TEMP_CRIT GENMASK(23, 16)
+#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
+#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
+
+#define NO_DIMM_RETRY_COUNT_MAX 5
+
+struct peci_dimmtemp;
+
+struct dimm_info {
+ int chan_rank_max;
+ int dimm_idx_max;
+ u8 min_peci_revision;
+ int (*read_thresholds)(struct peci_dimmtemp *priv, int dimm_order,
+ int chan_rank, u32 *data);
+};
+
+struct peci_dimm_thresholds {
+ long temp_max;
+ long temp_crit;
+ struct peci_sensor_state state;
+};
+
+enum peci_dimm_threshold_type {
+ temp_max_type,
+ temp_crit_type,
+};
+
+struct peci_dimmtemp {
+ struct peci_device *peci_dev;
+ struct device *dev;
+ const char *name;
+ const struct dimm_info *gen_info;
+ struct delayed_work detect_work;
+ struct {
+ struct peci_sensor_data temp;
+ struct peci_dimm_thresholds thresholds;
+ } dimm[DIMM_NUMS_MAX];
+ char **dimmtemp_label;
+ DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
+ u8 no_dimm_retry_count;
+};
+
+static u8 __dimm_temp(u32 reg, int dimm_order)
+{
+ return (reg >> (dimm_order * 8)) & 0xff;
+}
+
+static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no, long *val)
+{
+ int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
+ int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
+ int ret = 0;
+ u32 data;
+
+ mutex_lock(&priv->dimm[dimm_no].temp.state.lock);
+ if (!peci_sensor_need_update(&priv->dimm[dimm_no].temp.state))
+ goto skip_update;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &data);
+ if (ret)
+ goto unlock;
+
+ priv->dimm[dimm_no].temp.value = __dimm_temp(data, dimm_order) * MILLIDEGREE_PER_DEGREE;
+
+ peci_sensor_mark_updated(&priv->dimm[dimm_no].temp.state);
+
+skip_update:
+ *val = priv->dimm[dimm_no].temp.value;
+unlock:
+ mutex_unlock(&priv->dimm[dimm_no].temp.state.lock);
+ return ret;
+}
+
+static int update_thresholds(struct peci_dimmtemp *priv, int dimm_no)
+{
+ int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
+ int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
+ u32 data;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->dimm[dimm_no].thresholds.state))
+ return 0;
+
+ ret = priv->gen_info->read_thresholds(priv, dimm_order, chan_rank, &data);
+ if (ret == -ENODATA) /* Use default or previous value */
+ return 0;
+ if (ret)
+ return ret;
+
+ priv->dimm[dimm_no].thresholds.temp_max = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
+ priv->dimm[dimm_no].thresholds.temp_crit = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
+
+ peci_sensor_mark_updated(&priv->dimm[dimm_no].thresholds.state);
+
+ return 0;
+}
+
+static int get_dimm_thresholds(struct peci_dimmtemp *priv, enum peci_dimm_threshold_type type,
+ int dimm_no, long *val)
+{
+ int ret;
+
+ mutex_lock(&priv->dimm[dimm_no].thresholds.state.lock);
+ ret = update_thresholds(priv, dimm_no);
+ if (ret)
+ goto unlock;
+
+ switch (type) {
+ case temp_max_type:
+ *val = priv->dimm[dimm_no].thresholds.temp_max;
+ break;
+ case temp_crit_type:
+ *val = priv->dimm[dimm_no].thresholds.temp_crit;
+ break;
+ default:
+ ret = -EOPNOTSUPP;
+ break;
+ }
+unlock:
+ mutex_unlock(&priv->dimm[dimm_no].thresholds.state.lock);
+
+ return ret;
+}
+
+static int dimmtemp_read_string(struct device *dev,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
+
+ if (attr != hwmon_temp_label)
+ return -EOPNOTSUPP;
+
+ *str = (const char *)priv->dimmtemp_label[channel];
+
+ return 0;
+}
+
+static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
+
+ switch (attr) {
+ case hwmon_temp_input:
+ return get_dimm_temp(priv, channel, val);
+ case hwmon_temp_max:
+ return get_dimm_thresholds(priv, temp_max_type, channel, val);
+ case hwmon_temp_crit:
+ return get_dimm_thresholds(priv, temp_crit_type, channel, val);
+ default:
+ break;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct peci_dimmtemp *priv = data;
+
+ if (test_bit(channel, priv->dimm_mask))
+ return 0444;
+
+ return 0;
+}
+
+static const struct hwmon_ops peci_dimmtemp_ops = {
+ .is_visible = dimmtemp_is_visible,
+ .read_string = dimmtemp_read_string,
+ .read = dimmtemp_read,
+};
+
+static int check_populated_dimms(struct peci_dimmtemp *priv)
+{
+ int chan_rank_max = priv->gen_info->chan_rank_max;
+ int dimm_idx_max = priv->gen_info->dimm_idx_max;
+ u32 chan_rank_empty = 0;
+ u64 dimm_mask = 0;
+ int chan_rank, dimm_idx, ret;
+ u32 pcs;
+
+ BUILD_BUG_ON(BITS_PER_TYPE(chan_rank_empty) < CHAN_RANK_MAX);
+ BUILD_BUG_ON(BITS_PER_TYPE(dimm_mask) < DIMM_NUMS_MAX);
+ if (chan_rank_max * dimm_idx_max > DIMM_NUMS_MAX) {
+ WARN_ONCE(1, "Unsupported number of DIMMs - chan_rank_max: %d, dimm_idx_max: %d",
+ chan_rank_max, dimm_idx_max);
+ return -EINVAL;
+ }
+
+ for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &pcs);
+ if (ret) {
+ /*
+ * Overall, we expect either success or -EINVAL in
+ * order to determine whether DIMM is populated or not.
+ * For anything else we fall back to deferring the
+ * detection to be performed at a later point in time.
+ */
+ if (ret == -EINVAL) {
+ chan_rank_empty |= BIT(chan_rank);
+ continue;
+ }
+
+ return -EAGAIN;
+ }
+
+ for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
+ if (__dimm_temp(pcs, dimm_idx))
+ dimm_mask |= BIT(chan_rank * dimm_idx_max + dimm_idx);
+ }
+
+ /*
+ * If we got all -EINVALs, it means that the CPU doesn't have any
+ * DIMMs. Unfortunately, it may also happen at the very start of
+ * host platform boot. Retrying a couple of times lets us make sure
+ * that the state is persistent.
+ */
+ if (chan_rank_empty == GENMASK(chan_rank_max - 1, 0)) {
+ if (priv->no_dimm_retry_count < NO_DIMM_RETRY_COUNT_MAX) {
+ priv->no_dimm_retry_count++;
+
+ return -EAGAIN;
+ }
+
+ return -ENODEV;
+ }
+
+ /*
+ * It's possible that memory training is not done yet. In this case we
+ * defer the detection to be performed at a later point in time.
+ */
+ if (!dimm_mask) {
+ priv->no_dimm_retry_count = 0;
+ return -EAGAIN;
+ }
+
+ dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
+
+ bitmap_from_u64(priv->dimm_mask, dimm_mask);
+
+ return 0;
+}
+
+static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
+{
+ int rank = chan / priv->gen_info->dimm_idx_max;
+ int idx = chan % priv->gen_info->dimm_idx_max;
+
+ priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
+ "DIMM %c%d", 'A' + rank,
+ idx + 1);
+ if (!priv->dimmtemp_label[chan])
+ return -ENOMEM;
+
+ return 0;
+}
+
+static const u32 peci_dimmtemp_temp_channel_config[] = {
+ [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT,
+ 0
+};
+
+static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
+ .type = hwmon_temp,
+ .config = peci_dimmtemp_temp_channel_config,
+};
+
+static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
+ &peci_dimmtemp_temp_channel,
+ NULL
+};
+
+static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
+ .ops = &peci_dimmtemp_ops,
+ .info = peci_dimmtemp_temp_info,
+};
+
+static int create_dimm_temp_info(struct peci_dimmtemp *priv)
+{
+ int ret, i, channels;
+ struct device *dev;
+
+ /*
+ * We expect to either find populated DIMMs and carry on with creating
+ * sensors, or find out that there are no DIMMs populated.
+ * All other states mean that the platform never reached the state that
+ * allows to check DIMM state - causing us to retry later on.
+ */
+ ret = check_populated_dimms(priv);
+ if (ret == -ENODEV) {
+ dev_dbg(priv->dev, "No DIMMs found\n");
+ return 0;
+ } else if (ret) {
+ schedule_delayed_work(&priv->detect_work, DIMM_MASK_CHECK_DELAY_JIFFIES);
+ dev_dbg(priv->dev, "Deferred populating DIMM temp info\n");
+ return ret;
+ }
+
+ channels = priv->gen_info->chan_rank_max * priv->gen_info->dimm_idx_max;
+
+ priv->dimmtemp_label = devm_kzalloc(priv->dev, channels * sizeof(char *), GFP_KERNEL);
+ if (!priv->dimmtemp_label)
+ return -ENOMEM;
+
+ for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
+ ret = create_dimm_temp_label(priv, i);
+ if (ret)
+ return ret;
+ mutex_init(&priv->dimm[i].thresholds.state.lock);
+ mutex_init(&priv->dimm[i].temp.state.lock);
+ }
+
+ dev = devm_hwmon_device_register_with_info(priv->dev, priv->name, priv,
+ &peci_dimmtemp_chip_info, NULL);
+ if (IS_ERR(dev)) {
+ dev_err(priv->dev, "Failed to register hwmon device\n");
+ return PTR_ERR(dev);
+ }
+
+ dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
+
+ return 0;
+}
+
+static void create_dimm_temp_info_delayed(struct work_struct *work)
+{
+ struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
+ struct peci_dimmtemp,
+ detect_work);
+ int ret;
+
+ ret = create_dimm_temp_info(priv);
+ if (ret && ret != -EAGAIN)
+ dev_err(priv->dev, "Failed to populate DIMM temp info\n");
+}
+
+static void remove_delayed_work(void *_priv)
+{
+ struct peci_dimmtemp *priv = _priv;
+
+ cancel_delayed_work_sync(&priv->detect_work);
+}
+
+static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+ struct device *dev = &adev->dev;
+ struct peci_device *peci_dev = to_peci_device(dev->parent);
+ struct peci_dimmtemp *priv;
+ int ret;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
+ peci_dev->info.socket_id);
+ if (!priv->name)
+ return -ENOMEM;
+
+ priv->dev = dev;
+ priv->peci_dev = peci_dev;
+ priv->gen_info = (const struct dimm_info *)id->driver_data;
+
+ /*
+ * This is just a sanity check. Since we're using commands that are
+ * guaranteed to be supported on a given platform, we should never see
+ * revision lower than expected.
+ */
+ if (peci_dev->info.peci_revision < priv->gen_info->min_peci_revision)
+ dev_warn(priv->dev,
+ "Unexpected PECI revision %#x, some features may be unavailable\n",
+ peci_dev->info.peci_revision);
+
+ INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
+
+ ret = devm_add_action_or_reset(priv->dev, remove_delayed_work, priv);
+ if (ret)
+ return ret;
+
+ ret = create_dimm_temp_info(priv);
+ if (ret && ret != -EAGAIN) {
+ dev_err(dev, "Failed to populate DIMM temp info\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static int
+read_thresholds_hsx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u8 dev, func;
+ u16 reg;
+ int ret;
+
+ /*
+ * Device 20, Function 0: IMC 0 channel 0 -> rank 0
+ * Device 20, Function 1: IMC 0 channel 1 -> rank 1
+ * Device 21, Function 0: IMC 0 channel 2 -> rank 2
+ * Device 21, Function 1: IMC 0 channel 3 -> rank 3
+ * Device 23, Function 0: IMC 1 channel 0 -> rank 4
+ * Device 23, Function 1: IMC 1 channel 1 -> rank 5
+ * Device 24, Function 0: IMC 1 channel 2 -> rank 6
+ * Device 24, Function 1: IMC 1 channel 3 -> rank 7
+ */
+ dev = 20 + chan_rank / 2 + chan_rank / 4;
+ func = chan_rank % 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(priv->peci_dev, 1, dev, func, reg, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int
+read_thresholds_bdxd(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u8 dev, func;
+ u16 reg;
+ int ret;
+
+ /*
+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
+ * Device 12, Function 2: IMC 1 channel 0 -> rank 2
+ * Device 12, Function 6: IMC 1 channel 1 -> rank 3
+ */
+ dev = 10 + chan_rank / 2 * 2;
+ func = (chan_rank % 2) ? 6 : 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(priv->peci_dev, 2, dev, func, reg, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int
+read_thresholds_skx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u8 dev, func;
+ u16 reg;
+ int ret;
+
+ /*
+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
+ * Device 11, Function 2: IMC 0 channel 2 -> rank 2
+ * Device 12, Function 2: IMC 1 channel 0 -> rank 3
+ * Device 12, Function 6: IMC 1 channel 1 -> rank 4
+ * Device 13, Function 2: IMC 1 channel 2 -> rank 5
+ */
+ dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
+ func = chan_rank % 3 == 1 ? 6 : 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(priv->peci_dev, 2, dev, func, reg, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int
+read_thresholds_icx(struct peci_dimmtemp *priv, int dimm_order, int chan_rank, u32 *data)
+{
+ u32 reg_val;
+ u64 offset;
+ int ret;
+ u8 dev;
+
+ ret = peci_ep_pci_local_read(priv->peci_dev, 0, 13, 0, 2, 0xd4, ®_val);
+ if (ret || !(reg_val & BIT(31)))
+ return -ENODATA; /* Use default or previous value */
+
+ ret = peci_ep_pci_local_read(priv->peci_dev, 0, 13, 0, 2, 0xd0, ®_val);
+ if (ret)
+ return -ENODATA; /* Use default or previous value */
+
+ /*
+ * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
+ * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
+ * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
+ * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
+ * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
+ * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
+ * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
+ * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
+ */
+ dev = 26 + chan_rank / 2;
+ offset = 0x224e0 + dimm_order * 4 + (chan_rank % 2) * 0x4000;
+
+ ret = peci_mmio_read(priv->peci_dev, 0, GET_CPU_SEG(reg_val), GET_CPU_BUS(reg_val),
+ dev, 0, offset, data);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static const struct dimm_info dimm_hsx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_hsx,
+};
+
+static const struct dimm_info dimm_bdx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_hsx,
+};
+
+static const struct dimm_info dimm_bdxd = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_bdxd,
+};
+
+static const struct dimm_info dimm_skx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
+ .min_peci_revision = 0x33,
+ .read_thresholds = &read_thresholds_skx,
+};
+
+static const struct dimm_info dimm_icx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
+ .min_peci_revision = 0x40,
+ .read_thresholds = &read_thresholds_icx,
+};
+
+static const struct dimm_info dimm_icxd = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
+ .min_peci_revision = 0x40,
+ .read_thresholds = &read_thresholds_icx,
+};
+
+static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
+ {
+ .name = "peci_cpu.dimmtemp.hsx",
+ .driver_data = (kernel_ulong_t)&dimm_hsx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.bdx",
+ .driver_data = (kernel_ulong_t)&dimm_bdx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.bdxd",
+ .driver_data = (kernel_ulong_t)&dimm_bdxd,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.skx",
+ .driver_data = (kernel_ulong_t)&dimm_skx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.icx",
+ .driver_data = (kernel_ulong_t)&dimm_icx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.icxd",
+ .driver_data = (kernel_ulong_t)&dimm_icxd,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
+
+static struct auxiliary_driver peci_dimmtemp_driver = {
+ .probe = peci_dimmtemp_probe,
+ .id_table = peci_dimmtemp_ids,
+};
+
+module_auxiliary_driver(peci_dimmtemp_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI dimmtemp driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI_CPU);
--
2.34.1
Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
readings of the processor package and processor cores that are
accessible via the PECI interface.
The main use case for the driver (and PECI interface) is out-of-band
management, where we're able to obtain the DTS readings from an external
entity connected with PECI, e.g. BMC on server platforms.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
---
MAINTAINERS | 6 +
drivers/hwmon/Kconfig | 2 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/peci/Kconfig | 18 ++
drivers/hwmon/peci/Makefile | 5 +
drivers/hwmon/peci/common.h | 58 ++++
drivers/hwmon/peci/cputemp.c | 592 +++++++++++++++++++++++++++++++++++
7 files changed, 682 insertions(+)
create mode 100644 drivers/hwmon/peci/Kconfig
create mode 100644 drivers/hwmon/peci/Makefile
create mode 100644 drivers/hwmon/peci/common.h
create mode 100644 drivers/hwmon/peci/cputemp.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 503d3e636263..b2d0b0b58da9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15109,6 +15109,12 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/peaq-wmi.c
+PECI HARDWARE MONITORING DRIVERS
+M: Iwona Winiarska <[email protected]>
+L: [email protected]
+S: Supported
+F: drivers/hwmon/peci/
+
PECI SUBSYSTEM
M: Iwona Winiarska <[email protected]>
L: [email protected] (moderated for non-subscribers)
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 8df25f1079ba..af6c3bd65ebd 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1538,6 +1538,8 @@ config SENSORS_PCF8591
These devices are hard to detect and rarely found on mainstream
hardware. If unsure, say N.
+source "drivers/hwmon/peci/Kconfig"
+
source "drivers/hwmon/pmbus/Kconfig"
config SENSORS_PWM_FAN
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index 185f946d698b..6139e5a5aa00 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -208,6 +208,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
obj-$(CONFIG_SENSORS_OCC) += occ/
+obj-$(CONFIG_SENSORS_PECI) += peci/
obj-$(CONFIG_PMBUS) += pmbus/
ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
new file mode 100644
index 000000000000..e10eed68d70a
--- /dev/null
+++ b/drivers/hwmon/peci/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config SENSORS_PECI_CPUTEMP
+ tristate "PECI CPU temperature monitoring client"
+ depends on PECI
+ select SENSORS_PECI
+ select PECI_CPU
+ help
+ If you say yes here you get support for the generic Intel PECI
+ cputemp driver which provides Digital Thermal Sensor (DTS) thermal
+ readings of the CPU package and CPU cores that are accessible via
+ the processor PECI interface.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-cputemp.
+
+config SENSORS_PECI
+ tristate
diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
new file mode 100644
index 000000000000..e8a0ada5ab1f
--- /dev/null
+++ b/drivers/hwmon/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+peci-cputemp-y := cputemp.o
+
+obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
new file mode 100644
index 000000000000..734506b0eca2
--- /dev/null
+++ b/drivers/hwmon/peci/common.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Intel Corporation */
+
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+#ifndef __PECI_HWMON_COMMON_H
+#define __PECI_HWMON_COMMON_H
+
+#define PECI_HWMON_UPDATE_INTERVAL HZ
+
+/**
+ * struct peci_sensor_state - PECI state information
+ * @valid: flag to indicate the sensor value is valid
+ * @last_updated: time of the last update in jiffies
+ * @lock: mutex to protect sensor access
+ */
+struct peci_sensor_state {
+ bool valid;
+ unsigned long last_updated;
+ struct mutex lock; /* protect sensor access */
+};
+
+/**
+ * struct peci_sensor_data - PECI sensor information
+ * @value: sensor value in milli units
+ * @state: sensor update state
+ */
+
+struct peci_sensor_data {
+ s32 value;
+ struct peci_sensor_state state;
+};
+
+/**
+ * peci_sensor_need_update() - check whether sensor update is needed or not
+ * @sensor: pointer to sensor data struct
+ *
+ * Return: true if update is needed, false if not.
+ */
+
+static inline bool peci_sensor_need_update(struct peci_sensor_state *state)
+{
+ return !state->valid ||
+ time_after(jiffies, state->last_updated + PECI_HWMON_UPDATE_INTERVAL);
+}
+
+/**
+ * peci_sensor_mark_updated() - mark the sensor is updated
+ * @sensor: pointer to sensor data struct
+ */
+static inline void peci_sensor_mark_updated(struct peci_sensor_state *state)
+{
+ state->valid = true;
+ state->last_updated = jiffies;
+}
+
+#endif /* __PECI_HWMON_COMMON_H */
diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
new file mode 100644
index 000000000000..12156328f5cf
--- /dev/null
+++ b/drivers/hwmon/peci/cputemp.c
@@ -0,0 +1,592 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/units.h>
+
+#include "common.h"
+
+#define CORE_NUMS_MAX 64
+
+#define BASE_CHANNEL_NUMS 5
+#define CPUTEMP_CHANNEL_NUMS (BASE_CHANNEL_NUMS + CORE_NUMS_MAX)
+
+#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
+#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
+#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
+
+#define DTS_MARGIN_MASK GENMASK(15, 0)
+#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
+
+struct resolved_cores_reg {
+ u8 bus;
+ u8 dev;
+ u8 func;
+ u8 offset;
+};
+
+struct cpu_info {
+ struct resolved_cores_reg *reg;
+ u8 min_peci_revision;
+ s32 (*thermal_margin_to_millidegree)(u16 val);
+};
+
+struct peci_temp_target {
+ s32 tcontrol;
+ s32 tthrottle;
+ s32 tjmax;
+ struct peci_sensor_state state;
+};
+
+enum peci_temp_target_type {
+ tcontrol_type,
+ tthrottle_type,
+ tjmax_type,
+ crit_hyst_type,
+};
+
+struct peci_cputemp {
+ struct peci_device *peci_dev;
+ struct device *dev;
+ const char *name;
+ const struct cpu_info *gen_info;
+ struct {
+ struct peci_temp_target target;
+ struct peci_sensor_data die;
+ struct peci_sensor_data dts;
+ struct peci_sensor_data core[CORE_NUMS_MAX];
+ } temp;
+ const char **coretemp_label;
+ DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
+};
+
+enum cputemp_channels {
+ channel_die,
+ channel_dts,
+ channel_tcontrol,
+ channel_tthrottle,
+ channel_tjmax,
+ channel_core,
+};
+
+static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
+ "Die",
+ "DTS",
+ "Tcontrol",
+ "Tthrottle",
+ "Tjmax",
+};
+
+static int update_temp_target(struct peci_cputemp *priv)
+{
+ s32 tthrottle_offset, tcontrol_margin;
+ u32 pcs;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->temp.target.state))
+ return 0;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
+ if (ret)
+ return ret;
+
+ priv->temp.target.tjmax =
+ FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
+
+ tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
+ tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
+ priv->temp.target.tcontrol = priv->temp.target.tjmax - tcontrol_margin;
+
+ tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
+ priv->temp.target.tthrottle = priv->temp.target.tjmax - tthrottle_offset;
+
+ peci_sensor_mark_updated(&priv->temp.target.state);
+
+ return 0;
+}
+
+static int get_temp_target(struct peci_cputemp *priv, enum peci_temp_target_type type, long *val)
+{
+ int ret;
+
+ mutex_lock(&priv->temp.target.state.lock);
+
+ ret = update_temp_target(priv);
+ if (ret)
+ goto unlock;
+
+ switch (type) {
+ case tcontrol_type:
+ *val = priv->temp.target.tcontrol;
+ break;
+ case tthrottle_type:
+ *val = priv->temp.target.tthrottle;
+ break;
+ case tjmax_type:
+ *val = priv->temp.target.tjmax;
+ break;
+ case crit_hyst_type:
+ *val = priv->temp.target.tjmax - priv->temp.target.tcontrol;
+ break;
+ default:
+ ret = -EOPNOTSUPP;
+ break;
+ }
+unlock:
+ mutex_unlock(&priv->temp.target.state.lock);
+
+ return ret;
+}
+
+/*
+ * Error codes:
+ * 0x8000: General sensor error
+ * 0x8001: Reserved
+ * 0x8002: Underflow on reading value
+ * 0x8003-0x81ff: Reserved
+ */
+static bool dts_valid(u16 val)
+{
+ return val < 0x8000 || val > 0x81ff;
+}
+
+/*
+ * Processors return a value of DTS reading in S10.6 fixed point format
+ * (16 bits: 10-bit signed magnitude, 6-bit fraction).
+ */
+static s32 dts_ten_dot_six_to_millidegree(u16 val)
+{
+ return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 64;
+}
+
+/*
+ * For older processors, thermal margin reading is returned in S8.8 fixed
+ * point format (16 bits: 8-bit signed magnitude, 8-bit fraction).
+ */
+static s32 dts_eight_dot_eight_to_millidegree(u16 val)
+{
+ return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / 256;
+}
+
+static int get_die_temp(struct peci_cputemp *priv, long *val)
+{
+ int ret = 0;
+ long tjmax;
+ u16 temp;
+
+ mutex_lock(&priv->temp.die.state.lock);
+ if (!peci_sensor_need_update(&priv->temp.die.state))
+ goto skip_update;
+
+ ret = peci_temp_read(priv->peci_dev, &temp);
+ if (ret)
+ goto err_unlock;
+
+ if (!dts_valid(temp)) {
+ ret = -EIO;
+ goto err_unlock;
+ }
+
+ ret = get_temp_target(priv, tjmax_type, &tjmax);
+ if (ret)
+ goto err_unlock;
+
+ priv->temp.die.value = (s32)tjmax + dts_ten_dot_six_to_millidegree(temp);
+
+ peci_sensor_mark_updated(&priv->temp.die.state);
+
+skip_update:
+ *val = priv->temp.die.value;
+err_unlock:
+ mutex_unlock(&priv->temp.die.state.lock);
+ return ret;
+}
+
+static int get_dts(struct peci_cputemp *priv, long *val)
+{
+ int ret = 0;
+ u16 thermal_margin;
+ long tcontrol;
+ u32 pcs;
+
+ mutex_lock(&priv->temp.dts.state.lock);
+ if (!peci_sensor_need_update(&priv->temp.dts.state))
+ goto skip_update;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
+ if (ret)
+ goto err_unlock;
+
+ thermal_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
+ if (!dts_valid(thermal_margin)) {
+ ret = -EIO;
+ goto err_unlock;
+ }
+
+ ret = get_temp_target(priv, tcontrol_type, &tcontrol);
+ if (ret)
+ goto err_unlock;
+
+ /* Note that the tcontrol should be available before calling it */
+ priv->temp.dts.value =
+ (s32)tcontrol - priv->gen_info->thermal_margin_to_millidegree(thermal_margin);
+
+ peci_sensor_mark_updated(&priv->temp.dts.state);
+
+skip_update:
+ *val = priv->temp.dts.value;
+err_unlock:
+ mutex_unlock(&priv->temp.dts.state.lock);
+ return ret;
+}
+
+static int get_core_temp(struct peci_cputemp *priv, int core_index, long *val)
+{
+ int ret = 0;
+ u16 core_dts_margin;
+ long tjmax;
+ u32 pcs;
+
+ mutex_lock(&priv->temp.core[core_index].state.lock);
+ if (!peci_sensor_need_update(&priv->temp.core[core_index].state))
+ goto skip_update;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
+ if (ret)
+ goto err_unlock;
+
+ core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
+ if (!dts_valid(core_dts_margin)) {
+ ret = -EIO;
+ goto err_unlock;
+ }
+
+ ret = get_temp_target(priv, tjmax_type, &tjmax);
+ if (ret)
+ goto err_unlock;
+
+ /* Note that the tjmax should be available before calling it */
+ priv->temp.core[core_index].value =
+ (s32)tjmax + dts_ten_dot_six_to_millidegree(core_dts_margin);
+
+ peci_sensor_mark_updated(&priv->temp.core[core_index].state);
+
+skip_update:
+ *val = priv->temp.core[core_index].value;
+err_unlock:
+ mutex_unlock(&priv->temp.core[core_index].state.lock);
+ return ret;
+}
+
+static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ struct peci_cputemp *priv = dev_get_drvdata(dev);
+
+ if (attr != hwmon_temp_label)
+ return -EOPNOTSUPP;
+
+ *str = channel < channel_core ?
+ cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
+
+ return 0;
+}
+
+static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct peci_cputemp *priv = dev_get_drvdata(dev);
+
+ switch (attr) {
+ case hwmon_temp_input:
+ switch (channel) {
+ case channel_die:
+ return get_die_temp(priv, val);
+ case channel_dts:
+ return get_dts(priv, val);
+ case channel_tcontrol:
+ return get_temp_target(priv, tcontrol_type, val);
+ case channel_tthrottle:
+ return get_temp_target(priv, tthrottle_type, val);
+ case channel_tjmax:
+ return get_temp_target(priv, tjmax_type, val);
+ default:
+ return get_core_temp(priv, channel - channel_core, val);
+ }
+ break;
+ case hwmon_temp_max:
+ return get_temp_target(priv, tcontrol_type, val);
+ case hwmon_temp_crit:
+ return get_temp_target(priv, tjmax_type, val);
+ case hwmon_temp_crit_hyst:
+ return get_temp_target(priv, crit_hyst_type, val);
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct peci_cputemp *priv = data;
+
+ if (channel > CPUTEMP_CHANNEL_NUMS)
+ return 0;
+
+ if (channel < channel_core)
+ return 0444;
+
+ if (test_bit(channel - channel_core, priv->core_mask))
+ return 0444;
+
+ return 0;
+}
+
+static int init_core_mask(struct peci_cputemp *priv)
+{
+ struct peci_device *peci_dev = priv->peci_dev;
+ struct resolved_cores_reg *reg = priv->gen_info->reg;
+ u64 core_mask;
+ u32 data;
+ int ret;
+
+ /* Get the RESOLVED_CORES register value */
+ switch (peci_dev->info.model) {
+ case INTEL_FAM6_ICELAKE_X:
+ case INTEL_FAM6_ICELAKE_D:
+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
+ reg->func, reg->offset + 4, &data);
+ if (ret)
+ return ret;
+
+ core_mask = (u64)data << 32;
+
+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
+ reg->func, reg->offset, &data);
+ if (ret)
+ return ret;
+
+ core_mask |= data;
+
+ break;
+ default:
+ ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
+ reg->func, reg->offset, &data);
+ if (ret)
+ return ret;
+
+ core_mask = data;
+
+ break;
+ }
+
+ if (!core_mask)
+ return -EIO;
+
+ bitmap_from_u64(priv->core_mask, core_mask);
+
+ return 0;
+}
+
+static int create_temp_label(struct peci_cputemp *priv)
+{
+ unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
+ int i;
+
+ priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
+ if (!priv->coretemp_label)
+ return -ENOMEM;
+
+ for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
+ priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
+ if (!priv->coretemp_label[i])
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void check_resolved_cores(struct peci_cputemp *priv)
+{
+ /*
+ * Failure to resolve cores is non-critical, we're still able to
+ * provide other sensor data.
+ */
+
+ if (init_core_mask(priv))
+ return;
+
+ if (create_temp_label(priv))
+ bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
+}
+
+static void sensor_init(struct peci_cputemp *priv)
+{
+ int i;
+
+ mutex_init(&priv->temp.target.state.lock);
+ mutex_init(&priv->temp.die.state.lock);
+ mutex_init(&priv->temp.dts.state.lock);
+
+ for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX)
+ mutex_init(&priv->temp.core[i].state.lock);
+}
+
+static const struct hwmon_ops peci_cputemp_ops = {
+ .is_visible = cputemp_is_visible,
+ .read_string = cputemp_read_string,
+ .read = cputemp_read,
+};
+
+static const u32 peci_cputemp_temp_channel_config[] = {
+ /* Die temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
+ /* DTS margin */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
+ /* Tcontrol temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
+ /* Tthrottle temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT,
+ /* Tjmax temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT,
+ /* Core temperature - for all core channels */
+ [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
+ 0
+};
+
+static const struct hwmon_channel_info peci_cputemp_temp_channel = {
+ .type = hwmon_temp,
+ .config = peci_cputemp_temp_channel_config,
+};
+
+static const struct hwmon_channel_info *peci_cputemp_info[] = {
+ &peci_cputemp_temp_channel,
+ NULL
+};
+
+static const struct hwmon_chip_info peci_cputemp_chip_info = {
+ .ops = &peci_cputemp_ops,
+ .info = peci_cputemp_info,
+};
+
+static int peci_cputemp_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+{
+ struct device *dev = &adev->dev;
+ struct peci_device *peci_dev = to_peci_device(dev->parent);
+ struct peci_cputemp *priv;
+ struct device *hwmon_dev;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
+ peci_dev->info.socket_id);
+ if (!priv->name)
+ return -ENOMEM;
+
+ priv->dev = dev;
+ priv->peci_dev = peci_dev;
+ priv->gen_info = (const struct cpu_info *)id->driver_data;
+
+ /*
+ * This is just a sanity check. Since we're using commands that are
+ * guaranteed to be supported on a given platform, we should never see
+ * revision lower than expected.
+ */
+ if (peci_dev->info.peci_revision < priv->gen_info->min_peci_revision)
+ dev_warn(priv->dev,
+ "Unexpected PECI revision %#x, some features may be unavailable\n",
+ peci_dev->info.peci_revision);
+
+ check_resolved_cores(priv);
+
+ sensor_init(priv);
+
+ hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
+ priv, &peci_cputemp_chip_info, NULL);
+
+ return PTR_ERR_OR_ZERO(hwmon_dev);
+}
+
+/*
+ * RESOLVED_CORES PCI configuration register may have different location on
+ * different platforms.
+ */
+static struct resolved_cores_reg resolved_cores_reg_hsx = {
+ .bus = 1,
+ .dev = 30,
+ .func = 3,
+ .offset = 0xb4,
+};
+
+static struct resolved_cores_reg resolved_cores_reg_icx = {
+ .bus = 14,
+ .dev = 30,
+ .func = 3,
+ .offset = 0xd0,
+};
+
+static const struct cpu_info cpu_hsx = {
+ .reg = &resolved_cores_reg_hsx,
+ .min_peci_revision = 0x33,
+ .thermal_margin_to_millidegree = &dts_eight_dot_eight_to_millidegree,
+};
+
+static const struct cpu_info cpu_icx = {
+ .reg = &resolved_cores_reg_icx,
+ .min_peci_revision = 0x40,
+ .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
+};
+
+static const struct auxiliary_device_id peci_cputemp_ids[] = {
+ {
+ .name = "peci_cpu.cputemp.hsx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.bdx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.bdxd",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.skx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.icx",
+ .driver_data = (kernel_ulong_t)&cpu_icx,
+ },
+ {
+ .name = "peci_cpu.cputemp.icxd",
+ .driver_data = (kernel_ulong_t)&cpu_icx,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
+
+static struct auxiliary_driver peci_cputemp_driver = {
+ .probe = peci_cputemp_probe,
+ .id_table = peci_cputemp_ids,
+};
+
+module_auxiliary_driver(peci_cputemp_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI cputemp driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI_CPU);
--
2.34.1
Intel processors provide access for various services designed to support
processor and DRAM thermal management, platform manageability and
processor interface tuning and diagnostics.
Those services are available via the Platform Environment Control
Interface (PECI) that provides a communication channel between the
processor and the Baseboard Management Controller (BMC) or other
platform management device.
This change introduces PECI subsystem by adding the initial core module
and API for controller drivers.
Co-developed-by: Jason M Bills <[email protected]>
Signed-off-by: Jason M Bills <[email protected]>
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 8 ++
drivers/Kconfig | 3 +
drivers/Makefile | 1 +
drivers/peci/Kconfig | 15 ++++
drivers/peci/Makefile | 5 ++
drivers/peci/core.c | 158 ++++++++++++++++++++++++++++++++++++++++
drivers/peci/internal.h | 16 ++++
include/linux/peci.h | 99 +++++++++++++++++++++++++
8 files changed, 305 insertions(+)
create mode 100644 drivers/peci/Kconfig
create mode 100644 drivers/peci/Makefile
create mode 100644 drivers/peci/core.c
create mode 100644 drivers/peci/internal.h
create mode 100644 include/linux/peci.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 69a2935daf6c..9dde5ea5576e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15101,6 +15101,14 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/peaq-wmi.c
+PECI SUBSYSTEM
+M: Iwona Winiarska <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+S: Supported
+F: Documentation/devicetree/bindings/peci/
+F: drivers/peci/
+F: include/linux/peci.h
+
PENSANDO ETHERNET DRIVERS
M: Shannon Nelson <[email protected]>
M: [email protected]
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 0d399ddaa185..8d6cd5d08722 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"
source "drivers/counter/Kconfig"
source "drivers/most/Kconfig"
+
+source "drivers/peci/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index a110338c860c..020780b6b4d2 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -187,3 +187,4 @@ obj-$(CONFIG_GNSS) += gnss/
obj-$(CONFIG_INTERCONNECT) += interconnect/
obj-$(CONFIG_COUNTER) += counter/
obj-$(CONFIG_MOST) += most/
+obj-$(CONFIG_PECI) += peci/
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
new file mode 100644
index 000000000000..71a4ad81225a
--- /dev/null
+++ b/drivers/peci/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menuconfig PECI
+ tristate "PECI support"
+ help
+ The Platform Environment Control Interface (PECI) is an interface
+ that provides a communication channel to Intel processors and
+ chipset components from external monitoring or control devices.
+
+ If you are building a Baseboard Management Controller (BMC) kernel
+ for Intel platform say Y here and also to the specific driver for
+ your adapter(s) below. If unsure say N.
+
+ This support is also available as a module. If so, the module
+ will be called peci.
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
new file mode 100644
index 000000000000..e789a354e842
--- /dev/null
+++ b/drivers/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+# Core functionality
+peci-y := core.o
+obj-$(CONFIG_PECI) += peci.o
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
new file mode 100644
index 000000000000..73ad0a47fa9d
--- /dev/null
+++ b/drivers/peci/core.c
@@ -0,0 +1,158 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/bug.h>
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/pm_runtime.h>
+#include <linux/property.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+static DEFINE_IDA(peci_controller_ida);
+
+static void peci_controller_dev_release(struct device *dev)
+{
+ struct peci_controller *controller = to_peci_controller(dev);
+
+ mutex_destroy(&controller->bus_lock);
+ ida_free(&peci_controller_ida, controller->id);
+ kfree(controller);
+}
+
+struct device_type peci_controller_type = {
+ .release = peci_controller_dev_release,
+};
+
+static struct peci_controller *peci_controller_alloc(struct device *dev,
+ struct peci_controller_ops *ops)
+{
+ struct peci_controller *controller;
+ int ret;
+
+ if (!ops->xfer)
+ return ERR_PTR(-EINVAL);
+
+ controller = kzalloc(sizeof(*controller), GFP_KERNEL);
+ if (!controller)
+ return ERR_PTR(-ENOMEM);
+
+ ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);
+ if (ret < 0)
+ goto err;
+ controller->id = ret;
+
+ controller->ops = ops;
+
+ controller->dev.parent = dev;
+ controller->dev.bus = &peci_bus_type;
+ controller->dev.type = &peci_controller_type;
+
+ device_initialize(&controller->dev);
+
+ mutex_init(&controller->bus_lock);
+
+ return controller;
+
+err:
+ kfree(controller);
+ return ERR_PTR(ret);
+}
+
+static void unregister_controller(void *_controller)
+{
+ struct peci_controller *controller = _controller;
+
+ device_unregister(&controller->dev);
+
+ fwnode_handle_put(controller->dev.fwnode);
+
+ pm_runtime_disable(&controller->dev);
+}
+
+/**
+ * devm_peci_controller_add() - add PECI controller
+ * @dev: device for devm operations
+ * @ops: pointer to controller specific methods
+ *
+ * In final stage of its probe(), peci_controller driver calls
+ * devm_peci_controller_add() to register itself with the PECI bus.
+ *
+ * Return: Pointer to the newly allocated controller or ERR_PTR() in case of failure.
+ */
+struct peci_controller *devm_peci_controller_add(struct device *dev,
+ struct peci_controller_ops *ops)
+{
+ struct peci_controller *controller;
+ int ret;
+
+ controller = peci_controller_alloc(dev, ops);
+ if (IS_ERR(controller))
+ return controller;
+
+ ret = dev_set_name(&controller->dev, "peci-%d", controller->id);
+ if (ret)
+ goto err_put;
+
+ pm_runtime_no_callbacks(&controller->dev);
+ pm_suspend_ignore_children(&controller->dev, true);
+ pm_runtime_enable(&controller->dev);
+
+ device_set_node(&controller->dev, fwnode_handle_get(dev_fwnode(dev)));
+
+ ret = device_add(&controller->dev);
+ if (ret)
+ goto err_fwnode;
+
+ ret = devm_add_action_or_reset(dev, unregister_controller, controller);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return controller;
+
+err_fwnode:
+ fwnode_handle_put(controller->dev.fwnode);
+
+ pm_runtime_disable(&controller->dev);
+
+err_put:
+ put_device(&controller->dev);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_NS_GPL(devm_peci_controller_add, PECI);
+
+struct bus_type peci_bus_type = {
+ .name = "peci",
+};
+
+static int __init peci_init(void)
+{
+ int ret;
+
+ ret = bus_register(&peci_bus_type);
+ if (ret < 0) {
+ pr_err("peci: failed to register PECI bus type!\n");
+ return ret;
+ }
+
+ return 0;
+}
+module_init(peci_init);
+
+static void __exit peci_exit(void)
+{
+ bus_unregister(&peci_bus_type);
+}
+module_exit(peci_exit);
+
+MODULE_AUTHOR("Jason M Bills <[email protected]>");
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI bus core module");
+MODULE_LICENSE("GPL");
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
new file mode 100644
index 000000000000..918dea745a86
--- /dev/null
+++ b/drivers/peci/internal.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __PECI_INTERNAL_H
+#define __PECI_INTERNAL_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+
+struct peci_controller;
+
+extern struct bus_type peci_bus_type;
+
+extern struct device_type peci_controller_type;
+
+#endif /* __PECI_INTERNAL_H */
diff --git a/include/linux/peci.h b/include/linux/peci.h
new file mode 100644
index 000000000000..26e0a4e73b50
--- /dev/null
+++ b/include/linux/peci.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_H
+#define __LINUX_PECI_H
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+/*
+ * Currently we don't support any PECI command over 32 bytes.
+ */
+#define PECI_REQUEST_MAX_BUF_SIZE 32
+
+struct peci_controller;
+struct peci_request;
+
+/**
+ * struct peci_controller_ops - PECI controller specific methods
+ * @xfer: PECI transfer function
+ *
+ * PECI controllers may have different hardware interfaces - the drivers
+ * implementing PECI controllers can use this structure to abstract away those
+ * differences by exposing a common interface for PECI core.
+ */
+struct peci_controller_ops {
+ int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req);
+};
+
+/**
+ * struct peci_controller - PECI controller
+ * @dev: device object to register PECI controller to the device model
+ * @ops: pointer to device specific controller operations
+ * @bus_lock: lock used to protect multiple callers
+ * @id: PECI controller ID
+ *
+ * PECI controllers usually connect to their drivers using non-PECI bus,
+ * such as the platform bus.
+ * Each PECI controller can communicate with one or more PECI devices.
+ */
+struct peci_controller {
+ struct device dev;
+ struct peci_controller_ops *ops;
+ struct mutex bus_lock; /* held for the duration of xfer */
+ u8 id;
+};
+
+struct peci_controller *devm_peci_controller_add(struct device *parent,
+ struct peci_controller_ops *ops);
+
+static inline struct peci_controller *to_peci_controller(void *d)
+{
+ return container_of(d, struct peci_controller, dev);
+}
+
+/**
+ * struct peci_device - PECI device
+ * @dev: device object to register PECI device to the device model
+ * @controller: manages the bus segment hosting this PECI device
+ * @addr: address used on the PECI bus connected to the parent controller
+ *
+ * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
+ * The behaviour exposed to the rest of the system is defined by the PECI driver
+ * managing the device.
+ */
+struct peci_device {
+ struct device dev;
+ u8 addr;
+};
+
+static inline struct peci_device *to_peci_device(struct device *d)
+{
+ return container_of(d, struct peci_device, dev);
+}
+
+/**
+ * struct peci_request - PECI request
+ * @device: PECI device to which the request is sent
+ * @tx: TX buffer specific data
+ * @tx.buf: TX buffer
+ * @tx.len: transfer data length in bytes
+ * @rx: RX buffer specific data
+ * @rx.buf: RX buffer
+ * @rx.len: received data length in bytes
+ *
+ * A peci_request represents a request issued by PECI originator (TX) and
+ * a response received from PECI responder (RX).
+ */
+struct peci_request {
+ struct peci_device *device;
+ struct {
+ u8 buf[PECI_REQUEST_MAX_BUF_SIZE];
+ u8 len;
+ } rx, tx;
+};
+
+#endif /* __LINUX_PECI_H */
--
2.34.1
PECI is an interface that may be used by different types of devices.
Add a peci-cpu driver compatible with Intel processors. The driver is
responsible for handling auxiliary devices that can subsequently be used
by other drivers (e.g. hwmons).
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 1 +
drivers/peci/Kconfig | 15 ++
drivers/peci/Makefile | 2 +
drivers/peci/cpu.c | 343 +++++++++++++++++++++++++++++++++++++++
drivers/peci/device.c | 1 +
drivers/peci/internal.h | 27 +++
drivers/peci/request.c | 213 ++++++++++++++++++++++++
include/linux/peci-cpu.h | 40 +++++
include/linux/peci.h | 8 -
9 files changed, 642 insertions(+), 8 deletions(-)
create mode 100644 drivers/peci/cpu.c
create mode 100644 include/linux/peci-cpu.h
diff --git a/MAINTAINERS b/MAINTAINERS
index a63b106a09fb..503d3e636263 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15115,6 +15115,7 @@ L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/peci/
F: drivers/peci/
+F: include/linux/peci-cpu.h
F: include/linux/peci.h
PENSANDO ETHERNET DRIVERS
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 99279df97a78..89872ad83320 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -16,6 +16,21 @@ menuconfig PECI
if PECI
+config PECI_CPU
+ tristate "PECI CPU"
+ select AUXILIARY_BUS
+ help
+ This option enables peci-cpu driver for Intel processors. It is
+ responsible for creating auxiliary devices that can subsequently
+ be used by other drivers in order to perform various
+ functionalities such as e.g. temperature monitoring.
+
+ Additional drivers must be enabled in order to use the functionality
+ of the device.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-cpu.
+
source "drivers/peci/controller/Kconfig"
endif # PECI
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 917f689e147a..7de18137e738 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -3,6 +3,8 @@
# Core functionality
peci-y := core.o request.o device.o sysfs.o
obj-$(CONFIG_PECI) += peci.o
+peci-cpu-y := cpu.o
+obj-$(CONFIG_PECI_CPU) += peci-cpu.o
# Hardware specific bus drivers
obj-y += controller/
diff --git a/drivers/peci/cpu.c b/drivers/peci/cpu.c
new file mode 100644
index 000000000000..68eb61c65d34
--- /dev/null
+++ b/drivers/peci/cpu.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+/**
+ * peci_temp_read() - read the maximum die temperature from PECI target device
+ * @device: PECI device to which request is going to be sent
+ * @temp_raw: where to store the read temperature
+ *
+ * It uses GetTemp PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_temp_read(struct peci_device *device, s16 *temp_raw)
+{
+ struct peci_request *req;
+
+ req = peci_xfer_get_temp(device);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ *temp_raw = peci_request_temp_read(req);
+
+ peci_request_free(req);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(peci_temp_read, PECI_CPU);
+
+/**
+ * peci_pcs_read() - read PCS register
+ * @device: PECI device to which request is going to be sent
+ * @index: PCS index
+ * @param: PCS parameter
+ * @data: where to store the read data
+ *
+ * It uses RdPkgConfig PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_pcs_read(struct peci_device *device, u8 index, u16 param, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_pkg_cfg_readl(device, index, param);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_pcs_read, PECI_CPU);
+
+/**
+ * peci_pci_local_read() - read 32-bit memory location using raw address
+ * @device: PECI device to which request is going to be sent
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @reg: register
+ * @data: where to store the read data
+ *
+ * It uses RdPCIConfigLocal PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func,
+ u16 reg, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_pci_cfg_local_readl(device, bus, dev, func, reg);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_pci_local_read, PECI_CPU);
+
+/**
+ * peci_ep_pci_local_read() - read 32-bit memory location using raw address
+ * @device: PECI device to which request is going to be sent
+ * @seg: PCI segment
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @reg: register
+ * @data: where to store the read data
+ *
+ * Like &peci_pci_local_read, but it uses RdEndpointConfig PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_ep_pci_cfg_local_readl(device, seg, bus, dev, func, reg);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_ep_pci_local_read, PECI_CPU);
+
+/**
+ * peci_mmio_read() - read 32-bit memory location using 64-bit bar offset address
+ * @device: PECI device to which request is going to be sent
+ * @bar: PCI bar
+ * @seg: PCI segment
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @address: 64-bit MMIO address
+ * @data: where to store the read data
+ *
+ * It uses RdEndpointConfig PECI command.
+ *
+ * Return: 0 if succeeded, other values in case errors.
+ */
+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 address, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_ep_mmio64_readl(device, bar, seg, bus, dev, func, address);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_mmio_read, PECI_CPU);
+
+static const char * const peci_adev_types[] = {
+ "cputemp",
+ "dimmtemp",
+};
+
+struct peci_cpu {
+ struct peci_device *device;
+ const struct peci_device_id *id;
+};
+
+static void adev_release(struct device *dev)
+{
+ struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+ auxiliary_device_uninit(adev);
+
+ kfree(adev->name);
+ kfree(adev);
+}
+
+static struct auxiliary_device *adev_alloc(struct peci_cpu *priv, int idx)
+{
+ struct peci_controller *controller = to_peci_controller(priv->device->dev.parent);
+ struct auxiliary_device *adev;
+ const char *name;
+ int ret;
+
+ adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+ if (!adev)
+ return ERR_PTR(-ENOMEM);
+
+ name = kasprintf(GFP_KERNEL, "%s.%s", peci_adev_types[idx], (const char *)priv->id->data);
+ if (!name) {
+ ret = -ENOMEM;
+ goto free_adev;
+ }
+
+ adev->name = name;
+ adev->dev.parent = &priv->device->dev;
+ adev->dev.release = adev_release;
+ adev->id = (controller->id << 16) | (priv->device->addr);
+
+ ret = auxiliary_device_init(adev);
+ if (ret)
+ goto free_name;
+
+ return adev;
+
+free_name:
+ kfree(name);
+free_adev:
+ kfree(adev);
+ return ERR_PTR(ret);
+}
+
+static void unregister_adev(void *_adev)
+{
+ struct auxiliary_device *adev = _adev;
+
+ auxiliary_device_delete(adev);
+}
+
+static int devm_adev_add(struct device *dev, int idx)
+{
+ struct peci_cpu *priv = dev_get_drvdata(dev);
+ struct auxiliary_device *adev;
+ int ret;
+
+ adev = adev_alloc(priv, idx);
+ if (IS_ERR(adev))
+ return PTR_ERR(adev);
+
+ ret = auxiliary_device_add(adev);
+ if (ret) {
+ auxiliary_device_uninit(adev);
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(&priv->device->dev, unregister_adev, adev);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static void peci_cpu_add_adevices(struct peci_cpu *priv)
+{
+ struct device *dev = &priv->device->dev;
+ int ret, i;
+
+ for (i = 0; i < ARRAY_SIZE(peci_adev_types); i++) {
+ ret = devm_adev_add(dev, i);
+ if (ret) {
+ dev_warn(dev, "Failed to register PECI auxiliary: %s, ret = %d\n",
+ peci_adev_types[i], ret);
+ continue;
+ }
+ }
+}
+
+static int
+peci_cpu_probe(struct peci_device *device, const struct peci_device_id *id)
+{
+ struct device *dev = &device->dev;
+ struct peci_cpu *priv;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, priv);
+ priv->device = device;
+ priv->id = id;
+
+ peci_cpu_add_adevices(priv);
+
+ return 0;
+}
+
+static const struct peci_device_id peci_cpu_device_ids[] = {
+ { /* Haswell Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_HASWELL_X,
+ .data = "hsx",
+ },
+ { /* Broadwell Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_BROADWELL_X,
+ .data = "bdx",
+ },
+ { /* Broadwell Xeon D */
+ .family = 6,
+ .model = INTEL_FAM6_BROADWELL_D,
+ .data = "bdxd",
+ },
+ { /* Skylake Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_SKYLAKE_X,
+ .data = "skx",
+ },
+ { /* Icelake Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_ICELAKE_X,
+ .data = "icx",
+ },
+ { /* Icelake Xeon D */
+ .family = 6,
+ .model = INTEL_FAM6_ICELAKE_D,
+ .data = "icxd",
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(peci, peci_cpu_device_ids);
+
+static struct peci_driver peci_cpu_driver = {
+ .probe = peci_cpu_probe,
+ .id_table = peci_cpu_device_ids,
+ .driver = {
+ .name = "peci-cpu",
+ },
+};
+module_peci_driver(peci_cpu_driver);
+
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI CPU driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI);
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index 184b5e650b0b..e6b0bffb14f4 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -3,6 +3,7 @@
#include <linux/bitfield.h>
#include <linux/peci.h>
+#include <linux/peci-cpu.h>
#include <linux/slab.h>
#include "internal.h"
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 52c02e12874f..9d75ea54504c 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -22,6 +22,7 @@ void peci_request_free(struct peci_request *req);
int peci_request_status(struct peci_request *req);
u64 peci_request_dib_read(struct peci_request *req);
+s16 peci_request_temp_read(struct peci_request *req);
u8 peci_request_data_readb(struct peci_request *req);
u16 peci_request_data_readw(struct peci_request *req);
@@ -36,6 +37,32 @@ struct peci_request *peci_xfer_pkg_cfg_readw(struct peci_device *device, u8 inde
struct peci_request *peci_xfer_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
struct peci_request *peci_xfer_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pci_cfg_local_readb(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_pci_cfg_local_readw(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_pci_cfg_local_readl(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_xfer_ep_pci_cfg_local_readb(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_local_readw(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_local_readl(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_xfer_ep_pci_cfg_readb(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_readw(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_xfer_ep_pci_cfg_readl(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_xfer_ep_mmio32_readl(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset);
+
+struct peci_request *peci_xfer_ep_mmio64_readl(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset);
/**
* struct peci_device_id - PECI device data to match
* @data: pointer to driver private data specific to device
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
index a49eb351cda3..8d6dd7b6b559 100644
--- a/drivers/peci/request.c
+++ b/drivers/peci/request.c
@@ -3,6 +3,7 @@
#include <linux/bug.h>
#include <linux/export.h>
+#include <linux/pci.h>
#include <linux/peci.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -15,6 +16,10 @@
#define PECI_GET_DIB_WR_LEN 1
#define PECI_GET_DIB_RD_LEN 8
+#define PECI_GET_TEMP_CMD 0x01
+#define PECI_GET_TEMP_WR_LEN 1
+#define PECI_GET_TEMP_RD_LEN 2
+
#define PECI_RDPKGCFG_CMD 0xa1
#define PECI_RDPKGCFG_WR_LEN 5
#define PECI_RDPKGCFG_RD_LEN_BASE 1
@@ -22,6 +27,45 @@
#define PECI_WRPKGCFG_WR_LEN_BASE 6
#define PECI_WRPKGCFG_RD_LEN 1
+#define PECI_RDIAMSR_CMD 0xb1
+#define PECI_RDIAMSR_WR_LEN 5
+#define PECI_RDIAMSR_RD_LEN 9
+#define PECI_WRIAMSR_CMD 0xb5
+#define PECI_RDIAMSREX_CMD 0xd1
+#define PECI_RDIAMSREX_WR_LEN 6
+#define PECI_RDIAMSREX_RD_LEN 9
+
+#define PECI_RDPCICFG_CMD 0x61
+#define PECI_RDPCICFG_WR_LEN 6
+#define PECI_RDPCICFG_RD_LEN 5
+#define PECI_RDPCICFG_RD_LEN_MAX 24
+#define PECI_WRPCICFG_CMD 0x65
+
+#define PECI_RDPCICFGLOCAL_CMD 0xe1
+#define PECI_RDPCICFGLOCAL_WR_LEN 5
+#define PECI_RDPCICFGLOCAL_RD_LEN_BASE 1
+#define PECI_WRPCICFGLOCAL_CMD 0xe5
+#define PECI_WRPCICFGLOCAL_WR_LEN_BASE 6
+#define PECI_WRPCICFGLOCAL_RD_LEN 1
+
+#define PECI_ENDPTCFG_TYPE_LOCAL_PCI 0x03
+#define PECI_ENDPTCFG_TYPE_PCI 0x04
+#define PECI_ENDPTCFG_TYPE_MMIO 0x05
+#define PECI_ENDPTCFG_ADDR_TYPE_PCI 0x04
+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_D 0x05
+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q 0x06
+#define PECI_RDENDPTCFG_CMD 0xc1
+#define PECI_RDENDPTCFG_PCI_WR_LEN 12
+#define PECI_RDENDPTCFG_MMIO_WR_LEN_BASE 10
+#define PECI_RDENDPTCFG_MMIO_D_WR_LEN 14
+#define PECI_RDENDPTCFG_MMIO_Q_WR_LEN 18
+#define PECI_RDENDPTCFG_RD_LEN_BASE 1
+#define PECI_WRENDPTCFG_CMD 0xc5
+#define PECI_WRENDPTCFG_PCI_WR_LEN_BASE 13
+#define PECI_WRENDPTCFG_MMIO_D_WR_LEN_BASE 15
+#define PECI_WRENDPTCFG_MMIO_Q_WR_LEN_BASE 19
+#define PECI_WRENDPTCFG_RD_LEN 1
+
/* Device Specific Completion Code (CC) Definition */
#define PECI_CC_SUCCESS 0x40
#define PECI_CC_NEED_RETRY 0x80
@@ -202,6 +246,27 @@ struct peci_request *peci_xfer_get_dib(struct peci_device *device)
}
EXPORT_SYMBOL_NS_GPL(peci_xfer_get_dib, PECI);
+struct peci_request *peci_xfer_get_temp(struct peci_device *device)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_GET_TEMP_WR_LEN, PECI_GET_TEMP_RD_LEN);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_GET_TEMP_CMD;
+
+ ret = peci_request_xfer(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_xfer_get_temp, PECI);
+
static struct peci_request *
__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
{
@@ -226,6 +291,108 @@ __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
return req;
}
+static u32 __get_pci_addr(u8 bus, u8 dev, u8 func, u16 reg)
+{
+ return reg | PCI_DEVID(bus, PCI_DEVFN(dev, func)) << 12;
+}
+
+static struct peci_request *
+__pci_cfg_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg, u8 len)
+{
+ struct peci_request *req;
+ u32 pci_addr;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDPCICFGLOCAL_WR_LEN,
+ PECI_RDPCICFGLOCAL_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ pci_addr = __get_pci_addr(bus, dev, func, reg);
+
+ req->tx.buf[0] = PECI_RDPCICFGLOCAL_CMD;
+ req->tx.buf[1] = 0;
+ put_unaligned_le24(pci_addr, &req->tx.buf[2]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+static struct peci_request *
+__ep_pci_cfg_read(struct peci_device *device, u8 msg_type, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u8 len)
+{
+ struct peci_request *req;
+ u32 pci_addr;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDENDPTCFG_PCI_WR_LEN,
+ PECI_RDENDPTCFG_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ pci_addr = __get_pci_addr(bus, dev, func, reg);
+
+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = msg_type;
+ req->tx.buf[3] = 0;
+ req->tx.buf[4] = 0;
+ req->tx.buf[5] = 0;
+ req->tx.buf[6] = PECI_ENDPTCFG_ADDR_TYPE_PCI;
+ req->tx.buf[7] = seg; /* PCI Segment */
+ put_unaligned_le32(pci_addr, &req->tx.buf[8]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+static struct peci_request *
+__ep_mmio_read(struct peci_device *device, u8 bar, u8 addr_type, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset, u8 tx_len, u8 len)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, tx_len, PECI_RDENDPTCFG_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = PECI_ENDPTCFG_TYPE_MMIO;
+ req->tx.buf[3] = 0; /* Endpoint ID */
+ req->tx.buf[4] = 0; /* Reserved */
+ req->tx.buf[5] = bar;
+ req->tx.buf[6] = addr_type;
+ req->tx.buf[7] = seg; /* PCI Segment */
+ req->tx.buf[8] = PCI_DEVFN(dev, func);
+ req->tx.buf[9] = bus; /* PCI Bus */
+
+ if (addr_type == PECI_ENDPTCFG_ADDR_TYPE_MMIO_D)
+ put_unaligned_le32(offset, &req->tx.buf[10]);
+ else
+ put_unaligned_le64(offset, &req->tx.buf[10]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
u8 peci_request_data_readb(struct peci_request *req)
{
return req->rx.buf[1];
@@ -256,6 +423,12 @@ u64 peci_request_dib_read(struct peci_request *req)
}
EXPORT_SYMBOL_NS_GPL(peci_request_dib_read, PECI);
+s16 peci_request_temp_read(struct peci_request *req)
+{
+ return get_unaligned_le16(&req->rx.buf[0]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_temp_read, PECI);
+
#define __read_pkg_config(x, type) \
struct peci_request *peci_xfer_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
{ \
@@ -267,3 +440,43 @@ __read_pkg_config(readb, u8);
__read_pkg_config(readw, u16);
__read_pkg_config(readl, u32);
__read_pkg_config(readq, u64);
+
+#define __read_pci_config_local(x, type) \
+struct peci_request * \
+peci_xfer_pci_cfg_local_##x(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg) \
+{ \
+ return __pci_cfg_local_read(device, bus, dev, func, reg, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_pci_cfg_local_##x, PECI)
+
+__read_pci_config_local(readb, u8);
+__read_pci_config_local(readw, u16);
+__read_pci_config_local(readl, u32);
+
+#define __read_ep_pci_config(x, msg_type, type) \
+struct peci_request * \
+peci_xfer_ep_pci_cfg_##x(struct peci_device *device, u8 seg, u8 bus, u8 dev, u8 func, u16 reg) \
+{ \
+ return __ep_pci_cfg_read(device, msg_type, seg, bus, dev, func, reg, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_ep_pci_cfg_##x, PECI)
+
+__read_ep_pci_config(local_readb, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u8);
+__read_ep_pci_config(local_readw, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u16);
+__read_ep_pci_config(local_readl, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u32);
+__read_ep_pci_config(readb, PECI_ENDPTCFG_TYPE_PCI, u8);
+__read_ep_pci_config(readw, PECI_ENDPTCFG_TYPE_PCI, u16);
+__read_ep_pci_config(readl, PECI_ENDPTCFG_TYPE_PCI, u32);
+
+#define __read_ep_mmio(x, y, addr_type, type1, type2) \
+struct peci_request *peci_xfer_ep_mmio##y##_##x(struct peci_device *device, u8 bar, u8 seg, \
+ u8 bus, u8 dev, u8 func, u64 offset) \
+{ \
+ return __ep_mmio_read(device, bar, addr_type, seg, bus, dev, func, \
+ offset, PECI_RDENDPTCFG_MMIO_WR_LEN_BASE + sizeof(type1), \
+ sizeof(type2)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_ep_mmio##y##_##x, PECI)
+
+__read_ep_mmio(readl, 32, PECI_ENDPTCFG_ADDR_TYPE_MMIO_D, u32, u32);
+__read_ep_mmio(readl, 64, PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q, u64, u32);
diff --git a/include/linux/peci-cpu.h b/include/linux/peci-cpu.h
new file mode 100644
index 000000000000..ff8ae9c26c80
--- /dev/null
+++ b/include/linux/peci-cpu.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_CPU_H
+#define __LINUX_PECI_CPU_H
+
+#include <linux/types.h>
+
+#include "../../arch/x86/include/asm/intel-family.h"
+
+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
+#define PECI_PCS_MODULE_TEMP 9 /* Per Core DTS Temperature Read */
+#define PECI_PCS_THERMAL_MARGIN 10 /* DTS thermal margin */
+#define PECI_PCS_DDR_DIMM_TEMP 14 /* DDR DIMM Temperature */
+#define PECI_PCS_TEMP_TARGET 16 /* Temperature Target Read */
+#define PECI_PCS_TDP_UNITS 30 /* Units for power/energy registers */
+
+struct peci_device;
+
+int peci_temp_read(struct peci_device *device, s16 *temp_raw);
+
+int peci_pcs_read(struct peci_device *device, u8 index,
+ u16 param, u32 *data);
+
+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev,
+ u8 func, u16 reg, u32 *data);
+
+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data);
+
+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 address, u32 *data);
+
+#endif /* __LINUX_PECI_CPU_H */
diff --git a/include/linux/peci.h b/include/linux/peci.h
index 4eda423ba10c..06e6ef935297 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -14,14 +14,6 @@
*/
#define PECI_REQUEST_MAX_BUF_SIZE 32
-#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
-#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
-#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
-#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
-#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
-#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
-#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
-
struct peci_controller;
struct peci_request;
--
2.34.1
Add device tree bindings for the PECI controller.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
---
.../bindings/peci/peci-controller.yaml | 33 +++++++++++++++++++
1 file changed, 33 insertions(+)
create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
diff --git a/Documentation/devicetree/bindings/peci/peci-controller.yaml b/Documentation/devicetree/bindings/peci/peci-controller.yaml
new file mode 100644
index 000000000000..bbc3d3f3a929
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-controller.yaml
@@ -0,0 +1,33 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/peci/peci-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Generic Device Tree Bindings for PECI
+
+maintainers:
+ - Iwona Winiarska <[email protected]>
+
+description:
+ PECI (Platform Environment Control Interface) is an interface that provides a
+ communication channel from Intel processors and chipset components to external
+ monitoring or control devices.
+
+properties:
+ $nodename:
+ pattern: "^peci-controller(@.*)?$"
+
+ cmd-timeout-ms:
+ description:
+ Command timeout in units of ms.
+
+additionalProperties: true
+
+examples:
+ - |
+ peci-controller@1e78b000 {
+ reg = <0x1e78b000 0x100>;
+ cmd-timeout-ms = <500>;
+ };
+...
--
2.34.1
Add support for PECI device drivers, which unlike PECI controller
drivers are actually able to provide functionalities to userspace.
Also, extend peci_request API to allow querying more details about PECI
device (e.g. model/family), that's going to be used to find a compatible
peci_driver.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/peci/core.c | 44 +++++++++
drivers/peci/device.c | 130 ++++++++++++++++++++++++
drivers/peci/internal.h | 74 ++++++++++++++
drivers/peci/request.c | 214 ++++++++++++++++++++++++++++++++++++++++
include/linux/peci.h | 19 ++++
5 files changed, 481 insertions(+)
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index e993615cf521..9c8cf07e51c7 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -160,8 +160,52 @@ struct peci_controller *devm_peci_controller_add(struct device *dev,
}
EXPORT_SYMBOL_NS_GPL(devm_peci_controller_add, PECI);
+static const struct peci_device_id *
+peci_bus_match_device_id(const struct peci_device_id *id, struct peci_device *device)
+{
+ while (id->family != 0) {
+ if (id->family == device->info.family &&
+ id->model == device->info.model)
+ return id;
+ id++;
+ }
+
+ return NULL;
+}
+
+static int peci_bus_device_match(struct device *dev, struct device_driver *drv)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *peci_drv = to_peci_driver(drv);
+
+ if (dev->type != &peci_device_type)
+ return 0;
+
+ return !!peci_bus_match_device_id(peci_drv->id_table, device);
+}
+
+static int peci_bus_device_probe(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *driver = to_peci_driver(dev->driver);
+
+ return driver->probe(device, peci_bus_match_device_id(driver->id_table, device));
+}
+
+static void peci_bus_device_remove(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *driver = to_peci_driver(dev->driver);
+
+ if (driver->remove)
+ driver->remove(device);
+}
+
struct bus_type peci_bus_type = {
.name = "peci",
+ .match = peci_bus_device_match,
+ .probe = peci_bus_device_probe,
+ .remove = peci_bus_device_remove,
.bus_groups = peci_bus_groups,
};
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index d10ed1cfcd48..184b5e650b0b 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2018-2021 Intel Corporation
+#include <linux/bitfield.h>
#include <linux/peci.h>
#include <linux/slab.h>
@@ -13,6 +14,104 @@
*/
static DEFINE_MUTEX(peci_device_del_lock);
+#define REVISION_NUM_MASK GENMASK(15, 8)
+static int peci_get_revision(struct peci_device *device, u8 *revision)
+{
+ struct peci_request *req;
+ u64 dib;
+
+ req = peci_xfer_get_dib(device);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ /*
+ * PECI device may be in a state where it is unable to return a proper
+ * DIB, in which case it returns 0 as DIB value.
+ * Let's treat this as an error to avoid carrying on with the detection
+ * using invalid revision.
+ */
+ dib = peci_request_dib_read(req);
+ if (dib == 0) {
+ peci_request_free(req);
+ return -EIO;
+ }
+
+ *revision = FIELD_GET(REVISION_NUM_MASK, dib);
+
+ peci_request_free(req);
+
+ return 0;
+}
+
+static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_xfer_pkg_cfg_readl(device, PECI_PCS_PKG_ID, PECI_PKG_ID_CPU_ID);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *cpu_id = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+
+static unsigned int peci_x86_cpu_family(unsigned int sig)
+{
+ unsigned int x86;
+
+ x86 = (sig >> 8) & 0xf;
+
+ if (x86 == 0xf)
+ x86 += (sig >> 20) & 0xff;
+
+ return x86;
+}
+
+static unsigned int peci_x86_cpu_model(unsigned int sig)
+{
+ unsigned int fam, model;
+
+ fam = peci_x86_cpu_family(sig);
+
+ model = (sig >> 4) & 0xf;
+
+ if (fam >= 0x6)
+ model += ((sig >> 16) & 0xf) << 4;
+
+ return model;
+}
+
+static int peci_device_info_init(struct peci_device *device)
+{
+ u8 revision;
+ u32 cpu_id;
+ int ret;
+
+ ret = peci_get_cpu_id(device, &cpu_id);
+ if (ret)
+ return ret;
+
+ device->info.family = peci_x86_cpu_family(cpu_id);
+ device->info.model = peci_x86_cpu_model(cpu_id);
+
+ ret = peci_get_revision(device, &revision);
+ if (ret)
+ return ret;
+ device->info.peci_revision = revision;
+
+ device->info.socket_id = device->addr - PECI_BASE_ADDR;
+
+ return 0;
+}
+
static int peci_detect(struct peci_controller *controller, u8 addr)
{
/*
@@ -82,6 +181,10 @@ int peci_device_create(struct peci_controller *controller, u8 addr)
device->dev.bus = &peci_bus_type;
device->dev.type = &peci_device_type;
+ ret = peci_device_info_init(device);
+ if (ret)
+ goto err_put;
+
ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
if (ret)
goto err_put;
@@ -108,6 +211,33 @@ void peci_device_destroy(struct peci_device *device)
mutex_unlock(&peci_device_del_lock);
}
+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
+ const char *mod_name)
+{
+ driver->driver.bus = &peci_bus_type;
+ driver->driver.owner = owner;
+ driver->driver.mod_name = mod_name;
+
+ if (!driver->probe) {
+ pr_err("peci: trying to register driver without probe callback\n");
+ return -EINVAL;
+ }
+
+ if (!driver->id_table) {
+ pr_err("peci: trying to register driver without device id table\n");
+ return -EINVAL;
+ }
+
+ return driver_register(&driver->driver);
+}
+EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
+
+void peci_driver_unregister(struct peci_driver *driver)
+{
+ driver_unregister(&driver->driver);
+}
+EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
+
static void peci_device_release(struct device *dev)
{
struct peci_device *device = to_peci_device(dev);
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 978e12c8e1d3..52c02e12874f 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -19,6 +19,35 @@ struct peci_request;
struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
void peci_request_free(struct peci_request *req);
+int peci_request_status(struct peci_request *req);
+
+u64 peci_request_dib_read(struct peci_request *req);
+
+u8 peci_request_data_readb(struct peci_request *req);
+u16 peci_request_data_readw(struct peci_request *req);
+u32 peci_request_data_readl(struct peci_request *req);
+u64 peci_request_data_readq(struct peci_request *req);
+
+struct peci_request *peci_xfer_get_dib(struct peci_device *device);
+struct peci_request *peci_xfer_get_temp(struct peci_device *device);
+
+struct peci_request *peci_xfer_pkg_cfg_readb(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pkg_cfg_readw(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_xfer_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
+
+/**
+ * struct peci_device_id - PECI device data to match
+ * @data: pointer to driver private data specific to device
+ * @family: device family
+ * @model: device model
+ */
+struct peci_device_id {
+ const void *data;
+ u16 family;
+ u8 model;
+};
+
extern struct device_type peci_device_type;
extern const struct attribute_group *peci_device_groups[];
@@ -28,6 +57,51 @@ void peci_device_destroy(struct peci_device *device);
extern struct bus_type peci_bus_type;
extern const struct attribute_group *peci_bus_groups[];
+/**
+ * struct peci_driver - PECI driver
+ * @driver: inherit device driver
+ * @probe: probe callback
+ * @remove: remove callback
+ * @id_table: PECI device match table to decide which device to bind
+ */
+struct peci_driver {
+ struct device_driver driver;
+ int (*probe)(struct peci_device *device, const struct peci_device_id *id);
+ void (*remove)(struct peci_device *device);
+ const struct peci_device_id *id_table;
+};
+
+static inline struct peci_driver *to_peci_driver(struct device_driver *d)
+{
+ return container_of(d, struct peci_driver, driver);
+}
+
+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
+ const char *mod_name);
+/**
+ * peci_driver_register() - register PECI driver
+ * @driver: the driver to be registered
+ *
+ * PECI drivers that don't need to do anything special in module init should
+ * use the convenience "module_peci_driver" macro instead
+ *
+ * Return: zero on success, else a negative error code.
+ */
+#define peci_driver_register(driver) \
+ __peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
+void peci_driver_unregister(struct peci_driver *driver);
+
+/**
+ * module_peci_driver() - helper macro for registering a modular PECI driver
+ * @__peci_driver: peci_driver struct
+ *
+ * Helper macro for PECI drivers which do not do anything special in module
+ * init/exit. This eliminates a lot of boilerplate. Each module may only
+ * use this macro once, and calling it replaces module_init() and module_exit()
+ */
+#define module_peci_driver(__peci_driver) \
+ module_driver(__peci_driver, peci_driver_register, peci_driver_unregister)
+
extern struct device_type peci_controller_type;
int peci_controller_scan_devices(struct peci_controller *controller);
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
index 7dee51c50dd2..a49eb351cda3 100644
--- a/drivers/peci/request.c
+++ b/drivers/peci/request.c
@@ -1,13 +1,140 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2021 Intel Corporation
+#include <linux/bug.h>
#include <linux/export.h>
#include <linux/peci.h>
#include <linux/slab.h>
#include <linux/types.h>
+#include <asm/unaligned.h>
+
#include "internal.h"
+#define PECI_GET_DIB_CMD 0xf7
+#define PECI_GET_DIB_WR_LEN 1
+#define PECI_GET_DIB_RD_LEN 8
+
+#define PECI_RDPKGCFG_CMD 0xa1
+#define PECI_RDPKGCFG_WR_LEN 5
+#define PECI_RDPKGCFG_RD_LEN_BASE 1
+#define PECI_WRPKGCFG_CMD 0xa5
+#define PECI_WRPKGCFG_WR_LEN_BASE 6
+#define PECI_WRPKGCFG_RD_LEN 1
+
+/* Device Specific Completion Code (CC) Definition */
+#define PECI_CC_SUCCESS 0x40
+#define PECI_CC_NEED_RETRY 0x80
+#define PECI_CC_OUT_OF_RESOURCE 0x81
+#define PECI_CC_UNAVAIL_RESOURCE 0x82
+#define PECI_CC_INVALID_REQ 0x90
+#define PECI_CC_MCA_ERROR 0x91
+#define PECI_CC_CATASTROPHIC_MCA_ERROR 0x93
+#define PECI_CC_FATAL_MCA_ERROR 0x94
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB 0x98
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR 0x9B
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA 0x9C
+
+#define PECI_RETRY_BIT BIT(0)
+
+#define PECI_RETRY_TIMEOUT msecs_to_jiffies(700)
+#define PECI_RETRY_INTERVAL_MIN msecs_to_jiffies(1)
+#define PECI_RETRY_INTERVAL_MAX msecs_to_jiffies(128)
+
+static u8 peci_request_data_cc(struct peci_request *req)
+{
+ return req->rx.buf[0];
+}
+
+/**
+ * peci_request_status() - return -errno based on PECI completion code
+ * @req: the PECI request that contains response data with completion code
+ *
+ * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands we
+ * don't expect completion code in the response.
+ *
+ * Return: -errno
+ */
+int peci_request_status(struct peci_request *req)
+{
+ u8 cc = peci_request_data_cc(req);
+
+ if (cc != PECI_CC_SUCCESS)
+ dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
+
+ switch (cc) {
+ case PECI_CC_SUCCESS:
+ return 0;
+ case PECI_CC_NEED_RETRY:
+ case PECI_CC_OUT_OF_RESOURCE:
+ case PECI_CC_UNAVAIL_RESOURCE:
+ return -EAGAIN;
+ case PECI_CC_INVALID_REQ:
+ return -EINVAL;
+ case PECI_CC_MCA_ERROR:
+ case PECI_CC_CATASTROPHIC_MCA_ERROR:
+ case PECI_CC_FATAL_MCA_ERROR:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
+ return -EIO;
+ }
+
+ WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
+
+ return -EIO;
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
+
+static int peci_request_xfer(struct peci_request *req)
+{
+ struct peci_device *device = req->device;
+ struct peci_controller *controller = to_peci_controller(device->dev.parent);
+ int ret;
+
+ mutex_lock(&controller->bus_lock);
+ ret = controller->ops->xfer(controller, device->addr, req);
+ mutex_unlock(&controller->bus_lock);
+
+ return ret;
+}
+
+static int peci_request_xfer_retry(struct peci_request *req)
+{
+ long wait_interval = PECI_RETRY_INTERVAL_MIN;
+ struct peci_device *device = req->device;
+ struct peci_controller *controller = to_peci_controller(device->dev.parent);
+ unsigned long start = jiffies;
+ int ret;
+
+ /* Don't try to use it for ping */
+ if (WARN_ON(req->tx.len == 0))
+ return 0;
+
+ do {
+ ret = peci_request_xfer(req);
+ if (ret) {
+ dev_dbg(&controller->dev, "xfer error: %d\n", ret);
+ return ret;
+ }
+
+ if (peci_request_status(req) != -EAGAIN)
+ return 0;
+
+ /* Set the retry bit to indicate a retry attempt */
+ req->tx.buf[1] |= PECI_RETRY_BIT;
+
+ if (schedule_timeout_interruptible(wait_interval))
+ return -ERESTARTSYS;
+
+ wait_interval = min_t(long, wait_interval * 2, PECI_RETRY_INTERVAL_MAX);
+ } while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
+
+ dev_dbg(&controller->dev, "request timed out\n");
+
+ return -ETIMEDOUT;
+}
+
/**
* peci_request_alloc() - allocate &struct peci_requests
* @device: PECI device to which request is going to be sent
@@ -53,3 +180,90 @@ void peci_request_free(struct peci_request *req)
kfree(req);
}
EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
+
+struct peci_request *peci_xfer_get_dib(struct peci_device *device)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN, PECI_GET_DIB_RD_LEN);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_GET_DIB_CMD;
+
+ ret = peci_request_xfer(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_xfer_get_dib, PECI);
+
+static struct peci_request *
+__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDPKGCFG_WR_LEN, PECI_RDPKGCFG_RD_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_RDPKGCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = index;
+ put_unaligned_le16(param, &req->tx.buf[3]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+u8 peci_request_data_readb(struct peci_request *req)
+{
+ return req->rx.buf[1];
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
+
+u16 peci_request_data_readw(struct peci_request *req)
+{
+ return get_unaligned_le16(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
+
+u32 peci_request_data_readl(struct peci_request *req)
+{
+ return get_unaligned_le32(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
+
+u64 peci_request_data_readq(struct peci_request *req)
+{
+ return get_unaligned_le64(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
+
+u64 peci_request_dib_read(struct peci_request *req)
+{
+ return get_unaligned_le64(&req->rx.buf[0]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_dib_read, PECI);
+
+#define __read_pkg_config(x, type) \
+struct peci_request *peci_xfer_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
+{ \
+ return __pkg_cfg_read(device, index, param, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_xfer_pkg_cfg_##x, PECI)
+
+__read_pkg_config(readb, u8);
+__read_pkg_config(readw, u16);
+__read_pkg_config(readl, u32);
+__read_pkg_config(readq, u64);
diff --git a/include/linux/peci.h b/include/linux/peci.h
index 7e35673f3786..4eda423ba10c 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -14,6 +14,14 @@
*/
#define PECI_REQUEST_MAX_BUF_SIZE 32
+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
+
struct peci_controller;
struct peci_request;
@@ -59,6 +67,11 @@ static inline struct peci_controller *to_peci_controller(void *d)
* struct peci_device - PECI device
* @dev: device object to register PECI device to the device model
* @controller: manages the bus segment hosting this PECI device
+ * @info: PECI device characteristics
+ * @info.family: device family
+ * @info.model: device model
+ * @info.peci_revision: PECI revision supported by the PECI device
+ * @info.socket_id: the socket ID represented by the PECI device
* @addr: address used on the PECI bus connected to the parent controller
* @deleted: indicates that PECI device was already deleted
*
@@ -68,6 +81,12 @@ static inline struct peci_controller *to_peci_controller(void *d)
*/
struct peci_device {
struct device dev;
+ struct {
+ u16 family;
+ u8 model;
+ u8 peci_revision;
+ u8 socket_id;
+ } info;
u8 addr;
bool deleted;
};
--
2.34.1
Add a brief overview of PECI and PECI wire interface.
The documentation also contains kernel-doc for PECI subsystem internals
and PECI CPU Driver API.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
Documentation/index.rst | 1 +
Documentation/peci/index.rst | 16 +++++++++++
Documentation/peci/peci.rst | 51 ++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
4 files changed, 69 insertions(+)
create mode 100644 Documentation/peci/index.rst
create mode 100644 Documentation/peci/peci.rst
diff --git a/Documentation/index.rst b/Documentation/index.rst
index b58692d687f6..1988c19d9daf 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -138,6 +138,7 @@ needed).
scheduler/index
mhi/index
tty/index
+ peci/index
Architecture-agnostic documentation
-----------------------------------
diff --git a/Documentation/peci/index.rst b/Documentation/peci/index.rst
new file mode 100644
index 000000000000..989de10416e7
--- /dev/null
+++ b/Documentation/peci/index.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+====================
+Linux PECI Subsystem
+====================
+
+.. toctree::
+
+ peci
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/peci/peci.rst b/Documentation/peci/peci.rst
new file mode 100644
index 000000000000..331b1ec00e22
--- /dev/null
+++ b/Documentation/peci/peci.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+========
+Overview
+========
+
+The Platform Environment Control Interface (PECI) is a communication
+interface between Intel processor and management controllers
+(e.g. Baseboard Management Controller, BMC).
+PECI provides services that allow the management controller to
+configure, monitor and debug platform by accessing various registers.
+It defines a dedicated command protocol, where the management
+controller is acting as a PECI originator and the processor - as
+a PECI responder.
+PECI can be used in both single processor and multiple-processor based
+systems.
+
+NOTE:
+Intel PECI specification is not released as a dedicated document,
+instead it is a part of External Design Specification (EDS) for given
+Intel CPU. External Design Specifications are usually not publicly
+available.
+
+PECI Wire
+---------
+
+PECI Wire interface uses a single wire for self-clocking and data
+transfer. It does not require any additional control lines - the
+physical layer is a self-clocked one-wire bus signal that begins each
+bit with a driven, rising edge from an idle near zero volts. The
+duration of the signal driven high allows to determine whether the bit
+value is logic '0' or logic '1'. PECI Wire also includes variable data
+rate established with every message.
+
+For PECI Wire, each processor package will utilize unique, fixed
+addresses within a defined range and that address should
+have a fixed relationship with the processor socket ID - if one of the
+processors is removed, it does not affect addresses of remaining
+processors.
+
+PECI subsystem internals
+------------------------
+
+.. kernel-doc:: include/linux/peci.h
+.. kernel-doc:: drivers/peci/internal.h
+.. kernel-doc:: drivers/peci/core.c
+.. kernel-doc:: drivers/peci/request.c
+
+PECI CPU Driver API
+-------------------
+.. kernel-doc:: drivers/peci/cpu.c
diff --git a/MAINTAINERS b/MAINTAINERS
index c666ef7ea5a5..9d248d55ac30 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15122,6 +15122,7 @@ M: Iwona Winiarska <[email protected]>
L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/peci/
+F: Documentation/peci/
F: drivers/peci/
F: include/linux/peci-cpu.h
F: include/linux/peci.h
--
2.34.1
Since PECI devices are discoverable, we can dynamically detect devices
that are actually available in the system.
This change complements the earlier implementation by rescanning PECI
bus to detect available devices. For this purpose, it also introduces the
minimal API for PECI requests.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/peci/Makefile | 2 +-
drivers/peci/core.c | 33 +++++++++++
drivers/peci/device.c | 120 ++++++++++++++++++++++++++++++++++++++++
drivers/peci/internal.h | 14 +++++
drivers/peci/request.c | 55 ++++++++++++++++++
include/linux/peci.h | 2 +
6 files changed, 225 insertions(+), 1 deletion(-)
create mode 100644 drivers/peci/device.c
create mode 100644 drivers/peci/request.c
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 926d8df15cbd..c5f9d3fe21bb 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
# Core functionality
-peci-y := core.o
+peci-y := core.o request.o device.o
obj-$(CONFIG_PECI) += peci.o
# Hardware specific bus drivers
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index 73ad0a47fa9d..c3361e6e043a 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -29,6 +29,20 @@ struct device_type peci_controller_type = {
.release = peci_controller_dev_release,
};
+static int peci_controller_scan_devices(struct peci_controller *controller)
+{
+ int ret;
+ u8 addr;
+
+ for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
+ ret = peci_device_create(controller, addr);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static struct peci_controller *peci_controller_alloc(struct device *dev,
struct peci_controller_ops *ops)
{
@@ -64,10 +78,23 @@ static struct peci_controller *peci_controller_alloc(struct device *dev,
return ERR_PTR(ret);
}
+static int unregister_child(struct device *dev, void *dummy)
+{
+ peci_device_destroy(to_peci_device(dev));
+
+ return 0;
+}
+
static void unregister_controller(void *_controller)
{
struct peci_controller *controller = _controller;
+ /*
+ * Detach any active PECI devices. This can't fail, thus we do not
+ * check the returned value.
+ */
+ device_for_each_child_reverse(&controller->dev, NULL, unregister_child);
+
device_unregister(&controller->dev);
fwnode_handle_put(controller->dev.fwnode);
@@ -113,6 +140,12 @@ struct peci_controller *devm_peci_controller_add(struct device *dev,
if (ret)
return ERR_PTR(ret);
+ /*
+ * Ignoring retval since failures during scan are non-critical for
+ * controller itself.
+ */
+ peci_controller_scan_devices(controller);
+
return controller;
err_fwnode:
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
new file mode 100644
index 000000000000..2b3a2d893aaf
--- /dev/null
+++ b/drivers/peci/device.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/peci.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+/*
+ * PECI device can be removed using sysfs, but the removal can also happen as
+ * a result of controller being removed.
+ * Mutex is used to protect PECI device from being double-deleted.
+ */
+static DEFINE_MUTEX(peci_device_del_lock);
+
+static int peci_detect(struct peci_controller *controller, u8 addr)
+{
+ /*
+ * PECI Ping is a command encoded by tx_len = 0, rx_len = 0.
+ * We expect correct Write FCS if the device at the target address
+ * is able to respond.
+ */
+ struct peci_request req = { 0 };
+ int ret;
+
+ mutex_lock(&controller->bus_lock);
+ ret = controller->ops->xfer(controller, addr, &req);
+ mutex_unlock(&controller->bus_lock);
+
+ return ret;
+}
+
+static bool peci_addr_valid(u8 addr)
+{
+ return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX;
+}
+
+static int peci_dev_exists(struct device *dev, void *data)
+{
+ struct peci_device *device = to_peci_device(dev);
+ u8 *addr = data;
+
+ if (device->addr == *addr)
+ return -EBUSY;
+
+ return 0;
+}
+
+int peci_device_create(struct peci_controller *controller, u8 addr)
+{
+ struct peci_device *device;
+ int ret;
+
+ if (!peci_addr_valid(addr))
+ return -EINVAL;
+
+ /* Check if we have already detected this device before. */
+ ret = device_for_each_child(&controller->dev, &addr, peci_dev_exists);
+ if (ret)
+ return 0;
+
+ ret = peci_detect(controller, addr);
+ if (ret) {
+ /*
+ * Device not present or host state doesn't allow successful
+ * detection at this time.
+ */
+ if (ret == -EIO || ret == -ETIMEDOUT)
+ return 0;
+
+ return ret;
+ }
+
+ device = kzalloc(sizeof(*device), GFP_KERNEL);
+ if (!device)
+ return -ENOMEM;
+
+ device_initialize(&device->dev);
+
+ device->addr = addr;
+ device->dev.parent = &controller->dev;
+ device->dev.bus = &peci_bus_type;
+ device->dev.type = &peci_device_type;
+
+ ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
+ if (ret)
+ goto err_put;
+
+ ret = device_add(&device->dev);
+ if (ret)
+ goto err_put;
+
+ return 0;
+
+err_put:
+ put_device(&device->dev);
+
+ return ret;
+}
+
+void peci_device_destroy(struct peci_device *device)
+{
+ mutex_lock(&peci_device_del_lock);
+ if (!device->deleted) {
+ device_unregister(&device->dev);
+ device->deleted = true;
+ }
+ mutex_unlock(&peci_device_del_lock);
+}
+
+static void peci_device_release(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+
+ kfree(device);
+}
+
+struct device_type peci_device_type = {
+ .release = peci_device_release,
+};
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 918dea745a86..57d11a902c5d 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -8,6 +8,20 @@
#include <linux/types.h>
struct peci_controller;
+struct peci_device;
+struct peci_request;
+
+/* PECI CPU address range 0x30-0x37 */
+#define PECI_BASE_ADDR 0x30
+#define PECI_DEVICE_NUM_MAX 8
+
+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
+void peci_request_free(struct peci_request *req);
+
+extern struct device_type peci_device_type;
+
+int peci_device_create(struct peci_controller *controller, u8 addr);
+void peci_device_destroy(struct peci_device *device);
extern struct bus_type peci_bus_type;
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
new file mode 100644
index 000000000000..7dee51c50dd2
--- /dev/null
+++ b/drivers/peci/request.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/export.h>
+#include <linux/peci.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "internal.h"
+
+/**
+ * peci_request_alloc() - allocate &struct peci_requests
+ * @device: PECI device to which request is going to be sent
+ * @tx_len: TX length
+ * @rx_len: RX length
+ *
+ * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
+ */
+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
+{
+ struct peci_request *req;
+
+ /*
+ * TX and RX buffers are fixed length members of peci_request, this is
+ * just a warn for developers to make sure to expand the buffers (or
+ * change the allocation method) if we go over the current limit.
+ */
+ if (WARN_ON_ONCE(tx_len > PECI_REQUEST_MAX_BUF_SIZE || rx_len > PECI_REQUEST_MAX_BUF_SIZE))
+ return NULL;
+ /*
+ * PECI controllers that we are using now don't support DMA, this
+ * should be converted to DMA API once support for controllers that do
+ * allow it is added to avoid an extra copy.
+ */
+ req = kzalloc(sizeof(*req), GFP_KERNEL);
+ if (!req)
+ return NULL;
+
+ req->device = device;
+ req->tx.len = tx_len;
+ req->rx.len = rx_len;
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
+
+/**
+ * peci_request_free() - free peci_request
+ * @req: the PECI request to be freed
+ */
+void peci_request_free(struct peci_request *req)
+{
+ kfree(req);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
diff --git a/include/linux/peci.h b/include/linux/peci.h
index 26e0a4e73b50..7e35673f3786 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -60,6 +60,7 @@ static inline struct peci_controller *to_peci_controller(void *d)
* @dev: device object to register PECI device to the device model
* @controller: manages the bus segment hosting this PECI device
* @addr: address used on the PECI bus connected to the parent controller
+ * @deleted: indicates that PECI device was already deleted
*
* A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
* The behaviour exposed to the rest of the system is defined by the PECI driver
@@ -68,6 +69,7 @@ static inline struct peci_controller *to_peci_controller(void *d)
struct peci_device {
struct device dev;
u8 addr;
+ bool deleted;
};
static inline struct peci_device *to_peci_device(struct device *d)
--
2.34.1
On Tue, 8 Feb 2022 at 15:37, Iwona Winiarska <[email protected]> wrote:
>
> Hi Greg,
>
> I applied the fixups to sysfs ABI docs, I would appreciate if PECI
> could go through your tree into v5.18.
Acked-by: Joel Stanley <[email protected]>
>
> Here is the usual cover letter from the previous revision:
>
> The Platform Environment Control Interface (PECI) is a communication
> interface between Intel processors and management controllers (e.g.
> Baseboard Management Controller, BMC).
>
> This series adds a PECI subsystem and introduces drivers which run in
> the Linux instance on the management controller (not the main Intel
> processor) and is intended to be used by the OpenBMC [1], a Linux
> distribution for BMC devices.
> The information exposed over PECI (like processor and DIMM
> temperature) refers to the Intel processor and can be consumed by
> daemons running on the BMC to, for example, display the processor
> temperature in its web interface.
>
> The PECI bus is collection of code that provides interface support
> between PECI devices (that actually represent processors) and PECI
> controllers (such as the "peci-aspeed" controller) that allow to
> access physical PECI interface. PECI devices are bound to PECI
> drivers that provides access to PECI services. This series introduces
> a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
> and "dimmtemp" using the auxiliary bus.
>
> Exposing "raw" PECI to userspace, either to write userspace drivers or
> for debug/testing purpose was left out of this series to encourage
> writing kernel drivers instead, but may be pursued in the future.
>
> Introducing PECI to upstream Linux was already attempted before [2].
> Since it's been over a year since last revision, and the series
> changed quite a bit in the meantime, I've decided to start from v1.
>
> I would also like to give credit to everyone who helped me with
> different aspects of preliminary review:
> - Pierre-Louis Bossart,
> - Tony Luck,
> - Andy Shevchenko,
> - Dave Hansen.
>
> [1] https://github.com/openbmc/openbmc
> [2] https://lore.kernel.org/openbmc/[email protected]/
>
> Changes v7 -> v8:
> * Updated "KernelVersion" in sysfs ABI docs (Greg)
>
> Changes v6 -> v7:
> * Fixed Kconfig warnings ([email protected])
>
> Changes v5 -> v6:
> * Added missing COMMON_CLK selection ([email protected])
> * Fixed WARN_ON always evaluated to true ([email protected])
> * Clean interrupt status unconditionally (Joel)
> * Replaced memcpy_toio()/memcpy_fromio() with writel()/readl() to
> avoid issues when submitting unaligned PECI commands
>
> Changes v4 -> v5:
> * Added clk_aspeed_peci to express controller programming using common
> clock framework (Billy)
> * Modified peci-aspeed DTS schema to match clock changes (Billy)
> * Added workaround for peci-aspeed controller hang (Billy)
> * Removed unnecessary "else after return" (Guenter)
>
> Changes v3 -> v4:
> * Fixed an issue where peci doesn't work after host shutdown (Zev)
> * Replaced kill_device() with peci_device_del_lock (Greg)
> * Fixed dts_valid() parameter type (Guenter)
> * Removed Jae from MAINTAINERS file (Jae)
>
> Changes v2 -> v3:
>
> * Dropped x86/cpu patches (Boris)
> * Dropped pr_fmt() for PECI module (Dan)
> * Fixed releasing peci controller device flow (Dan)
> * Improved peci-aspeed commit-msg and Kconfig help (Dan)
> * Fixed aspeed_peci_xfer() to use the proper spin_lock function (Dan)
> * Wrapped print_hex_dump_bytes() in CONFIG_DYNAMIC_DEBUG (Dan)
> * Removed debug status logs from aspeed_peci_irq_handler() (Dan)
> * Renamed functions using devres to start with "devm" (Dan)
> * Changed request to be allocated on stack in peci_detect (Dan)
> * Removed redundant WARN_ON on invalid PECI addr (Dan)
> * Changed peci_device_create() to use device_initialize() + device_add() pattern (Dan)
> * Fixed peci_device_destroy() to use kill_device() avoiding double-free (Dan)
> * Renamed functions that perform xfer using "peci_xfer_*" prefix (Dan)
> * Renamed peci_request_data_dib(temp) -> peci_request_dib(temp)_read (Dan)
> * Fixed thermal margin readings for older Intel processors (Zev)
> * Misc hwmon simplifications (Guenter)
> * Used BIT_PER_TYPE to verify macro value constrains (Guenter)
> * Improved WARN_ON message to print chan_rank_max and idx_dimm_max (Guenter)
> * Improved dimmtemp to not reattempt probe if no dimms are populated
>
> Changes v1 -> v2:
>
> Biggest changes when it comes to diffstat are locking in HWMON
> (I decided to clean things up a bit while adding it), switching to
> devres usage in more places and exposing sysfs interface in separate patch.
>
> * Moved extending X86 ARCHITECTURE MAINTAINERS earlier in series (Dan)
> * Removed "default n" for GENERIC_LIB_X86 (Dan)
> * Added vendor prefix for peci-aspeed specific properties (Rob)
> * Refactored PECI to use devres consistently (Dan)
> * Added missing sysfs documentation and excluded adding peci-sysfs to
> separate patch (Dan)
> * Used module_init() instead of subsys_init() for peci module initialization (Dan)
> * Removed redundant struct peci_device member (Dan)
> * Improved PECI Kconfig help (Randy/Dan)
> * Fixed/removed log messages (Dan, Guenter)
> * Refactored peci-cputemp and peci-dimmtemp and added missing locks (Guenter)
> * Removed unused dev_set_drvdata() in peci-cputemp and peci-dimmtemp (Guenter)
> * Fixed used types, names, fixed broken and added additional comments
> to peci-hwmon (Guenter, Zev)
> * Refactored peci-dimmtemp to not return -ETIMEDOUT (Guenter)
> * Added sanity check for min_peci_revision in peci-hwmon drivers (Zev)
> * Added assert for DIMM_NUMS_MAX and additional warning in peci-dimmtemp (Zev)
> * Fixed macro names in peci-aspeed (Zev)
> * Refactored peci-aspeed sanitizing properties to a single helper function (Zev)
> * Fixed peci_cpu_device_ids definition for Broadwell Xeon D (David)
> * Refactor peci_request to use a single allocation (Zev)
> * Used min_t() to improve code readability (Zev)
> * Added macro for PECI_RDENDPTCFG_MMIO_WR_LEN_BASE and fixed adev type
> array name to more descriptive (Zev)
> * Fixed peci-hwmon commit-msg and documentation (Zev)
>
> Thanks
> -Iwona
>
> Iwona Winiarska (11):
> dt-bindings: Add generic bindings for PECI
> dt-bindings: Add bindings for peci-aspeed
> ARM: dts: aspeed: Add PECI controller nodes
> peci: Add core infrastructure
> peci: Add device detection
> peci: Add sysfs interface for PECI bus
> peci: Add support for PECI device drivers
> peci: Add peci-cpu driver
> hwmon: peci: Add cputemp driver
> hwmon: peci: Add dimmtemp driver
> docs: Add PECI documentation
>
> Jae Hyun Yoo (2):
> peci: Add peci-aspeed controller driver
> docs: hwmon: Document PECI drivers
>
> Documentation/ABI/testing/sysfs-bus-peci | 16 +
> .../devicetree/bindings/peci/peci-aspeed.yaml | 72 ++
> .../bindings/peci/peci-controller.yaml | 33 +
> Documentation/hwmon/index.rst | 2 +
> Documentation/hwmon/peci-cputemp.rst | 90 +++
> Documentation/hwmon/peci-dimmtemp.rst | 57 ++
> Documentation/index.rst | 1 +
> Documentation/peci/index.rst | 16 +
> Documentation/peci/peci.rst | 51 ++
> MAINTAINERS | 26 +
> arch/arm/boot/dts/aspeed-g4.dtsi | 11 +
> arch/arm/boot/dts/aspeed-g5.dtsi | 11 +
> arch/arm/boot/dts/aspeed-g6.dtsi | 11 +
> drivers/Kconfig | 3 +
> drivers/Makefile | 1 +
> drivers/hwmon/Kconfig | 2 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/peci/Kconfig | 31 +
> drivers/hwmon/peci/Makefile | 7 +
> drivers/hwmon/peci/common.h | 58 ++
> drivers/hwmon/peci/cputemp.c | 592 ++++++++++++++++
> drivers/hwmon/peci/dimmtemp.c | 630 ++++++++++++++++++
> drivers/peci/Kconfig | 36 +
> drivers/peci/Makefile | 10 +
> drivers/peci/controller/Kconfig | 18 +
> drivers/peci/controller/Makefile | 3 +
> drivers/peci/controller/peci-aspeed.c | 599 +++++++++++++++++
> drivers/peci/core.c | 236 +++++++
> drivers/peci/cpu.c | 343 ++++++++++
> drivers/peci/device.c | 252 +++++++
> drivers/peci/internal.h | 136 ++++
> drivers/peci/request.c | 482 ++++++++++++++
> drivers/peci/sysfs.c | 82 +++
> include/linux/peci-cpu.h | 40 ++
> include/linux/peci.h | 112 ++++
> 35 files changed, 4071 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-peci
> create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
> create mode 100644 Documentation/hwmon/peci-cputemp.rst
> create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> create mode 100644 Documentation/peci/index.rst
> create mode 100644 Documentation/peci/peci.rst
> create mode 100644 drivers/hwmon/peci/Kconfig
> create mode 100644 drivers/hwmon/peci/Makefile
> create mode 100644 drivers/hwmon/peci/common.h
> create mode 100644 drivers/hwmon/peci/cputemp.c
> create mode 100644 drivers/hwmon/peci/dimmtemp.c
> create mode 100644 drivers/peci/Kconfig
> create mode 100644 drivers/peci/Makefile
> create mode 100644 drivers/peci/controller/Kconfig
> create mode 100644 drivers/peci/controller/Makefile
> create mode 100644 drivers/peci/controller/peci-aspeed.c
> create mode 100644 drivers/peci/core.c
> create mode 100644 drivers/peci/cpu.c
> create mode 100644 drivers/peci/device.c
> create mode 100644 drivers/peci/internal.h
> create mode 100644 drivers/peci/request.c
> create mode 100644 drivers/peci/sysfs.c
> create mode 100644 include/linux/peci-cpu.h
> create mode 100644 include/linux/peci.h
>
> --
> 2.34.1
>
Hello,
We are seeing wrong DTS temperatures on at least "Intel(R) Xeon(R)
Bronze 3204 CPU @ 1.90GHz" and most probably other Skylake Xeon CPUs
are also affected, see inline.
On Tue, Feb 08, 2022 at 04:36:36PM +0100, Iwona Winiarska wrote:
> Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> readings of the processor package and processor cores that are
> accessible via the PECI interface.
...
> +static const struct cpu_info cpu_hsx = {
> + .reg = &resolved_cores_reg_hsx,
> + .min_peci_revision = 0x33,
> + .thermal_margin_to_millidegree = &dts_eight_dot_eight_to_millidegree,
> +};
> +
> +static const struct cpu_info cpu_icx = {
> + .reg = &resolved_cores_reg_icx,
> + .min_peci_revision = 0x40,
> + .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
> +};
...
> + {
> + .name = "peci_cpu.cputemp.skx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
With this configuration we get this data:
/sys/bus/peci/devices/0-30/peci_cpu.cputemp.skx.48/hwmon/hwmon15# grep . temp[123]_{label,input}
temp1_label:Die
temp2_label:DTS
temp3_label:Tcontrol
temp1_input:30938
temp2_input:67735
temp3_input:80000
On the host system "sensors" report
Package id 0: +31.C (high = +80.C, crit = +90.C)
So I conclude Die temperature as retrieved over PECI is correct while
DTS is mis-calculated. The old downstream code in OpenBMC was using
ten_dot_six_to_millidegree() function for conversion, and that was
providing expected results. And indeed if we reverse the calculation
here we get 80000 - ((80000-67735) * 256 / 64) = 30940 which matches
expectations.
--
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:[email protected]
On Mon, Mar 20, 2023 at 02:45:15PM +0300, Paul Fertser wrote:
> Hello,
>
> We are seeing wrong DTS temperatures on at least "Intel(R) Xeon(R)
> Bronze 3204 CPU @ 1.90GHz" and most probably other Skylake Xeon CPUs
> are also affected, see inline.
Thanks for the report! I guess we need a fix for this indeed.
> On Tue, Feb 08, 2022 at 04:36:36PM +0100, Iwona Winiarska wrote:
> > Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of the processor package and processor cores that are
> > accessible via the PECI interface.
> ...
> > +static const struct cpu_info cpu_hsx = {
> > + .reg = &resolved_cores_reg_hsx,
> > + .min_peci_revision = 0x33,
> > + .thermal_margin_to_millidegree = &dts_eight_dot_eight_to_millidegree,
> > +};
> > +
> > +static const struct cpu_info cpu_icx = {
> > + .reg = &resolved_cores_reg_icx,
> > + .min_peci_revision = 0x40,
> > + .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
> > +};
> ...
> > + {
> > + .name = "peci_cpu.cputemp.skx",
> > + .driver_data = (kernel_ulong_t)&cpu_hsx,
> > + },
>
> With this configuration we get this data:
>
> /sys/bus/peci/devices/0-30/peci_cpu.cputemp.skx.48/hwmon/hwmon15# grep . temp[123]_{label,input}
> temp1_label:Die
> temp2_label:DTS
> temp3_label:Tcontrol
> temp1_input:30938
> temp2_input:67735
> temp3_input:80000
>
> On the host system "sensors" report
>
> Package id 0: +31.C (high = +80.C, crit = +90.C)
>
> So I conclude Die temperature as retrieved over PECI is correct while
> DTS is mis-calculated. The old downstream code in OpenBMC was using
> ten_dot_six_to_millidegree() function for conversion, and that was
> providing expected results. And indeed if we reverse the calculation
> here we get 80000 - ((80000-67735) * 256 / 64) = 30940 which matches
> expectations.
--
With Best Regards,
Andy Shevchenko
On Mon, 2023-03-20 at 14:45 +0300, Paul Fertser wrote:
> Hello,
>
> We are seeing wrong DTS temperatures on at least "Intel(R) Xeon(R)
> Bronze 3204 CPU @ 1.90GHz" and most probably other Skylake Xeon CPUs
> are also affected, see inline.
>
> On Tue, Feb 08, 2022 at 04:36:36PM +0100, Iwona Winiarska wrote:
> > Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of the processor package and processor cores that are
> > accessible via the PECI interface.
> ...
> > +static const struct cpu_info cpu_hsx = {
> > + .reg = &resolved_cores_reg_hsx,
> > + .min_peci_revision = 0x33,
> > + .thermal_margin_to_millidegree =
> > &dts_eight_dot_eight_to_millidegree,
> > +};
> > +
> > +static const struct cpu_info cpu_icx = {
> > + .reg = &resolved_cores_reg_icx,
> > + .min_peci_revision = 0x40,
> > + .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
> > +};
> ...
> > + {
> > + .name = "peci_cpu.cputemp.skx",
> > + .driver_data = (kernel_ulong_t)&cpu_hsx,
> > + },
>
> With this configuration we get this data:
>
> /sys/bus/peci/devices/0-30/peci_cpu.cputemp.skx.48/hwmon/hwmon15# grep .
> temp[123]_{label,input}
> temp1_label:Die
> temp2_label:DTS
> temp3_label:Tcontrol
> temp1_input:30938
> temp2_input:67735
> temp3_input:80000
>
> On the host system "sensors" report
>
> Package id 0: +31.C (high = +80.C, crit = +90.C)
>
> So I conclude Die temperature as retrieved over PECI is correct while
> DTS is mis-calculated. The old downstream code in OpenBMC was using
> ten_dot_six_to_millidegree() function for conversion, and that was
> providing expected results. And indeed if we reverse the calculation
> here we get 80000 - ((80000-67735) * 256 / 64) = 30940 which matches
> expectations.
>
Hi!
Thanks for the report.
It was changed between v2 and v3 after a report about negative temperature on
pre-ICX platforms:
https://lore.kernel.org/lkml/[email protected]/
Unfortunately, I'm not able to test this on Cascade Lake X (or any other pre-ICX
platform).
I just sent a patch that changes SKX to use S10.6 format:
https://lore.kernel.org/lkml/[email protected]/
Thanks
-Iwona