Note: All changes to arch/x86 are contained within patches 01-02.
The Platform Environment Control Interface (PECI) is a communication
interface between Intel processors and management controllers (e.g.
Baseboard Management Controller, BMC).
This series adds a PECI subsystem and introduces drivers that run in
the Linux instance on the management controller (not on the main Intel
processor). It is intended to be used by OpenBMC [1], a Linux
distribution for BMC devices.
The information exposed over PECI (like processor and DIMM
temperature) refers to the Intel processor and can be consumed by
daemons running on the BMC to, for example, display the processor
temperature in its web interface.
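As an illustration of how a BMC daemon might consume this data, here
is a minimal userspace sketch. It assumes the hwmon drivers from this
series expose temperatures through the standard hwmon sysfs attributes
(tempN_input, in millidegrees Celsius). The path in the comment and
both helper names are hypothetical and not part of this series.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the contents of a hwmon tempN_input attribute. The hwmon sysfs
 * ABI reports temperatures as a decimal integer in millidegrees Celsius.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_millidegrees(const char *buf, long *mdeg)
{
	char *end;

	errno = 0;
	*mdeg = strtol(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	/* Accept only an optional trailing newline after the number. */
	if (*end != '\0' && *end != '\n')
		return -1;
	return 0;
}

/*
 * Read one temperature attribute, for example (hypothetical path)
 * /sys/class/hwmon/hwmon0/temp1_input.
 * Returns 0 on success, -1 if the file cannot be read or parsed.
 */
int read_hwmon_temp(const char *path, long *mdeg)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_millidegrees(buf, mdeg);
}
```

A daemon would divide the returned value by 1000 to get degrees
Celsius before displaying it.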
The PECI bus is a collection of code that provides interface support
between PECI devices (which represent processors) and PECI
controllers (such as the "peci-aspeed" controller), which provide
access to the physical PECI interface. PECI devices are bound to PECI
drivers that provide access to PECI services. This series introduces
a generic "peci-cpu" driver that exposes the "cputemp" and "dimmtemp"
hardware monitoring drivers using the auxiliary bus.
Exposing "raw" PECI to userspace, either to write userspace drivers or
for debug/testing purposes, was left out of this series to encourage
writing kernel drivers instead, but may be pursued in the future.
An attempt to introduce PECI to upstream Linux was made before [2].
Since it has been over a year since the last revision, and the series
has changed quite a bit in the meantime, I've decided to start from v1.
I would also like to give credit to everyone who helped me with
different aspects of preliminary review:
- Pierre-Louis Bossart,
- Tony Luck,
- Andy Shevchenko,
- Dave Hansen.
[1] https://github.com/openbmc/openbmc
[2] https://lore.kernel.org/openbmc/[email protected]/
Iwona Winiarska (12):
x86/cpu: Move intel-family to arch-independent headers
x86/cpu: Extract cpuid helpers to arch-independent
dt-bindings: Add generic bindings for PECI
dt-bindings: Add bindings for peci-aspeed
ARM: dts: aspeed: Add PECI controller nodes
peci: Add core infrastructure
peci: Add device detection
peci: Add support for PECI device drivers
peci: Add peci-cpu driver
hwmon: peci: Add cputemp driver
hwmon: peci: Add dimmtemp driver
docs: Add PECI documentation
Jae Hyun Yoo (2):
peci: Add peci-aspeed controller driver
docs: hwmon: Document PECI drivers
.../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++
.../bindings/peci/peci-controller.yaml | 28 +
Documentation/hwmon/index.rst | 2 +
Documentation/hwmon/peci-cputemp.rst | 93 ++++
Documentation/hwmon/peci-dimmtemp.rst | 58 ++
Documentation/index.rst | 1 +
Documentation/peci/index.rst | 16 +
Documentation/peci/peci.rst | 48 ++
MAINTAINERS | 32 ++
arch/arm/boot/dts/aspeed-g4.dtsi | 14 +
arch/arm/boot/dts/aspeed-g5.dtsi | 14 +
arch/arm/boot/dts/aspeed-g6.dtsi | 14 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cpu.h | 3 -
arch/x86/include/asm/intel-family.h | 141 +----
arch/x86/include/asm/microcode.h | 2 +-
arch/x86/kvm/cpuid.h | 3 +-
arch/x86/lib/Makefile | 2 +-
drivers/Kconfig | 3 +
drivers/Makefile | 1 +
drivers/edac/mce_amd.c | 3 +-
drivers/hwmon/Kconfig | 2 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/peci/Kconfig | 31 ++
drivers/hwmon/peci/Makefile | 7 +
drivers/hwmon/peci/common.h | 46 ++
drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++
drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++
drivers/peci/Kconfig | 36 ++
drivers/peci/Makefile | 10 +
drivers/peci/controller/Kconfig | 12 +
drivers/peci/controller/Makefile | 3 +
drivers/peci/controller/peci-aspeed.c | 501 +++++++++++++++++
drivers/peci/core.c | 224 ++++++++
drivers/peci/cpu.c | 347 ++++++++++++
drivers/peci/device.c | 211 ++++++++
drivers/peci/internal.h | 137 +++++
drivers/peci/request.c | 502 +++++++++++++++++
drivers/peci/sysfs.c | 82 +++
include/linux/peci-cpu.h | 38 ++
include/linux/peci.h | 93 ++++
include/linux/x86/cpu.h | 9 +
include/linux/x86/intel-family.h | 146 +++++
lib/Kconfig | 5 +
lib/Makefile | 2 +
lib/x86/Makefile | 3 +
{arch/x86/lib => lib/x86}/cpu.c | 2 +-
47 files changed, 3902 insertions(+), 149 deletions(-)
create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
create mode 100644 Documentation/hwmon/peci-cputemp.rst
create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
create mode 100644 Documentation/peci/index.rst
create mode 100644 Documentation/peci/peci.rst
create mode 100644 drivers/hwmon/peci/Kconfig
create mode 100644 drivers/hwmon/peci/Makefile
create mode 100644 drivers/hwmon/peci/common.h
create mode 100644 drivers/hwmon/peci/cputemp.c
create mode 100644 drivers/hwmon/peci/dimmtemp.c
create mode 100644 drivers/peci/Kconfig
create mode 100644 drivers/peci/Makefile
create mode 100644 drivers/peci/controller/Kconfig
create mode 100644 drivers/peci/controller/Makefile
create mode 100644 drivers/peci/controller/peci-aspeed.c
create mode 100644 drivers/peci/core.c
create mode 100644 drivers/peci/cpu.c
create mode 100644 drivers/peci/device.c
create mode 100644 drivers/peci/internal.h
create mode 100644 drivers/peci/request.c
create mode 100644 drivers/peci/sysfs.c
create mode 100644 include/linux/peci-cpu.h
create mode 100644 include/linux/peci.h
create mode 100644 include/linux/x86/cpu.h
create mode 100644 include/linux/x86/intel-family.h
create mode 100644 lib/x86/Makefile
rename {arch/x86/lib => lib/x86}/cpu.c (95%)
--
2.31.1
Baseboard management controllers (BMC) often run Linux but are usually
implemented with non-x86 processors. They can use PECI to access package
config space (PCS) registers on the host CPU. Since some information
(e.g. the core count) is obtained from different registers on different
CPU generations, they need to decode the family and model.
Move the data from arch/x86/include/asm/intel-family.h into a new file
include/linux/x86/intel-family.h so that it can be used by other
architectures.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
To limit tree-wide changes, and to help people who expect the
intel-family defines in arch/x86 find them without going through git
history, we're not removing the original header completely. Instead,
we're keeping it as a "stub" that includes the new one.
If there is a consensus that the tree-wide option is better,
we can choose that approach instead.
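
To illustrate the intended use, here is a minimal sketch of how a
non-x86 consumer could match a decoded family and model against these
defines. The struct, the table and find_generation() are hypothetical
and not part of this series. The model values are copied from the
header so that the example builds outside the kernel tree; kernel code
would include <linux/x86/intel-family.h> instead.

```c
#include <stddef.h>

/* Values copied from include/linux/x86/intel-family.h in this patch. */
#define INTEL_FAM6_HASWELL_X	0x3F
#define INTEL_FAM6_BROADWELL_X	0x4F
#define INTEL_FAM6_SKYLAKE_X	0x55
#define INTEL_FAM6_ICELAKE_X	0x6A

/* Hypothetical per-generation descriptor (illustration only). */
struct cpu_generation {
	unsigned int family;
	unsigned int model;
	const char *name;
};

static const struct cpu_generation generations[] = {
	{ 6, INTEL_FAM6_HASWELL_X,   "Haswell (server)" },
	{ 6, INTEL_FAM6_BROADWELL_X, "Broadwell (server)" },
	{ 6, INTEL_FAM6_SKYLAKE_X,   "Skylake (server)" },
	{ 6, INTEL_FAM6_ICELAKE_X,   "Ice Lake (server)" },
};

/* Return the matching descriptor, or NULL for an unknown CPU. */
const struct cpu_generation *find_generation(unsigned int family,
					     unsigned int model)
{
	size_t i;

	for (i = 0; i < sizeof(generations) / sizeof(generations[0]); i++) {
		if (generations[i].family == family &&
		    generations[i].model == model)
			return &generations[i];
	}

	return NULL;
}
```

A driver could hang generation-specific data (such as which register
holds the core count) off such a descriptor.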
MAINTAINERS | 1 +
arch/x86/include/asm/intel-family.h | 141 +--------------------------
include/linux/x86/intel-family.h | 146 ++++++++++++++++++++++++++++
3 files changed, 148 insertions(+), 140 deletions(-)
create mode 100644 include/linux/x86/intel-family.h
diff --git a/MAINTAINERS b/MAINTAINERS
index a61f4f3b78a9..ec5987a00800 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9240,6 +9240,7 @@ M: [email protected]
L: [email protected]
S: Supported
F: arch/x86/include/asm/intel-family.h
+F: include/linux/x86/intel-family.h
INTEL DRM DRIVERS (excluding Poulsbo, Moorestown and derivative chipsets)
M: Jani Nikula <[email protected]>
diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
index 27158436f322..0d4fe1b4e1f6 100644
--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -2,145 +2,6 @@
#ifndef _ASM_X86_INTEL_FAMILY_H
#define _ASM_X86_INTEL_FAMILY_H
-/*
- * "Big Core" Processors (Branded as Core, Xeon, etc...)
- *
- * While adding a new CPUID for a new microarchitecture, add a new
- * group to keep logically sorted out in chronological order. Within
- * that group keep the CPUID for the variants sorted by model number.
- *
- * The defined symbol names have the following form:
- * INTEL_FAM6{OPTFAMILY}_{MICROARCH}{OPTDIFF}
- * where:
- * OPTFAMILY Describes the family of CPUs that this belongs to. Default
- * is assumed to be "_CORE" (and should be omitted). Other values
- * currently in use are _ATOM and _XEON_PHI
- * MICROARCH Is the code name for the micro-architecture for this core.
- * N.B. Not the platform name.
- * OPTDIFF If needed, a short string to differentiate by market segment.
- *
- * Common OPTDIFFs:
- *
- * - regular client parts
- * _L - regular mobile parts
- * _G - parts with extra graphics on
- * _X - regular server parts
- * _D - micro server parts
- *
- * Historical OPTDIFFs:
- *
- * _EP - 2 socket server parts
- * _EX - 4+ socket server parts
- *
- * The #define line may optionally include a comment including platform or core
- * names. An exception is made for skylake/kabylake where steppings seem to have gotten
- * their own names :-(
- */
-
-/* Wildcard match for FAM6 so X86_MATCH_INTEL_FAM6_MODEL(ANY) works */
-#define INTEL_FAM6_ANY X86_MODEL_ANY
-
-#define INTEL_FAM6_CORE_YONAH 0x0E
-
-#define INTEL_FAM6_CORE2_MEROM 0x0F
-#define INTEL_FAM6_CORE2_MEROM_L 0x16
-#define INTEL_FAM6_CORE2_PENRYN 0x17
-#define INTEL_FAM6_CORE2_DUNNINGTON 0x1D
-
-#define INTEL_FAM6_NEHALEM 0x1E
-#define INTEL_FAM6_NEHALEM_G 0x1F /* Auburndale / Havendale */
-#define INTEL_FAM6_NEHALEM_EP 0x1A
-#define INTEL_FAM6_NEHALEM_EX 0x2E
-
-#define INTEL_FAM6_WESTMERE 0x25
-#define INTEL_FAM6_WESTMERE_EP 0x2C
-#define INTEL_FAM6_WESTMERE_EX 0x2F
-
-#define INTEL_FAM6_SANDYBRIDGE 0x2A
-#define INTEL_FAM6_SANDYBRIDGE_X 0x2D
-#define INTEL_FAM6_IVYBRIDGE 0x3A
-#define INTEL_FAM6_IVYBRIDGE_X 0x3E
-
-#define INTEL_FAM6_HASWELL 0x3C
-#define INTEL_FAM6_HASWELL_X 0x3F
-#define INTEL_FAM6_HASWELL_L 0x45
-#define INTEL_FAM6_HASWELL_G 0x46
-
-#define INTEL_FAM6_BROADWELL 0x3D
-#define INTEL_FAM6_BROADWELL_G 0x47
-#define INTEL_FAM6_BROADWELL_X 0x4F
-#define INTEL_FAM6_BROADWELL_D 0x56
-
-#define INTEL_FAM6_SKYLAKE_L 0x4E /* Sky Lake */
-#define INTEL_FAM6_SKYLAKE 0x5E /* Sky Lake */
-#define INTEL_FAM6_SKYLAKE_X 0x55 /* Sky Lake */
-/* CASCADELAKE_X 0x55 Sky Lake -- s: 7 */
-/* COOPERLAKE_X 0x55 Sky Lake -- s: 11 */
-
-#define INTEL_FAM6_KABYLAKE_L 0x8E /* Sky Lake */
-/* AMBERLAKE_L 0x8E Sky Lake -- s: 9 */
-/* COFFEELAKE_L 0x8E Sky Lake -- s: 10 */
-/* WHISKEYLAKE_L 0x8E Sky Lake -- s: 11,12 */
-
-#define INTEL_FAM6_KABYLAKE 0x9E /* Sky Lake */
-/* COFFEELAKE 0x9E Sky Lake -- s: 10-13 */
-
-#define INTEL_FAM6_COMETLAKE 0xA5 /* Sky Lake */
-#define INTEL_FAM6_COMETLAKE_L 0xA6 /* Sky Lake */
-
-#define INTEL_FAM6_CANNONLAKE_L 0x66 /* Palm Cove */
-
-#define INTEL_FAM6_ICELAKE_X 0x6A /* Sunny Cove */
-#define INTEL_FAM6_ICELAKE_D 0x6C /* Sunny Cove */
-#define INTEL_FAM6_ICELAKE 0x7D /* Sunny Cove */
-#define INTEL_FAM6_ICELAKE_L 0x7E /* Sunny Cove */
-#define INTEL_FAM6_ICELAKE_NNPI 0x9D /* Sunny Cove */
-
-#define INTEL_FAM6_LAKEFIELD 0x8A /* Sunny Cove / Tremont */
-
-#define INTEL_FAM6_ROCKETLAKE 0xA7 /* Cypress Cove */
-
-#define INTEL_FAM6_TIGERLAKE_L 0x8C /* Willow Cove */
-#define INTEL_FAM6_TIGERLAKE 0x8D /* Willow Cove */
-
-#define INTEL_FAM6_SAPPHIRERAPIDS_X 0x8F /* Golden Cove */
-
-#define INTEL_FAM6_ALDERLAKE 0x97 /* Golden Cove / Gracemont */
-#define INTEL_FAM6_ALDERLAKE_L 0x9A /* Golden Cove / Gracemont */
-
-/* "Small Core" Processors (Atom) */
-
-#define INTEL_FAM6_ATOM_BONNELL 0x1C /* Diamondville, Pineview */
-#define INTEL_FAM6_ATOM_BONNELL_MID 0x26 /* Silverthorne, Lincroft */
-
-#define INTEL_FAM6_ATOM_SALTWELL 0x36 /* Cedarview */
-#define INTEL_FAM6_ATOM_SALTWELL_MID 0x27 /* Penwell */
-#define INTEL_FAM6_ATOM_SALTWELL_TABLET 0x35 /* Cloverview */
-
-#define INTEL_FAM6_ATOM_SILVERMONT 0x37 /* Bay Trail, Valleyview */
-#define INTEL_FAM6_ATOM_SILVERMONT_D 0x4D /* Avaton, Rangely */
-#define INTEL_FAM6_ATOM_SILVERMONT_MID 0x4A /* Merriefield */
-
-#define INTEL_FAM6_ATOM_AIRMONT 0x4C /* Cherry Trail, Braswell */
-#define INTEL_FAM6_ATOM_AIRMONT_MID 0x5A /* Moorefield */
-#define INTEL_FAM6_ATOM_AIRMONT_NP 0x75 /* Lightning Mountain */
-
-#define INTEL_FAM6_ATOM_GOLDMONT 0x5C /* Apollo Lake */
-#define INTEL_FAM6_ATOM_GOLDMONT_D 0x5F /* Denverton */
-
-/* Note: the micro-architecture is "Goldmont Plus" */
-#define INTEL_FAM6_ATOM_GOLDMONT_PLUS 0x7A /* Gemini Lake */
-
-#define INTEL_FAM6_ATOM_TREMONT_D 0x86 /* Jacobsville */
-#define INTEL_FAM6_ATOM_TREMONT 0x96 /* Elkhart Lake */
-#define INTEL_FAM6_ATOM_TREMONT_L 0x9C /* Jasper Lake */
-
-/* Xeon Phi */
-
-#define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
-#define INTEL_FAM6_XEON_PHI_KNM 0x85 /* Knights Mill */
-
-/* Family 5 */
-#define INTEL_FAM5_QUARK_X1000 0x09 /* Quark X1000 SoC */
+#include <linux/x86/intel-family.h>
#endif /* _ASM_X86_INTEL_FAMILY_H */
diff --git a/include/linux/x86/intel-family.h b/include/linux/x86/intel-family.h
new file mode 100644
index 000000000000..ae4b075c1ab9
--- /dev/null
+++ b/include/linux/x86/intel-family.h
@@ -0,0 +1,146 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_X86_INTEL_FAMILY_H
+#define _LINUX_X86_INTEL_FAMILY_H
+
+/*
+ * "Big Core" Processors (Branded as Core, Xeon, etc...)
+ *
+ * While adding a new CPUID for a new microarchitecture, add a new
+ * group to keep logically sorted out in chronological order. Within
+ * that group keep the CPUID for the variants sorted by model number.
+ *
+ * The defined symbol names have the following form:
+ * INTEL_FAM6{OPTFAMILY}_{MICROARCH}{OPTDIFF}
+ * where:
+ * OPTFAMILY Describes the family of CPUs that this belongs to. Default
+ * is assumed to be "_CORE" (and should be omitted). Other values
+ * currently in use are _ATOM and _XEON_PHI
+ * MICROARCH Is the code name for the micro-architecture for this core.
+ * N.B. Not the platform name.
+ * OPTDIFF If needed, a short string to differentiate by market segment.
+ *
+ * Common OPTDIFFs:
+ *
+ * - regular client parts
+ * _L - regular mobile parts
+ * _G - parts with extra graphics on
+ * _X - regular server parts
+ * _D - micro server parts
+ *
+ * Historical OPTDIFFs:
+ *
+ * _EP - 2 socket server parts
+ * _EX - 4+ socket server parts
+ *
+ * The #define line may optionally include a comment including platform or core
+ * names. An exception is made for skylake/kabylake where steppings seem to have gotten
+ * their own names :-(
+ */
+
+/* Wildcard match for FAM6 so X86_MATCH_INTEL_FAM6_MODEL(ANY) works */
+#define INTEL_FAM6_ANY X86_MODEL_ANY
+
+#define INTEL_FAM6_CORE_YONAH 0x0E
+
+#define INTEL_FAM6_CORE2_MEROM 0x0F
+#define INTEL_FAM6_CORE2_MEROM_L 0x16
+#define INTEL_FAM6_CORE2_PENRYN 0x17
+#define INTEL_FAM6_CORE2_DUNNINGTON 0x1D
+
+#define INTEL_FAM6_NEHALEM 0x1E
+#define INTEL_FAM6_NEHALEM_G 0x1F /* Auburndale / Havendale */
+#define INTEL_FAM6_NEHALEM_EP 0x1A
+#define INTEL_FAM6_NEHALEM_EX 0x2E
+
+#define INTEL_FAM6_WESTMERE 0x25
+#define INTEL_FAM6_WESTMERE_EP 0x2C
+#define INTEL_FAM6_WESTMERE_EX 0x2F
+
+#define INTEL_FAM6_SANDYBRIDGE 0x2A
+#define INTEL_FAM6_SANDYBRIDGE_X 0x2D
+#define INTEL_FAM6_IVYBRIDGE 0x3A
+#define INTEL_FAM6_IVYBRIDGE_X 0x3E
+
+#define INTEL_FAM6_HASWELL 0x3C
+#define INTEL_FAM6_HASWELL_X 0x3F
+#define INTEL_FAM6_HASWELL_L 0x45
+#define INTEL_FAM6_HASWELL_G 0x46
+
+#define INTEL_FAM6_BROADWELL 0x3D
+#define INTEL_FAM6_BROADWELL_G 0x47
+#define INTEL_FAM6_BROADWELL_X 0x4F
+#define INTEL_FAM6_BROADWELL_D 0x56
+
+#define INTEL_FAM6_SKYLAKE_L 0x4E /* Sky Lake */
+#define INTEL_FAM6_SKYLAKE 0x5E /* Sky Lake */
+#define INTEL_FAM6_SKYLAKE_X 0x55 /* Sky Lake */
+/* CASCADELAKE_X 0x55 Sky Lake -- s: 7 */
+/* COOPERLAKE_X 0x55 Sky Lake -- s: 11 */
+
+#define INTEL_FAM6_KABYLAKE_L 0x8E /* Sky Lake */
+/* AMBERLAKE_L 0x8E Sky Lake -- s: 9 */
+/* COFFEELAKE_L 0x8E Sky Lake -- s: 10 */
+/* WHISKEYLAKE_L 0x8E Sky Lake -- s: 11,12 */
+
+#define INTEL_FAM6_KABYLAKE 0x9E /* Sky Lake */
+/* COFFEELAKE 0x9E Sky Lake -- s: 10-13 */
+
+#define INTEL_FAM6_COMETLAKE 0xA5 /* Sky Lake */
+#define INTEL_FAM6_COMETLAKE_L 0xA6 /* Sky Lake */
+
+#define INTEL_FAM6_CANNONLAKE_L 0x66 /* Palm Cove */
+
+#define INTEL_FAM6_ICELAKE_X 0x6A /* Sunny Cove */
+#define INTEL_FAM6_ICELAKE_D 0x6C /* Sunny Cove */
+#define INTEL_FAM6_ICELAKE 0x7D /* Sunny Cove */
+#define INTEL_FAM6_ICELAKE_L 0x7E /* Sunny Cove */
+#define INTEL_FAM6_ICELAKE_NNPI 0x9D /* Sunny Cove */
+
+#define INTEL_FAM6_LAKEFIELD 0x8A /* Sunny Cove / Tremont */
+
+#define INTEL_FAM6_ROCKETLAKE 0xA7 /* Cypress Cove */
+
+#define INTEL_FAM6_TIGERLAKE_L 0x8C /* Willow Cove */
+#define INTEL_FAM6_TIGERLAKE 0x8D /* Willow Cove */
+
+#define INTEL_FAM6_SAPPHIRERAPIDS_X 0x8F /* Golden Cove */
+
+#define INTEL_FAM6_ALDERLAKE 0x97 /* Golden Cove / Gracemont */
+#define INTEL_FAM6_ALDERLAKE_L 0x9A /* Golden Cove / Gracemont */
+
+/* "Small Core" Processors (Atom) */
+
+#define INTEL_FAM6_ATOM_BONNELL 0x1C /* Diamondville, Pineview */
+#define INTEL_FAM6_ATOM_BONNELL_MID 0x26 /* Silverthorne, Lincroft */
+
+#define INTEL_FAM6_ATOM_SALTWELL 0x36 /* Cedarview */
+#define INTEL_FAM6_ATOM_SALTWELL_MID 0x27 /* Penwell */
+#define INTEL_FAM6_ATOM_SALTWELL_TABLET 0x35 /* Cloverview */
+
+#define INTEL_FAM6_ATOM_SILVERMONT 0x37 /* Bay Trail, Valleyview */
+#define INTEL_FAM6_ATOM_SILVERMONT_D 0x4D /* Avaton, Rangely */
+#define INTEL_FAM6_ATOM_SILVERMONT_MID 0x4A /* Merriefield */
+
+#define INTEL_FAM6_ATOM_AIRMONT 0x4C /* Cherry Trail, Braswell */
+#define INTEL_FAM6_ATOM_AIRMONT_MID 0x5A /* Moorefield */
+#define INTEL_FAM6_ATOM_AIRMONT_NP 0x75 /* Lightning Mountain */
+
+#define INTEL_FAM6_ATOM_GOLDMONT 0x5C /* Apollo Lake */
+#define INTEL_FAM6_ATOM_GOLDMONT_D 0x5F /* Denverton */
+
+/* Note: the micro-architecture is "Goldmont Plus" */
+#define INTEL_FAM6_ATOM_GOLDMONT_PLUS 0x7A /* Gemini Lake */
+
+#define INTEL_FAM6_ATOM_TREMONT_D 0x86 /* Jacobsville */
+#define INTEL_FAM6_ATOM_TREMONT 0x96 /* Elkhart Lake */
+#define INTEL_FAM6_ATOM_TREMONT_L 0x9C /* Jasper Lake */
+
+/* Xeon Phi */
+
+#define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
+#define INTEL_FAM6_XEON_PHI_KNM 0x85 /* Knights Mill */
+
+/* Family 5 */
+#define INTEL_FAM5_QUARK_X1000 0x09 /* Quark X1000 SoC */
+
+#endif /* _LINUX_X86_INTEL_FAMILY_H */
--
2.31.1
Baseboard management controllers (BMC) often run Linux but are usually
implemented with non-x86 processors. They can use PECI to access package
config space (PCS) registers on the host CPU. Since some information
(e.g. the core count) is obtained from different registers on different
CPU generations, they need to decode the family and model.
The Package Identifier PCS register, which describes CPUID information,
has the same layout as CPUID_1.EAX, so let's allow the cpuid helpers to
be reused by making them available to other architectures as well.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
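For reference, here is a userspace sketch of how a signature with the
CPUID_1.EAX layout decodes into family, model and stepping. It follows
the documented CPUID leaf 1 field layout. The decode_*() names are
chosen for this example only; kernel code would call x86_family(),
x86_model() and x86_stepping() from <linux/x86/cpu.h>.

```c
/*
 * CPUID_1.EAX layout used below:
 *   bits  3:0   stepping
 *   bits  7:4   base model
 *   bits 11:8   base family
 *   bits 19:16  extended model
 *   bits 27:20  extended family
 */

unsigned int decode_family(unsigned int sig)
{
	unsigned int family = (sig >> 8) & 0xf;

	/* The extended family is only added when the base family is 0xf. */
	if (family == 0xf)
		family += (sig >> 20) & 0xff;

	return family;
}

unsigned int decode_model(unsigned int sig)
{
	unsigned int model = (sig >> 4) & 0xf;

	/* The extended model forms the upper nibble for family 6 and up. */
	if (decode_family(sig) >= 0x6)
		model += ((sig >> 16) & 0xf) << 4;

	return model;
}

unsigned int decode_stepping(unsigned int sig)
{
	return sig & 0xf;
}
```

For example, the signature 0x00050654 decodes to family 6, model 0x55
(INTEL_FAM6_SKYLAKE_X) and stepping 4.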
MAINTAINERS | 2 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/cpu.h | 3 ---
arch/x86/include/asm/microcode.h | 2 +-
arch/x86/kvm/cpuid.h | 3 ++-
arch/x86/lib/Makefile | 2 +-
drivers/edac/mce_amd.c | 3 +--
include/linux/x86/cpu.h | 9 +++++++++
lib/Kconfig | 5 +++++
lib/Makefile | 2 ++
lib/x86/Makefile | 3 +++
{arch/x86/lib => lib/x86}/cpu.c | 2 +-
12 files changed, 28 insertions(+), 9 deletions(-)
create mode 100644 include/linux/x86/cpu.h
create mode 100644 lib/x86/Makefile
rename {arch/x86/lib => lib/x86}/cpu.c (95%)
diff --git a/MAINTAINERS b/MAINTAINERS
index ec5987a00800..6f77aaca2a30 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20081,6 +20081,8 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core
F: Documentation/devicetree/bindings/x86/
F: Documentation/x86/
F: arch/x86/
+F: include/linux/x86/
+F: lib/x86/
X86 ENTRY CODE
M: Andy Lutomirski <[email protected]>
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 49270655e827..750f9b896e4f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -141,6 +141,7 @@ config X86
select GENERIC_IRQ_PROBE
select GENERIC_IRQ_RESERVATION_MODE
select GENERIC_IRQ_SHOW
+ select GENERIC_LIB_X86
select GENERIC_PENDING_IRQ if SMP
select GENERIC_PTDUMP
select GENERIC_SMP_IDLE_THREAD
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 33d41e350c79..2a663a05a795 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -37,9 +37,6 @@ extern int _debug_hotplug_cpu(int cpu, int action);
int mwait_usable(const struct cpuinfo_x86 *);
-unsigned int x86_family(unsigned int sig);
-unsigned int x86_model(unsigned int sig);
-unsigned int x86_stepping(unsigned int sig);
#ifdef CONFIG_CPU_SUP_INTEL
extern void __init sld_setup(struct cpuinfo_x86 *c);
extern void switch_to_sld(unsigned long tifn);
diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index ab45a220fac4..4b0eabf63b98 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -2,9 +2,9 @@
#ifndef _ASM_X86_MICROCODE_H
#define _ASM_X86_MICROCODE_H
-#include <asm/cpu.h>
#include <linux/earlycpio.h>
#include <linux/initrd.h>
+#include <linux/x86/cpu.h>
struct ucode_patch {
struct list_head plist;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c99edfff7f82..bf070d2a2175 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -4,10 +4,11 @@
#include "x86.h"
#include "reverse_cpuid.h"
-#include <asm/cpu.h>
#include <asm/processor.h>
#include <uapi/asm/kvm_para.h>
+#include <linux/x86/cpu.h>
+
extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
void kvm_set_cpu_caps(void);
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index bad4dee4f0e4..fd73c1b72c3e 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -41,7 +41,7 @@ clean-files := inat-tables.c
obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
-lib-y := delay.o misc.o cmdline.o cpu.o
+lib-y := delay.o misc.o cmdline.o
lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
lib-y += memcpy_$(BITS).o
lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 27d56920b469..f545f5fad02c 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1,8 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/module.h>
#include <linux/slab.h>
-
-#include <asm/cpu.h>
+#include <linux/x86/cpu.h>
#include "mce_amd.h"
diff --git a/include/linux/x86/cpu.h b/include/linux/x86/cpu.h
new file mode 100644
index 000000000000..5f383d47886d
--- /dev/null
+++ b/include/linux/x86/cpu.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_X86_CPU_H
+#define _LINUX_X86_CPU_H
+
+unsigned int x86_family(unsigned int sig);
+unsigned int x86_model(unsigned int sig);
+unsigned int x86_stepping(unsigned int sig);
+
+#endif /* _LINUX_X86_CPU_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index d241fe476fda..cc28bc1f2d84 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -718,3 +718,8 @@ config PLDMFW
config ASN1_ENCODER
tristate
+
+config GENERIC_LIB_X86
+ bool
+ depends on X86
+ default n
diff --git a/lib/Makefile b/lib/Makefile
index 5efd1b435a37..befbd9413432 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -360,3 +360,5 @@ obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o
obj-$(CONFIG_SLUB_KUNIT_TEST) += slub_kunit.o
obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
+
+obj-$(CONFIG_GENERIC_LIB_X86) += x86/
diff --git a/lib/x86/Makefile b/lib/x86/Makefile
new file mode 100644
index 000000000000..342024c272fc
--- /dev/null
+++ b/lib/x86/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-y := cpu.o
diff --git a/arch/x86/lib/cpu.c b/lib/x86/cpu.c
similarity index 95%
rename from arch/x86/lib/cpu.c
rename to lib/x86/cpu.c
index 7ad68917a51e..17af59a2fddf 100644
--- a/arch/x86/lib/cpu.c
+++ b/lib/x86/cpu.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/types.h>
#include <linux/export.h>
-#include <asm/cpu.h>
+#include <linux/x86/cpu.h>
unsigned int x86_family(unsigned int sig)
{
--
2.31.1
Add device tree bindings for the PECI controller.
Signed-off-by: Iwona Winiarska <[email protected]>
---
.../bindings/peci/peci-controller.yaml | 28 +++++++++++++++++++
1 file changed, 28 insertions(+)
create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
diff --git a/Documentation/devicetree/bindings/peci/peci-controller.yaml b/Documentation/devicetree/bindings/peci/peci-controller.yaml
new file mode 100644
index 000000000000..54ae8fc333d3
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-controller.yaml
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/peci/peci-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Generic Device Tree Bindings for PECI
+
+maintainers:
+ - Iwona Winiarska <[email protected]>
+
+description: |
+ PECI (Platform Environment Control Interface) is an interface that provides a
+ communication channel from Intel processors and chipset components to external
+ monitoring or control devices.
+
+properties:
+ $nodename:
+ pattern: "^peci-controller(@.*)?$"
+
+additionalProperties: true
+
+examples:
+ - |
+ peci-controller@1e78b000 {
+ reg = <0x1e78b000 0x100>;
+ };
+...
--
2.31.1
Add PECI controller nodes with all required information.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
---
arch/arm/boot/dts/aspeed-g4.dtsi | 14 ++++++++++++++
arch/arm/boot/dts/aspeed-g5.dtsi | 14 ++++++++++++++
arch/arm/boot/dts/aspeed-g6.dtsi | 14 ++++++++++++++
3 files changed, 42 insertions(+)
diff --git a/arch/arm/boot/dts/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed-g4.dtsi
index c5aeb3cf3a09..de733c03ec18 100644
--- a/arch/arm/boot/dts/aspeed-g4.dtsi
+++ b/arch/arm/boot/dts/aspeed-g4.dtsi
@@ -385,6 +385,20 @@ ibt: ibt@140 {
};
};
+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2400-peci";
+ reg = <0x1e78b000 0x60>;
+ interrupts = <15>;
+ clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ clock-divider = <0>;
+ msg-timing = <1>;
+ addr-timing = <1>;
+ rd-sampling-point = <8>;
+ cmd-timeout-ms = <1000>;
+ status = "disabled";
+ };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
diff --git a/arch/arm/boot/dts/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed-g5.dtsi
index 329eaeef66fb..e7db9cf56114 100644
--- a/arch/arm/boot/dts/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed-g5.dtsi
@@ -506,6 +506,20 @@ ibt: ibt@140 {
};
};
+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2500-peci";
+ reg = <0x1e78b000 0x60>;
+ interrupts = <15>;
+ clocks = <&syscon ASPEED_CLK_GATE_REFCLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ clock-divider = <0>;
+ msg-timing = <1>;
+ addr-timing = <1>;
+ rd-sampling-point = <8>;
+ cmd-timeout-ms = <1000>;
+ status = "disabled";
+ };
+
uart2: serial@1e78d000 {
compatible = "ns16550a";
reg = <0x1e78d000 0x20>;
diff --git a/arch/arm/boot/dts/aspeed-g6.dtsi b/arch/arm/boot/dts/aspeed-g6.dtsi
index f96607b7b4e2..1e951bb7ff65 100644
--- a/arch/arm/boot/dts/aspeed-g6.dtsi
+++ b/arch/arm/boot/dts/aspeed-g6.dtsi
@@ -459,6 +459,20 @@ wdt4: watchdog@1e7850c0 {
status = "disabled";
};
+ peci0: peci-controller@1e78b000 {
+ compatible = "aspeed,ast2600-peci";
+ reg = <0x1e78b000 0x100>;
+ interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ clock-divider = <0>;
+ msg-timing = <1>;
+ addr-timing = <1>;
+ rd-sampling-point = <8>;
+ cmd-timeout-ms = <1000>;
+ status = "disabled";
+ };
+
lpc: lpc@1e789000 {
compatible = "aspeed,ast2600-lpc-v2", "simple-mfd", "syscon";
reg = <0x1e789000 0x1000>;
--
2.31.1
Add device tree bindings for the peci-aspeed controller driver.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
---
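As a worked example of the property descriptions in this binding, the
sketch below converts the raw property values into the quantities they
describe. The ranges come from the schema in this patch. The function
names are hypothetical and this is not the driver's implementation.

```c
#include <stdint.h>

/* Upper bounds taken from the binding in this patch. */
#define CLOCK_DIVIDER_MAX	7
#define TIMING_MAX		255
#define RD_SAMPLING_POINT_MAX	15

/* clock-divider: the effective divider is 2 raised to the given value. */
int clock_divider_to_ratio(uint32_t clock_divider, uint32_t *ratio)
{
	if (clock_divider > CLOCK_DIVIDER_MAX)
		return -1;

	*ratio = 1u << clock_divider;
	return 0;
}

/* msg-timing, addr-timing: one unit is four PECI clock periods. */
int timing_to_clock_periods(uint32_t timing, uint32_t *periods)
{
	if (timing > TIMING_MAX)
		return -1;

	*periods = 4 * timing;
	return 0;
}

/* rd-sampling-point: index of one of the 16 time frames in a bit time. */
int rd_sampling_point_valid(uint32_t point)
{
	return point <= RD_SAMPLING_POINT_MAX;
}
```

With the defaults used in the example node (clock-divider = <0>,
msg-timing = <1>, rd-sampling-point = <8>) this gives a divider of 1,
a negotiation period of 4 PECI clock periods, and sampling in the
middle of the bit time.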
.../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++++++++++++++++
1 file changed, 111 insertions(+)
create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.yaml b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
new file mode 100644
index 000000000000..6061e06009fb
--- /dev/null
+++ b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
@@ -0,0 +1,111 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/peci/peci-aspeed.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Aspeed PECI Bus Device Tree Bindings
+
+maintainers:
+ - Iwona Winiarska <[email protected]>
+ - Jae Hyun Yoo <[email protected]>
+
+allOf:
+ - $ref: peci-controller.yaml#
+
+properties:
+ compatible:
+ enum:
+ - aspeed,ast2400-peci
+ - aspeed,ast2500-peci
+ - aspeed,ast2600-peci
+
+ reg:
+ maxItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ clocks:
+ description: |
+ Clock source for PECI controller. Should reference the external
+ oscillator clock.
+ maxItems: 1
+
+ resets:
+ maxItems: 1
+
+ clock-divider:
+ description: This value determines PECI controller internal clock
+ dividing rate. The divider will be calculated as 2 raised to the
+ power of the given value.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 7
+ default: 0
+
+ msg-timing:
+ description: |
+ Message timing negotiation period. This value will determine the period
+ of message timing negotiation to be issued by PECI controller. The unit
+ of the programmed value is four times of PECI clock period.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 255
+ default: 1
+
+ addr-timing:
+ description: |
+ Address timing negotiation period. This value will determine the period
+ of address timing negotiation to be issued by PECI controller. The unit
+ of the programmed value is four times of PECI clock period.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 255
+ default: 1
+
+ rd-sampling-point:
+ description: |
+ Read sampling point selection. The whole period of a bit time will be
+ divided into 16 time frames. This value will determine the time frame
+ in which the controller will sample PECI signal for data read back.
+ Usually in the middle of a bit time is the best.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 0
+ maximum: 15
+ default: 8
+
+ cmd-timeout-ms:
+ description: |
+ Command timeout in units of ms.
+ $ref: /schemas/types.yaml#/definitions/uint32
+ minimum: 1
+ maximum: 1000
+ default: 1000
+
+required:
+ - compatible
+ - reg
+ - interrupts
+ - clocks
+ - resets
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/clock/ast2600-clock.h>
+ peci-controller@1e78b000 {
+ compatible = "aspeed,ast2600-peci";
+ reg = <0x1e78b000 0x100>;
+ interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
+ resets = <&syscon ASPEED_RESET_PECI>;
+ clock-divider = <0>;
+ msg-timing = <1>;
+ addr-timing = <1>;
+ rd-sampling-point = <8>;
+ cmd-timeout-ms = <1000>;
+ };
+...
--
2.31.1
Intel processors provide access to various services designed to support
processor and DRAM thermal management, platform manageability, and
processor interface tuning and diagnostics.
Those services are available via the Platform Environment Control
Interface (PECI) that provides a communication channel between the
processor and the Baseboard Management Controller (BMC) or other
platform management device.
This change introduces the PECI subsystem by adding the initial core
module and an API for controller drivers.
Co-developed-by: Jason M Bills <[email protected]>
Signed-off-by: Jason M Bills <[email protected]>
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 9 +++
drivers/Kconfig | 3 +
drivers/Makefile | 1 +
drivers/peci/Kconfig | 14 ++++
drivers/peci/Makefile | 5 ++
drivers/peci/core.c | 166 ++++++++++++++++++++++++++++++++++++++++
drivers/peci/internal.h | 20 +++++
drivers/peci/sysfs.c | 48 ++++++++++++
include/linux/peci.h | 82 ++++++++++++++++++++
9 files changed, 348 insertions(+)
create mode 100644 drivers/peci/Kconfig
create mode 100644 drivers/peci/Makefile
create mode 100644 drivers/peci/core.c
create mode 100644 drivers/peci/internal.h
create mode 100644 drivers/peci/sysfs.c
create mode 100644 include/linux/peci.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 6f77aaca2a30..47411e2b6336 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14495,6 +14495,15 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/peaq-wmi.c
+PECI SUBSYSTEM
+M: Iwona Winiarska <[email protected]>
+R: Jae Hyun Yoo <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+S: Supported
+F: Documentation/devicetree/bindings/peci/
+F: drivers/peci/
+F: include/linux/peci.h
+
PENSANDO ETHERNET DRIVERS
M: Shannon Nelson <[email protected]>
M: [email protected]
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 8bad63417a50..f472b3d972b3 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"
source "drivers/counter/Kconfig"
source "drivers/most/Kconfig"
+
+source "drivers/peci/Kconfig"
+
endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 27c018bdf4de..8d96f0c3dde5 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -189,3 +189,4 @@ obj-$(CONFIG_GNSS) += gnss/
obj-$(CONFIG_INTERCONNECT) += interconnect/
obj-$(CONFIG_COUNTER) += counter/
obj-$(CONFIG_MOST) += most/
+obj-$(CONFIG_PECI) += peci/
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
new file mode 100644
index 000000000000..601cc3c3c852
--- /dev/null
+++ b/drivers/peci/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menuconfig PECI
+ tristate "PECI support"
+ help
+ The Platform Environment Control Interface (PECI) is an interface
+ that provides a communication channel to Intel processors and
+ chipset components from external monitoring or control devices.
+
+ If you want PECI support, you should say Y here and also to the
+ specific driver for your bus adapter(s) below.
+
+ This support is also available as a module. If so, the module
+ will be called peci.
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
new file mode 100644
index 000000000000..2bb2f51bcda7
--- /dev/null
+++ b/drivers/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+# Core functionality
+peci-y := core.o sysfs.o
+obj-$(CONFIG_PECI) += peci.o
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
new file mode 100644
index 000000000000..0ad00110459d
--- /dev/null
+++ b/drivers/peci/core.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/bug.h>
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/idr.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/pm_runtime.h>
+#include <linux/property.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+static DEFINE_IDA(peci_controller_ida);
+
+static void peci_controller_dev_release(struct device *dev)
+{
+ struct peci_controller *controller = to_peci_controller(dev);
+
+ mutex_destroy(&controller->bus_lock);
+}
+
+struct device_type peci_controller_type = {
+ .release = peci_controller_dev_release,
+};
+
+int peci_controller_scan_devices(struct peci_controller *controller)
+{
+ /* Just a stub, no support for actual devices yet */
+ return 0;
+}
+
+/**
+ * peci_controller_add() - Add PECI controller
+ * @controller: the PECI controller to be added
+ * @parent: device object to be registered as a parent
+ *
+ * In the final stage of its probe(), a PECI controller driver should call
+ * peci_controller_add() to register itself with the PECI bus.
+ * The caller is responsible for allocating the struct peci_controller and
+ * managing its lifetime, calling peci_controller_remove() prior to releasing
+ * the allocation.
+ *
+ * It returns zero on success, else a negative error code (dropping the
+ * controller's refcount). After a successful return, the caller is responsible
+ * for calling peci_controller_remove().
+ *
+ * Return: 0 on success, a negative error code otherwise.
+ */
+int peci_controller_add(struct peci_controller *controller, struct device *parent)
+{
+ struct fwnode_handle *node = fwnode_handle_get(dev_fwnode(parent));
+ int ret;
+
+ if (WARN_ON(!controller->xfer))
+ return -EINVAL;
+
+ ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);
+ if (ret < 0)
+ return ret;
+
+ controller->id = ret;
+
+ mutex_init(&controller->bus_lock);
+
+ controller->dev.parent = parent;
+ controller->dev.bus = &peci_bus_type;
+ controller->dev.type = &peci_controller_type;
+ controller->dev.fwnode = node;
+ controller->dev.of_node = to_of_node(node);
+
+ ret = dev_set_name(&controller->dev, "peci-%d", controller->id);
+ if (ret)
+ goto err_id;
+
+ ret = device_register(&controller->dev);
+ if (ret)
+ goto err_put;
+
+ pm_runtime_no_callbacks(&controller->dev);
+ pm_suspend_ignore_children(&controller->dev, true);
+ pm_runtime_enable(&controller->dev);
+
+ /*
+ * Ignoring retval since failures during scan are non-critical for
+ * controller itself.
+ */
+ peci_controller_scan_devices(controller);
+
+ return 0;
+
+err_put:
+ put_device(&controller->dev);
+err_id:
+ fwnode_handle_put(controller->dev.fwnode);
+ ida_free(&peci_controller_ida, controller->id);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
+
+static int _unregister(struct device *dev, void *dummy)
+{
+ /* Just a stub, no support for actual devices yet */
+ return 0;
+}
+
+/**
+ * peci_controller_remove - Delete PECI controller
+ * @controller: the PECI controller to be removed
+ *
+ * This call is used only by PECI controller drivers, which are the only ones
+ * directly touching chip registers.
+ *
+ * Note that this function also drops a reference to the controller.
+ */
+void peci_controller_remove(struct peci_controller *controller)
+{
+ pm_runtime_disable(&controller->dev);
+ /*
+ * Detach any active PECI devices. This can't fail, thus we do not
+ * check the returned value.
+ */
+ device_for_each_child_reverse(&controller->dev, NULL, _unregister);
+
+ device_unregister(&controller->dev);
+ fwnode_handle_put(controller->dev.fwnode);
+ ida_free(&peci_controller_ida, controller->id);
+}
+EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
+
+struct bus_type peci_bus_type = {
+ .name = "peci",
+ .bus_groups = peci_bus_groups,
+};
+
+static int __init peci_init(void)
+{
+ int ret;
+
+ ret = bus_register(&peci_bus_type);
+ if (ret < 0) {
+ pr_err("failed to register PECI bus type!\n");
+ return ret;
+ }
+
+ return 0;
+}
+subsys_initcall(peci_init);
+
+static void __exit peci_exit(void)
+{
+ bus_unregister(&peci_bus_type);
+}
+module_exit(peci_exit);
+
+MODULE_AUTHOR("Jason M Bills <[email protected]>");
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI bus core module");
+MODULE_LICENSE("GPL");
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
new file mode 100644
index 000000000000..80c61bcdfc6b
--- /dev/null
+++ b/drivers/peci/internal.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __PECI_INTERNAL_H
+#define __PECI_INTERNAL_H
+
+#include <linux/device.h>
+#include <linux/types.h>
+
+struct peci_controller;
+struct attribute_group;
+
+extern struct bus_type peci_bus_type;
+extern const struct attribute_group *peci_bus_groups[];
+
+extern struct device_type peci_controller_type;
+
+int peci_controller_scan_devices(struct peci_controller *controller);
+
+#endif /* __PECI_INTERNAL_H */
diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
new file mode 100644
index 000000000000..36c5e2a18a92
--- /dev/null
+++ b/drivers/peci/sysfs.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/peci.h>
+
+#include "internal.h"
+
+static int rescan_controller(struct device *dev, void *data)
+{
+ if (dev->type != &peci_controller_type)
+ return 0;
+
+ return peci_controller_scan_devices(to_peci_controller(dev));
+}
+
+static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count)
+{
+ bool res;
+ int ret;
+
+ ret = kstrtobool(buf, &res);
+ if (ret)
+ return ret;
+
+ if (!res)
+ return count;
+
+ ret = bus_for_each_dev(&peci_bus_type, NULL, NULL, rescan_controller);
+ if (ret)
+ return ret;
+
+ return count;
+}
+static BUS_ATTR_WO(rescan);
+
+static struct attribute *peci_bus_attrs[] = {
+ &bus_attr_rescan.attr,
+ NULL
+};
+
+static const struct attribute_group peci_bus_group = {
+ .attrs = peci_bus_attrs,
+};
+
+const struct attribute_group *peci_bus_groups[] = {
+ &peci_bus_group,
+ NULL
+};
diff --git a/include/linux/peci.h b/include/linux/peci.h
new file mode 100644
index 000000000000..cdf3008321fd
--- /dev/null
+++ b/include/linux/peci.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2018-2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_H
+#define __LINUX_PECI_H
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+struct peci_request;
+
+/**
+ * struct peci_controller - PECI controller
+ * @dev: device object to register PECI controller to the device model
+ * @xfer: PECI transfer function
+ * @bus_lock: lock used to protect multiple callers
+ * @id: PECI controller ID
+ *
+ * PECI controllers usually connect to their drivers using a non-PECI bus,
+ * such as the platform bus.
+ * Each PECI controller can communicate with one or more PECI devices.
+ */
+struct peci_controller {
+ struct device dev;
+ int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req);
+ struct mutex bus_lock; /* held for the duration of xfer */
+ u8 id;
+};
+
+int peci_controller_add(struct peci_controller *controller, struct device *parent);
+void peci_controller_remove(struct peci_controller *controller);
+
+static inline struct peci_controller *to_peci_controller(void *d)
+{
+ return container_of(d, struct peci_controller, dev);
+}
+
+/**
+ * struct peci_device - PECI device
+ * @dev: device object to register PECI device to the device model
+ * @controller: manages the bus segment hosting this PECI device
+ * @addr: address used on the PECI bus connected to the parent controller
+ *
+ * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
+ * The behaviour exposed to the rest of the system is defined by the PECI driver
+ * managing the device.
+ */
+struct peci_device {
+ struct device dev;
+ struct peci_controller *controller;
+ u8 addr;
+};
+
+static inline struct peci_device *to_peci_device(struct device *d)
+{
+ return container_of(d, struct peci_device, dev);
+}
+
+/**
+ * struct peci_request - PECI request
+ * @device: PECI device to which the request is sent
+ * @tx: TX buffer specific data
+ * @tx.buf: pointer to TX buffer
+ * @tx.len: transfer data length in bytes
+ * @rx: RX buffer specific data
+ * @rx.buf: pointer to RX buffer
+ * @rx.len: received data length in bytes
+ *
+ * A peci_request represents a request issued by a PECI originator (TX) and
+ * a response received from a PECI responder (RX).
+ */
+struct peci_request {
+ struct peci_device *device;
+ struct {
+ u8 *buf;
+ u8 len;
+ } rx, tx;
+};
+
+#endif /* __LINUX_PECI_H */
--
2.31.1
From: Jae Hyun Yoo <[email protected]>
ASPEED AST24xx/AST25xx/AST26xx SoCs support the PECI electrical
interface (a.k.a. PECI wire).
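For reference, the write data registers of this controller are split into two
16-byte banks (W_DATA0..3 at 0x20 and W_DATA4..7 at 0x40), so a payload longer
than 16 bytes has to be written in two pieces. Below is a minimal userspace
sketch of that split. It assumes a plain byte array in place of the ioremapped
window and memcpy() in place of memcpy_toio(); model_fill_tx() is a
hypothetical name, not a function in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Offsets and sizes taken from the register map in this patch. */
#define W_DATA0		0x20	/* first write bank, 16 bytes */
#define W_DATA4		0x40	/* second write bank, 16 bytes */
#define BANK_SIZE	16
#define DATA_BUF_MAX	32

/*
 * Copy a TX payload into a fake register window. "regs" stands in for the
 * ioremapped base and must cover at least W_DATA4 + BANK_SIZE bytes.
 */
static int model_fill_tx(uint8_t *regs, const uint8_t *buf, size_t len)
{
	size_t first = len > BANK_SIZE ? BANK_SIZE : len;

	assert(regs && buf);

	if (len > DATA_BUF_MAX)
		return -EINVAL;

	/* First bank takes up to 16 bytes. */
	memcpy(regs + W_DATA0, buf, first);

	/* Anything beyond 16 bytes goes to the second, non-adjacent bank. */
	if (len > BANK_SIZE)
		memcpy(regs + W_DATA4, buf + BANK_SIZE, len - BANK_SIZE);

	return 0;
}
```

The read banks (R_DATA0..3 at 0x30) sit between the two write banks, which is
why a single contiguous copy cannot be used.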
Signed-off-by: Jae Hyun Yoo <[email protected]>
Co-developed-by: Iwona Winiarska <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
MAINTAINERS | 9 +
drivers/peci/Kconfig | 6 +
drivers/peci/Makefile | 3 +
drivers/peci/controller/Kconfig | 12 +
drivers/peci/controller/Makefile | 3 +
drivers/peci/controller/peci-aspeed.c | 501 ++++++++++++++++++++++++++
6 files changed, 534 insertions(+)
create mode 100644 drivers/peci/controller/Kconfig
create mode 100644 drivers/peci/controller/Makefile
create mode 100644 drivers/peci/controller/peci-aspeed.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 47411e2b6336..4ba874afa2fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2865,6 +2865,15 @@ S: Maintained
F: Documentation/hwmon/asc7621.rst
F: drivers/hwmon/asc7621.c
+ASPEED PECI CONTROLLER
+M: Iwona Winiarska <[email protected]>
+M: Jae Hyun Yoo <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+L: [email protected] (moderated for non-subscribers)
+S: Supported
+F: Documentation/devicetree/bindings/peci/peci-aspeed.yaml
+F: drivers/peci/controller/peci-aspeed.c
+
ASPEED PINCTRL DRIVERS
M: Andrew Jeffery <[email protected]>
L: [email protected] (moderated for non-subscribers)
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 601cc3c3c852..0d0ee8009713 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -12,3 +12,9 @@ menuconfig PECI
This support is also available as a module. If so, the module
will be called peci.
+
+if PECI
+
+source "drivers/peci/controller/Kconfig"
+
+endif # PECI
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 2bb2f51bcda7..621a993e306a 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -3,3 +3,6 @@
# Core functionality
peci-y := core.o sysfs.o
obj-$(CONFIG_PECI) += peci.o
+
+# Hardware specific bus drivers
+obj-y += controller/
diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
new file mode 100644
index 000000000000..8ddbe494677f
--- /dev/null
+++ b/drivers/peci/controller/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config PECI_ASPEED
+ tristate "ASPEED PECI support"
+ depends on ARCH_ASPEED || COMPILE_TEST
+ depends on OF
+ depends on HAS_IOMEM
+ help
+ Enable this driver if you want to support the ASPEED PECI controller.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-aspeed.
diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile
new file mode 100644
index 000000000000..022c28ef1bf0
--- /dev/null
+++ b/drivers/peci/controller/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o
diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c
new file mode 100644
index 000000000000..888b46383ea4
--- /dev/null
+++ b/drivers/peci/controller/peci-aspeed.c
@@ -0,0 +1,501 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2012-2017 ASPEED Technology Inc.
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/bitfield.h>
+#include <linux/clk.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/peci.h>
+#include <linux/platform_device.h>
+#include <linux/reset.h>
+
+#include <asm/unaligned.h>
+
+/* ASPEED PECI Registers */
+/* Control Register */
+#define ASPEED_PECI_CTRL 0x00
+#define ASPEED_PECI_CTRL_SAMPLING_MASK GENMASK(19, 16)
+#define ASPEED_PECI_CTRL_READ_MODE_MASK GENMASK(13, 12)
+#define ASPEED_PECI_CTRL_READ_MODE_COUNT BIT(12)
+#define ASPEED_PECI_CTRL_READ_MODE_DBG BIT(13)
+#define ASPEED_PECI_CTRL_CLK_SOURCE_MASK BIT(11)
+#define ASPEED_PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8)
+#define ASPEED_PECI_CTRL_INVERT_OUT BIT(7)
+#define ASPEED_PECI_CTRL_INVERT_IN BIT(6)
+#define ASPEED_PECI_CTRL_BUS_CONTENT_EN BIT(5)
+#define ASPEED_PECI_CTRL_PECI_EN BIT(4)
+#define ASPEED_PECI_CTRL_PECI_CLK_EN BIT(0)
+
+/* Timing Negotiation Register */
+#define ASPEED_PECI_TIMING_NEGOTIATION 0x04
+#define ASPEED_PECI_TIMING_MESSAGE_MASK GENMASK(15, 8)
+#define ASPEED_PECI_TIMING_ADDRESS_MASK GENMASK(7, 0)
+
+/* Command Register */
+#define ASPEED_PECI_CMD 0x08
+#define ASPEED_PECI_CMD_PIN_MON BIT(31)
+#define ASPEED_PECI_CMD_STS_MASK GENMASK(27, 24)
+#define ASPEED_PECI_CMD_STS_ADDR_T_NEGO 0x3
+#define ASPEED_PECI_CMD_IDLE_MASK \
+ (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MON)
+#define ASPEED_PECI_CMD_FIRE BIT(0)
+
+/* Read/Write Length Register */
+#define ASPEED_PECI_RW_LENGTH 0x0c
+#define ASPEED_PECI_AW_FCS_EN BIT(31)
+#define ASPEED_PECI_READ_LEN_MASK GENMASK(23, 16)
+#define ASPEED_PECI_WRITE_LEN_MASK GENMASK(15, 8)
+#define ASPEED_PECI_TAGET_ADDR_MASK GENMASK(7, 0)
+
+/* Expected FCS Data Register */
+#define ASPEED_PECI_EXP_FCS 0x10
+#define ASPEED_PECI_EXP_READ_FCS_MASK GENMASK(23, 16)
+#define ASPEED_PECI_EXP_AW_FCS_AUTO_MASK GENMASK(15, 8)
+#define ASPEED_PECI_EXP_WRITE_FCS_MASK GENMASK(7, 0)
+
+/* Captured FCS Data Register */
+#define ASPEED_PECI_CAP_FCS 0x14
+#define ASPEED_PECI_CAP_READ_FCS_MASK GENMASK(23, 16)
+#define ASPEED_PECI_CAP_WRITE_FCS_MASK GENMASK(7, 0)
+
+/* Interrupt Register */
+#define ASPEED_PECI_INT_CTRL 0x18
+#define ASPEED_PECI_TIMING_NEGO_SEL_MASK GENMASK(31, 30)
+#define ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO 0
+#define ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO 1
+#define ASPEED_PECI_MESSAGE_NEGO 2
+#define ASPEED_PECI_INT_MASK GENMASK(4, 0)
+#define ASPEED_PECI_INT_BUS_TIMEOUT BIT(4)
+#define ASPEED_PECI_INT_BUS_CONNECT BIT(3)
+#define ASPEED_PECI_INT_W_FCS_BAD BIT(2)
+#define ASPEED_PECI_INT_W_FCS_ABORT BIT(1)
+#define ASPEED_PECI_INT_CMD_DONE BIT(0)
+
+/* Interrupt Status Register */
+#define ASPEED_PECI_INT_STS 0x1c
+#define ASPEED_PECI_INT_TIMING_RESULT_MASK GENMASK(29, 16)
+ /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
+
+/* Rx/Tx Data Buffer Registers */
+#define ASPEED_PECI_W_DATA0 0x20
+#define ASPEED_PECI_W_DATA1 0x24
+#define ASPEED_PECI_W_DATA2 0x28
+#define ASPEED_PECI_W_DATA3 0x2c
+#define ASPEED_PECI_R_DATA0 0x30
+#define ASPEED_PECI_R_DATA1 0x34
+#define ASPEED_PECI_R_DATA2 0x38
+#define ASPEED_PECI_R_DATA3 0x3c
+#define ASPEED_PECI_W_DATA4 0x40
+#define ASPEED_PECI_W_DATA5 0x44
+#define ASPEED_PECI_W_DATA6 0x48
+#define ASPEED_PECI_W_DATA7 0x4c
+#define ASPEED_PECI_R_DATA4 0x50
+#define ASPEED_PECI_R_DATA5 0x54
+#define ASPEED_PECI_R_DATA6 0x58
+#define ASPEED_PECI_R_DATA7 0x5c
+#define ASPEED_PECI_DATA_BUF_SIZE_MAX 32
+
+/* Timing Negotiation */
+#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT 8
+#define ASPEED_PECI_RD_SAMPLING_POINT_MAX (BIT(4) - 1)
+#define ASPEED_PECI_CLK_DIV_DEFAULT 0
+#define ASPEED_PECI_CLK_DIV_MAX (BIT(3) - 1)
+#define ASPEED_PECI_MSG_TIMING_DEFAULT 1
+#define ASPEED_PECI_MSG_TIMING_MAX (BIT(8) - 1)
+#define ASPEED_PECI_ADDR_TIMING_DEFAULT 1
+#define ASPEED_PECI_ADDR_TIMING_MAX (BIT(8) - 1)
+
+/* Timeout */
+#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US (50 * USEC_PER_MSEC)
+#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US (10 * USEC_PER_MSEC)
+#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT (1000)
+#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX (1000)
+
+struct aspeed_peci {
+ struct peci_controller controller;
+ struct device *dev;
+ void __iomem *base;
+ struct clk *clk;
+ struct reset_control *rst;
+ int irq;
+ spinlock_t lock; /* to sync completion status handling */
+ struct completion xfer_complete;
+ u32 status;
+ u32 cmd_timeout_ms;
+ u32 msg_timing;
+ u32 addr_timing;
+ u32 rd_sampling_point;
+ u32 clk_div;
+};
+
+static inline struct aspeed_peci *to_aspeed_peci(struct peci_controller *a)
+{
+ return container_of(a, struct aspeed_peci, controller);
+}
+
+static void aspeed_peci_init_regs(struct aspeed_peci *priv)
+{
+ u32 val;
+
+ val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);
+ val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
+ writel(val, priv->base + ASPEED_PECI_CTRL);
+ /*
+ * Timing negotiation period setting.
+ * The unit of the programmed value is 4 times the PECI clock period.
+ */
+ val = FIELD_PREP(ASPEED_PECI_TIMING_MESSAGE_MASK, priv->msg_timing);
+ val |= FIELD_PREP(ASPEED_PECI_TIMING_ADDRESS_MASK, priv->addr_timing);
+ writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
+
+ /* Clear interrupts */
+ val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;
+ writel(val, priv->base + ASPEED_PECI_INT_STS);
+
+ /* Set timing negotiation mode and enable interrupts */
+ val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
+ val |= ASPEED_PECI_INT_MASK;
+ writel(val, priv->base + ASPEED_PECI_INT_CTRL);
+
+ val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);
+ val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
+ val |= ASPEED_PECI_CTRL_PECI_EN;
+ val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
+ writel(val, priv->base + ASPEED_PECI_CTRL);
+}
+
+static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
+{
+ u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
+
+ if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
+ aspeed_peci_init_regs(priv);
+
+ return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
+ cmd_sts,
+ !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
+ ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
+ ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
+}
+
+static int aspeed_peci_xfer(struct peci_controller *controller,
+ u8 addr, struct peci_request *req)
+{
+ struct aspeed_peci *priv = to_aspeed_peci(controller);
+ unsigned long flags, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
+ u32 peci_head;
+ int ret;
+
+ if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
+ req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
+ return -EINVAL;
+
+ /* Check command sts and bus idle state */
+ ret = aspeed_peci_check_idle(priv);
+ if (ret)
+ return ret; /* -ETIMEDOUT */
+
+ spin_lock_irqsave(&priv->lock, flags);
+ reinit_completion(&priv->xfer_complete);
+
+ peci_head = FIELD_PREP(ASPEED_PECI_TAGET_ADDR_MASK, addr) |
+ FIELD_PREP(ASPEED_PECI_WRITE_LEN_MASK, req->tx.len) |
+ FIELD_PREP(ASPEED_PECI_READ_LEN_MASK, req->rx.len);
+
+ writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
+
+ memcpy_toio(priv->base + ASPEED_PECI_W_DATA0, req->tx.buf,
+ req->tx.len > 16 ? 16 : req->tx.len);
+ if (req->tx.len > 16)
+ memcpy_toio(priv->base + ASPEED_PECI_W_DATA4, req->tx.buf + 16,
+ req->tx.len - 16);
+
+ dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
+ print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);
+
+ priv->status = 0;
+ writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
+ spin_unlock_irqrestore(&priv->lock, flags);
+
+ ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);
+ if (ret < 0)
+ return ret;
+
+ if (ret == 0) {
+ dev_dbg(priv->dev, "Timeout waiting for a response!\n");
+ return -ETIMEDOUT;
+ }
+
+ spin_lock_irqsave(&priv->lock, flags);
+
+ writel(0, priv->base + ASPEED_PECI_CMD);
+
+ if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ dev_dbg(priv->dev, "No valid response!\n");
+ return -EIO;
+ }
+
+ spin_unlock_irqrestore(&priv->lock, flags);
+
+ memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_R_DATA0,
+ req->rx.len > 16 ? 16 : req->rx.len);
+ if (req->rx.len > 16)
+ memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_R_DATA4,
+ req->rx.len - 16);
+
+ print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);
+
+ return 0;
+}
+
+static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
+{
+ struct aspeed_peci *priv = arg;
+ u32 status;
+
+ spin_lock(&priv->lock);
+ status = readl(priv->base + ASPEED_PECI_INT_STS);
+ writel(status, priv->base + ASPEED_PECI_INT_STS);
+ priv->status |= (status & ASPEED_PECI_INT_MASK);
+
+ /*
+ * In most cases, interrupt bits will be set one by one but also note
+ * that multiple interrupt bits could be set at the same time.
+ */
+ if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_TIMEOUT\n");
+
+ if (status & ASPEED_PECI_INT_BUS_CONNECT)
+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_CONNECT\n");
+
+ if (status & ASPEED_PECI_INT_W_FCS_BAD)
+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_BAD\n");
+
+ if (status & ASPEED_PECI_INT_W_FCS_ABORT)
+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_ABORT\n");
+
+ /*
+ * All commands should end with the ASPEED_PECI_INT_CMD_DONE bit set,
+ * even in an error case.
+ */
+ if (status & ASPEED_PECI_INT_CMD_DONE)
+ complete(&priv->xfer_complete);
+
+ spin_unlock(&priv->lock);
+
+ return IRQ_HANDLED;
+}
+
+static void __sanitize_clock_divider(struct aspeed_peci *priv)
+{
+ u32 clk_div;
+ int ret;
+
+ ret = device_property_read_u32(priv->dev, "clock-divider", &clk_div);
+ if (ret) {
+ clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
+ } else if (clk_div > ASPEED_PECI_CLK_DIV_MAX) {
+ dev_warn(priv->dev, "Invalid clock-divider: %u, Using default: %u\n",
+ clk_div, ASPEED_PECI_CLK_DIV_DEFAULT);
+
+ clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
+ }
+
+ priv->clk_div = clk_div;
+}
+
+static void __sanitize_msg_timing(struct aspeed_peci *priv)
+{
+ u32 msg_timing;
+ int ret;
+
+ ret = device_property_read_u32(priv->dev, "msg-timing", &msg_timing);
+ if (ret) {
+ msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
+ } else if (msg_timing > ASPEED_PECI_MSG_TIMING_MAX) {
+ dev_warn(priv->dev, "Invalid msg-timing : %u, Use default : %u\n",
+ msg_timing, ASPEED_PECI_MSG_TIMING_DEFAULT);
+
+ msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
+ }
+
+ priv->msg_timing = msg_timing;
+}
+
+static void __sanitize_addr_timing(struct aspeed_peci *priv)
+{
+ u32 addr_timing;
+ int ret;
+
+ ret = device_property_read_u32(priv->dev, "addr-timing", &addr_timing);
+ if (ret) {
+ addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
+ } else if (addr_timing > ASPEED_PECI_ADDR_TIMING_MAX) {
+ dev_warn(priv->dev, "Invalid addr-timing : %u, Use default : %u\n",
+ addr_timing, ASPEED_PECI_ADDR_TIMING_DEFAULT);
+
+ addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
+ }
+
+ priv->addr_timing = addr_timing;
+}
+
+static void __sanitize_rd_sampling_point(struct aspeed_peci *priv)
+{
+ u32 rd_sampling_point;
+ int ret;
+
+ ret = device_property_read_u32(priv->dev, "rd-sampling-point", &rd_sampling_point);
+ if (ret) {
+ rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
+ } else if (rd_sampling_point > ASPEED_PECI_RD_SAMPLING_POINT_MAX) {
+ dev_warn(priv->dev, "Invalid rd-sampling-point: %u, Use default : %u\n",
+ rd_sampling_point, ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT);
+
+ rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
+ }
+
+ priv->rd_sampling_point = rd_sampling_point;
+}
+
+static void __sanitize_cmd_timeout(struct aspeed_peci *priv)
+{
+ u32 timeout;
+ int ret;
+
+ ret = device_property_read_u32(priv->dev, "cmd-timeout-ms", &timeout);
+ if (ret) {
+ timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
+ } else if (timeout > ASPEED_PECI_CMD_TIMEOUT_MS_MAX || timeout == 0) {
+ dev_warn(priv->dev, "Invalid cmd-timeout-ms: %u, Use default: %u\n",
+ timeout, ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT);
+
+ timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
+ }
+
+ priv->cmd_timeout_ms = timeout;
+}
+
+static void aspeed_peci_device_property_sanitize(struct aspeed_peci *priv)
+{
+ __sanitize_clock_divider(priv);
+ __sanitize_msg_timing(priv);
+ __sanitize_addr_timing(priv);
+ __sanitize_rd_sampling_point(priv);
+ __sanitize_cmd_timeout(priv);
+}
+
+static void aspeed_peci_disable_clk(void *data)
+{
+ clk_disable_unprepare(data);
+}
+
+static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
+{
+ int ret;
+
+ priv->clk = devm_clk_get(priv->dev, NULL);
+ if (IS_ERR(priv->clk))
+ return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "Failed to get clk source\n");
+
+ ret = clk_prepare_enable(priv->clk);
+ if (ret) {
+ dev_err(priv->dev, "Failed to enable clock\n");
+ return ret;
+ }
+
+ ret = devm_add_action_or_reset(priv->dev, aspeed_peci_disable_clk, priv->clk);
+ if (ret)
+ return ret;
+
+ aspeed_peci_device_property_sanitize(priv);
+
+ aspeed_peci_init_regs(priv);
+
+ return 0;
+}
+
+static int aspeed_peci_probe(struct platform_device *pdev)
+{
+ struct aspeed_peci *priv;
+ int ret;
+
+ priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->dev = &pdev->dev;
+ dev_set_drvdata(priv->dev, priv);
+
+ priv->base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(priv->base))
+ return PTR_ERR(priv->base);
+
+ priv->irq = platform_get_irq(pdev, 0);
+ if (priv->irq < 0)
+ return priv->irq;
+
+ ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,
+ 0, "peci-aspeed-irq", priv);
+ if (ret)
+ return ret;
+
+ init_completion(&priv->xfer_complete);
+ spin_lock_init(&priv->lock);
+
+ priv->controller.xfer = aspeed_peci_xfer;
+
+ priv->rst = devm_reset_control_get(&pdev->dev, NULL);
+ if (IS_ERR(priv->rst)) {
+ dev_err(&pdev->dev, "Missing or invalid reset controller entry\n");
+ return PTR_ERR(priv->rst);
+ }
+ reset_control_deassert(priv->rst);
+
+ ret = aspeed_peci_init_ctrl(priv);
+ if (ret)
+ return ret;
+
+ return peci_controller_add(&priv->controller, priv->dev);
+}
+
+static int aspeed_peci_remove(struct platform_device *pdev)
+{
+ struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
+
+ peci_controller_remove(&priv->controller);
+ reset_control_assert(priv->rst);
+
+ return 0;
+}
+
+static const struct of_device_id aspeed_peci_of_table[] = {
+ { .compatible = "aspeed,ast2400-peci", },
+ { .compatible = "aspeed,ast2500-peci", },
+ { .compatible = "aspeed,ast2600-peci", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
+
+static struct platform_driver aspeed_peci_driver = {
+ .probe = aspeed_peci_probe,
+ .remove = aspeed_peci_remove,
+ .driver = {
+ .name = "peci-aspeed",
+ .of_match_table = aspeed_peci_of_table,
+ },
+};
+module_platform_driver(aspeed_peci_driver);
+
+MODULE_AUTHOR("Ryan Chen <[email protected]>");
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_DESCRIPTION("ASPEED PECI driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI);
--
2.31.1
Here we're adding support for PECI device drivers, which unlike PECI
controller drivers are actually able to provide functionality to
userspace.
We're also extending the peci_request API to allow querying more details
about a PECI device (e.g. model/family), which are going to be used to
find a compatible peci_driver.
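As a rough illustration of the matching step, the sketch below decodes family
and model from a CPUID signature using the standard x86 layout, then walks an
ID table terminated by a zero family, the same convention the bus match code
uses. All sketch_* names are hypothetical userspace stand-ins, not the kernel
helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Family: bits 11:8, plus extended family (bits 27:20) when base is 0xf. */
static unsigned int sketch_x86_family(uint32_t sig)
{
	unsigned int fam = (sig >> 8) & 0xf;

	if (fam == 0xf)
		fam += (sig >> 20) & 0xff;

	return fam;
}

/* Model: bits 7:4, plus extended model (bits 19:16) for family >= 6. */
static unsigned int sketch_x86_model(uint32_t sig)
{
	unsigned int fam = sketch_x86_family(sig);
	unsigned int model = (sig >> 4) & 0xf;

	if (fam >= 0x6)
		model += ((sig >> 16) & 0xf) << 4;

	return model;
}

/* Hypothetical stand-in for struct peci_device_id. */
struct sketch_device_id {
	unsigned int family;
	unsigned int model;
};

/* Return the first entry matching family and model, or NULL if none does. */
static const struct sketch_device_id *
sketch_match(const struct sketch_device_id *id, unsigned int family,
	     unsigned int model)
{
	assert(id);

	/* A zero family marks the end of the table. */
	for (; id->family != 0; id++) {
		if (id->family == family && id->model == model)
			return id;
	}

	return NULL;
}
```

For example, a signature of 0x50654 decodes to family 6, model 0x55, so it
matches a table entry of { 6, 0x55 }.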
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/peci/Kconfig | 1 +
drivers/peci/core.c | 49 +++++++++
drivers/peci/device.c | 99 ++++++++++++++++++
drivers/peci/internal.h | 75 ++++++++++++++
drivers/peci/request.c | 217 ++++++++++++++++++++++++++++++++++++++++
include/linux/peci.h | 19 ++++
lib/Kconfig | 2 +-
7 files changed, 461 insertions(+), 1 deletion(-)
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 0d0ee8009713..27c31535843c 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -2,6 +2,7 @@
menuconfig PECI
tristate "PECI support"
+ select GENERIC_LIB_X86
help
The Platform Environment Control Interface (PECI) is an interface
that provides a communication channel to Intel processors and
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index ae7a9572cdf3..94426b7f2618 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -143,8 +143,57 @@ void peci_controller_remove(struct peci_controller *controller)
}
EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
+static const struct peci_device_id *
+peci_bus_match_device_id(const struct peci_device_id *id, struct peci_device *device)
+{
+ while (id->family != 0) {
+ if (id->family == device->info.family &&
+ id->model == device->info.model)
+ return id;
+ id++;
+ }
+
+ return NULL;
+}
+
+static int peci_bus_device_match(struct device *dev, struct device_driver *drv)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *peci_drv = to_peci_driver(drv);
+
+ if (dev->type != &peci_device_type)
+ return 0;
+
+ if (peci_bus_match_device_id(peci_drv->id_table, device))
+ return 1;
+
+ return 0;
+}
+
+static int peci_bus_device_probe(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *driver = to_peci_driver(dev->driver);
+
+ return driver->probe(device, peci_bus_match_device_id(driver->id_table, device));
+}
+
+static int peci_bus_device_remove(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+ struct peci_driver *driver = to_peci_driver(dev->driver);
+
+ if (driver->remove)
+ driver->remove(device);
+
+ return 0;
+}
+
struct bus_type peci_bus_type = {
.name = "peci",
+ .match = peci_bus_device_match,
+ .probe = peci_bus_device_probe,
+ .remove = peci_bus_device_remove,
.bus_groups = peci_bus_groups,
};
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index 1124862211e2..8c4bd1ebbc29 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -1,11 +1,79 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2018-2021 Intel Corporation
+#include <linux/bitfield.h>
#include <linux/peci.h>
#include <linux/slab.h>
+#include <linux/x86/cpu.h>
#include "internal.h"
+#define REVISION_NUM_MASK GENMASK(15, 8)
+static int peci_get_revision(struct peci_device *device, u8 *revision)
+{
+ struct peci_request *req;
+ u64 dib;
+
+ req = peci_get_dib(device);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ dib = peci_request_data_dib(req);
+ if (dib == 0) {
+ peci_request_free(req);
+ return -EIO;
+ }
+
+ *revision = FIELD_GET(REVISION_NUM_MASK, dib);
+
+ peci_request_free(req);
+
+ return 0;
+}
+
+static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_pkg_cfg_readl(device, PECI_PCS_PKG_ID, PECI_PKG_ID_CPU_ID);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *cpu_id = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+
+static int peci_device_info_init(struct peci_device *device)
+{
+ u8 revision;
+ u32 cpu_id;
+ int ret;
+
+ ret = peci_get_cpu_id(device, &cpu_id);
+ if (ret)
+ return ret;
+
+ device->info.family = x86_family(cpu_id);
+ device->info.model = x86_model(cpu_id);
+
+ ret = peci_get_revision(device, &revision);
+ if (ret)
+ return ret;
+ device->info.peci_revision = revision;
+
+ device->info.socket_id = device->addr - PECI_BASE_ADDR;
+
+ return 0;
+}
+
static int peci_detect(struct peci_controller *controller, u8 addr)
{
struct peci_request *req;
@@ -75,6 +143,10 @@ int peci_device_create(struct peci_controller *controller, u8 addr)
device->dev.bus = &peci_bus_type;
device->dev.type = &peci_device_type;
+ ret = peci_device_info_init(device);
+ if (ret)
+ goto err_free;
+
ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
if (ret)
goto err_free;
@@ -98,6 +170,33 @@ void peci_device_destroy(struct peci_device *device)
device_unregister(&device->dev);
}
+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
+ const char *mod_name)
+{
+ driver->driver.bus = &peci_bus_type;
+ driver->driver.owner = owner;
+ driver->driver.mod_name = mod_name;
+
+ if (!driver->probe) {
+ pr_err("peci: trying to register driver without probe callback\n");
+ return -EINVAL;
+ }
+
+ if (!driver->id_table) {
+ pr_err("peci: trying to register driver without device id table\n");
+ return -EINVAL;
+ }
+
+ return driver_register(&driver->driver);
+}
+EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
+
+void peci_driver_unregister(struct peci_driver *driver)
+{
+ driver_unregister(&driver->driver);
+}
+EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
+
static void peci_device_release(struct device *dev)
{
struct peci_device *device = to_peci_device(dev);
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 6b139adaf6b8..c891c93e077a 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -19,6 +19,34 @@ struct peci_request;
struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
void peci_request_free(struct peci_request *req);
+int peci_request_status(struct peci_request *req);
+u64 peci_request_data_dib(struct peci_request *req);
+
+u8 peci_request_data_readb(struct peci_request *req);
+u16 peci_request_data_readw(struct peci_request *req);
+u32 peci_request_data_readl(struct peci_request *req);
+u64 peci_request_data_readq(struct peci_request *req);
+
+struct peci_request *peci_get_dib(struct peci_device *device);
+struct peci_request *peci_get_temp(struct peci_device *device);
+
+struct peci_request *peci_pkg_cfg_readb(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
+
+/**
+ * struct peci_device_id - PECI device data to match
+ * @data: pointer to driver private data specific to device
+ * @family: device family
+ * @model: device model
+ */
+struct peci_device_id {
+ const void *data;
+ u16 family;
+ u8 model;
+};
+
extern struct device_type peci_device_type;
extern const struct attribute_group *peci_device_groups[];
@@ -28,6 +56,53 @@ void peci_device_destroy(struct peci_device *device);
extern struct bus_type peci_bus_type;
extern const struct attribute_group *peci_bus_groups[];
+/**
+ * struct peci_driver - PECI driver
+ * @driver: inherit device driver
+ * @probe: probe callback
+ * @remove: remove callback
+ * @id_table: PECI device match table to decide which device to bind
+ */
+struct peci_driver {
+ struct device_driver driver;
+ int (*probe)(struct peci_device *device, const struct peci_device_id *id);
+ void (*remove)(struct peci_device *device);
+ const struct peci_device_id *id_table;
+};
+
+static inline struct peci_driver *to_peci_driver(struct device_driver *d)
+{
+ return container_of(d, struct peci_driver, driver);
+}
+
+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
+ const char *mod_name);
+/**
+ * peci_driver_register() - register PECI driver
+ * @driver: the driver to be registered
+ * @owner: owner module of the driver being registered
+ * @mod_name: module name string
+ *
+ * PECI drivers that don't need to do anything special in module init should
+ * use the convenience "module_peci_driver" macro instead
+ *
+ * Return: zero on success, else a negative error code.
+ */
+#define peci_driver_register(driver) \
+ __peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
+void peci_driver_unregister(struct peci_driver *driver);
+
+/**
+ * module_peci_driver() - Helper macro for registering a modular PECI driver
+ * @__peci_driver: peci_driver struct
+ *
+ * Helper macro for PECI drivers which do not do anything special in module
+ * init/exit. This eliminates a lot of boilerplate. Each module may only
+ * use this macro once, and calling it replaces module_init() and module_exit()
+ */
+#define module_peci_driver(__peci_driver) \
+ module_driver(__peci_driver, peci_driver_register, peci_driver_unregister)
+
extern struct device_type peci_controller_type;
int peci_controller_scan_devices(struct peci_controller *controller);
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
index 78cee51dfae1..48354455b554 100644
--- a/drivers/peci/request.c
+++ b/drivers/peci/request.c
@@ -1,13 +1,142 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2021 Intel Corporation
+#include <linux/bug.h>
#include <linux/export.h>
#include <linux/peci.h>
#include <linux/slab.h>
#include <linux/types.h>
+#include <asm/unaligned.h>
+
#include "internal.h"
+#define PECI_GET_DIB_CMD 0xf7
+#define PECI_GET_DIB_WR_LEN 1
+#define PECI_GET_DIB_RD_LEN 8
+
+#define PECI_RDPKGCFG_CMD 0xa1
+#define PECI_RDPKGCFG_WRITE_LEN 5
+#define PECI_RDPKGCFG_READ_LEN_BASE 1
+#define PECI_WRPKGCFG_CMD 0xa5
+#define PECI_WRPKGCFG_WRITE_LEN_BASE 6
+#define PECI_WRPKGCFG_READ_LEN 1
+
+/* Device Specific Completion Code (CC) Definition */
+#define PECI_CC_SUCCESS 0x40
+#define PECI_CC_NEED_RETRY 0x80
+#define PECI_CC_OUT_OF_RESOURCE 0x81
+#define PECI_CC_UNAVAIL_RESOURCE 0x82
+#define PECI_CC_INVALID_REQ 0x90
+#define PECI_CC_MCA_ERROR 0x91
+#define PECI_CC_CATASTROPHIC_MCA_ERROR 0x93
+#define PECI_CC_FATAL_MCA_ERROR 0x94
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB 0x98
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR 0x9B
+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA 0x9C
+
+#define PECI_RETRY_BIT BIT(0)
+
+#define PECI_RETRY_TIMEOUT msecs_to_jiffies(700)
+#define PECI_RETRY_INTERVAL_MIN msecs_to_jiffies(1)
+#define PECI_RETRY_INTERVAL_MAX msecs_to_jiffies(128)
+
+static u8 peci_request_data_cc(struct peci_request *req)
+{
+ return req->rx.buf[0];
+}
+
+/**
+ * peci_request_status() - return -errno based on PECI completion code
+ * @req: the PECI request that contains response data with completion code
+ *
+ * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands we
+ * don't expect completion code in the response.
+ *
+ * Return: -errno
+ */
+int peci_request_status(struct peci_request *req)
+{
+ u8 cc = peci_request_data_cc(req);
+
+ if (cc != PECI_CC_SUCCESS)
+ dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
+
+ switch (cc) {
+ case PECI_CC_SUCCESS:
+ return 0;
+ case PECI_CC_NEED_RETRY:
+ case PECI_CC_OUT_OF_RESOURCE:
+ case PECI_CC_UNAVAIL_RESOURCE:
+ return -EAGAIN;
+ case PECI_CC_INVALID_REQ:
+ return -EINVAL;
+ case PECI_CC_MCA_ERROR:
+ case PECI_CC_CATASTROPHIC_MCA_ERROR:
+ case PECI_CC_FATAL_MCA_ERROR:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
+ return -EIO;
+ }
+
+ WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
+
+ return -EIO;
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
+
+static int peci_request_xfer(struct peci_request *req)
+{
+ struct peci_device *device = req->device;
+ struct peci_controller *controller = device->controller;
+ int ret;
+
+ mutex_lock(&controller->bus_lock);
+ ret = controller->xfer(controller, device->addr, req);
+ mutex_unlock(&controller->bus_lock);
+
+ return ret;
+}
+
+static int peci_request_xfer_retry(struct peci_request *req)
+{
+ long wait_interval = PECI_RETRY_INTERVAL_MIN;
+ struct peci_device *device = req->device;
+ struct peci_controller *controller = device->controller;
+ unsigned long start = jiffies;
+ int ret;
+
+ /* Don't try to use it for ping */
+ if (WARN_ON(!req->rx.buf))
+ return 0;
+
+ do {
+ ret = peci_request_xfer(req);
+ if (ret) {
+ dev_dbg(&controller->dev, "xfer error: %d\n", ret);
+ return ret;
+ }
+
+ if (peci_request_status(req) != -EAGAIN)
+ return 0;
+
+ /* Set the retry bit to indicate a retry attempt */
+ req->tx.buf[1] |= PECI_RETRY_BIT;
+
+ if (schedule_timeout_interruptible(wait_interval))
+ return -ERESTARTSYS;
+
+ wait_interval *= 2;
+ if (wait_interval > PECI_RETRY_INTERVAL_MAX)
+ wait_interval = PECI_RETRY_INTERVAL_MAX;
+ } while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
+
+ dev_dbg(&controller->dev, "request timed out\n");
+
+ return -ETIMEDOUT;
+}
+
/**
* peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
* @device: PECI device to which request is going to be sent
@@ -72,3 +201,91 @@ void peci_request_free(struct peci_request *req)
kfree(req);
}
EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
+
+struct peci_request *peci_get_dib(struct peci_device *device)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN, PECI_GET_DIB_RD_LEN);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_GET_DIB_CMD;
+
+ ret = peci_request_xfer(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
+
+static struct peci_request *
+__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDPKGCFG_WRITE_LEN,
+ PECI_RDPKGCFG_READ_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_RDPKGCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = index;
+ put_unaligned_le16(param, &req->tx.buf[3]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+u8 peci_request_data_readb(struct peci_request *req)
+{
+ return req->rx.buf[1];
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
+
+u16 peci_request_data_readw(struct peci_request *req)
+{
+ return get_unaligned_le16(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
+
+u32 peci_request_data_readl(struct peci_request *req)
+{
+ return get_unaligned_le32(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
+
+u64 peci_request_data_readq(struct peci_request *req)
+{
+ return get_unaligned_le64(&req->rx.buf[1]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
+
+u64 peci_request_data_dib(struct peci_request *req)
+{
+ return get_unaligned_le64(&req->rx.buf[0]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
+
+#define __read_pkg_config(x, type) \
+struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
+{ \
+ return __pkg_cfg_read(device, index, param, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_pkg_cfg_##x, PECI)
+
+__read_pkg_config(readb, u8);
+__read_pkg_config(readw, u16);
+__read_pkg_config(readl, u32);
+__read_pkg_config(readq, u64);
diff --git a/include/linux/peci.h b/include/linux/peci.h
index cdf3008321fd..f9f37b874011 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -9,6 +9,14 @@
#include <linux/mutex.h>
#include <linux/types.h>
+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
+
struct peci_request;
/**
@@ -41,6 +49,11 @@ static inline struct peci_controller *to_peci_controller(void *d)
* struct peci_device - PECI device
* @dev: device object to register PECI device to the device model
* @controller: manages the bus segment hosting this PECI device
+ * @info: PECI device characteristics
+ * @info.family: device family
+ * @info.model: device model
+ * @info.peci_revision: PECI revision supported by the PECI device
+ * @info.socket_id: the socket ID represented by the PECI device
* @addr: address used on the PECI bus connected to the parent controller
*
* A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
@@ -50,6 +63,12 @@ static inline struct peci_controller *to_peci_controller(void *d)
struct peci_device {
struct device dev;
struct peci_controller *controller;
+ struct {
+ u16 family;
+ u8 model;
+ u8 peci_revision;
+ u8 socket_id;
+ } info;
u8 addr;
};
diff --git a/lib/Kconfig b/lib/Kconfig
index cc28bc1f2d84..a74e6c0fa75c 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -721,5 +721,5 @@ config ASN1_ENCODER
config GENERIC_LIB_X86
bool
- depends on X86
+ depends on X86 || PECI
default n
--
2.31.1
Since PECI devices are discoverable, we can dynamically detect devices
that are actually available in the system.
This change complements the earlier implementation by rescanning the PECI
bus to detect available devices. For this purpose, it also introduces a
minimal API for PECI requests.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
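The scan in this patch probes the fixed CPU address range 0x30-0x37 and
names each detected device "<controller id>-<address>". The following
userspace sketch mirrors the range check and the name format only; it does
not model the ping transfer, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PECI_BASE_ADDR      0x30
#define DEMO_PECI_DEVICE_NUM_MAX 8

/* True for addresses in the PECI CPU range 0x30-0x37. */
static bool demo_addr_valid(uint8_t addr)
{
	return addr >= DEMO_PECI_BASE_ADDR &&
	       addr < DEMO_PECI_BASE_ADDR + DEMO_PECI_DEVICE_NUM_MAX;
}

/* Format "<controller id>-<address in hex>"; returns snprintf()'s result. */
static int demo_device_name(char *buf, size_t len, int controller_id, uint8_t addr)
{
	assert(buf != NULL);

	return snprintf(buf, len, "%d-%02x", controller_id, addr);
}
```

With controller id 0 and address 0x30 the resulting name is "0-30".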
drivers/peci/Makefile | 2 +-
drivers/peci/core.c | 13 ++++-
drivers/peci/device.c | 111 ++++++++++++++++++++++++++++++++++++++++
drivers/peci/internal.h | 15 ++++++
drivers/peci/request.c | 74 +++++++++++++++++++++++++++
drivers/peci/sysfs.c | 34 ++++++++++++
6 files changed, 246 insertions(+), 3 deletions(-)
create mode 100644 drivers/peci/device.c
create mode 100644 drivers/peci/request.c
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 621a993e306a..917f689e147a 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
# Core functionality
-peci-y := core.o sysfs.o
+peci-y := core.o request.o device.o sysfs.o
obj-$(CONFIG_PECI) += peci.o
# Hardware specific bus drivers
diff --git a/drivers/peci/core.c b/drivers/peci/core.c
index 0ad00110459d..ae7a9572cdf3 100644
--- a/drivers/peci/core.c
+++ b/drivers/peci/core.c
@@ -31,7 +31,15 @@ struct device_type peci_controller_type = {
int peci_controller_scan_devices(struct peci_controller *controller)
{
- /* Just a stub, no support for actual devices yet */
+ int ret;
+ u8 addr;
+
+ for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
+ ret = peci_device_create(controller, addr);
+ if (ret)
+ return ret;
+ }
+
return 0;
}
@@ -106,7 +114,8 @@ EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
static int _unregister(struct device *dev, void *dummy)
{
- /* Just a stub, no support for actual devices yet */
+ peci_device_destroy(to_peci_device(dev));
+
return 0;
}
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
new file mode 100644
index 000000000000..1124862211e2
--- /dev/null
+++ b/drivers/peci/device.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/peci.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+static int peci_detect(struct peci_controller *controller, u8 addr)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(NULL, 0, 0);
+ if (!req)
+ return -ENOMEM;
+
+ mutex_lock(&controller->bus_lock);
+ ret = controller->xfer(controller, addr, req);
+ mutex_unlock(&controller->bus_lock);
+
+ peci_request_free(req);
+
+ return ret;
+}
+
+static bool peci_addr_valid(u8 addr)
+{
+ return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX;
+}
+
+static int peci_dev_exists(struct device *dev, void *data)
+{
+ struct peci_device *device = to_peci_device(dev);
+ u8 *addr = data;
+
+ if (device->addr == *addr)
+ return -EBUSY;
+
+ return 0;
+}
+
+int peci_device_create(struct peci_controller *controller, u8 addr)
+{
+ struct peci_device *device;
+ int ret;
+
+ if (WARN_ON(!peci_addr_valid(addr)))
+ return -EINVAL;
+
+ /* Check if we have already detected this device before. */
+ ret = device_for_each_child(&controller->dev, &addr, peci_dev_exists);
+ if (ret)
+ return 0;
+
+ ret = peci_detect(controller, addr);
+ if (ret) {
+ /*
+ * Device not present or host state doesn't allow successful
+ * detection at this time.
+ */
+ if (ret == -EIO || ret == -ETIMEDOUT)
+ return 0;
+
+ return ret;
+ }
+
+ device = kzalloc(sizeof(*device), GFP_KERNEL);
+ if (!device)
+ return -ENOMEM;
+
+ device->controller = controller;
+ device->addr = addr;
+ device->dev.parent = &device->controller->dev;
+ device->dev.bus = &peci_bus_type;
+ device->dev.type = &peci_device_type;
+
+ ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
+ if (ret)
+ goto err_free;
+
+ ret = device_register(&device->dev);
+ if (ret)
+ goto err_put;
+
+ return 0;
+
+err_put:
+ put_device(&device->dev);
+ return ret;
+err_free:
+ kfree(device);
+ return ret;
+}
+
+void peci_device_destroy(struct peci_device *device)
+{
+ device_unregister(&device->dev);
+}
+
+static void peci_device_release(struct device *dev)
+{
+ struct peci_device *device = to_peci_device(dev);
+
+ kfree(device);
+}
+
+struct device_type peci_device_type = {
+ .groups = peci_device_groups,
+ .release = peci_device_release,
+};
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index 80c61bcdfc6b..6b139adaf6b8 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -9,6 +9,21 @@
struct peci_controller;
struct attribute_group;
+struct peci_device;
+struct peci_request;
+
+/* PECI CPU address range 0x30-0x37 */
+#define PECI_BASE_ADDR 0x30
+#define PECI_DEVICE_NUM_MAX 8
+
+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
+void peci_request_free(struct peci_request *req);
+
+extern struct device_type peci_device_type;
+extern const struct attribute_group *peci_device_groups[];
+
+int peci_device_create(struct peci_controller *controller, u8 addr);
+void peci_device_destroy(struct peci_device *device);
extern struct bus_type peci_bus_type;
extern const struct attribute_group *peci_bus_groups[];
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
new file mode 100644
index 000000000000..78cee51dfae1
--- /dev/null
+++ b/drivers/peci/request.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/export.h>
+#include <linux/peci.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "internal.h"
+
+/**
+ * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
+ * @device: PECI device to which request is going to be sent
+ * @tx_len: requested TX buffer length
+ * @rx_len: requested RX buffer length
+ *
+ * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
+ */
+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
+{
+ struct peci_request *req;
+ u8 *tx_buf, *rx_buf;
+
+ req = kzalloc(sizeof(*req), GFP_KERNEL);
+ if (!req)
+ return NULL;
+
+ req->device = device;
+
+ /*
+ * PECI controllers that we are using now don't support DMA. This
+ * should be converted to the DMA API once support for controllers that
+ * do allow it is added, to avoid an extra copy.
+ */
+ if (tx_len) {
+ tx_buf = kzalloc(tx_len, GFP_KERNEL);
+ if (!tx_buf)
+ goto err_free_req;
+
+ req->tx.buf = tx_buf;
+ req->tx.len = tx_len;
+ }
+
+ if (rx_len) {
+ rx_buf = kzalloc(rx_len, GFP_KERNEL);
+ if (!rx_buf)
+ goto err_free_tx;
+
+ req->rx.buf = rx_buf;
+ req->rx.len = rx_len;
+ }
+
+ return req;
+
+err_free_tx:
+ kfree(req->tx.buf);
+err_free_req:
+ kfree(req);
+
+ return NULL;
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
+
+/**
+ * peci_request_free() - free peci_request
+ * @req: the PECI request to be freed
+ */
+void peci_request_free(struct peci_request *req)
+{
+ kfree(req->rx.buf);
+ kfree(req->tx.buf);
+ kfree(req);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
index 36c5e2a18a92..db9ef05776e3 100644
--- a/drivers/peci/sysfs.c
+++ b/drivers/peci/sysfs.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright (c) 2021 Intel Corporation
+#include <linux/device.h>
+#include <linux/kernel.h>
#include <linux/peci.h>
#include "internal.h"
@@ -46,3 +48,35 @@ const struct attribute_group *peci_bus_groups[] = {
&peci_bus_group,
NULL
};
+
+static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct peci_device *device = to_peci_device(dev);
+ bool res;
+ int ret;
+
+ ret = kstrtobool(buf, &res);
+ if (ret)
+ return ret;
+
+ if (res && device_remove_file_self(dev, attr))
+ peci_device_destroy(device);
+
+ return count;
+}
+static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
+
+static struct attribute *peci_device_attrs[] = {
+ &dev_attr_remove.attr,
+ NULL
+};
+
+static const struct attribute_group peci_device_group = {
+ .attrs = peci_device_attrs,
+};
+
+const struct attribute_group *peci_device_groups[] = {
+ &peci_device_group,
+ NULL
+};
--
2.31.1
Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
readings of the processor package and processor cores that are
accessible via the PECI interface.
The main use case for the driver (and the PECI interface) is out-of-band
management, where we're able to obtain the DTS readings from an external
entity connected over PECI, e.g. a BMC on server platforms.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
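The DTS readings handled by this driver are 16-bit fixed-point values with
6 fractional bits, and raw values 0x8000-0x81ff are reserved for error
codes. A userspace sketch of the validity check and the conversion to
millidegrees is shown here; it treats the raw reading as an unsigned 16-bit
value, and the demo_* names are hypothetical rather than the driver's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MILLIDEGREE_PER_DEGREE   1000
#define DEMO_DTS_FIXED_POINT_FRACTION 64

/* Raw values 0x8000-0x81ff are error codes, everything else is a reading. */
static bool demo_dts_valid(uint16_t raw)
{
	return raw < 0x8000 || raw > 0x81ff;
}

/* Convert a 16-bit two's complement value with 6 fractional bits. */
static int32_t demo_dts_to_millidegree(uint16_t raw)
{
	int32_t val = raw;

	assert(demo_dts_valid(raw));

	if (val & 0x8000)
		val -= 0x10000; /* sign-extend from bit 15 */

	return val * DEMO_MILLIDEGREE_PER_DEGREE / DEMO_DTS_FIXED_POINT_FRACTION;
}
```

For example, a raw reading of 0xff00 is -256 in two's complement, which is
-4 degrees relative to Tjmax; with an assumed Tjmax of 100 degrees the die
temperature would be 96 degrees.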
MAINTAINERS | 7 +
drivers/hwmon/Kconfig | 2 +
drivers/hwmon/Makefile | 1 +
drivers/hwmon/peci/Kconfig | 18 ++
drivers/hwmon/peci/Makefile | 5 +
drivers/hwmon/peci/common.h | 46 ++++
drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++++++++++++++++++++
7 files changed, 582 insertions(+)
create mode 100644 drivers/hwmon/peci/Kconfig
create mode 100644 drivers/hwmon/peci/Makefile
create mode 100644 drivers/hwmon/peci/common.h
create mode 100644 drivers/hwmon/peci/cputemp.c
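get_temp_targets() derives three thresholds from one 32-bit PCS word: Tjmax
from bits 23:16, a signed Tcontrol margin from bits 15:8, and a Tthrottle
offset from bits 29:24, with both margins subtracted from Tjmax. The sketch
below reproduces that arithmetic in userspace C with plain shifts and masks
instead of FIELD_GET(); the demo_* names and the example register value are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_temp_targets {
	int32_t tjmax;     /* all values in millidegrees Celsius */
	int32_t tcontrol;
	int32_t tthrottle;
};

static struct demo_temp_targets demo_decode_temp_target(uint32_t pcs)
{
	struct demo_temp_targets t;
	int32_t margin = (pcs >> 8) & 0xff;  /* bits 15:8, signed 8-bit */
	int32_t offset = (pcs >> 24) & 0x3f; /* bits 29:24 */

	if (margin & 0x80)
		margin -= 0x100; /* sign-extend from bit 7 */

	t.tjmax = (int32_t)((pcs >> 16) & 0xff) * 1000; /* bits 23:16 */
	t.tcontrol = t.tjmax - margin * 1000;
	t.tthrottle = t.tjmax - offset * 1000;

	return t;
}

/* Worked example: Tjmax 100, Tcontrol margin 10, Tthrottle offset 2. */
static void demo_example(void)
{
	struct demo_temp_targets t = demo_decode_temp_target(0x02640a00);

	assert(t.tjmax == 100000);
	assert(t.tcontrol == 90000);
	assert(t.tthrottle == 98000);
}
```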
diff --git a/MAINTAINERS b/MAINTAINERS
index f47b5f634293..35ba9e3646bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14504,6 +14504,13 @@ L: [email protected]
S: Maintained
F: drivers/platform/x86/peaq-wmi.c
+PECI HARDWARE MONITORING DRIVERS
+M: Iwona Winiarska <[email protected]>
+R: Jae Hyun Yoo <[email protected]>
+L: [email protected]
+S: Supported
+F: drivers/hwmon/peci/
+
PECI SUBSYSTEM
M: Iwona Winiarska <[email protected]>
R: Jae Hyun Yoo <[email protected]>
diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index e3675377bc5d..61c0e3404415 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
These devices are hard to detect and rarely found on mainstream
hardware. If unsure, say N.
+source "drivers/hwmon/peci/Kconfig"
+
source "drivers/hwmon/pmbus/Kconfig"
config SENSORS_PWM_FAN
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index d712c61c1f5e..f52331f212ed 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
obj-$(CONFIG_SENSORS_OCC) += occ/
+obj-$(CONFIG_SENSORS_PECI) += peci/
obj-$(CONFIG_PMBUS) += pmbus/
ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
new file mode 100644
index 000000000000..e10eed68d70a
--- /dev/null
+++ b/drivers/hwmon/peci/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config SENSORS_PECI_CPUTEMP
+ tristate "PECI CPU temperature monitoring client"
+ depends on PECI
+ select SENSORS_PECI
+ select PECI_CPU
+ help
+ If you say yes here you get support for the generic Intel PECI
+ cputemp driver which provides Digital Thermal Sensor (DTS) thermal
+ readings of the CPU package and CPU cores that are accessible via
+ the processor PECI interface.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-cputemp.
+
+config SENSORS_PECI
+ tristate
diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
new file mode 100644
index 000000000000..e8a0ada5ab1f
--- /dev/null
+++ b/drivers/hwmon/peci/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+peci-cputemp-y := cputemp.o
+
+obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
new file mode 100644
index 000000000000..54580c100d06
--- /dev/null
+++ b/drivers/hwmon/peci/common.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Intel Corporation */
+
+#include <linux/types.h>
+
+#ifndef __PECI_HWMON_COMMON_H
+#define __PECI_HWMON_COMMON_H
+
+#define UPDATE_INTERVAL_DEFAULT HZ
+
+/**
+ * struct peci_sensor_data - PECI sensor information
+ * @valid: flag to indicate the sensor value is valid
+ * @value: sensor value in milli units
+ * @last_updated: time of the last update in jiffies
+ */
+struct peci_sensor_data {
+ unsigned int valid;
+ s32 value;
+ unsigned long last_updated;
+};
+
+/**
+ * peci_sensor_need_update() - check whether sensor update is needed or not
+ * @sensor: pointer to sensor data struct
+ *
+ * Return: true if update is needed, false if not.
+ */
+
+static inline bool peci_sensor_need_update(struct peci_sensor_data *sensor)
+{
+ return !sensor->valid ||
+ time_after(jiffies, sensor->last_updated + UPDATE_INTERVAL_DEFAULT);
+}
+
+/**
+ * peci_sensor_mark_updated() - mark the sensor is updated
+ * @sensor: pointer to sensor data struct
+ */
+static inline void peci_sensor_mark_updated(struct peci_sensor_data *sensor)
+{
+ sensor->valid = 1;
+ sensor->last_updated = jiffies;
+}
+
+#endif /* __PECI_HWMON_COMMON_H */
diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
new file mode 100644
index 000000000000..56a526471687
--- /dev/null
+++ b/drivers/hwmon/peci/cputemp.c
@@ -0,0 +1,503 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/units.h>
+#include <linux/x86/intel-family.h>
+
+#include "common.h"
+
+#define CORE_NUMS_MAX 64
+
+#define DEFAULT_CHANNEL_NUMS 5
+#define CORETEMP_CHANNEL_NUMS CORE_NUMS_MAX
+#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
+
+#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
+#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
+#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
+
+#define DTS_MARGIN_MASK GENMASK(15, 0)
+#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
+
+#define DTS_FIXED_POINT_FRACTION 64
+
+struct resolved_cores_reg {
+ u8 bus;
+ u8 dev;
+ u8 func;
+ u8 offset;
+};
+
+struct cpu_info {
+ struct resolved_cores_reg *reg;
+ u8 min_peci_revision;
+};
+
+struct peci_cputemp {
+ struct peci_device *peci_dev;
+ struct device *dev;
+ const char *name;
+ const struct cpu_info *gen_info;
+ struct {
+ struct peci_sensor_data die;
+ struct peci_sensor_data dts;
+ struct peci_sensor_data tcontrol;
+ struct peci_sensor_data tthrottle;
+ struct peci_sensor_data tjmax;
+ struct peci_sensor_data core[CORETEMP_CHANNEL_NUMS];
+ } temp;
+ const char **coretemp_label;
+ DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
+};
+
+enum cputemp_channels {
+ channel_die,
+ channel_dts,
+ channel_tcontrol,
+ channel_tthrottle,
+ channel_tjmax,
+ channel_core,
+};
+
+static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
+ "Die",
+ "DTS",
+ "Tcontrol",
+ "Tthrottle",
+ "Tjmax",
+};
+
+static int get_temp_targets(struct peci_cputemp *priv)
+{
+ s32 tthrottle_offset, tcontrol_margin;
+ u32 pcs;
+ int ret;
+
+ /*
+ * Use only the tcontrol marker to determine whether the target values
+ * need an update.
+ */
+ if (!peci_sensor_need_update(&priv->temp.tcontrol))
+ return 0;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
+ if (ret)
+ return ret;
+
+ priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
+
+ tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
+ tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
+ priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
+
+ tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
+ priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
+
+ peci_sensor_mark_updated(&priv->temp.tcontrol);
+
+ return 0;
+}
+
+/*
+ * Processors return a value of DTS reading in S10.6 fixed point format
+ * (sign, 10 bits signed integer value, 6 bits fractional).
+ * Error codes:
+ * 0x8000: General sensor error
+ * 0x8001: Reserved
+ * 0x8002: Underflow on reading value
+ * 0x8003-0x81ff: Reserved
+ */
+static bool dts_valid(u16 val)
+{
+ return val < 0x8000 || val > 0x81ff;
+}
+
+static s32 dts_to_millidegree(s32 val)
+{
+ return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
+}
+
+static int get_die_temp(struct peci_cputemp *priv)
+{
+ s16 temp;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->temp.die))
+ return 0;
+
+ ret = peci_temp_read(priv->peci_dev, &temp);
+ if (ret)
+ return ret;
+
+ if (!dts_valid(temp))
+ return -EIO;
+
+ /* Note that the tjmax should be available before calling it */
+ priv->temp.die.value = priv->temp.tjmax.value + dts_to_millidegree(temp);
+
+ peci_sensor_mark_updated(&priv->temp.die);
+
+ return 0;
+}
+
+static int get_dts(struct peci_cputemp *priv)
+{
+ s32 dts_margin;
+ u32 pcs;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->temp.dts))
+ return 0;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
+ if (ret)
+ return ret;
+
+ dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
+ if (!dts_valid(dts_margin))
+ return -EIO;
+
+ /* Note that the tcontrol should be available before calling it */
+ priv->temp.dts.value = priv->temp.tcontrol.value - dts_to_millidegree(dts_margin);
+
+ peci_sensor_mark_updated(&priv->temp.dts);
+
+ return 0;
+}
+
+static int get_core_temp(struct peci_cputemp *priv, int core_index)
+{
+ s32 core_dts_margin;
+ u32 pcs;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->temp.core[core_index]))
+ return 0;
+
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
+ if (ret)
+ return ret;
+
+ core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
+ if (!dts_valid(core_dts_margin))
+ return -EIO;
+
+ /* Note that the tjmax should be available before calling it */
+ priv->temp.core[core_index].value =
+ priv->temp.tjmax.value + dts_to_millidegree(core_dts_margin);
+
+ peci_sensor_mark_updated(&priv->temp.core[core_index]);
+
+ return 0;
+}
+
+static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ struct peci_cputemp *priv = dev_get_drvdata(dev);
+
+ if (attr != hwmon_temp_label)
+ return -EOPNOTSUPP;
+
+ *str = channel < channel_core ?
+ cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
+
+ return 0;
+}
+
+static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct peci_cputemp *priv = dev_get_drvdata(dev);
+ int ret, core_index;
+
+ ret = get_temp_targets(priv);
+ if (ret)
+ return ret;
+
+ switch (attr) {
+ case hwmon_temp_input:
+ switch (channel) {
+ case channel_die:
+ ret = get_die_temp(priv);
+ if (ret)
+ return ret;
+
+ *val = priv->temp.die.value;
+ break;
+ case channel_dts:
+ ret = get_dts(priv);
+ if (ret)
+ return ret;
+
+ *val = priv->temp.dts.value;
+ break;
+ case channel_tcontrol:
+ *val = priv->temp.tcontrol.value;
+ break;
+ case channel_tthrottle:
+ *val = priv->temp.tthrottle.value;
+ break;
+ case channel_tjmax:
+ *val = priv->temp.tjmax.value;
+ break;
+ default:
+ core_index = channel - channel_core;
+ ret = get_core_temp(priv, core_index);
+ if (ret)
+ return ret;
+
+ *val = priv->temp.core[core_index].value;
+ break;
+ }
+ break;
+ case hwmon_temp_max:
+ *val = priv->temp.tcontrol.value;
+ break;
+ case hwmon_temp_crit:
+ *val = priv->temp.tjmax.value;
+ break;
+ case hwmon_temp_crit_hyst:
+ *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct peci_cputemp *priv = data;
+
+ if (channel >= CPUTEMP_CHANNEL_NUMS)
+ return 0;
+
+ if (channel < channel_core)
+ return 0444;
+
+ if (test_bit(channel - channel_core, priv->core_mask))
+ return 0444;
+
+ return 0;
+}
+
+static int init_core_mask(struct peci_cputemp *priv)
+{
+ struct peci_device *peci_dev = priv->peci_dev;
+ struct resolved_cores_reg *reg = priv->gen_info->reg;
+ u64 core_mask;
+ u32 data;
+ int ret;
+
+ /* Get the RESOLVED_CORES register value */
+ switch (peci_dev->info.model) {
+ case INTEL_FAM6_ICELAKE_X:
+ case INTEL_FAM6_ICELAKE_D:
+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
+ reg->func, reg->offset + 4, &data);
+ if (ret)
+ return ret;
+
+ core_mask = (u64)data << 32;
+
+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
+ reg->func, reg->offset, &data);
+ if (ret)
+ return ret;
+
+ core_mask |= data;
+
+ break;
+ default:
+ ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
+ reg->func, reg->offset, &data);
+ if (ret)
+ return ret;
+
+ core_mask = data;
+
+ break;
+ }
+
+ if (!core_mask)
+ return -EIO;
+
+ bitmap_from_u64(priv->core_mask, core_mask);
+
+ return 0;
+}
+
+static int create_temp_label(struct peci_cputemp *priv)
+{
+ unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
+ int i;
+
+ priv->coretemp_label = devm_kzalloc(priv->dev, (core_max + 1) * sizeof(char *), GFP_KERNEL);
+ if (!priv->coretemp_label)
+ return -ENOMEM;
+
+ for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
+ priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
+ if (!priv->coretemp_label[i])
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void check_resolved_cores(struct peci_cputemp *priv)
+{
+ int ret;
+
+ ret = init_core_mask(priv);
+ if (ret)
+ return;
+
+ ret = create_temp_label(priv);
+ if (ret)
+ bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
+}
+
+static const struct hwmon_ops peci_cputemp_ops = {
+ .is_visible = cputemp_is_visible,
+ .read_string = cputemp_read_string,
+ .read = cputemp_read,
+};
+
+static const u32 peci_cputemp_temp_channel_config[] = {
+ /* Die temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
+ /* DTS margin */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
+ /* Tcontrol temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
+ /* Tthrottle temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT,
+ /* Tjmax temperature */
+ HWMON_T_LABEL | HWMON_T_INPUT,
+ /* Core temperature - for all core channels */
+ [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
+ 0
+};
+
+static const struct hwmon_channel_info peci_cputemp_temp_channel = {
+ .type = hwmon_temp,
+ .config = peci_cputemp_temp_channel_config,
+};
+
+static const struct hwmon_channel_info *peci_cputemp_info[] = {
+ &peci_cputemp_temp_channel,
+ NULL
+};
+
+static const struct hwmon_chip_info peci_cputemp_chip_info = {
+ .ops = &peci_cputemp_ops,
+ .info = peci_cputemp_info,
+};
+
+static int peci_cputemp_probe(struct auxiliary_device *adev,
+ const struct auxiliary_device_id *id)
+{
+ struct device *dev = &adev->dev;
+ struct peci_device *peci_dev = to_peci_device(dev->parent);
+ struct peci_cputemp *priv;
+ struct device *hwmon_dev;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
+ peci_dev->info.socket_id);
+ if (!priv->name)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, priv);
+ priv->dev = dev;
+ priv->peci_dev = peci_dev;
+ priv->gen_info = (const struct cpu_info *)id->driver_data;
+
+ check_resolved_cores(priv);
+
+ hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
+ priv, &peci_cputemp_chip_info, NULL);
+
+ return PTR_ERR_OR_ZERO(hwmon_dev);
+}
+
+static struct resolved_cores_reg resolved_cores_reg_hsx = {
+ .bus = 1,
+ .dev = 30,
+ .func = 3,
+ .offset = 0xb4,
+};
+
+static struct resolved_cores_reg resolved_cores_reg_icx = {
+ .bus = 14,
+ .dev = 30,
+ .func = 3,
+ .offset = 0xd0,
+};
+
+static const struct cpu_info cpu_hsx = {
+ .reg = &resolved_cores_reg_hsx,
+ .min_peci_revision = 0x30,
+};
+
+static const struct cpu_info cpu_icx = {
+ .reg = &resolved_cores_reg_icx,
+ .min_peci_revision = 0x40,
+};
+
+static const struct auxiliary_device_id peci_cputemp_ids[] = {
+ {
+ .name = "peci_cpu.cputemp.hsx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.bdx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.bdxd",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.skx",
+ .driver_data = (kernel_ulong_t)&cpu_hsx,
+ },
+ {
+ .name = "peci_cpu.cputemp.icx",
+ .driver_data = (kernel_ulong_t)&cpu_icx,
+ },
+ {
+ .name = "peci_cpu.cputemp.icxd",
+ .driver_data = (kernel_ulong_t)&cpu_icx,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
+
+static struct auxiliary_driver peci_cputemp_driver = {
+ .probe = peci_cputemp_probe,
+ .id_table = peci_cputemp_ids,
+};
+
+module_auxiliary_driver(peci_cputemp_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI cputemp driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI_CPU);
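For reference, the DTS handling in cputemp.c can be sketched in plain userspace C. This is only an illustration and not part of the patch: dts_raw_valid() and dts_raw_to_millidegree() are hypothetical names, and the value 64 for DTS_FIXED_POINT_FRACTION is an assumption derived from the six fractional bits of the S10.6 format.

```c
#include <stdbool.h>
#include <stdint.h>

#define MILLIDEGREE_PER_DEGREE   1000
#define DTS_FIXED_POINT_FRACTION 64	/* assumed: 2^6 for six fractional bits */

/* Raw readings in the range 0x8000..0x81ff are error codes, not temperatures. */
bool dts_raw_valid(uint16_t raw)
{
	return raw < 0x8000 || raw > 0x81ff;
}

/* Treat the 16-bit reading as two's complement and scale it to millidegrees. */
int32_t dts_raw_to_millidegree(uint16_t raw)
{
	int32_t fixed = raw;

	if (fixed & 0x8000)
		fixed -= 0x10000;	/* sign-extend from bit 15 */

	return fixed * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
}
```

With a Tjmax of 100000 millidegrees, a raw reading of 0xfb00 (-20 degrees relative to Tjmax) would correspond to a die temperature of 80000 millidegrees, since the driver adds the converted value to Tjmax.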
--
2.31.1
Add a brief overview of PECI and the PECI Wire interface.
The documentation also contains kernel-doc for PECI subsystem internals
and PECI CPU Driver API.
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
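The fixed relationship between a PECI Wire address and the processor socket ID described in peci.rst can be sketched as follows. This is a minimal illustration, not part of the patch: the base address 0x30, the limit of 8 devices and the helper name peci_addr_to_socket_id() are assumptions made for the example.

```c
#include <stdint.h>

#define PECI_BASE_ADDR      0x30	/* assumed address of socket 0 */
#define PECI_DEVICE_NUM_MAX 8		/* assumed number of addressable sockets */

/* Return the socket ID for a PECI address, or -1 if the address is out of range. */
int peci_addr_to_socket_id(uint8_t addr)
{
	if (addr < PECI_BASE_ADDR || addr >= PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX)
		return -1;

	return addr - PECI_BASE_ADDR;
}
```

Because the mapping depends only on the address, removing one processor leaves the socket IDs of the remaining processors unchanged.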
Documentation/index.rst | 1 +
Documentation/peci/index.rst | 16 ++++++++++++
Documentation/peci/peci.rst | 48 ++++++++++++++++++++++++++++++++++++
MAINTAINERS | 1 +
4 files changed, 66 insertions(+)
create mode 100644 Documentation/peci/index.rst
create mode 100644 Documentation/peci/peci.rst
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 54ce34fd6fbd..7671f2cd474f 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -137,6 +137,7 @@ needed).
misc-devices/index
scheduler/index
mhi/index
+ peci/index
Architecture-agnostic documentation
-----------------------------------
diff --git a/Documentation/peci/index.rst b/Documentation/peci/index.rst
new file mode 100644
index 000000000000..989de10416e7
--- /dev/null
+++ b/Documentation/peci/index.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+====================
+Linux PECI Subsystem
+====================
+
+.. toctree::
+
+ peci
+
+.. only:: subproject and html
+
+ Indices
+ =======
+
+ * :ref:`genindex`
diff --git a/Documentation/peci/peci.rst b/Documentation/peci/peci.rst
new file mode 100644
index 000000000000..a12c8e10c4a9
--- /dev/null
+++ b/Documentation/peci/peci.rst
@@ -0,0 +1,48 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+========
+Overview
+========
+
+The Platform Environment Control Interface (PECI) is a communication
+interface between Intel processors and management controllers
+(e.g. Baseboard Management Controller, BMC).
+PECI provides services that allow the management controller to
+configure, monitor and debug the platform by accessing various registers.
+It defines a dedicated command protocol, where the management
+controller acts as a PECI originator and the processor as
+a PECI responder.
+PECI can be used in both single-processor and multiple-processor
+systems.
+
+NOTE:
+The Intel PECI specification is not released as a dedicated document;
+instead, it is part of the External Design Specification (EDS) for a
+given Intel CPU. External Design Specifications are usually not publicly
+available.
+
+PECI Wire
+---------
+
+The PECI Wire interface uses a single wire for self-clocking and data
+transfer. It does not require any additional control lines - the
+physical layer is a self-clocked one-wire bus signal that begins each
+bit with a driven, rising edge from an idle level near zero volts. The
+duration of the signal driven high determines whether the bit
+value is logic '0' or logic '1'. PECI Wire also includes a variable
+data rate established with every message.
+
+For PECI Wire, each processor package uses a unique, fixed address
+within a defined range, and that address has a fixed relationship
+with the processor socket ID - if one of the processors is removed,
+the addresses of the remaining processors are not
+affected.
+
+PECI subsystem internals
+------------------------
+
+.. kernel-doc:: include/linux/peci.h
+
+PECI CPU Driver API
+-------------------
+.. kernel-doc:: include/linux/peci-cpu.h
diff --git a/MAINTAINERS b/MAINTAINERS
index d16da127bbdc..a596453db003 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14519,6 +14519,7 @@ R: Jae Hyun Yoo <[email protected]>
L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/peci/
+F: Documentation/peci/
F: drivers/peci/
F: include/linux/peci-cpu.h
F: include/linux/peci.h
--
2.31.1
From: Jae Hyun Yoo <[email protected]>
Add documentation for peci-cputemp driver that provides DTS thermal
readings for CPU packages and CPU cores and peci-dimmtemp driver that
provides DTS thermal readings for DIMMs.
Signed-off-by: Jae Hyun Yoo <[email protected]>
Co-developed-by: Iwona Winiarska <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
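The relationship between the Tjmax, Tcontrol and Tthrottle values documented in peci-cputemp.rst can be sketched in userspace C. This is a hedged illustration rather than the driver code: the bit layout of the temperature target register (bits 15:8 signed fan temperature margin, bits 23:16 reference temperature, bits 29:24 Tj offset) is an assumption, and struct temp_targets and decode_temp_target() are hypothetical names.

```c
#include <stdint.h>

#define MILLIDEGREE_PER_DEGREE 1000

/* All values are in millidegrees Celsius. */
struct temp_targets {
	int32_t tjmax;
	int32_t tcontrol;
	int32_t tthrottle;
};

/* Decode a raw temperature target value under the assumed bit layout. */
struct temp_targets decode_temp_target(uint32_t pcs)
{
	struct temp_targets t;
	int32_t margin = (pcs >> 8) & 0xff;	/* assumed bits 15:8 */
	int32_t offset = (pcs >> 24) & 0x3f;	/* assumed bits 29:24 */

	if (margin & 0x80)
		margin -= 0x100;		/* sign-extend the 8-bit margin */

	t.tjmax = (int32_t)((pcs >> 16) & 0xff) * MILLIDEGREE_PER_DEGREE;
	t.tcontrol = t.tjmax - margin * MILLIDEGREE_PER_DEGREE;
	t.tthrottle = t.tjmax - offset * MILLIDEGREE_PER_DEGREE;

	return t;
}
```

Under these assumptions, a raw value of 0x05640a00 gives Tjmax = 100000, Tcontrol = 90000 and Tthrottle = 95000, so temp1_crit_hyst (Tjmax minus Tcontrol) would read 10000.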
Documentation/hwmon/index.rst | 2 +
Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
MAINTAINERS | 2 +
4 files changed, 155 insertions(+)
create mode 100644 Documentation/hwmon/peci-cputemp.rst
create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index bc01601ea81a..cc76b5b3f791 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
pcf8591
pim4328
pm6764tr
+ peci-cputemp
+ peci-dimmtemp
pmbus
powr1220
pxe1610
diff --git a/Documentation/hwmon/peci-cputemp.rst b/Documentation/hwmon/peci-cputemp.rst
new file mode 100644
index 000000000000..d3a218ba810a
--- /dev/null
+++ b/Documentation/hwmon/peci-cputemp.rst
@@ -0,0 +1,93 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+Kernel driver peci-cputemp
+==========================
+
+Supported chips:
+ One of Intel server CPUs listed below which is connected to a PECI bus.
+ * Intel Xeon E5/E7 v3 server processors
+ Intel Xeon E5-14xx v3 family
+ Intel Xeon E5-24xx v3 family
+ Intel Xeon E5-16xx v3 family
+ Intel Xeon E5-26xx v3 family
+ Intel Xeon E5-46xx v3 family
+ Intel Xeon E7-48xx v3 family
+ Intel Xeon E7-88xx v3 family
+ * Intel Xeon E5/E7 v4 server processors
+ Intel Xeon E5-16xx v4 family
+ Intel Xeon E5-26xx v4 family
+ Intel Xeon E5-46xx v4 family
+ Intel Xeon E7-48xx v4 family
+ Intel Xeon E7-88xx v4 family
+ * Intel Xeon Scalable server processors
+ Intel Xeon D family
+ Intel Xeon Bronze family
+ Intel Xeon Silver family
+ Intel Xeon Gold family
+ Intel Xeon Platinum family
+
+ Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author: Jae Hyun Yoo <[email protected]>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
+accessible via the processor PECI interface.
+
+All temperature values are given in millidegree Celsius and will be measurable
+only when the target CPU is powered on.
+
+Sysfs interface
+-------------------
+
+======================= =======================================================
+temp1_label             "Die"
+temp1_input             Provides current die temperature of the CPU package.
+temp1_max               Provides thermal control temperature of the CPU package
+                        which is also known as Tcontrol.
+temp1_crit              Provides shutdown temperature of the CPU package which
+                        is also known as the maximum processor junction
+                        temperature, Tjmax or Tprochot.
+temp1_crit_hyst         Provides the hysteresis value from Tcontrol to Tjmax of
+                        the CPU package.
+
+temp2_label             "DTS"
+temp2_input             Provides current DTS temperature of the CPU package.
+temp2_max               Provides thermal control temperature of the CPU package
+                        which is also known as Tcontrol.
+temp2_crit              Provides shutdown temperature of the CPU package which
+                        is also known as the maximum processor junction
+                        temperature, Tjmax or Tprochot.
+temp2_crit_hyst         Provides the hysteresis value from Tcontrol to Tjmax of
+                        the CPU package.
+
+temp3_label             "Tcontrol"
+temp3_input             Provides current Tcontrol temperature of the CPU
+                        package which is also known as Fan Temperature target.
+                        Indicates the relative value from thermal monitor trip
+                        temperature at which fans should be engaged.
+temp3_crit              Provides Tcontrol critical value of the CPU package
+                        which is the same as Tjmax.
+
+temp4_label             "Tthrottle"
+temp4_input             Provides current Tthrottle temperature of the CPU
+                        package, which is used as the throttling temperature.
+                        If this value is allowed and lower than Tjmax,
+                        throttling occurs and is reported below Tjmax.
+
+temp5_label             "Tjmax"
+temp5_input             Provides the maximum junction temperature, Tjmax of the
+                        CPU package.
+
+temp[6-N]_label         Provides string "Core X", where X is resolved core
+                        number.
+temp[6-N]_input         Provides current temperature of each core.
+temp[6-N]_max           Provides thermal control temperature of the core.
+temp[6-N]_crit          Provides shutdown temperature of the core.
+temp[6-N]_crit_hyst     Provides the hysteresis value from Tcontrol to Tjmax of
+                        the core.
+
+======================= =======================================================
diff --git a/Documentation/hwmon/peci-dimmtemp.rst b/Documentation/hwmon/peci-dimmtemp.rst
new file mode 100644
index 000000000000..1778d9317e43
--- /dev/null
+++ b/Documentation/hwmon/peci-dimmtemp.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver peci-dimmtemp
+===========================
+
+Supported chips:
+ One of Intel server CPUs listed below which is connected to a PECI bus.
+ * Intel Xeon E5/E7 v3 server processors
+ Intel Xeon E5-14xx v3 family
+ Intel Xeon E5-24xx v3 family
+ Intel Xeon E5-16xx v3 family
+ Intel Xeon E5-26xx v3 family
+ Intel Xeon E5-46xx v3 family
+ Intel Xeon E7-48xx v3 family
+ Intel Xeon E7-88xx v3 family
+ * Intel Xeon E5/E7 v4 server processors
+ Intel Xeon E5-16xx v4 family
+ Intel Xeon E5-26xx v4 family
+ Intel Xeon E5-46xx v4 family
+ Intel Xeon E7-48xx v4 family
+ Intel Xeon E7-88xx v4 family
+ * Intel Xeon Scalable server processors
+ Intel Xeon D family
+ Intel Xeon Bronze family
+ Intel Xeon Silver family
+ Intel Xeon Gold family
+ Intel Xeon Platinum family
+
+ Datasheet: Available from http://www.intel.com/design/literature.htm
+
+Author: Jae Hyun Yoo <[email protected]>
+
+Description
+-----------
+
+This driver implements a generic PECI hwmon feature which provides Digital
+Thermal Sensor (DTS) thermal readings of DIMM components that are accessible
+via the processor PECI interface.
+
+All temperature values are given in millidegree Celsius and will be measurable
+only when the target CPU is powered on.
+
+Sysfs interface
+-------------------
+
+======================= =======================================================
+
+temp[N]_label           Provides string "DIMM CI", where C is DIMM channel and
+                        I is DIMM index of the populated DIMM.
+temp[N]_input           Provides current temperature of the populated DIMM.
+temp[N]_max             Provides thermal control temperature of the DIMM.
+temp[N]_crit            Provides shutdown temperature of the DIMM.
+
+======================= =======================================================
+
+Note:
+ DIMM temperature attributes will appear when the client CPU's BIOS
+ completes memory training and testing.
diff --git a/MAINTAINERS b/MAINTAINERS
index 35ba9e3646bd..d16da127bbdc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14509,6 +14509,8 @@ M: Iwona Winiarska <[email protected]>
R: Jae Hyun Yoo <[email protected]>
L: [email protected]
S: Supported
+F: Documentation/hwmon/peci-cputemp.rst
+F: Documentation/hwmon/peci-dimmtemp.rst
F: drivers/hwmon/peci/
PECI SUBSYSTEM
--
2.31.1
PECI is an interface that may be used by different types of devices.
Here we're adding a peci-cpu driver compatible with Intel processors.
The driver is responsible for handling auxiliary devices that can
subsequently be used by other drivers (e.g. hwmons).
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
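The way the auxiliary device names created by peci-cpu line up with the id tables of the hwmon drivers can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel implementation: it assumes the auxiliary bus matches on "<module name>.<device name>", and adev_matches() and adev_id() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<modname>.<type>.<cpu>" and compare it with an id-table entry
 * such as "peci_cpu.cputemp.hsx". Returns 1 on match, 0 otherwise.
 */
int adev_matches(const char *id_name, const char *modname,
		 const char *type, const char *cpu)
{
	char buf[64];
	int n = snprintf(buf, sizeof(buf), "%s.%s.%s", modname, type, cpu);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return 0;

	return strcmp(buf, id_name) == 0;
}

/* Combine the controller ID and the PECI address into one device ID. */
uint32_t adev_id(uint32_t controller_id, uint8_t addr)
{
	return (controller_id << 16) | addr;
}
```

The point of the sketch is that the string stored in the .data field of each peci_device_id entry has to agree with the suffix used in the auxiliary driver id tables, otherwise the hwmon driver never binds for that CPU model.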
MAINTAINERS | 1 +
drivers/peci/Kconfig | 15 ++
drivers/peci/Makefile | 2 +
drivers/peci/cpu.c | 347 +++++++++++++++++++++++++++++++++++++++
drivers/peci/device.c | 1 +
drivers/peci/internal.h | 27 +++
drivers/peci/request.c | 211 ++++++++++++++++++++++++
include/linux/peci-cpu.h | 38 +++++
include/linux/peci.h | 8 -
9 files changed, 642 insertions(+), 8 deletions(-)
create mode 100644 drivers/peci/cpu.c
create mode 100644 include/linux/peci-cpu.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 4ba874afa2fa..f47b5f634293 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14511,6 +14511,7 @@ L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/peci/
F: drivers/peci/
+F: include/linux/peci-cpu.h
F: include/linux/peci.h
PENSANDO ETHERNET DRIVERS
diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
index 27c31535843c..9e17e06fda90 100644
--- a/drivers/peci/Kconfig
+++ b/drivers/peci/Kconfig
@@ -16,6 +16,21 @@ menuconfig PECI
if PECI
+config PECI_CPU
+ tristate "PECI CPU"
+ select AUXILIARY_BUS
+ help
+ This option enables peci-cpu driver for Intel processors. It is
+ responsible for creating auxiliary devices that can subsequently
+ be used by other drivers in order to provide various
+ functions such as temperature monitoring.
+
+ Additional drivers must be enabled in order to use the functionality
+ of the device.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-cpu.
+
source "drivers/peci/controller/Kconfig"
endif # PECI
diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
index 917f689e147a..7de18137e738 100644
--- a/drivers/peci/Makefile
+++ b/drivers/peci/Makefile
@@ -3,6 +3,8 @@
# Core functionality
peci-y := core.o request.o device.o sysfs.o
obj-$(CONFIG_PECI) += peci.o
+peci-cpu-y := cpu.o
+obj-$(CONFIG_PECI_CPU) += peci-cpu.o
# Hardware specific bus drivers
obj-y += controller/
diff --git a/drivers/peci/cpu.c b/drivers/peci/cpu.c
new file mode 100644
index 000000000000..8d130a9a71ad
--- /dev/null
+++ b/drivers/peci/cpu.c
@@ -0,0 +1,347 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/slab.h>
+#include <linux/x86/intel-family.h>
+
+#include "internal.h"
+
+/**
+ * peci_temp_read() - read the maximum die temperature from PECI target device
+ * @device: PECI device to which request is going to be sent
+ * @temp_raw: where to store the read temperature
+ *
+ * It uses GetTemp PECI command.
+ *
+ * Return: 0 on success, other values in case of errors.
+ */
+int peci_temp_read(struct peci_device *device, s16 *temp_raw)
+{
+ struct peci_request *req;
+
+ req = peci_get_temp(device);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ *temp_raw = peci_request_data_temp(req);
+
+ peci_request_free(req);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(peci_temp_read, PECI_CPU);
+
+/**
+ * peci_pcs_read() - read PCS register
+ * @device: PECI device to which request is going to be sent
+ * @index: PCS index
+ * @param: PCS parameter
+ * @data: where to store the read data
+ *
+ * It uses RdPkgConfig PECI command.
+ *
+ * Return: 0 on success, other values in case of errors.
+ */
+int peci_pcs_read(struct peci_device *device, u8 index, u16 param, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_pkg_cfg_readl(device, index, param);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_pcs_read, PECI_CPU);
+
+/**
+ * peci_pci_local_read() - read 32-bit memory location using raw address
+ * @device: PECI device to which request is going to be sent
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @reg: register
+ * @data: where to store the read data
+ *
+ * It uses RdPCIConfigLocal PECI command.
+ *
+ * Return: 0 on success, other values in case of errors.
+ */
+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func,
+ u16 reg, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_pci_cfg_local_readl(device, bus, dev, func, reg);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_pci_local_read, PECI_CPU);
+
+/**
+ * peci_ep_pci_local_read() - read 32-bit memory location using raw address
+ * @device: PECI device to which request is going to be sent
+ * @seg: PCI segment
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @reg: register
+ * @data: where to store the read data
+ *
+ * Like &peci_pci_local_read, but it uses RdEndpointConfig PECI command.
+ *
+ * Return: 0 on success, other values in case of errors.
+ */
+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_ep_pci_cfg_local_readl(device, seg, bus, dev, func, reg);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_ep_pci_local_read, PECI_CPU);
+
+/**
+ * peci_mmio_read() - read 32-bit memory location using 64-bit bar offset address
+ * @device: PECI device to which request is going to be sent
+ * @bar: PCI bar
+ * @seg: PCI segment
+ * @bus: bus
+ * @dev: device
+ * @func: function
+ * @address: 64-bit MMIO address
+ * @data: where to store the read data
+ *
+ * It uses RdEndpointConfig PECI command.
+ *
+ * Return: 0 on success, other values in case of errors.
+ */
+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 address, u32 *data)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_ep_mmio64_readl(device, bar, seg, bus, dev, func, address);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
+ ret = peci_request_status(req);
+ if (ret)
+ goto out_req_free;
+
+ *data = peci_request_data_readl(req);
+out_req_free:
+ peci_request_free(req);
+
+ return ret;
+}
+EXPORT_SYMBOL_NS_GPL(peci_mmio_read, PECI_CPU);
+
+struct peci_cpu {
+ struct peci_device *device;
+ const struct peci_device_id *id;
+ struct auxiliary_device **aux_devices;
+};
+
+static const char * const type[] = {
+ "cputemp",
+ "dimmtemp",
+};
+
+static void adev_release(struct device *dev)
+{
+ struct auxiliary_device *adev = to_auxiliary_dev(dev);
+
+ kfree(adev->name);
+ kfree(adev);
+}
+
+static struct auxiliary_device *add_adev(struct peci_cpu *priv, int idx)
+{
+ struct peci_controller *controller = priv->device->controller;
+ struct auxiliary_device *adev;
+ const char *name;
+ int ret;
+
+ adev = kzalloc(sizeof(*adev), GFP_KERNEL);
+ if (!adev)
+ return ERR_PTR(-ENOMEM);
+
+ name = kasprintf(GFP_KERNEL, "%s.%s", type[idx], (const char *)priv->id->data);
+ if (!name) {
+ ret = -ENOMEM;
+ goto free_adev;
+ }
+
+ adev->name = name;
+ adev->dev.parent = &priv->device->dev;
+ adev->dev.release = adev_release;
+ adev->id = (controller->id << 16) | (priv->device->addr);
+
+ ret = auxiliary_device_init(adev);
+ if (ret)
+ goto free_name;
+
+ ret = auxiliary_device_add(adev);
+ if (ret) {
+ auxiliary_device_uninit(adev);
+ return ERR_PTR(ret);
+ }
+
+ return adev;
+
+free_name:
+ kfree(name);
+free_adev:
+ kfree(adev);
+ return ERR_PTR(ret);
+}
+
+static void del_adev(struct auxiliary_device *adev)
+{
+ auxiliary_device_delete(adev);
+ auxiliary_device_uninit(adev);
+}
+
+static int peci_cpu_add_adevices(struct peci_cpu *priv)
+{
+ struct device *dev = &priv->device->dev;
+ struct auxiliary_device *adev;
+ int i;
+
+ priv->aux_devices = devm_kcalloc(dev, ARRAY_SIZE(type),
+ sizeof(*priv->aux_devices),
+ GFP_KERNEL);
+ if (!priv->aux_devices)
+ return -ENOMEM;
+
+ for (i = 0; i < ARRAY_SIZE(type); i++) {
+ adev = add_adev(priv, i);
+ if (IS_ERR(adev)) {
+ dev_warn(dev, "Failed to add PECI auxiliary: %s, ret = %ld\n",
+ type[i], PTR_ERR(adev));
+ continue;
+ }
+
+ priv->aux_devices[i] = adev;
+ }
+ return 0;
+}
+
+static int
+peci_cpu_probe(struct peci_device *device, const struct peci_device_id *id)
+{
+ struct device *dev = &device->dev;
+ struct peci_cpu *priv;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, priv);
+ priv->device = device;
+ priv->id = id;
+
+ return peci_cpu_add_adevices(priv);
+}
+
+static void peci_cpu_remove(struct peci_device *device)
+{
+ struct peci_cpu *priv = dev_get_drvdata(&device->dev);
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(type); i++) {
+ struct auxiliary_device *adev = priv->aux_devices[i];
+
+ if (adev)
+ del_adev(adev);
+ }
+}
+
+static const struct peci_device_id peci_cpu_device_ids[] = {
+ { /* Haswell Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_HASWELL_X,
+ .data = "hsx",
+ },
+ { /* Broadwell Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_BROADWELL_X,
+ .data = "bdx",
+ },
+ { /* Broadwell Xeon D */
+ .family = 6,
+ .model = INTEL_FAM6_BROADWELL_D,
+ .data = "bdxd",
+ },
+ { /* Skylake Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_SKYLAKE_X,
+ .data = "skx",
+ },
+ { /* Icelake Xeon */
+ .family = 6,
+ .model = INTEL_FAM6_ICELAKE_X,
+ .data = "icx",
+ },
+ { /* Icelake Xeon D */
+ .family = 6,
+ .model = INTEL_FAM6_ICELAKE_D,
+ .data = "icxd",
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(peci, peci_cpu_device_ids);
+
+static struct peci_driver peci_cpu_driver = {
+ .probe = peci_cpu_probe,
+ .remove = peci_cpu_remove,
+ .id_table = peci_cpu_device_ids,
+ .driver = {
+ .name = "peci-cpu",
+ },
+};
+module_peci_driver(peci_cpu_driver);
+
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI CPU driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI);
diff --git a/drivers/peci/device.c b/drivers/peci/device.c
index 8c4bd1ebbc29..c278c9ea166c 100644
--- a/drivers/peci/device.c
+++ b/drivers/peci/device.c
@@ -3,6 +3,7 @@
#include <linux/bitfield.h>
#include <linux/peci.h>
+#include <linux/peci-cpu.h>
#include <linux/slab.h>
#include <linux/x86/cpu.h>
diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
index c891c93e077a..1d39483a8acf 100644
--- a/drivers/peci/internal.h
+++ b/drivers/peci/internal.h
@@ -21,6 +21,7 @@ void peci_request_free(struct peci_request *req);
int peci_request_status(struct peci_request *req);
u64 peci_request_data_dib(struct peci_request *req);
+s16 peci_request_data_temp(struct peci_request *req);
u8 peci_request_data_readb(struct peci_request *req);
u16 peci_request_data_readw(struct peci_request *req);
@@ -35,6 +36,32 @@ struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8 index, u1
struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
+struct peci_request *peci_pci_cfg_local_readb(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_pci_cfg_local_readw(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_pci_cfg_local_readl(struct peci_device *device,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_ep_pci_cfg_local_readb(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_ep_pci_cfg_local_readw(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_ep_pci_cfg_local_readl(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_ep_pci_cfg_readb(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_ep_pci_cfg_readw(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+struct peci_request *peci_ep_pci_cfg_readl(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg);
+
+struct peci_request *peci_ep_mmio32_readl(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset);
+
+struct peci_request *peci_ep_mmio64_readl(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset);
/**
* struct peci_device_id - PECI device data to match
* @data: pointer to driver private data specific to device
diff --git a/drivers/peci/request.c b/drivers/peci/request.c
index 48354455b554..c5d39f7e8142 100644
--- a/drivers/peci/request.c
+++ b/drivers/peci/request.c
@@ -3,6 +3,7 @@
#include <linux/bug.h>
#include <linux/export.h>
+#include <linux/pci.h>
#include <linux/peci.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -15,6 +16,10 @@
#define PECI_GET_DIB_WR_LEN 1
#define PECI_GET_DIB_RD_LEN 8
+#define PECI_GET_TEMP_CMD 0x01
+#define PECI_GET_TEMP_WR_LEN 1
+#define PECI_GET_TEMP_RD_LEN 2
+
#define PECI_RDPKGCFG_CMD 0xa1
#define PECI_RDPKGCFG_WRITE_LEN 5
#define PECI_RDPKGCFG_READ_LEN_BASE 1
@@ -22,6 +27,44 @@
#define PECI_WRPKGCFG_WRITE_LEN_BASE 6
#define PECI_WRPKGCFG_READ_LEN 1
+#define PECI_RDIAMSR_CMD 0xb1
+#define PECI_RDIAMSR_WRITE_LEN 5
+#define PECI_RDIAMSR_READ_LEN 9
+#define PECI_WRIAMSR_CMD 0xb5
+#define PECI_RDIAMSREX_CMD 0xd1
+#define PECI_RDIAMSREX_WRITE_LEN 6
+#define PECI_RDIAMSREX_READ_LEN 9
+
+#define PECI_RDPCICFG_CMD 0x61
+#define PECI_RDPCICFG_WRITE_LEN 6
+#define PECI_RDPCICFG_READ_LEN 5
+#define PECI_RDPCICFG_READ_LEN_MAX 24
+#define PECI_WRPCICFG_CMD 0x65
+
+#define PECI_RDPCICFGLOCAL_CMD 0xe1
+#define PECI_RDPCICFGLOCAL_WRITE_LEN 5
+#define PECI_RDPCICFGLOCAL_READ_LEN_BASE 1
+#define PECI_WRPCICFGLOCAL_CMD 0xe5
+#define PECI_WRPCICFGLOCAL_WRITE_LEN_BASE 6
+#define PECI_WRPCICFGLOCAL_READ_LEN 1
+
+#define PECI_ENDPTCFG_TYPE_LOCAL_PCI 0x03
+#define PECI_ENDPTCFG_TYPE_PCI 0x04
+#define PECI_ENDPTCFG_TYPE_MMIO 0x05
+#define PECI_ENDPTCFG_ADDR_TYPE_PCI 0x04
+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_D 0x05
+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q 0x06
+#define PECI_RDENDPTCFG_CMD 0xc1
+#define PECI_RDENDPTCFG_PCI_WRITE_LEN 12
+#define PECI_RDENDPTCFG_MMIO_D_WRITE_LEN 14
+#define PECI_RDENDPTCFG_MMIO_Q_WRITE_LEN 18
+#define PECI_RDENDPTCFG_READ_LEN_BASE 1
+#define PECI_WRENDPTCFG_CMD 0xc5
+#define PECI_WRENDPTCFG_PCI_WRITE_LEN_BASE 13
+#define PECI_WRENDPTCFG_MMIO_D_WRITE_LEN_BASE 15
+#define PECI_WRENDPTCFG_MMIO_Q_WRITE_LEN_BASE 19
+#define PECI_WRENDPTCFG_READ_LEN 1
+
/* Device Specific Completion Code (CC) Definition */
#define PECI_CC_SUCCESS 0x40
#define PECI_CC_NEED_RETRY 0x80
@@ -223,6 +266,27 @@ struct peci_request *peci_get_dib(struct peci_device *device)
}
EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
+struct peci_request *peci_get_temp(struct peci_device *device)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_GET_TEMP_WR_LEN, PECI_GET_TEMP_RD_LEN);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_GET_TEMP_CMD;
+
+ ret = peci_request_xfer(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+EXPORT_SYMBOL_NS_GPL(peci_get_temp, PECI);
+
static struct peci_request *
__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
{
@@ -248,6 +312,108 @@ __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
return req;
}
+static u32 __get_pci_addr(u8 bus, u8 dev, u8 func, u16 reg)
+{
+ return reg | PCI_DEVID(bus, PCI_DEVFN(dev, func)) << 12;
+}
+
+static struct peci_request *
+__pci_cfg_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg, u8 len)
+{
+ struct peci_request *req;
+ u32 pci_addr;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDPCICFGLOCAL_WRITE_LEN,
+ PECI_RDPCICFGLOCAL_READ_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ pci_addr = __get_pci_addr(bus, dev, func, reg);
+
+ req->tx.buf[0] = PECI_RDPCICFGLOCAL_CMD;
+ req->tx.buf[1] = 0;
+ put_unaligned_le24(pci_addr, &req->tx.buf[2]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+static struct peci_request *
+__ep_pci_cfg_read(struct peci_device *device, u8 msg_type, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u8 len)
+{
+ struct peci_request *req;
+ u32 pci_addr;
+ int ret;
+
+ req = peci_request_alloc(device, PECI_RDENDPTCFG_PCI_WRITE_LEN,
+ PECI_RDENDPTCFG_READ_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ pci_addr = __get_pci_addr(bus, dev, func, reg);
+
+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = msg_type;
+ req->tx.buf[3] = 0;
+ req->tx.buf[4] = 0;
+ req->tx.buf[5] = 0;
+ req->tx.buf[6] = PECI_ENDPTCFG_ADDR_TYPE_PCI;
+ req->tx.buf[7] = seg; /* PCI Segment */
+ put_unaligned_le32(pci_addr, &req->tx.buf[8]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
+static struct peci_request *
+__ep_mmio_read(struct peci_device *device, u8 bar, u8 addr_type, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 offset, u8 tx_len, u8 len)
+{
+ struct peci_request *req;
+ int ret;
+
+ req = peci_request_alloc(device, tx_len, PECI_RDENDPTCFG_READ_LEN_BASE + len);
+ if (!req)
+ return ERR_PTR(-ENOMEM);
+
+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
+ req->tx.buf[1] = 0;
+ req->tx.buf[2] = PECI_ENDPTCFG_TYPE_MMIO;
+ req->tx.buf[3] = 0; /* Endpoint ID */
+ req->tx.buf[4] = 0; /* Reserved */
+ req->tx.buf[5] = bar;
+ req->tx.buf[6] = addr_type;
+ req->tx.buf[7] = seg; /* PCI Segment */
+ req->tx.buf[8] = PCI_DEVFN(dev, func);
+ req->tx.buf[9] = bus; /* PCI Bus */
+
+ if (addr_type == PECI_ENDPTCFG_ADDR_TYPE_MMIO_D)
+ put_unaligned_le32(offset, &req->tx.buf[10]);
+ else
+ put_unaligned_le64(offset, &req->tx.buf[10]);
+
+ ret = peci_request_xfer_retry(req);
+ if (ret) {
+ peci_request_free(req);
+ return ERR_PTR(ret);
+ }
+
+ return req;
+}
+
u8 peci_request_data_readb(struct peci_request *req)
{
return req->rx.buf[1];
@@ -278,6 +444,12 @@ u64 peci_request_data_dib(struct peci_request *req)
}
EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
+s16 peci_request_data_temp(struct peci_request *req)
+{
+ return get_unaligned_le16(&req->rx.buf[0]);
+}
+EXPORT_SYMBOL_NS_GPL(peci_request_data_temp, PECI);
+
#define __read_pkg_config(x, type) \
struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
{ \
@@ -289,3 +461,42 @@ __read_pkg_config(readb, u8);
__read_pkg_config(readw, u16);
__read_pkg_config(readl, u32);
__read_pkg_config(readq, u64);
+
+#define __read_pci_config_local(x, type) \
+struct peci_request * \
+peci_pci_cfg_local_##x(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg) \
+{ \
+ return __pci_cfg_local_read(device, bus, dev, func, reg, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_pci_cfg_local_##x, PECI)
+
+__read_pci_config_local(readb, u8);
+__read_pci_config_local(readw, u16);
+__read_pci_config_local(readl, u32);
+
+#define __read_ep_pci_config(x, msg_type, type) \
+struct peci_request * \
+peci_ep_pci_cfg_##x(struct peci_device *device, u8 seg, u8 bus, u8 dev, u8 func, u16 reg) \
+{ \
+ return __ep_pci_cfg_read(device, msg_type, seg, bus, dev, func, reg, sizeof(type)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_ep_pci_cfg_##x, PECI)
+
+__read_ep_pci_config(local_readb, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u8);
+__read_ep_pci_config(local_readw, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u16);
+__read_ep_pci_config(local_readl, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u32);
+__read_ep_pci_config(readb, PECI_ENDPTCFG_TYPE_PCI, u8);
+__read_ep_pci_config(readw, PECI_ENDPTCFG_TYPE_PCI, u16);
+__read_ep_pci_config(readl, PECI_ENDPTCFG_TYPE_PCI, u32);
+
+#define __read_ep_mmio(x, y, addr_type, type1, type2) \
+struct peci_request *peci_ep_mmio##y##_##x(struct peci_device *device, u8 bar, u8 seg, \
+ u8 bus, u8 dev, u8 func, u64 offset) \
+{ \
+ return __ep_mmio_read(device, bar, addr_type, seg, bus, dev, func, \
+ offset, 10 + sizeof(type1), sizeof(type2)); \
+} \
+EXPORT_SYMBOL_NS_GPL(peci_ep_mmio##y##_##x, PECI)
+
+__read_ep_mmio(readl, 32, PECI_ENDPTCFG_ADDR_TYPE_MMIO_D, u32, u32);
+__read_ep_mmio(readl, 64, PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q, u64, u32);
diff --git a/include/linux/peci-cpu.h b/include/linux/peci-cpu.h
new file mode 100644
index 000000000000..d1b307ec2429
--- /dev/null
+++ b/include/linux/peci-cpu.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Intel Corporation */
+
+#ifndef __LINUX_PECI_CPU_H
+#define __LINUX_PECI_CPU_H
+
+#include <linux/types.h>
+
+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
+#define PECI_PCS_MODULE_TEMP 9 /* Per Core DTS Temperature Read */
+#define PECI_PCS_THERMAL_MARGIN 10 /* DTS thermal margin */
+#define PECI_PCS_DDR_DIMM_TEMP 14 /* DDR DIMM Temperature */
+#define PECI_PCS_TEMP_TARGET 16 /* Temperature Target Read */
+#define PECI_PCS_TDP_UNITS 30 /* Units for power/energy registers */
+
+struct peci_device;
+
+int peci_temp_read(struct peci_device *device, s16 *temp_raw);
+
+int peci_pcs_read(struct peci_device *device, u8 index,
+ u16 param, u32 *data);
+
+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev,
+ u8 func, u16 reg, u32 *data);
+
+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data);
+
+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
+ u8 bus, u8 dev, u8 func, u64 address, u32 *data);
+
+#endif /* __LINUX_PECI_CPU_H */
diff --git a/include/linux/peci.h b/include/linux/peci.h
index f9f37b874011..31f9e628fd11 100644
--- a/include/linux/peci.h
+++ b/include/linux/peci.h
@@ -9,14 +9,6 @@
#include <linux/mutex.h>
#include <linux/types.h>
-#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
-#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
-#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
-#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
-#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
-#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
-#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
-
struct peci_request;
/**
--
2.31.1
Add peci-dimmtemp driver for Digital Thermal Sensor (DTS) thermal
readings of DIMMs that are accessible via the processor PECI interface.
The main use case for the driver (and PECI interface) is out-of-band
management, where we're able to obtain the DTS readings from an external
entity connected with PECI, e.g. BMC on server platforms.
Co-developed-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Jae Hyun Yoo <[email protected]>
Signed-off-by: Iwona Winiarska <[email protected]>
Reviewed-by: Pierre-Louis Bossart <[email protected]>
---
drivers/hwmon/peci/Kconfig | 13 +
drivers/hwmon/peci/Makefile | 2 +
drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++++++++++++++++++
3 files changed, 523 insertions(+)
create mode 100644 drivers/hwmon/peci/dimmtemp.c
diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
index e10eed68d70a..f2d57efa508b 100644
--- a/drivers/hwmon/peci/Kconfig
+++ b/drivers/hwmon/peci/Kconfig
@@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
This driver can also be built as a module. If so, the module
will be called peci-cputemp.
+config SENSORS_PECI_DIMMTEMP
+ tristate "PECI DIMM temperature monitoring client"
+ depends on PECI
+ select SENSORS_PECI
+ select PECI_CPU
+ help
+ If you say yes here you get support for the generic Intel PECI hwmon
+ driver which provides Digital Thermal Sensor (DTS) thermal readings of
+ DIMM components that are accessible via the processor PECI interface.
+
+ This driver can also be built as a module. If so, the module
+ will be called peci-dimmtemp.
+
config SENSORS_PECI
tristate
diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
index e8a0ada5ab1f..191cfa0227f3 100644
--- a/drivers/hwmon/peci/Makefile
+++ b/drivers/hwmon/peci/Makefile
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
peci-cputemp-y := cputemp.o
+peci-dimmtemp-y := dimmtemp.o
obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
+obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
new file mode 100644
index 000000000000..2fcb8607137a
--- /dev/null
+++ b/drivers/hwmon/peci/dimmtemp.c
@@ -0,0 +1,508 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) 2018-2021 Intel Corporation
+
+#include <linux/auxiliary_bus.h>
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/hwmon.h>
+#include <linux/jiffies.h>
+#include <linux/module.h>
+#include <linux/peci.h>
+#include <linux/peci-cpu.h>
+#include <linux/units.h>
+#include <linux/workqueue.h>
+#include <linux/x86/intel-family.h>
+
+#include "common.h"
+
+#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
+#define DIMM_MASK_CHECK_RETRY_MAX 60 /* 60 x 5 secs = 5 minutes */
+
+/* Max number of channel ranks and DIMM index per channel */
+#define CHAN_RANK_MAX_ON_HSX 8
+#define DIMM_IDX_MAX_ON_HSX 3
+#define CHAN_RANK_MAX_ON_BDX 4
+#define DIMM_IDX_MAX_ON_BDX 3
+#define CHAN_RANK_MAX_ON_BDXD 2
+#define DIMM_IDX_MAX_ON_BDXD 2
+#define CHAN_RANK_MAX_ON_SKX 6
+#define DIMM_IDX_MAX_ON_SKX 2
+#define CHAN_RANK_MAX_ON_ICX 8
+#define DIMM_IDX_MAX_ON_ICX 2
+#define CHAN_RANK_MAX_ON_ICXD 4
+#define DIMM_IDX_MAX_ON_ICXD 2
+
+#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
+#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
+#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
+
+#define CPU_SEG_MASK GENMASK(23, 16)
+#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
+#define CPU_BUS_MASK GENMASK(7, 0)
+#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
+
+#define DIMM_TEMP_MAX GENMASK(15, 8)
+#define DIMM_TEMP_CRIT GENMASK(23, 16)
+#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
+#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
+
+struct dimm_info {
+ int chan_rank_max;
+ int dimm_idx_max;
+ u8 min_peci_revision;
+};
+
+struct peci_dimmtemp {
+ struct peci_device *peci_dev;
+ struct device *dev;
+ const char *name;
+ const struct dimm_info *gen_info;
+ struct delayed_work detect_work;
+ struct peci_sensor_data temp[DIMM_NUMS_MAX];
+ long temp_max[DIMM_NUMS_MAX];
+ long temp_crit[DIMM_NUMS_MAX];
+ int retry_count;
+ char **dimmtemp_label;
+ DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
+};
+
+static u8 __dimm_temp(u32 reg, int dimm_order)
+{
+ return (reg >> (dimm_order * 8)) & 0xff;
+}
+
+static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
+{
+ int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
+ int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
+ struct peci_device *peci_dev = priv->peci_dev;
+ u8 cpu_seg, cpu_bus, dev, func;
+ u64 offset;
+ u32 data;
+ u16 reg;
+ int ret;
+
+ if (!peci_sensor_need_update(&priv->temp[dimm_no]))
+ return 0;
+
+ ret = peci_pcs_read(peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &data);
+ if (ret)
+ return ret;
+
+ priv->temp[dimm_no].value = __dimm_temp(data, dimm_order) * MILLIDEGREE_PER_DEGREE;
+
+ switch (peci_dev->info.model) {
+ case INTEL_FAM6_ICELAKE_X:
+ case INTEL_FAM6_ICELAKE_D:
+ ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd4, &data);
+ if (ret || !(data & BIT(31)))
+ break; /* Use default or previous value */
+
+ ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd0, &data);
+ if (ret)
+ break; /* Use default or previous value */
+
+ cpu_seg = GET_CPU_SEG(data);
+ cpu_bus = GET_CPU_BUS(data);
+
+ /*
+ * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
+ * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
+ * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
+ * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
+ * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
+ * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
+ * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
+ * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
+ */
+ dev = 0x1a + chan_rank / 2;
+ offset = 0x224e0 + dimm_order * 4;
+ if (chan_rank % 2)
+ offset += 0x4000;
+
+ ret = peci_mmio_read(peci_dev, 0, cpu_seg, cpu_bus, dev, 0, offset, &data);
+ if (ret)
+ return ret;
+
+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
+
+ break;
+ case INTEL_FAM6_SKYLAKE_X:
+ /*
+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
+ * Device 11, Function 2: IMC 0 channel 2 -> rank 2
+ * Device 12, Function 2: IMC 1 channel 0 -> rank 3
+ * Device 12, Function 6: IMC 1 channel 1 -> rank 4
+ * Device 13, Function 2: IMC 1 channel 2 -> rank 5
+ */
+ dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
+ func = chan_rank % 3 == 1 ? 6 : 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
+ if (ret)
+ return ret;
+
+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
+
+ break;
+ case INTEL_FAM6_BROADWELL_D:
+ /*
+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
+ * Device 12, Function 2: IMC 1 channel 0 -> rank 2
+ * Device 12, Function 6: IMC 1 channel 1 -> rank 3
+ */
+ dev = 10 + chan_rank / 2 * 2;
+ func = (chan_rank % 2) ? 6 : 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
+ if (ret)
+ return ret;
+
+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
+
+ break;
+ case INTEL_FAM6_HASWELL_X:
+ case INTEL_FAM6_BROADWELL_X:
+ /*
+ * Device 20, Function 0: IMC 0 channel 0 -> rank 0
+ * Device 20, Function 1: IMC 0 channel 1 -> rank 1
+ * Device 21, Function 0: IMC 0 channel 2 -> rank 2
+ * Device 21, Function 1: IMC 0 channel 3 -> rank 3
+ * Device 23, Function 0: IMC 1 channel 0 -> rank 4
+ * Device 23, Function 1: IMC 1 channel 1 -> rank 5
+ * Device 24, Function 0: IMC 1 channel 2 -> rank 6
+ * Device 24, Function 1: IMC 1 channel 3 -> rank 7
+ */
+ dev = 20 + chan_rank / 2 + chan_rank / 4;
+ func = chan_rank % 2;
+ reg = 0x120 + dimm_order * 4;
+
+ ret = peci_pci_local_read(peci_dev, 1, dev, func, reg, &data);
+ if (ret)
+ return ret;
+
+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
+
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ peci_sensor_mark_updated(&priv->temp[dimm_no]);
+
+ return 0;
+}
+
+static int dimmtemp_read_string(struct device *dev,
+ enum hwmon_sensor_types type,
+ u32 attr, int channel, const char **str)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
+
+ if (attr != hwmon_temp_label)
+ return -EOPNOTSUPP;
+
+ *str = (const char *)priv->dimmtemp_label[channel];
+
+ return 0;
+}
+
+static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
+ u32 attr, int channel, long *val)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
+ int ret;
+
+ ret = get_dimm_temp(priv, channel);
+ if (ret)
+ return ret;
+
+ switch (attr) {
+ case hwmon_temp_input:
+ *val = priv->temp[channel].value;
+ break;
+ case hwmon_temp_max:
+ *val = priv->temp_max[channel];
+ break;
+ case hwmon_temp_crit:
+ *val = priv->temp_crit[channel];
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types type,
+ u32 attr, int channel)
+{
+ const struct peci_dimmtemp *priv = data;
+
+ if (test_bit(channel, priv->dimm_mask))
+ return 0444;
+
+ return 0;
+}
+
+static const struct hwmon_ops peci_dimmtemp_ops = {
+ .is_visible = dimmtemp_is_visible,
+ .read_string = dimmtemp_read_string,
+ .read = dimmtemp_read,
+};
+
+static int check_populated_dimms(struct peci_dimmtemp *priv)
+{
+ int chan_rank_max = priv->gen_info->chan_rank_max;
+ int dimm_idx_max = priv->gen_info->dimm_idx_max;
+ int chan_rank, dimm_idx, ret;
+ u64 dimm_mask = 0;
+ u32 pcs;
+
+ for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &pcs);
+ if (ret) {
+ /*
+ * Overall, we expect either success or -EINVAL in
+ * order to determine whether DIMM is populated or not.
+ * For anything else - we fall back to deferring the
+ * detection to be performed at a later point in time.
+ */
+ if (ret == -EINVAL)
+ continue;
+ else
+ return -EAGAIN;
+ }
+
+ for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
+ if (__dimm_temp(pcs, dimm_idx))
+ dimm_mask |= BIT_ULL(chan_rank * dimm_idx_max + dimm_idx);
+ }
+ /*
+ * It's possible that memory training is not done yet. In this case we
+ * defer the detection to be performed at a later point in time.
+ */
+ if (!dimm_mask)
+ return -EAGAIN;
+
+ dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
+
+ bitmap_from_u64(priv->dimm_mask, dimm_mask);
+
+ return 0;
+}
+
+static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
+{
+ int rank = chan / priv->gen_info->dimm_idx_max;
+ int idx = chan % priv->gen_info->dimm_idx_max;
+
+ priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
+ "DIMM %c%d", 'A' + rank,
+ idx + 1);
+ if (!priv->dimmtemp_label[chan])
+ return -ENOMEM;
+
+ return 0;
+}
+
+static const u32 peci_dimmtemp_temp_channel_config[] = {
+ [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT,
+ 0
+};
+
+static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
+ .type = hwmon_temp,
+ .config = peci_dimmtemp_temp_channel_config,
+};
+
+static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
+ &peci_dimmtemp_temp_channel,
+ NULL
+};
+
+static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
+ .ops = &peci_dimmtemp_ops,
+ .info = peci_dimmtemp_temp_info,
+};
+
+static int create_dimm_temp_info(struct peci_dimmtemp *priv)
+{
+ int ret, i, channels;
+ struct device *dev;
+
+ ret = check_populated_dimms(priv);
+ if (ret == -EAGAIN) {
+ if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
+ schedule_delayed_work(&priv->detect_work,
+ DIMM_MASK_CHECK_DELAY_JIFFIES);
+ priv->retry_count++;
+ dev_dbg(priv->dev, "Deferred populating DIMM temp info\n");
+ return ret;
+ }
+
+ dev_info(priv->dev, "Timeout populating DIMM temp info\n");
+ return -ETIMEDOUT;
+ }
+
+ channels = priv->gen_info->chan_rank_max * priv->gen_info->dimm_idx_max;
+
+ priv->dimmtemp_label = devm_kcalloc(priv->dev, channels, sizeof(char *), GFP_KERNEL);
+ if (!priv->dimmtemp_label)
+ return -ENOMEM;
+
+ for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
+ ret = create_dimm_temp_label(priv, i);
+ if (ret)
+ return ret;
+ }
+
+ dev = devm_hwmon_device_register_with_info(priv->dev, priv->name, priv,
+ &peci_dimmtemp_chip_info, NULL);
+ if (IS_ERR(dev)) {
+ dev_err(priv->dev, "Failed to register hwmon device\n");
+ return PTR_ERR(dev);
+ }
+
+ dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
+
+ return 0;
+}
+
+static void create_dimm_temp_info_delayed(struct work_struct *work)
+{
+ struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
+ struct peci_dimmtemp,
+ detect_work);
+ int ret;
+
+ ret = create_dimm_temp_info(priv);
+ if (ret && ret != -EAGAIN)
+ dev_dbg(priv->dev, "Failed to populate DIMM temp info\n");
+}
+
+static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
+{
+ struct device *dev = &adev->dev;
+ struct peci_device *peci_dev = to_peci_device(dev->parent);
+ struct peci_dimmtemp *priv;
+ int ret;
+
+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
+ peci_dev->info.socket_id);
+ if (!priv->name)
+ return -ENOMEM;
+
+ dev_set_drvdata(dev, priv);
+ priv->dev = dev;
+ priv->peci_dev = peci_dev;
+ priv->gen_info = (const struct dimm_info *)id->driver_data;
+
+ INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
+
+ ret = create_dimm_temp_info(priv);
+ if (ret && ret != -EAGAIN) {
+ dev_dbg(dev, "Failed to populate DIMM temp info\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static void peci_dimmtemp_remove(struct auxiliary_device *adev)
+{
+ struct peci_dimmtemp *priv = dev_get_drvdata(&adev->dev);
+
+ cancel_delayed_work_sync(&priv->detect_work);
+}
+
+static const struct dimm_info dimm_hsx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
+ .min_peci_revision = 0x30,
+};
+
+static const struct dimm_info dimm_bdx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
+ .min_peci_revision = 0x30,
+};
+
+static const struct dimm_info dimm_bdxd = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
+ .min_peci_revision = 0x30,
+};
+
+static const struct dimm_info dimm_skx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
+ .min_peci_revision = 0x30,
+};
+
+static const struct dimm_info dimm_icx = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
+ .min_peci_revision = 0x40,
+};
+
+static const struct dimm_info dimm_icxd = {
+ .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
+ .min_peci_revision = 0x40,
+};
+
+static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
+ {
+ .name = "peci_cpu.dimmtemp.hsx",
+ .driver_data = (kernel_ulong_t)&dimm_hsx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.bdx",
+ .driver_data = (kernel_ulong_t)&dimm_bdx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.bdxd",
+ .driver_data = (kernel_ulong_t)&dimm_bdxd,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.skx",
+ .driver_data = (kernel_ulong_t)&dimm_skx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.icx",
+ .driver_data = (kernel_ulong_t)&dimm_icx,
+ },
+ {
+ .name = "peci_cpu.dimmtemp.icxd",
+ .driver_data = (kernel_ulong_t)&dimm_icxd,
+ },
+ { }
+};
+MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
+
+static struct auxiliary_driver peci_dimmtemp_driver = {
+ .probe = peci_dimmtemp_probe,
+ .remove = peci_dimmtemp_remove,
+ .id_table = peci_dimmtemp_ids,
+};
+
+module_auxiliary_driver(peci_dimmtemp_driver);
+
+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
+MODULE_DESCRIPTION("PECI dimmtemp driver");
+MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(PECI_CPU);
--
2.31.1
On 7/12/21 3:04 PM, Iwona Winiarska wrote:
> diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
> new file mode 100644
> index 000000000000..8ddbe494677f
> --- /dev/null
> +++ b/drivers/peci/controller/Kconfig
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config PECI_ASPEED
> + tristate "ASPEED PECI support"
> + depends on ARCH_ASPEED || COMPILE_TEST
> + depends on OF
> + depends on HAS_IOMEM
> + help
> + Enable this driver if you want to support ASPEED PECI controller.
> +
> + This driver can be also build as a module. If so, the module
can also be built as a module.
> + will be called peci-aspeed.
--
~Randy
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> Note: All changes to arch/x86 are contained within patches 01-02.
Hi Iwona,
One meta question first, who is this submission "To:"? Is there an
existing upstream maintainer path for OpenBMC changes? Are you
expecting contributions to this subsystem from others? While Greg
sometimes ends up as default maintainer for new stuff, I wonder if
someone from the OpenBMC community should step up to fill this role?
>
> The Platform Environment Control Interface (PECI) is a communication
> interface between Intel processors and management controllers (e.g.
> Baseboard Management Controller, BMC).
>
> This series adds a PECI subsystem and introduces drivers which run in
> the Linux instance on the management controller (not the main Intel
> processor) and is intended to be used by the OpenBMC [1], a Linux
> distribution for BMC devices.
> The information exposed over PECI (like processor and DIMM
> temperature) refers to the Intel processor and can be consumed by
> daemons running on the BMC to, for example, display the processor
> temperature in its web interface.
>
> The PECI bus is collection of code that provides interface support
> between PECI devices (that actually represent processors) and PECI
> controllers (such as the "peci-aspeed" controller) that allow to
> access physical PECI interface. PECI devices are bound to PECI
> drivers that provides access to PECI services. This series introduces
> a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
> and "dimmtemp" using the auxiliary bus.
>
> Exposing "raw" PECI to userspace, either to write userspace drivers or
> for debug/testing purpose was left out of this series to encourage
> writing kernel drivers instead, but may be pursued in the future.
>
> Introducing PECI to upstream Linux was already attempted before [2].
> Since it's been over a year since last revision, and the series
> changed quite a bit in the meantime, I've decided to start from v1.
>
> I would also like to give credit to everyone who helped me with
> different aspects of preliminary review:
> - Pierre-Louis Bossart,
> - Tony Luck,
> - Andy Shevchenko,
> - Dave Hansen.
>
> [1] https://github.com/openbmc/openbmc
> [2] https://lore.kernel.org/openbmc/[email protected]/
>
> Iwona Winiarska (12):
> x86/cpu: Move intel-family to arch-independent headers
> x86/cpu: Extract cpuid helpers to arch-independent
> dt-bindings: Add generic bindings for PECI
> dt-bindings: Add bindings for peci-aspeed
> ARM: dts: aspeed: Add PECI controller nodes
> peci: Add core infrastructure
> peci: Add device detection
> peci: Add support for PECI device drivers
> peci: Add peci-cpu driver
> hwmon: peci: Add cputemp driver
> hwmon: peci: Add dimmtemp driver
> docs: Add PECI documentation
>
> Jae Hyun Yoo (2):
> peci: Add peci-aspeed controller driver
> docs: hwmon: Document PECI drivers
>
> .../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++
> .../bindings/peci/peci-controller.yaml | 28 +
> Documentation/hwmon/index.rst | 2 +
> Documentation/hwmon/peci-cputemp.rst | 93 ++++
> Documentation/hwmon/peci-dimmtemp.rst | 58 ++
> Documentation/index.rst | 1 +
> Documentation/peci/index.rst | 16 +
> Documentation/peci/peci.rst | 48 ++
> MAINTAINERS | 32 ++
> arch/arm/boot/dts/aspeed-g4.dtsi | 14 +
> arch/arm/boot/dts/aspeed-g5.dtsi | 14 +
> arch/arm/boot/dts/aspeed-g6.dtsi | 14 +
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/cpu.h | 3 -
> arch/x86/include/asm/intel-family.h | 141 +----
> arch/x86/include/asm/microcode.h | 2 +-
> arch/x86/kvm/cpuid.h | 3 +-
> arch/x86/lib/Makefile | 2 +-
> drivers/Kconfig | 3 +
> drivers/Makefile | 1 +
> drivers/edac/mce_amd.c | 3 +-
> drivers/hwmon/Kconfig | 2 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/peci/Kconfig | 31 ++
> drivers/hwmon/peci/Makefile | 7 +
> drivers/hwmon/peci/common.h | 46 ++
> drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++
> drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++
> drivers/peci/Kconfig | 36 ++
> drivers/peci/Makefile | 10 +
> drivers/peci/controller/Kconfig | 12 +
> drivers/peci/controller/Makefile | 3 +
> drivers/peci/controller/peci-aspeed.c | 501 +++++++++++++++++
> drivers/peci/core.c | 224 ++++++++
> drivers/peci/cpu.c | 347 ++++++++++++
> drivers/peci/device.c | 211 ++++++++
> drivers/peci/internal.h | 137 +++++
> drivers/peci/request.c | 502 +++++++++++++++++
> drivers/peci/sysfs.c | 82 +++
> include/linux/peci-cpu.h | 38 ++
> include/linux/peci.h | 93 ++++
> include/linux/x86/cpu.h | 9 +
> include/linux/x86/intel-family.h | 146 +++++
> lib/Kconfig | 5 +
> lib/Makefile | 2 +
> lib/x86/Makefile | 3 +
> {arch/x86/lib => lib/x86}/cpu.c | 2 +-
> 47 files changed, 3902 insertions(+), 149 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
> create mode 100644 Documentation/hwmon/peci-cputemp.rst
> create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> create mode 100644 Documentation/peci/index.rst
> create mode 100644 Documentation/peci/peci.rst
> create mode 100644 drivers/hwmon/peci/Kconfig
> create mode 100644 drivers/hwmon/peci/Makefile
> create mode 100644 drivers/hwmon/peci/common.h
> create mode 100644 drivers/hwmon/peci/cputemp.c
> create mode 100644 drivers/hwmon/peci/dimmtemp.c
> create mode 100644 drivers/peci/Kconfig
> create mode 100644 drivers/peci/Makefile
> create mode 100644 drivers/peci/controller/Kconfig
> create mode 100644 drivers/peci/controller/Makefile
> create mode 100644 drivers/peci/controller/peci-aspeed.c
> create mode 100644 drivers/peci/core.c
> create mode 100644 drivers/peci/cpu.c
> create mode 100644 drivers/peci/device.c
> create mode 100644 drivers/peci/internal.h
> create mode 100644 drivers/peci/request.c
> create mode 100644 drivers/peci/sysfs.c
> create mode 100644 include/linux/peci-cpu.h
> create mode 100644 include/linux/peci.h
> create mode 100644 include/linux/x86/cpu.h
> create mode 100644 include/linux/x86/intel-family.h
> create mode 100644 lib/x86/Makefile
> rename {arch/x86/lib => lib/x86}/cpu.c (95%)
>
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> Baseboard management controllers (BMC) often run Linux but are usually
> implemented with non-X86 processors. They can use PECI to access package
> config space (PCS) registers on the host CPU and since some information,
> e.g. figuring out the core count, can be obtained using different
> registers on different CPU generations, they need to decode the family
> and model.
>
> Move the data from arch/x86/include/asm/intel-family.h into a new file
> include/linux/x86/intel-family.h so that it can be used by other
> architectures.
It would at least make the diffstat smaller if the old file were
deleted in the same patch, since that allows for rename detection:
MAINTAINERS | 1 +
{arch/x86/include/asm => include/linux/x86}/intel-family.h | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
...one thing people have done in the past is include a conversion
script in the changelog that produced the diff. That way if a
maintainer wants to be sure to catch any new usage of the header at
the old location they just run the script.
I am not aware of the x86 maintainers' preference here. Whichever way
you decide to go, you can add:
Reviewed-by: Dan Williams <[email protected]>
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> Baseboard management controllers (BMC) often run Linux but are usually
> implemented with non-X86 processors. They can use PECI to access package
> config space (PCS) registers on the host CPU and since some information,
> e.g. figuring out the core count, can be obtained using different
> registers on different CPU generations, they need to decode the family
> and model.
>
> The format of Package Identifier PCS register that describes CPUID
> information has the same layout as CPUID_1.EAX, so let's allow to reuse
> cpuid helpers by making it available for other architectures as well.
Just some minor comments below.
You can go ahead and add:
Reviewed-by: Dan Williams <[email protected]>
>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> MAINTAINERS | 2 ++
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/cpu.h | 3 ---
> arch/x86/include/asm/microcode.h | 2 +-
> arch/x86/kvm/cpuid.h | 3 ++-
> arch/x86/lib/Makefile | 2 +-
> drivers/edac/mce_amd.c | 3 +--
> include/linux/x86/cpu.h | 9 +++++++++
> lib/Kconfig | 5 +++++
> lib/Makefile | 2 ++
> lib/x86/Makefile | 3 +++
> {arch/x86/lib => lib/x86}/cpu.c | 2 +-
> 12 files changed, 28 insertions(+), 9 deletions(-)
> create mode 100644 include/linux/x86/cpu.h
> create mode 100644 lib/x86/Makefile
> rename {arch/x86/lib => lib/x86}/cpu.c (95%)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ec5987a00800..6f77aaca2a30 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -20081,6 +20081,8 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core
> F: Documentation/devicetree/bindings/x86/
> F: Documentation/x86/
> F: arch/x86/
> +F: include/linux/x86/
Doesn't this technically belong in patch1 since that one introduced
the directory?
> +F: lib/x86/
>
> X86 ENTRY CODE
> M: Andy Lutomirski <[email protected]>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 49270655e827..750f9b896e4f 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -141,6 +141,7 @@ config X86
> select GENERIC_IRQ_PROBE
> select GENERIC_IRQ_RESERVATION_MODE
> select GENERIC_IRQ_SHOW
> + select GENERIC_LIB_X86
> select GENERIC_PENDING_IRQ if SMP
> select GENERIC_PTDUMP
> select GENERIC_SMP_IDLE_THREAD
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index 33d41e350c79..2a663a05a795 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -37,9 +37,6 @@ extern int _debug_hotplug_cpu(int cpu, int action);
>
> int mwait_usable(const struct cpuinfo_x86 *);
>
> -unsigned int x86_family(unsigned int sig);
> -unsigned int x86_model(unsigned int sig);
> -unsigned int x86_stepping(unsigned int sig);
> #ifdef CONFIG_CPU_SUP_INTEL
> extern void __init sld_setup(struct cpuinfo_x86 *c);
> extern void switch_to_sld(unsigned long tifn);
> diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
> index ab45a220fac4..4b0eabf63b98 100644
> --- a/arch/x86/include/asm/microcode.h
> +++ b/arch/x86/include/asm/microcode.h
> @@ -2,9 +2,9 @@
> #ifndef _ASM_X86_MICROCODE_H
> #define _ASM_X86_MICROCODE_H
>
> -#include <asm/cpu.h>
> #include <linux/earlycpio.h>
> #include <linux/initrd.h>
> +#include <linux/x86/cpu.h>
Has this patch set received a build success notification from the
kbuild robot? I.e., are you sure that this include was only here for
the
unsigned int x86_family(unsigned int sig);
unsigned int x86_model(unsigned int sig);
unsigned int x86_stepping(unsigned int sig);
...helpers? All the other replacements look trivially verifiable as
only needing these 3 helpers.
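For anyone following along without the source open, here is a minimal
userspace sketch of what these three helpers compute, assuming the
standard CPUID_1.EAX signature layout that the changelog says the
Package Identifier PCS register shares. The sketch_* names are
hypothetical stand-ins, not the kernel symbols:

```c
#include <stdint.h>

/* Hypothetical stand-in for x86_family(): base family in bits 11:8. */
unsigned int sketch_x86_family(uint32_t sig)
{
	unsigned int family = (sig >> 8) & 0xf;

	/* Base family 0xf means the extended family (bits 27:20) is added. */
	if (family == 0xf)
		family += (sig >> 20) & 0xff;

	return family;
}

/* Hypothetical stand-in for x86_model(): base model in bits 7:4. */
unsigned int sketch_x86_model(uint32_t sig)
{
	unsigned int model = (sig >> 4) & 0xf;

	/* For family >= 6 the extended model (bits 19:16) is the high nibble. */
	if (sketch_x86_family(sig) >= 0x6)
		model += ((sig >> 16) & 0xf) << 4;

	return model;
}

/* Hypothetical stand-in for x86_stepping(): stepping in bits 3:0. */
unsigned int sketch_x86_stepping(uint32_t sig)
{
	return sig & 0xf;
}
```

Nothing in there depends on running on an x86 CPU, which is the point
of making the helpers available to a BMC kernel.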
>
> struct ucode_patch {
> struct list_head plist;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index c99edfff7f82..bf070d2a2175 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -4,10 +4,11 @@
>
> #include "x86.h"
> #include "reverse_cpuid.h"
> -#include <asm/cpu.h>
> #include <asm/processor.h>
> #include <uapi/asm/kvm_para.h>
>
> +#include <linux/x86/cpu.h>
> +
> extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> void kvm_set_cpu_caps(void);
>
> diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> index bad4dee4f0e4..fd73c1b72c3e 100644
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -41,7 +41,7 @@ clean-files := inat-tables.c
>
> obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
>
> -lib-y := delay.o misc.o cmdline.o cpu.o
> +lib-y := delay.o misc.o cmdline.o
> lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
> lib-y += memcpy_$(BITS).o
> lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 27d56920b469..f545f5fad02c 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -1,8 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0-only
> #include <linux/module.h>
> #include <linux/slab.h>
> -
> -#include <asm/cpu.h>
> +#include <linux/x86/cpu.h>
>
> #include "mce_amd.h"
>
> diff --git a/include/linux/x86/cpu.h b/include/linux/x86/cpu.h
> new file mode 100644
> index 000000000000..5f383d47886d
> --- /dev/null
> +++ b/include/linux/x86/cpu.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef _LINUX_X86_CPU_H
> +#define _LINUX_X86_CPU_H
> +
> +unsigned int x86_family(unsigned int sig);
> +unsigned int x86_model(unsigned int sig);
> +unsigned int x86_stepping(unsigned int sig);
> +
> +#endif /* _LINUX_X86_CPU_H */
> diff --git a/lib/Kconfig b/lib/Kconfig
> index d241fe476fda..cc28bc1f2d84 100644
> --- a/lib/Kconfig
> +++ b/lib/Kconfig
> @@ -718,3 +718,8 @@ config PLDMFW
>
> config ASN1_ENCODER
> tristate
> +
> +config GENERIC_LIB_X86
> + bool
> + depends on X86
> + default n
No need for a "default n" line. Omitting a default is the same as
"default n".
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> Intel processors provide access for various services designed to support
> processor and DRAM thermal management, platform manageability and
> processor interface tuning and diagnostics.
> Those services are available via the Platform Environment Control
> Interface (PECI) that provides a communication channel between the
> processor and the Baseboard Management Controller (BMC) or other
> platform management device.
>
> This change introduces PECI subsystem by adding the initial core module
> and API for controller drivers.
>
> Co-developed-by: Jason M Bills <[email protected]>
> Signed-off-by: Jason M Bills <[email protected]>
> Co-developed-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Pierre-Louis Bossart <[email protected]>
> ---
> MAINTAINERS | 9 +++
> drivers/Kconfig | 3 +
> drivers/Makefile | 1 +
> drivers/peci/Kconfig | 14 ++++
> drivers/peci/Makefile | 5 ++
> drivers/peci/core.c | 166 ++++++++++++++++++++++++++++++++++++++++
> drivers/peci/internal.h | 20 +++++
> drivers/peci/sysfs.c | 48 ++++++++++++
> include/linux/peci.h | 82 ++++++++++++++++++++
> 9 files changed, 348 insertions(+)
> create mode 100644 drivers/peci/Kconfig
> create mode 100644 drivers/peci/Makefile
> create mode 100644 drivers/peci/core.c
> create mode 100644 drivers/peci/internal.h
> create mode 100644 drivers/peci/sysfs.c
> create mode 100644 include/linux/peci.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6f77aaca2a30..47411e2b6336 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14495,6 +14495,15 @@ L: [email protected]
> S: Maintained
> F: drivers/platform/x86/peaq-wmi.c
>
> +PECI SUBSYSTEM
> +M: Iwona Winiarska <[email protected]>
> +R: Jae Hyun Yoo <[email protected]>
> +L: [email protected] (moderated for non-subscribers)
> +S: Supported
> +F: Documentation/devicetree/bindings/peci/
> +F: drivers/peci/
> +F: include/linux/peci.h
> +
> PENSANDO ETHERNET DRIVERS
> M: Shannon Nelson <[email protected]>
> M: [email protected]
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 8bad63417a50..f472b3d972b3 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"
> source "drivers/counter/Kconfig"
>
> source "drivers/most/Kconfig"
> +
> +source "drivers/peci/Kconfig"
> +
> endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index 27c018bdf4de..8d96f0c3dde5 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -189,3 +189,4 @@ obj-$(CONFIG_GNSS) += gnss/
> obj-$(CONFIG_INTERCONNECT) += interconnect/
> obj-$(CONFIG_COUNTER) += counter/
> obj-$(CONFIG_MOST) += most/
> +obj-$(CONFIG_PECI) += peci/
> diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> new file mode 100644
> index 000000000000..601cc3c3c852
> --- /dev/null
> +++ b/drivers/peci/Kconfig
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +menuconfig PECI
> + tristate "PECI support"
> + help
> + The Platform Environment Control Interface (PECI) is an interface
> + that provides a communication channel to Intel processors and
> + chipset components from external monitoring or control devices.
> +
> + If you want PECI support, you should say Y here and also to the
> + specific driver for your bus adapter(s) below.
The user is reading this help text to decide whether they want PECI
support, so telling them to turn it on if they want it is not all that
helpful. I would say "If you are building a kernel for a Baseboard
Management Controller (BMC) say Y. If unsure say N".
> +
> + This support is also available as a module. If so, the module
> + will be called peci.
> diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> new file mode 100644
> index 000000000000..2bb2f51bcda7
> --- /dev/null
> +++ b/drivers/peci/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +# Core functionality
> +peci-y := core.o sysfs.o
> +obj-$(CONFIG_PECI) += peci.o
> diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> new file mode 100644
> index 000000000000..0ad00110459d
> --- /dev/null
> +++ b/drivers/peci/core.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/bug.h>
> +#include <linux/device.h>
> +#include <linux/export.h>
> +#include <linux/idr.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/peci.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/property.h>
> +#include <linux/slab.h>
> +
> +#include "internal.h"
> +
> +static DEFINE_IDA(peci_controller_ida);
> +
> +static void peci_controller_dev_release(struct device *dev)
> +{
> + struct peci_controller *controller = to_peci_controller(dev);
> +
> + mutex_destroy(&controller->bus_lock);
> +}
> +
> +struct device_type peci_controller_type = {
> + .release = peci_controller_dev_release,
> +};
I have not read further than patch 6 in this set, so I'm hoping there
is an explanation for this. As it stands, it looks like a red flag
that the release function is not actually releasing anything.
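To illustrate the pattern I would expect, here is a rough userspace
analogy (not kernel code, and not a proposal for the exact API): the
object is allocated by the core, and its release callback undoes
everything that was acquired for it once the last reference is
dropped. All sketch_* names and the fixed-size ID table are
assumptions made up for this example:

```c
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_MAX_IDS 256

static bool sketch_id_used[SKETCH_MAX_IDS];	/* stand-in for an IDA */

struct sketch_controller {
	int refcount;
	int id;
	void (*release)(struct sketch_controller *c);
};

static int sketch_id_alloc(void)
{
	for (int i = 0; i < SKETCH_MAX_IDS; i++) {
		if (!sketch_id_used[i]) {
			sketch_id_used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Everything acquired for the object is released here, in one place. */
static void sketch_controller_release(struct sketch_controller *c)
{
	sketch_id_used[c->id] = false;
	free(c);
}

/* Allocation, ID assignment and the initial reference live together. */
struct sketch_controller *sketch_controller_alloc(void)
{
	struct sketch_controller *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;

	c->id = sketch_id_alloc();
	if (c->id < 0) {
		free(c);
		return NULL;
	}

	c->refcount = 1;
	c->release = sketch_controller_release;
	return c;
}

void sketch_controller_get(struct sketch_controller *c)
{
	c->refcount++;
}

/* Dropping the last reference triggers release(); no open-coded cleanup. */
void sketch_controller_put(struct sketch_controller *c)
{
	if (--c->refcount == 0)
		c->release(c);
}

bool sketch_id_in_use(int id)
{
	return sketch_id_used[id];
}
```

With that shape, error paths and the remove path only need to drop
their reference instead of repeating the cleanup sequence.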
> +
> +int peci_controller_scan_devices(struct peci_controller *controller)
> +{
> + /* Just a stub, no support for actual devices yet */
> + return 0;
> +}
Move this to the patch where it is needed.
> +
> +/**
> + * peci_controller_add() - Add PECI controller
> + * @controller: the PECI controller to be added
> + * @parent: device object to be registered as a parent
> + *
> + * In final stage of its probe(), peci_controller driver should include calling
s/should include calling/calls/
> + * peci_controller_add() to register itself with the PECI bus.
> + * The caller is responsible for allocating the struct peci_controller and
> + * managing its lifetime, calling peci_controller_remove() prior to releasing
> + * the allocation.
> + *
> + * It returns zero on success, else a negative error code (dropping the
> + * controller's refcount). After a successful return, the caller is responsible
> + * for calling peci_controller_remove().
> + *
> + * Return: 0 if succeeded, other values in case errors.
> + */
> +int peci_controller_add(struct peci_controller *controller, struct device *parent)
> +{
> + struct fwnode_handle *node = fwnode_handle_get(dev_fwnode(parent));
> + int ret;
> +
> + if (WARN_ON(!controller->xfer))
Why WARN()? What is 'xfer', and what is the likelihood that the caller
forgets to set it? For something critical like this the WARN is likely
overkill.
> + return -EINVAL;
> +
> + ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);
An '_add' function should just add; this one also seems to be doing
"alloc" work. Speaking of which, is there a peci_controller_alloc()?
> + if (ret < 0)
> + return ret;
> +
> + controller->id = ret;
> +
> + mutex_init(&controller->bus_lock);
> +
> + controller->dev.parent = parent;
> + controller->dev.bus = &peci_bus_type;
> + controller->dev.type = &peci_controller_type;
> + controller->dev.fwnode = node;
> + controller->dev.of_node = to_of_node(node);
> +
> + ret = dev_set_name(&controller->dev, "peci-%d", controller->id);
> + if (ret)
> + goto err_id;
> +
> + ret = device_register(&controller->dev);
> + if (ret)
> + goto err_put;
> +
> + pm_runtime_no_callbacks(&controller->dev);
> + pm_suspend_ignore_children(&controller->dev, true);
> + pm_runtime_enable(&controller->dev);
> +
> + /*
> + * Ignoring retval since failures during scan are non-critical for
> + * controller itself.
> + */
> + peci_controller_scan_devices(controller);
> +
> + return 0;
> +
> +err_put:
> + put_device(&controller->dev);
> +err_id:
> + fwnode_handle_put(controller->dev.fwnode);
> + ida_free(&peci_controller_ida, controller->id);
I'd expect these to be released by ->release().
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
I think it's cleaner to declare symbol namespaces in the Makefile. In
this case, add:
ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=PECI
...and just use EXPORT_SYMBOL_GPL as normal in the C file.
> +
> +static int _unregister(struct device *dev, void *dummy)
> +{
> + /* Just a stub, no support for actual devices yet */
At least for me, it wastes review time to consider empty stubs. Just
add the whole thing back when it's actually used so it can be reviewed
properly for suitability.
> + return 0;
> +}
> +
> +/**
> + * peci_controller_remove - Delete PECI controller
> + * @controller: the PECI controller to be removed
> + *
> + * This call is used only by PECI controller drivers, which are the only ones
> + * directly touching chip registers.
> + *
> + * Note that this function also drops a reference to the controller.
> + */
> +void peci_controller_remove(struct peci_controller *controller)
> +{
> + pm_runtime_disable(&controller->dev);
> + /*
> + * Detach any active PECI devices. This can't fail, thus we do not
> + * check the returned value.
> + */
> + device_for_each_child_reverse(&controller->dev, NULL, _unregister);
How does the peci_controller_remove() get called with children still
beneath it? Can that possibility be precluded by arranging for
children to be removed first?
For example, given that peci_controller_add() is called from another
driver's probe routine, this unregistration could be handled by a devm
action.
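A rough userspace analogy of what I mean, assuming the usual
managed-resource behaviour where cleanup actions run in reverse order
of registration when the driver detaches. This is only a sketch; the
sketch_* names and the fixed-size action list are invented for
illustration and are not the devres API:

```c
#include <stddef.h>

#define SKETCH_MAX_ACTIONS 8

struct sketch_action {
	void (*fn)(void *data);
	void *data;
};

struct sketch_device {
	struct sketch_action actions[SKETCH_MAX_ACTIONS];
	size_t nr_actions;
};

/* Register a cleanup action; returns 0 on success, -1 if the list is full. */
int sketch_add_action(struct sketch_device *dev, void (*fn)(void *), void *data)
{
	if (dev->nr_actions == SKETCH_MAX_ACTIONS)
		return -1;

	dev->actions[dev->nr_actions].fn = fn;
	dev->actions[dev->nr_actions].data = data;
	dev->nr_actions++;
	return 0;
}

/* Run on detach: actions fire in reverse order of registration. */
void sketch_release_all(struct sketch_device *dev)
{
	while (dev->nr_actions > 0) {
		struct sketch_action *a = &dev->actions[--dev->nr_actions];

		a->fn(a->data);
	}
}

/* Records the order in which the example actions ran. */
struct sketch_trace {
	char order[SKETCH_MAX_ACTIONS + 1];
	size_t len;
};

static void sketch_trace_append(struct sketch_trace *t, char c)
{
	if (t->len < SKETCH_MAX_ACTIONS)
		t->order[t->len++] = c;
}

/* 'C': tear down the controller itself (registered first, runs last). */
void sketch_remove_controller(void *data)
{
	sketch_trace_append(data, 'C');
}

/* 'D': tear down child devices (registered later, runs first). */
void sketch_remove_children(void *data)
{
	sketch_trace_append(data, 'D');
}
```

Registering the controller teardown first and the child teardown
afterwards means the children always go away before the controller,
without the remove path having to walk them by hand.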
> +
> + device_unregister(&controller->dev);
> + fwnode_handle_put(controller->dev.fwnode);
> + ida_free(&peci_controller_ida, controller->id);
Another open coded copy of release code that belongs in ->release()?
> +}
> +EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
> +
> +struct bus_type peci_bus_type = {
> + .name = "peci",
> + .bus_groups = peci_bus_groups,
> +};
> +
> +static int __init peci_init(void)
> +{
> + int ret;
> +
> + ret = bus_register(&peci_bus_type);
> + if (ret < 0) {
> + pr_err("failed to register PECI bus type!\n");
> + return ret;
> + }
> +
> + return 0;
> +}
> +subsys_initcall(peci_init);
You can't have subsys_initcall in a module. If you actually need
subsys_initcall then this can't be a module. Are you sure this can't
be module_init()?
> +
> +static void __exit peci_exit(void)
> +{
> + bus_unregister(&peci_bus_type);
> +}
> +module_exit(peci_exit);
> +
> +MODULE_AUTHOR("Jason M Bills <[email protected]>");
> +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
Is MAINTAINERS sufficient? Do you all want to be contacted by end
users, or just by kernel developers? If it's the former then keep
this; if it's the latter then MAINTAINERS is sufficient.
> +MODULE_DESCRIPTION("PECI bus core module");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> new file mode 100644
> index 000000000000..80c61bcdfc6b
> --- /dev/null
> +++ b/drivers/peci/internal.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2018-2021 Intel Corporation */
> +
> +#ifndef __PECI_INTERNAL_H
> +#define __PECI_INTERNAL_H
> +
> +#include <linux/device.h>
> +#include <linux/types.h>
> +
> +struct peci_controller;
> +struct attribute_group;
> +
> +extern struct bus_type peci_bus_type;
> +extern const struct attribute_group *peci_bus_groups[];
> +
> +extern struct device_type peci_controller_type;
> +
> +int peci_controller_scan_devices(struct peci_controller *controller);
> +
> +#endif /* __PECI_INTERNAL_H */
> diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
> new file mode 100644
> index 000000000000..36c5e2a18a92
> --- /dev/null
> +++ b/drivers/peci/sysfs.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2021 Intel Corporation
> +
> +#include <linux/peci.h>
> +
> +#include "internal.h"
> +
> +static int rescan_controller(struct device *dev, void *data)
> +{
> + if (dev->type != &peci_controller_type)
> + return 0;
> +
> + return peci_controller_scan_devices(to_peci_controller(dev));
> +}
> +
> +static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count)
> +{
> + bool res;
> + int ret;
> +
> + ret = kstrtobool(buf, &res);
> + if (ret)
> + return ret;
> +
> + if (!res)
> + return count;
> +
> + ret = bus_for_each_dev(&peci_bus_type, NULL, NULL, rescan_controller);
> + if (ret)
> + return ret;
> +
> + return count;
> +}
> +static BUS_ATTR_WO(rescan);
There is no Documentation/ABI entry for this attribute, so I can't
tell whether it's suitable: what it actually does is unreviewable when
looking at this patch standalone.
> +
> +static struct attribute *peci_bus_attrs[] = {
> + &bus_attr_rescan.attr,
> + NULL
> +};
> +
> +static const struct attribute_group peci_bus_group = {
> + .attrs = peci_bus_attrs,
> +};
> +
> +const struct attribute_group *peci_bus_groups[] = {
> + &peci_bus_group,
> + NULL
> +};
> diff --git a/include/linux/peci.h b/include/linux/peci.h
> new file mode 100644
> index 000000000000..cdf3008321fd
> --- /dev/null
> +++ b/include/linux/peci.h
> @@ -0,0 +1,82 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2018-2021 Intel Corporation */
> +
> +#ifndef __LINUX_PECI_H
> +#define __LINUX_PECI_H
> +
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +struct peci_request;
> +
> +/**
> + * struct peci_controller - PECI controller
> + * @dev: device object to register PECI controller to the device model
> + * @xfer: PECI transfer function
> + * @bus_lock: lock used to protect multiple callers
> + * @id: PECI controller ID
> + *
> + * PECI controllers usually connect to their drivers using non-PECI bus,
> + * such as the platform bus.
> + * Each PECI controller can communicate with one or more PECI devices.
> + */
> +struct peci_controller {
> + struct device dev;
> + int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req);
Each device will have a different way to do a PECI transfer?
I thought PECI was a standard...
> + struct mutex bus_lock; /* held for the duration of xfer */
What is it actually locking? For example, there is a mantra that goes
"lock data, not code", and this comment seems to imply that no specific
data is being locked.
> + u8 id;
No possible way to have more than 256 controllers per system?
> +};
> +
> +int peci_controller_add(struct peci_controller *controller, struct device *parent);
> +void peci_controller_remove(struct peci_controller *controller);
> +
> +static inline struct peci_controller *to_peci_controller(void *d)
> +{
> + return container_of(d, struct peci_controller, dev);
> +}
> +
> +/**
> + * struct peci_device - PECI device
> + * @dev: device object to register PECI device to the device model
> + * @controller: manages the bus segment hosting this PECI device
> + * @addr: address used on the PECI bus connected to the parent controller
> + *
> + * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
> + * The behaviour exposed to the rest of the system is defined by the PECI driver
> + * managing the device.
> + */
> +struct peci_device {
> + struct device dev;
> + struct peci_controller *controller;
Is the device a child of the controller? If yes, then there is no need
for a separate pointer vs "to_peci_controller(peci_dev->dev.parent)"
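A minimal userspace sketch of that lookup, under the assumption that
each PECI device is registered with the controller's embedded device
as its parent. The sketch_* types and the container_of-style macro
below are simplified stand-ins, not the kernel definitions:

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct device: only the parent link. */
struct sketch_dev {
	struct sketch_dev *parent;
};

struct sketch_controller {
	int id;
	struct sketch_dev dev;
};

/* Note: no cached controller pointer in the device. */
struct sketch_peci_device {
	struct sketch_dev dev;
	unsigned char addr;
};

struct sketch_controller *sketch_to_controller(struct sketch_dev *d)
{
	return sketch_container_of(d, struct sketch_controller, dev);
}

/* The controller is recovered from the parent link on demand. */
struct sketch_controller *
sketch_device_controller(struct sketch_peci_device *pd)
{
	return sketch_to_controller(pd->dev.parent);
}
```

The trade-off is one pointer less to keep consistent, at the price of
relying on the parent always being a controller.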
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> From: Jae Hyun Yoo <[email protected]>
>
> ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical
> interface (a.k.a PECI wire).
>
> Signed-off-by: Jae Hyun Yoo <[email protected]>
> Co-developed-by: Iwona Winiarska <[email protected]>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Pierre-Louis Bossart <[email protected]>
> ---
> MAINTAINERS | 9 +
> drivers/peci/Kconfig | 6 +
> drivers/peci/Makefile | 3 +
> drivers/peci/controller/Kconfig | 12 +
> drivers/peci/controller/Makefile | 3 +
> drivers/peci/controller/peci-aspeed.c | 501 ++++++++++++++++++++++++++
> 6 files changed, 534 insertions(+)
> create mode 100644 drivers/peci/controller/Kconfig
> create mode 100644 drivers/peci/controller/Makefile
> create mode 100644 drivers/peci/controller/peci-aspeed.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 47411e2b6336..4ba874afa2fa 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2865,6 +2865,15 @@ S: Maintained
> F: Documentation/hwmon/asc7621.rst
> F: drivers/hwmon/asc7621.c
>
> +ASPEED PECI CONTROLLER
> +M: Iwona Winiarska <[email protected]>
> +M: Jae Hyun Yoo <[email protected]>
> +L: [email protected] (moderated for non-subscribers)
> +L: [email protected] (moderated for non-subscribers)
> +S: Supported
> +F: Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> +F: drivers/peci/controller/peci-aspeed.c
> +
> ASPEED PINCTRL DRIVERS
> M: Andrew Jeffery <[email protected]>
> L: [email protected] (moderated for non-subscribers)
> diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> index 601cc3c3c852..0d0ee8009713 100644
> --- a/drivers/peci/Kconfig
> +++ b/drivers/peci/Kconfig
> @@ -12,3 +12,9 @@ menuconfig PECI
>
> This support is also available as a module. If so, the module
> will be called peci.
> +
> +if PECI
> +
> +source "drivers/peci/controller/Kconfig"
> +
> +endif # PECI
> diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> index 2bb2f51bcda7..621a993e306a 100644
> --- a/drivers/peci/Makefile
> +++ b/drivers/peci/Makefile
> @@ -3,3 +3,6 @@
> # Core functionality
> peci-y := core.o sysfs.o
> obj-$(CONFIG_PECI) += peci.o
> +
> +# Hardware specific bus drivers
> +obj-y += controller/
> diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
> new file mode 100644
> index 000000000000..8ddbe494677f
> --- /dev/null
> +++ b/drivers/peci/controller/Kconfig
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config PECI_ASPEED
> + tristate "ASPEED PECI support"
> + depends on ARCH_ASPEED || COMPILE_TEST
> + depends on OF
> + depends on HAS_IOMEM
> + help
> + Enable this driver if you want to support ASPEED PECI controller.
Perhaps a note about how one might make this determination, or maybe a
general recommendation that if they are building for deployment on an
OpenBMC system say Y else say N?
> +
> + This driver can be also build as a module. If so, the module
> + will be called peci-aspeed.
> diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile
> new file mode 100644
> index 000000000000..022c28ef1bf0
> --- /dev/null
> +++ b/drivers/peci/controller/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o
> diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c
> new file mode 100644
> index 000000000000..888b46383ea4
> --- /dev/null
> +++ b/drivers/peci/controller/peci-aspeed.c
> @@ -0,0 +1,501 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (C) 2012-2017 ASPEED Technology Inc.
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#include <linux/bitfield.h>
> +#include <linux/clk.h>
> +#include <linux/delay.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/iopoll.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/peci.h>
> +#include <linux/platform_device.h>
> +#include <linux/reset.h>
> +
> +#include <asm/unaligned.h>
> +
> +/* ASPEED PECI Registers */
> +/* Control Register */
> +#define ASPEED_PECI_CTRL 0x00
> +#define ASPEED_PECI_CTRL_SAMPLING_MASK GENMASK(19, 16)
> +#define ASPEED_PECI_CTRL_READ_MODE_MASK GENMASK(13, 12)
> +#define ASPEED_PECI_CTRL_READ_MODE_COUNT BIT(12)
> +#define ASPEED_PECI_CTRL_READ_MODE_DBG BIT(13)
> +#define ASPEED_PECI_CTRL_CLK_SOURCE_MASK BIT(11)
> +#define ASPEED_PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8)
> +#define ASPEED_PECI_CTRL_INVERT_OUT BIT(7)
> +#define ASPEED_PECI_CTRL_INVERT_IN BIT(6)
> +#define ASPEED_PECI_CTRL_BUS_CONTENT_EN BIT(5)
> +#define ASPEED_PECI_CTRL_PECI_EN BIT(4)
> +#define ASPEED_PECI_CTRL_PECI_CLK_EN BIT(0)
> +
> +/* Timing Negotiation Register */
> +#define ASPEED_PECI_TIMING_NEGOTIATION 0x04
> +#define ASPEED_PECI_TIMING_MESSAGE_MASK GENMASK(15, 8)
> +#define ASPEED_PECI_TIMING_ADDRESS_MASK GENMASK(7, 0)
> +
> +/* Command Register */
> +#define ASPEED_PECI_CMD 0x08
> +#define ASPEED_PECI_CMD_PIN_MON BIT(31)
> +#define ASPEED_PECI_CMD_STS_MASK GENMASK(27, 24)
> +#define ASPEED_PECI_CMD_STS_ADDR_T_NEGO 0x3
> +#define ASPEED_PECI_CMD_IDLE_MASK \
> + (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MON)
> +#define ASPEED_PECI_CMD_FIRE BIT(0)
> +
> +/* Read/Write Length Register */
> +#define ASPEED_PECI_RW_LENGTH 0x0c
> +#define ASPEED_PECI_AW_FCS_EN BIT(31)
> +#define ASPEED_PECI_READ_LEN_MASK GENMASK(23, 16)
> +#define ASPEED_PECI_WRITE_LEN_MASK GENMASK(15, 8)
> +#define ASPEED_PECI_TAGET_ADDR_MASK GENMASK(7, 0)
> +
> +/* Expected FCS Data Register */
> +#define ASPEED_PECI_EXP_FCS 0x10
> +#define ASPEED_PECI_EXP_READ_FCS_MASK GENMASK(23, 16)
> +#define ASPEED_PECI_EXP_AW_FCS_AUTO_MASK GENMASK(15, 8)
> +#define ASPEED_PECI_EXP_WRITE_FCS_MASK GENMASK(7, 0)
> +
> +/* Captured FCS Data Register */
> +#define ASPEED_PECI_CAP_FCS 0x14
> +#define ASPEED_PECI_CAP_READ_FCS_MASK GENMASK(23, 16)
> +#define ASPEED_PECI_CAP_WRITE_FCS_MASK GENMASK(7, 0)
> +
> +/* Interrupt Register */
> +#define ASPEED_PECI_INT_CTRL 0x18
> +#define ASPEED_PECI_TIMING_NEGO_SEL_MASK GENMASK(31, 30)
> +#define ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO 0
> +#define ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO 1
> +#define ASPEED_PECI_MESSAGE_NEGO 2
> +#define ASPEED_PECI_INT_MASK GENMASK(4, 0)
> +#define ASPEED_PECI_INT_BUS_TIMEOUT BIT(4)
> +#define ASPEED_PECI_INT_BUS_CONNECT BIT(3)
> +#define ASPEED_PECI_INT_W_FCS_BAD BIT(2)
> +#define ASPEED_PECI_INT_W_FCS_ABORT BIT(1)
> +#define ASPEED_PECI_INT_CMD_DONE BIT(0)
> +
> +/* Interrupt Status Register */
> +#define ASPEED_PECI_INT_STS 0x1c
> +#define ASPEED_PECI_INT_TIMING_RESULT_MASK GENMASK(29, 16)
> + /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
> +
> +/* Rx/Tx Data Buffer Registers */
> +#define ASPEED_PECI_W_DATA0 0x20
> +#define ASPEED_PECI_W_DATA1 0x24
> +#define ASPEED_PECI_W_DATA2 0x28
> +#define ASPEED_PECI_W_DATA3 0x2c
> +#define ASPEED_PECI_R_DATA0 0x30
> +#define ASPEED_PECI_R_DATA1 0x34
> +#define ASPEED_PECI_R_DATA2 0x38
> +#define ASPEED_PECI_R_DATA3 0x3c
> +#define ASPEED_PECI_W_DATA4 0x40
> +#define ASPEED_PECI_W_DATA5 0x44
> +#define ASPEED_PECI_W_DATA6 0x48
> +#define ASPEED_PECI_W_DATA7 0x4c
> +#define ASPEED_PECI_R_DATA4 0x50
> +#define ASPEED_PECI_R_DATA5 0x54
> +#define ASPEED_PECI_R_DATA6 0x58
> +#define ASPEED_PECI_R_DATA7 0x5c
> +#define ASPEED_PECI_DATA_BUF_SIZE_MAX 32
> +
> +/* Timing Negotiation */
> +#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT 8
> +#define ASPEED_PECI_RD_SAMPLING_POINT_MAX (BIT(4) - 1)
> +#define ASPEED_PECI_CLK_DIV_DEFAULT 0
> +#define ASPEED_PECI_CLK_DIV_MAX (BIT(3) - 1)
> +#define ASPEED_PECI_MSG_TIMING_DEFAULT 1
> +#define ASPEED_PECI_MSG_TIMING_MAX (BIT(8) - 1)
> +#define ASPEED_PECI_ADDR_TIMING_DEFAULT 1
> +#define ASPEED_PECI_ADDR_TIMING_MAX (BIT(8) - 1)
> +
> +/* Timeout */
> +#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US (50 * USEC_PER_MSEC)
> +#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US (10 * USEC_PER_MSEC)
> +#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT (1000)
> +#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX (1000)
> +
> +struct aspeed_peci {
> + struct peci_controller controller;
Uh oh... this looks like a driver private data structure, and I know
there's a 'struct device' allocated in @controller. /me goes to check
->probe()...
> + struct device *dev;
> + void __iomem *base;
> + struct clk *clk;
> + struct reset_control *rst;
> + int irq;
> + spinlock_t lock; /* to sync completion status handling */
> + struct completion xfer_complete;
> + u32 status;
> + u32 cmd_timeout_ms;
> + u32 msg_timing;
> + u32 addr_timing;
> + u32 rd_sampling_point;
> + u32 clk_div;
> +};
> +
> +static inline struct aspeed_peci *to_aspeed_peci(struct peci_controller *a)
> +{
> + return container_of(a, struct aspeed_peci, controller);
> +}
> +
> +static void aspeed_peci_init_regs(struct aspeed_peci *priv)
> +{
> + u32 val;
> +
> + val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);
> + val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> + writel(val, priv->base + ASPEED_PECI_CTRL);
> + /*
> + * Timing negotiation period setting.
> + * The unit of the programmed value is 4 times of PECI clock period.
> + */
> + val = FIELD_PREP(ASPEED_PECI_TIMING_MESSAGE_MASK, priv->msg_timing);
> + val |= FIELD_PREP(ASPEED_PECI_TIMING_ADDRESS_MASK, priv->addr_timing);
> + writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
> +
> + /* Clear interrupts */
> + val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;
> + writel(val, priv->base + ASPEED_PECI_INT_STS);
> +
> + /* Set timing negotiation mode and enable interrupts */
> + val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
> + val |= ASPEED_PECI_INT_MASK;
> + writel(val, priv->base + ASPEED_PECI_INT_CTRL);
> +
> + val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);
> + val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
> + val |= ASPEED_PECI_CTRL_PECI_EN;
> + val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> + writel(val, priv->base + ASPEED_PECI_CTRL);
Do these MMIO accesses follow a standard? I.e. is there any possibility
of having a common / generic MMIO xfer function that just takes a
different base address discovered by the PECI device, rather than a
fully custom xfer function per controller?
> +}
> +
> +static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
> +{
> + u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
> +
> + if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
> + aspeed_peci_init_regs(priv);
> +
> + return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
> + cmd_sts,
> + !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
> + ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
> + ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
> +}
> +
> +static int aspeed_peci_xfer(struct peci_controller *controller,
> + u8 addr, struct peci_request *req)
> +{
> + struct aspeed_peci *priv = to_aspeed_peci(controller);
> + unsigned long flags, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
> + u32 peci_head;
> + int ret;
> +
> + if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
> + req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
> + return -EINVAL;
> +
> + /* Check command sts and bus idle state */
> + ret = aspeed_peci_check_idle(priv);
> + if (ret)
> + return ret; /* -ETIMEDOUT */
> +
> + spin_lock_irqsave(&priv->lock, flags);
> + reinit_completion(&priv->xfer_complete);
> +
> + peci_head = FIELD_PREP(ASPEED_PECI_TAGET_ADDR_MASK, addr) |
> + FIELD_PREP(ASPEED_PECI_WRITE_LEN_MASK, req->tx.len) |
> + FIELD_PREP(ASPEED_PECI_READ_LEN_MASK, req->rx.len);
> +
> + writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
> +
> + memcpy_toio(priv->base + ASPEED_PECI_W_DATA0, req->tx.buf,
> + req->tx.len > 16 ? 16 : req->tx.len);
> + if (req->tx.len > 16)
> + memcpy_toio(priv->base + ASPEED_PECI_W_DATA4, req->tx.buf + 16,
> + req->tx.len - 16);
> +
> + dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
> + print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);
> +
> + priv->status = 0;
> + writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
> + spin_unlock_irqrestore(&priv->lock, flags);
> +
> + ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);
> + if (ret < 0)
> + return ret;
> +
> + if (ret == 0) {
> + dev_dbg(priv->dev, "Timeout waiting for a response!\n");
> + return -ETIMEDOUT;
> + }
> +
> + spin_lock_irqsave(&priv->lock, flags);
> +
> + writel(0, priv->base + ASPEED_PECI_CMD);
> +
> + if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
> + spin_unlock_irqrestore(&priv->lock, flags);
> + dev_dbg(priv->dev, "No valid response!\n");
> + return -EIO;
> + }
> +
> + spin_unlock_irqrestore(&priv->lock, flags);
> +
> + memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_R_DATA0,
> + req->rx.len > 16 ? 16 : req->rx.len);
> + if (req->rx.len > 16)
> + memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_R_DATA4,
> + req->rx.len - 16);
> +
> + print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);
If dynamic debug is not enabled this will be an unconditional
printk(KERN_DEBUG ...).
I'm ok with dev_dbg() in slow paths, but in fast paths you should look
to tracing, or putting potentially heavyweight debug behind a
CONFIG_X_DEBUG option. I have seen Greg is even less of a fan of
dev_dbg().
> +
> + return 0;
> +}
> +
> +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
> +{
> + struct aspeed_peci *priv = arg;
> + u32 status;
> +
> + spin_lock(&priv->lock);
> + status = readl(priv->base + ASPEED_PECI_INT_STS);
> + writel(status, priv->base + ASPEED_PECI_INT_STS);
> + priv->status |= (status & ASPEED_PECI_INT_MASK);
> +
> + /*
> + * In most cases, interrupt bits will be set one by one but also note
> + * that multiple interrupt bits could be set at the same time.
> + */
> + if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
> + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_TIMEOUT\n");
> +
> + if (status & ASPEED_PECI_INT_BUS_CONNECT)
> + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_CONNECT\n");
> +
> + if (status & ASPEED_PECI_INT_W_FCS_BAD)
> + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_BAD\n");
> +
> + if (status & ASPEED_PECI_INT_W_FCS_ABORT)
> + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_ABORT\n");
What's the utility of these debug statements? If they are for
development then maybe they are ok. If they are for debug in the field
I would make them counters and export them via debugfs, or via sysfs if
you expect to always be able to debug these events even when a kernel
in the field has dev_dbg and debugfs disabled.
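A minimal userspace sketch of the counter idea follows. It is not driver
code: the ex_ names and the bit positions are made up for illustration,
and the real bits would come from the driver's interrupt register
definitions. The point is only that each event bumps a counter that can
be exported later, instead of emitting a log line.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define EX_INT_BUS_TIMEOUT	(1u << 4)
#define EX_INT_BUS_CONNECT	(1u << 3)
#define EX_INT_W_FCS_BAD	(1u << 2)
#define EX_INT_W_FCS_ABORT	(1u << 1)

/* One counter per event that is currently reported via dev_dbg(). */
struct ex_peci_stats {
	uint64_t bus_timeout;
	uint64_t bus_connect;
	uint64_t w_fcs_bad;
	uint64_t w_fcs_abort;
};

/* Account every event bit set in @status; several may be set at once. */
static void ex_peci_account(struct ex_peci_stats *stats, uint32_t status)
{
	if (status & EX_INT_BUS_TIMEOUT)
		stats->bus_timeout++;
	if (status & EX_INT_BUS_CONNECT)
		stats->bus_connect++;
	if (status & EX_INT_W_FCS_BAD)
		stats->w_fcs_bad++;
	if (status & EX_INT_W_FCS_ABORT)
		stats->w_fcs_abort++;
}
```

In the driver the interrupt handler would do the accounting under the
existing lock, and a debugfs or sysfs attribute would print the values.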
> +
> + /*
> + * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE bit
> + * set even in an error case.
> + */
> + if (status & ASPEED_PECI_INT_CMD_DONE)
> + complete(&priv->xfer_complete);
> +
> + spin_unlock(&priv->lock);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static void __sanitize_clock_divider(struct aspeed_peci *priv)
> +{
> + u32 clk_div;
> + int ret;
> +
> + ret = device_property_read_u32(priv->dev, "clock-divider", &clk_div);
> + if (ret) {
> + clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
> + } else if (clk_div > ASPEED_PECI_CLK_DIV_MAX) {
> + dev_warn(priv->dev, "Invalid clock-divider: %u, Using default: %u\n",
> + clk_div, ASPEED_PECI_CLK_DIV_DEFAULT);
> +
> + clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
> + }
> +
> + priv->clk_div = clk_div;
> +}
> +
> +static void __sanitize_msg_timing(struct aspeed_peci *priv)
> +{
> + u32 msg_timing;
> + int ret;
> +
> + ret = device_property_read_u32(priv->dev, "msg-timing", &msg_timing);
> + if (ret) {
> + msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
> + } else if (msg_timing > ASPEED_PECI_MSG_TIMING_MAX) {
> + dev_warn(priv->dev, "Invalid msg-timing : %u, Use default : %u\n",
> + msg_timing, ASPEED_PECI_MSG_TIMING_DEFAULT);
> +
> + msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
> + }
> +
> + priv->msg_timing = msg_timing;
> +}
> +
> +static void __sanitize_addr_timing(struct aspeed_peci *priv)
> +{
> + u32 addr_timing;
> + int ret;
> +
> + ret = device_property_read_u32(priv->dev, "addr-timing", &addr_timing);
> + if (ret) {
> + addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
> + } else if (addr_timing > ASPEED_PECI_ADDR_TIMING_MAX) {
> + dev_warn(priv->dev, "Invalid addr-timing : %u, Use default : %u\n",
> + addr_timing, ASPEED_PECI_ADDR_TIMING_DEFAULT);
> +
> + addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
> + }
> +
> + priv->addr_timing = addr_timing;
> +}
> +
> +static void __sanitize_rd_sampling_point(struct aspeed_peci *priv)
> +{
> + u32 rd_sampling_point;
> + int ret;
> +
> + ret = device_property_read_u32(priv->dev, "rd-sampling-point", &rd_sampling_point);
> + if (ret) {
> + rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
> + } else if (rd_sampling_point > ASPEED_PECI_RD_SAMPLING_POINT_MAX) {
> + dev_warn(priv->dev, "Invalid rd-sampling-point: %u, Use default : %u\n",
> + rd_sampling_point, ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT);
> +
> + rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
> + }
> +
> + priv->rd_sampling_point = rd_sampling_point;
> +}
> +
> +static void __sanitize_cmd_timeout(struct aspeed_peci *priv)
> +{
> + u32 timeout;
> + int ret;
> +
> + ret = device_property_read_u32(priv->dev, "cmd-timeout-ms", &timeout);
> + if (ret) {
> + timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
> + } else if (timeout > ASPEED_PECI_CMD_TIMEOUT_MS_MAX || timeout == 0) {
> + dev_warn(priv->dev, "Invalid cmd-timeout-ms: %u, Use default: %u\n",
> + timeout, ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT);
> +
For all of the same pattern like this above I would say "falling back
to: %u" otherwise "Use default" sounds like an action the platform
owner is expected to take.
Also, if the driver is correcting the issue does the log need to be
spammed with a warning? Is this 'info' or 'debug'?
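As a rough illustration of the message wording, here is a userspace
sketch of a single range-check helper. ex_sanitize_u32() is a
hypothetical name, not something from this series, and the "present"
flag stands in for a successful device_property_read_u32(). Whether the
message should be info or debug level is left open.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: return @val if the property was present and lies
 * within [min, max]; otherwise return @def. An out-of-range value is
 * reported with wording that makes clear the driver already handled it.
 */
static uint32_t ex_sanitize_u32(const char *name, bool present, uint32_t val,
				uint32_t min, uint32_t max, uint32_t def)
{
	if (!present)
		return def;

	if (val < min || val > max) {
		fprintf(stderr, "Invalid %s: %u, falling back to: %u\n",
			name, (unsigned int)val, (unsigned int)def);
		return def;
	}

	return val;
}
```

With a helper of this shape the five __sanitize_*() functions would
differ only in the property name and the limits passed in.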
> + timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
> + }
> +
> + priv->cmd_timeout_ms = timeout;
> +}
> +
> +static void aspeed_peci_device_property_sanitize(struct aspeed_peci *priv)
> +{
> + __sanitize_clock_divider(priv);
> + __sanitize_msg_timing(priv);
> + __sanitize_addr_timing(priv);
> + __sanitize_rd_sampling_point(priv);
> + __sanitize_cmd_timeout(priv);
> +}
> +
> +static void aspeed_peci_disable_clk(void *data)
> +{
> + clk_disable_unprepare(data);
> +}
> +
> +static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
> +{
> + int ret;
> +
> + priv->clk = devm_clk_get(priv->dev, NULL);
> + if (IS_ERR(priv->clk))
> + return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "Failed to get clk source\n");
> +
> + ret = clk_prepare_enable(priv->clk);
> + if (ret) {
> + dev_err(priv->dev, "Failed to enable clock\n");
> + return ret;
> + }
> +
> + ret = devm_add_action_or_reset(priv->dev, aspeed_peci_disable_clk, priv->clk);
> + if (ret)
> + return ret;
> +
> + aspeed_peci_device_property_sanitize(priv);
> +
> + aspeed_peci_init_regs(priv);
> +
> + return 0;
> +}
> +
> +static int aspeed_peci_probe(struct platform_device *pdev)
> +{
> + struct aspeed_peci *priv;
> + int ret;
> +
> + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
..."uh oh" from above confirmed. devm allocation lifetime and 'struct
device' lifetime are not compatible.
You can trigger use-after-free bugs by turning on
CONFIG_DEBUG_KOBJECT_RELEASE.
devm can be used to automatically unregister the peci_controller
device. The flow would be something like:
    priv = devm_kzalloc(..., sizeof(*priv), ...);
    controller = peci_controller_alloc(...);
    if (IS_ERR(controller))
            return PTR_ERR(controller);
    rc = devm_peci_controller_add(...);
    if (rc)
            return rc;
This arranges for the peci_controller_alloc() to be undone by
put_device() in all cases. Internal to peci_controller_alloc() is
typical goto unwind allocation error handling.
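To make the lifetime mismatch concrete, here is a small userspace model
of a refcounted object. All ex_ names are hypothetical and this is only
a sketch of the rule, not the kernel implementation: memory that embeds
a refcounted device may be freed only from the release callback, once
the last reference is dropped, which can be later than driver unbind.

```c
#include <stdlib.h>

/* Simplified stand-in for 'struct device': a refcount plus release(). */
struct ex_device {
	int refcount;
	void (*release)(struct ex_device *dev);
};

struct ex_controller {
	struct ex_device dev;	/* embedded: the memory must outlive all refs */
	int id;
};

static int ex_released;	/* counts release() calls, for illustration */

static void ex_controller_release(struct ex_device *dev)
{
	/* dev is the first member, so the cast recovers the container */
	struct ex_controller *c = (struct ex_controller *)dev;

	ex_released++;
	free(c);
}

static struct ex_controller *ex_controller_alloc(int id)
{
	struct ex_controller *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;

	c->id = id;
	c->dev.refcount = 1;	/* reference owned by the caller */
	c->dev.release = ex_controller_release;
	return c;
}

static void ex_get(struct ex_device *dev)
{
	dev->refcount++;
}

static void ex_put(struct ex_device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

If another holder (say, an open sysfs file) took a reference with
ex_get(), the driver's ex_put() at unbind does not free anything. A
devm allocation would free the memory at that point regardless, and the
other holder would be left with a dangling pointer.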
> + if (!priv)
> + return -ENOMEM;
> +
> + priv->dev = &pdev->dev;
> + dev_set_drvdata(priv->dev, priv);
> +
> + priv->base = devm_platform_ioremap_resource(pdev, 0);
> + if (IS_ERR(priv->base))
> + return PTR_ERR(priv->base);
> +
> + priv->irq = platform_get_irq(pdev, 0);
> + if (!priv->irq)
> + return priv->irq;
> +
> + ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,
> + 0, "peci-aspeed-irq", priv);
> + if (ret)
> + return ret;
> +
> + init_completion(&priv->xfer_complete);
> + spin_lock_init(&priv->lock);
> +
> + priv->controller.xfer = aspeed_peci_xfer;
> +
> + priv->rst = devm_reset_control_get(&pdev->dev, NULL);
> + if (IS_ERR(priv->rst)) {
> + dev_err(&pdev->dev, "Missing or invalid reset controller entry\n");
> + return PTR_ERR(priv->rst);
> + }
> + reset_control_deassert(priv->rst);
> +
> + ret = aspeed_peci_init_ctrl(priv);
> + if (ret)
> + return ret;
> +
> + return peci_controller_add(&priv->controller, priv->dev);
> +}
> +
> +static int aspeed_peci_remove(struct platform_device *pdev)
> +{
> + struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
> +
> + peci_controller_remove(&priv->controller);
> + reset_control_assert(priv->rst);
> +
It's odd to have devm in the probe path and still publish a remove
handler, i.e. why not handle controller removal and reset via devm?
The example above with devm_peci_controller_add() already assumes
peci_controller_remove is triggered by devm, reset assert can be
managed the same way.
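For reference, devm-managed actions run in reverse order of
registration when the driver is unbound. The userspace model below
sketches that ordering with hypothetical ex_ names. It is not the
kernel's devres code. The log only records which action ran when.

```c
#include <stddef.h>

#define EX_MAX_ACTIONS 8

struct ex_action {
	void (*fn)(void *data);
	void *data;
};

/* Simplified stand-in for a device's list of managed actions. */
struct ex_devres {
	struct ex_action actions[EX_MAX_ACTIONS];
	size_t count;
};

/* Register a teardown action; returns 0 on success, -1 if full. */
static int ex_add_action(struct ex_devres *dr, void (*fn)(void *), void *data)
{
	if (dr->count == EX_MAX_ACTIONS)
		return -1;

	dr->actions[dr->count].fn = fn;
	dr->actions[dr->count].data = data;
	dr->count++;
	return 0;
}

/* Run all actions, most recently registered first. */
static void ex_release_all(struct ex_devres *dr)
{
	while (dr->count > 0) {
		struct ex_action *a = &dr->actions[--dr->count];

		a->fn(a->data);
	}
}

static char ex_log[EX_MAX_ACTIONS];
static size_t ex_log_len;

/* Example action: append the tag pointed to by @data to the log. */
static void ex_record(void *data)
{
	if (ex_log_len < sizeof(ex_log))
		ex_log[ex_log_len++] = *(const char *)data;
}
```

Registering the reset assert first and the controller removal second
therefore tears down the controller before the reset is asserted, which
matches the order in the current ->remove() handler.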
> + return 0;
> +}
> +
> +static const struct of_device_id aspeed_peci_of_table[] = {
> + { .compatible = "aspeed,ast2400-peci", },
> + { .compatible = "aspeed,ast2500-peci", },
> + { .compatible = "aspeed,ast2600-peci", },
> + { }
> +};
> +MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
> +
> +static struct platform_driver aspeed_peci_driver = {
> + .probe = aspeed_peci_probe,
> + .remove = aspeed_peci_remove,
> + .driver = {
> + .name = "peci-aspeed",
> + .of_match_table = aspeed_peci_of_table,
> + },
> +};
> +module_platform_driver(aspeed_peci_driver);
> +
> +MODULE_AUTHOR("Ryan Chen <[email protected]>");
> +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
Same comments about MODULE_AUTHOR from patch 6, i.e. make sure this is
not duplicating what MAINTAINERS and git log handle.
I'll pause here until you've had a chance to consider fixes to the
devm vs 'struct device' lifetime issue.
> +MODULE_DESCRIPTION("ASPEED PECI driver");
> +MODULE_LICENSE("GPL");
> +MODULE_IMPORT_NS(PECI);
On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> Since PECI devices are discoverable, we can dynamically detect devices
> that are actually available in the system.
>
> This change complements the earlier implementation by rescanning PECI
> bus to detect available devices. For this purpose, it also introduces the
> minimal API for PECI requests.
>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Pierre-Louis Bossart <[email protected]>
> ---
> drivers/peci/Makefile | 2 +-
> drivers/peci/core.c | 13 ++++-
> drivers/peci/device.c | 111 ++++++++++++++++++++++++++++++++++++++++
> drivers/peci/internal.h | 15 ++++++
> drivers/peci/request.c | 74 +++++++++++++++++++++++++++
> drivers/peci/sysfs.c | 34 ++++++++++++
> 6 files changed, 246 insertions(+), 3 deletions(-)
> create mode 100644 drivers/peci/device.c
> create mode 100644 drivers/peci/request.c
>
> diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> index 621a993e306a..917f689e147a 100644
> --- a/drivers/peci/Makefile
> +++ b/drivers/peci/Makefile
> @@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
>
> # Core functionality
> -peci-y := core.o sysfs.o
> +peci-y := core.o request.o device.o sysfs.o
> obj-$(CONFIG_PECI) += peci.o
>
> # Hardware specific bus drivers
> diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> index 0ad00110459d..ae7a9572cdf3 100644
> --- a/drivers/peci/core.c
> +++ b/drivers/peci/core.c
> @@ -31,7 +31,15 @@ struct device_type peci_controller_type = {
>
> int peci_controller_scan_devices(struct peci_controller *controller)
> {
> - /* Just a stub, no support for actual devices yet */
> + int ret;
> + u8 addr;
> +
> + for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
> + ret = peci_device_create(controller, addr);
> + if (ret)
> + return ret;
> + }
> +
This seems to be a behavior triggered at peci_controller_add and at the
request of userspace when touching the rescan attribute? A natural way
to handle this would be to have a driver for the peci_controller device
and have that driver issue scan at probe time. Otherwise, how does
userspace know when it is time to rescan the bus?
> return 0;
> }
>
> @@ -106,7 +114,8 @@ EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
>
> static int _unregister(struct device *dev, void *dummy)
> {
> - /* Just a stub, no support for actual devices yet */
> + peci_device_destroy(to_peci_device(dev));
As mentioned previously, this could be delegated to devm to unregister
when the original driver that added the controller goes through
->remove().
> +
> return 0;
> }
>
> diff --git a/drivers/peci/device.c b/drivers/peci/device.c
> new file mode 100644
> index 000000000000..1124862211e2
> --- /dev/null
> +++ b/drivers/peci/device.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#include <linux/peci.h>
> +#include <linux/slab.h>
> +
> +#include "internal.h"
> +
> +static int peci_detect(struct peci_controller *controller, u8 addr)
> +{
> + struct peci_request *req;
> + int ret;
> +
> + req = peci_request_alloc(NULL, 0, 0);
> + if (!req)
> + return -ENOMEM;
> +
> + mutex_lock(&controller->bus_lock);
What is the underlying requirement to prevent 2 simultaneous ->xfer()
invocations?
> + ret = controller->xfer(controller, addr, req);
> + mutex_unlock(&controller->bus_lock);
> +
> + peci_request_free(req);
> +
> + return ret;
> +}
> +
> +static bool peci_addr_valid(u8 addr)
> +{
> + return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX;
> +}
> +
> +static int peci_dev_exists(struct device *dev, void *data)
> +{
> + struct peci_device *device = to_peci_device(dev);
> + u8 *addr = data;
> +
> + if (device->addr == *addr)
> + return -EBUSY;
> +
> + return 0;
> +}
> +
> +int peci_device_create(struct peci_controller *controller, u8 addr)
> +{
> + struct peci_device *device;
> + int ret;
> +
> + if (WARN_ON(!peci_addr_valid(addr)))
> + return -EINVAL;
> +
> + /* Check if we have already detected this device before. */
> + ret = device_for_each_child(&controller->dev, &addr, peci_dev_exists);
> + if (ret)
> + return 0;
> +
> + ret = peci_detect(controller, addr);
> + if (ret) {
> + /*
> + * Device not present or host state doesn't allow successful
> + * detection at this time.
> + */
> + if (ret == -EIO || ret == -ETIMEDOUT)
> + return 0;
> +
> + return ret;
> + }
> +
> + device = kzalloc(sizeof(*device), GFP_KERNEL);
> + if (!device)
> + return -ENOMEM;
> +
> + device->controller = controller;
> + device->addr = addr;
> + device->dev.parent = &device->controller->dev;
> + device->dev.bus = &peci_bus_type;
> + device->dev.type = &peci_device_type;
> +
> + ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
> + if (ret)
> + goto err_free;
> +
> + ret = device_register(&device->dev);
There is a recent movement away from device_register() to an alloc+add
pattern [1]. I.e. have device_initialize() and device_add() steps. With
that you can unify the error exit to be put_device().
[1]: https://lore.kernel.org/r/[email protected]
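A userspace sketch of the unified error exit is below. The ex_
functions are hypothetical stand-ins for device_initialize(),
device_add() and put_device(), and the fail flags exist only to
exercise the error paths. Once the object is initialized, every failure
goes through the same put, so there is no separate free to get wrong.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct ex_dev {
	int refcount;
	bool added;
};

static int ex_release_calls;	/* counts frees, for illustration */

/* Stand-in for device_initialize(): from here on only ex_put() frees. */
static void ex_initialize(struct ex_dev *dev)
{
	dev->refcount = 1;
}

/* Stand-in for device_add(); @fail simulates a registration error. */
static int ex_add(struct ex_dev *dev, bool fail)
{
	if (fail)
		return -EEXIST;

	dev->added = true;
	return 0;
}

/* Stand-in for put_device(): dropping the last reference frees. */
static void ex_put(struct ex_dev *dev)
{
	if (--dev->refcount == 0) {
		ex_release_calls++;
		free(dev);
	}
}

static int ex_create(bool fail_name, bool fail_add, struct ex_dev **out)
{
	struct ex_dev *dev = calloc(1, sizeof(*dev));
	int ret;

	if (!dev)
		return -ENOMEM;

	ex_initialize(dev);

	if (fail_name) {	/* stands in for a dev_set_name() failure */
		ret = -ENOMEM;
		goto err_put;
	}

	ret = ex_add(dev, fail_add);
	if (ret)
		goto err_put;

	*out = dev;
	return 0;

err_put:
	ex_put(dev);	/* single exit: release does the free */
	return ret;
}
```

Compare with the quoted error path, where err_put falls through into
err_free and the object would be freed a second time after the release
callback has already run.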
> + if (ret)
> + goto err_put;
> +
> + return 0;
> +
> +err_put:
> + put_device(&device->dev);
> +err_free:
> + kfree(device);
> +
> + return ret;
> +}
> +
> +void peci_device_destroy(struct peci_device *device)
> +{
> + device_unregister(&device->dev);
> +}
> +
> +static void peci_device_release(struct device *dev)
> +{
> + struct peci_device *device = to_peci_device(dev);
> +
> + kfree(device);
> +}
> +
> +struct device_type peci_device_type = {
> + .groups = peci_device_groups,
> + .release = peci_device_release,
> +};
> diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> index 80c61bcdfc6b..6b139adaf6b8 100644
> --- a/drivers/peci/internal.h
> +++ b/drivers/peci/internal.h
> @@ -9,6 +9,21 @@
>
> struct peci_controller;
> struct attribute_group;
> +struct peci_device;
> +struct peci_request;
> +
> +/* PECI CPU address range 0x30-0x37 */
> +#define PECI_BASE_ADDR 0x30
> +#define PECI_DEVICE_NUM_MAX 8
> +
> +struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
> +void peci_request_free(struct peci_request *req);
> +
> +extern struct device_type peci_device_type;
> +extern const struct attribute_group *peci_device_groups[];
> +
> +int peci_device_create(struct peci_controller *controller, u8 addr);
> +void peci_device_destroy(struct peci_device *device);
>
> extern struct bus_type peci_bus_type;
> extern const struct attribute_group *peci_bus_groups[];
> diff --git a/drivers/peci/request.c b/drivers/peci/request.c
> new file mode 100644
> index 000000000000..78cee51dfae1
> --- /dev/null
> +++ b/drivers/peci/request.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2021 Intel Corporation
> +
> +#include <linux/export.h>
> +#include <linux/peci.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include "internal.h"
> +
> +/**
> + * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
> + * @device: PECI device to which request is going to be sent
> + * @tx_len: requested TX buffer length
> + * @rx_len: requested RX buffer length
> + *
> + * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
> + */
> +struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
> +{
How big can these lengths be?
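If the lengths are small and bounded, one option is to embed fixed-size
buffers in the request and do a single allocation. The sketch below is
a userspace illustration with hypothetical ex_ names. The 32-byte limit
is an assumption taken from ASPEED_PECI_DATA_BUF_SIZE_MAX in the
controller driver, not a statement about the PECI specification.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: same limit as the ASPEED controller's data buffers. */
#define EX_PECI_REQUEST_MAX_BUF_SIZE 32

struct ex_peci_request {
	struct {
		uint8_t buf[EX_PECI_REQUEST_MAX_BUF_SIZE];
		uint8_t len;
	} tx, rx;
};

/* Allocate a zeroed request; reject lengths that exceed the buffers. */
static struct ex_peci_request *ex_peci_request_alloc(uint8_t tx_len,
						      uint8_t rx_len)
{
	struct ex_peci_request *req;

	if (tx_len > EX_PECI_REQUEST_MAX_BUF_SIZE ||
	    rx_len > EX_PECI_REQUEST_MAX_BUF_SIZE)
		return NULL;

	req = calloc(1, sizeof(*req));
	if (!req)
		return NULL;

	req->tx.len = tx_len;
	req->rx.len = rx_len;
	return req;
}

static void ex_peci_request_free(struct ex_peci_request *req)
{
	free(req);
}
```

This trades a few unused bytes per request for a simpler error path:
one allocation, one free, and the length check happens in one place.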
> + struct peci_request *req;
> + u8 *tx_buf, *rx_buf;
> +
> + req = kzalloc(sizeof(*req), GFP_KERNEL);
> + if (!req)
> + return NULL;
> +
> + req->device = device;
> +
> + /*
> + * PECI controllers that we are using now don't support DMA, this
> + * should be converted to DMA API once support for controllers that do
> + * allow it is added to avoid an extra copy.
> + */
> + if (tx_len) {
> + tx_buf = kzalloc(tx_len, GFP_KERNEL);
> + if (!tx_buf)
> + goto err_free_req;
> +
> + req->tx.buf = tx_buf;
> + req->tx.len = tx_len;
> + }
> +
> + if (rx_len) {
> + rx_buf = kzalloc(rx_len, GFP_KERNEL);
> + if (!rx_buf)
> + goto err_free_tx;
> +
> + req->rx.buf = rx_buf;
> + req->rx.len = rx_len;
> + }
> +
> + return req;
> +
> +err_free_tx:
> + kfree(req->tx.buf);
> +err_free_req:
> + kfree(req);
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
> +
> +/**
> + * peci_request_free() - free peci_request
> + * @req: the PECI request to be freed
> + */
> +void peci_request_free(struct peci_request *req)
> +{
> + kfree(req->rx.buf);
> + kfree(req->tx.buf);
> + kfree(req);
> +}
> +EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
> diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
> index 36c5e2a18a92..db9ef05776e3 100644
> --- a/drivers/peci/sysfs.c
> +++ b/drivers/peci/sysfs.c
> @@ -1,6 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0-only
> // Copyright (c) 2021 Intel Corporation
>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> #include <linux/peci.h>
>
> #include "internal.h"
> @@ -46,3 +48,35 @@ const struct attribute_group *peci_bus_groups[] = {
> &peci_bus_group,
> NULL
> };
> +
> +static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct peci_device *device = to_peci_device(dev);
> + bool res;
> + int ret;
> +
> + ret = kstrtobool(buf, &res);
> + if (ret)
> + return ret;
> +
> + if (res && device_remove_file_self(dev, attr))
> + peci_device_destroy(device);
> +
> + return count;
> +}
> +static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
Why does userspace need the ability to kick devices off the bus?
Do you have an example userspace tool that is using these sysfs APIs?
> +
> +static struct attribute *peci_device_attrs[] = {
> + &dev_attr_remove.attr,
> + NULL
> +};
> +
> +static const struct attribute_group peci_device_group = {
> + .attrs = peci_device_attrs,
> +};
> +
> +const struct attribute_group *peci_device_groups[] = {
> + &peci_device_group,
> + NULL
> +};
On Wed, 2021-07-14 at 16:58 +0000, Williams, Dan J wrote:
> On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > Baseboard management controllers (BMC) often run Linux but are
> > usually
> > implemented with non-X86 processors. They can use PECI to access
> > package
> > config space (PCS) registers on the host CPU and since some
> > information,
> > e.g. figuring out the core count, can be obtained using different
> > registers on different CPU generations, they need to decode the
> > family
> > and model.
> >
> > The format of Package Identifier PCS register that describes CPUID
> > information has the same layout as CPUID_1.EAX, so let's allow to
> > reuse
> > cpuid helpers by making it available for other architectures as
> > well.
>
> Just some minor comments below.
>
> You can go ahead and add:
>
> Reviewed-by: Dan Williams <[email protected]>
>
> >
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Tony Luck <[email protected]>
> > ---
> > MAINTAINERS | 2 ++
> > arch/x86/Kconfig | 1 +
> > arch/x86/include/asm/cpu.h | 3 ---
> > arch/x86/include/asm/microcode.h | 2 +-
> > arch/x86/kvm/cpuid.h | 3 ++-
> > arch/x86/lib/Makefile | 2 +-
> > drivers/edac/mce_amd.c | 3 +--
> > include/linux/x86/cpu.h | 9 +++++++++
> > lib/Kconfig | 5 +++++
> > lib/Makefile | 2 ++
> > lib/x86/Makefile | 3 +++
> > {arch/x86/lib => lib/x86}/cpu.c | 2 +-
> > 12 files changed, 28 insertions(+), 9 deletions(-)
> > create mode 100644 include/linux/x86/cpu.h
> > create mode 100644 lib/x86/Makefile
> > rename {arch/x86/lib => lib/x86}/cpu.c (95%)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index ec5987a00800..6f77aaca2a30 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -20081,6 +20081,8 @@ T: git
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/core
> > F: Documentation/devicetree/bindings/x86/
> > F: Documentation/x86/
> > F: arch/x86/
> > +F: include/linux/x86/
>
> Doesn't this technically belong in patch1 since that one introduced
> the directory?
In the first patch we are moving the content of
arch/x86/include/asm/intel-family.h to a new file, which is why I
updated MAINTAINERS just for "INTEL CPU family model numbers".
Here we're moving other content that was maintained under arch/x86,
which is why I extended "X86 ARCHITECTURE (32-BIT AND 64-BIT)".
But I agree - "X86 ARCHITECTURE" includes "INTEL CPU family", so I
guess it makes sense to add both in the previous patch (otherwise
get_maintainer.pl would produce different output for
include/linux/x86/intel-family.h until this patch is applied).
Thank you
-Iwona
>
> > +F: lib/x86/
> >
> > X86 ENTRY CODE
> > M: Andy Lutomirski <[email protected]>
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 49270655e827..750f9b896e4f 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -141,6 +141,7 @@ config X86
> > select GENERIC_IRQ_PROBE
> > select GENERIC_IRQ_RESERVATION_MODE
> > select GENERIC_IRQ_SHOW
> > + select GENERIC_LIB_X86
> > select GENERIC_PENDING_IRQ if SMP
> > select GENERIC_PTDUMP
> > select GENERIC_SMP_IDLE_THREAD
> > diff --git a/arch/x86/include/asm/cpu.h
> > b/arch/x86/include/asm/cpu.h
> > index 33d41e350c79..2a663a05a795 100644
> > --- a/arch/x86/include/asm/cpu.h
> > +++ b/arch/x86/include/asm/cpu.h
> > @@ -37,9 +37,6 @@ extern int _debug_hotplug_cpu(int cpu, int
> > action);
> >
> > int mwait_usable(const struct cpuinfo_x86 *);
> >
> > -unsigned int x86_family(unsigned int sig);
> > -unsigned int x86_model(unsigned int sig);
> > -unsigned int x86_stepping(unsigned int sig);
> > #ifdef CONFIG_CPU_SUP_INTEL
> > extern void __init sld_setup(struct cpuinfo_x86 *c);
> > extern void switch_to_sld(unsigned long tifn);
> > diff --git a/arch/x86/include/asm/microcode.h
> > b/arch/x86/include/asm/microcode.h
> > index ab45a220fac4..4b0eabf63b98 100644
> > --- a/arch/x86/include/asm/microcode.h
> > +++ b/arch/x86/include/asm/microcode.h
> > @@ -2,9 +2,9 @@
> > #ifndef _ASM_X86_MICROCODE_H
> > #define _ASM_X86_MICROCODE_H
> >
> > -#include <asm/cpu.h>
> > #include <linux/earlycpio.h>
> > #include <linux/initrd.h>
> > +#include <linux/x86/cpu.h>
>
> Has this patch set received a build success notification from the
> kbuild robot? I.e. are you sure that this include was only here for
> the
>
> unsigned int x86_family(unsigned int sig);
> unsigned int x86_model(unsigned int sig);
> unsigned int x86_stepping(unsigned int sig);
>
> ...helpers. All the other replacements look trivially verifiable as
> only needing these 3 helpers.
>
> >
> > struct ucode_patch {
> > struct list_head plist;
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index c99edfff7f82..bf070d2a2175 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -4,10 +4,11 @@
> >
> > #include "x86.h"
> > #include "reverse_cpuid.h"
> > -#include <asm/cpu.h>
> > #include <asm/processor.h>
> > #include <uapi/asm/kvm_para.h>
> >
> > +#include <linux/x86/cpu.h>
> > +
> > extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> > void kvm_set_cpu_caps(void);
> >
> > diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> > index bad4dee4f0e4..fd73c1b72c3e 100644
> > --- a/arch/x86/lib/Makefile
> > +++ b/arch/x86/lib/Makefile
> > @@ -41,7 +41,7 @@ clean-files := inat-tables.c
> >
> > obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
> >
> > -lib-y := delay.o misc.o cmdline.o cpu.o
> > +lib-y := delay.o misc.o cmdline.o
> > lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
> > lib-y += memcpy_$(BITS).o
> > lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
> > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> > index 27d56920b469..f545f5fad02c 100644
> > --- a/drivers/edac/mce_amd.c
> > +++ b/drivers/edac/mce_amd.c
> > @@ -1,8 +1,7 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > #include <linux/module.h>
> > #include <linux/slab.h>
> > -
> > -#include <asm/cpu.h>
> > +#include <linux/x86/cpu.h>
> >
> > #include "mce_amd.h"
> >
> > diff --git a/include/linux/x86/cpu.h b/include/linux/x86/cpu.h
> > new file mode 100644
> > index 000000000000..5f383d47886d
> > --- /dev/null
> > +++ b/include/linux/x86/cpu.h
> > @@ -0,0 +1,9 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +#ifndef _LINUX_X86_CPU_H
> > +#define _LINUX_X86_CPU_H
> > +
> > +unsigned int x86_family(unsigned int sig);
> > +unsigned int x86_model(unsigned int sig);
> > +unsigned int x86_stepping(unsigned int sig);
> > +
> > +#endif /* _LINUX_X86_CPU_H */
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index d241fe476fda..cc28bc1f2d84 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -718,3 +718,8 @@ config PLDMFW
> >
> > config ASN1_ENCODER
> > tristate
> > +
> > +config GENERIC_LIB_X86
> > + bool
> > + depends on X86
> > + default n
>
> No need for a "default n" line. Omitting a default is the same as
> "default n".
>
On Wed, 2021-07-14 at 16:54 +0000, Williams, Dan J wrote:
> On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > Baseboard management controllers (BMC) often run Linux but are
> > usually
> > implemented with non-X86 processors. They can use PECI to access
> > package
> > config space (PCS) registers on the host CPU and since some
> > information,
> > e.g. figuring out the core count, can be obtained using different
> > registers on different CPU generations, they need to decode the
> > family
> > and model.
> >
> > Move the data from arch/x86/include/asm/intel-family.h into a new
> > file
> > include/linux/x86/intel-family.h so that it can be used by other
> > architectures.
>
> At least it would make the diffstat smaller to allow for rename
> detection when the old file is deleted in the same patch:
>
> MAINTAINERS | 1 +
> {arch/x86/include/asm => include/linux/x86}/intel-family.h | 6 +++---
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> ...one thing people have done in the past is include a conversion
> script in the changelog that produced the diff. That way if a
> maintainer wants to be sure to catch any new usage of the header at
> the old location they just run the script.
You mean like a simple s#asm/intel-family.h#linux/x86/intel-family.h#g?
Operating on the kernel tree? Or on individual patches?
Is including the "old" header in new code that big of a deal? I guess it
could break grepability (looking for users of the header, now that it
can be pulled from two different places).
It would be worse if someone decided to add new content to the old
header, but this should be easier to catch during review.
>
> I am not aware of x86 maintainer preference here. Either way you decide
> to go you can add:
>
> Reviewed-by: Dan Williams <[email protected]>
>
Thank you
-Iwona
On Thu, 2021-07-15 at 16:51 +0000, Winiarska, Iwona wrote:
> On Wed, 2021-07-14 at 16:58 +0000, Williams, Dan J wrote:
> > On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > > > diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> > > index 33d41e350c79..2a663a05a795 100644
> > > --- a/arch/x86/include/asm/cpu.h
> > > +++ b/arch/x86/include/asm/cpu.h
> > > @@ -37,9 +37,6 @@ extern int _debug_hotplug_cpu(int cpu, int
> > > action);
> > >
> > > int mwait_usable(const struct cpuinfo_x86 *);
> > >
> > > -unsigned int x86_family(unsigned int sig);
> > > -unsigned int x86_model(unsigned int sig);
> > > -unsigned int x86_stepping(unsigned int sig);
> > > #ifdef CONFIG_CPU_SUP_INTEL
> > > extern void __init sld_setup(struct cpuinfo_x86 *c);
> > > extern void switch_to_sld(unsigned long tifn);
> > > > diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
> > > index ab45a220fac4..4b0eabf63b98 100644
> > > --- a/arch/x86/include/asm/microcode.h
> > > +++ b/arch/x86/include/asm/microcode.h
> > > @@ -2,9 +2,9 @@
> > > #ifndef _ASM_X86_MICROCODE_H
> > > #define _ASM_X86_MICROCODE_H
> > >
> > > -#include <asm/cpu.h>
> > > #include <linux/earlycpio.h>
> > > #include <linux/initrd.h>
> > > +#include <linux/x86/cpu.h>
> >
> > Has this patch set received a build success notification from the
> > > kbuild robot? I.e. are you sure that this include was only here for the
> >
> > unsigned int x86_family(unsigned int sig);
> > unsigned int x86_model(unsigned int sig);
> > unsigned int x86_stepping(unsigned int sig);
> >
> > ...helpers. All the other replacements look trivially verifiable as
> > only needing these 3 helpers.
Missed the rest of your email in my previous post - sorry.
Yes - and before that I ran this through allyesconfig on x86.
> >
> > >
> > > struct ucode_patch {
> > > struct list_head plist;
> > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > > index c99edfff7f82..bf070d2a2175 100644
> > > --- a/arch/x86/kvm/cpuid.h
> > > +++ b/arch/x86/kvm/cpuid.h
> > > @@ -4,10 +4,11 @@
> > >
> > > #include "x86.h"
> > > #include "reverse_cpuid.h"
> > > -#include <asm/cpu.h>
> > > #include <asm/processor.h>
> > > #include <uapi/asm/kvm_para.h>
> > >
> > > +#include <linux/x86/cpu.h>
> > > +
> > > extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> > > void kvm_set_cpu_caps(void);
> > >
> > > diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
> > > index bad4dee4f0e4..fd73c1b72c3e 100644
> > > --- a/arch/x86/lib/Makefile
> > > +++ b/arch/x86/lib/Makefile
> > > @@ -41,7 +41,7 @@ clean-files := inat-tables.c
> > >
> > > obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
> > >
> > > -lib-y := delay.o misc.o cmdline.o cpu.o
> > > +lib-y := delay.o misc.o cmdline.o
> > > lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
> > > lib-y += memcpy_$(BITS).o
> > > lib-$(CONFIG_ARCH_HAS_COPY_MC) += copy_mc.o copy_mc_64.o
> > > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> > > index 27d56920b469..f545f5fad02c 100644
> > > --- a/drivers/edac/mce_amd.c
> > > +++ b/drivers/edac/mce_amd.c
> > > @@ -1,8 +1,7 @@
> > > // SPDX-License-Identifier: GPL-2.0-only
> > > #include <linux/module.h>
> > > #include <linux/slab.h>
> > > -
> > > -#include <asm/cpu.h>
> > > +#include <linux/x86/cpu.h>
> > >
> > > #include "mce_amd.h"
> > >
> > > diff --git a/include/linux/x86/cpu.h b/include/linux/x86/cpu.h
> > > new file mode 100644
> > > index 000000000000..5f383d47886d
> > > --- /dev/null
> > > +++ b/include/linux/x86/cpu.h
> > > @@ -0,0 +1,9 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +#ifndef _LINUX_X86_CPU_H
> > > +#define _LINUX_X86_CPU_H
> > > +
> > > +unsigned int x86_family(unsigned int sig);
> > > +unsigned int x86_model(unsigned int sig);
> > > +unsigned int x86_stepping(unsigned int sig);
> > > +
> > > +#endif /* _LINUX_X86_CPU_H */
> > > diff --git a/lib/Kconfig b/lib/Kconfig
> > > index d241fe476fda..cc28bc1f2d84 100644
> > > --- a/lib/Kconfig
> > > +++ b/lib/Kconfig
> > > @@ -718,3 +718,8 @@ config PLDMFW
> > >
> > > config ASN1_ENCODER
> > > tristate
> > > +
> > > +config GENERIC_LIB_X86
> > > + bool
> > > + depends on X86
> > > + default n
> >
> > No need for a "default n" line. Omitting a default is the same as
> > "default n".
Sure - I'll fix this in v2.
Thanks
-Iwona
> >
>
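For readers who have not looked at the three helpers being moved, the
following is a minimal userspace sketch of what decoding a CPUID signature
involves. It assumes the standard CPUID leaf 1 EAX layout; the sketch_*
names are hypothetical and are not the kernel's x86_family(), x86_model()
and x86_stepping() implementations.

```c
#include <stdint.h>

/* Hypothetical userspace helpers; names are illustrative only. */
static unsigned int sketch_x86_family(uint32_t sig)
{
	unsigned int family = (sig >> 8) & 0xf;

	/* Extended family (bits 27:20) is added only when base family is 0xf. */
	if (family == 0xf)
		family += (sig >> 20) & 0xff;

	return family;
}

static unsigned int sketch_x86_model(uint32_t sig)
{
	unsigned int family = sketch_x86_family(sig);
	unsigned int model = (sig >> 4) & 0xf;

	/* Extended model (bits 19:16) is the high nibble for family >= 6. */
	if (family >= 0x6)
		model += ((sig >> 16) & 0xf) << 4;

	return model;
}

static unsigned int sketch_x86_stepping(uint32_t sig)
{
	return sig & 0xf;
}
```

A BMC-side driver would feed these the signature obtained over PECI and
compare the resulting model against the constants in intel-family.h.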
On Wed, 2021-07-14 at 16:51 +0000, Williams, Dan J wrote:
> On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > Note: All changes to arch/x86 are contained within patches 01-02.
>
> Hi Iwona,
>
> One meta question first, who is this submission "To:"? Is there an
> existing upstream maintainer path for OpenBMC changes? Are you
> expecting contributions to this subsystem from others? While Greg
> sometimes ends up as default maintainer for new stuff, I wonder if
> someone from the OpenBMC community should step up to fill this role?
>
The intention was to direct it to Greg, but I guess I didn't express
that through the mail headers.
I am expecting contributions - for example there is at least one other
major BMC vendor which also ships PECI controllers.
From my perspective, the pieces that make up a BMC are pretty loosely
connected (at least from the kernel perspective - scattered all over
the kernel tree), so I don't see how that would work in practice.
Thanks
-Iwona
> >
> > The Platform Environment Control Interface (PECI) is a communication
> > interface between Intel processors and management controllers (e.g.
> > Baseboard Management Controller, BMC).
> >
> > This series adds a PECI subsystem and introduces drivers which run in
> > the Linux instance on the management controller (not the main Intel
> > processor) and are intended to be used by OpenBMC [1], a Linux
> > distribution for BMC devices.
> > The information exposed over PECI (like processor and DIMM
> > temperature) refers to the Intel processor and can be consumed by
> > daemons running on the BMC to, for example, display the processor
> > temperature in its web interface.
> >
> > The PECI bus is a collection of code that provides interface support
> > between PECI devices (that actually represent processors) and PECI
> > controllers (such as the "peci-aspeed" controller) that allow access
> > to the physical PECI interface. PECI devices are bound to PECI
> > drivers that provide access to PECI services. This series introduces
> > a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
> > and "dimmtemp" using the auxiliary bus.
> >
> > Exposing "raw" PECI to userspace, either to write userspace drivers or
> > for debug/testing purposes, was left out of this series to encourage
> > writing kernel drivers instead, but may be pursued in the future.
> >
> > Introducing PECI to upstream Linux was already attempted before [2].
> > Since it's been over a year since last revision, and the series
> > changed quite a bit in the meantime, I've decided to start from v1.
> >
> > I would also like to give credit to everyone who helped me with
> > different aspects of preliminary review:
> > - Pierre-Louis Bossart,
> > - Tony Luck,
> > - Andy Shevchenko,
> > - Dave Hansen.
> >
> > [1] https://github.com/openbmc/openbmc
> > [2] https://lore.kernel.org/openbmc/[email protected]/
> >
> > Iwona Winiarska (12):
> > x86/cpu: Move intel-family to arch-independent headers
> > x86/cpu: Extract cpuid helpers to arch-independent
> > dt-bindings: Add generic bindings for PECI
> > dt-bindings: Add bindings for peci-aspeed
> > ARM: dts: aspeed: Add PECI controller nodes
> > peci: Add core infrastructure
> > peci: Add device detection
> > peci: Add support for PECI device drivers
> > peci: Add peci-cpu driver
> > hwmon: peci: Add cputemp driver
> > hwmon: peci: Add dimmtemp driver
> > docs: Add PECI documentation
> >
> > Jae Hyun Yoo (2):
> > peci: Add peci-aspeed controller driver
> > docs: hwmon: Document PECI drivers
> >
> > .../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++
> > .../bindings/peci/peci-controller.yaml | 28 +
> > Documentation/hwmon/index.rst | 2 +
> > Documentation/hwmon/peci-cputemp.rst | 93 ++++
> > Documentation/hwmon/peci-dimmtemp.rst | 58 ++
> > Documentation/index.rst | 1 +
> > Documentation/peci/index.rst | 16 +
> > Documentation/peci/peci.rst | 48 ++
> > MAINTAINERS | 32 ++
> > arch/arm/boot/dts/aspeed-g4.dtsi | 14 +
> > arch/arm/boot/dts/aspeed-g5.dtsi | 14 +
> > arch/arm/boot/dts/aspeed-g6.dtsi | 14 +
> > arch/x86/Kconfig | 1 +
> > arch/x86/include/asm/cpu.h | 3 -
> > arch/x86/include/asm/intel-family.h | 141 +----
> > arch/x86/include/asm/microcode.h | 2 +-
> > arch/x86/kvm/cpuid.h | 3 +-
> > arch/x86/lib/Makefile | 2 +-
> > drivers/Kconfig | 3 +
> > drivers/Makefile | 1 +
> > drivers/edac/mce_amd.c | 3 +-
> > drivers/hwmon/Kconfig | 2 +
> > drivers/hwmon/Makefile | 1 +
> > drivers/hwmon/peci/Kconfig | 31 ++
> > drivers/hwmon/peci/Makefile | 7 +
> > drivers/hwmon/peci/common.h | 46 ++
> > drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++
> > drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++
> > drivers/peci/Kconfig | 36 ++
> > drivers/peci/Makefile | 10 +
> > drivers/peci/controller/Kconfig | 12 +
> > drivers/peci/controller/Makefile | 3 +
> > drivers/peci/controller/peci-aspeed.c | 501 +++++++++++++++++
> > drivers/peci/core.c | 224 ++++++++
> > drivers/peci/cpu.c | 347 ++++++++++++
> > drivers/peci/device.c | 211 ++++++++
> > drivers/peci/internal.h | 137 +++++
> > drivers/peci/request.c | 502 +++++++++++++++++
> > drivers/peci/sysfs.c | 82 +++
> > include/linux/peci-cpu.h | 38 ++
> > include/linux/peci.h | 93 ++++
> > include/linux/x86/cpu.h | 9 +
> > include/linux/x86/intel-family.h | 146 +++++
> > lib/Kconfig | 5 +
> > lib/Makefile | 2 +
> > lib/x86/Makefile | 3 +
> > {arch/x86/lib => lib/x86}/cpu.c | 2 +-
> > 47 files changed, 3902 insertions(+), 149 deletions(-)
> > create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> > create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
> > create mode 100644 Documentation/hwmon/peci-cputemp.rst
> > create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> > create mode 100644 Documentation/peci/index.rst
> > create mode 100644 Documentation/peci/peci.rst
> > create mode 100644 drivers/hwmon/peci/Kconfig
> > create mode 100644 drivers/hwmon/peci/Makefile
> > create mode 100644 drivers/hwmon/peci/common.h
> > create mode 100644 drivers/hwmon/peci/cputemp.c
> > create mode 100644 drivers/hwmon/peci/dimmtemp.c
> > create mode 100644 drivers/peci/Kconfig
> > create mode 100644 drivers/peci/Makefile
> > create mode 100644 drivers/peci/controller/Kconfig
> > create mode 100644 drivers/peci/controller/Makefile
> > create mode 100644 drivers/peci/controller/peci-aspeed.c
> > create mode 100644 drivers/peci/core.c
> > create mode 100644 drivers/peci/cpu.c
> > create mode 100644 drivers/peci/device.c
> > create mode 100644 drivers/peci/internal.h
> > create mode 100644 drivers/peci/request.c
> > create mode 100644 drivers/peci/sysfs.c
> > create mode 100644 include/linux/peci-cpu.h
> > create mode 100644 include/linux/peci.h
> > create mode 100644 include/linux/x86/cpu.h
> > create mode 100644 include/linux/x86/intel-family.h
> > create mode 100644 lib/x86/Makefile
> > rename {arch/x86/lib => lib/x86}/cpu.c (95%)
> >
>
On Tue, Jul 13, 2021 at 12:04:44AM +0200, Iwona Winiarska wrote:
> Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> readings of the processor package and processor cores that are
> accessible via the PECI interface.
>
> The main use case for the driver (and PECI interface) is out-of-band
> management, where we're able to obtain the DTS readings from an external
> entity connected with PECI, e.g. BMC on server platforms.
>
> Co-developed-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Pierre-Louis Bossart <[email protected]>
Note: Due to lack of revision information, this review does not take
any previous discussions into account, and it may miss critical information.
For a final review I'll have to compare the code against earlier versions
to determine if there are any relevant changes and if all comments
have been addressed. This may take some time.
> ---
> MAINTAINERS | 7 +
> drivers/hwmon/Kconfig | 2 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/peci/Kconfig | 18 ++
> drivers/hwmon/peci/Makefile | 5 +
> drivers/hwmon/peci/common.h | 46 ++++
> drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++++++++++++++++++++
> 7 files changed, 582 insertions(+)
> create mode 100644 drivers/hwmon/peci/Kconfig
> create mode 100644 drivers/hwmon/peci/Makefile
> create mode 100644 drivers/hwmon/peci/common.h
> create mode 100644 drivers/hwmon/peci/cputemp.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f47b5f634293..35ba9e3646bd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14504,6 +14504,13 @@ L: [email protected]
> S: Maintained
> F: drivers/platform/x86/peaq-wmi.c
>
> +PECI HARDWARE MONITORING DRIVERS
> +M: Iwona Winiarska <[email protected]>
> +R: Jae Hyun Yoo <[email protected]>
> +L: [email protected]
> +S: Supported
> +F: drivers/hwmon/peci/
> +
> PECI SUBSYSTEM
> M: Iwona Winiarska <[email protected]>
> R: Jae Hyun Yoo <[email protected]>
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index e3675377bc5d..61c0e3404415 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
> These devices are hard to detect and rarely found on mainstream
> hardware. If unsure, say N.
>
> +source "drivers/hwmon/peci/Kconfig"
> +
> source "drivers/hwmon/pmbus/Kconfig"
>
> config SENSORS_PWM_FAN
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index d712c61c1f5e..f52331f212ed 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
> obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
>
> obj-$(CONFIG_SENSORS_OCC) += occ/
> +obj-$(CONFIG_SENSORS_PECI) += peci/
> obj-$(CONFIG_PMBUS) += pmbus/
>
> ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> new file mode 100644
> index 000000000000..e10eed68d70a
> --- /dev/null
> +++ b/drivers/hwmon/peci/Kconfig
> @@ -0,0 +1,18 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config SENSORS_PECI_CPUTEMP
> + tristate "PECI CPU temperature monitoring client"
> + depends on PECI
> + select SENSORS_PECI
> + select PECI_CPU
> + help
> + If you say yes here you get support for the generic Intel PECI
> + cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> + readings of the CPU package and CPU cores that are accessible via
> + the processor PECI interface.
> +
> + This driver can also be built as a module. If so, the module
> + will be called peci-cputemp.
> +
> +config SENSORS_PECI
> + tristate
> diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> new file mode 100644
> index 000000000000..e8a0ada5ab1f
> --- /dev/null
> +++ b/drivers/hwmon/peci/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +peci-cputemp-y := cputemp.o
> +
> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> new file mode 100644
> index 000000000000..54580c100d06
> --- /dev/null
> +++ b/drivers/hwmon/peci/common.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2021 Intel Corporation */
> +
> +#include <linux/types.h>
> +
> +#ifndef __PECI_HWMON_COMMON_H
> +#define __PECI_HWMON_COMMON_H
> +
> +#define UPDATE_INTERVAL_DEFAULT HZ
> +
> +/**
> + * struct peci_sensor_data - PECI sensor information
> + * @valid: flag to indicate the sensor value is valid
> + * @value: sensor value in milli units
> + * @last_updated: time of the last update in jiffies
> + */
> +struct peci_sensor_data {
> + unsigned int valid;
Please use bool.
> + s32 value;
> + unsigned long last_updated;
> +};
> +
> +/**
> + * peci_sensor_need_update() - check whether sensor update is needed or not
> + * @sensor: pointer to sensor data struct
> + *
> + * Return: true if update is needed, false if not.
> + */
> +
> +static inline bool peci_sensor_need_update(struct peci_sensor_data *sensor)
> +{
> + return !sensor->valid ||
> + time_after(jiffies, sensor->last_updated + UPDATE_INTERVAL_DEFAULT);
Since there is no other update interval, the _DEFAULT suffix adds no value;
please drop it. Also, please give the macro a prefix such as PECI_.
> +}
> +
> +/**
> + * peci_sensor_mark_updated() - mark the sensor is updated
> + * @sensor: pointer to sensor data struct
> + */
> +static inline void peci_sensor_mark_updated(struct peci_sensor_data *sensor)
> +{
> + sensor->valid = 1;
= true;
> + sensor->last_updated = jiffies;
> +}
> +
> +#endif /* __PECI_HWMON_COMMON_H */
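Taken together, the three comments on common.h (use bool, drop _DEFAULT,
add a PECI_ prefix) could look roughly like the userspace sketch below.
The macro name PECI_HWMON_UPDATE_INTERVAL, the struct name and the
ticks_after() helper are hypothetical stand-ins; the kernel code would keep
using jiffies, HZ and time_after().

```c
#include <stdbool.h>

/* Hypothetical name: the review only asks for a PECI_ prefix, no _DEFAULT. */
#define PECI_HWMON_UPDATE_INTERVAL 1000UL	/* stand-in for HZ ticks */

struct peci_sensor_data_sketch {
	bool valid;			/* bool instead of unsigned int */
	int value;			/* sensor value in milli units */
	unsigned long last_updated;	/* tick count of the last update */
};

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool sensor_need_update(const struct peci_sensor_data_sketch *s,
			       unsigned long now)
{
	return !s->valid ||
	       ticks_after(now, s->last_updated + PECI_HWMON_UPDATE_INTERVAL);
}

static void sensor_mark_updated(struct peci_sensor_data_sketch *s,
				unsigned long now)
{
	s->valid = true;
	s->last_updated = now;
}
```

Passing the current tick count as a parameter is only there to make the
sketch testable outside the kernel.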
> diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> new file mode 100644
> index 000000000000..56a526471687
> --- /dev/null
> +++ b/drivers/hwmon/peci/cputemp.c
> @@ -0,0 +1,503 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#include <linux/auxiliary_bus.h>
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/hwmon.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/peci.h>
> +#include <linux/peci-cpu.h>
> +#include <linux/units.h>
> +#include <linux/x86/intel-family.h>
> +
> +#include "common.h"
> +
> +#define CORE_NUMS_MAX 64
> +
> +#define DEFAULT_CHANNEL_NUMS 5
> +#define CORETEMP_CHANNEL_NUMS CORE_NUMS_MAX
> +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
> +
> +#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
> +#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
> +#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
> +
> +#define DTS_MARGIN_MASK GENMASK(15, 0)
> +#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
> +
> +#define DTS_FIXED_POINT_FRACTION 64
> +
> +struct resolved_cores_reg {
> + u8 bus;
> + u8 dev;
> + u8 func;
> + u8 offset;
> +};
> +
> +struct cpu_info {
> + struct resolved_cores_reg *reg;
> + u8 min_peci_revision;
> +};
> +
> +struct peci_cputemp {
> + struct peci_device *peci_dev;
> + struct device *dev;
> + const char *name;
> + const struct cpu_info *gen_info;
> + struct {
> + struct peci_sensor_data die;
> + struct peci_sensor_data dts;
> + struct peci_sensor_data tcontrol;
> + struct peci_sensor_data tthrottle;
> + struct peci_sensor_data tjmax;
> + struct peci_sensor_data core[CORETEMP_CHANNEL_NUMS];
> + } temp;
> + const char **coretemp_label;
> + DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> +};
> +
> +enum cputemp_channels {
> + channel_die,
> + channel_dts,
> + channel_tcontrol,
> + channel_tthrottle,
> + channel_tjmax,
> + channel_core,
> +};
> +
> +static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
> + "Die",
> + "DTS",
> + "Tcontrol",
> + "Tthrottle",
> + "Tjmax",
> +};
> +
> +static int get_temp_targets(struct peci_cputemp *priv)
> +{
> + s32 tthrottle_offset, tcontrol_margin;
> + u32 pcs;
> + int ret;
> +
> + /*
> + * Just use only the tcontrol marker to determine if target values need
> + * update.
> + */
> + if (!peci_sensor_need_update(&priv->temp.tcontrol))
> + return 0;
> +
This applies to the entire driver: please explain how this avoids race
conditions, without locking, between the condition check here and the call
to peci_sensor_mark_updated() below. The explanation needs to be added as
a comment in the code for later reference.
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> + if (ret)
> + return ret;
> +
> + priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> +
> + tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> + tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
> + priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> +
> + tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> + priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> +
> + peci_sensor_mark_updated(&priv->temp.tcontrol);
> +
> + return 0;
> +}
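To make the arithmetic in get_temp_targets() concrete, here is a small
userspace sketch that decodes a TEMP_TARGET value using the bit ranges from
the quoted TEMP_TARGET_*_MASK definitions. The field() helper, the struct
and the function name are hypothetical; the driver itself uses FIELD_GET()
and sign_extend32().

```c
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of reg; hypothetical FIELD_GET stand-in. */
static uint32_t field(uint32_t reg, int hi, int lo)
{
	return (reg >> lo) & ((1u << (hi - lo + 1)) - 1);
}

struct temp_targets {
	int32_t tjmax;		/* all values in millidegrees Celsius */
	int32_t tcontrol;
	int32_t tthrottle;
};

static struct temp_targets decode_temp_target(uint32_t pcs)
{
	struct temp_targets t;
	/* Bits 15:8 hold a signed 8-bit margin, hence the int8_t cast. */
	int32_t tcontrol_margin = (int8_t)field(pcs, 15, 8);
	/* Bits 29:24 hold an unsigned 6-bit throttle offset. */
	int32_t tthrottle_offset = (int32_t)field(pcs, 29, 24);

	t.tjmax = (int32_t)field(pcs, 23, 16) * 1000;
	t.tcontrol = t.tjmax - tcontrol_margin * 1000;
	t.tthrottle = t.tjmax - tthrottle_offset * 1000;

	return t;
}
```

With a reference temperature of 100, a fan margin of 10 and a throttle
offset of 5, this yields 100000, 90000 and 95000 millidegrees.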
> +
> +/*
> + * Processors return a value of DTS reading in S10.6 fixed point format
> + * (sign, 10 bits signed integer value, 6 bits fractional).
> + * Error codes:
> + * 0x8000: General sensor error
> + * 0x8001: Reserved
> + * 0x8002: Underflow on reading value
> + * 0x8003-0x81ff: Reserved
> + */
> +static bool dts_valid(s32 val)
> +{
> + return val < 0x8000 || val > 0x81ff;
> +}
> +
> +static s32 dts_to_millidegree(s32 val)
> +{
> + return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
> +}
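As a worked example of the S10.6 format described in the comment above,
the sketch below converts raw 16-bit readings to millidegrees and applies
the error-code range to the raw bit pattern. The function names are
hypothetical, and the sign extension assumes an arithmetic right shift, as
GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

#define DTS_FRACTION 64		/* 2^6: six fractional bits */

/* Sign-extend a value whose sign bit sits at position index (0-based). */
static int32_t sign_extend(uint32_t value, int index)
{
	int shift = 31 - index;

	return (int32_t)(value << shift) >> shift;
}

/* Error codes occupy 0x8000-0x81ff of the raw, unsigned 16-bit pattern. */
static bool raw_dts_valid(uint16_t raw)
{
	return raw < 0x8000 || raw > 0x81ff;
}

static int32_t s10_6_to_millidegree(uint16_t raw)
{
	return sign_extend(raw, 15) * 1000 / DTS_FRACTION;
}
```

For instance, 0xFFC0 is -64 in two's complement, i.e. -1 degree, and 0xFFE0
is -32, i.e. half a degree below the reference.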
> +
> +static int get_die_temp(struct peci_cputemp *priv)
> +{
> + s16 temp;
> + int ret;
> +
> + if (!peci_sensor_need_update(&priv->temp.die))
> + return 0;
> +
> + ret = peci_temp_read(priv->peci_dev, &temp);
> + if (ret)
> + return ret;
> +
> + if (!dts_valid(temp))
> + return -EIO;
> +
> + /* Note that the tjmax should be available before calling it */
> + priv->temp.die.value = priv->temp.tjmax.value + dts_to_millidegree(temp);
> +
> + peci_sensor_mark_updated(&priv->temp.die);
> +
> + return 0;
> +}
> +
> +static int get_dts(struct peci_cputemp *priv)
> +{
> + s32 dts_margin;
> + u32 pcs;
> + int ret;
> +
> + if (!peci_sensor_need_update(&priv->temp.dts))
> + return 0;
> +
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
> + if (ret)
> + return ret;
> +
> + dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> + if (!dts_valid(dts_margin))
> + return -EIO;
> +
> + /* Note that the tcontrol should be available before calling it */
> + priv->temp.dts.value = priv->temp.tcontrol.value - dts_to_millidegree(dts_margin);
> +
> + peci_sensor_mark_updated(&priv->temp.dts);
> +
> + return 0;
> +}
> +
> +static int get_core_temp(struct peci_cputemp *priv, int core_index)
> +{
> + s32 core_dts_margin;
> + u32 pcs;
> + int ret;
> +
> + if (!peci_sensor_need_update(&priv->temp.core[core_index]))
> + return 0;
> +
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
> + if (ret)
> + return ret;
> +
> + core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> + if (!dts_valid(core_dts_margin))
> + return -EIO;
> +
> + /* Note that the tjmax should be available before calling it */
> + priv->temp.core[core_index].value =
> + priv->temp.tjmax.value + dts_to_millidegree(core_dts_margin);
> +
> + peci_sensor_mark_updated(&priv->temp.core[core_index]);
> +
> + return 0;
> +}
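The getters above all share the check, read, mark-updated sequence that
the race-condition question refers to. One conventional way to close that
window is to hold a lock across the whole sequence. The following is a
userspace sketch of that idea using a pthread mutex; a kernel driver would
use struct mutex instead, and nothing here is taken from the series. The
names cached_reading, cached_read() and fake_read() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

struct cached_reading {
	pthread_mutex_t lock;	/* serializes check, refresh and mark-updated */
	bool valid;
	long value;
	unsigned long last_updated;
};

/* Hypothetical slow read standing in for a PECI transaction. */
typedef int (*read_fn)(long *out);

static int fake_read_calls;

static int fake_read(long *out)
{
	*out = 42000 + fake_read_calls++;
	return 0;
}

static int cached_read(struct cached_reading *c, unsigned long now,
		       unsigned long interval, read_fn slow_read, long *val)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	/* Refresh if never read, or if now is past last_updated + interval. */
	if (!c->valid || (long)(c->last_updated + interval - now) < 0) {
		ret = slow_read(&c->value);
		if (ret)
			goto unlock;
		c->valid = true;
		c->last_updated = now;
	}
	*val = c->value;
unlock:
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

Because the cached value is copied out while the lock is still held, a
concurrent reader cannot observe a value and timestamp from different
refreshes.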
> +
> +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
> + u32 attr, int channel, const char **str)
> +{
> + struct peci_cputemp *priv = dev_get_drvdata(dev);
> +
> + if (attr != hwmon_temp_label)
> + return -EOPNOTSUPP;
> +
> + *str = channel < channel_core ?
> + cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
> +
> + return 0;
> +}
> +
> +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> + u32 attr, int channel, long *val)
> +{
> + struct peci_cputemp *priv = dev_get_drvdata(dev);
> + int ret, core_index;
> +
> + ret = get_temp_targets(priv);
> + if (ret)
> + return ret;
> +
> + switch (attr) {
> + case hwmon_temp_input:
> + switch (channel) {
> + case channel_die:
> + ret = get_die_temp(priv);
> + if (ret)
> + return ret;
> +
> + *val = priv->temp.die.value;
> + break;
> + case channel_dts:
> + ret = get_dts(priv);
> + if (ret)
> + return ret;
> +
> + *val = priv->temp.dts.value;
> + break;
> + case channel_tcontrol:
> + *val = priv->temp.tcontrol.value;
> + break;
> + case channel_tthrottle:
> + *val = priv->temp.tthrottle.value;
> + break;
> + case channel_tjmax:
> + *val = priv->temp.tjmax.value;
> + break;
> + default:
> + core_index = channel - channel_core;
> + ret = get_core_temp(priv, core_index);
> + if (ret)
> + return ret;
> +
> + *val = priv->temp.core[core_index].value;
> + break;
> + }
> + break;
> + case hwmon_temp_max:
> + *val = priv->temp.tcontrol.value;
> + break;
> + case hwmon_temp_crit:
> + *val = priv->temp.tjmax.value;
> + break;
> + case hwmon_temp_crit_hyst:
> + *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
> + u32 attr, int channel)
> +{
> + const struct peci_cputemp *priv = data;
> +
> + if (channel > CPUTEMP_CHANNEL_NUMS)
> + return 0;
> +
> + if (channel < channel_core)
> + return 0444;
> +
> + if (test_bit(channel - channel_core, priv->core_mask))
> + return 0444;
> +
> + return 0;
> +}
> +
> +static int init_core_mask(struct peci_cputemp *priv)
> +{
> + struct peci_device *peci_dev = priv->peci_dev;
> + struct resolved_cores_reg *reg = priv->gen_info->reg;
> + u64 core_mask;
> + u32 data;
> + int ret;
> +
> + /* Get the RESOLVED_CORES register value */
> + switch (peci_dev->info.model) {
> + case INTEL_FAM6_ICELAKE_X:
> + case INTEL_FAM6_ICELAKE_D:
> + ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> + reg->func, reg->offset + 4, &data);
> + if (ret)
> + return ret;
> +
> + core_mask = (u64)data << 32;
> +
> + ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> + reg->func, reg->offset, &data);
> + if (ret)
> + return ret;
> +
> + core_mask |= data;
> +
> + break;
> + default:
> + ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> + reg->func, reg->offset, &data);
> + if (ret)
> + return ret;
> +
> + core_mask = data;
> +
> + break;
> + }
> +
> + if (!core_mask)
> + return -EIO;
> +
> + bitmap_from_u64(priv->core_mask, core_mask);
> +
> + return 0;
> +}
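The Ice Lake branch of init_core_mask() builds a 64-bit mask from two
32-bit register reads. A minimal sketch of that combination step, with
hypothetical helper names and no PECI access, is:

```c
#include <stdint.h>

/* Combine the upper and lower register halves into one 64-bit core mask. */
static uint64_t combine_core_mask(uint32_t hi, uint32_t lo)
{
	/* The cast must happen before the shift, or the high bits are lost. */
	return ((uint64_t)hi << 32) | lo;
}

/* Count enabled cores by clearing the lowest set bit until none remain. */
static unsigned int count_cores(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;

	return n;
}
```

The default branch in the quoted code is the degenerate case in which the
upper half is zero.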
> +
> +static int create_temp_label(struct peci_cputemp *priv)
> +{
> + unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
> + int i;
> +
> + priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
> + if (!priv->coretemp_label)
> + return -ENOMEM;
> +
> + for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> + priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
> + if (!priv->coretemp_label[i])
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
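One detail that may be worth double-checking in create_temp_label():
find_last_bit() reports the index of the highest set bit, so an array
indexed by core number needs that index plus one entries. The userspace
sketch below illustrates the sizing under that assumption; last_set_bit(),
make_core_labels() and free_labels() are hypothetical names, and plain
malloc/calloc stand in for the devm_* allocators.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CORE_NUMS_MAX 64

/*
 * Index of the highest set bit, or CORE_NUMS_MAX if the mask is empty,
 * mirroring what find_last_bit() reports for a 64-bit bitmap.
 */
static unsigned int last_set_bit(uint64_t mask)
{
	for (int i = CORE_NUMS_MAX - 1; i >= 0; i--)
		if (mask & (1ULL << i))
			return (unsigned int)i;

	return CORE_NUMS_MAX;
}

static void free_labels(char **labels, unsigned int count)
{
	for (unsigned int i = 0; i < count; i++)
		free(labels[i]);
	free(labels);
}

/* Build labels indexed by core number; entries for absent cores stay NULL. */
static char **make_core_labels(uint64_t mask, unsigned int *count)
{
	unsigned int last = last_set_bit(mask);
	char **labels;

	if (last == CORE_NUMS_MAX)
		return NULL;

	*count = last + 1;	/* valid indices are 0..last inclusive */
	labels = calloc(*count, sizeof(*labels));
	if (!labels)
		return NULL;

	for (unsigned int i = 0; i <= last; i++) {
		if (!(mask & (1ULL << i)))
			continue;
		labels[i] = malloc(16);
		if (!labels[i]) {
			free_labels(labels, *count);
			return NULL;
		}
		snprintf(labels[i], 16, "Core %u", i);
	}

	return labels;
}
```

With a mask of 0x5 the highest set bit is 2, so three slots are needed
even though only two cores are present.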
> +
> +static void check_resolved_cores(struct peci_cputemp *priv)
> +{
> + int ret;
> +
> + ret = init_core_mask(priv);
> + if (ret)
> + return;
> +
> + ret = create_temp_label(priv);
> + if (ret)
> + bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
This needs a comment explaining why it is ok to ignore the above errors.
I understand it is because the non-core data will still be available.
Yet, it still needs to be explained so others don't need to examine
the code to figure out the reason.
> +}
> +
> +static const struct hwmon_ops peci_cputemp_ops = {
> + .is_visible = cputemp_is_visible,
> + .read_string = cputemp_read_string,
> + .read = cputemp_read,
> +};
> +
> +static const u32 peci_cputemp_temp_channel_config[] = {
> + /* Die temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> + /* DTS margin */
> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> + /* Tcontrol temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> + /* Tthrottle temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT,
> + /* Tjmax temperature */
> + HWMON_T_LABEL | HWMON_T_INPUT,
> + /* Core temperature - for all core channels */
> + [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
> + 0
> +};
> +
> +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> + .type = hwmon_temp,
> + .config = peci_cputemp_temp_channel_config,
> +};
> +
> +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> + &peci_cputemp_temp_channel,
> + NULL
> +};
> +
> +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> + .ops = &peci_cputemp_ops,
> + .info = peci_cputemp_info,
> +};
> +
> +static int peci_cputemp_probe(struct auxiliary_device *adev,
> + const struct auxiliary_device_id *id)
> +{
> + struct device *dev = &adev->dev;
> + struct peci_device *peci_dev = to_peci_device(dev->parent);
> + struct peci_cputemp *priv;
> + struct device *hwmon_dev;
> +
> + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> + peci_dev->info.socket_id);
> + if (!priv->name)
> + return -ENOMEM;
> +
> + dev_set_drvdata(dev, priv);
What is this used for?
> + priv->dev = dev;
> + priv->peci_dev = peci_dev;
> + priv->gen_info = (const struct cpu_info *)id->driver_data;
> +
> + check_resolved_cores(priv);
> +
> + hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
> + priv, &peci_cputemp_chip_info, NULL);
> +
> + return PTR_ERR_OR_ZERO(hwmon_dev);
> +}
> +
> +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> + .bus = 1,
> + .dev = 30,
> + .func = 3,
> + .offset = 0xb4,
> +};
> +
> +static struct resolved_cores_reg resolved_cores_reg_icx = {
> + .bus = 14,
> + .dev = 30,
> + .func = 3,
> + .offset = 0xd0,
> +};
Please explain those magic numbers.
> +
> +static const struct cpu_info cpu_hsx = {
> + .reg = &resolved_cores_reg_hsx,
> + .min_peci_revision = 0x30,
> +};
> +
> +static const struct cpu_info cpu_icx = {
> + .reg = &resolved_cores_reg_icx,
> + .min_peci_revision = 0x40,
> +};
> +
> +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> + {
> + .name = "peci_cpu.cputemp.hsx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.bdx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.bdxd",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.skx",
> + .driver_data = (kernel_ulong_t)&cpu_hsx,
> + },
> + {
> + .name = "peci_cpu.cputemp.icx",
> + .driver_data = (kernel_ulong_t)&cpu_icx,
> + },
> + {
> + .name = "peci_cpu.cputemp.icxd",
> + .driver_data = (kernel_ulong_t)&cpu_icx,
> + },
> + { }
> +};
> +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> +
> +static struct auxiliary_driver peci_cputemp_driver = {
> + .probe = peci_cputemp_probe,
> + .id_table = peci_cputemp_ids,
> +};
> +
> +module_auxiliary_driver(peci_cputemp_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> +MODULE_DESCRIPTION("PECI cputemp driver");
> +MODULE_LICENSE("GPL");
> +MODULE_IMPORT_NS(PECI_CPU);
On Tue, Jul 13, 2021 at 12:04:45AM +0200, Iwona Winiarska wrote:
> Add peci-dimmtemp driver for Digital Thermal Sensor (DTS) thermal
> readings of DIMMs that are accessible via the processor PECI interface.
>
> The main use case for the driver (and PECI interface) is out-of-band
> management, where we're able to obtain the DTS readings from an external
> entity connected with PECI, e.g. BMC on server platforms.
>
> Co-developed-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Iwona Winiarska <[email protected]>
> Reviewed-by: Pierre-Louis Bossart <[email protected]>
> ---
> drivers/hwmon/peci/Kconfig | 13 +
> drivers/hwmon/peci/Makefile | 2 +
> drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++++++++++++++++++
> 3 files changed, 523 insertions(+)
> create mode 100644 drivers/hwmon/peci/dimmtemp.c
>
> diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> index e10eed68d70a..f2d57efa508b 100644
> --- a/drivers/hwmon/peci/Kconfig
> +++ b/drivers/hwmon/peci/Kconfig
> @@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
> This driver can also be built as a module. If so, the module
> will be called peci-cputemp.
>
> +config SENSORS_PECI_DIMMTEMP
> + tristate "PECI DIMM temperature monitoring client"
> + depends on PECI
> + select SENSORS_PECI
> + select PECI_CPU
> + help
> + If you say yes here you get support for the generic Intel PECI hwmon
> + driver which provides Digital Thermal Sensor (DTS) thermal readings of
> + DIMM components that are accessible via the processor PECI interface.
> +
> + This driver can also be built as a module. If so, the module
> + will be called peci-dimmtemp.
> +
> config SENSORS_PECI
> tristate
> diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> index e8a0ada5ab1f..191cfa0227f3 100644
> --- a/drivers/hwmon/peci/Makefile
> +++ b/drivers/hwmon/peci/Makefile
> @@ -1,5 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
>
> peci-cputemp-y := cputemp.o
> +peci-dimmtemp-y := dimmtemp.o
>
> obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
> diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
> new file mode 100644
> index 000000000000..2fcb8607137a
> --- /dev/null
> +++ b/drivers/hwmon/peci/dimmtemp.c
> @@ -0,0 +1,508 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#include <linux/auxiliary_bus.h>
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/hwmon.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/peci.h>
> +#include <linux/peci-cpu.h>
> +#include <linux/units.h>
> +#include <linux/workqueue.h>
> +#include <linux/x86/intel-family.h>
> +
> +#include "common.h"
> +
> +#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
> +#define DIMM_MASK_CHECK_RETRY_MAX 60 /* 60 x 5 secs = 5 minutes */
> +
> +/* Max number of channel ranks and DIMM index per channel */
> +#define CHAN_RANK_MAX_ON_HSX 8
> +#define DIMM_IDX_MAX_ON_HSX 3
> +#define CHAN_RANK_MAX_ON_BDX 4
> +#define DIMM_IDX_MAX_ON_BDX 3
> +#define CHAN_RANK_MAX_ON_BDXD 2
> +#define DIMM_IDX_MAX_ON_BDXD 2
> +#define CHAN_RANK_MAX_ON_SKX 6
> +#define DIMM_IDX_MAX_ON_SKX 2
> +#define CHAN_RANK_MAX_ON_ICX 8
> +#define DIMM_IDX_MAX_ON_ICX 2
> +#define CHAN_RANK_MAX_ON_ICXD 4
> +#define DIMM_IDX_MAX_ON_ICXD 2
> +
> +#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
> +#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
> +#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
> +
> +#define CPU_SEG_MASK GENMASK(23, 16)
> +#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
> +#define CPU_BUS_MASK GENMASK(7, 0)
> +#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
> +
> +#define DIMM_TEMP_MAX GENMASK(15, 8)
> +#define DIMM_TEMP_CRIT GENMASK(23, 16)
> +#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
> +#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
> +
> +struct dimm_info {
> + int chan_rank_max;
> + int dimm_idx_max;
> + u8 min_peci_revision;
> +};
> +
> +struct peci_dimmtemp {
> + struct peci_device *peci_dev;
> + struct device *dev;
> + const char *name;
> + const struct dimm_info *gen_info;
> + struct delayed_work detect_work;
> + struct peci_sensor_data temp[DIMM_NUMS_MAX];
> + long temp_max[DIMM_NUMS_MAX];
> + long temp_crit[DIMM_NUMS_MAX];
> + int retry_count;
> + char **dimmtemp_label;
> + DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
> +};
> +
> +static u8 __dimm_temp(u32 reg, int dimm_order)
> +{
> + return (reg >> (dimm_order * 8)) & 0xff;
> +}
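For readers following along, here is a minimal userspace sketch of the packing this helper appears to rely on: one temperature byte per DIMM slot inside the 32-bit PCS word, and a flat DIMM number split into channel rank and slot. The names dimm_temp_byte(), split_dimm_no() and dimm_temp_millidegree() are hypothetical and not part of the patch; the factor 1000 stands in for MILLIDEGREE_PER_DEGREE.

```c
#include <stdint.h>

/* Extract the temperature byte for one DIMM slot from a PCS word. */
static inline uint8_t dimm_temp_byte(uint32_t reg, int dimm_order)
{
	return (uint8_t)((reg >> (dimm_order * 8)) & 0xff);
}

/* Split a flat DIMM number into channel rank and slot within that rank. */
static inline void split_dimm_no(int dimm_no, int dimm_idx_max,
				 int *chan_rank, int *dimm_order)
{
	*chan_rank = dimm_no / dimm_idx_max;
	*dimm_order = dimm_no % dimm_idx_max;
}

/* Convert the raw byte (degrees Celsius) to the millidegrees hwmon expects. */
static inline long dimm_temp_millidegree(uint32_t reg, int dimm_order)
{
	return (long)dimm_temp_byte(reg, dimm_order) * 1000;
}
```

With dimm_idx_max = 3, DIMM number 7 lands on channel rank 2, slot 1, which is the same split get_dimm_temp() performs below.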
> +
> +static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
> +{
> + int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
> + int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
> + struct peci_device *peci_dev = priv->peci_dev;
> + u8 cpu_seg, cpu_bus, dev, func;
> + u64 offset;
> + u32 data;
> + u16 reg;
> + int ret;
> +
> + if (!peci_sensor_need_update(&priv->temp[dimm_no]))
> + return 0;
> +
> + ret = peci_pcs_read(peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &data);
> + if (ret)
> + return ret;
> +
Similar to the cpu driver, the lack of mutex protection needs to be explained.
> + priv->temp[dimm_no].value = __dimm_temp(data, dimm_order) * MILLIDEGREE_PER_DEGREE;
> +
> + switch (peci_dev->info.model) {
> + case INTEL_FAM6_ICELAKE_X:
> + case INTEL_FAM6_ICELAKE_D:
> + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd4, &data);
> + if (ret || !(data & BIT(31)))
> + break; /* Use default or previous value */
> +
> + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd0, &data);
> + if (ret)
> + break; /* Use default or previous value */
> +
> + cpu_seg = GET_CPU_SEG(data);
> + cpu_bus = GET_CPU_BUS(data);
> +
> + /*
> + * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
> + * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
> + * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
> + * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
> + * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
> + * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
> + * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
> + * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
> + */
> + dev = 0x1a + chan_rank / 2;
> + offset = 0x224e0 + dimm_order * 4;
> + if (chan_rank % 2)
> + offset += 0x4000;
> +
> + ret = peci_mmio_read(peci_dev, 0, cpu_seg, cpu_bus, dev, 0, offset, &data);
> + if (ret)
> + return ret;
> +
> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> +
> + break;
> + case INTEL_FAM6_SKYLAKE_X:
> + /*
> + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
> + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
> + * Device 11, Function 2: IMC 0 channel 2 -> rank 2
> + * Device 12, Function 2: IMC 1 channel 0 -> rank 3
> + * Device 12, Function 6: IMC 1 channel 1 -> rank 4
> + * Device 13, Function 2: IMC 1 channel 2 -> rank 5
> + */
> + dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
> + func = chan_rank % 3 == 1 ? 6 : 2;
> + reg = 0x120 + dimm_order * 4;
> +
> + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
> + if (ret)
> + return ret;
> +
> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> +
> + break;
> + case INTEL_FAM6_BROADWELL_D:
> + /*
> + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
> + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
> + * Device 12, Function 2: IMC 1 channel 0 -> rank 2
> + * Device 12, Function 6: IMC 1 channel 1 -> rank 3
> + */
> + dev = 10 + chan_rank / 2 * 2;
> + func = (chan_rank % 2) ? 6 : 2;
> + reg = 0x120 + dimm_order * 4;
> +
> + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
> + if (ret)
> + return ret;
> +
> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> +
> + break;
> + case INTEL_FAM6_HASWELL_X:
> + case INTEL_FAM6_BROADWELL_X:
> + /*
> + * Device 20, Function 0: IMC 0 channel 0 -> rank 0
> + * Device 20, Function 1: IMC 0 channel 1 -> rank 1
> + * Device 21, Function 0: IMC 0 channel 2 -> rank 2
> + * Device 21, Function 1: IMC 0 channel 3 -> rank 3
> + * Device 23, Function 0: IMC 1 channel 0 -> rank 4
> + * Device 23, Function 1: IMC 1 channel 1 -> rank 5
> + * Device 24, Function 0: IMC 1 channel 2 -> rank 6
> + * Device 24, Function 1: IMC 1 channel 3 -> rank 7
> + */
> + dev = 20 + chan_rank / 2 + chan_rank / 4;
> + func = chan_rank % 2;
> + reg = 0x120 + dimm_order * 4;
> +
> + ret = peci_pci_local_read(peci_dev, 1, dev, func, reg, &data);
> + if (ret)
> + return ret;
> +
> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> +
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + peci_sensor_mark_updated(&priv->temp[dimm_no]);
> +
> + return 0;
> +}
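The device/function arithmetic in the switch above is easy to misread, so here is a small userspace sketch that reproduces it and can be checked against the comment tables. Everything here (struct imc_loc and the *_loc(), icx_dev() and icx_offset() helpers) is hypothetical scaffolding for illustration, not code from the patch.

```c
#include <stdint.h>

/* Hypothetical holder for a decoded IMC location; not part of the patch. */
struct imc_loc {
	uint8_t dev;
	uint8_t func;
};

/* Haswell-X / Broadwell-X: ranks 0-7 map to devices 20, 21, 23, 24. */
static inline struct imc_loc hsx_loc(int chan_rank)
{
	struct imc_loc loc = {
		.dev = (uint8_t)(20 + chan_rank / 2 + chan_rank / 4),
		.func = (uint8_t)(chan_rank % 2),
	};
	return loc;
}

/* Skylake-X: ranks 0-5 map to devices 10-13, functions 2 or 6. */
static inline struct imc_loc skx_loc(int chan_rank)
{
	struct imc_loc loc = {
		.dev = (uint8_t)(10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0)),
		.func = (uint8_t)(chan_rank % 3 == 1 ? 6 : 2),
	};
	return loc;
}

/* Broadwell-D: ranks 0-3 map to devices 10 and 12, functions 2 or 6. */
static inline struct imc_loc bdxd_loc(int chan_rank)
{
	struct imc_loc loc = {
		.dev = (uint8_t)(10 + chan_rank / 2 * 2),
		.func = (uint8_t)((chan_rank % 2) ? 6 : 2),
	};
	return loc;
}

/* Ice Lake: two channels per IMC, devices 26-29 (0x1a-0x1d). */
static inline uint8_t icx_dev(int chan_rank)
{
	return (uint8_t)(0x1a + chan_rank / 2);
}

/* Ice Lake MMIO offset: odd ranks (channel 1) sit 0x4000 above even ones. */
static inline uint64_t icx_offset(int chan_rank, int dimm_order)
{
	uint64_t offset = 0x224e0 + (uint64_t)dimm_order * 4;

	if (chan_rank % 2)
		offset += 0x4000;
	return offset;
}
```

For example, Haswell-X rank 4 resolves to device 23, function 0, and Ice Lake rank 7 resolves to device 29 at offset 0x264e0, matching the tables in the comments.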
> +
> +static int dimmtemp_read_string(struct device *dev,
> + enum hwmon_sensor_types type,
> + u32 attr, int channel, const char **str)
> +{
> + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
> +
> + if (attr != hwmon_temp_label)
> + return -EOPNOTSUPP;
> +
> + *str = (const char *)priv->dimmtemp_label[channel];
> +
> + return 0;
> +}
> +
> +static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
> + u32 attr, int channel, long *val)
> +{
> + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
> + int ret;
> +
> + ret = get_dimm_temp(priv, channel);
> + if (ret)
> + return ret;
> +
> + switch (attr) {
> + case hwmon_temp_input:
> + *val = priv->temp[channel].value;
> + break;
> + case hwmon_temp_max:
> + *val = priv->temp_max[channel];
> + break;
> + case hwmon_temp_crit:
> + *val = priv->temp_crit[channel];
> + break;
> + default:
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> +static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types type,
> + u32 attr, int channel)
> +{
> + const struct peci_dimmtemp *priv = data;
> +
> + if (test_bit(channel, priv->dimm_mask))
> + return 0444;
> +
> + return 0;
> +}
> +
> +static const struct hwmon_ops peci_dimmtemp_ops = {
> + .is_visible = dimmtemp_is_visible,
> + .read_string = dimmtemp_read_string,
> + .read = dimmtemp_read,
> +};
> +
> +static int check_populated_dimms(struct peci_dimmtemp *priv)
> +{
> + int chan_rank_max = priv->gen_info->chan_rank_max;
> + int dimm_idx_max = priv->gen_info->dimm_idx_max;
> + int chan_rank, dimm_idx, ret;
> + u64 dimm_mask = 0;
> + u32 pcs;
> +
> + for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &pcs);
> + if (ret) {
> + /*
> + * Overall, we expect either success or -EINVAL in
> + * order to determine whether DIMM is populated or not.
> + * For anything else - we fall back to deferring the
> + * detection to be performed at a later point in time.
> + */
> + if (ret == -EINVAL)
> + continue;
> + else
else after continue is unnecessary.
> + return -EAGAIN;
> + }
> +
> + for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
> + if (__dimm_temp(pcs, dimm_idx))
> + dimm_mask |= BIT(chan_rank * dimm_idx_max + dimm_idx);
> + }
> + /*
> + * It's possible that memory training is not done yet. In this case we
> + * defer the detection to be performed at a later point in time.
> + */
> + if (!dimm_mask)
> + return -EAGAIN;
> +
> + dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
> +
> + bitmap_from_u64(priv->dimm_mask, dimm_mask);
> +
> + return 0;
> +}
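To make the else-after-continue remark concrete, here is a userspace sketch of the same scan with the error handling flattened. It is only an illustration under stated assumptions: struct pcs_result is a hypothetical stand-in for the outcome of each peci_pcs_read() call, and scan_populated_dimms() is not a function from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the result of one peci_pcs_read() call. */
struct pcs_result {
	int ret;      /* 0 on success, negative errno otherwise */
	uint32_t pcs; /* one temperature byte per DIMM slot */
};

static inline int scan_populated_dimms(const struct pcs_result *res,
				       int chan_rank_max, int dimm_idx_max,
				       uint64_t *mask_out)
{
	uint64_t mask = 0;
	int chan_rank, dimm_idx;

	for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
		/* -EINVAL means "nothing populated on this rank": skip it. */
		if (res[chan_rank].ret == -EINVAL)
			continue;
		/* Any other failure defers detection; no else needed. */
		if (res[chan_rank].ret)
			return -EAGAIN;

		for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
			if ((res[chan_rank].pcs >> (dimm_idx * 8)) & 0xff)
				mask |= (uint64_t)1 << (chan_rank * dimm_idx_max + dimm_idx);
	}

	/* An empty mask is treated as "memory training not finished yet". */
	if (!mask)
		return -EAGAIN;

	*mask_out = mask;
	return 0;
}
```

The behaviour is intended to match the quoted function; only the control flow is flattened.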
> +
> +static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
> +{
> + int rank = chan / priv->gen_info->dimm_idx_max;
> + int idx = chan % priv->gen_info->dimm_idx_max;
> +
> + priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
> + "DIMM %c%d", 'A' + rank,
> + idx + 1);
> + if (!priv->dimmtemp_label[chan])
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static const u32 peci_dimmtemp_temp_channel_config[] = {
> + [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT,
> + 0
> +};
> +
> +static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
> + .type = hwmon_temp,
> + .config = peci_dimmtemp_temp_channel_config,
> +};
> +
> +static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
> + &peci_dimmtemp_temp_channel,
> + NULL
> +};
> +
> +static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
> + .ops = &peci_dimmtemp_ops,
> + .info = peci_dimmtemp_temp_info,
> +};
> +
> +static int create_dimm_temp_info(struct peci_dimmtemp *priv)
> +{
> + int ret, i, channels;
> + struct device *dev;
> +
> + ret = check_populated_dimms(priv);
> + if (ret == -EAGAIN) {
The only error returned by check_populated_dimms() is -EAGAIN. Checking for
specifically this error here suggests that there may be other (ignored)
errors. The reader has to examine check_populated_dimms() to find out
that -EAGAIN is indeed the only possible error. To avoid confusion, please
only check for ret here.
> + if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
> + schedule_delayed_work(&priv->detect_work,
> + DIMM_MASK_CHECK_DELAY_JIFFIES);
> + priv->retry_count++;
> + dev_dbg(priv->dev, "Deferred populating DIMM temp info\n");
> + return ret;
> + }
> +
> + dev_info(priv->dev, "Timeout populating DIMM temp info\n");
If this returns an error, the message needs to be dev_err().
> + return -ETIMEDOUT;
> + }
> +
> + channels = priv->gen_info->chan_rank_max * priv->gen_info->dimm_idx_max;
> +
> + priv->dimmtemp_label = devm_kzalloc(priv->dev, channels * sizeof(char *), GFP_KERNEL);
> + if (!priv->dimmtemp_label)
> + return -ENOMEM;
> +
> + for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
> + ret = create_dimm_temp_label(priv, i);
> + if (ret)
> + return ret;
> + }
> +
> + dev = devm_hwmon_device_register_with_info(priv->dev, priv->name, priv,
> + &peci_dimmtemp_chip_info, NULL);
> + if (IS_ERR(dev)) {
> + dev_err(priv->dev, "Failed to register hwmon device\n");
> + return PTR_ERR(dev);
> + }
> +
> + dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
> +
> + return 0;
> +}
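Putting the two remarks on this function together (check plain ret, and treat the timeout as an error), the retry bookkeeping could look roughly like the userspace sketch below. struct detect_state and handle_detect_result() are hypothetical names, the scheduled counter merely stands in for schedule_delayed_work(), and RETRY_MAX mirrors DIMM_MASK_CHECK_RETRY_MAX.

```c
#include <errno.h>

#define RETRY_MAX 60 /* mirrors DIMM_MASK_CHECK_RETRY_MAX in the patch */

/* Hypothetical state; 'scheduled' counts would-be delayed work items. */
struct detect_state {
	int retry_count;
	int scheduled;
};

static inline int handle_detect_result(struct detect_state *st, int ret)
{
	if (!ret)
		return 0;

	/* Any failure is retried until the retry budget is used up. */
	if (st->retry_count < RETRY_MAX) {
		st->retry_count++;
		st->scheduled++;
		return ret;
	}

	/* Budget exhausted: this is the path that would log with dev_err(). */
	return -ETIMEDOUT;
}
```

With a 5 second delay between attempts, the 60 retries give the 5 minute window noted next to the macro definition.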
> +
> +static void create_dimm_temp_info_delayed(struct work_struct *work)
> +{
> + struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
> + struct peci_dimmtemp,
> + detect_work);
> + int ret;
> +
> + ret = create_dimm_temp_info(priv);
> + if (ret && ret != -EAGAIN)
> + dev_dbg(priv->dev, "Failed to populate DIMM temp info\n");
> +}
> +
> +static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
> +{
> + struct device *dev = &adev->dev;
> + struct peci_device *peci_dev = to_peci_device(dev->parent);
> + struct peci_dimmtemp *priv;
> + int ret;
> +
> + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> + if (!priv)
> + return -ENOMEM;
> +
> + priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
> + peci_dev->info.socket_id);
> + if (!priv->name)
> + return -ENOMEM;
> +
> + dev_set_drvdata(dev, priv);
> + priv->dev = dev;
> + priv->peci_dev = peci_dev;
> + priv->gen_info = (const struct dimm_info *)id->driver_data;
> +
> + INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
> +
> + ret = create_dimm_temp_info(priv);
> + if (ret && ret != -EAGAIN) {
> + dev_dbg(dev, "Failed to populate DIMM temp info\n");
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static void peci_dimmtemp_remove(struct auxiliary_device *adev)
> +{
> + struct peci_dimmtemp *priv = dev_get_drvdata(&adev->dev);
> +
> + cancel_delayed_work_sync(&priv->detect_work);
> +}
> +
> +static const struct dimm_info dimm_hsx = {
> + .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
> + .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
> + .min_peci_revision = 0x30,
> +};
> +
> +static const struct dimm_info dimm_bdx = {
> + .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
> + .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
> + .min_peci_revision = 0x30,
> +};
> +
> +static const struct dimm_info dimm_bdxd = {
> + .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
> + .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
> + .min_peci_revision = 0x30,
> +};
> +
> +static const struct dimm_info dimm_skx = {
> + .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
> + .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
> + .min_peci_revision = 0x30,
> +};
> +
> +static const struct dimm_info dimm_icx = {
> + .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
> + .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
> + .min_peci_revision = 0x40,
> +};
> +
> +static const struct dimm_info dimm_icxd = {
> + .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
> + .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
> + .min_peci_revision = 0x40,
> +};
> +
> +static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
> + {
> + .name = "peci_cpu.dimmtemp.hsx",
> + .driver_data = (kernel_ulong_t)&dimm_hsx,
> + },
> + {
> + .name = "peci_cpu.dimmtemp.bdx",
> + .driver_data = (kernel_ulong_t)&dimm_bdx,
> + },
> + {
> + .name = "peci_cpu.dimmtemp.bdxd",
> + .driver_data = (kernel_ulong_t)&dimm_bdxd,
> + },
> + {
> + .name = "peci_cpu.dimmtemp.skx",
> + .driver_data = (kernel_ulong_t)&dimm_skx,
> + },
> + {
> + .name = "peci_cpu.dimmtemp.icx",
> + .driver_data = (kernel_ulong_t)&dimm_icx,
> + },
> + {
> + .name = "peci_cpu.dimmtemp.icxd",
> + .driver_data = (kernel_ulong_t)&dimm_icxd,
> + },
> + { }
> +};
> +MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
> +
> +static struct auxiliary_driver peci_dimmtemp_driver = {
> + .probe = peci_dimmtemp_probe,
> + .remove = peci_dimmtemp_remove,
> + .id_table = peci_dimmtemp_ids,
> +};
> +
> +module_auxiliary_driver(peci_dimmtemp_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> +MODULE_DESCRIPTION("PECI dimmtemp driver");
> +MODULE_LICENSE("GPL");
> +MODULE_IMPORT_NS(PECI_CPU);
On Thu, Jul 15, 2021 at 9:47 AM Winiarska, Iwona
<[email protected]> wrote:
>
> On Wed, 2021-07-14 at 16:54 +0000, Williams, Dan J wrote:
> > On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > > Baseboard management controllers (BMC) often run Linux but are
> > > usually
> > > implemented with non-X86 processors. They can use PECI to access
> > > package
> > > config space (PCS) registers on the host CPU and since some
> > > information,
> > > e.g. figuring out the core count, can be obtained using different
> > > registers on different CPU generations, they need to decode the
> > > family
> > > and model.
> > >
> > > Move the data from arch/x86/include/asm/intel-family.h into a new
> > > file
> > > include/linux/x86/intel-family.h so that it can be used by other
> > > architectures.
> >
> > At least it would make the diffstat smaller to allow for rename
> > detection when the old file is deleted in the same patch:
> >
> > MAINTAINERS | 1 +
> > {arch/x86/include/asm => include/linux/x86}/intel-family.h | 6 +++---
> > 2 files changed, 4 insertions(+), 3 deletions(-)
> >
> > ...one thing people have done in the past is include a conversion
> > script in the changelog that produced the diff. That way if a
> > maintainer wants to be sure to catch any new usage of the header at
> > the old location they just run the script.
>
> You mean like a simple s#asm/intel-family.h#linux/x86/intel-family.h#g
> ?
> Operating on kernel tree? Or individual patches?
Whole kernel tree, something like this patch is a good example:
58c1a35cd522 btrfs: convert kmap to kmap_local_page, simple cases
In this one, Ira generated a patch from a script, and the maintainer
could re-run it if new development added more of the "old way" before
applying Ira's patch.
> Is including "old" header in new code that big of a deal?
I was proposing ripping the band-aid off and deleting the old header,
in which case it would cause compile breakage if someone added a new
instance of the old include before the conversion patch was applied.
> I guess it
> could break grepability (looking for users of the header, now that it
> can be pulled from two different places).
> It would be worse if someone decided to add new content to old header,
> but this should be easier to catch during review.
Having 2 potential places for the same definition causes a small
ongoing maintenance / review burden, so I vote moving the file rather
than leaving a place holder, but it's ultimately an x86 maintainer
call.
> Having 2 potential places for the same definition causes a small
> ongoing maintenance / review burden, so I vote moving the file rather
> than leaving a place holder, but it's ultimately an x86 maintainer
> call.
I thought the patch kept the old file as a stub with just one line:
#include <linux/x86/intel-family.h>
to grab the real data from the new location. So the information isn't
in two places.
$ git grep -l asm/intel-family.h | wc -l
53
Dang. We seem to love spraying model specific code all over the place :-(
My opinion is to post as Iwona wrote it ... but be prepared for the maintainers
to say "It's only 53 files ... just fix them all"
-Tony
On Thu, Jul 15, 2021 at 10:33 AM Winiarska, Iwona
<[email protected]> wrote:
>
> On Wed, 2021-07-14 at 16:51 +0000, Williams, Dan J wrote:
> > On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > > Note: All changes to arch/x86 are contained within patches 01-02.
> >
> > Hi Iwona,
> >
> > One meta question first, who is this submission "To:"? Is there an
> > existing upstream maintainer path for OpenBMC changes? Are you
> > expecting contributions to this subsystem from others? While Greg
> > sometimes ends up as default maintainer for new stuff, I wonder if
> > someone from the OpenBMC community should step up to fill this role?
> >
>
> The intention was to direct it to Greg, but I guess I didn't express
> that through the mail headers.
Usually something like a "Hey Greg, please consider applying..." in
the cover letter lets people know who the upstream path is for the
series.
> I am expecting contributions - for example there is at least one other
> major BMC vendor which also ships PECI controllers.
You're expecting to take patches from them and you'll forward them to
Greg, or they'll go to Greg directly?
>
> From my perspective, the pieces that make up a BMC are pretty loosely
> connected (at least from the kernel perspective - scattered all over
> the kernel tree), so I don't see how that would work in practice.
No worries, Greg continues to scale more than other mere mortals for
these kinds of things. I was more asking because it was not clear from
these patches, nor MAINTAINERS, and it's healthy for Linux to grow new
patch wranglers from time to time.
On Mon, 2021-07-12 at 22:02 -0700, Randy Dunlap wrote:
> On 7/12/21 3:04 PM, Iwona Winiarska wrote:
> > diff --git a/drivers/peci/controller/Kconfig
> > b/drivers/peci/controller/Kconfig
> > new file mode 100644
> > index 000000000000..8ddbe494677f
> > --- /dev/null
> > +++ b/drivers/peci/controller/Kconfig
> > @@ -0,0 +1,12 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config PECI_ASPEED
> > + tristate "ASPEED PECI support"
> > + depends on ARCH_ASPEED || COMPILE_TEST
> > + depends on OF
> > + depends on HAS_IOMEM
> > + help
> > + Enable this driver if you want to support ASPEED PECI
> > controller.
> > +
> > + This driver can be also build as a module. If so, the
> > module
>
> can also be built as a module.
Thank you Randy - I'll fix this in v2.
-Iwona
>
> > + will be called peci-aspeed.
>
>
> --
> ~Randy
>
On Tue, Jul 13, 2021 at 12:04:37AM +0200, Iwona Winiarska wrote:
> Add device tree bindings for the peci-aspeed controller driver.
>
> Co-developed-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Jae Hyun Yoo <[email protected]>
> Signed-off-by: Iwona Winiarska <[email protected]>
> ---
> .../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++++++++++++++++
> 1 file changed, 111 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
>
> diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.yaml b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> new file mode 100644
> index 000000000000..6061e06009fb
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> @@ -0,0 +1,111 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/peci/peci-aspeed.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Aspeed PECI Bus Device Tree Bindings
> +
> +maintainers:
> + - Iwona Winiarska <[email protected]>
> + - Jae Hyun Yoo <[email protected]>
> +
> +allOf:
> + - $ref: peci-controller.yaml#
> +
> +properties:
> + compatible:
> + enum:
> + - aspeed,ast2400-peci
> + - aspeed,ast2500-peci
> + - aspeed,ast2600-peci
> +
> + reg:
> + maxItems: 1
> +
> + interrupts:
> + maxItems: 1
> +
> + clocks:
> + description: |
> + Clock source for PECI controller. Should reference the external
> + oscillator clock.
> + maxItems: 1
> +
> + resets:
> + maxItems: 1
> +
> + clock-divider:
> + description: This value determines PECI controller internal clock
> + dividing rate. The divider will be calculated as 2 raised to the
> + power of the given value.
> + $ref: /schemas/types.yaml#/definitions/uint32
> + minimum: 0
> + maximum: 7
> + default: 0
> +
> + msg-timing:
> + description: |
> + Message timing negotiation period. This value will determine the period
> + of message timing negotiation to be issued by PECI controller. The unit
> + of the programmed value is four times of PECI clock period.
> + $ref: /schemas/types.yaml#/definitions/uint32
> + minimum: 0
> + maximum: 255
> + default: 1
> +
> + addr-timing:
> + description: |
> + Address timing negotiation period. This value will determine the period
> + of address timing negotiation to be issued by PECI controller. The unit
> + of the programmed value is four times of PECI clock period.
> + $ref: /schemas/types.yaml#/definitions/uint32
> + minimum: 0
> + maximum: 255
> + default: 1
> +
> + rd-sampling-point:
> + description: |
> + Read sampling point selection. The whole period of a bit time will be
> + divided into 16 time frames. This value will determine the time frame
> + in which the controller will sample PECI signal for data read back.
> + Usually in the middle of a bit time is the best.
> + $ref: /schemas/types.yaml#/definitions/uint32
> + minimum: 0
> + maximum: 15
> + default: 8
> +
> + cmd-timeout-ms:
> + description: |
> + Command timeout in units of ms.
> + $ref: /schemas/types.yaml#/definitions/uint32
> + minimum: 1
> + maximum: 1000
> + default: 1000
Are all of these properties common for PECI or specific to this
controller? The former needs to go into the common schema. The latter
need vendor prefixes.
> +
> +required:
> + - compatible
> + - reg
> + - interrupts
> + - clocks
> + - resets
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + #include <dt-bindings/interrupt-controller/arm-gic.h>
> + #include <dt-bindings/clock/ast2600-clock.h>
> + peci-controller@1e78b000 {
> + compatible = "aspeed,ast2600-peci";
> + reg = <0x1e78b000 0x100>;
> + interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
> + clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
> + resets = <&syscon ASPEED_RESET_PECI>;
> + clock-divider = <0>;
> + msg-timing = <1>;
> + addr-timing = <1>;
> + rd-sampling-point = <8>;
> + cmd-timeout-ms = <1000>;
> + };
> +...
> --
> 2.31.1
>
>
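As a side note on the clock-divider description, the "2 raised to the power of the given value" rule can be written out as below. This is a sketch of the arithmetic only, not of the driver: peci_clock_divider() is a hypothetical helper, and returning 0 for out-of-range input is an assumption made for the example.

```c
#include <stdint.h>

/*
 * Translate the clock-divider property (0..7 per the schema) into the
 * actual division factor, i.e. 2^value. Returns 0 for out-of-range input.
 */
static inline uint32_t peci_clock_divider(uint32_t prop)
{
	if (prop > 7)
		return 0;
	return UINT32_C(1) << prop;
}
```

The schema's range of 0 to 7 therefore corresponds to division factors from 1 to 128.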
On Wed, 2021-07-14 at 17:19 +0000, Williams, Dan J wrote:
> On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > Intel processors provide access for various services designed to support
> > processor and DRAM thermal management, platform manageability and
> > processor interface tuning and diagnostics.
> > Those services are available via the Platform Environment Control
> > Interface (PECI) that provides a communication channel between the
> > processor and the Baseboard Management Controller (BMC) or other
> > platform management device.
> >
> > This change introduces PECI subsystem by adding the initial core module
> > and API for controller drivers.
> >
> > Co-developed-by: Jason M Bills <[email protected]>
> > Signed-off-by: Jason M Bills <[email protected]>
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > MAINTAINERS | 9 +++
> > drivers/Kconfig | 3 +
> > drivers/Makefile | 1 +
> > drivers/peci/Kconfig | 14 ++++
> > drivers/peci/Makefile | 5 ++
> > drivers/peci/core.c | 166 ++++++++++++++++++++++++++++++++++++++++
> > drivers/peci/internal.h | 20 +++++
> > drivers/peci/sysfs.c | 48 ++++++++++++
> > include/linux/peci.h | 82 ++++++++++++++++++++
> > 9 files changed, 348 insertions(+)
> > create mode 100644 drivers/peci/Kconfig
> > create mode 100644 drivers/peci/Makefile
> > create mode 100644 drivers/peci/core.c
> > create mode 100644 drivers/peci/internal.h
> > create mode 100644 drivers/peci/sysfs.c
> > create mode 100644 include/linux/peci.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6f77aaca2a30..47411e2b6336 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14495,6 +14495,15 @@ L: [email protected]
> > S: Maintained
> > F: drivers/platform/x86/peaq-wmi.c
> >
> > +PECI SUBSYSTEM
> > +M: Iwona Winiarska <[email protected]>
> > +R: Jae Hyun Yoo <[email protected]>
> > +L: [email protected] (moderated for non-subscribers)
> > +S: Supported
> > +F: Documentation/devicetree/bindings/peci/
> > +F: drivers/peci/
> > +F: include/linux/peci.h
> > +
> > PENSANDO ETHERNET DRIVERS
> > M: Shannon Nelson <[email protected]>
> > M: [email protected]
> > diff --git a/drivers/Kconfig b/drivers/Kconfig
> > index 8bad63417a50..f472b3d972b3 100644
> > --- a/drivers/Kconfig
> > +++ b/drivers/Kconfig
> > @@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"
> > source "drivers/counter/Kconfig"
> >
> > source "drivers/most/Kconfig"
> > +
> > +source "drivers/peci/Kconfig"
> > +
> > endmenu
> > diff --git a/drivers/Makefile b/drivers/Makefile
> > index 27c018bdf4de..8d96f0c3dde5 100644
> > --- a/drivers/Makefile
> > +++ b/drivers/Makefile
> > @@ -189,3 +189,4 @@ obj-$(CONFIG_GNSS) += gnss/
> > obj-$(CONFIG_INTERCONNECT) += interconnect/
> > obj-$(CONFIG_COUNTER) += counter/
> > obj-$(CONFIG_MOST) += most/
> > +obj-$(CONFIG_PECI) += peci/
> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > new file mode 100644
> > index 000000000000..601cc3c3c852
> > --- /dev/null
> > +++ b/drivers/peci/Kconfig
> > @@ -0,0 +1,14 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +menuconfig PECI
> > + tristate "PECI support"
> > + help
> > + The Platform Environment Control Interface (PECI) is an interface
> > + that provides a communication channel to Intel processors and
> > + chipset components from external monitoring or control devices.
> > +
> > + If you want PECI support, you should say Y here and also to the
> > + specific driver for your bus adapter(s) below.
>
> The user is reading this help text to decide if they want PECI
> support, so clarifying that if they want PECI support they should turn
> it on is not all that helpful. I would say "If you are building a
> kernel for a Board Management Controller (BMC) say Y. If unsure say
> N".
Since PECI is only available on Intel platforms, perhaps something
like:
"If you are building a Board Management Controller (BMC) kernel for
an Intel platform, say Y"?
>
> > +
> > + This support is also available as a module. If so, the module
> > + will be called peci.
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > new file mode 100644
> > index 000000000000..2bb2f51bcda7
> > --- /dev/null
> > +++ b/drivers/peci/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +# Core functionality
> > +peci-y := core.o sysfs.o
> > +obj-$(CONFIG_PECI) += peci.o
> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> > new file mode 100644
> > index 000000000000..0ad00110459d
> > --- /dev/null
> > +++ b/drivers/peci/core.c
> > @@ -0,0 +1,166 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/bug.h>
> > +#include <linux/device.h>
> > +#include <linux/export.h>
> > +#include <linux/idr.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
> > +#include <linux/peci.h>
> > +#include <linux/pm_runtime.h>
> > +#include <linux/property.h>
> > +#include <linux/slab.h>
> > +
> > +#include "internal.h"
> > +
> > +static DEFINE_IDA(peci_controller_ida);
> > +
> > +static void peci_controller_dev_release(struct device *dev)
> > +{
> > + struct peci_controller *controller = to_peci_controller(dev);
> > +
> > + mutex_destroy(&controller->bus_lock);
> > +}
> > +
> > +struct device_type peci_controller_type = {
> > + .release = peci_controller_dev_release,
> > +};
>
> I have not read further than patch 6 in this set, so I'm hoping there
> is an explanation for this. As it stands it looks like a red flag that
> the release function is not actually releasing anything?
>
Ok, that's related to other comments here and in patch 7. I'll try to
refactor this. I'm thinking about splitting "controller_add" into
separate "alloc" and "add" (or init? register?) steps, and perhaps
integrating that with devm, so that the controller can be allocated using
devres, tying it to the lifetime of the underlying platform device.
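
As a rough userspace illustration of the ownership rule being asked for
here (this is not kernel code and not the v2 implementation; every toy_*
name, the boolean table standing in for the IDA, and the 256-entry bound
are assumptions made for the sketch): whatever alloc takes is given back
in exactly one place, the release callback, which runs when the last
reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_IDS 256			/* mirrors the U8_MAX bound on controller IDs */

static bool id_in_use[TOY_MAX_IDS];	/* toy stand-in for the kernel IDA */

struct toy_controller {
	int refcount;
	int id;
	void (*release)(struct toy_controller *c);
};

static int toy_id_alloc(void)
{
	for (int i = 0; i < TOY_MAX_IDS; i++) {
		if (!id_in_use[i]) {
			id_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Everything taken in alloc is given back here, and only here. */
static void toy_controller_release(struct toy_controller *c)
{
	id_in_use[c->id] = false;
	free(c);
}

static struct toy_controller *toy_controller_alloc(void)
{
	struct toy_controller *c = calloc(1, sizeof(*c));
	int id;

	if (!c)
		return NULL;

	id = toy_id_alloc();
	if (id < 0) {
		free(c);	/* not fully constructed yet, so no release() */
		return NULL;
	}

	c->id = id;
	c->refcount = 1;
	c->release = toy_controller_release;
	return c;
}

static void toy_controller_get(struct toy_controller *c)
{
	assert(c->refcount > 0);
	c->refcount++;
}

static void toy_controller_put(struct toy_controller *c)
{
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		c->release(c);
}
```

With this shape, error paths and the remove path only ever call put;
nothing open-codes the cleanup a second time.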
> > +
> > +int peci_controller_scan_devices(struct peci_controller *controller)
> > +{
> > + /* Just a stub, no support for actual devices yet */
> > + return 0;
> > +}
>
> Move this to the patch where it is needed.
It's used in this patch (in sysfs and controller add), but at this
point we haven't introduced devices yet.
I would have to move this to patch 8 - but I don't think it belongs
there.
Will it make more sense if I introduce sysfs documentation here?
Or as a completely separate patch?
I wanted to avoid going too far with split granularity, and just go
with high-level concepts starting with the controller.
>
> > +
> > +/**
> > + * peci_controller_add() - Add PECI controller
> > + * @controller: the PECI controller to be added
> > + * @parent: device object to be registered as a parent
> > + *
> > + * In final stage of its probe(), peci_controller driver should include calling
>
> s/should include calling/calls/
>
Ok.
> > + * peci_controller_add() to register itself with the PECI bus.
> > + * The caller is responsible for allocating the struct peci_controller and
> > + * managing its lifetime, calling peci_controller_remove() prior to releasing
> > + * the allocation.
> > + *
> > + * It returns zero on success, else a negative error code (dropping the
> > + * controller's refcount). After a successful return, the caller is responsible
> > + * for calling peci_controller_remove().
> > + *
> > + * Return: 0 if succeeded, other values in case errors.
> > + */
> > +int peci_controller_add(struct peci_controller *controller, struct device *parent)
> > +{
> > + struct fwnode_handle *node = fwnode_handle_get(dev_fwnode(parent));
> > + int ret;
> > +
> > + if (WARN_ON(!controller->xfer))
>
> Why WARN()? What is 'xfer', and what is likelihood the caller forgets
> to set it? For something critical like this the WARN is likely
> overkill.
>
Very unlikely - 'xfer' provides "connection" with hardware so it's
rather mandatory.
It indicates programmer error, so WARN() with all its consequences
(taint and so on) seemed adequate.
Are you suggesting downgrading it to pr_err()?
> > + return -EINVAL;
> > +
> > + ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);
>
> An '_add' function should just add, this seems to be doing more
> "alloc". Speaking of which is there a peci_controller_alloc()?
>
Please see my previous comment (I'll try to refactor this).
>
> > + if (ret < 0)
> > + return ret;
> > +
> > + controller->id = ret;
> > +
> > + mutex_init(&controller->bus_lock);
> > +
> > + controller->dev.parent = parent;
> > + controller->dev.bus = &peci_bus_type;
> > + controller->dev.type = &peci_controller_type;
> > + controller->dev.fwnode = node;
> > + controller->dev.of_node = to_of_node(node);
> > +
> > + ret = dev_set_name(&controller->dev, "peci-%d", controller->id);
> > + if (ret)
> > + goto err_id;
> > +
> > + ret = device_register(&controller->dev);
> > + if (ret)
> > + goto err_put;
> > +
> > + pm_runtime_no_callbacks(&controller->dev);
> > + pm_suspend_ignore_children(&controller->dev, true);
> > + pm_runtime_enable(&controller->dev);
> > +
> > + /*
> > + * Ignoring retval since failures during scan are non-critical for
> > + * controller itself.
> > + */
> > + peci_controller_scan_devices(controller);
> > +
> > + return 0;
> > +
> > +err_put:
> > + put_device(&controller->dev);
> > +err_id:
> > + fwnode_handle_put(controller->dev.fwnode);
> > + ida_free(&peci_controller_ida, controller->id);
>
> I'd expect these to be released by ->release().
>
Ack.
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
>
> I think it's cleaner to declare symbol namespaces in the Makefile. In
> this case, add:
>
> ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=PECI
>
> ...and just use EXPORT_SYMBOL_GPL as normal in the C file.
>
I kind of prefer the more verbose EXPORT_SYMBOL_NS_GPL - it also
doesn't "hide" the fact that we're using namespaces (everything is in
the C file rather than mixed into Makefile), but it's not a strong
opinion, so sure - I can change this.
> > +
> > +static int _unregister(struct device *dev, void *dummy)
> > +{
> > + /* Just a stub, no support for actual devices yet */
>
> At least for me, I think it wastes review time to consider empty
> stubs. Just add the whole thing back when it's actually used so it can
> be reviewed properly for suitability.
Just like with peci_controller_scan_devices - logically it belongs to
the controller, and is used by the controller, it's just that the
devices will be added later in the series.
>
> > + return 0;
> > +}
> > +
> > +/**
> > + * peci_controller_remove - Delete PECI controller
> > + * @controller: the PECI controller to be removed
> > + *
> > + * This call is used only by PECI controller drivers, which are the only ones
> > + * directly touching chip registers.
> > + *
> > + * Note that this function also drops a reference to the controller.
> > + */
> > +void peci_controller_remove(struct peci_controller *controller)
> > +{
> > + pm_runtime_disable(&controller->dev);
> > + /*
> > + * Detach any active PECI devices. This can't fail, thus we do not
> > + * check the returned value.
> > + */
> > + device_for_each_child_reverse(&controller->dev, NULL, _unregister);
>
> How does the peci_controller_remove() get called with children still
> beneath it? Can that possibility be precluded by arranging for
> children to be removed first?
When we're unbinding the controller driver from its backing device (or
just removing the module) with child devices still present in the
system.
Yes, it could be precluded, but I don't think we should prevent this
(forcing the user to manually remove all the child devices first).
>
> For example, given peci_controller_add is called from another driver's
> probe routine, this unregistration could be handled by a devm action.
>
Ok, I think this should just fall into place naturally after alloc/init
gets split.
>
> > +
> > + device_unregister(&controller->dev);
> > + fwnode_handle_put(controller->dev.fwnode);
> > + ida_free(&peci_controller_ida, controller->id);
>
> Another open coded copy of release code that belongs in ->release()?
>
Ack.
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
> > +
> > +struct bus_type peci_bus_type = {
> > + .name = "peci",
> > + .bus_groups = peci_bus_groups,
> > +};
> > +
> > +static int __init peci_init(void)
> > +{
> > + int ret;
> > +
> > + ret = bus_register(&peci_bus_type);
> > + if (ret < 0) {
> > + pr_err("failed to register PECI bus type!\n");
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +subsys_initcall(peci_init);
>
> You can't have subsys_initcall in a module. If you actually need
> subsys_initcall then this can't be a module. Are you sure this can't
> be module_init()?
>
Sure, I'll fix this in v2.
> > +
> > +static void __exit peci_exit(void)
> > +{
> > + bus_unregister(&peci_bus_type);
> > +}
> > +module_exit(peci_exit);
> > +
> > +MODULE_AUTHOR("Jason M Bills <[email protected]>");
> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> > +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
>
> Is MAINTAINERS sufficient? Do you all want to be contacted by end
> users, or just kernel developers. If it's the former then keep this,
> if it's the latter then MAINTAINERS is sufficient.
>
It's the former.
> > +MODULE_DESCRIPTION("PECI bus core module");
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> > new file mode 100644
> > index 000000000000..80c61bcdfc6b
> > --- /dev/null
> > +++ b/drivers/peci/internal.h
> > @@ -0,0 +1,20 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2018-2021 Intel Corporation */
> > +
> > +#ifndef __PECI_INTERNAL_H
> > +#define __PECI_INTERNAL_H
> > +
> > +#include <linux/device.h>
> > +#include <linux/types.h>
> > +
> > +struct peci_controller;
> > +struct attribute_group;
> > +
> > +extern struct bus_type peci_bus_type;
> > +extern const struct attribute_group *peci_bus_groups[];
> > +
> > +extern struct device_type peci_controller_type;
> > +
> > +int peci_controller_scan_devices(struct peci_controller *controller);
> > +
> > +#endif /* __PECI_INTERNAL_H */
> > diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
> > new file mode 100644
> > index 000000000000..36c5e2a18a92
> > --- /dev/null
> > +++ b/drivers/peci/sysfs.c
> > @@ -0,0 +1,48 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2021 Intel Corporation
> > +
> > +#include <linux/peci.h>
> > +
> > +#include "internal.h"
> > +
> > +static int rescan_controller(struct device *dev, void *data)
> > +{
> > + if (dev->type != &peci_controller_type)
> > + return 0;
> > +
> > + return peci_controller_scan_devices(to_peci_controller(dev));
> > +}
> > +
> > +static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count)
> > +{
> > + bool res;
> > + int ret;
> > +
> > + ret = kstrtobool(buf, &res);
> > + if (ret)
> > + return ret;
> > +
> > + if (!res)
> > + return count;
> > +
> > + ret = bus_for_each_dev(&peci_bus_type, NULL, NULL, rescan_controller);
> > + if (ret)
> > + return ret;
> > +
> > + return count;
> > +}
> > +static BUS_ATTR_WO(rescan);
>
> No Documentation/ABI entry for this attribute, which means I'm not
> sure if it's suitable because it's unreviewable what it actually does
> reviewing this patch as a standalone.
>
We're expecting to use "rescan" in a similar way to how it is used for
PCIe or USB.
The BMC can boot up while the system is still in S5 (without any guarantee
that it will ever leave this state - the user may never turn the
platform on :) ). If the controller is loaded and the platform allows
it to discover devices - great (the scan happens as the last step of
controller_add), if not - userspace can use rescan.
I'll add documentation in v2.
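
For illustration, triggering a rescan from userspace could look roughly
like the sketch below. The path is an assumption derived from the bus
name "peci" and the write-only "rescan" bus attribute in this patch, and
peci_trigger_rescan() is a hypothetical helper, not an existing API. Any
string that kstrtobool() parses as true would do; "1" is used here.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed location: bus attributes show up under /sys/bus/<bus name>/. */
#define PECI_RESCAN_PATH "/sys/bus/peci/rescan"

/* Write "1" to the given attribute file; returns 0 or a negative errno. */
static int peci_trigger_rescan(const char *path)
{
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);
	if (n < 0)
		err = -errno;	/* e.g. an error returned by the store callback */
	else if (n != 1)
		err = -EIO;

	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

A BMC daemon could call peci_trigger_rescan(PECI_RESCAN_PATH) once it
learns that the host has left S5.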
> > +
> > +static struct attribute *peci_bus_attrs[] = {
> > + &bus_attr_rescan.attr,
> > + NULL
> > +};
> > +
> > +static const struct attribute_group peci_bus_group = {
> > + .attrs = peci_bus_attrs,
> > +};
> > +
> > +const struct attribute_group *peci_bus_groups[] = {
> > + &peci_bus_group,
> > + NULL
> > +};
> > diff --git a/include/linux/peci.h b/include/linux/peci.h
> > new file mode 100644
> > index 000000000000..cdf3008321fd
> > --- /dev/null
> > +++ b/include/linux/peci.h
> > @@ -0,0 +1,82 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2018-2021 Intel Corporation */
> > +
> > +#ifndef __LINUX_PECI_H
> > +#define __LINUX_PECI_H
> > +
> > +#include <linux/device.h>
> > +#include <linux/kernel.h>
> > +#include <linux/mutex.h>
> > +#include <linux/types.h>
> > +
> > +struct peci_request;
> > +
> > +/**
> > + * struct peci_controller - PECI controller
> > + * @dev: device object to register PECI controller to the device model
> > + * @xfer: PECI transfer function
> > + * @bus_lock: lock used to protect multiple callers
> > + * @id: PECI controller ID
> > + *
> > + * PECI controllers usually connect to their drivers using non-PECI bus,
> > + * such as the platform bus.
> > + * Each PECI controller can communicate with one or more PECI devices.
> > + */
> > +struct peci_controller {
> > + struct device dev;
> > + int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req);
>
> Each device will have a different way to do a PECI transfer?
>
> I thought PECI was a standard...
>
The "standard" part only applies to the connection between the
controller and the devices - not the connection between the controller
and the rest of the system in which the controller resides.
xfer is vendor specific.
> > + struct mutex bus_lock; /* held for the duration of xfer */
>
> What is it actually locking? For example, there is a mantra that goes
> "lock data, not code", and this comment seems to imply that no
> specific data is being locked.
>
PECI-wire interface requires that the response follows the request -
and that should hold for all devices behind a given controller.
In other words, assuming that we have two devices, d1 and d2, we need
to have: d1.req, d1.resp, d2.req, d2.resp. Single xfer takes care of
both request and response.
I would like to eventually move that lock into individual controllers,
but before that happens - I'd like to have a concrete reason for it.
If we get interfaces that allow us to decouple requests from responses,
or devices that can service more than one request at a time,
the lock will go away from peci-core.
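
The ordering requirement can be modelled in userspace roughly as follows.
This is only a sketch: the toy_* names, the pthread mutex standing in for
bus_lock, and the in-memory "wire log" are all assumptions made for the
example. The point it shows is that holding the lock across the whole
transfer keeps each response directly behind its own request, whichever
caller gets the bus first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define TOY_LOG_MAX 16

struct toy_bus {
	pthread_mutex_t lock;		/* held for the whole request/response pair */
	char log[TOY_LOG_MAX][16];	/* wire events in the order they happened */
	int nlog;
};

static void toy_wire_event(struct toy_bus *bus, int addr, const char *what)
{
	assert(bus->nlog < TOY_LOG_MAX);
	snprintf(bus->log[bus->nlog++], sizeof(bus->log[0]), "d%d.%s", addr, what);
}

/* One transfer covers both the request and the response. */
static void toy_xfer(struct toy_bus *bus, int addr)
{
	pthread_mutex_lock(&bus->lock);
	toy_wire_event(bus, addr, "req");
	toy_wire_event(bus, addr, "resp");
	pthread_mutex_unlock(&bus->lock);
}

struct toy_job {
	struct toy_bus *bus;
	int addr;
};

/* Thread body suitable for pthread_create(): four transfers to one device. */
static void *toy_worker(void *arg)
{
	struct toy_job *job = arg;

	for (int i = 0; i < 4; i++)
		toy_xfer(job->bus, job->addr);
	return NULL;
}

/*
 * Returns 1 if every request in the log is immediately followed by the
 * response from the same device, 0 otherwise.
 */
static int toy_log_is_paired(const struct toy_bus *bus)
{
	if (bus->nlog % 2)
		return 0;

	for (int i = 0; i < bus->nlog; i += 2) {
		/* "dN.req" and "dN.resp" share the prefix up to and including '.' */
		size_t prefix = strcspn(bus->log[i], ".") + 1;

		if (strcmp(bus->log[i] + prefix, "req") ||
		    strncmp(bus->log[i], bus->log[i + 1], prefix) ||
		    strcmp(bus->log[i + 1] + prefix, "resp"))
			return 0;
	}
	return 1;
}
```

Running toy_worker() for d1 and d2 from two threads may interleave the
transfers in any order, but toy_log_is_paired() should still hold for the
resulting log.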
>
> > + u8 id;
>
> No possible way to have more than 256 controllers per system?
>
For real world scenarios - I expect single digit number of controllers
per system. The boards with HW compatible with "aspeed,ast2xxx-peci"
contain just one instance of this controller.
I expect more in the future (e.g. different "physical" transport), but
definitely not more than 256 per system.
> > +};
> > +
> > +int peci_controller_add(struct peci_controller *controller, struct device *parent);
> > +void peci_controller_remove(struct peci_controller *controller);
> > +
> > +static inline struct peci_controller *to_peci_controller(void *d)
> > +{
> > + return container_of(d, struct peci_controller, dev);
> > +}
> > +
> > +/**
> > + * struct peci_device - PECI device
> > + * @dev: device object to register PECI device to the device model
> > + * @controller: manages the bus segment hosting this PECI device
> > + * @addr: address used on the PECI bus connected to the parent controller
> > + *
> > + * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
> > + * The behaviour exposed to the rest of the system is defined by the PECI driver
> > + * managing the device.
> > + */
> > +struct peci_device {
> > + struct device dev;
> > + struct peci_controller *controller;
>
> Is the device a child of the controller? If yes, then no need for a
> separate pointer vs "to_peci_controller(peci_dev->parent)"
>
Yeah, it's redundant - I'll remove it.
Thank you
-Iwona
On Wed, 2021-07-14 at 17:39 +0000, Williams, Dan J wrote:
> On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > From: Jae Hyun Yoo <[email protected]>
> >
> > ASPEED AST24xx/AST25xx/AST26xx SoCs support the PECI electrical
> > interface (a.k.a. PECI wire).
> >
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Co-developed-by: Iwona Winiarska <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > MAINTAINERS | 9 +
> > drivers/peci/Kconfig | 6 +
> > drivers/peci/Makefile | 3 +
> > drivers/peci/controller/Kconfig | 12 +
> > drivers/peci/controller/Makefile | 3 +
> > drivers/peci/controller/peci-aspeed.c | 501 ++++++++++++++++++++++++++
> > 6 files changed, 534 insertions(+)
> > create mode 100644 drivers/peci/controller/Kconfig
> > create mode 100644 drivers/peci/controller/Makefile
> > create mode 100644 drivers/peci/controller/peci-aspeed.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 47411e2b6336..4ba874afa2fa 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2865,6 +2865,15 @@ S: Maintained
> > F: Documentation/hwmon/asc7621.rst
> > F: drivers/hwmon/asc7621.c
> >
> > +ASPEED PECI CONTROLLER
> > +M: Iwona Winiarska <[email protected]>
> > +M: Jae Hyun Yoo <[email protected]>
> > +L: [email protected] (moderated for non-subscribers)
> > +L: [email protected] (moderated for non-subscribers)
> > +S: Supported
> > +F: Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> > +F: drivers/peci/controller/peci-aspeed.c
> > +
> > ASPEED PINCTRL DRIVERS
> > M: Andrew Jeffery <[email protected]>
> > L: [email protected] (moderated for non-subscribers)
> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > index 601cc3c3c852..0d0ee8009713 100644
> > --- a/drivers/peci/Kconfig
> > +++ b/drivers/peci/Kconfig
> > @@ -12,3 +12,9 @@ menuconfig PECI
> >
> > This support is also available as a module. If so, the module
> > will be called peci.
> > +
> > +if PECI
> > +
> > +source "drivers/peci/controller/Kconfig"
> > +
> > +endif # PECI
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > index 2bb2f51bcda7..621a993e306a 100644
> > --- a/drivers/peci/Makefile
> > +++ b/drivers/peci/Makefile
> > @@ -3,3 +3,6 @@
> > # Core functionality
> > peci-y := core.o sysfs.o
> > obj-$(CONFIG_PECI) += peci.o
> > +
> > +# Hardware specific bus drivers
> > +obj-y += controller/
> > diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
> > new file mode 100644
> > index 000000000000..8ddbe494677f
> > --- /dev/null
> > +++ b/drivers/peci/controller/Kconfig
> > @@ -0,0 +1,12 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config PECI_ASPEED
> > + tristate "ASPEED PECI support"
> > + depends on ARCH_ASPEED || COMPILE_TEST
> > + depends on OF
> > + depends on HAS_IOMEM
> > + help
> > + Enable this driver if you want to support ASPEED PECI controller.
>
> Perhaps a note about how one might make this determination, or maybe a
> general recommendation that if they are building for deployment on an
> OpenBMC system say Y else say N?
Ack.
>
> > +
> > + This driver can also be built as a module. If so, the module
> > + will be called peci-aspeed.
> > diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile
> > new file mode 100644
> > index 000000000000..022c28ef1bf0
> > --- /dev/null
> > +++ b/drivers/peci/controller/Makefile
> > @@ -0,0 +1,3 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o
> > diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c
> > new file mode 100644
> > index 000000000000..888b46383ea4
> > --- /dev/null
> > +++ b/drivers/peci/controller/peci-aspeed.c
> > @@ -0,0 +1,501 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (C) 2012-2017 ASPEED Technology Inc.
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/clk.h>
> > +#include <linux/delay.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/io.h>
> > +#include <linux/iopoll.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
> > +#include <linux/peci.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/reset.h>
> > +
> > +#include <asm/unaligned.h>
> > +
> > +/* ASPEED PECI Registers */
> > +/* Control Register */
> > +#define ASPEED_PECI_CTRL 0x00
> > +#define ASPEED_PECI_CTRL_SAMPLING_MASK GENMASK(19, 16)
> > +#define ASPEED_PECI_CTRL_READ_MODE_MASK GENMASK(13, 12)
> > +#define ASPEED_PECI_CTRL_READ_MODE_COUNT BIT(12)
> > +#define ASPEED_PECI_CTRL_READ_MODE_DBG BIT(13)
> > +#define ASPEED_PECI_CTRL_CLK_SOURCE_MASK BIT(11)
> > +#define ASPEED_PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8)
> > +#define ASPEED_PECI_CTRL_INVERT_OUT BIT(7)
> > +#define ASPEED_PECI_CTRL_INVERT_IN BIT(6)
> > +#define ASPEED_PECI_CTRL_BUS_CONTENT_EN BIT(5)
> > +#define ASPEED_PECI_CTRL_PECI_EN BIT(4)
> > +#define ASPEED_PECI_CTRL_PECI_CLK_EN BIT(0)
> > +
> > +/* Timing Negotiation Register */
> > +#define ASPEED_PECI_TIMING_NEGOTIATION 0x04
> > +#define ASPEED_PECI_TIMING_MESSAGE_MASK GENMASK(15, 8)
> > +#define ASPEED_PECI_TIMING_ADDRESS_MASK GENMASK(7, 0)
> > +
> > +/* Command Register */
> > +#define ASPEED_PECI_CMD 0x08
> > +#define ASPEED_PECI_CMD_PIN_MON BIT(31)
> > +#define ASPEED_PECI_CMD_STS_MASK GENMASK(27, 24)
> > +#define ASPEED_PECI_CMD_STS_ADDR_T_NEGO 0x3
> > +#define ASPEED_PECI_CMD_IDLE_MASK \
> > + (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MON)
> > +#define ASPEED_PECI_CMD_FIRE BIT(0)
> > +
> > +/* Read/Write Length Register */
> > +#define ASPEED_PECI_RW_LENGTH 0x0c
> > +#define ASPEED_PECI_AW_FCS_EN BIT(31)
> > +#define ASPEED_PECI_READ_LEN_MASK GENMASK(23, 16)
> > +#define ASPEED_PECI_WRITE_LEN_MASK GENMASK(15, 8)
> > +#define ASPEED_PECI_TAGET_ADDR_MASK GENMASK(7, 0)
> > +
> > +/* Expected FCS Data Register */
> > +#define ASPEED_PECI_EXP_FCS 0x10
> > +#define ASPEED_PECI_EXP_READ_FCS_MASK GENMASK(23, 16)
> > +#define ASPEED_PECI_EXP_AW_FCS_AUTO_MASK GENMASK(15, 8)
> > +#define ASPEED_PECI_EXP_WRITE_FCS_MASK GENMASK(7, 0)
> > +
> > +/* Captured FCS Data Register */
> > +#define ASPEED_PECI_CAP_FCS 0x14
> > +#define ASPEED_PECI_CAP_READ_FCS_MASK GENMASK(23, 16)
> > +#define ASPEED_PECI_CAP_WRITE_FCS_MASK GENMASK(7, 0)
> > +
> > +/* Interrupt Register */
> > +#define ASPEED_PECI_INT_CTRL 0x18
> > +#define ASPEED_PECI_TIMING_NEGO_SEL_MASK GENMASK(31, 30)
> > +#define ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO 0
> > +#define ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO 1
> > +#define ASPEED_PECI_MESSAGE_NEGO 2
> > +#define ASPEED_PECI_INT_MASK GENMASK(4, 0)
> > +#define ASPEED_PECI_INT_BUS_TIMEOUT BIT(4)
> > +#define ASPEED_PECI_INT_BUS_CONNECT BIT(3)
> > +#define ASPEED_PECI_INT_W_FCS_BAD BIT(2)
> > +#define ASPEED_PECI_INT_W_FCS_ABORT BIT(1)
> > +#define ASPEED_PECI_INT_CMD_DONE BIT(0)
> > +
> > +/* Interrupt Status Register */
> > +#define ASPEED_PECI_INT_STS 0x1c
> > +#define ASPEED_PECI_INT_TIMING_RESULT_MASK GENMASK(29, 16)
> > + /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
> > +
> > +/* Rx/Tx Data Buffer Registers */
> > +#define ASPEED_PECI_W_DATA0 0x20
> > +#define ASPEED_PECI_W_DATA1 0x24
> > +#define ASPEED_PECI_W_DATA2 0x28
> > +#define ASPEED_PECI_W_DATA3 0x2c
> > +#define ASPEED_PECI_R_DATA0 0x30
> > +#define ASPEED_PECI_R_DATA1 0x34
> > +#define ASPEED_PECI_R_DATA2 0x38
> > +#define ASPEED_PECI_R_DATA3 0x3c
> > +#define ASPEED_PECI_W_DATA4 0x40
> > +#define ASPEED_PECI_W_DATA5 0x44
> > +#define ASPEED_PECI_W_DATA6 0x48
> > +#define ASPEED_PECI_W_DATA7 0x4c
> > +#define ASPEED_PECI_R_DATA4 0x50
> > +#define ASPEED_PECI_R_DATA5 0x54
> > +#define ASPEED_PECI_R_DATA6 0x58
> > +#define ASPEED_PECI_R_DATA7 0x5c
> > +#define ASPEED_PECI_DATA_BUF_SIZE_MAX 32
> > +
> > +/* Timing Negotiation */
> > +#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT 8
> > +#define ASPEED_PECI_RD_SAMPLING_POINT_MAX (BIT(4) - 1)
> > +#define ASPEED_PECI_CLK_DIV_DEFAULT 0
> > +#define ASPEED_PECI_CLK_DIV_MAX (BIT(3) - 1)
> > +#define ASPEED_PECI_MSG_TIMING_DEFAULT 1
> > +#define ASPEED_PECI_MSG_TIMING_MAX (BIT(8) - 1)
> > +#define ASPEED_PECI_ADDR_TIMING_DEFAULT 1
> > +#define ASPEED_PECI_ADDR_TIMING_MAX (BIT(8) - 1)
> > +
> > +/* Timeout */
> > +#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US (50 * USEC_PER_MSEC)
> > +#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US (10 * USEC_PER_MSEC)
> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT (1000)
> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX (1000)
> > +
> > +struct aspeed_peci {
> > + struct peci_controller controller;
>
> Uh oh... this looks like a driver private data structure, and I know
> there's a 'struct device' allocated in @controller. /me goes to check
> ->probe()...
>
> > + struct device *dev;
> > + void __iomem *base;
> > + struct clk *clk;
> > + struct reset_control *rst;
> > + int irq;
> > + spinlock_t lock; /* to sync completion status handling */
> > + struct completion xfer_complete;
> > + u32 status;
> > + u32 cmd_timeout_ms;
> > + u32 msg_timing;
> > + u32 addr_timing;
> > + u32 rd_sampling_point;
> > + u32 clk_div;
> > +};
> > +
> > +static inline struct aspeed_peci *to_aspeed_peci(struct peci_controller *a)
> > +{
> > + return container_of(a, struct aspeed_peci, controller);
> > +}
> > +
> > +static void aspeed_peci_init_regs(struct aspeed_peci *priv)
> > +{
> > + u32 val;
> > +
> > + val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);
> > + val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> > + writel(val, priv->base + ASPEED_PECI_CTRL);
> > + /*
> > + * Timing negotiation period setting.
> > + * The unit of the programmed value is 4 times of PECI clock period.
> > + */
> > + val = FIELD_PREP(ASPEED_PECI_TIMING_MESSAGE_MASK, priv->msg_timing);
> > + val |= FIELD_PREP(ASPEED_PECI_TIMING_ADDRESS_MASK, priv->addr_timing);
> > + writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
> > +
> > + /* Clear interrupts */
> > + val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;
> > + writel(val, priv->base + ASPEED_PECI_INT_STS);
> > +
> > + /* Set timing negotiation mode and enable interrupts */
> > + val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
> > + val |= ASPEED_PECI_INT_MASK;
> > + writel(val, priv->base + ASPEED_PECI_INT_CTRL);
> > +
> > + val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);
> > + val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
> > + val |= ASPEED_PECI_CTRL_PECI_EN;
> > + val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> > + writel(val, priv->base + ASPEED_PECI_CTRL);
>
> Do these MMIO accesses follow a standard? I.e. is there any possibility
> to have a common / generic MMIO xfer function, but just pass in a
> different base address discovered by the PECI device rather than a
> fully custom xfer function per controller?
>
No, it's vendor specific.
> > +}
> > +
> > +static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
> > +{
> > + u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
> > +
> > + if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
> > + aspeed_peci_init_regs(priv);
> > +
> > + return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
> > + cmd_sts,
> > + !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
> > + ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
> > + ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
> > +}
> > +
> > +static int aspeed_peci_xfer(struct peci_controller *controller,
> > + u8 addr, struct peci_request *req)
> > +{
> > + struct aspeed_peci *priv = to_aspeed_peci(controller);
> > + unsigned long flags, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
> > + u32 peci_head;
> > + int ret;
> > +
> > + if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
> > + req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
> > + return -EINVAL;
> > +
> > + /* Check command sts and bus idle state */
> > + ret = aspeed_peci_check_idle(priv);
> > + if (ret)
> > + return ret; /* -ETIMEDOUT */
> > +
> > + spin_lock_irqsave(&priv->lock, flags);
> > + reinit_completion(&priv->xfer_complete);
> > +
> > + peci_head = FIELD_PREP(ASPEED_PECI_TAGET_ADDR_MASK, addr) |
> > + FIELD_PREP(ASPEED_PECI_WRITE_LEN_MASK, req->tx.len) |
> > + FIELD_PREP(ASPEED_PECI_READ_LEN_MASK, req->rx.len);
> > +
> > + writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
> > +
> > + memcpy_toio(priv->base + ASPEED_PECI_W_DATA0, req->tx.buf,
> > + req->tx.len > 16 ? 16 : req->tx.len);
> > + if (req->tx.len > 16)
> > + memcpy_toio(priv->base + ASPEED_PECI_W_DATA4, req->tx.buf + 16,
> > + req->tx.len - 16);
> > +
> > + dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
> > + print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);
> > +
> > + priv->status = 0;
> > + writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
> > + spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > + ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);
> > + if (ret < 0)
> > + return ret;
> > +
> > + if (ret == 0) {
> > + dev_dbg(priv->dev, "Timeout waiting for a response!\n");
> > + return -ETIMEDOUT;
> > + }
> > +
> > + spin_lock_irqsave(&priv->lock, flags);
> > +
> > + writel(0, priv->base + ASPEED_PECI_CMD);
> > +
> > + if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
> > + spin_unlock_irqrestore(&priv->lock, flags);
> > + dev_dbg(priv->dev, "No valid response!\n");
> > + return -EIO;
> > + }
> > +
> > + spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > + memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_R_DATA0,
> > + req->rx.len > 16 ? 16 : req->rx.len);
> > + if (req->rx.len > 16)
> > + memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_R_DATA4,
> > + req->rx.len - 16);
> > +
> > + print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);
>
> If dynamic debug is not enabled this will be an unconditional
> printk(KERN_DEBUG.
>
> I'm ok with dev_dbg() in slow paths, but in fast paths you should look
> to tracing, or putting potentially heavyweight debug behind a
> CONFIG_X_DEBUG option. I have seen Greg is even less of a fan of
> dev_dbg().
>
Except hiding that behind CONFIG_PECI_DEBUG would require "custom"
printers (to avoid sprinkling ifdefs all over the code) and people seem
to dislike custom printers :(
If it was entirely up to me - I'd hide everything used for debug
("verbose" debug - hotpath or not) under peci_trace (that's noop if
!CONFIG_PECI_DEBUG) printer and use trace_printk under the hood.
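
To make the idea concrete, here is a userspace sketch of such a printer.
It is not the proposed kernel implementation: toy_peci_trace() and
TOY_PECI_DEBUG are made-up names standing in for peci_trace and
CONFIG_PECI_DEBUG, and fprintf() stands in for trace_printk(). The
constant-condition form avoids ifdefs at the call sites while still
letting the compiler check the format string.

```c
#include <stdio.h>

/*
 * Hypothetical knob standing in for CONFIG_PECI_DEBUG; build with
 * -DTOY_PECI_DEBUG to turn the output on.
 */
#ifdef TOY_PECI_DEBUG
#define TOY_PECI_DEBUG_ENABLED 1
#else
#define TOY_PECI_DEBUG_ENABLED 0
#endif

/*
 * The constant condition keeps the format string and arguments visible to
 * the compiler (so they are still type-checked) while letting it drop the
 * call entirely. Arguments are not evaluated when the knob is off.
 * ", ##__VA_ARGS__" is the GNU extension commonly used for this in
 * kernel-style macros.
 */
#define toy_peci_trace(fmt, ...)                                        \
	do {                                                            \
		if (TOY_PECI_DEBUG_ENABLED)                             \
			fprintf(stderr, "peci: " fmt, ##__VA_ARGS__);   \
	} while (0)
```

A call site then reads toy_peci_trace("HEAD : 0x%08x\n", peci_head); with
no ifdef around it.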
> > +
> > + return 0;
> > +}
> > +
> > +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
> > +{
> > + struct aspeed_peci *priv = arg;
> > + u32 status;
> > +
> > + spin_lock(&priv->lock);
> > + status = readl(priv->base + ASPEED_PECI_INT_STS);
> > + writel(status, priv->base + ASPEED_PECI_INT_STS);
> > + priv->status |= (status & ASPEED_PECI_INT_MASK);
> > +
> > + /*
> > + * In most cases, interrupt bits will be set one by one but also note
> > + * that multiple interrupt bits could be set at the same time.
> > + */
> > + if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_TIMEOUT\n");
> > +
> > + if (status & ASPEED_PECI_INT_BUS_CONNECT)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_CONNECT\n");
> > +
> > + if (status & ASPEED_PECI_INT_W_FCS_BAD)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_BAD\n");
> > +
> > + if (status & ASPEED_PECI_INT_W_FCS_ABORT)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_ABORT\n");
>
> What's the utility of these debug statements? If they are for
> development then maybe they are ok, if they are for debug in the field
> I would make them counters and export them via debugfs, or sysfs if
> you expect to always be able to debug these events in case a kernel in
> the field has dev_dbg and debugfs disabled.
>
Development for now. Once the subsystem develops a bit more (with
additional controllers), we can think about adding counters (or
tracepoints?) that can be useful in the field but are not controller
specific.
> > +
> > + /*
> > + * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE bit
> > + * set even in an error case.
> > + */
> > + if (status & ASPEED_PECI_INT_CMD_DONE)
> > + complete(&priv->xfer_complete);
> > +
> > + spin_unlock(&priv->lock);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static void __sanitize_clock_divider(struct aspeed_peci *priv)
> > +{
> > + u32 clk_div;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "clock-divider", &clk_div);
> > + if (ret) {
> > + clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
> > + } else if (clk_div > ASPEED_PECI_CLK_DIV_MAX) {
> > + dev_warn(priv->dev, "Invalid clock-divider: %u, Using default: %u\n",
> > + clk_div, ASPEED_PECI_CLK_DIV_DEFAULT);
> > +
> > + clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
> > + }
> > +
> > + priv->clk_div = clk_div;
> > +}
> > +
> > +static void __sanitize_msg_timing(struct aspeed_peci *priv)
> > +{
> > + u32 msg_timing;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "msg-timing", &msg_timing);
> > + if (ret) {
> > + msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
> > + } else if (msg_timing > ASPEED_PECI_MSG_TIMING_MAX) {
> > + dev_warn(priv->dev, "Invalid msg-timing : %u, Use default : %u\n",
> > + msg_timing, ASPEED_PECI_MSG_TIMING_DEFAULT);
> > +
> > + msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
> > + }
> > +
> > + priv->msg_timing = msg_timing;
> > +}
> > +
> > +static void __sanitize_addr_timing(struct aspeed_peci *priv)
> > +{
> > + u32 addr_timing;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "addr-timing", &addr_timing);
> > + if (ret) {
> > + addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
> > + } else if (addr_timing > ASPEED_PECI_ADDR_TIMING_MAX) {
> > + dev_warn(priv->dev, "Invalid addr-timing : %u, Use default : %u\n",
> > + addr_timing, ASPEED_PECI_ADDR_TIMING_DEFAULT);
> > +
> > + addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
> > + }
> > +
> > + priv->addr_timing = addr_timing;
> > +}
> > +
> > +static void __sanitize_rd_sampling_point(struct aspeed_peci *priv)
> > +{
> > + u32 rd_sampling_point;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "rd-sampling-point", &rd_sampling_point);
> > + if (ret) {
> > + rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
> > + } else if (rd_sampling_point > ASPEED_PECI_RD_SAMPLING_POINT_MAX) {
> > + dev_warn(priv->dev, "Invalid rd-sampling-point: %u, Use default : %u\n",
> > + rd_sampling_point, ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT);
> > +
> > + rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
> > + }
> > +
> > + priv->rd_sampling_point = rd_sampling_point;
> > +}
> > +
> > +static void __sanitize_cmd_timeout(struct aspeed_peci *priv)
> > +{
> > + u32 timeout;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "cmd-timeout-ms", &timeout);
> > + if (ret) {
> > + timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
> > + } else if (timeout > ASPEED_PECI_CMD_TIMEOUT_MS_MAX || timeout == 0) {
> > + dev_warn(priv->dev, "Invalid cmd-timeout-ms: %u, Use default: %u\n",
> > + timeout, ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT);
> > +
>
> For all of the same pattern like this above I would say "falling back
> to: %u" otherwise "Use default" sounds like an action the platform
> owner is expected to take.
>
Agree.
> Also, if the driver is correcting the issue does the log need to be
> spammed with a warning? Is this 'info' or 'debug'?
It suggests that the device tree for that platform is incompatible
with the schema (i.e. "make dtbs_check" wasn't run).
For that reason I would prefer to keep it as a warning.
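To illustrate the fallback behaviour with the reworded message, here is
a rough userspace sketch of a single range-checked helper that could
stand in for the five near-identical __sanitize_*() functions. It is
not part of the patch: struct prop_range and prop_sanitize() are
made-up names, and the "present" flag stands in for the return value of
device_property_read_u32().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of one tunable; not part of the patch. */
struct prop_range {
	const char *name;
	uint32_t min;
	uint32_t max;
	uint32_t def;
};

/*
 * Return the value to program: the default when the property is absent,
 * the default (with a warning) when it is out of range, else the value.
 */
static uint32_t prop_sanitize(const struct prop_range *r, int present,
			      uint32_t val)
{
	/* The default itself has to be a legal value. */
	assert(r->min <= r->def && r->def <= r->max);

	if (!present)
		return r->def;

	if (val < r->min || val > r->max) {
		fprintf(stderr, "Invalid %s: %u, falling back to: %u\n",
			r->name, (unsigned int)val, (unsigned int)r->def);
		return r->def;
	}

	return val;
}
```

With min = 1, max = 1000 and def = 1000 this mirrors the cmd-timeout-ms
check, including the rejection of 0; the other properties would only
differ in their table entry.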
>
> > + timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
> > + }
> > +
> > + priv->cmd_timeout_ms = timeout;
> > +}
> > +
> > +static void aspeed_peci_device_property_sanitize(struct aspeed_peci *priv)
> > +{
> > + __sanitize_clock_divider(priv);
> > + __sanitize_msg_timing(priv);
> > + __sanitize_addr_timing(priv);
> > + __sanitize_rd_sampling_point(priv);
> > + __sanitize_cmd_timeout(priv);
> > +}
> > +
> > +static void aspeed_peci_disable_clk(void *data)
> > +{
> > + clk_disable_unprepare(data);
> > +}
> > +
> > +static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
> > +{
> > + int ret;
> > +
> > + priv->clk = devm_clk_get(priv->dev, NULL);
> > + if (IS_ERR(priv->clk))
> > + return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "Failed to get clk source\n");
> > +
> > + ret = clk_prepare_enable(priv->clk);
> > + if (ret) {
> > + dev_err(priv->dev, "Failed to enable clock\n");
> > + return ret;
> > + }
> > +
> > + ret = devm_add_action_or_reset(priv->dev, aspeed_peci_disable_clk, priv->clk);
> > + if (ret)
> > + return ret;
> > +
> > + aspeed_peci_device_property_sanitize(priv);
> > +
> > + aspeed_peci_init_regs(priv);
> > +
> > + return 0;
> > +}
> > +
> > +static int aspeed_peci_probe(struct platform_device *pdev)
> > +{
> > + struct aspeed_peci *priv;
> > + int ret;
> > +
> > + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
>
> ..."uh oh" from above confirmed. devm allocation lifetime and 'struct
> device' lifetime are not compatible.
>
> You can trigger use after free bugs by turning on
> CONFIG_DEBUG_KOBJECT_RELEASE.
>
> devm can be used to automatically unregister the peci_controller
> device. The flow would be something like:
>
> priv = devm_kzalloc(..., sizeof(*priv), ...);
> controller = peci_controller_alloc(...);
> if (IS_ERR(controller))
> return PTR_ERR(controller);
> rc = devm_peci_controller_add(...)
> if (rc)
> return rc;
>
> This arranges for the peci_controller_alloc() to be undone by
> put_device() in all cases. Internal to peci_controller_alloc() is
> typical goto unwind allocation error handling.
Yeah, it will be taken into account during the refactoring that you
suggested in the previous patch review.
>
> > + if (!priv)
> > + return -ENOMEM;
> > +
> > + priv->dev = &pdev->dev;
> > + dev_set_drvdata(priv->dev, priv);
> > +
> > + priv->base = devm_platform_ioremap_resource(pdev, 0);
> > + if (IS_ERR(priv->base))
> > + return PTR_ERR(priv->base);
> > +
> > + priv->irq = platform_get_irq(pdev, 0);
> > + if (!priv->irq)
> > + return priv->irq;
> > +
> > + ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,
> > + 0, "peci-aspeed-irq", priv);
> > + if (ret)
> > + return ret;
> > +
> > + init_completion(&priv->xfer_complete);
> > + spin_lock_init(&priv->lock);
> > +
> > + priv->controller.xfer = aspeed_peci_xfer;
> > +
> > + priv->rst = devm_reset_control_get(&pdev->dev, NULL);
> > + if (IS_ERR(priv->rst)) {
> > + dev_err(&pdev->dev, "Missing or invalid reset controller entry\n");
> > + return PTR_ERR(priv->rst);
> > + }
> > + reset_control_deassert(priv->rst);
> > +
> > + ret = aspeed_peci_init_ctrl(priv);
> > + if (ret)
> > + return ret;
> > +
> > + return peci_controller_add(&priv->controller, priv->dev);
> > +}
> > +
> > +static int aspeed_peci_remove(struct platform_device *pdev)
> > +{
> > + struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
> > +
> > + peci_controller_remove(&priv->controller);
> > + reset_control_assert(priv->rst);
> > +
>
> It's odd to have devm in the probe path and still publish a remove
> handler, i.e. why not handle controller removal and reset via devm?
> The example above with devm_peci_controller_add() already assumes
> peci_controller_remove is triggered by devm, reset assert can be
> managed the same way.
I think the pattern is pretty common for subsystems that don't have
devres support. Drivers use devres for resources that support it, and
go with regular "manual" remove for everything else.
But I'm fine with using devres in PECI.
>
> > + return 0;
> > +}
> > +
> > +static const struct of_device_id aspeed_peci_of_table[] = {
> > + { .compatible = "aspeed,ast2400-peci", },
> > + { .compatible = "aspeed,ast2500-peci", },
> > + { .compatible = "aspeed,ast2600-peci", },
> > + { }
> > +};
> > +MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
> > +
> > +static struct platform_driver aspeed_peci_driver = {
> > + .probe = aspeed_peci_probe,
> > + .remove = aspeed_peci_remove,
> > + .driver = {
> > + .name = "peci-aspeed",
> > + .of_match_table = aspeed_peci_of_table,
> > + },
> > +};
> > +module_platform_driver(aspeed_peci_driver);
> > +
> > +MODULE_AUTHOR("Ryan Chen <[email protected]>");
> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
>
> Same comments about MODULE_AUTHOR from patch 6, i.e. make sure this is
> not duplicating what MAINTAINERS and git log handle.
>
> I'll pause here until you've had a chance to consider fixes to the
> devm vs 'struct device' lifetime issue.
Sure, thank you
-Iwona
>
> > +MODULE_DESCRIPTION("ASPEED PECI driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI);
>
On Wed, 2021-07-14 at 21:05 +0000, Williams, Dan J wrote:
> On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> > Since PECI devices are discoverable, we can dynamically detect devices
> > that are actually available in the system.
> >
> > This change complements the earlier implementation by rescanning PECI
> > bus to detect available devices. For this purpose, it also introduces the
> > minimal API for PECI requests.
> >
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > drivers/peci/Makefile | 2 +-
> > drivers/peci/core.c | 13 ++++-
> > drivers/peci/device.c | 111 ++++++++++++++++++++++++++++++++++++++++
> > drivers/peci/internal.h | 15 ++++++
> > drivers/peci/request.c | 74 +++++++++++++++++++++++++++
> > drivers/peci/sysfs.c | 34 ++++++++++++
> > 6 files changed, 246 insertions(+), 3 deletions(-)
> > create mode 100644 drivers/peci/device.c
> > create mode 100644 drivers/peci/request.c
> >
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > index 621a993e306a..917f689e147a 100644
> > --- a/drivers/peci/Makefile
> > +++ b/drivers/peci/Makefile
> > @@ -1,7 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> >
> > # Core functionality
> > -peci-y := core.o sysfs.o
> > +peci-y := core.o request.o device.o sysfs.o
> > obj-$(CONFIG_PECI) += peci.o
> >
> > # Hardware specific bus drivers
> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> > index 0ad00110459d..ae7a9572cdf3 100644
> > --- a/drivers/peci/core.c
> > +++ b/drivers/peci/core.c
> > @@ -31,7 +31,15 @@ struct device_type peci_controller_type = {
> >
> > int peci_controller_scan_devices(struct peci_controller *controller)
> > {
> > - /* Just a stub, no support for actual devices yet */
> > + int ret;
> > + u8 addr;
> > +
> > + for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
> > + ret = peci_device_create(controller, addr);
> > + if (ret)
> > + return ret;
> > + }
> > +
>
> This seems to be a behavior triggered at peci_controller_add and at the
> request of userspace when touching the rescan attribute? A natural way
> to handle this would be to have a driver for the peci_controller device
> and have that driver issue scan at probe time. Otherwise, how does
> userspace know when it is time to rescan the bus?
>
peci_controller_add() is expected to be called during probe() of the
controller driver (otherwise the driver isn't really a controller
driver).
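For reference, the scan triggered from there boils down to something
like the userspace sketch below. It is only a model: detect_fn,
scan_bus() and mock_detect() are made-up names, and the bitmask result
replaces the real device registration. The point is that -EIO and
-ETIMEDOUT are treated as "nothing there (yet)", which is why a later
rescan can still find devices once the host is up.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PECI_BASE_ADDR		0x30
#define PECI_DEVICE_NUM_MAX	8

/* Hypothetical probe callback: 0 if a device answered, -errno otherwise. */
typedef int (*detect_fn)(uint8_t addr);

/*
 * Walk the CPU address range and record which addresses answered in a
 * bitmask (bit 0 is address 0x30). Absent or not-yet-ready devices
 * (-EIO, -ETIMEDOUT) are skipped; any other error aborts the scan.
 */
static int scan_bus(detect_fn detect, uint8_t *found)
{
	uint8_t addr;
	int ret;

	*found = 0;
	for (addr = PECI_BASE_ADDR;
	     addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
		ret = detect(addr);
		if (ret == -EIO || ret == -ETIMEDOUT)
			continue;
		if (ret)
			return ret;
		*found |= (uint8_t)(1u << (addr - PECI_BASE_ADDR));
	}

	return 0;
}

/* Mock: 0x30 and 0x31 respond, 0x32 times out, the rest are absent. */
static int mock_detect(uint8_t addr)
{
	if (addr == 0x30 || addr == 0x31)
		return 0;
	return addr == 0x32 ? -ETIMEDOUT : -EIO;
}
```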
> > return 0;
> > }
> >
> > @@ -106,7 +114,8 @@ EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
> >
> > static int _unregister(struct device *dev, void *dummy)
> > {
> > - /* Just a stub, no support for actual devices yet */
> > + peci_device_destroy(to_peci_device(dev));
>
> As mentioned previously, this could be delegated to devm to unregister
> when the original driver that added the controller goes through
> ->remove().
>
Ack.
> > +
> > return 0;
> > }
> >
> > diff --git a/drivers/peci/device.c b/drivers/peci/device.c
> > new file mode 100644
> > index 000000000000..1124862211e2
> > --- /dev/null
> > +++ b/drivers/peci/device.c
> > @@ -0,0 +1,111 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/peci.h>
> > +#include <linux/slab.h>
> > +
> > +#include "internal.h"
> > +
> > +static int peci_detect(struct peci_controller *controller, u8 addr)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_request_alloc(NULL, 0, 0);
> > + if (!req)
> > + return -ENOMEM;
> > +
> > + mutex_lock(&controller->bus_lock);
>
> What is the underlying requirement to prevent 2 simultaneous ->xfer()
> invocations?
>
It's a limitation of the PECI wire (physical layer) interface.
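In other words, on the wire a response has to directly follow its
request before the next request can start. A toy userspace model of
that constraint, assuming a pthread mutex in place of bus_lock (struct
toy_bus, toy_trace() and toy_xfer() are made-up names, not kernel API):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy wire: records the order of request/response phases. */
struct toy_bus {
	pthread_mutex_t lock;	/* held for a whole request+response pair */
	char trace[64];
	size_t len;
};

static void toy_bus_init(struct toy_bus *bus)
{
	pthread_mutex_init(&bus->lock, NULL);
	bus->len = 0;
	bus->trace[0] = '\0';
}

/* Append one phase marker, e.g. "1q" for a request to device 1. */
static void toy_trace(struct toy_bus *bus, int dev, char phase)
{
	assert(dev >= 0 && dev <= 9);
	assert(bus->len + 3 <= sizeof(bus->trace));
	bus->trace[bus->len++] = (char)('0' + dev);
	bus->trace[bus->len++] = phase;
	bus->trace[bus->len] = '\0';
}

/* One transfer: the response always directly follows its request. */
static void toy_xfer(struct toy_bus *bus, int dev)
{
	pthread_mutex_lock(&bus->lock);
	toy_trace(bus, dev, 'q');	/* request on the wire */
	toy_trace(bus, dev, 'r');	/* response on the wire */
	pthread_mutex_unlock(&bus->lock);
}
```

Because both phases happen under the lock, two callers can only ever
produce "1q1r2q2r" or "2q2r1q1r", never an interleaving such as
"1q2q1r2r".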
> > + ret = controller->xfer(controller, addr, req);
> > + mutex_unlock(&controller->bus_lock);
> > +
> > + peci_request_free(req);
> > +
> > + return ret;
> > +}
> > +
> > +static bool peci_addr_valid(u8 addr)
> > +{
> > + return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX;
> > +}
> > +
> > +static int peci_dev_exists(struct device *dev, void *data)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + u8 *addr = data;
> > +
> > + if (device->addr == *addr)
> > + return -EBUSY;
> > +
> > + return 0;
> > +}
> > +
> > +int peci_device_create(struct peci_controller *controller, u8 addr)
> > +{
> > + struct peci_device *device;
> > + int ret;
> > +
> > + if (WARN_ON(!peci_addr_valid(addr)))
> > + return -EINVAL;
> > +
> > + /* Check if we have already detected this device before. */
> > + ret = device_for_each_child(&controller->dev, &addr, peci_dev_exists);
> > + if (ret)
> > + return 0;
> > +
> > + ret = peci_detect(controller, addr);
> > + if (ret) {
> > + /*
> > + * Device not present or host state doesn't allow successful
> > + * detection at this time.
> > + */
> > + if (ret == -EIO || ret == -ETIMEDOUT)
> > + return 0;
> > +
> > + return ret;
> > + }
> > +
> > + device = kzalloc(sizeof(*device), GFP_KERNEL);
> > + if (!device)
> > + return -ENOMEM;
> > +
> > + device->controller = controller;
> > + device->addr = addr;
> > + device->dev.parent = &device->controller->dev;
> > + device->dev.bus = &peci_bus_type;
> > + device->dev.type = &peci_device_type;
> > +
> > + ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
> > + if (ret)
> > + goto err_free;
> > +
> > + ret = device_register(&device->dev);
>
> There is a recent movement away from device_register() to an alloc+add
> pattern [1]. I.e. have device_initialize() and device_add() steps. With
> that you can unify the error exit to be put_device().
>
> [1]: https://lore.kernel.org/r/[email protected]
>
It's just kfree in this case, but I agree. I'll modify this.
> > + if (ret)
> > + goto err_put;
> > +
> > + return 0;
> > +
> > +err_put:
> > + put_device(&device->dev);
> > +err_free:
> > + kfree(device);
> > +
> > + return ret;
> > +}
> > +
> > +void peci_device_destroy(struct peci_device *device)
> > +{
> > + device_unregister(&device->dev);
> > +}
> > +
> > +static void peci_device_release(struct device *dev)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > +
> > + kfree(device);
> > +}
> > +
> > +struct device_type peci_device_type = {
> > + .groups = peci_device_groups,
> > + .release = peci_device_release,
> > +};
> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> > index 80c61bcdfc6b..6b139adaf6b8 100644
> > --- a/drivers/peci/internal.h
> > +++ b/drivers/peci/internal.h
> > @@ -9,6 +9,21 @@
> >
> > struct peci_controller;
> > struct attribute_group;
> > +struct peci_device;
> > +struct peci_request;
> > +
> > +/* PECI CPU address range 0x30-0x37 */
> > +#define PECI_BASE_ADDR 0x30
> > +#define PECI_DEVICE_NUM_MAX 8
> > +
> > +struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
> > +void peci_request_free(struct peci_request *req);
> > +
> > +extern struct device_type peci_device_type;
> > +extern const struct attribute_group *peci_device_groups[];
> > +
> > +int peci_device_create(struct peci_controller *controller, u8 addr);
> > +void peci_device_destroy(struct peci_device *device);
> >
> > extern struct bus_type peci_bus_type;
> > extern const struct attribute_group *peci_bus_groups[];
> > diff --git a/drivers/peci/request.c b/drivers/peci/request.c
> > new file mode 100644
> > index 000000000000..78cee51dfae1
> > --- /dev/null
> > +++ b/drivers/peci/request.c
> > @@ -0,0 +1,74 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2021 Intel Corporation
> > +
> > +#include <linux/export.h>
> > +#include <linux/peci.h>
> > +#include <linux/slab.h>
> > +#include <linux/types.h>
> > +
> > +#include "internal.h"
> > +
> > +/**
> > + * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
> > + * @device: PECI device to which request is going to be sent
> > + * @tx_len: requested TX buffer length
> > + * @rx_len: requested RX buffer length
> > + *
> > + * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
> > + */
> > +struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
> > +{
>
> How big can these lengths be?
PECI specification defines tx_len as a single byte, same thing for
rx_len.
Currently the largest we're using is 24 IIRC.
>
> > + struct peci_request *req;
> > + u8 *tx_buf, *rx_buf;
> > +
> > + req = kzalloc(sizeof(*req), GFP_KERNEL);
> > + if (!req)
> > + return NULL;
> > +
> > + req->device = device;
> > +
> > + /*
> > + * PECI controllers that we are using now don't support DMA, this
> > + * should be converted to DMA API once support for controllers that do
> > + * allow it is added to avoid an extra copy.
> > + */
> > + if (tx_len) {
> > + tx_buf = kzalloc(tx_len, GFP_KERNEL);
> > + if (!tx_buf)
> > + goto err_free_req;
> > +
> > + req->tx.buf = tx_buf;
> > + req->tx.len = tx_len;
> > + }
> > +
> > + if (rx_len) {
> > + rx_buf = kzalloc(rx_len, GFP_KERNEL);
> > + if (!rx_buf)
> > + goto err_free_tx;
> > +
> > + req->rx.buf = rx_buf;
> > + req->rx.len = rx_len;
> > + }
> > +
> > + return req;
> > +
> > +err_free_tx:
> > + kfree(req->tx.buf);
> > +err_free_req:
> > + kfree(req);
> > +
> > + return NULL;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
> > +
> > +/**
> > + * peci_request_free() - free peci_request
> > + * @req: the PECI request to be freed
> > + */
> > +void peci_request_free(struct peci_request *req)
> > +{
> > + kfree(req->rx.buf);
> > + kfree(req->tx.buf);
> > + kfree(req);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
> > diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
> > index 36c5e2a18a92..db9ef05776e3 100644
> > --- a/drivers/peci/sysfs.c
> > +++ b/drivers/peci/sysfs.c
> > @@ -1,6 +1,8 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > // Copyright (c) 2021 Intel Corporation
> >
> > +#include <linux/device.h>
> > +#include <linux/kernel.h>
> > #include <linux/peci.h>
> >
> > #include "internal.h"
> > @@ -46,3 +48,35 @@ const struct attribute_group *peci_bus_groups[] = {
> > &peci_bus_group,
> > NULL
> > };
> > +
> > +static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + bool res;
> > + int ret;
> > +
> > + ret = kstrtobool(buf, &res);
> > + if (ret)
> > + return ret;
> > +
> > + if (res && device_remove_file_self(dev, attr))
> > + peci_device_destroy(device);
> > +
> > + return count;
> > +}
> > +static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
>
> Why does userspace need the ability to kick devices off the bus?
>
> Do you have an example userspace tool that is using these sysfs APIs?
Symmetry with adding devices (in this case rescan). It's also useful
for development and testing (e.g. kick off extra devices to leave a
single one).
Moreover, it looks like a common pattern in other subsystems.
Thank you
-Iwona
>
> > +
> > +static struct attribute *peci_device_attrs[] = {
> > + &dev_attr_remove.attr,
> > + NULL
> > +};
> > +
> > +static const struct attribute_group peci_device_group = {
> > + .attrs = peci_device_attrs,
> > +};
> > +
> > +const struct attribute_group *peci_device_groups[] = {
> > + &peci_device_group,
> > + NULL
> > +};
>
On Thu, 2021-07-15 at 10:28 -0600, Rob Herring wrote:
> On Tue, Jul 13, 2021 at 12:04:37AM +0200, Iwona Winiarska wrote:
> > Add device tree bindings for the peci-aspeed controller driver.
> >
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > ---
> > .../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++++++++++++++++
> > 1 file changed, 111 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/peci/peci-aspeed.yaml b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> > new file mode 100644
> > index 000000000000..6061e06009fb
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> > @@ -0,0 +1,111 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/peci/peci-aspeed.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Aspeed PECI Bus Device Tree Bindings
> > +
> > +maintainers:
> > + - Iwona Winiarska <[email protected]>
> > + - Jae Hyun Yoo <[email protected]>
> > +
> > +allOf:
> > + - $ref: peci-controller.yaml#
> > +
> > +properties:
> > + compatible:
> > + enum:
> > + - aspeed,ast2400-peci
> > + - aspeed,ast2500-peci
> > + - aspeed,ast2600-peci
> > +
> > + reg:
> > + maxItems: 1
> > +
> > + interrupts:
> > + maxItems: 1
> > +
> > + clocks:
> > + description: |
> > + Clock source for PECI controller. Should reference the external
> > + oscillator clock.
> > + maxItems: 1
> > +
> > + resets:
> > + maxItems: 1
> > +
> > + clock-divider:
> > + description: This value determines PECI controller internal clock
> > + dividing rate. The divider will be calculated as 2 raised to the
> > + power of the given value.
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + minimum: 0
> > + maximum: 7
> > + default: 0
> > +
>
> > + msg-timing:
> > + description: |
> > + Message timing negotiation period. This value will determine the period
> > + of message timing negotiation to be issued by PECI controller. The unit
> > + of the programmed value is four times of PECI clock period.
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + minimum: 0
> > + maximum: 255
> > + default: 1
> > +
> > + addr-timing:
> > + description: |
> > + Address timing negotiation period. This value will determine the period
> > + of address timing negotiation to be issued by PECI controller. The unit
> > + of the programmed value is four times of PECI clock period.
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + minimum: 0
> > + maximum: 255
> > + default: 1
> > +
> > + rd-sampling-point:
> > + description: |
> > + Read sampling point selection. The whole period of a bit time will be
> > + divided into 16 time frames. This value will determine the time frame
> > + in which the controller will sample PECI signal for data read back.
> > + Usually in the middle of a bit time is the best.
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + minimum: 0
> > + maximum: 15
> > + default: 8
> > +
> > + cmd-timeout-ms:
> > + description: |
> > + Command timeout in units of ms.
> > + $ref: /schemas/types.yaml#/definitions/uint32
> > + minimum: 1
> > + maximum: 1000
> > + default: 1000
>
> Are all of these properties common for PECI or specific to this
> controller? The former needs to go into the common schema. The latter
> need vendor prefixes.
>
The latter, I'll add vendor prefixes in v2.
Thank you
-Iwona
>
> > +
> > +required:
> > + - compatible
> > + - reg
> > + - interrupts
> > + - clocks
> > + - resets
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > + - |
> > + #include <dt-bindings/interrupt-controller/arm-gic.h>
> > + #include <dt-bindings/clock/ast2600-clock.h>
> > + peci-controller@1e78b000 {
> > + compatible = "aspeed,ast2600-peci";
> > + reg = <0x1e78b000 0x100>;
> > + interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
> > + clocks = <&syscon ASPEED_CLK_GATE_REF0CLK>;
> > + resets = <&syscon ASPEED_RESET_PECI>;
> > + clock-divider = <0>;
> > + msg-timing = <1>;
> > + addr-timing = <1>;
> > + rd-sampling-point = <8>;
> > + cmd-timeout-ms = <1000>;
> > + };
> > +...
> > --
> > 2.31.1
> >
> >
On Fri, Jul 16, 2021 at 2:08 PM Winiarska, Iwona
<[email protected]> wrote:
[..]
> > > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > > new file mode 100644
> > > index 000000000000..601cc3c3c852
> > > --- /dev/null
> > > +++ b/drivers/peci/Kconfig
> > > @@ -0,0 +1,14 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +menuconfig PECI
> > > + tristate "PECI support"
> > > + help
> > > + The Platform Environment Control Interface (PECI) is an interface
> > > + that provides a communication channel to Intel processors and
> > > + chipset components from external monitoring or control devices.
> > > +
> > > + If you want PECI support, you should say Y here and also to the
> > > + specific driver for your bus adapter(s) below.
> >
> > The user is reading this help text to decide if they want PECI
> > support, so clarifying that if they want PECI support they should turn
> > it on is not all that helpful. I would say "If you are building a
> > kernel for a Board Management Controller (BMC) say Y. If unsure say
> > N".
>
> Since PECI is only available on Intel platforms, perhaps something
> like:
> "If you are building a Board Management Controller (BMC) kernel for
> Intel platform say Y"?
>
Looks good.
> >
> > > +
> > > + This support is also available as a module. If so, the module
> > > + will be called peci.
> > > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > > new file mode 100644
> > > index 000000000000..2bb2f51bcda7
> > > --- /dev/null
> > > +++ b/drivers/peci/Makefile
> > > @@ -0,0 +1,5 @@
> > > +# SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +# Core functionality
> > > +peci-y := core.o sysfs.o
> > > +obj-$(CONFIG_PECI) += peci.o
> > > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> > > new file mode 100644
> > > index 000000000000..0ad00110459d
> > > --- /dev/null
> > > +++ b/drivers/peci/core.c
> > > @@ -0,0 +1,166 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +// Copyright (c) 2018-2021 Intel Corporation
> > > +
> > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > +
> > > +#include <linux/bug.h>
> > > +#include <linux/device.h>
> > > +#include <linux/export.h>
> > > +#include <linux/idr.h>
> > > +#include <linux/module.h>
> > > +#include <linux/of.h>
> > > +#include <linux/peci.h>
> > > +#include <linux/pm_runtime.h>
> > > +#include <linux/property.h>
> > > +#include <linux/slab.h>
> > > +
> > > +#include "internal.h"
> > > +
> > > +static DEFINE_IDA(peci_controller_ida);
> > > +
> > > +static void peci_controller_dev_release(struct device *dev)
> > > +{
> > > + struct peci_controller *controller = to_peci_controller(dev);
> > > +
> > > + mutex_destroy(&controller->bus_lock);
> > > +}
> > > +
> > > +struct device_type peci_controller_type = {
> > > + .release = peci_controller_dev_release,
> > > +};
> >
> > I have not read further than patch 6 in this set, so I'm hoping there
> > is an explanation for this. As it stands it looks like a red flag that
> > the release function is not actually releasing anything?
> >
>
> Ok, that's related to other comments here and in patch 7. I'll try to
> refactor this. I'm thinking about splitting the "controller_add" into
> separate "alloc" and "add" (or init? register?). And perhaps integrate
> that into devm, so that controller can be allocated using devres, tying
> that into lifetime of underlying platform device.
>
The devres scheme cannot be used for allocating an object that
contains a 'struct device'. The devres lifetime is until
dev->driver.release(dev), while the 'struct device' lifetime is until
the last put_device(), where your driver has no idea what other agent
in the system might have taken a reference. That said, devres *can* be
used for triggering an automatic device_del(); see
devm_cxl_add_port() [1] as an example:
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cxl/core.c#n333
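To make the lifetime mismatch concrete, here is a toy userspace model.
It is not kernel code and every name in it (toy_device, toy_controller,
toy_get(), toy_put()) is invented for illustration. The object is only
freed from the release callback once the last reference is dropped,
regardless of when the driver that allocated it goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for 'struct device': a refcount plus a release callback. */
struct toy_device {
	int refcount;
	void (*release)(struct toy_device *dev);
};

/* Toy stand-in for an object embedding a device, like a controller. */
struct toy_controller {
	struct toy_device dev;	/* first member, see the cast below */
	int id;
};

static int toy_released;	/* counts release() calls, for illustration */

static void toy_controller_release(struct toy_device *dev)
{
	/* dev is the first member, so the cast recovers the container. */
	free((struct toy_controller *)dev);
	toy_released++;
}

static struct toy_controller *toy_controller_alloc(int id)
{
	struct toy_controller *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->id = id;
	c->dev.refcount = 1;	/* reference owned by the caller */
	c->dev.release = toy_controller_release;
	return c;
}

static void toy_get(struct toy_device *dev)
{
	dev->refcount++;
}

static void toy_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

If the memory had instead been freed when the driver unbound (which is
what a devm allocation amounts to) while another holder still had a
reference, that holder's later toy_put() would touch freed memory.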
> > > +
> > > +int peci_controller_scan_devices(struct peci_controller *controller)
> > > +{
> > > + /* Just a stub, no support for actual devices yet */
> > > + return 0;
> > > +}
> >
> > Move this to the patch where it is needed.
>
> It's used in this patch (in sysfs and controller add), but at this
> point we haven't introduced devices yet.
> I would have to move this to patch 8 - but I don't think it belongs
> there.
I would expect that if patch 8 fills this in, then this and its caller
belong there so the mechanism can be reviewed together.
> Will it make more sense if I introduce sysfs documentation here?
The sysfs documentation should be in the same patch that adds the attribute.
> Or as a completely separate patch?
A new / separate "implement rescan" would work too...
> I wanted to avoid going too far with split granularity, and just go
> with high-level concepts starting with the controller.
Sure, I think this patchset has a reasonable split, but this rescan
feature seems unique enough to get its own patch.
>
> >
> > > +
> > > +/**
> > > + * peci_controller_add() - Add PECI controller
> > > + * @controller: the PECI controller to be added
> > > + * @parent: device object to be registered as a parent
> > > + *
> > > + * In final stage of its probe(), peci_controller driver should include calling
> >
> > s/should include calling/calls/
> >
>
> Ok.
>
> > > + * peci_controller_add() to register itself with the PECI bus.
> > > + * The caller is responsible for allocating the struct peci_controller and
> > > + * managing its lifetime, calling peci_controller_remove() prior to releasing
> > > + * the allocation.
> > > + *
> > > + * It returns zero on success, else a negative error code (dropping the
> > > + * controller's refcount). After a successful return, the caller is responsible
> > > + * for calling peci_controller_remove().
> > > + *
> > > + * Return: 0 if succeeded, other values in case errors.
> > > + */
> > > +int peci_controller_add(struct peci_controller *controller, struct device *parent)
> > > +{
> > > + struct fwnode_handle *node = fwnode_handle_get(dev_fwnode(parent));
> > > + int ret;
> > > +
> > > + if (WARN_ON(!controller->xfer))
> >
> > Why WARN()? What is 'xfer', and what is likelihood the caller forgets
> > to set it? For something critical like this the WARN is likely
> > overkill.
> >
>
> Very unlikely - 'xfer' provides "connection" with hardware so it's
> rather mandatory.
> It indicates programmer error, so WARN() with all its consequences
> (taint and so on) seemed adequate.
>
> Do you suggest to downgrade it to pr_err()?
I'd say no report at all. It's not relevant to the user, and at worst
it's a liability for environments that want to audit and control all
kernel warnings. The chances that a future developer makes the
mistake, or does not figure out quickly that they forgot to set
->xfer() is low.
[..]
> > > +
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
> >
> > I think it's cleaner to declare symbol namespaces in the Makefile. In
> > this case, add:
> >
> > ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=PECI
> >
> > ...and just use EXPORT_SYMBOL_GPL as normal in the C file.
> >
>
> I kind of prefer the more verbose EXPORT_SYMBOL_NS_GPL - it also
> doesn't "hide" the fact that we're using namespaces (everything is in
> the C file rather than mixed into Makefile), but it's not a strong
> opinion, so sure - I can change this.
>
Perhaps as a tie breaker, the maintainer you are submitting this to,
Greg, uses the -DDEFAULT_SYMBOL_NAMESPACE scheme in his subsystem,
drivers/usb/.
[..]
> > > +static BUS_ATTR_WO(rescan);
> >
> > No Documentation/ABI entry for this attribute, which means I'm not
> > sure if it's suitable because it's unreviewable what it actually does
> > reviewing this patch as a standalone.
> >
>
> We're expecting to use "rescan" in the similar way as it is used for
> PCIe or USB.
> BMC can boot up when the system is still in S5 (without any guarantee
> that it will ever change this state - the user can never turn the
> platform on :) ). If the controller is loaded and the platform allows
> it to discover devices - great (the scan happens as last step of
> controller_add), if not - userspace can use rescan.
There's no interrupt or notification to the BMC that the power-on
event happened? Seems fragile to leave this responsibility to
userspace.
I had assumed rescan for those other buses is an exceptional mechanism
for platform debug, not a typical usage flow for userspace.
>
> I'll add documentation in v2.
>
> > > +
> > > +static struct attribute *peci_bus_attrs[] = {
> > > + &bus_attr_rescan.attr,
> > > + NULL
> > > +};
> > > +
> > > +static const struct attribute_group peci_bus_group = {
> > > + .attrs = peci_bus_attrs,
> > > +};
> > > +
> > > +const struct attribute_group *peci_bus_groups[] = {
> > > + &peci_bus_group,
> > > + NULL
> > > +};
> > > diff --git a/include/linux/peci.h b/include/linux/peci.h
> > > new file mode 100644
> > > index 000000000000..cdf3008321fd
> > > --- /dev/null
> > > +++ b/include/linux/peci.h
> > > @@ -0,0 +1,82 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/* Copyright (c) 2018-2021 Intel Corporation */
> > > +
> > > +#ifndef __LINUX_PECI_H
> > > +#define __LINUX_PECI_H
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/mutex.h>
> > > +#include <linux/types.h>
> > > +
> > > +struct peci_request;
> > > +
> > > +/**
> > > + * struct peci_controller - PECI controller
> > > + * @dev: device object to register PECI controller to the device model
> > > + * @xfer: PECI transfer function
> > > + * @bus_lock: lock used to protect multiple callers
> > > + * @id: PECI controller ID
> > > + *
> > > + * PECI controllers usually connect to their drivers using non-PECI bus,
> > > + * such as the platform bus.
> > > + * Each PECI controller can communicate with one or more PECI devices.
> > > + */
> > > +struct peci_controller {
> > > + struct device dev;
> > > + int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req);
> >
> > Each device will have a different way to do a PECI transfer?
> >
> > I thought PECI was a standard...
> >
>
> The "standard" part only applies to the connection between the
> controller and the devices - not the connection between controller and
> the rest of the system on which the controller resides in.
> xfer is vendor specific.
...all PECI controllers implement different MMIO register layouts?
>
> > > + struct mutex bus_lock; /* held for the duration of xfer */
> >
> > > What is it actually locking? For example, there is a mantra that goes
> > > "lock data, not code", and this comment seems to imply that no specific
> > > data is being locked.
> >
>
> PECI-wire interface requires that the response follows the request -
> and that should hold for all devices behind a given controller.
> In other words, assuming that we have two devices, d1 and d2, we need
> to have: d1.req, d1.resp, d2.req, d2.resp. Single xfer takes care of
> both request and response.
>
> I would like to eventually move that lock into individual controllers,
> but before that happens - I'd like to have a reasoning behind it.
> If we have interfaces that allow us to decouple requests from responses
> or devices that can handle servicing more than one requests at a time,
> the lock will go away from peci-core.
Another way to handle a "single request/response at a time" protocol
scheme is to use a single-threaded workqueue, then no lock is needed.
Requests are posted to the queue, responses are handled in the same
thread. This way callers have the option to either post work and
asynchronously poll for completion, or synchronously wait. The SCSI
libsas driver uses such a scheme.
>
> >
> > > + u8 id;
> >
> > No possible way to have more than 256 controllers per system?
> >
>
> For real world scenarios - I expect single digit number of controllers
> per system. The boards with HW compatible with "aspeed,ast2xxx-peci"
> contain just one instance of this controller.
> I expect more in the future (e.g. different "physical" transport), but
> definitely not more than 256 per system.
>
Ok.
On Fri, Jul 16, 2021 at 02:50:04PM -0700, Dan Williams wrote:
> On Fri, Jul 16, 2021 at 2:08 PM Winiarska, Iwona
> > > > +}
> > > > +EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
> > >
> > > I think it's cleaner to declare symbol namespaces in the Makefile. In
> > > this case, add:
> > >
> > > cflags-y += -DDEFAULT_SYMBOL_NAMESPACE=PECI
> > >
> > > ...and just use EXPORT_SYMBOL_GPL as normal in the C file.
> > >
> >
> > I kind of prefer the more verbose EXPORT_SYMBOL_NS_GPL - it also
> > doesn't "hide" the fact that we're using namespaces (everything is in
> > the C file rather than mixed into Makefile), but it's not a strong
> > opinion, so sure - I can change this.
> >
>
> Perhaps as a tie breaker, the maintainer you are submitting this to,
> Greg, uses the -DDEFAULT_SYMBOL_NAMESPACE scheme in his subsystem,
> drivers/usb/.
We did that because namespaces were added _after_ the kernel code was
already there. For new code like this, the original use of
EXPORT_SYMBOL_NS_GPL() is best as it is explicit and obvious. No need
to dig around in a Makefile to find out the namespace name.
thanks,
greg k-h
On Fri, Jul 16, 2021 at 11:13 PM [email protected]
<[email protected]> wrote:
>
> On Fri, Jul 16, 2021 at 02:50:04PM -0700, Dan Williams wrote:
> > On Fri, Jul 16, 2021 at 2:08 PM Winiarska, Iwona
> > > > > +}
> > > > > +EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
> > > >
> > > > I think it's cleaner to declare symbol namespaces in the Makefile. In
> > > > this case, add:
> > > >
> > > > cflags-y += -DDEFAULT_SYMBOL_NAMESPACE=PECI
> > > >
> > > > ...and just use EXPORT_SYMBOL_GPL as normal in the C file.
> > > >
> > >
> > > I kind of prefer the more verbose EXPORT_SYMBOL_NS_GPL - it also
> > > doesn't "hide" the fact that we're using namespaces (everything is in
> > > the C file rather than mixed into Makefile), but it's not a strong
> > > opinion, so sure - I can change this.
> > >
> >
> > Perhaps as a tie breaker, the maintainer you are submitting this to,
> > Greg, uses the -DDEFAULT_SYMBOL_NAMESPACE scheme in his subsystem,
> > drivers/usb/.
>
> We did that because namespaces were added _after_ the kernel code was
> already there. For new code like this, the original use of
> EXPORT_SYMBOL_NS_GPL() is best as it is explicit and obvious. No need
> to dig around in a Makefile to find out the namespace name.
Fair enough.
/me goes to update drivers/cxl/
On Thu, 2021-07-15 at 10:45 -0700, Guenter Roeck wrote:
> On Tue, Jul 13, 2021 at 12:04:44AM +0200, Iwona Winiarska wrote:
> > Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of the processor package and processor cores that are
> > accessible via the PECI interface.
> >
> > The main use case for the driver (and PECI interface) is out-of-band
> > management, where we're able to obtain the DTS readings from an external
> > entity connected with PECI, e.g. BMC on server platforms.
> >
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
>
> Note: Due to lack of revision information, this review does not take
> any previous discussions into account, and it may miss critical information.
> For a final review I'll have to compare the code against earlier versions
> to determine if there are any relevant changes and if all comments
> have been addressed. This may take some time.
>
Thank you for taking a look at the patches.
Unfortunately, the changes needed in the PECI-core part impacted HWMON as well.
I tried to minimize refactoring, but the changes are rather big in my opinion
(which is why I assumed that a full changelog wouldn't be helpful).
After taking over the series I noted your comments on v11. The changes you
requested are either already there or the code no longer exists due to
refactoring (but after going through that again, I noticed the duplicated
CORE_NUMS_MAX define - I'll remove it in v2).
> > ---
> > MAINTAINERS | 7 +
> > drivers/hwmon/Kconfig | 2 +
> > drivers/hwmon/Makefile | 1 +
> > drivers/hwmon/peci/Kconfig | 18 ++
> > drivers/hwmon/peci/Makefile | 5 +
> > drivers/hwmon/peci/common.h | 46 ++++
> > drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++++++++++++++++++++
> > 7 files changed, 582 insertions(+)
> > create mode 100644 drivers/hwmon/peci/Kconfig
> > create mode 100644 drivers/hwmon/peci/Makefile
> > create mode 100644 drivers/hwmon/peci/common.h
> > create mode 100644 drivers/hwmon/peci/cputemp.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index f47b5f634293..35ba9e3646bd 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14504,6 +14504,13 @@ L: [email protected]
> > S: Maintained
> > F: drivers/platform/x86/peaq-wmi.c
> >
> > +PECI HARDWARE MONITORING DRIVERS
> > +M: Iwona Winiarska <[email protected]>
> > +R: Jae Hyun Yoo <[email protected]>
> > +L: [email protected]
> > +S: Supported
> > +F: drivers/hwmon/peci/
> > +
> > PECI SUBSYSTEM
> > M: Iwona Winiarska <[email protected]>
> > R: Jae Hyun Yoo <[email protected]>
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index e3675377bc5d..61c0e3404415 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
> > These devices are hard to detect and rarely found on mainstream
> > hardware. If unsure, say N.
> >
> > +source "drivers/hwmon/peci/Kconfig"
> > +
> > source "drivers/hwmon/pmbus/Kconfig"
> >
> > config SENSORS_PWM_FAN
> > diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> > index d712c61c1f5e..f52331f212ed 100644
> > --- a/drivers/hwmon/Makefile
> > +++ b/drivers/hwmon/Makefile
> > @@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
> > obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
> >
> > obj-$(CONFIG_SENSORS_OCC) += occ/
> > +obj-$(CONFIG_SENSORS_PECI) += peci/
> > obj-$(CONFIG_PMBUS) += pmbus/
> >
> > ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> > diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> > new file mode 100644
> > index 000000000000..e10eed68d70a
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Kconfig
> > @@ -0,0 +1,18 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config SENSORS_PECI_CPUTEMP
> > + tristate "PECI CPU temperature monitoring client"
> > + depends on PECI
> > + select SENSORS_PECI
> > + select PECI_CPU
> > + help
> > + If you say yes here you get support for the generic Intel PECI
> > + cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> > + readings of the CPU package and CPU cores that are accessible via
> > + the processor PECI interface.
> > +
> > + This driver can also be built as a module. If so, the module
> > + will be called peci-cputemp.
> > +
> > +config SENSORS_PECI
> > + tristate
> > diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> > new file mode 100644
> > index 000000000000..e8a0ada5ab1f
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +peci-cputemp-y := cputemp.o
> > +
> > +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> > diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> > new file mode 100644
> > index 000000000000..54580c100d06
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/common.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2021 Intel Corporation */
> > +
> > +#include <linux/types.h>
> > +
> > +#ifndef __PECI_HWMON_COMMON_H
> > +#define __PECI_HWMON_COMMON_H
> > +
> > +#define UPDATE_INTERVAL_DEFAULT HZ
> > +
> > +/**
> > + * struct peci_sensor_data - PECI sensor information
> > + * @valid: flag to indicate the sensor value is valid
> > + * @value: sensor value in milli units
> > + * @last_updated: time of the last update in jiffies
> > + */
> > +struct peci_sensor_data {
> > + unsigned int valid;
>
> Please use bool.
>
Ok.
> > + s32 value;
> > + unsigned long last_updated;
> > +};
> > +
> > +/**
> > + * peci_sensor_need_update() - check whether sensor update is needed or not
> > + * @sensor: pointer to sensor data struct
> > + *
> > + * Return: true if update is needed, false if not.
> > + */
> > +
> > +static inline bool peci_sensor_need_update(struct peci_sensor_data *sensor)
> > +{
> > + return !sensor->valid ||
> > + time_after(jiffies, sensor->last_updated + UPDATE_INTERVAL_DEFAULT);
>
>
> Since there is no other update interval, _DEFAULT does not have any value.
> Please drop. Also, please select a prefix such as PECI_.
>
Sure - I'll go with PECI_HWMON_UPDATE_INTERVAL.
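For later reference, the freshness check discussed above can be sketched in userspace C. This is only an illustration under assumptions: tick_after() is a hypothetical stand-in for the kernel's time_after(), the tick counter is passed in explicitly instead of reading jiffies, and the interval of 100 ticks is an arbitrary placeholder for HZ.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption: 100 ticks stands in for the kernel's HZ-based interval. */
#define PECI_HWMON_UPDATE_INTERVAL 100UL

struct sensor_data {
	bool valid;                 /* true once a value has been stored */
	long value;                 /* sensor value in milli units */
	unsigned long last_updated; /* tick count of the last update */
};

/*
 * Hypothetical userspace equivalent of time_after(a, b): true if tick a
 * is later than tick b, even if the counter wrapped in between.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* An update is needed if nothing is cached or the cached value is stale. */
static bool sensor_need_update(const struct sensor_data *s, unsigned long now)
{
	assert(s);
	return !s->valid ||
	       tick_after(now, s->last_updated + PECI_HWMON_UPDATE_INTERVAL);
}
```

The subtraction-then-cast form is what keeps the comparison correct when the tick counter wraps around, which a plain "now > deadline" would not.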
> > +}
> > +
> > +/**
> > + * peci_sensor_mark_updated() - mark the sensor is updated
> > + * @sensor: pointer to sensor data struct
> > + */
> > +static inline void peci_sensor_mark_updated(struct peci_sensor_data *sensor)
> > +{
> > + sensor->valid = 1;
>
> = true;
>
Ok.
> > + sensor->last_updated = jiffies;
> > +}
> > +
> > +#endif /* __PECI_HWMON_COMMON_H */
> > diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> > new file mode 100644
> > index 000000000000..56a526471687
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/cputemp.c
> > @@ -0,0 +1,503 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/units.h>
> > +#include <linux/x86/intel-family.h>
> > +
> > +#include "common.h"
> > +
> > +#define CORE_NUMS_MAX 64
> > +
> > +#define DEFAULT_CHANNEL_NUMS 5
> > +#define CORETEMP_CHANNEL_NUMS CORE_NUMS_MAX
> > +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
> > +
> > +#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
> > +#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
> > +#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
> > +
> > +#define DTS_MARGIN_MASK GENMASK(15, 0)
> > +#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
> > +
> > +#define DTS_FIXED_POINT_FRACTION 64
> > +
> > +struct resolved_cores_reg {
> > + u8 bus;
> > + u8 dev;
> > + u8 func;
> > + u8 offset;
> > +};
> > +
> > +struct cpu_info {
> > + struct resolved_cores_reg *reg;
> > + u8 min_peci_revision;
> > +};
> > +
> > +struct peci_cputemp {
> > + struct peci_device *peci_dev;
> > + struct device *dev;
> > + const char *name;
> > + const struct cpu_info *gen_info;
> > + struct {
> > + struct peci_sensor_data die;
> > + struct peci_sensor_data dts;
> > + struct peci_sensor_data tcontrol;
> > + struct peci_sensor_data tthrottle;
> > + struct peci_sensor_data tjmax;
> > + struct peci_sensor_data core[CORETEMP_CHANNEL_NUMS];
> > + } temp;
> > + const char **coretemp_label;
> > + DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> > +};
> > +
> > +enum cputemp_channels {
> > + channel_die,
> > + channel_dts,
> > + channel_tcontrol,
> > + channel_tthrottle,
> > + channel_tjmax,
> > + channel_core,
> > +};
> > +
> > +static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
> > + "Die",
> > + "DTS",
> > + "Tcontrol",
> > + "Tthrottle",
> > + "Tjmax",
> > +};
> > +
> > +static int get_temp_targets(struct peci_cputemp *priv)
> > +{
> > + s32 tthrottle_offset, tcontrol_margin;
> > + u32 pcs;
> > + int ret;
> > +
> > + /*
> > + * Just use only the tcontrol marker to determine if target values need
> > + * update.
> > + */
> > + if (!peci_sensor_need_update(&priv->temp.tcontrol))
> > + return 0;
> > +
> True for the entire code: Please explain how this avoids race conditions
> without locking between the condition check here and the call to
> peci_sensor_mark_updated() below. The explanation needs to be added
> as comment into the code for later reference.
>
You're right, there is a race here that may cause PECI command to be triggered
more than once. It doesn't have any impact on correctness though.
I could add a comment explaining that, but I guess just adding a mutex to avoid
the race makes more sense.
> > + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> > +
> > + tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> > + tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> > +
> > + tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> > +
> > + peci_sensor_mark_updated(&priv->temp.tcontrol);
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Processors return a value of DTS reading in S10.6 fixed point format
> > + * (sign, 10 bits signed integer value, 6 bits fractional).
> > + * Error codes:
> > + * 0x8000: General sensor error
> > + * 0x8001: Reserved
> > + * 0x8002: Underflow on reading value
> > + * 0x8003-0x81ff: Reserved
> > + */
> > +static bool dts_valid(s32 val)
> > +{
> > + return val < 0x8000 || val > 0x81ff;
> > +}
> > +
> > +static s32 dts_to_millidegree(s32 val)
> > +{
> > + return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
> > +}
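As a worked illustration of the S10.6 format described in the comment above, here is a minimal userspace sketch. The function names mirror the quoted patch, but taking the raw reading as uint16_t is an assumption of this sketch, made so that the error range 0x8000-0x81ff compares as an unsigned 16-bit value (a sign-extended s16 of 0x8000 would be negative and thus compare below 0x8000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MILLIDEGREE_PER_DEGREE   1000
#define DTS_FIXED_POINT_FRACTION 64   /* 6 fractional bits: 2^6 */

/* Raw values 0x8000-0x81ff are error codes, everything else is a reading. */
static bool dts_valid(uint16_t raw)
{
	return raw < 0x8000 || raw > 0x81ff;
}

/* Convert a valid S10.6 reading to millidegrees Celsius. */
static int32_t dts_to_millidegree(uint16_t raw)
{
	int32_t val;

	assert(dts_valid(raw));
	/* Sign-extend from 16 bits without relying on narrowing casts. */
	val = raw >= 0x8000 ? (int32_t)raw - 0x10000 : (int32_t)raw;

	return val * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
}
```

For example, a raw value of 0xfd80 is -640 in two's complement, i.e. -10 degrees relative to the reference; with a Tjmax of 100000 millidegrees this gives a die temperature of 90000 millidegrees.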
> > +
> > +static int get_die_temp(struct peci_cputemp *priv)
> > +{
> > + s16 temp;
> > + int ret;
> > +
> > + if (!peci_sensor_need_update(&priv->temp.die))
> > + return 0;
> > +
> > + ret = peci_temp_read(priv->peci_dev, &temp);
> > + if (ret)
> > + return ret;
> > +
> > + if (!dts_valid(temp))
> > + return -EIO;
> > +
> > + /* Note that the tjmax should be available before calling it */
> > + priv->temp.die.value = priv->temp.tjmax.value + dts_to_millidegree(temp);
> > +
> > + peci_sensor_mark_updated(&priv->temp.die);
> > +
> > + return 0;
> > +}
> > +
> > +static int get_dts(struct peci_cputemp *priv)
> > +{
> > + s32 dts_margin;
> > + u32 pcs;
> > + int ret;
> > +
> > + if (!peci_sensor_need_update(&priv->temp.dts))
> > + return 0;
> > +
> > + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
> > + if (ret)
> > + return ret;
> > +
> > + dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> > + if (!dts_valid(dts_margin))
> > + return -EIO;
> > +
> > + /* Note that the tcontrol should be available before calling it */
> > + priv->temp.dts.value = priv->temp.tcontrol.value - dts_to_millidegree(dts_margin);
> > +
> > + peci_sensor_mark_updated(&priv->temp.dts);
> > +
> > + return 0;
> > +}
> > +
> > +static int get_core_temp(struct peci_cputemp *priv, int core_index)
> > +{
> > + s32 core_dts_margin;
> > + u32 pcs;
> > + int ret;
> > +
> > + if (!peci_sensor_need_update(&priv->temp.core[core_index]))
> > + return 0;
> > +
> > + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
> > + if (ret)
> > + return ret;
> > +
> > + core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> > + if (!dts_valid(core_dts_margin))
> > + return -EIO;
> > +
> > + /* Note that the tjmax should be available before calling it */
> > + priv->temp.core[core_index].value =
> > + priv->temp.tjmax.value + dts_to_millidegree(core_dts_margin);
> > +
> > + peci_sensor_mark_updated(&priv->temp.core[core_index]);
> > +
> > + return 0;
> > +}
> > +
> > +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, const char **str)
> > +{
> > + struct peci_cputemp *priv = dev_get_drvdata(dev);
> > +
> > + if (attr != hwmon_temp_label)
> > + return -EOPNOTSUPP;
> > +
> > + *str = channel < channel_core ?
> > + cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
> > +
> > + return 0;
> > +}
> > +
> > +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct peci_cputemp *priv = dev_get_drvdata(dev);
> > + int ret, core_index;
> > +
> > + ret = get_temp_targets(priv);
> > + if (ret)
> > + return ret;
> > +
> > + switch (attr) {
> > + case hwmon_temp_input:
> > + switch (channel) {
> > + case channel_die:
> > + ret = get_die_temp(priv);
> > + if (ret)
> > + return ret;
> > +
> > + *val = priv->temp.die.value;
> > + break;
> > + case channel_dts:
> > + ret = get_dts(priv);
> > + if (ret)
> > + return ret;
> > +
> > + *val = priv->temp.dts.value;
> > + break;
> > + case channel_tcontrol:
> > + *val = priv->temp.tcontrol.value;
> > + break;
> > + case channel_tthrottle:
> > + *val = priv->temp.tthrottle.value;
> > + break;
> > + case channel_tjmax:
> > + *val = priv->temp.tjmax.value;
> > + break;
> > + default:
> > + core_index = channel - channel_core;
> > + ret = get_core_temp(priv, core_index);
> > + if (ret)
> > + return ret;
> > +
> > + *val = priv->temp.core[core_index].value;
> > + break;
> > + }
> > + break;
> > + case hwmon_temp_max:
> > + *val = priv->temp.tcontrol.value;
> > + break;
> > + case hwmon_temp_crit:
> > + *val = priv->temp.tjmax.value;
> > + break;
> > + case hwmon_temp_crit_hyst:
> > + *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
> > + u32 attr, int channel)
> > +{
> > + const struct peci_cputemp *priv = data;
> > +
> > + if (channel > CPUTEMP_CHANNEL_NUMS)
> > + return 0;
> > +
> > + if (channel < channel_core)
> > + return 0444;
> > +
> > + if (test_bit(channel - channel_core, priv->core_mask))
> > + return 0444;
> > +
> > + return 0;
> > +}
> > +
> > +static int init_core_mask(struct peci_cputemp *priv)
> > +{
> > + struct peci_device *peci_dev = priv->peci_dev;
> > + struct resolved_cores_reg *reg = priv->gen_info->reg;
> > + u64 core_mask;
> > + u32 data;
> > + int ret;
> > +
> > + /* Get the RESOLVED_CORES register value */
> > + switch (peci_dev->info.model) {
> > + case INTEL_FAM6_ICELAKE_X:
> > + case INTEL_FAM6_ICELAKE_D:
> > + ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> > + reg->func, reg->offset + 4, &data);
> > + if (ret)
> > + return ret;
> > +
> > + core_mask = (u64)data << 32;
> > +
> > + ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> > + reg->func, reg->offset, &data);
> > + if (ret)
> > + return ret;
> > +
> > + core_mask |= data;
> > +
> > + break;
> > + default:
> > + ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> > + reg->func, reg->offset, &data);
> > + if (ret)
> > + return ret;
> > +
> > + core_mask = data;
> > +
> > + break;
> > + }
> > +
> > + if (!core_mask)
> > + return -EIO;
> > +
> > + bitmap_from_u64(priv->core_mask, core_mask);
> > +
> > + return 0;
> > +}
> > +
> > +static int create_temp_label(struct peci_cputemp *priv)
> > +{
> > + unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
> > + int i;
> > +
> > + priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
> > + if (!priv->coretemp_label)
> > + return -ENOMEM;
> > +
> > + for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> > + priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
> > + if (!priv->coretemp_label[i])
> > + return -ENOMEM;
> > + }
> > +
> > + return 0;
> > +}
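To make the core mask handling above easier to follow, here is a small userspace sketch of how two 32-bit register reads combine into a 64-bit mask and how many label slots a per-core array would need. All helper names are hypothetical and not part of the patch. Note that an array indexed by core number up to the last set bit needs (last bit + 1) entries, which is what label_slots() returns; that sizing is an assumption of this sketch rather than what the quoted allocation does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the upper and lower 32-bit halves into one 64-bit core mask. */
static uint64_t make_core_mask(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Index of the highest set bit, or -1 if the mask is empty. */
static int last_set_bit(uint64_t mask)
{
	int i;

	for (i = 63; i >= 0; i--)
		if ((mask >> i) & 1)
			return i;

	return -1;
}

/* Number of entries needed for an array indexed directly by core number. */
static size_t label_slots(uint64_t mask)
{
	return (size_t)(last_set_bit(mask) + 1);
}

/* True if the given core is marked as resolved in the mask. */
static bool core_present(uint64_t mask, int core)
{
	assert(core >= 0 && core < 64);
	return (mask >> core) & 1;
}
```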
> > +
> > +static void check_resolved_cores(struct peci_cputemp *priv)
> > +{
> > + int ret;
> > +
> > + ret = init_core_mask(priv);
> > + if (ret)
> > + return;
> > +
> > + ret = create_temp_label(priv);
> > + if (ret)
> > + bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
>
> This needs a comment explaining why it is ok to ignore the above errors.
>
> I understand it is because the non-core data will still be available.
> Yet, it still needs to be explained so others don't need to examine
> the code to figure out the reason.
>
Right - I'll add a:
/*
* Failure to resolve cores is non-critical, we're still able to
* provide other sensor data.
*/
> > +}
> > +
> > +static const struct hwmon_ops peci_cputemp_ops = {
> > + .is_visible = cputemp_is_visible,
> > + .read_string = cputemp_read_string,
> > + .read = cputemp_read,
> > +};
> > +
> > +static const u32 peci_cputemp_temp_channel_config[] = {
> > + /* Die temperature */
> > + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> > + /* DTS margin */
> > + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> > + /* Tcontrol temperature */
> > + HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> > + /* Tthrottle temperature */
> > + HWMON_T_LABEL | HWMON_T_INPUT,
> > + /* Tjmax temperature */
> > + HWMON_T_LABEL | HWMON_T_INPUT,
> > + /* Core temperature - for all core channels */
> > + [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
> > + 0
> > +};
> > +
> > +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> > + .type = hwmon_temp,
> > + .config = peci_cputemp_temp_channel_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> > + &peci_cputemp_temp_channel,
> > + NULL
> > +};
> > +
> > +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> > + .ops = &peci_cputemp_ops,
> > + .info = peci_cputemp_info,
> > +};
> > +
> > +static int peci_cputemp_probe(struct auxiliary_device *adev,
> > + const struct auxiliary_device_id *id)
> > +{
> > + struct device *dev = &adev->dev;
> > + struct peci_device *peci_dev = to_peci_device(dev->parent);
> > + struct peci_cputemp *priv;
> > + struct device *hwmon_dev;
> > +
> > + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > + if (!priv)
> > + return -ENOMEM;
> > +
> > + priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> > + peci_dev->info.socket_id);
> > + if (!priv->name)
> > + return -ENOMEM;
> > +
> > + dev_set_drvdata(dev, priv);
>
> What is this used for ?
Our sensors are per-device. We need this to access the corresponding priv in
cputemp_read_string() and cputemp_read().
>
> > + priv->dev = dev;
> > + priv->peci_dev = peci_dev;
> > + priv->gen_info = (const struct cpu_info *)id->driver_data;
> > +
> > + check_resolved_cores(priv);
> > +
> > + hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
> > + priv, &peci_cputemp_chip_info, NULL);
> > +
> > + return PTR_ERR_OR_ZERO(hwmon_dev);
> > +}
> > +
> > +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> > + .bus = 1,
> > + .dev = 30,
> > + .func = 3,
> > + .offset = 0xb4,
> > +};
> > +
> > +static struct resolved_cores_reg resolved_cores_reg_icx = {
> > + .bus = 14,
> > + .dev = 30,
> > + .func = 3,
> > + .offset = 0xd0,
> > +};
>
> Please explain those magic numbers.
>
Those magic numbers refer to BDF (bus, device, function) and offset of the PCI
config register (RESOLVED_CORES_CFG) that can be accessed via PECI to read
resolved cores configuration.
Unfortunately, the values are just platform-specific magic numbers.
Do you think this should be explained with an additional comment?
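One possible way to make the values self-describing is to print or document them in the usual bus:device.function notation. The sketch below is purely illustrative: format_bdf() is a hypothetical helper, and the struct simply mirrors the fields from the quoted patch using userspace types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Location of a PCI config register: bus, device, function and offset. */
struct resolved_cores_reg {
	uint8_t bus;
	uint8_t dev;
	uint8_t func;
	uint8_t offset;
};

/*
 * Hypothetical helper: render the location as "bb:dd.f+0xoo" (all hex),
 * following the common bus:device.function notation.
 */
static int format_bdf(const struct resolved_cores_reg *reg, char *buf, size_t len)
{
	assert(reg && buf);
	return snprintf(buf, len, "%02x:%02x.%x+0x%02x",
			(unsigned int)reg->bus, (unsigned int)reg->dev,
			(unsigned int)reg->func, (unsigned int)reg->offset);
}
```

With the values from the patch, bus 1, device 30, function 3, offset 0xb4 would be rendered as "01:1e.3+0xb4".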
Thank you
-Iwona
> > +
> > +static const struct cpu_info cpu_hsx = {
> > + .reg = &resolved_cores_reg_hsx,
> > + .min_peci_revision = 0x30,
> > +};
> > +
> > +static const struct cpu_info cpu_icx = {
> > + .reg = &resolved_cores_reg_icx,
> > + .min_peci_revision = 0x40,
> > +};
> > +
> > +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> > + {
> > + .name = "peci_cpu.cputemp.hsx",
> > + .driver_data = (kernel_ulong_t)&cpu_hsx,
> > + },
> > + {
> > + .name = "peci_cpu.cputemp.bdx",
> > + .driver_data = (kernel_ulong_t)&cpu_hsx,
> > + },
> > + {
> > + .name = "peci_cpu.cputemp.bdxd",
> > + .driver_data = (kernel_ulong_t)&cpu_hsx,
> > + },
> > + {
> > + .name = "peci_cpu.cputemp.skx",
> > + .driver_data = (kernel_ulong_t)&cpu_hsx,
> > + },
> > + {
> > + .name = "peci_cpu.cputemp.icx",
> > + .driver_data = (kernel_ulong_t)&cpu_icx,
> > + },
> > + {
> > + .name = "peci_cpu.cputemp.icxd",
> > + .driver_data = (kernel_ulong_t)&cpu_icx,
> > + },
> > + { }
> > +};
> > +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> > +
> > +static struct auxiliary_driver peci_cputemp_driver = {
> > + .probe = peci_cputemp_probe,
> > + .id_table = peci_cputemp_ids,
> > +};
> > +
> > +module_auxiliary_driver(peci_cputemp_driver);
> > +
> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> > +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> > +MODULE_DESCRIPTION("PECI cputemp driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI_CPU);
On Mon, Jul 19, 2021 at 08:12:54PM +0000, Winiarska, Iwona wrote:
> > > +static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
> > > +       "Die",
> > > +       "DTS",
> > > +       "Tcontrol",
> > > +       "Tthrottle",
> > > +       "Tjmax",
> > > +};
> > > +
> > > +static int get_temp_targets(struct peci_cputemp *priv)
> > > +{
> > > +       s32 tthrottle_offset, tcontrol_margin;
> > > +       u32 pcs;
> > > +       int ret;
> > > +
> > > +       /*
> > > +        * Just use only the tcontrol marker to determine if target values need
> > > +        * update.
> > > +        */
> > > +       if (!peci_sensor_need_update(&priv->temp.tcontrol))
> > > +               return 0;
> > > +
> > True for the entire code: Please explain how this avoids race conditions
> > without locking between the condition check here and the call to
> > peci_sensor_mark_updated() below. The explanation needs to be added
> > as comment into the code for later reference.
> >
>
> You're right, there is a race here that may cause PECI command to be triggered
> more than once. It doesn't have any impact on correctness though.
That is only correct if multiple read operations of PECI_PCS_TEMP_TARGET
always return the same value. If so, reading those values multiple times
would not make sense. Instead, the values could be read once and cached.
If PECI_PCS_TEMP_TARGET can return different values each time it is called,
the lack of mutex protection could result in inconsistent values for
priv->temp.tjmax.value, priv->temp.tcontrol.value, and
priv->temp.tthrottle.value.
So either this needs a mutex, or the code should be changed to read the
static values only once.
You could instead add a comment stating that multiple unprotected reads
are redundant because the returned data is static, that parallel reads
are thus not racy, and that a mutex is therefore not needed, but I won't
accept such code.
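To illustrate the mutex option, here is a minimal userspace sketch using pthreads in place of the kernel mutex API. It is an assumption-laden illustration, not the driver code: struct cached_sensor, sensor_get() and fake_read_hw() are hypothetical names, and the tick count is passed in explicitly. The point is that the staleness check, the hardware read, the store and the "updated" marker all happen under one lock, so the stored value always belongs to the thread that marks it updated.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct cached_sensor {
	pthread_mutex_t lock;       /* protects valid, value, last_updated */
	bool valid;
	long value;                 /* milli units */
	unsigned long last_updated; /* tick count of the last refresh */
};

typedef int (*read_hw_fn)(long *out);

/* Hypothetical hardware read: counts calls and returns a changing value. */
static int fake_reads;
static int fake_read_hw(long *out)
{
	fake_reads++;
	*out = 42000 + fake_reads;
	return 0;
}

static void cached_sensor_init(struct cached_sensor *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->valid = false;
	s->value = 0;
	s->last_updated = 0;
}

/* Check, read, store and mark as updated as one critical section. */
static int sensor_get(struct cached_sensor *s, unsigned long now,
		      unsigned long interval, read_hw_fn read_hw, long *out)
{
	int ret = 0;

	assert(s && read_hw && out);

	pthread_mutex_lock(&s->lock);
	if (!s->valid || (long)(s->last_updated + interval - now) < 0) {
		long fresh;

		ret = read_hw(&fresh);
		if (ret)
			goto unlock;

		s->value = fresh;
		s->valid = true;
		s->last_updated = now;
	}
	*out = s->value;
unlock:
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Returning the value through an out parameter while still holding the lock also avoids a second unprotected read of the cached field by the caller.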
> I could add a comment explaining that, but I guess just adding a mutex to avoid
> the race makes more sense.
>
> > > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> > > +
> > > +       tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> > > +       tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
> > > +       priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> > > +
> > > +       tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> > > +       priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> > > +
> > > +       peci_sensor_mark_updated(&priv->temp.tcontrol);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +/*
> > > + * Processors return a value of DTS reading in S10.6 fixed point format
> > > + * (sign, 10 bits signed integer value, 6 bits fractional).
> > > + * Error codes:
> > > + *   0x8000: General sensor error
> > > + *   0x8001: Reserved
> > > + *   0x8002: Underflow on reading value
> > > + *   0x8003-0x81ff: Reserved
> > > + */
> > > +static bool dts_valid(s32 val)
> > > +{
> > > +       return val < 0x8000 || val > 0x81ff;
> > > +}
> > > +
> > > +static s32 dts_to_millidegree(s32 val)
> > > +{
> > > +       return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
> > > +}
> > > +
> > > +static int get_die_temp(struct peci_cputemp *priv)
> > > +{
> > > +       s16 temp;
> > > +       int ret;
> > > +
> > > +       if (!peci_sensor_need_update(&priv->temp.die))
> > > +               return 0;
> > > +
> > > +       ret = peci_temp_read(priv->peci_dev, &temp);
> > > +       if (ret)
> > > +               return ret;
> > > +
> > > +       if (!dts_valid(temp))
> > > +               return -EIO;
> > > +
> > > +       /* Note that the tjmax should be available before calling it */
> > > +       priv->temp.die.value = priv->temp.tjmax.value + dts_to_millidegree(temp);
> > > +
> > > +       peci_sensor_mark_updated(&priv->temp.die);

The same is true here: Either the value returned from peci_temp_read()
is static (which seems unlikely), or there is a race between reading
the temperature, storing it in priv->temp.die.value, and setting the
updated flag. With the current code, there is no guarantee that the
stored value is the value that was read by the thread that sets the
updated flag. One could argue that it doesn't really matter because it
is irrelevant which thread stores the temperature and which thread sets
the updated flag, but that is really bad coding style, and I won't
accept it.

This is true for all code which reads a value from the chip, stores
it locally, and then sets the updated flag.
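
To make the concern concrete, here is a rough userspace sketch of the
read/store/mark sequence done under one lock. It is not driver code:
struct cached_sensor, cached_sensor_get(), demo_read_hw() and the tick
interval are all made up for illustration, and a kernel version would
presumably use a struct mutex and jiffies instead of pthreads and a
caller-supplied timestamp.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a cached sensor reading. */
struct cached_sensor {
	pthread_mutex_t lock;	/* serializes read + store + mark-updated */
	bool valid;		/* has a value been stored yet? */
	long last_updated;	/* time of the last store, in ticks */
	long value;		/* cached reading */
};

#define UPDATE_INTERVAL_TICKS 1000	/* illustrative refresh interval */

static void cached_sensor_init(struct cached_sensor *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->valid = false;
	s->last_updated = 0;
	s->value = 0;
}

/*
 * Return the cached value through 'out', refreshing it via read_hw()
 * when it is missing or stale. The staleness check, the hardware read,
 * the store and the "updated" bookkeeping all happen under the same
 * lock, so the stored value always belongs to the thread that marked
 * the sensor as updated.
 */
static int cached_sensor_get(struct cached_sensor *s, long now,
			     int (*read_hw)(long *raw), long *out)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);

	if (!s->valid || now - s->last_updated >= UPDATE_INTERVAL_TICKS) {
		long raw;

		ret = read_hw(&raw);
		if (ret)
			goto unlock;

		s->value = raw;
		s->last_updated = now;
		s->valid = true;
	}

	*out = s->value;

unlock:
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Demo-only fake hardware: each call returns the next multiple of 1000. */
static long demo_reads;

static int demo_read_hw(long *raw)
{
	*raw = ++demo_reads * 1000;
	return 0;
}
```

With this shape, two readers arriving within the same interval result
in a single hardware access, and neither can observe a value stored by
one thread paired with a timestamp written by another.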
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static int get_dts(struct peci_cputemp *priv)
> > > +{
> > > +???????s32 dts_margin;
> > > +???????u32 pcs;
> > > +???????int ret;
> > > +
> > > +???????if (!peci_sensor_need_update(&priv->temp.dts))
> > > +???????????????return 0;
> > > +
> > > +???????ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
> > > +???????if (ret)
> > > +???????????????return ret;
> > > +
> > > +???????dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> > > +???????if (!dts_valid(dts_margin))
> > > +???????????????return -EIO;
> > > +
> > > +???????/* Note that the tcontrol should be available before calling it */
> > > +???????priv->temp.dts.value = priv->temp.tcontrol.value -
> > > dts_to_millidegree(dts_margin);
> > > +
> > > +???????peci_sensor_mark_updated(&priv->temp.dts);
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static int get_core_temp(struct peci_cputemp *priv, int core_index)
> > > +{
> > > +???????s32 core_dts_margin;
> > > +???????u32 pcs;
> > > +???????int ret;
> > > +
> > > +???????if (!peci_sensor_need_update(&priv->temp.core[core_index]))
> > > +???????????????return 0;
> > > +
> > > +???????ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index,
> > > &pcs);
> > > +???????if (ret)
> > > +???????????????return ret;
> > > +
> > > +???????core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> > > +???????if (!dts_valid(core_dts_margin))
> > > +???????????????return -EIO;
> > > +
> > > +???????/* Note that the tjmax should be available before calling it */
> > > +???????priv->temp.core[core_index].value =
> > > +???????????????priv->temp.tjmax.value + dts_to_millidegree(core_dts_margin);
> > > +
> > > +???????peci_sensor_mark_updated(&priv->temp.core[core_index]);
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types
> > > type,
> > > +????????????????????????????? u32 attr, int channel, const char **str)
> > > +{
> > > +???????struct peci_cputemp *priv = dev_get_drvdata(dev);
> > > +
> > > +???????if (attr != hwmon_temp_label)
> > > +???????????????return -EOPNOTSUPP;
> > > +
> > > +???????*str = channel < channel_core ?
> > > +???????????????cputemp_label[channel] : priv->coretemp_label[channel -
> > > channel_core];
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> > > +???????????????????????u32 attr, int channel, long *val)
> > > +{
> > > +???????struct peci_cputemp *priv = dev_get_drvdata(dev);
> > > +???????int ret, core_index;
> > > +
> > > +???????ret = get_temp_targets(priv);
> > > +???????if (ret)
> > > +???????????????return ret;
> > > +
> > > +???????switch (attr) {
> > > +???????case hwmon_temp_input:
> > > +???????????????switch (channel) {
> > > +???????????????case channel_die:
> > > +???????????????????????ret = get_die_temp(priv);
> > > +???????????????????????if (ret)
> > > +???????????????????????????????return ret;
> > > +
> > > +???????????????????????*val = priv->temp.die.value;
> > > +???????????????????????break;
> > > +???????????????case channel_dts:
> > > +???????????????????????ret = get_dts(priv);
> > > +???????????????????????if (ret)
> > > +???????????????????????????????return ret;
> > > +
> > > +???????????????????????*val = priv->temp.dts.value;
> > > +???????????????????????break;
> > > +???????????????case channel_tcontrol:
> > > +???????????????????????*val = priv->temp.tcontrol.value;
> > > +???????????????????????break;
> > > +???????????????case channel_tthrottle:
> > > +???????????????????????*val = priv->temp.tthrottle.value;
> > > +???????????????????????break;
> > > +???????????????case channel_tjmax:
> > > +???????????????????????*val = priv->temp.tjmax.value;
> > > +???????????????????????break;
> > > +???????????????default:
> > > +???????????????????????core_index = channel - channel_core;
> > > +???????????????????????ret = get_core_temp(priv, core_index);
> > > +???????????????????????if (ret)
> > > +???????????????????????????????return ret;
> > > +
> > > +???????????????????????*val = priv->temp.core[core_index].value;
> > > +???????????????????????break;
> > > +???????????????}
> > > +???????????????break;
> > > +???????case hwmon_temp_max:
> > > +???????????????*val = priv->temp.tcontrol.value;
> > > +???????????????break;
> > > +???????case hwmon_temp_crit:
> > > +???????????????*val = priv->temp.tjmax.value;
> > > +???????????????break;
> > > +???????case hwmon_temp_crit_hyst:
> > > +???????????????*val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
> > > +???????????????break;
> > > +???????default:
> > > +???????????????return -EOPNOTSUPP;
> > > +???????}
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types
> > > type,
> > > +???????????????????????????????? u32 attr, int channel)
> > > +{
> > > +???????const struct peci_cputemp *priv = data;
> > > +
> > > +???????if (channel > CPUTEMP_CHANNEL_NUMS)
> > > +???????????????return 0;
> > > +
> > > +???????if (channel < channel_core)
> > > +???????????????return 0444;
> > > +
> > > +???????if (test_bit(channel - channel_core, priv->core_mask))
> > > +???????????????return 0444;
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static int init_core_mask(struct peci_cputemp *priv)
> > > +{
> > > +???????struct peci_device *peci_dev = priv->peci_dev;
> > > +???????struct resolved_cores_reg *reg = priv->gen_info->reg;
> > > +???????u64 core_mask;
> > > +???????u32 data;
> > > +???????int ret;
> > > +
> > > +???????/* Get the RESOLVED_CORES register value */
> > > +???????switch (peci_dev->info.model) {
> > > +???????case INTEL_FAM6_ICELAKE_X:
> > > +???????case INTEL_FAM6_ICELAKE_D:
> > > +???????????????ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> > > +??????????????????????????????????????????? reg->func, reg->offset + 4,
> > > &data);
> > > +???????????????if (ret)
> > > +???????????????????????return ret;
> > > +
> > > +???????????????core_mask = (u64)data << 32;
> > > +
> > > +???????????????ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> > > +??????????????????????????????????????????? reg->func, reg->offset, &data);
> > > +???????????????if (ret)
> > > +???????????????????????return ret;
> > > +
> > > +???????????????core_mask |= data;
> > > +
> > > +???????????????break;
> > > +???????default:
> > > +???????????????ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> > > +???????????????????????????????????????? reg->func, reg->offset, &data);
> > > +???????????????if (ret)
> > > +???????????????????????return ret;
> > > +
> > > +???????????????core_mask = data;
> > > +
> > > +???????????????break;
> > > +???????}
> > > +
> > > +???????if (!core_mask)
> > > +???????????????return -EIO;
> > > +
> > > +???????bitmap_from_u64(priv->core_mask, core_mask);
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static int create_temp_label(struct peci_cputemp *priv)
> > > +{
> > > +???????unsigned long core_max = find_last_bit(priv->core_mask,
> > > CORE_NUMS_MAX);
> > > +???????int i;
> > > +
> > > +???????priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char
> > > *), GFP_KERNEL);
> > > +???????if (!priv->coretemp_label)
> > > +???????????????return -ENOMEM;
> > > +
> > > +???????for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> > > +???????????????priv->coretemp_label[i] = devm_kasprintf(priv->dev,
> > > GFP_KERNEL, "Core %d", i);
> > > +???????????????if (!priv->coretemp_label[i])
> > > +???????????????????????return -ENOMEM;
> > > +???????}
> > > +
> > > +???????return 0;
> > > +}
> > > +
> > > +static void check_resolved_cores(struct peci_cputemp *priv)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = init_core_mask(priv);
> > > +	if (ret)
> > > +		return;
> > > +
> > > +	ret = create_temp_label(priv);
> > > +	if (ret)
> > > +		bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
> >
> > This needs a comment explaining why it is ok to ignore the above errors.
> >
> > I understand it is because the non-core data will still be available.
> > Yet, it still needs to be explained so others don't need to examine
> > the code to figure out the reason.
> >
>
> Right - I'll add a:
> /*
> * Failure to resolve cores is non-critical, we're still able to
> * provide other sensor data.
> */
>
> > > +}
> > > +
> > > +static const struct hwmon_ops peci_cputemp_ops = {
> > > +???????.is_visible = cputemp_is_visible,
> > > +???????.read_string = cputemp_read_string,
> > > +???????.read = cputemp_read,
> > > +};
> > > +
> > > +static const u32 peci_cputemp_temp_channel_config[] = {
> > > +???????/* Die temperature */
> > > +???????HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> > > HWMON_T_CRIT_HYST,
> > > +???????/* DTS margin */
> > > +???????HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> > > HWMON_T_CRIT_HYST,
> > > +???????/* Tcontrol temperature */
> > > +???????HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> > > +???????/* Tthrottle temperature */
> > > +???????HWMON_T_LABEL | HWMON_T_INPUT,
> > > +???????/* Tjmax temperature */
> > > +???????HWMON_T_LABEL | HWMON_T_INPUT,
> > > +???????/* Core temperature - for all core channels */
> > > +???????[channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL |
> > > HWMON_T_INPUT,
> > > +???????0
> > > +};
> > > +
> > > +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> > > +???????.type = hwmon_temp,
> > > +???????.config = peci_cputemp_temp_channel_config,
> > > +};
> > > +
> > > +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> > > +???????&peci_cputemp_temp_channel,
> > > +???????NULL
> > > +};
> > > +
> > > +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> > > +???????.ops = &peci_cputemp_ops,
> > > +???????.info = peci_cputemp_info,
> > > +};
> > > +
> > > +static int peci_cputemp_probe(struct auxiliary_device *adev,
> > > +			      const struct auxiliary_device_id *id)
> > > +{
> > > +	struct device *dev = &adev->dev;
> > > +	struct peci_device *peci_dev = to_peci_device(dev->parent);
> > > +	struct peci_cputemp *priv;
> > > +	struct device *hwmon_dev;
> > > +
> > > +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > > +	if (!priv)
> > > +		return -ENOMEM;
> > > +
> > > +	priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> > > +				    peci_dev->info.socket_id);
> > > +	if (!priv->name)
> > > +		return -ENOMEM;
> > > +
> > > +	dev_set_drvdata(dev, priv);
> >
> > What is this used for ?
>
> Our sensors are per-device. We need this to access the corresponding priv in
> cputemp_read_string() and cputemp_read().
>
The parameter to both cputemp_read_string() and cputemp_read() is the
pointer to the hwmon device, not the pointer to the auxiliary device.
It has its driver data set to 'priv' from the parameter passed to
devm_hwmon_device_register_with_info().
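
A toy model of that relationship, in case it helps. Nothing below is
kernel API: struct toy_device and both helpers are invented stand-ins
that only mimic the one property discussed here, namely that the hwmon
device carries its own driver data and that the callbacks are handed
the hwmon device rather than its parent.

```c
#include <stddef.h>

/* Toy stand-in for struct device: only the fields needed here. */
struct toy_device {
	struct toy_device *parent;
	void *drvdata;
};

/*
 * Models the relevant effect of registering a hwmon device with a
 * drvdata argument: the new device is a child of 'parent' and stores
 * 'drvdata' as its own driver data.
 */
static void toy_hwmon_register(struct toy_device *hwmon,
			       struct toy_device *parent, void *drvdata)
{
	hwmon->parent = parent;
	hwmon->drvdata = drvdata;
}

/* Models a read callback: 'dev' is the hwmon device, not its parent. */
static void *toy_read_callback_priv(const struct toy_device *dev)
{
	return dev->drvdata;
}
```

In this model the callback finds 'priv' even when the parent's driver
data was never set, which is why the dev_set_drvdata() call is not
needed for the two read functions.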
> >
> > > +	priv->dev = dev;
> > > +	priv->peci_dev = peci_dev;
> > > +	priv->gen_info = (const struct cpu_info *)id->driver_data;
> > > +
> > > +	check_resolved_cores(priv);
> > > +
> > > +	hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
> > > +							 priv, &peci_cputemp_chip_info, NULL);
> > > +
> > > +	return PTR_ERR_OR_ZERO(hwmon_dev);
> > > +}
> > > +
> > > +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> > > +	.bus = 1,
> > > +	.dev = 30,
> > > +	.func = 3,
> > > +	.offset = 0xb4,
> > > +};
> > > +
> > > +static struct resolved_cores_reg resolved_cores_reg_icx = {
> > > +	.bus = 14,
> > > +	.dev = 30,
> > > +	.func = 3,
> > > +	.offset = 0xd0,
> > > +};
> >
> > Please explain those magic numbers.
> >
>
> Those magic numbers refer to BDF (bus, device, function) and offset of the PCI
> config register (RESOLVED_CORES_CFG) that can be accessed via PECI to read
> resolved cores configuration.
> Unfortunately, the values are just platform-specific magic numbers.
> Do you think this should be explained with an additional comment?
>
Yes, please.
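
Something along these lines would do. This is only a sketch of how the
comments could read; the wording is based on the explanation above, the
values are the ones from the patch, and the field types are an
assumption since the struct definition is not part of this hunk.

```c
#include <stdint.h>

/*
 * Location of the RESOLVED_CORES_CFG PCI config register, which is read
 * via PECI to obtain the resolved cores configuration. The bus, device,
 * function (BDF) and offset are platform-specific constants.
 */
struct resolved_cores_reg {
	uint8_t bus;	/* field widths are assumed for this sketch */
	uint8_t dev;
	uint8_t func;
	uint8_t offset;
};

/* Used by the hsx, bdx, bdxd and skx entries: B1:D30:F3, offset 0xb4 */
static const struct resolved_cores_reg resolved_cores_reg_hsx = {
	.bus = 1,
	.dev = 30,
	.func = 3,
	.offset = 0xb4,
};

/* Used by the icx and icxd entries: B14:D30:F3, offset 0xd0 */
static const struct resolved_cores_reg resolved_cores_reg_icx = {
	.bus = 14,
	.dev = 30,
	.func = 3,
	.offset = 0xd0,
};
```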
Guenter
> Thank you
> -Iwona
>
> > > +
> > > +static const struct cpu_info cpu_hsx = {
> > > +???????.reg????????????= &resolved_cores_reg_hsx,
> > > +???????.min_peci_revision = 0x30,
> > > +};
> > > +
> > > +static const struct cpu_info cpu_icx = {
> > > +???????.reg????????????= &resolved_cores_reg_icx,
> > > +???????.min_peci_revision = 0x40,
> > > +};
> > > +
> > > +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> > > +???????{
> > > +???????????????.name = "peci_cpu.cputemp.hsx",
> > > +???????????????.driver_data = (kernel_ulong_t)&cpu_hsx,
> > > +???????},
> > > +???????{
> > > +???????????????.name = "peci_cpu.cputemp.bdx",
> > > +???????????????.driver_data = (kernel_ulong_t)&cpu_hsx,
> > > +???????},
> > > +???????{
> > > +???????????????.name = "peci_cpu.cputemp.bdxd",
> > > +???????????????.driver_data = (kernel_ulong_t)&cpu_hsx,
> > > +???????},
> > > +???????{
> > > +???????????????.name = "peci_cpu.cputemp.skx",
> > > +???????????????.driver_data = (kernel_ulong_t)&cpu_hsx,
> > > +???????},
> > > +???????{
> > > +???????????????.name = "peci_cpu.cputemp.icx",
> > > +???????????????.driver_data = (kernel_ulong_t)&cpu_icx,
> > > +???????},
> > > +???????{
> > > +???????????????.name = "peci_cpu.cputemp.icxd",
> > > +???????????????.driver_data = (kernel_ulong_t)&cpu_icx,
> > > +???????},
> > > +???????{ }
> > > +};
> > > +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> > > +
> > > +static struct auxiliary_driver peci_cputemp_driver = {
> > > +???????.probe??????????= peci_cputemp_probe,
> > > +???????.id_table???????= peci_cputemp_ids,
> > > +};
> > > +
> > > +module_auxiliary_driver(peci_cputemp_driver);
> > > +
> > > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> > > +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> > > +MODULE_DESCRIPTION("PECI cputemp driver");
> > > +MODULE_LICENSE("GPL");
> > > +MODULE_IMPORT_NS(PECI_CPU);
>
On 7/19/21 1:31 PM, Winiarska, Iwona wrote:
> On Thu, 2021-07-15 at 10:56 -0700, Guenter Roeck wrote:
>> On Tue, Jul 13, 2021 at 12:04:45AM +0200, Iwona Winiarska wrote:
>>> Add peci-dimmtemp driver for Digital Thermal Sensor (DTS) thermal
>>> readings of DIMMs that are accessible via the processor PECI interface.
>>>
>>> The main use case for the driver (and PECI interface) is out-of-band
>>> management, where we're able to obtain the DTS readings from an external
>>> entity connected with PECI, e.g. BMC on server platforms.
>>>
>>> Co-developed-by: Jae Hyun Yoo <[email protected]>
>>> Signed-off-by: Jae Hyun Yoo <[email protected]>
>>> Signed-off-by: Iwona Winiarska <[email protected]>
>>> Reviewed-by: Pierre-Louis Bossart <[email protected]>
>>> ---
>>> drivers/hwmon/peci/Kconfig | 13 +
>>> drivers/hwmon/peci/Makefile | 2 +
>>> drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++++++++++++++++++
>>> 3 files changed, 523 insertions(+)
>>> create mode 100644 drivers/hwmon/peci/dimmtemp.c
>>>
>>> diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
>>> index e10eed68d70a..f2d57efa508b 100644
>>> --- a/drivers/hwmon/peci/Kconfig
>>> +++ b/drivers/hwmon/peci/Kconfig
>>> @@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
>>> This driver can also be built as a module. If so, the module
>>> will be called peci-cputemp.
>>>
>>> +config SENSORS_PECI_DIMMTEMP
>>> + tristate "PECI DIMM temperature monitoring client"
>>> + depends on PECI
>>> + select SENSORS_PECI
>>> + select PECI_CPU
>>> + help
>>> + If you say yes here you get support for the generic Intel PECI hwmon
>>> + driver which provides Digital Thermal Sensor (DTS) thermal readings of
>>> + DIMM components that are accessible via the processor PECI interface.
>>> +
>>> + This driver can also be built as a module. If so, the module
>>> + will be called peci-dimmtemp.
>>> +
>>> config SENSORS_PECI
>>> tristate
>>> diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
>>> index e8a0ada5ab1f..191cfa0227f3 100644
>>> --- a/drivers/hwmon/peci/Makefile
>>> +++ b/drivers/hwmon/peci/Makefile
>>> @@ -1,5 +1,7 @@
>>> # SPDX-License-Identifier: GPL-2.0-only
>>>
>>> peci-cputemp-y := cputemp.o
>>> +peci-dimmtemp-y := dimmtemp.o
>>>
>>> obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
>>> +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
>>> diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
>>> new file mode 100644
>>> index 000000000000..2fcb8607137a
>>> --- /dev/null
>>> +++ b/drivers/hwmon/peci/dimmtemp.c
>>> @@ -0,0 +1,508 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +// Copyright (c) 2018-2021 Intel Corporation
>>> +
>>> +#include <linux/auxiliary_bus.h>
>>> +#include <linux/bitfield.h>
>>> +#include <linux/bitops.h>
>>> +#include <linux/hwmon.h>
>>> +#include <linux/jiffies.h>
>>> +#include <linux/module.h>
>>> +#include <linux/peci.h>
>>> +#include <linux/peci-cpu.h>
>>> +#include <linux/units.h>
>>> +#include <linux/workqueue.h>
>>> +#include <linux/x86/intel-family.h>
>>> +
>>> +#include "common.h"
>>> +
>>> +#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
>>> +#define DIMM_MASK_CHECK_RETRY_MAX 60 /* 60 x 5 secs = 5 minutes */
>>> +
>>> +/* Max number of channel ranks and DIMM index per channel */
>>> +#define CHAN_RANK_MAX_ON_HSX 8
>>> +#define DIMM_IDX_MAX_ON_HSX 3
>>> +#define CHAN_RANK_MAX_ON_BDX 4
>>> +#define DIMM_IDX_MAX_ON_BDX 3
>>> +#define CHAN_RANK_MAX_ON_BDXD 2
>>> +#define DIMM_IDX_MAX_ON_BDXD 2
>>> +#define CHAN_RANK_MAX_ON_SKX 6
>>> +#define DIMM_IDX_MAX_ON_SKX 2
>>> +#define CHAN_RANK_MAX_ON_ICX 8
>>> +#define DIMM_IDX_MAX_ON_ICX 2
>>> +#define CHAN_RANK_MAX_ON_ICXD 4
>>> +#define DIMM_IDX_MAX_ON_ICXD 2
>>> +
>>> +#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
>>> +#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
>>> +#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
>>> +
>>> +#define CPU_SEG_MASK GENMASK(23, 16)
>>> +#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
>>> +#define CPU_BUS_MASK GENMASK(7, 0)
>>> +#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
>>> +
>>> +#define DIMM_TEMP_MAX GENMASK(15, 8)
>>> +#define DIMM_TEMP_CRIT GENMASK(23, 16)
>>> +#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
>>> +#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
>>> +
>>> +struct dimm_info {
>>> + int chan_rank_max;
>>> + int dimm_idx_max;
>>> + u8 min_peci_revision;
>>> +};
>>> +
>>> +struct peci_dimmtemp {
>>> + struct peci_device *peci_dev;
>>> + struct device *dev;
>>> + const char *name;
>>> + const struct dimm_info *gen_info;
>>> + struct delayed_work detect_work;
>>> + struct peci_sensor_data temp[DIMM_NUMS_MAX];
>>> + long temp_max[DIMM_NUMS_MAX];
>>> + long temp_crit[DIMM_NUMS_MAX];
>>> + int retry_count;
>>> + char **dimmtemp_label;
>>> + DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
>>> +};
>>> +
>>> +static u8 __dimm_temp(u32 reg, int dimm_order)
>>> +{
>>> + return (reg >> (dimm_order * 8)) & 0xff;
>>> +}
>>> +
>>> +static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
>>> +{
>>> + int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
>>> + int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
>>> + struct peci_device *peci_dev = priv->peci_dev;
>>> + u8 cpu_seg, cpu_bus, dev, func;
>>> + u64 offset;
>>> + u32 data;
>>> + u16 reg;
>>> + int ret;
>>> +
>>> + if (!peci_sensor_need_update(&priv->temp[dimm_no]))
>>> + return 0;
>>> +
>>> + ret = peci_pcs_read(peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank,
>>> &data);
>>> + if (ret)
>>> + return ret;
>>> +
>>
>> Similar to the cpu driver, the lack of mutex protection needs to be explained.
>>
>
> Sure, it will be consistent for the two drivers.
>
>>> + priv->temp[dimm_no].value = __dimm_temp(data, dimm_order) *
>>> MILLIDEGREE_PER_DEGREE;
>>> +
>>> + switch (peci_dev->info.model) {
>>> + case INTEL_FAM6_ICELAKE_X:
>>> + case INTEL_FAM6_ICELAKE_D:
>>> + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd4,
>>> &data);
>>> + if (ret || !(data & BIT(31)))
>>> + break; /* Use default or previous value */
>>> +
>>> + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd0,
>>> &data);
>>> + if (ret)
>>> + break; /* Use default or previous value */
>>> +
>>> + cpu_seg = GET_CPU_SEG(data);
>>> + cpu_bus = GET_CPU_BUS(data);
>>> +
>>> + /*
>>> + * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
>>> + * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
>>> + * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
>>> + * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
>>> + * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
>>> + * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
>>> + * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
>>> + * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
>>> + */
>>> + dev = 0x1a + chan_rank / 2;
>>> + offset = 0x224e0 + dimm_order * 4;
>>> + if (chan_rank % 2)
>>> + offset += 0x4000;
>>> +
>>> + ret = peci_mmio_read(peci_dev, 0, cpu_seg, cpu_bus, dev, 0,
>>> offset, &data);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> +
>>> + break;
>>> + case INTEL_FAM6_SKYLAKE_X:
>>> + /*
>>> + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
>>> + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
>>> + * Device 11, Function 2: IMC 0 channel 2 -> rank 2
>>> + * Device 12, Function 2: IMC 1 channel 0 -> rank 3
>>> + * Device 12, Function 6: IMC 1 channel 1 -> rank 4
>>> + * Device 13, Function 2: IMC 1 channel 2 -> rank 5
>>> + */
>>> + dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
>>> + func = chan_rank % 3 == 1 ? 6 : 2;
>>> + reg = 0x120 + dimm_order * 4;
>>> +
>>> + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> +
>>> + break;
>>> + case INTEL_FAM6_BROADWELL_D:
>>> + /*
>>> + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
>>> + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
>>> + * Device 12, Function 2: IMC 1 channel 0 -> rank 2
>>> + * Device 12, Function 6: IMC 1 channel 1 -> rank 3
>>> + */
>>> + dev = 10 + chan_rank / 2 * 2;
>>> + func = (chan_rank % 2) ? 6 : 2;
>>> + reg = 0x120 + dimm_order * 4;
>>> +
>>> + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> +
>>> + break;
>>> + case INTEL_FAM6_HASWELL_X:
>>> + case INTEL_FAM6_BROADWELL_X:
>>> + /*
>>> + * Device 20, Function 0: IMC 0 channel 0 -> rank 0
>>> + * Device 20, Function 1: IMC 0 channel 1 -> rank 1
>>> + * Device 21, Function 0: IMC 0 channel 2 -> rank 2
>>> + * Device 21, Function 1: IMC 0 channel 3 -> rank 3
>>> + * Device 23, Function 0: IMC 1 channel 0 -> rank 4
>>> + * Device 23, Function 1: IMC 1 channel 1 -> rank 5
>>> + * Device 24, Function 0: IMC 1 channel 2 -> rank 6
>>> + * Device 24, Function 1: IMC 1 channel 3 -> rank 7
>>> + */
>>> + dev = 20 + chan_rank / 2 + chan_rank / 4;
>>> + func = chan_rank % 2;
>>> + reg = 0x120 + dimm_order * 4;
>>> +
>>> + ret = peci_pci_local_read(peci_dev, 1, dev, func, reg, &data);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
>>> MILLIDEGREE_PER_DEGREE;
>>> +
>>> + break;
>>> + default:
>>> + return -EOPNOTSUPP;
>>> + }
>>> +
>>> + peci_sensor_mark_updated(&priv->temp[dimm_no]);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int dimmtemp_read_string(struct device *dev,
>>> + enum hwmon_sensor_types type,
>>> + u32 attr, int channel, const char **str)
>>> +{
>>> + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>>> +
>>> + if (attr != hwmon_temp_label)
>>> + return -EOPNOTSUPP;
>>> +
>>> + *str = (const char *)priv->dimmtemp_label[channel];
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
>>> + u32 attr, int channel, long *val)
>>> +{
>>> + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>>> + int ret;
>>> +
>>> + ret = get_dimm_temp(priv, channel);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + switch (attr) {
>>> + case hwmon_temp_input:
>>> + *val = priv->temp[channel].value;
>>> + break;
>>> + case hwmon_temp_max:
>>> + *val = priv->temp_max[channel];
>>> + break;
>>> + case hwmon_temp_crit:
>>> + *val = priv->temp_crit[channel];
>>> + break;
>>> + default:
>>> + return -EOPNOTSUPP;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types
>>> type,
>>> + u32 attr, int channel)
>>> +{
>>> + const struct peci_dimmtemp *priv = data;
>>> +
>>> + if (test_bit(channel, priv->dimm_mask))
>>> + return 0444;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static const struct hwmon_ops peci_dimmtemp_ops = {
>>> + .is_visible = dimmtemp_is_visible,
>>> + .read_string = dimmtemp_read_string,
>>> + .read = dimmtemp_read,
>>> +};
>>> +
>>> +static int check_populated_dimms(struct peci_dimmtemp *priv)
>>> +{
>>> + int chan_rank_max = priv->gen_info->chan_rank_max;
>>> + int dimm_idx_max = priv->gen_info->dimm_idx_max;
>>> + int chan_rank, dimm_idx, ret;
>>> + u64 dimm_mask = 0;
>>> + u32 pcs;
>>> +
>>> + for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
>>> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP,
>>> chan_rank, &pcs);
>>> + if (ret) {
>>> + /*
>>> + * Overall, we expect either success or -EINVAL in
>>> + * order to determine whether DIMM is populated or not.
>>> + * For anything else - we fall back to deferring the
>>> + * detection to be performed at a later point in time.
>>> + */
>>> + if (ret == -EINVAL)
>>> + continue;
>>> + else
>>
>> else after continue is unnecessary.
>>
>
> Ok.
>
>>> + return -EAGAIN;
>>> + }
>>> +
>>> + for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
>>> + if (__dimm_temp(pcs, dimm_idx))
>>> + dimm_mask |= BIT(chan_rank * dimm_idx_max +
>>> dimm_idx);
>>> + }
>>> + /*
>>> + * It's possible that memory training is not done yet. In this case we
>>> + * defer the detection to be performed at a later point in time.
>>> + */
>>> + if (!dimm_mask)
>>> + return -EAGAIN;
>>> +
>>> + dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
>>> +
>>> + bitmap_from_u64(priv->dimm_mask, dimm_mask);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
>>> +{
>>> + int rank = chan / priv->gen_info->dimm_idx_max;
>>> + int idx = chan % priv->gen_info->dimm_idx_max;
>>> +
>>> + priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
>>> + "DIMM %c%d", 'A' + rank,
>>> + idx + 1);
>>> + if (!priv->dimmtemp_label[chan])
>>> + return -ENOMEM;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static const u32 peci_dimmtemp_temp_channel_config[] = {
>>> + [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT |
>>> HWMON_T_MAX | HWMON_T_CRIT,
>>> + 0
>>> +};
>>> +
>>> +static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
>>> + .type = hwmon_temp,
>>> + .config = peci_dimmtemp_temp_channel_config,
>>> +};
>>> +
>>> +static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
>>> + &peci_dimmtemp_temp_channel,
>>> + NULL
>>> +};
>>> +
>>> +static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
>>> + .ops = &peci_dimmtemp_ops,
>>> + .info = peci_dimmtemp_temp_info,
>>> +};
>>> +
>>> +static int create_dimm_temp_info(struct peci_dimmtemp *priv)
>>> +{
>>> + int ret, i, channels;
>>> + struct device *dev;
>>> +
>>> + ret = check_populated_dimms(priv);
>>> + if (ret == -EAGAIN) {
>>
>> The only error returned by check_populated_dimms() is -EAGAIN. Checking for
>> specifically this error here suggests that there may be other (ignored)
>> errors. The reader has to examine check_populated_dimms() to find out
>> that -EAGAIN is indeed the only possible error. To avoid confusion, please
>> only check for ret here.
>>
>
> Makes sense.
>
>>> + if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
>>> + schedule_delayed_work(&priv->detect_work,
>>> + DIMM_MASK_CHECK_DELAY_JIFFIES);
>>> + priv->retry_count++;
>>> + dev_dbg(priv->dev, "Deferred populating DIMM temp
>>> info\n");
>>> + return ret;
>>> + }
>>> +
>>> + dev_info(priv->dev, "Timeout populating DIMM temp info\n");
>>
>> If this returns an error, the message needs to be dev_err().
>>
>
> We need to check each CPU, but it's completely legal that only one processor in
> the system has populated DIMMs.
> I'd prefer to keep dev_info() or maybe even downgrade it to dev_dbg().
>
If this is not an error, there should be no message...
> Thank you
> -Iwona
>
>>> + return -ETIMEDOUT;
and no error either.
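
Roughly what I have in mind, as a standalone sketch rather than a patch.
The names (struct detect_state, detect_step(), DETECT_RETRY_MAX) are
invented for the example; the only point is the last branch, where
running out of retries is treated as "this CPU has no DIMMs" and ends
quietly with 0.

```c
#include <errno.h>
#include <stdbool.h>

#define DETECT_RETRY_MAX 60	/* mirrors the 60 retries in the patch */

/* Hypothetical bookkeeping for the deferred DIMM detection. */
struct detect_state {
	int retry_count;
	bool done;	/* detection finished, with or without DIMMs */
	bool populated;	/* at least one DIMM was found */
};

/*
 * One detection attempt. 'dimms_found' stands in for the result of
 * scanning the DIMM mask. Returns -EAGAIN while the caller should
 * reschedule, and 0 once detection is finished.
 */
static int detect_step(struct detect_state *st, bool dimms_found)
{
	if (dimms_found) {
		st->populated = true;
		st->done = true;
		return 0;
	}

	if (st->retry_count < DETECT_RETRY_MAX) {
		st->retry_count++;
		return -EAGAIN;	/* caller reschedules the delayed work */
	}

	/*
	 * Retries exhausted: a CPU without populated DIMMs is a legal
	 * configuration, so finish without a message and without an error.
	 */
	st->done = true;
	return 0;
}
```

The caller would then register the hwmon device only when 'populated'
is set, and simply stop rescheduling when 'done' is set without it.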
Guenter
>>> + }
>>> +
>>> + channels = priv->gen_info->chan_rank_max * priv->gen_info->dimm_idx_max;
>>> +
>>> + priv->dimmtemp_label = devm_kzalloc(priv->dev, channels * sizeof(char
>>> *), GFP_KERNEL);
>>> + if (!priv->dimmtemp_label)
>>> + return -ENOMEM;
>>> +
>>> + for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
>>> + ret = create_dimm_temp_label(priv, i);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> +
>>> + dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
>>> priv,
>>> + &peci_dimmtemp_chip_info,
>>> NULL);
>>> + if (IS_ERR(dev)) {
>>> + dev_err(priv->dev, "Failed to register hwmon device\n");
>>> + return PTR_ERR(dev);
>>> + }
>>> +
>>> + dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void create_dimm_temp_info_delayed(struct work_struct *work)
>>> +{
>>> + struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
>>> + struct peci_dimmtemp,
>>> + detect_work);
>>> + int ret;
>>> +
>>> + ret = create_dimm_temp_info(priv);
>>> + if (ret && ret != -EAGAIN)
>>> + dev_dbg(priv->dev, "Failed to populate DIMM temp info\n");
>>> +}
>>> +
>>> +static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct
>>> auxiliary_device_id *id)
>>> +{
>>> + struct device *dev = &adev->dev;
>>> + struct peci_device *peci_dev = to_peci_device(dev->parent);
>>> + struct peci_dimmtemp *priv;
>>> + int ret;
>>> +
>>> + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>>> + if (!priv)
>>> + return -ENOMEM;
>>> +
>>> + priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
>>> + peci_dev->info.socket_id);
>>> + if (!priv->name)
>>> + return -ENOMEM;
>>> +
>>> + dev_set_drvdata(dev, priv);
>>> + priv->dev = dev;
>>> + priv->peci_dev = peci_dev;
>>> + priv->gen_info = (const struct dimm_info *)id->driver_data;
>>> +
>>> + INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
>>> +
>>> + ret = create_dimm_temp_info(priv);
>>> + if (ret && ret != -EAGAIN) {
>>> + dev_dbg(dev, "Failed to populate DIMM temp info\n");
>>> + return ret;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void peci_dimmtemp_remove(struct auxiliary_device *adev)
>>> +{
>>> + struct peci_dimmtemp *priv = dev_get_drvdata(&adev->dev);
>>> +
>>> + cancel_delayed_work_sync(&priv->detect_work);
>>> +}
>>> +
>>> +static const struct dimm_info dimm_hsx = {
>>> + .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
>>> + .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
>>> + .min_peci_revision = 0x30,
>>> +};
>>> +
>>> +static const struct dimm_info dimm_bdx = {
>>> + .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
>>> + .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
>>> + .min_peci_revision = 0x30,
>>> +};
>>> +
>>> +static const struct dimm_info dimm_bdxd = {
>>> + .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
>>> + .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
>>> + .min_peci_revision = 0x30,
>>> +};
>>> +
>>> +static const struct dimm_info dimm_skx = {
>>> + .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
>>> + .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
>>> + .min_peci_revision = 0x30,
>>> +};
>>> +
>>> +static const struct dimm_info dimm_icx = {
>>> + .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
>>> + .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
>>> + .min_peci_revision = 0x40,
>>> +};
>>> +
>>> +static const struct dimm_info dimm_icxd = {
>>> + .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
>>> + .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
>>> + .min_peci_revision = 0x40,
>>> +};
>>> +
>>> +static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
>>> + {
>>> + .name = "peci_cpu.dimmtemp.hsx",
>>> + .driver_data = (kernel_ulong_t)&dimm_hsx,
>>> + },
>>> + {
>>> + .name = "peci_cpu.dimmtemp.bdx",
>>> + .driver_data = (kernel_ulong_t)&dimm_bdx,
>>> + },
>>> + {
>>> + .name = "peci_cpu.dimmtemp.bdxd",
>>> + .driver_data = (kernel_ulong_t)&dimm_bdxd,
>>> + },
>>> + {
>>> + .name = "peci_cpu.dimmtemp.skx",
>>> + .driver_data = (kernel_ulong_t)&dimm_skx,
>>> + },
>>> + {
>>> + .name = "peci_cpu.dimmtemp.icx",
>>> + .driver_data = (kernel_ulong_t)&dimm_icx,
>>> + },
>>> + {
>>> + .name = "peci_cpu.dimmtemp.icxd",
>>> + .driver_data = (kernel_ulong_t)&dimm_icxd,
>>> + },
>>> + { }
>>> +};
>>> +MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
>>> +
>>> +static struct auxiliary_driver peci_dimmtemp_driver = {
>>> + .probe = peci_dimmtemp_probe,
>>> + .remove = peci_dimmtemp_remove,
>>> + .id_table = peci_dimmtemp_ids,
>>> +};
>>> +
>>> +module_auxiliary_driver(peci_dimmtemp_driver);
>>> +
>>> +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
>>> +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
>>> +MODULE_DESCRIPTION("PECI dimmtemp driver");
>>> +MODULE_LICENSE("GPL");
>>> +MODULE_IMPORT_NS(PECI_CPU);
>
On Thu, 2021-07-15 at 10:56 -0700, Guenter Roeck wrote:
> On Tue, Jul 13, 2021 at 12:04:45AM +0200, Iwona Winiarska wrote:
> > Add peci-dimmtemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of DIMMs that are accessible via the processor PECI interface.
> >
> > The main use case for the driver (and PECI interface) is out-of-band
> > management, where we're able to obtain the DTS readings from an external
> > entity connected with PECI, e.g. BMC on server platforms.
> >
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > drivers/hwmon/peci/Kconfig | 13 +
> > drivers/hwmon/peci/Makefile | 2 +
> > drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 523 insertions(+)
> > create mode 100644 drivers/hwmon/peci/dimmtemp.c
> >
> > diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> > index e10eed68d70a..f2d57efa508b 100644
> > --- a/drivers/hwmon/peci/Kconfig
> > +++ b/drivers/hwmon/peci/Kconfig
> > @@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
> > This driver can also be built as a module. If so, the module
> > will be called peci-cputemp.
> >
> > +config SENSORS_PECI_DIMMTEMP
> > + tristate "PECI DIMM temperature monitoring client"
> > + depends on PECI
> > + select SENSORS_PECI
> > + select PECI_CPU
> > + help
> > + If you say yes here you get support for the generic Intel PECI hwmon
> > + driver which provides Digital Thermal Sensor (DTS) thermal readings
> > of
> > + DIMM components that are accessible via the processor PECI
> > interface.
> > +
> > + This driver can also be built as a module. If so, the module
> > + will be called peci-dimmtemp.
> > +
> > config SENSORS_PECI
> > tristate
> > diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> > index e8a0ada5ab1f..191cfa0227f3 100644
> > --- a/drivers/hwmon/peci/Makefile
> > +++ b/drivers/hwmon/peci/Makefile
> > @@ -1,5 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> >
> > peci-cputemp-y := cputemp.o
> > +peci-dimmtemp-y := dimmtemp.o
> >
> > obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> > +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
> > diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
> > new file mode 100644
> > index 000000000000..2fcb8607137a
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/dimmtemp.c
> > @@ -0,0 +1,508 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/units.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/x86/intel-family.h>
> > +
> > +#include "common.h"
> > +
> > +#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
> > +#define DIMM_MASK_CHECK_RETRY_MAX 60 /* 60 x 5 secs = 5 minutes */
> > +
> > +/* Max number of channel ranks and DIMM index per channel */
> > +#define CHAN_RANK_MAX_ON_HSX 8
> > +#define DIMM_IDX_MAX_ON_HSX 3
> > +#define CHAN_RANK_MAX_ON_BDX 4
> > +#define DIMM_IDX_MAX_ON_BDX 3
> > +#define CHAN_RANK_MAX_ON_BDXD 2
> > +#define DIMM_IDX_MAX_ON_BDXD 2
> > +#define CHAN_RANK_MAX_ON_SKX 6
> > +#define DIMM_IDX_MAX_ON_SKX 2
> > +#define CHAN_RANK_MAX_ON_ICX 8
> > +#define DIMM_IDX_MAX_ON_ICX 2
> > +#define CHAN_RANK_MAX_ON_ICXD 4
> > +#define DIMM_IDX_MAX_ON_ICXD 2
> > +
> > +#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
> > +#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
> > +#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
> > +
> > +#define CPU_SEG_MASK GENMASK(23, 16)
> > +#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
> > +#define CPU_BUS_MASK GENMASK(7, 0)
> > +#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
> > +
> > +#define DIMM_TEMP_MAX GENMASK(15, 8)
> > +#define DIMM_TEMP_CRIT GENMASK(23, 16)
> > +#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
> > +#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
> > +
> > +struct dimm_info {
> > + int chan_rank_max;
> > + int dimm_idx_max;
> > + u8 min_peci_revision;
> > +};
> > +
> > +struct peci_dimmtemp {
> > + struct peci_device *peci_dev;
> > + struct device *dev;
> > + const char *name;
> > + const struct dimm_info *gen_info;
> > + struct delayed_work detect_work;
> > + struct peci_sensor_data temp[DIMM_NUMS_MAX];
> > + long temp_max[DIMM_NUMS_MAX];
> > + long temp_crit[DIMM_NUMS_MAX];
> > + int retry_count;
> > + char **dimmtemp_label;
> > + DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
> > +};
> > +
> > +static u8 __dimm_temp(u32 reg, int dimm_order)
> > +{
> > + return (reg >> (dimm_order * 8)) & 0xff;
> > +}
> > +
> > +static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
> > +{
> > + int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
> > + int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
> > + struct peci_device *peci_dev = priv->peci_dev;
> > + u8 cpu_seg, cpu_bus, dev, func;
> > + u64 offset;
> > + u32 data;
> > + u16 reg;
> > + int ret;
> > +
> > + if (!peci_sensor_need_update(&priv->temp[dimm_no]))
> > + return 0;
> > +
> > + ret = peci_pcs_read(peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank,
> > &data);
> > + if (ret)
> > + return ret;
> > +
>
> Similar to the cpu driver, the lack of mutex protection needs to be explained.
>
Sure, I'll explain it, and keep the explanation consistent between the two drivers.
> > + priv->temp[dimm_no].value = __dimm_temp(data, dimm_order) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > + switch (peci_dev->info.model) {
> > + case INTEL_FAM6_ICELAKE_X:
> > + case INTEL_FAM6_ICELAKE_D:
> > + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd4,
> > &data);
> > + if (ret || !(data & BIT(31)))
> > + break; /* Use default or previous value */
> > +
> > + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd0,
> > &data);
> > + if (ret)
> > + break; /* Use default or previous value */
> > +
> > + cpu_seg = GET_CPU_SEG(data);
> > + cpu_bus = GET_CPU_BUS(data);
> > +
> > + /*
> > + * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
> > + * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
> > + * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
> > + * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
> > + * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
> > + * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
> > + * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
> > + * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
> > + */
> > + dev = 0x1a + chan_rank / 2;
> > + offset = 0x224e0 + dimm_order * 4;
> > + if (chan_rank % 2)
> > + offset += 0x4000;
> > +
> > + ret = peci_mmio_read(peci_dev, 0, cpu_seg, cpu_bus, dev, 0,
> > offset, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
> > MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + case INTEL_FAM6_SKYLAKE_X:
> > + /*
> > + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
> > + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
> > + * Device 11, Function 2: IMC 0 channel 2 -> rank 2
> > + * Device 12, Function 2: IMC 1 channel 0 -> rank 3
> > + * Device 12, Function 6: IMC 1 channel 1 -> rank 4
> > + * Device 13, Function 2: IMC 1 channel 2 -> rank 5
> > + */
> > + dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
> > + func = chan_rank % 3 == 1 ? 6 : 2;
> > + reg = 0x120 + dimm_order * 4;
> > +
> > + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
> > MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + case INTEL_FAM6_BROADWELL_D:
> > + /*
> > + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
> > + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
> > + * Device 12, Function 2: IMC 1 channel 0 -> rank 2
> > + * Device 12, Function 6: IMC 1 channel 1 -> rank 3
> > + */
> > + dev = 10 + chan_rank / 2 * 2;
> > + func = (chan_rank % 2) ? 6 : 2;
> > + reg = 0x120 + dimm_order * 4;
> > +
> > + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
> > MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + case INTEL_FAM6_HASWELL_X:
> > + case INTEL_FAM6_BROADWELL_X:
> > + /*
> > + * Device 20, Function 0: IMC 0 channel 0 -> rank 0
> > + * Device 20, Function 1: IMC 0 channel 1 -> rank 1
> > + * Device 21, Function 0: IMC 0 channel 2 -> rank 2
> > + * Device 21, Function 1: IMC 0 channel 3 -> rank 3
> > + * Device 23, Function 0: IMC 1 channel 0 -> rank 4
> > + * Device 23, Function 1: IMC 1 channel 1 -> rank 5
> > + * Device 24, Function 0: IMC 1 channel 2 -> rank 6
> > + * Device 24, Function 1: IMC 1 channel 3 -> rank 7
> > + */
> > + dev = 20 + chan_rank / 2 + chan_rank / 4;
> > + func = chan_rank % 2;
> > + reg = 0x120 + dimm_order * 4;
> > +
> > + ret = peci_pci_local_read(peci_dev, 1, dev, func, reg, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) *
> > MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + peci_sensor_mark_updated(&priv->temp[dimm_no]);
> > +
> > + return 0;
> > +}
> > +
> > +static int dimmtemp_read_string(struct device *dev,
> > + enum hwmon_sensor_types type,
> > + u32 attr, int channel, const char **str)
> > +{
> > + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
> > +
> > + if (attr != hwmon_temp_label)
> > + return -EOPNOTSUPP;
> > +
> > + *str = (const char *)priv->dimmtemp_label[channel];
> > +
> > + return 0;
> > +}
> > +
> > +static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
> > + int ret;
> > +
> > + ret = get_dimm_temp(priv, channel);
> > + if (ret)
> > + return ret;
> > +
> > + switch (attr) {
> > + case hwmon_temp_input:
> > + *val = priv->temp[channel].value;
> > + break;
> > + case hwmon_temp_max:
> > + *val = priv->temp_max[channel];
> > + break;
> > + case hwmon_temp_crit:
> > + *val = priv->temp_crit[channel];
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types
> > type,
> > + u32 attr, int channel)
> > +{
> > + const struct peci_dimmtemp *priv = data;
> > +
> > + if (test_bit(channel, priv->dimm_mask))
> > + return 0444;
> > +
> > + return 0;
> > +}
> > +
> > +static const struct hwmon_ops peci_dimmtemp_ops = {
> > + .is_visible = dimmtemp_is_visible,
> > + .read_string = dimmtemp_read_string,
> > + .read = dimmtemp_read,
> > +};
> > +
> > +static int check_populated_dimms(struct peci_dimmtemp *priv)
> > +{
> > + int chan_rank_max = priv->gen_info->chan_rank_max;
> > + int dimm_idx_max = priv->gen_info->dimm_idx_max;
> > + int chan_rank, dimm_idx, ret;
> > + u64 dimm_mask = 0;
> > + u32 pcs;
> > +
> > + for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
> > + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP,
> > chan_rank, &pcs);
> > + if (ret) {
> > + /*
> > + * Overall, we expect either success or -EINVAL in
> > + * order to determine whether DIMM is populated or
> > not.
> > + * For anything else - we fall back to deferring the
> > + * detection to be performed at a later point in time.
> > + */
> > + if (ret == -EINVAL)
> > + continue;
> > + else
>
> else after continue is unnecessary.
>
Ok.
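For reference, a rough userspace sketch of how the loop could read without
the else. This is not the driver code: struct fake_pcs and scan_channels()
are hypothetical stand-ins for peci_pcs_read() and check_populated_dimms(),
used only to show the control flow.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for one peci_pcs_read() result per channel rank. */
struct fake_pcs {
	int status;	/* 0, -EINVAL, or another negative errno */
	uint32_t data;	/* one temperature byte per DIMM slot */
};

static int scan_channels(const struct fake_pcs *pcs, int chan_rank_max,
			 int dimm_idx_max, uint64_t *mask_out)
{
	uint64_t mask = 0;
	int chan_rank, dimm_idx;

	for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
		int ret = pcs[chan_rank].status;

		/* -EINVAL: nothing populated on this channel rank, skip it. */
		if (ret == -EINVAL)
			continue;
		/* Any other failure: defer detection, no else needed. */
		if (ret)
			return -EAGAIN;

		for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
			if ((pcs[chan_rank].data >> (dimm_idx * 8)) & 0xff)
				mask |= UINT64_C(1) << (chan_rank * dimm_idx_max + dimm_idx);
	}

	/* Empty mask: memory training may not be done yet, so retry later. */
	if (!mask)
		return -EAGAIN;

	*mask_out = mask;
	return 0;
}
```

Since continue already leaves the iteration, the second check only runs for
errors other than -EINVAL, which matches the original behaviour.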
> > + return -EAGAIN;
> > + }
> > +
> > + for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
> > + if (__dimm_temp(pcs, dimm_idx))
> > + dimm_mask |= BIT(chan_rank * dimm_idx_max +
> > dimm_idx);
> > + }
> > + /*
> > + * It's possible that memory training is not done yet. In this case we
> > + * defer the detection to be performed at a later point in time.
> > + */
> > + if (!dimm_mask)
> > + return -EAGAIN;
> > +
> > + dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
> > +
> > + bitmap_from_u64(priv->dimm_mask, dimm_mask);
> > +
> > + return 0;
> > +}
> > +
> > +static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
> > +{
> > + int rank = chan / priv->gen_info->dimm_idx_max;
> > + int idx = chan % priv->gen_info->dimm_idx_max;
> > +
> > + priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
> > + "DIMM %c%d", 'A' + rank,
> > + idx + 1);
> > + if (!priv->dimmtemp_label[chan])
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +static const u32 peci_dimmtemp_temp_channel_config[] = {
> > + [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT |
> > HWMON_T_MAX | HWMON_T_CRIT,
> > + 0
> > +};
> > +
> > +static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
> > + .type = hwmon_temp,
> > + .config = peci_dimmtemp_temp_channel_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
> > + &peci_dimmtemp_temp_channel,
> > + NULL
> > +};
> > +
> > +static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
> > + .ops = &peci_dimmtemp_ops,
> > + .info = peci_dimmtemp_temp_info,
> > +};
> > +
> > +static int create_dimm_temp_info(struct peci_dimmtemp *priv)
> > +{
> > + int ret, i, channels;
> > + struct device *dev;
> > +
> > + ret = check_populated_dimms(priv);
> > + if (ret == -EAGAIN) {
>
> The only error returned by check_populated_dimms() is -EAGAIN. Checking for
> specifically this error here suggests that there may be other (ignored)
> errors. The reader has to examine check_populated_dimms() to find out
> that -EAGAIN is indeed the only possible error. To avoid confusion, please
> only check for ret here.
>
Makes sense.
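A minimal userspace sketch of the retry handling with a plain "if (ret)"
check. struct detect_state and handle_scan_result() are hypothetical names,
and the rescheduled counter only stands in for schedule_delayed_work().

```c
#include <errno.h>

#define RETRY_MAX 60	/* mirrors DIMM_MASK_CHECK_RETRY_MAX in the patch */

struct detect_state {
	int retry_count;
	int rescheduled;	/* hypothetical stand-in for schedule_delayed_work() */
};

/* check_ret is the result of the populated-DIMM scan (0 or a negative errno). */
static int handle_scan_result(struct detect_state *st, int check_ret)
{
	if (check_ret) {	/* any error, not specifically -EAGAIN */
		if (st->retry_count < RETRY_MAX) {
			st->rescheduled++;
			st->retry_count++;
			return check_ret;
		}
		/* Retry budget exhausted. */
		return -ETIMEDOUT;
	}

	return 0;
}
```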
> > + if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
> > + schedule_delayed_work(&priv->detect_work,
> > + DIMM_MASK_CHECK_DELAY_JIFFIES);
> > + priv->retry_count++;
> > + dev_dbg(priv->dev, "Deferred populating DIMM temp
> > info\n");
> > + return ret;
> > + }
> > +
> > + dev_info(priv->dev, "Timeout populating DIMM temp info\n");
>
> If this returns an error, the message needs to be dev_err().
>
We need to check each CPU, but it's completely legal for only one processor in
the system to have populated DIMMs.
I'd prefer to keep dev_info() or maybe even downgrade it to dev_dbg().
Thank you
-Iwona
> > + return -ETIMEDOUT;
> > + }
> > +
> > + channels = priv->gen_info->chan_rank_max * priv->gen_info-
> > >dimm_idx_max;
> > +
> > + priv->dimmtemp_label = devm_kzalloc(priv->dev, channels * sizeof(char
> > *), GFP_KERNEL);
> > + if (!priv->dimmtemp_label)
> > + return -ENOMEM;
> > +
> > + for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
> > + ret = create_dimm_temp_label(priv, i);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
> > priv,
> > + &peci_dimmtemp_chip_info,
> > NULL);
> > + if (IS_ERR(dev)) {
> > + dev_err(priv->dev, "Failed to register hwmon device\n");
> > + return PTR_ERR(dev);
> > + }
> > +
> > + dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
> > +
> > + return 0;
> > +}
> > +
> > +static void create_dimm_temp_info_delayed(struct work_struct *work)
> > +{
> > + struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
> > + struct peci_dimmtemp,
> > + detect_work);
> > + int ret;
> > +
> > + ret = create_dimm_temp_info(priv);
> > + if (ret && ret != -EAGAIN)
> > + dev_dbg(priv->dev, "Failed to populate DIMM temp info\n");
> > +}
> > +
> > +static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct
> > auxiliary_device_id *id)
> > +{
> > + struct device *dev = &adev->dev;
> > + struct peci_device *peci_dev = to_peci_device(dev->parent);
> > + struct peci_dimmtemp *priv;
> > + int ret;
> > +
> > + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > + if (!priv)
> > + return -ENOMEM;
> > +
> > + priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
> > + peci_dev->info.socket_id);
> > + if (!priv->name)
> > + return -ENOMEM;
> > +
> > + dev_set_drvdata(dev, priv);
> > + priv->dev = dev;
> > + priv->peci_dev = peci_dev;
> > + priv->gen_info = (const struct dimm_info *)id->driver_data;
> > +
> > + INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
> > +
> > + ret = create_dimm_temp_info(priv);
> > + if (ret && ret != -EAGAIN) {
> > + dev_dbg(dev, "Failed to populate DIMM temp info\n");
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void peci_dimmtemp_remove(struct auxiliary_device *adev)
> > +{
> > + struct peci_dimmtemp *priv = dev_get_drvdata(&adev->dev);
> > +
> > + cancel_delayed_work_sync(&priv->detect_work);
> > +}
> > +
> > +static const struct dimm_info dimm_hsx = {
> > + .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
> > + .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
> > + .min_peci_revision = 0x30,
> > +};
> > +
> > +static const struct dimm_info dimm_bdx = {
> > + .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
> > + .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
> > + .min_peci_revision = 0x30,
> > +};
> > +
> > +static const struct dimm_info dimm_bdxd = {
> > + .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
> > + .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
> > + .min_peci_revision = 0x30,
> > +};
> > +
> > +static const struct dimm_info dimm_skx = {
> > + .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
> > + .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
> > + .min_peci_revision = 0x30,
> > +};
> > +
> > +static const struct dimm_info dimm_icx = {
> > + .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
> > + .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
> > + .min_peci_revision = 0x40,
> > +};
> > +
> > +static const struct dimm_info dimm_icxd = {
> > + .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
> > + .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
> > + .min_peci_revision = 0x40,
> > +};
> > +
> > +static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
> > + {
> > + .name = "peci_cpu.dimmtemp.hsx",
> > + .driver_data = (kernel_ulong_t)&dimm_hsx,
> > + },
> > + {
> > + .name = "peci_cpu.dimmtemp.bdx",
> > + .driver_data = (kernel_ulong_t)&dimm_bdx,
> > + },
> > + {
> > + .name = "peci_cpu.dimmtemp.bdxd",
> > + .driver_data = (kernel_ulong_t)&dimm_bdxd,
> > + },
> > + {
> > + .name = "peci_cpu.dimmtemp.skx",
> > + .driver_data = (kernel_ulong_t)&dimm_skx,
> > + },
> > + {
> > + .name = "peci_cpu.dimmtemp.icx",
> > + .driver_data = (kernel_ulong_t)&dimm_icx,
> > + },
> > + {
> > + .name = "peci_cpu.dimmtemp.icxd",
> > + .driver_data = (kernel_ulong_t)&dimm_icxd,
> > + },
> > + { }
> > +};
> > +MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
> > +
> > +static struct auxiliary_driver peci_dimmtemp_driver = {
> > + .probe = peci_dimmtemp_probe,
> > + .remove = peci_dimmtemp_remove,
> > + .id_table = peci_dimmtemp_ids,
> > +};
> > +
> > +module_auxiliary_driver(peci_dimmtemp_driver);
> > +
> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> > +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> > +MODULE_DESCRIPTION("PECI dimmtemp driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI_CPU);
On Mon, Jul 12, 2021 at 05:04:45PM CDT, Iwona Winiarska wrote:
>Add peci-dimmtemp driver for Digital Thermal Sensor (DTS) thermal
>readings of DIMMs that are accessible via the processor PECI interface.
>
>The main use case for the driver (and PECI interface) is out-of-band
>management, where we're able to obtain the DTS readings from an external
>entity connected with PECI, e.g. BMC on server platforms.
>
>Co-developed-by: Jae Hyun Yoo <[email protected]>
>Signed-off-by: Jae Hyun Yoo <[email protected]>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> drivers/hwmon/peci/Kconfig | 13 +
> drivers/hwmon/peci/Makefile | 2 +
> drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++++++++++++++++++
> 3 files changed, 523 insertions(+)
> create mode 100644 drivers/hwmon/peci/dimmtemp.c
>
>diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
>index e10eed68d70a..f2d57efa508b 100644
>--- a/drivers/hwmon/peci/Kconfig
>+++ b/drivers/hwmon/peci/Kconfig
>@@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
> This driver can also be built as a module. If so, the module
> will be called peci-cputemp.
>
>+config SENSORS_PECI_DIMMTEMP
>+ tristate "PECI DIMM temperature monitoring client"
>+ depends on PECI
>+ select SENSORS_PECI
>+ select PECI_CPU
>+ help
>+ If you say yes here you get support for the generic Intel PECI hwmon
>+ driver which provides Digital Thermal Sensor (DTS) thermal readings of
>+ DIMM components that are accessible via the processor PECI interface.
>+
>+ This driver can also be built as a module. If so, the module
>+ will be called peci-dimmtemp.
>+
> config SENSORS_PECI
> tristate
>diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
>index e8a0ada5ab1f..191cfa0227f3 100644
>--- a/drivers/hwmon/peci/Makefile
>+++ b/drivers/hwmon/peci/Makefile
>@@ -1,5 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
>
> peci-cputemp-y := cputemp.o
>+peci-dimmtemp-y := dimmtemp.o
>
> obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
>+obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
>diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
>new file mode 100644
>index 000000000000..2fcb8607137a
>--- /dev/null
>+++ b/drivers/hwmon/peci/dimmtemp.c
>@@ -0,0 +1,508 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+// Copyright (c) 2018-2021 Intel Corporation
>+
>+#include <linux/auxiliary_bus.h>
>+#include <linux/bitfield.h>
>+#include <linux/bitops.h>
>+#include <linux/hwmon.h>
>+#include <linux/jiffies.h>
>+#include <linux/module.h>
>+#include <linux/peci.h>
>+#include <linux/peci-cpu.h>
>+#include <linux/units.h>
>+#include <linux/workqueue.h>
>+#include <linux/x86/intel-family.h>
>+
>+#include "common.h"
>+
>+#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
>+#define DIMM_MASK_CHECK_RETRY_MAX 60 /* 60 x 5 secs = 5 minutes */
>+
>+/* Max number of channel ranks and DIMM index per channel */
>+#define CHAN_RANK_MAX_ON_HSX 8
>+#define DIMM_IDX_MAX_ON_HSX 3
>+#define CHAN_RANK_MAX_ON_BDX 4
>+#define DIMM_IDX_MAX_ON_BDX 3
>+#define CHAN_RANK_MAX_ON_BDXD 2
>+#define DIMM_IDX_MAX_ON_BDXD 2
>+#define CHAN_RANK_MAX_ON_SKX 6
>+#define DIMM_IDX_MAX_ON_SKX 2
>+#define CHAN_RANK_MAX_ON_ICX 8
>+#define DIMM_IDX_MAX_ON_ICX 2
>+#define CHAN_RANK_MAX_ON_ICXD 4
>+#define DIMM_IDX_MAX_ON_ICXD 2
>+
>+#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
>+#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
>+#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
Should we perhaps have a static_assert(DIMM_NUMS_MAX <= 64) so that
check_populated_dimms() doesn't silently break if we ever have a system
with > 64 DIMMs? (Not sure how far off that might be, but it doesn't seem
*that* wildly inconceivable, anyway.)

On a similar note, it'd be nice if there were some neat way of
automating the maintenance of CHAN_RANK_MAX and DIMM_IDX_MAX, but I
don't know of any great solutions for that offhand. (Shrug.)
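
Roughly what I have in mind for the assert, as a standalone C11 sketch
rather than a patch. The two limits are copied from the HSX defines above,
and dimm_bit() is a hypothetical helper that only illustrates the bit
numbering used when the u64 mask is built.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

#define CHAN_RANK_MAX	8	/* CHAN_RANK_MAX_ON_HSX in the patch */
#define DIMM_IDX_MAX	3	/* DIMM_IDX_MAX_ON_HSX in the patch */
#define DIMM_NUMS_MAX	(CHAN_RANK_MAX * DIMM_IDX_MAX)

/* Build fails if the DIMM count ever outgrows the u64 scratch mask. */
static_assert(DIMM_NUMS_MAX <= 64, "dimm_mask is built in a u64");

/* Hypothetical helper: bit position of one DIMM within the u64 mask. */
static uint64_t dimm_bit(int chan_rank, int dimm_idx)
{
	return UINT64_C(1) << (chan_rank * DIMM_IDX_MAX + dimm_idx);
}
```

In the kernel the same static_assert() spelling should be available through
<linux/build_bug.h>, so the check could sit right next to the defines.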
>+
>+#define CPU_SEG_MASK GENMASK(23, 16)
>+#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
>+#define CPU_BUS_MASK GENMASK(7, 0)
>+#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
>+
>+#define DIMM_TEMP_MAX GENMASK(15, 8)
>+#define DIMM_TEMP_CRIT GENMASK(23, 16)
>+#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
>+#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
>+
>+struct dimm_info {
>+ int chan_rank_max;
>+ int dimm_idx_max;
>+ u8 min_peci_revision;
This field doesn't seem to be used for anything that I can see; is it
really needed?
>+};
>+
>+struct peci_dimmtemp {
>+ struct peci_device *peci_dev;
>+ struct device *dev;
>+ const char *name;
>+ const struct dimm_info *gen_info;
>+ struct delayed_work detect_work;
>+ struct peci_sensor_data temp[DIMM_NUMS_MAX];
>+ long temp_max[DIMM_NUMS_MAX];
>+ long temp_crit[DIMM_NUMS_MAX];
>+ int retry_count;
>+ char **dimmtemp_label;
>+ DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
>+};
>+
>+static u8 __dimm_temp(u32 reg, int dimm_order)
>+{
>+ return (reg >> (dimm_order * 8)) & 0xff;
>+}
>+
>+static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
>+{
>+ int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
>+ int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
>+ struct peci_device *peci_dev = priv->peci_dev;
>+ u8 cpu_seg, cpu_bus, dev, func;
>+ u64 offset;
>+ u32 data;
>+ u16 reg;
>+ int ret;
>+
>+ if (!peci_sensor_need_update(&priv->temp[dimm_no]))
>+ return 0;
>+
>+ ret = peci_pcs_read(peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &data);
>+ if (ret)
>+ return ret;
>+
>+ priv->temp[dimm_no].value = __dimm_temp(data, dimm_order) * MILLIDEGREE_PER_DEGREE;
>+
>+ switch (peci_dev->info.model) {
>+ case INTEL_FAM6_ICELAKE_X:
>+ case INTEL_FAM6_ICELAKE_D:
>+ ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd4, &data);
>+ if (ret || !(data & BIT(31)))
>+ break; /* Use default or previous value */
>+
>+ ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd0, &data);
>+ if (ret)
>+ break; /* Use default or previous value */
>+
>+ cpu_seg = GET_CPU_SEG(data);
>+ cpu_bus = GET_CPU_BUS(data);
>+
>+ /*
>+ * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
>+ * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
>+ * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
>+ * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
>+ * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
>+ * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
>+ * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
>+ * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
>+ */
>+ dev = 0x1a + chan_rank / 2;
>+ offset = 0x224e0 + dimm_order * 4;
>+ if (chan_rank % 2)
>+ offset += 0x4000;
>+
>+ ret = peci_mmio_read(peci_dev, 0, cpu_seg, cpu_bus, dev, 0, offset, &data);
>+ if (ret)
>+ return ret;
>+
>+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
>+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
These two lines look identical in all (non-default) cases; should we
deduplicate them by just moving them to after the switch?
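One caveat with moving them after the switch: the ICX case has two early
"break" paths that are meant to keep the default or previous thresholds, and
at that point data holds something other than the threshold register. A small
helper called from each case might be the simpler route. A userspace sketch,
where struct thresholds and update_thresholds() are hypothetical names:

```c
#include <stdint.h>

#define MILLIDEGREE_PER_DEGREE 1000

struct thresholds {
	long temp_max;
	long temp_crit;
};

/*
 * Bits 15:8 carry the max threshold and bits 23:16 the critical one,
 * matching GET_TEMP_MAX()/GET_TEMP_CRIT() in the patch.
 */
static void update_thresholds(struct thresholds *t, uint32_t data)
{
	t->temp_max = (long)((data >> 8) & 0xff) * MILLIDEGREE_PER_DEGREE;
	t->temp_crit = (long)((data >> 16) & 0xff) * MILLIDEGREE_PER_DEGREE;
}
```

Each case would then call the helper once after its successful register read,
and the early-break paths would simply not call it.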
>+
>+ break;
>+ case INTEL_FAM6_SKYLAKE_X:
>+ /*
>+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
>+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
>+ * Device 11, Function 2: IMC 0 channel 2 -> rank 2
>+ * Device 12, Function 2: IMC 1 channel 0 -> rank 3
>+ * Device 12, Function 6: IMC 1 channel 1 -> rank 4
>+ * Device 13, Function 2: IMC 1 channel 2 -> rank 5
>+ */
>+ dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
>+ func = chan_rank % 3 == 1 ? 6 : 2;
>+ reg = 0x120 + dimm_order * 4;
>+
>+ ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
>+ if (ret)
>+ return ret;
>+
>+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
>+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
>+
>+ break;
>+ case INTEL_FAM6_BROADWELL_D:
>+ /*
>+ * Device 10, Function 2: IMC 0 channel 0 -> rank 0
>+ * Device 10, Function 6: IMC 0 channel 1 -> rank 1
>+ * Device 12, Function 2: IMC 1 channel 0 -> rank 2
>+ * Device 12, Function 6: IMC 1 channel 1 -> rank 3
>+ */
>+ dev = 10 + chan_rank / 2 * 2;
>+ func = (chan_rank % 2) ? 6 : 2;
>+ reg = 0x120 + dimm_order * 4;
>+
>+ ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
>+ if (ret)
>+ return ret;
>+
>+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
>+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
>+
>+ break;
>+ case INTEL_FAM6_HASWELL_X:
>+ case INTEL_FAM6_BROADWELL_X:
>+ /*
>+ * Device 20, Function 0: IMC 0 channel 0 -> rank 0
>+ * Device 20, Function 1: IMC 0 channel 1 -> rank 1
>+ * Device 21, Function 0: IMC 0 channel 2 -> rank 2
>+ * Device 21, Function 1: IMC 0 channel 3 -> rank 3
>+ * Device 23, Function 0: IMC 1 channel 0 -> rank 4
>+ * Device 23, Function 1: IMC 1 channel 1 -> rank 5
>+ * Device 24, Function 0: IMC 1 channel 2 -> rank 6
>+ * Device 24, Function 1: IMC 1 channel 3 -> rank 7
>+ */
>+ dev = 20 + chan_rank / 2 + chan_rank / 4;
>+ func = chan_rank % 2;
>+ reg = 0x120 + dimm_order * 4;
>+
>+ ret = peci_pci_local_read(peci_dev, 1, dev, func, reg, &data);
>+ if (ret)
>+ return ret;
>+
>+ priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
>+ priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
>+
>+ break;
>+ default:
>+ return -EOPNOTSUPP;
>+ }
>+
>+ peci_sensor_mark_updated(&priv->temp[dimm_no]);
>+
>+ return 0;
>+}
>+
>+static int dimmtemp_read_string(struct device *dev,
>+ enum hwmon_sensor_types type,
>+ u32 attr, int channel, const char **str)
>+{
>+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>+
>+ if (attr != hwmon_temp_label)
>+ return -EOPNOTSUPP;
>+
>+ *str = (const char *)priv->dimmtemp_label[channel];
>+
>+ return 0;
>+}
>+
>+static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
>+ u32 attr, int channel, long *val)
>+{
>+ struct peci_dimmtemp *priv = dev_get_drvdata(dev);
>+ int ret;
>+
>+ ret = get_dimm_temp(priv, channel);
>+ if (ret)
>+ return ret;
>+
>+ switch (attr) {
>+ case hwmon_temp_input:
>+ *val = priv->temp[channel].value;
>+ break;
>+ case hwmon_temp_max:
>+ *val = priv->temp_max[channel];
>+ break;
>+ case hwmon_temp_crit:
>+ *val = priv->temp_crit[channel];
>+ break;
>+ default:
>+ return -EOPNOTSUPP;
>+ }
>+
>+ return 0;
>+}
>+
>+static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types type,
>+ u32 attr, int channel)
>+{
>+ const struct peci_dimmtemp *priv = data;
>+
>+ if (test_bit(channel, priv->dimm_mask))
>+ return 0444;
>+
>+ return 0;
>+}
>+
>+static const struct hwmon_ops peci_dimmtemp_ops = {
>+ .is_visible = dimmtemp_is_visible,
>+ .read_string = dimmtemp_read_string,
>+ .read = dimmtemp_read,
>+};
>+
>+static int check_populated_dimms(struct peci_dimmtemp *priv)
>+{
>+ int chan_rank_max = priv->gen_info->chan_rank_max;
>+ int dimm_idx_max = priv->gen_info->dimm_idx_max;
>+ int chan_rank, dimm_idx, ret;
>+ u64 dimm_mask = 0;
>+ u32 pcs;
>+
>+ for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
>+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &pcs);
>+ if (ret) {
>+ /*
>+ * Overall, we expect either success or -EINVAL in
>+ * order to determine whether DIMM is populated or not.
>+ * For anything else - we fall back to deferring the
>+ * detection to be performed at a later point in time.
>+ */
>+ if (ret == -EINVAL)
>+ continue;
>+ else
>+ return -EAGAIN;
>+ }
>+
>+ for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
>+ if (__dimm_temp(pcs, dimm_idx))
>+ dimm_mask |= BIT(chan_rank * dimm_idx_max + dimm_idx);
>+ }
>+ /*
>+ * It's possible that memory training is not done yet. In this case we
>+ * defer the detection to be performed at a later point in time.
>+ */
>+ if (!dimm_mask)
>+ return -EAGAIN;
>+
>+ dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
Hmm, though aside from this one debug print it seems like this function
could just as easily operate directly on priv->dimm_mask if we wanted to
make it safe for >64 dimms (I have no particular objection to keeping it
as-is for now though).
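
Roughly what I have in mind, as an untested userspace sketch rather than
a drop-in replacement. The sketch_* helpers are simplified stand-ins for
the kernel's set_bit()/test_bit()/bitmap_empty(); the 96-DIMM limit, the
struct name and the populated[] array (standing in for the PCS read
results) are all made up for illustration, and -1 is used where the
driver would return -EAGAIN:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit above 64, to show that no u64 intermediate is needed. */
#define SKETCH_DIMM_NUMS_MAX 96
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Stand-in for the relevant part of struct peci_dimmtemp. */
struct sketch_dimm_state {
	unsigned long dimm_mask[SKETCH_BITS_TO_LONGS(SKETCH_DIMM_NUMS_MAX)];
};

static void sketch_set_bit(unsigned int nr, unsigned long *map)
{
	assert(nr < SKETCH_DIMM_NUMS_MAX);
	map[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_bit(unsigned int nr, const unsigned long *map)
{
	assert(nr < SKETCH_DIMM_NUMS_MAX);
	return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

static bool sketch_bitmap_empty(const unsigned long *map)
{
	size_t i;

	for (i = 0; i < SKETCH_BITS_TO_LONGS(SKETCH_DIMM_NUMS_MAX); i++)
		if (map[i])
			return false;

	return true;
}

/*
 * populated[] is indexed by chan_rank * dimm_idx_max + dimm_idx and stands
 * in for "the PCS read reported a non-zero temperature for this DIMM".
 * Returns 0 if at least one DIMM was found, -1 otherwise.
 */
static int sketch_scan_dimms(struct sketch_dimm_state *st, const bool *populated,
			     unsigned int chan_rank_max, unsigned int dimm_idx_max)
{
	unsigned int chan_rank, dimm_idx, nr;

	assert(chan_rank_max * dimm_idx_max <= SKETCH_DIMM_NUMS_MAX);

	/* Start from a clean mask so a retried scan leaves no stale bits. */
	memset(st->dimm_mask, 0, sizeof(st->dimm_mask));

	for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
		for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++) {
			nr = chan_rank * dimm_idx_max + dimm_idx;
			if (populated[nr])
				sketch_set_bit(nr, st->dimm_mask);
		}
	}

	return sketch_bitmap_empty(st->dimm_mask) ? -1 : 0;
}
```

In the driver itself the debug print could then presumably switch to the
%*pb bitmap format instead of %#llx.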
>+
>+ bitmap_from_u64(priv->dimm_mask, dimm_mask);
>+
>+ return 0;
>+}
>+
>+static int create_dimm_temp_label(struct peci_dimmtemp *priv, int chan)
>+{
>+ int rank = chan / priv->gen_info->dimm_idx_max;
>+ int idx = chan % priv->gen_info->dimm_idx_max;
>+
>+ priv->dimmtemp_label[chan] = devm_kasprintf(priv->dev, GFP_KERNEL,
>+ "DIMM %c%d", 'A' + rank,
>+ idx + 1);
>+ if (!priv->dimmtemp_label[chan])
>+ return -ENOMEM;
>+
>+ return 0;
>+}
>+
>+static const u32 peci_dimmtemp_temp_channel_config[] = {
>+ [0 ... DIMM_NUMS_MAX - 1] = HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT,
>+ 0
>+};
>+
>+static const struct hwmon_channel_info peci_dimmtemp_temp_channel = {
>+ .type = hwmon_temp,
>+ .config = peci_dimmtemp_temp_channel_config,
>+};
>+
>+static const struct hwmon_channel_info *peci_dimmtemp_temp_info[] = {
>+ &peci_dimmtemp_temp_channel,
>+ NULL
>+};
>+
>+static const struct hwmon_chip_info peci_dimmtemp_chip_info = {
>+ .ops = &peci_dimmtemp_ops,
>+ .info = peci_dimmtemp_temp_info,
>+};
>+
>+static int create_dimm_temp_info(struct peci_dimmtemp *priv)
>+{
>+ int ret, i, channels;
>+ struct device *dev;
>+
>+ ret = check_populated_dimms(priv);
>+ if (ret == -EAGAIN) {
>+ if (priv->retry_count < DIMM_MASK_CHECK_RETRY_MAX) {
>+ schedule_delayed_work(&priv->detect_work,
>+ DIMM_MASK_CHECK_DELAY_JIFFIES);
>+ priv->retry_count++;
>+ dev_dbg(priv->dev, "Deferred populating DIMM temp info\n");
>+ return ret;
>+ }
>+
>+ dev_info(priv->dev, "Timeout populating DIMM temp info\n");
>+ return -ETIMEDOUT;
>+ }
>+
>+ channels = priv->gen_info->chan_rank_max * priv->gen_info->dimm_idx_max;
>+
>+ priv->dimmtemp_label = devm_kzalloc(priv->dev, channels * sizeof(char *), GFP_KERNEL);
>+ if (!priv->dimmtemp_label)
>+ return -ENOMEM;
>+
>+ for_each_set_bit(i, priv->dimm_mask, DIMM_NUMS_MAX) {
>+ ret = create_dimm_temp_label(priv, i);
>+ if (ret)
>+ return ret;
>+ }
>+
>+ dev = devm_hwmon_device_register_with_info(priv->dev, priv->name, priv,
>+ &peci_dimmtemp_chip_info, NULL);
>+ if (IS_ERR(dev)) {
>+ dev_err(priv->dev, "Failed to register hwmon device\n");
>+ return PTR_ERR(dev);
>+ }
>+
>+ dev_dbg(priv->dev, "%s: sensor '%s'\n", dev_name(dev), priv->name);
>+
>+ return 0;
>+}
>+
>+static void create_dimm_temp_info_delayed(struct work_struct *work)
>+{
>+ struct peci_dimmtemp *priv = container_of(to_delayed_work(work),
>+ struct peci_dimmtemp,
>+ detect_work);
>+ int ret;
>+
>+ ret = create_dimm_temp_info(priv);
>+ if (ret && ret != -EAGAIN)
>+ dev_dbg(priv->dev, "Failed to populate DIMM temp info\n");
>+}
>+
>+static int peci_dimmtemp_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
>+{
>+ struct device *dev = &adev->dev;
>+ struct peci_device *peci_dev = to_peci_device(dev->parent);
>+ struct peci_dimmtemp *priv;
>+ int ret;
>+
>+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>+ if (!priv)
>+ return -ENOMEM;
>+
>+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_dimmtemp.cpu%d",
>+ peci_dev->info.socket_id);
>+ if (!priv->name)
>+ return -ENOMEM;
>+
>+ dev_set_drvdata(dev, priv);
>+ priv->dev = dev;
>+ priv->peci_dev = peci_dev;
>+ priv->gen_info = (const struct dimm_info *)id->driver_data;
>+
>+ INIT_DELAYED_WORK(&priv->detect_work, create_dimm_temp_info_delayed);
>+
>+ ret = create_dimm_temp_info(priv);
>+ if (ret && ret != -EAGAIN) {
>+ dev_dbg(dev, "Failed to populate DIMM temp info\n");
>+ return ret;
>+ }
>+
>+ return 0;
>+}
>+
>+static void peci_dimmtemp_remove(struct auxiliary_device *adev)
>+{
>+ struct peci_dimmtemp *priv = dev_get_drvdata(&adev->dev);
>+
>+ cancel_delayed_work_sync(&priv->detect_work);
>+}
>+
>+static const struct dimm_info dimm_hsx = {
>+ .chan_rank_max = CHAN_RANK_MAX_ON_HSX,
>+ .dimm_idx_max = DIMM_IDX_MAX_ON_HSX,
>+ .min_peci_revision = 0x30,
>+};
>+
>+static const struct dimm_info dimm_bdx = {
>+ .chan_rank_max = CHAN_RANK_MAX_ON_BDX,
>+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDX,
>+ .min_peci_revision = 0x30,
>+};
>+
>+static const struct dimm_info dimm_bdxd = {
>+ .chan_rank_max = CHAN_RANK_MAX_ON_BDXD,
>+ .dimm_idx_max = DIMM_IDX_MAX_ON_BDXD,
>+ .min_peci_revision = 0x30,
>+};
>+
>+static const struct dimm_info dimm_skx = {
>+ .chan_rank_max = CHAN_RANK_MAX_ON_SKX,
>+ .dimm_idx_max = DIMM_IDX_MAX_ON_SKX,
>+ .min_peci_revision = 0x30,
>+};
>+
>+static const struct dimm_info dimm_icx = {
>+ .chan_rank_max = CHAN_RANK_MAX_ON_ICX,
>+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICX,
>+ .min_peci_revision = 0x40,
>+};
>+
>+static const struct dimm_info dimm_icxd = {
>+ .chan_rank_max = CHAN_RANK_MAX_ON_ICXD,
>+ .dimm_idx_max = DIMM_IDX_MAX_ON_ICXD,
>+ .min_peci_revision = 0x40,
>+};
>+
>+static const struct auxiliary_device_id peci_dimmtemp_ids[] = {
>+ {
>+ .name = "peci_cpu.dimmtemp.hsx",
>+ .driver_data = (kernel_ulong_t)&dimm_hsx,
>+ },
>+ {
>+ .name = "peci_cpu.dimmtemp.bdx",
>+ .driver_data = (kernel_ulong_t)&dimm_bdx,
>+ },
>+ {
>+ .name = "peci_cpu.dimmtemp.bdxd",
>+ .driver_data = (kernel_ulong_t)&dimm_bdxd,
>+ },
>+ {
>+ .name = "peci_cpu.dimmtemp.skx",
>+ .driver_data = (kernel_ulong_t)&dimm_skx,
>+ },
>+ {
>+ .name = "peci_cpu.dimmtemp.icx",
>+ .driver_data = (kernel_ulong_t)&dimm_icx,
>+ },
>+ {
>+ .name = "peci_cpu.dimmtemp.icxd",
>+ .driver_data = (kernel_ulong_t)&dimm_icxd,
>+ },
>+ { }
>+};
>+MODULE_DEVICE_TABLE(auxiliary, peci_dimmtemp_ids);
>+
>+static struct auxiliary_driver peci_dimmtemp_driver = {
>+ .probe = peci_dimmtemp_probe,
>+ .remove = peci_dimmtemp_remove,
>+ .id_table = peci_dimmtemp_ids,
>+};
>+
>+module_auxiliary_driver(peci_dimmtemp_driver);
>+
>+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
>+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
>+MODULE_DESCRIPTION("PECI dimmtemp driver");
>+MODULE_LICENSE("GPL");
>+MODULE_IMPORT_NS(PECI_CPU);
>--
>2.31.1
>
On Mon, Jul 12, 2021 at 05:04:44PM CDT, Iwona Winiarska wrote:
>Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
>readings of the processor package and processor cores that are
>accessible via the PECI interface.
>
>The main use case for the driver (and PECI interface) is out-of-band
>management, where we're able to obtain the DTS readings from an external
>entity connected with PECI, e.g. BMC on server platforms.
>
>Co-developed-by: Jae Hyun Yoo <[email protected]>
>Signed-off-by: Jae Hyun Yoo <[email protected]>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> MAINTAINERS | 7 +
> drivers/hwmon/Kconfig | 2 +
> drivers/hwmon/Makefile | 1 +
> drivers/hwmon/peci/Kconfig | 18 ++
> drivers/hwmon/peci/Makefile | 5 +
> drivers/hwmon/peci/common.h | 46 ++++
> drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++++++++++++++++++++
> 7 files changed, 582 insertions(+)
> create mode 100644 drivers/hwmon/peci/Kconfig
> create mode 100644 drivers/hwmon/peci/Makefile
> create mode 100644 drivers/hwmon/peci/common.h
> create mode 100644 drivers/hwmon/peci/cputemp.c
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index f47b5f634293..35ba9e3646bd 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -14504,6 +14504,13 @@ L: [email protected]
> S: Maintained
> F: drivers/platform/x86/peaq-wmi.c
>
>+PECI HARDWARE MONITORING DRIVERS
>+M: Iwona Winiarska <[email protected]>
>+R: Jae Hyun Yoo <[email protected]>
>+L: [email protected]
>+S: Supported
>+F: drivers/hwmon/peci/
>+
> PECI SUBSYSTEM
> M: Iwona Winiarska <[email protected]>
> R: Jae Hyun Yoo <[email protected]>
>diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>index e3675377bc5d..61c0e3404415 100644
>--- a/drivers/hwmon/Kconfig
>+++ b/drivers/hwmon/Kconfig
>@@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
> These devices are hard to detect and rarely found on mainstream
> hardware. If unsure, say N.
>
>+source "drivers/hwmon/peci/Kconfig"
>+
> source "drivers/hwmon/pmbus/Kconfig"
>
> config SENSORS_PWM_FAN
>diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>index d712c61c1f5e..f52331f212ed 100644
>--- a/drivers/hwmon/Makefile
>+++ b/drivers/hwmon/Makefile
>@@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
> obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
>
> obj-$(CONFIG_SENSORS_OCC) += occ/
>+obj-$(CONFIG_SENSORS_PECI) += peci/
> obj-$(CONFIG_PMBUS) += pmbus/
>
> ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
>diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
>new file mode 100644
>index 000000000000..e10eed68d70a
>--- /dev/null
>+++ b/drivers/hwmon/peci/Kconfig
>@@ -0,0 +1,18 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+
>+config SENSORS_PECI_CPUTEMP
>+ tristate "PECI CPU temperature monitoring client"
>+ depends on PECI
>+ select SENSORS_PECI
>+ select PECI_CPU
>+ help
>+ If you say yes here you get support for the generic Intel PECI
>+ cputemp driver which provides Digital Thermal Sensor (DTS) thermal
>+ readings of the CPU package and CPU cores that are accessible via
>+ the processor PECI interface.
>+
>+ This driver can also be built as a module. If so, the module
>+ will be called peci-cputemp.
>+
>+config SENSORS_PECI
>+ tristate
>diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
>new file mode 100644
>index 000000000000..e8a0ada5ab1f
>--- /dev/null
>+++ b/drivers/hwmon/peci/Makefile
>@@ -0,0 +1,5 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+
>+peci-cputemp-y := cputemp.o
>+
>+obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
>diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
>new file mode 100644
>index 000000000000..54580c100d06
>--- /dev/null
>+++ b/drivers/hwmon/peci/common.h
>@@ -0,0 +1,46 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+/* Copyright (c) 2021 Intel Corporation */
>+
>+#include <linux/types.h>
>+
>+#ifndef __PECI_HWMON_COMMON_H
>+#define __PECI_HWMON_COMMON_H
>+
>+#define UPDATE_INTERVAL_DEFAULT HZ
>+
>+/**
>+ * struct peci_sensor_data - PECI sensor information
>+ * @valid: flag to indicate the sensor value is valid
>+ * @value: sensor value in milli units
>+ * @last_updated: time of the last update in jiffies
>+ */
>+struct peci_sensor_data {
>+ unsigned int valid;
From what I can see it looks like the 'valid' member here is strictly a
one-shot has-this-value-ever-been-set indicator, which seems a bit
wasteful to keep around forever post initialization; couldn't the same
information be inferred from checking last_updated != 0 or something?
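
For instance, something like the following (an untested userspace
sketch: 'now' is passed in where the kernel code would read jiffies,
sketch_time_after() mimics time_after(), and the interval of 100 ticks
is an arbitrary stand-in for UPDATE_INTERVAL_DEFAULT). One wrinkle I can
see is that jiffies can legitimately be 0 at some point, so the sketch
nudges a stored 0 to 1 to keep 0 reserved for "never updated":

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary stand-in for UPDATE_INTERVAL_DEFAULT (HZ in the patch). */
#define SKETCH_UPDATE_INTERVAL 100UL

struct sketch_sensor_data {
	int value;
	unsigned long last_updated;	/* 0 means "never updated" */
};

/* Wrap-safe "a is later than b", in the spirit of time_after(a, b). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool sketch_need_update(const struct sketch_sensor_data *sensor,
			       unsigned long now)
{
	assert(sensor);

	return !sensor->last_updated ||
	       sketch_time_after(now, sensor->last_updated + SKETCH_UPDATE_INTERVAL);
}

static void sketch_mark_updated(struct sketch_sensor_data *sensor,
				unsigned long now)
{
	assert(sensor);

	/* Never store 0, since 0 doubles as the "not yet set" marker. */
	sensor->last_updated = now ? now : 1;
}
```

Whether saving one unsigned int per sensor is worth that special case is
a fair question, of course.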
>+ s32 value;
>+ unsigned long last_updated;
>+};
>+
>+/**
>+ * peci_sensor_need_update() - check whether sensor update is needed or not
>+ * @sensor: pointer to sensor data struct
>+ *
>+ * Return: true if update is needed, false if not.
>+ */
>+
>+static inline bool peci_sensor_need_update(struct peci_sensor_data *sensor)
>+{
>+ return !sensor->valid ||
>+ time_after(jiffies, sensor->last_updated + UPDATE_INTERVAL_DEFAULT);
>+}
>+
>+/**
>+ * peci_sensor_mark_updated() - mark the sensor as updated
>+ * @sensor: pointer to sensor data struct
>+ */
>+static inline void peci_sensor_mark_updated(struct peci_sensor_data *sensor)
>+{
>+ sensor->valid = 1;
>+ sensor->last_updated = jiffies;
>+}
>+
>+#endif /* __PECI_HWMON_COMMON_H */
>diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
>new file mode 100644
>index 000000000000..56a526471687
>--- /dev/null
>+++ b/drivers/hwmon/peci/cputemp.c
>@@ -0,0 +1,503 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+// Copyright (c) 2018-2021 Intel Corporation
>+
>+#include <linux/auxiliary_bus.h>
>+#include <linux/bitfield.h>
>+#include <linux/bitops.h>
>+#include <linux/hwmon.h>
>+#include <linux/jiffies.h>
>+#include <linux/module.h>
>+#include <linux/peci.h>
>+#include <linux/peci-cpu.h>
>+#include <linux/units.h>
>+#include <linux/x86/intel-family.h>
>+
>+#include "common.h"
>+
>+#define CORE_NUMS_MAX 64
>+
>+#define DEFAULT_CHANNEL_NUMS 5
DEFAULT_ seems like a slightly odd prefix for this (it's not something
that can really be overridden or anything); would BASE_ perhaps be a bit
more appropriate?
>+#define CORETEMP_CHANNEL_NUMS CORE_NUMS_MAX
>+#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
>+
>+#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
>+#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
>+#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
>+
>+#define DTS_MARGIN_MASK GENMASK(15, 0)
>+#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
>+
>+#define DTS_FIXED_POINT_FRACTION 64
>+
>+struct resolved_cores_reg {
>+ u8 bus;
>+ u8 dev;
>+ u8 func;
>+ u8 offset;
>+};
>+
>+struct cpu_info {
>+ struct resolved_cores_reg *reg;
>+ u8 min_peci_revision;
As with the dimmtemp driver, min_peci_revision appears unused here,
though in this case if it were removed there'd only be one (pointer)
member left in struct cpu_info, so we could perhaps remove it as well
and then also a level of indirection in peci_cputemp_ids/cpu_{hsx,icx}
too?
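
To illustrate, an untested userspace sketch of the flattened layout.
struct sketch_device_id only imitates the two fields of struct
auxiliary_device_id used here, uintptr_t stands in for kernel_ulong_t,
and sketch_match() is a made-up lookup that plays the role of the bus
matching code; the register values and the name-to-register mapping are
the ones from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_resolved_cores_reg {
	uint8_t bus;
	uint8_t dev;
	uint8_t func;
	uint8_t offset;
};

/* Imitates the .name/.driver_data pair of struct auxiliary_device_id. */
struct sketch_device_id {
	const char *name;
	uintptr_t driver_data;
};

static const struct sketch_resolved_cores_reg sketch_reg_hsx = {
	.bus = 1, .dev = 30, .func = 3, .offset = 0xb4,
};

static const struct sketch_resolved_cores_reg sketch_reg_icx = {
	.bus = 14, .dev = 30, .func = 3, .offset = 0xd0,
};

/* driver_data points straight at the register description. */
static const struct sketch_device_id sketch_ids[] = {
	{ .name = "peci_cpu.cputemp.hsx",  .driver_data = (uintptr_t)&sketch_reg_hsx },
	{ .name = "peci_cpu.cputemp.bdx",  .driver_data = (uintptr_t)&sketch_reg_hsx },
	{ .name = "peci_cpu.cputemp.bdxd", .driver_data = (uintptr_t)&sketch_reg_hsx },
	{ .name = "peci_cpu.cputemp.skx",  .driver_data = (uintptr_t)&sketch_reg_hsx },
	{ .name = "peci_cpu.cputemp.icx",  .driver_data = (uintptr_t)&sketch_reg_icx },
	{ .name = "peci_cpu.cputemp.icxd", .driver_data = (uintptr_t)&sketch_reg_icx },
	{ 0 }
};

/* Made-up lookup; returns NULL when no table entry matches. */
static const struct sketch_resolved_cores_reg *sketch_match(const char *name)
{
	const struct sketch_device_id *id;

	assert(name);

	for (id = sketch_ids; id->name; id++)
		if (!strcmp(id->name, name))
			return (const struct sketch_resolved_cores_reg *)id->driver_data;

	return NULL;
}
```

In probe that would leave a single cast of id->driver_data to the
register struct, with priv->gen_info->reg becoming something like
priv->reg.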
>+};
>+
>+struct peci_cputemp {
>+ struct peci_device *peci_dev;
>+ struct device *dev;
>+ const char *name;
>+ const struct cpu_info *gen_info;
>+ struct {
>+ struct peci_sensor_data die;
>+ struct peci_sensor_data dts;
>+ struct peci_sensor_data tcontrol;
>+ struct peci_sensor_data tthrottle;
>+ struct peci_sensor_data tjmax;
>+ struct peci_sensor_data core[CORETEMP_CHANNEL_NUMS];
>+ } temp;
>+ const char **coretemp_label;
>+ DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
>+};
>+
>+enum cputemp_channels {
>+ channel_die,
>+ channel_dts,
>+ channel_tcontrol,
>+ channel_tthrottle,
>+ channel_tjmax,
>+ channel_core,
>+};
>+
>+static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
static const char * const cputemp_label? (That is, const pointer to
const char, rather than non-const pointer to const char.)
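
A small standalone illustration of the difference (userspace sketch;
the sketch_ names are made up, the label strings are from the patch):

```c
#include <assert.h>
#include <string.h>

#define SKETCH_BASE_CHANNEL_NUMS 5

/*
 * Both the strings and the array slots are const here, so something like
 *	sketch_cputemp_label[0] = "oops";
 * is rejected at compile time and the whole table can live in read-only
 * data. With plain "const char *" only the strings would be protected.
 */
static const char * const sketch_cputemp_label[SKETCH_BASE_CHANNEL_NUMS] = {
	"Die",
	"DTS",
	"Tcontrol",
	"Tthrottle",
	"Tjmax",
};

/* Readers are unaffected: they still get a plain const char pointer. */
static const char *sketch_label(int channel)
{
	assert(channel >= 0 && channel < SKETCH_BASE_CHANNEL_NUMS);

	return sketch_cputemp_label[channel];
}
```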
>+ "Die",
>+ "DTS",
>+ "Tcontrol",
>+ "Tthrottle",
>+ "Tjmax",
>+};
>+
>+static int get_temp_targets(struct peci_cputemp *priv)
>+{
>+ s32 tthrottle_offset, tcontrol_margin;
>+ u32 pcs;
>+ int ret;
>+
>+ /*
>+ * Just use only the tcontrol marker to determine if target values need
>+ * update.
>+ */
>+ if (!peci_sensor_need_update(&priv->temp.tcontrol))
>+ return 0;
>+
>+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
>+ if (ret)
>+ return ret;
>+
>+ priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
>+
>+ tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
>+ tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
>+ priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
>+
>+ tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
>+ priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
>+
>+ peci_sensor_mark_updated(&priv->temp.tcontrol);
>+
>+ return 0;
>+}
>+
>+/*
>+ * Processors return a value of DTS reading in S10.6 fixed point format
>+ * (sign, 10 bits signed integer value, 6 bits fractional).
This parenthetical reads to me like it's describing 17 bits (1 sign +
10 integer + 6 fractional) -- I'm not a PECI expert, but from my reading
of the (somewhat skimpy) docs I've got on it I'd suggest a description
more like "sign, 9-bit magnitude, 6-bit fraction".
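
As a sanity check of the 16-bit reading (bit 15 sign, bits 14..6
integer, bits 5..0 fraction), here is a userspace sketch of the
conversion the patch does with sign_extend32(val, 15). The sketch_
helpers are my own stand-ins, not kernel API, and the sample raw values
are invented for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MILLIDEGREE_PER_DEGREE	1000
#define SKETCH_DTS_FIXED_POINT_FRACTION	64	/* 2^6, i.e. 6 fractional bits */

/* Portable equivalent of sign-extending from bit 15. */
static int32_t sketch_sign_extend16(uint32_t raw)
{
	return (int32_t)(raw & 0xffff) - ((raw & 0x8000) ? 0x10000 : 0);
}

/* Same range check as dts_valid(): 0x8000-0x81ff are error codes. */
static bool sketch_dts_valid(uint32_t raw)
{
	return raw < 0x8000 || raw > 0x81ff;
}

/* Raw two's complement fixed-point value to millidegrees Celsius. */
static int32_t sketch_dts_to_millidegree(uint32_t raw)
{
	assert(raw <= 0xffff);

	return sketch_sign_extend16(raw) * SKETCH_MILLIDEGREE_PER_DEGREE /
	       SKETCH_DTS_FIXED_POINT_FRACTION;
}
```

With that, a raw 0xf880 is -1920 in two's complement, and
-1920 * 1000 / 64 = -30000, i.e. 30 degrees below the reference, while
0x0020 comes out as +500 (half a degree).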
>+ * Error codes:
>+ * 0x8000: General sensor error
>+ * 0x8001: Reserved
>+ * 0x8002: Underflow on reading value
>+ * 0x8003-0x81ff: Reserved
>+ */
>+static bool dts_valid(s32 val)
>+{
>+ return val < 0x8000 || val > 0x81ff;
>+}
>+
>+static s32 dts_to_millidegree(s32 val)
>+{
>+ return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
>+}
>+
>+static int get_die_temp(struct peci_cputemp *priv)
>+{
>+ s16 temp;
>+ int ret;
>+
>+ if (!peci_sensor_need_update(&priv->temp.die))
>+ return 0;
>+
>+ ret = peci_temp_read(priv->peci_dev, &temp);
>+ if (ret)
>+ return ret;
>+
>+ if (!dts_valid(temp))
>+ return -EIO;
>+
>+ /* Note that the tjmax should be available before calling it */
>+ priv->temp.die.value = priv->temp.tjmax.value + dts_to_millidegree(temp);
>+
>+ peci_sensor_mark_updated(&priv->temp.die);
>+
>+ return 0;
>+}
>+
>+static int get_dts(struct peci_cputemp *priv)
>+{
>+ s32 dts_margin;
>+ u32 pcs;
>+ int ret;
>+
>+ if (!peci_sensor_need_update(&priv->temp.dts))
>+ return 0;
>+
>+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
>+ if (ret)
>+ return ret;
>+
>+ dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
>+ if (!dts_valid(dts_margin))
>+ return -EIO;
>+
>+ /* Note that the tcontrol should be available before calling it */
>+ priv->temp.dts.value = priv->temp.tcontrol.value - dts_to_millidegree(dts_margin);
>+
>+ peci_sensor_mark_updated(&priv->temp.dts);
>+
>+ return 0;
>+}
>+
>+static int get_core_temp(struct peci_cputemp *priv, int core_index)
>+{
>+ s32 core_dts_margin;
>+ u32 pcs;
>+ int ret;
>+
>+ if (!peci_sensor_need_update(&priv->temp.core[core_index]))
>+ return 0;
>+
>+ ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
>+ if (ret)
>+ return ret;
>+
>+ core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
>+ if (!dts_valid(core_dts_margin))
>+ return -EIO;
>+
>+ /* Note that the tjmax should be available before calling it */
>+ priv->temp.core[core_index].value =
>+ priv->temp.tjmax.value + dts_to_millidegree(core_dts_margin);
>+
>+ peci_sensor_mark_updated(&priv->temp.core[core_index]);
>+
>+ return 0;
>+}
>+
>+static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
>+ u32 attr, int channel, const char **str)
>+{
>+ struct peci_cputemp *priv = dev_get_drvdata(dev);
>+
>+ if (attr != hwmon_temp_label)
>+ return -EOPNOTSUPP;
>+
>+ *str = channel < channel_core ?
>+ cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
>+
>+ return 0;
>+}
>+
>+static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
>+ u32 attr, int channel, long *val)
>+{
>+ struct peci_cputemp *priv = dev_get_drvdata(dev);
>+ int ret, core_index;
>+
>+ ret = get_temp_targets(priv);
>+ if (ret)
>+ return ret;
>+
>+ switch (attr) {
>+ case hwmon_temp_input:
>+ switch (channel) {
>+ case channel_die:
>+ ret = get_die_temp(priv);
>+ if (ret)
>+ return ret;
>+
>+ *val = priv->temp.die.value;
>+ break;
>+ case channel_dts:
>+ ret = get_dts(priv);
>+ if (ret)
>+ return ret;
>+
>+ *val = priv->temp.dts.value;
>+ break;
>+ case channel_tcontrol:
>+ *val = priv->temp.tcontrol.value;
>+ break;
>+ case channel_tthrottle:
>+ *val = priv->temp.tthrottle.value;
>+ break;
>+ case channel_tjmax:
>+ *val = priv->temp.tjmax.value;
>+ break;
>+ default:
>+ core_index = channel - channel_core;
>+ ret = get_core_temp(priv, core_index);
>+ if (ret)
>+ return ret;
>+
>+ *val = priv->temp.core[core_index].value;
>+ break;
>+ }
>+ break;
>+ case hwmon_temp_max:
>+ *val = priv->temp.tcontrol.value;
>+ break;
>+ case hwmon_temp_crit:
>+ *val = priv->temp.tjmax.value;
>+ break;
>+ case hwmon_temp_crit_hyst:
>+ *val = priv->temp.tjmax.value - priv->temp.tcontrol.value;
>+ break;
>+ default:
>+ return -EOPNOTSUPP;
>+ }
>+
>+ return 0;
>+}
>+
>+static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
>+ u32 attr, int channel)
>+{
>+ const struct peci_cputemp *priv = data;
>+
>+ if (channel > CPUTEMP_CHANNEL_NUMS)
>+ return 0;
>+
>+ if (channel < channel_core)
>+ return 0444;
>+
>+ if (test_bit(channel - channel_core, priv->core_mask))
>+ return 0444;
>+
>+ return 0;
>+}
>+
>+static int init_core_mask(struct peci_cputemp *priv)
>+{
>+ struct peci_device *peci_dev = priv->peci_dev;
>+ struct resolved_cores_reg *reg = priv->gen_info->reg;
>+ u64 core_mask;
>+ u32 data;
>+ int ret;
>+
>+ /* Get the RESOLVED_CORES register value */
>+ switch (peci_dev->info.model) {
>+ case INTEL_FAM6_ICELAKE_X:
>+ case INTEL_FAM6_ICELAKE_D:
>+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
>+ reg->func, reg->offset + 4, &data);
>+ if (ret)
>+ return ret;
>+
>+ core_mask = (u64)data << 32;
>+
>+ ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
>+ reg->func, reg->offset, &data);
>+ if (ret)
>+ return ret;
>+
>+ core_mask |= data;
>+
>+ break;
>+ default:
>+ ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
>+ reg->func, reg->offset, &data);
>+ if (ret)
>+ return ret;
>+
>+ core_mask = data;
>+
>+ break;
>+ }
>+
>+ if (!core_mask)
>+ return -EIO;
>+
>+ bitmap_from_u64(priv->core_mask, core_mask);
>+
>+ return 0;
>+}
>+
>+static int create_temp_label(struct peci_cputemp *priv)
>+{
>+ unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
>+ int i;
>+
>+ priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
>+ if (!priv->coretemp_label)
>+ return -ENOMEM;
>+
>+ for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
>+ priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
>+ if (!priv->coretemp_label[i])
>+ return -ENOMEM;
>+ }
>+
>+ return 0;
>+}
>+
>+static void check_resolved_cores(struct peci_cputemp *priv)
>+{
>+ int ret;
>+
>+ ret = init_core_mask(priv);
>+ if (ret)
>+ return;
>+
>+ ret = create_temp_label(priv);
>+ if (ret)
>+ bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
>+}
>+
>+static const struct hwmon_ops peci_cputemp_ops = {
>+ .is_visible = cputemp_is_visible,
>+ .read_string = cputemp_read_string,
>+ .read = cputemp_read,
>+};
>+
>+static const u32 peci_cputemp_temp_channel_config[] = {
>+ /* Die temperature */
>+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
>+ /* DTS margin */
>+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
>+ /* Tcontrol temperature */
>+ HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
>+ /* Tthrottle temperature */
>+ HWMON_T_LABEL | HWMON_T_INPUT,
>+ /* Tjmax temperature */
>+ HWMON_T_LABEL | HWMON_T_INPUT,
>+ /* Core temperature - for all core channels */
>+ [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
>+ 0
>+};
>+
>+static const struct hwmon_channel_info peci_cputemp_temp_channel = {
>+ .type = hwmon_temp,
>+ .config = peci_cputemp_temp_channel_config,
>+};
>+
>+static const struct hwmon_channel_info *peci_cputemp_info[] = {
>+ &peci_cputemp_temp_channel,
>+ NULL
>+};
>+
>+static const struct hwmon_chip_info peci_cputemp_chip_info = {
>+ .ops = &peci_cputemp_ops,
>+ .info = peci_cputemp_info,
>+};
>+
>+static int peci_cputemp_probe(struct auxiliary_device *adev,
>+ const struct auxiliary_device_id *id)
>+{
>+ struct device *dev = &adev->dev;
>+ struct peci_device *peci_dev = to_peci_device(dev->parent);
>+ struct peci_cputemp *priv;
>+ struct device *hwmon_dev;
>+
>+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>+ if (!priv)
>+ return -ENOMEM;
>+
>+ priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
>+ peci_dev->info.socket_id);
>+ if (!priv->name)
>+ return -ENOMEM;
>+
>+ dev_set_drvdata(dev, priv);
>+ priv->dev = dev;
>+ priv->peci_dev = peci_dev;
>+ priv->gen_info = (const struct cpu_info *)id->driver_data;
>+
>+ check_resolved_cores(priv);
>+
>+ hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
>+ priv, &peci_cputemp_chip_info, NULL);
>+
>+ return PTR_ERR_OR_ZERO(hwmon_dev);
>+}
>+
>+static struct resolved_cores_reg resolved_cores_reg_hsx = {
>+ .bus = 1,
>+ .dev = 30,
>+ .func = 3,
>+ .offset = 0xb4,
>+};
>+
>+static struct resolved_cores_reg resolved_cores_reg_icx = {
>+ .bus = 14,
>+ .dev = 30,
>+ .func = 3,
>+ .offset = 0xd0,
>+};
>+
>+static const struct cpu_info cpu_hsx = {
>+ .reg = &resolved_cores_reg_hsx,
>+ .min_peci_revision = 0x30,
>+};
>+
>+static const struct cpu_info cpu_icx = {
>+ .reg = &resolved_cores_reg_icx,
>+ .min_peci_revision = 0x40,
>+};
>+
>+static const struct auxiliary_device_id peci_cputemp_ids[] = {
>+ {
>+ .name = "peci_cpu.cputemp.hsx",
>+ .driver_data = (kernel_ulong_t)&cpu_hsx,
>+ },
>+ {
>+ .name = "peci_cpu.cputemp.bdx",
>+ .driver_data = (kernel_ulong_t)&cpu_hsx,
>+ },
>+ {
>+ .name = "peci_cpu.cputemp.bdxd",
>+ .driver_data = (kernel_ulong_t)&cpu_hsx,
>+ },
>+ {
>+ .name = "peci_cpu.cputemp.skx",
>+ .driver_data = (kernel_ulong_t)&cpu_hsx,
>+ },
>+ {
>+ .name = "peci_cpu.cputemp.icx",
>+ .driver_data = (kernel_ulong_t)&cpu_icx,
>+ },
>+ {
>+ .name = "peci_cpu.cputemp.icxd",
>+ .driver_data = (kernel_ulong_t)&cpu_icx,
>+ },
>+ { }
>+};
>+MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
>+
>+static struct auxiliary_driver peci_cputemp_driver = {
>+ .probe = peci_cputemp_probe,
>+ .id_table = peci_cputemp_ids,
>+};
>+
>+module_auxiliary_driver(peci_cputemp_driver);
>+
>+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
>+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
>+MODULE_DESCRIPTION("PECI cputemp driver");
>+MODULE_LICENSE("GPL");
>+MODULE_IMPORT_NS(PECI_CPU);
>--
>2.31.1
>
On Mon, Jul 12, 2021 at 05:04:40PM CDT, Iwona Winiarska wrote:
>From: Jae Hyun Yoo <[email protected]>
>
>ASPEED AST24xx/AST25xx/AST26xx SoCs support the PECI electrical
>interface (a.k.a. PECI wire).
>
>Signed-off-by: Jae Hyun Yoo <[email protected]>
>Co-developed-by: Iwona Winiarska <[email protected]>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> MAINTAINERS | 9 +
> drivers/peci/Kconfig | 6 +
> drivers/peci/Makefile | 3 +
> drivers/peci/controller/Kconfig | 12 +
> drivers/peci/controller/Makefile | 3 +
> drivers/peci/controller/peci-aspeed.c | 501 ++++++++++++++++++++++++++
> 6 files changed, 534 insertions(+)
> create mode 100644 drivers/peci/controller/Kconfig
> create mode 100644 drivers/peci/controller/Makefile
> create mode 100644 drivers/peci/controller/peci-aspeed.c
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 47411e2b6336..4ba874afa2fa 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -2865,6 +2865,15 @@ S: Maintained
> F: Documentation/hwmon/asc7621.rst
> F: drivers/hwmon/asc7621.c
>
>+ASPEED PECI CONTROLLER
>+M: Iwona Winiarska <[email protected]>
>+M: Jae Hyun Yoo <[email protected]>
>+L: [email protected] (moderated for non-subscribers)
>+L: [email protected] (moderated for non-subscribers)
>+S: Supported
>+F: Documentation/devicetree/bindings/peci/peci-aspeed.yaml
>+F: drivers/peci/controller/peci-aspeed.c
>+
> ASPEED PINCTRL DRIVERS
> M: Andrew Jeffery <[email protected]>
> L: [email protected] (moderated for non-subscribers)
>diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
>index 601cc3c3c852..0d0ee8009713 100644
>--- a/drivers/peci/Kconfig
>+++ b/drivers/peci/Kconfig
>@@ -12,3 +12,9 @@ menuconfig PECI
>
> This support is also available as a module. If so, the module
> will be called peci.
>+
>+if PECI
>+
>+source "drivers/peci/controller/Kconfig"
>+
>+endif # PECI
>diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
>index 2bb2f51bcda7..621a993e306a 100644
>--- a/drivers/peci/Makefile
>+++ b/drivers/peci/Makefile
>@@ -3,3 +3,6 @@
> # Core functionality
> peci-y := core.o sysfs.o
> obj-$(CONFIG_PECI) += peci.o
>+
>+# Hardware specific bus drivers
>+obj-y += controller/
>diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
>new file mode 100644
>index 000000000000..8ddbe494677f
>--- /dev/null
>+++ b/drivers/peci/controller/Kconfig
>@@ -0,0 +1,12 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+
>+config PECI_ASPEED
>+ tristate "ASPEED PECI support"
>+ depends on ARCH_ASPEED || COMPILE_TEST
>+ depends on OF
>+ depends on HAS_IOMEM
>+ help
>+ Enable this driver if you want to support ASPEED PECI controller.
>+
>+ This driver can also be built as a module. If so, the module
>+ will be called peci-aspeed.
>diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile
>new file mode 100644
>index 000000000000..022c28ef1bf0
>--- /dev/null
>+++ b/drivers/peci/controller/Makefile
>@@ -0,0 +1,3 @@
>+# SPDX-License-Identifier: GPL-2.0-only
>+
>+obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o
>diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c
>new file mode 100644
>index 000000000000..888b46383ea4
>--- /dev/null
>+++ b/drivers/peci/controller/peci-aspeed.c
>@@ -0,0 +1,501 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+// Copyright (C) 2012-2017 ASPEED Technology Inc.
>+// Copyright (c) 2018-2021 Intel Corporation
>+
>+#include <linux/bitfield.h>
>+#include <linux/clk.h>
>+#include <linux/delay.h>
>+#include <linux/interrupt.h>
>+#include <linux/io.h>
>+#include <linux/iopoll.h>
>+#include <linux/jiffies.h>
>+#include <linux/module.h>
>+#include <linux/of.h>
>+#include <linux/peci.h>
>+#include <linux/platform_device.h>
>+#include <linux/reset.h>
>+
>+#include <asm/unaligned.h>
>+
>+/* ASPEED PECI Registers */
>+/* Control Register */
>+#define ASPEED_PECI_CTRL 0x00
>+#define ASPEED_PECI_CTRL_SAMPLING_MASK GENMASK(19, 16)
>+#define ASPEED_PECI_CTRL_READ_MODE_MASK GENMASK(13, 12)
>+#define ASPEED_PECI_CTRL_READ_MODE_COUNT BIT(12)
>+#define ASPEED_PECI_CTRL_READ_MODE_DBG BIT(13)
Nitpick: might be nice to keep things in a consistent descending order
here (13 then 12).
>+#define ASPEED_PECI_CTRL_CLK_SOURCE_MASK BIT(11)
_MASK suffix seems out of place on this one.
>+#define ASPEED_PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8)
>+#define ASPEED_PECI_CTRL_INVERT_OUT BIT(7)
>+#define ASPEED_PECI_CTRL_INVERT_IN BIT(6)
>+#define ASPEED_PECI_CTRL_BUS_CONTENT_EN BIT(5)
It *is* already kind of a long macro name, but abbreviating "contention"
to "content" seems a bit confusing; I'd suggest keeping the extra three
characters (or maybe drop the _EN suffix if you want to avoid making it
even longer).
>+#define ASPEED_PECI_CTRL_PECI_EN BIT(4)
>+#define ASPEED_PECI_CTRL_PECI_CLK_EN BIT(0)
>+
>+/* Timing Negotiation Register */
>+#define ASPEED_PECI_TIMING_NEGOTIATION 0x04
>+#define ASPEED_PECI_TIMING_MESSAGE_MASK GENMASK(15, 8)
>+#define ASPEED_PECI_TIMING_ADDRESS_MASK GENMASK(7, 0)
>+
>+/* Command Register */
>+#define ASPEED_PECI_CMD 0x08
>+#define ASPEED_PECI_CMD_PIN_MON BIT(31)
>+#define ASPEED_PECI_CMD_STS_MASK GENMASK(27, 24)
>+#define ASPEED_PECI_CMD_STS_ADDR_T_NEGO 0x3
>+#define ASPEED_PECI_CMD_IDLE_MASK \
>+ (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MON)
>+#define ASPEED_PECI_CMD_FIRE BIT(0)
>+
>+/* Read/Write Length Register */
>+#define ASPEED_PECI_RW_LENGTH 0x0c
>+#define ASPEED_PECI_AW_FCS_EN BIT(31)
>+#define ASPEED_PECI_READ_LEN_MASK GENMASK(23, 16)
>+#define ASPEED_PECI_WRITE_LEN_MASK GENMASK(15, 8)
>+#define ASPEED_PECI_TAGET_ADDR_MASK GENMASK(7, 0)
s/TAGET/TARGET/
>+
>+/* Expected FCS Data Register */
>+#define ASPEED_PECI_EXP_FCS 0x10
>+#define ASPEED_PECI_EXP_READ_FCS_MASK GENMASK(23, 16)
>+#define ASPEED_PECI_EXP_AW_FCS_AUTO_MASK GENMASK(15, 8)
>+#define ASPEED_PECI_EXP_WRITE_FCS_MASK GENMASK(7, 0)
>+
>+/* Captured FCS Data Register */
>+#define ASPEED_PECI_CAP_FCS 0x14
>+#define ASPEED_PECI_CAP_READ_FCS_MASK GENMASK(23, 16)
>+#define ASPEED_PECI_CAP_WRITE_FCS_MASK GENMASK(7, 0)
>+
>+/* Interrupt Register */
>+#define ASPEED_PECI_INT_CTRL 0x18
>+#define ASPEED_PECI_TIMING_NEGO_SEL_MASK GENMASK(31, 30)
>+#define ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO 0
>+#define ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO 1
>+#define ASPEED_PECI_MESSAGE_NEGO 2
>+#define ASPEED_PECI_INT_MASK GENMASK(4, 0)
>+#define ASPEED_PECI_INT_BUS_TIMEOUT BIT(4)
>+#define ASPEED_PECI_INT_BUS_CONNECT BIT(3)
s/CONNECT/CONTENTION/
>+#define ASPEED_PECI_INT_W_FCS_BAD BIT(2)
>+#define ASPEED_PECI_INT_W_FCS_ABORT BIT(1)
>+#define ASPEED_PECI_INT_CMD_DONE BIT(0)
>+
>+/* Interrupt Status Register */
>+#define ASPEED_PECI_INT_STS 0x1c
>+#define ASPEED_PECI_INT_TIMING_RESULT_MASK GENMASK(29, 16)
>+ /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
>+
>+/* Rx/Tx Data Buffer Registers */
>+#define ASPEED_PECI_W_DATA0 0x20
>+#define ASPEED_PECI_W_DATA1 0x24
>+#define ASPEED_PECI_W_DATA2 0x28
>+#define ASPEED_PECI_W_DATA3 0x2c
>+#define ASPEED_PECI_R_DATA0 0x30
>+#define ASPEED_PECI_R_DATA1 0x34
>+#define ASPEED_PECI_R_DATA2 0x38
>+#define ASPEED_PECI_R_DATA3 0x3c
>+#define ASPEED_PECI_W_DATA4 0x40
>+#define ASPEED_PECI_W_DATA5 0x44
>+#define ASPEED_PECI_W_DATA6 0x48
>+#define ASPEED_PECI_W_DATA7 0x4c
>+#define ASPEED_PECI_R_DATA4 0x50
>+#define ASPEED_PECI_R_DATA5 0x54
>+#define ASPEED_PECI_R_DATA6 0x58
>+#define ASPEED_PECI_R_DATA7 0x5c
>+#define ASPEED_PECI_DATA_BUF_SIZE_MAX 32
>+
>+/* Timing Negotiation */
>+#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT 8
>+#define ASPEED_PECI_RD_SAMPLING_POINT_MAX (BIT(4) - 1)
>+#define ASPEED_PECI_CLK_DIV_DEFAULT 0
>+#define ASPEED_PECI_CLK_DIV_MAX (BIT(3) - 1)
>+#define ASPEED_PECI_MSG_TIMING_DEFAULT 1
>+#define ASPEED_PECI_MSG_TIMING_MAX (BIT(8) - 1)
>+#define ASPEED_PECI_ADDR_TIMING_DEFAULT 1
>+#define ASPEED_PECI_ADDR_TIMING_MAX (BIT(8) - 1)
>+
>+/* Timeout */
>+#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US (50 * USEC_PER_MSEC)
>+#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US (10 * USEC_PER_MSEC)
>+#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT (1000)
>+#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX (1000)
>+
>+struct aspeed_peci {
>+ struct peci_controller controller;
>+ struct device *dev;
>+ void __iomem *base;
>+ struct clk *clk;
>+ struct reset_control *rst;
>+ int irq;
>+ spinlock_t lock; /* to sync completion status handling */
>+ struct completion xfer_complete;
>+ u32 status;
>+ u32 cmd_timeout_ms;
>+ u32 msg_timing;
>+ u32 addr_timing;
>+ u32 rd_sampling_point;
>+ u32 clk_div;
>+};
>+
>+static inline struct aspeed_peci *to_aspeed_peci(struct peci_controller *a)
>+{
>+ return container_of(a, struct aspeed_peci, controller);
>+}
>+
>+static void aspeed_peci_init_regs(struct aspeed_peci *priv)
>+{
>+ u32 val;
>+
>+ val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);
>+ val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
>+ writel(val, priv->base + ASPEED_PECI_CTRL);
>+ /*
>+ * Timing negotiation period setting.
>+ * The unit of the programmed value is 4 times of PECI clock period.
>+ */
>+ val = FIELD_PREP(ASPEED_PECI_TIMING_MESSAGE_MASK, priv->msg_timing);
>+ val |= FIELD_PREP(ASPEED_PECI_TIMING_ADDRESS_MASK, priv->addr_timing);
>+ writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
>+
>+ /* Clear interrupts */
>+ val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;
This should be & instead of |, I'm guessing?
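A rough userspace model of the difference, assuming the status register is
write-1-to-clear (not verified against the datasheet here). The w1c_write()
helper and the sample register value are hypothetical; INT_MASK mirrors
ASPEED_PECI_INT_MASK from the patch.

```c
#include <stdint.h>

#define INT_MASK 0x1fu /* bits[4:0], mirrors ASPEED_PECI_INT_MASK */

/* Hypothetical model of a write-1-to-clear register: set bits in val clear reg. */
static uint32_t w1c_write(uint32_t reg, uint32_t val)
{
	return reg & ~val;
}

/* Variant from the patch: OR the current status with the mask. */
static uint32_t clear_with_or(uint32_t reg)
{
	return w1c_write(reg, reg | INT_MASK);
}

/* Variant suggested above: AND the current status with the mask. */
static uint32_t clear_with_and(uint32_t reg)
{
	return w1c_write(reg, reg & INT_MASK);
}
```

Under that assumption both variants clear every pending interrupt bit. The OR
form additionally writes back whatever is set outside bits[4:0] (for example
the timing result field), while the AND form only touches the interrupt bits.
Whether that matters depends on how the hardware treats writes to the other
bits.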
>+ writel(val, priv->base + ASPEED_PECI_INT_STS);
>+
>+ /* Set timing negotiation mode and enable interrupts */
>+ val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
>+ val |= ASPEED_PECI_INT_MASK;
>+ writel(val, priv->base + ASPEED_PECI_INT_CTRL);
>+
>+ val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);
>+ val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
>+ val |= ASPEED_PECI_CTRL_PECI_EN;
>+ val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
>+ writel(val, priv->base + ASPEED_PECI_CTRL);
>+}
>+
>+static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
>+{
>+ u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
>+
>+ if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
>+ aspeed_peci_init_regs(priv);
>+
>+ return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
>+ cmd_sts,
>+ !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
>+ ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
>+ ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
>+}
>+
>+static int aspeed_peci_xfer(struct peci_controller *controller,
>+ u8 addr, struct peci_request *req)
>+{
>+ struct aspeed_peci *priv = to_aspeed_peci(controller);
>+ unsigned long flags, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
>+ u32 peci_head;
>+ int ret;
>+
>+ if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
>+ req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
>+ return -EINVAL;
>+
>+ /* Check command sts and bus idle state */
>+ ret = aspeed_peci_check_idle(priv);
>+ if (ret)
>+ return ret; /* -ETIMEDOUT */
>+
>+ spin_lock_irqsave(&priv->lock, flags);
>+ reinit_completion(&priv->xfer_complete);
>+
>+ peci_head = FIELD_PREP(ASPEED_PECI_TAGET_ADDR_MASK, addr) |
>+ FIELD_PREP(ASPEED_PECI_WRITE_LEN_MASK, req->tx.len) |
>+ FIELD_PREP(ASPEED_PECI_READ_LEN_MASK, req->rx.len);
>+
>+ writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
>+
>+ memcpy_toio(priv->base + ASPEED_PECI_W_DATA0, req->tx.buf,
>+ req->tx.len > 16 ? 16 : req->tx.len);
min(req->tx.len, 16) for the third argument there might be a bit
clearer.
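A standalone sketch of the split copy with the length clamped once up front.
struct tx_banks, min_size() and fill_tx_banks() are hypothetical userspace
stand-ins for the W_DATA0..3 / W_DATA4..7 register banks and memcpy_toio().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 16u

/* Hypothetical stand-ins for the two 16-byte write data register banks. */
struct tx_banks {
	uint8_t lo[BANK_SIZE]; /* W_DATA0..W_DATA3 */
	uint8_t hi[BANK_SIZE]; /* W_DATA4..W_DATA7 */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Copy up to 32 bytes: the first 16 into the low bank, the rest into the high bank. */
static int fill_tx_banks(struct tx_banks *banks, const uint8_t *buf, size_t len)
{
	size_t first = min_size(len, BANK_SIZE);

	if (len > 2 * BANK_SIZE)
		return -1;

	if (first)
		memcpy(banks->lo, buf, first);
	if (len > BANK_SIZE)
		memcpy(banks->hi, buf + BANK_SIZE, len - BANK_SIZE);

	return 0;
}
```

In the driver itself the kernel's min() checks that both arguments have the
same type, so with a u8 length and a plain 16 this may need min_t() or a typed
constant.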
>+ if (req->tx.len > 16)
>+ memcpy_toio(priv->base + ASPEED_PECI_W_DATA4, req->tx.buf + 16,
>+ req->tx.len - 16);
>+
>+ dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
>+ print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);
>+
>+ priv->status = 0;
>+ writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
>+ spin_unlock_irqrestore(&priv->lock, flags);
>+
>+ ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);
>+ if (ret < 0)
>+ return ret;
>+
>+ if (ret == 0) {
>+ dev_dbg(priv->dev, "Timeout waiting for a response!\n");
>+ return -ETIMEDOUT;
>+ }
>+
>+ spin_lock_irqsave(&priv->lock, flags);
>+
>+ writel(0, priv->base + ASPEED_PECI_CMD);
>+
>+ if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
>+ spin_unlock_irqrestore(&priv->lock, flags);
>+ dev_dbg(priv->dev, "No valid response!\n");
>+ return -EIO;
>+ }
>+
>+ spin_unlock_irqrestore(&priv->lock, flags);
>+
>+ memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_R_DATA0,
>+ req->rx.len > 16 ? 16 : req->rx.len);
Likewise, min(req->rx.len, 16) here.
>+ if (req->rx.len > 16)
>+ memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_R_DATA4,
>+ req->rx.len - 16);
>+
>+ print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);
>+
>+ return 0;
>+}
>+
>+static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
>+{
>+ struct aspeed_peci *priv = arg;
>+ u32 status;
>+
>+ spin_lock(&priv->lock);
>+ status = readl(priv->base + ASPEED_PECI_INT_STS);
>+ writel(status, priv->base + ASPEED_PECI_INT_STS);
>+ priv->status |= (status & ASPEED_PECI_INT_MASK);
>+
>+ /*
>+ * In most cases, interrupt bits will be set one by one but also note
>+ * that multiple interrupt bits could be set at the same time.
>+ */
>+ if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
>+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_TIMEOUT\n");
>+
>+ if (status & ASPEED_PECI_INT_BUS_CONNECT)
>+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_CONNECT\n");
s/CONNECT/CONTENTION/ here too (in the message string).
>+
>+ if (status & ASPEED_PECI_INT_W_FCS_BAD)
>+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_BAD\n");
>+
>+ if (status & ASPEED_PECI_INT_W_FCS_ABORT)
>+ dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_ABORT\n");
Bus contention can of course arise legitimately, and I suppose an
offline host CPU might result in a timeout, so dbg seems fine for those
(though as Dan suggests, making some counters available seems like a
good idea, especially for contention). Are the FCS error cases
significant enough to warrant something less likely to go unnoticed
though? (e.g. dev_warn_ratelimited() or something?)
>+
>+ /*
>+ * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE bit
>+ * set even in an error case.
>+ */
>+ if (status & ASPEED_PECI_INT_CMD_DONE)
>+ complete(&priv->xfer_complete);
>+
>+ spin_unlock(&priv->lock);
>+
>+ return IRQ_HANDLED;
>+}
>+
>+static void __sanitize_clock_divider(struct aspeed_peci *priv)
>+{
>+ u32 clk_div;
>+ int ret;
>+
>+ ret = device_property_read_u32(priv->dev, "clock-divider", &clk_div);
>+ if (ret) {
>+ clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
>+ } else if (clk_div > ASPEED_PECI_CLK_DIV_MAX) {
>+ dev_warn(priv->dev, "Invalid clock-divider: %u, Using default: %u\n",
>+ clk_div, ASPEED_PECI_CLK_DIV_DEFAULT);
>+
>+ clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
>+ }
>+
>+ priv->clk_div = clk_div;
>+}
>+
The naming of these __sanitize_*() functions is a bit inconsistent with
the rest of the driver -- though given how similar they all look, could
they instead be refactored into a single helper function taking
property-name, default-value, and max-value parameters?
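Roughly what such a helper could look like, as a userspace sketch.
struct prop and prop_read_u32() are hypothetical stand-ins for the firmware
node and device_property_read_u32(); sanitize_property() is an illustrative
name. A min parameter is added here as an assumption, because the
cmd-timeout-ms check also rejects zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a firmware node: a flat name/value table. */
struct prop {
	const char *name;
	uint32_t val;
};

/* Stand-in for device_property_read_u32(): 0 on success, -1 if absent. */
static int prop_read_u32(const struct prop *props, size_t n,
			 const char *name, uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*out = props[i].val;
			return 0;
		}
	}

	return -1;
}

/* Return the property value, or def if it is absent or outside [min, max]. */
static uint32_t sanitize_property(const struct prop *props, size_t n,
				  const char *name, uint32_t min,
				  uint32_t max, uint32_t def)
{
	uint32_t val;

	if (prop_read_u32(props, n, name, &val))
		return def;

	if (val < min || val > max) {
		fprintf(stderr, "Invalid %s: %u, using default: %u\n",
			name, (unsigned int)val, (unsigned int)def);
		return def;
	}

	return val;
}
```

With something along these lines, each of the five __sanitize_*() functions
would collapse into one call, and the warning text would become consistent
across properties as a side effect.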
>+static void __sanitize_msg_timing(struct aspeed_peci *priv)
>+{
>+ u32 msg_timing;
>+ int ret;
>+
>+ ret = device_property_read_u32(priv->dev, "msg-timing", &msg_timing);
>+ if (ret) {
>+ msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
>+ } else if (msg_timing > ASPEED_PECI_MSG_TIMING_MAX) {
>+ dev_warn(priv->dev, "Invalid msg-timing : %u, Use default : %u\n",
>+ msg_timing, ASPEED_PECI_MSG_TIMING_DEFAULT);
>+
>+ msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
>+ }
>+
>+ priv->msg_timing = msg_timing;
>+}
>+
>+static void __sanitize_addr_timing(struct aspeed_peci *priv)
>+{
>+ u32 addr_timing;
>+ int ret;
>+
>+ ret = device_property_read_u32(priv->dev, "addr-timing", &addr_timing);
>+ if (ret) {
>+ addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
>+ } else if (addr_timing > ASPEED_PECI_ADDR_TIMING_MAX) {
>+ dev_warn(priv->dev, "Invalid addr-timing : %u, Use default : %u\n",
>+ addr_timing, ASPEED_PECI_ADDR_TIMING_DEFAULT);
>+
>+ addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
>+ }
>+
>+ priv->addr_timing = addr_timing;
>+}
>+
>+static void __sanitize_rd_sampling_point(struct aspeed_peci *priv)
>+{
>+ u32 rd_sampling_point;
>+ int ret;
>+
>+ ret = device_property_read_u32(priv->dev, "rd-sampling-point", &rd_sampling_point);
>+ if (ret) {
>+ rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
>+ } else if (rd_sampling_point > ASPEED_PECI_RD_SAMPLING_POINT_MAX) {
>+ dev_warn(priv->dev, "Invalid rd-sampling-point: %u, Use default : %u\n",
>+ rd_sampling_point, ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT);
>+
>+ rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
>+ }
>+
>+ priv->rd_sampling_point = rd_sampling_point;
>+}
>+
>+static void __sanitize_cmd_timeout(struct aspeed_peci *priv)
>+{
>+ u32 timeout;
>+ int ret;
>+
>+ ret = device_property_read_u32(priv->dev, "cmd-timeout-ms", &timeout);
>+ if (ret) {
>+ timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
>+ } else if (timeout > ASPEED_PECI_CMD_TIMEOUT_MS_MAX || timeout == 0) {
>+ dev_warn(priv->dev, "Invalid cmd-timeout-ms: %u, Use default: %u\n",
>+ timeout, ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT);
>+
>+ timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
>+ }
>+
>+ priv->cmd_timeout_ms = timeout;
>+}
>+
>+static void aspeed_peci_device_property_sanitize(struct aspeed_peci *priv)
>+{
>+ __sanitize_clock_divider(priv);
>+ __sanitize_msg_timing(priv);
>+ __sanitize_addr_timing(priv);
>+ __sanitize_rd_sampling_point(priv);
>+ __sanitize_cmd_timeout(priv);
>+}
>+
>+static void aspeed_peci_disable_clk(void *data)
>+{
>+ clk_disable_unprepare(data);
>+}
>+
>+static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
>+{
>+ int ret;
>+
>+ priv->clk = devm_clk_get(priv->dev, NULL);
>+ if (IS_ERR(priv->clk))
>+ return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "Failed to get clk source\n");
>+
>+ ret = clk_prepare_enable(priv->clk);
>+ if (ret) {
>+ dev_err(priv->dev, "Failed to enable clock\n");
>+ return ret;
>+ }
>+
>+ ret = devm_add_action_or_reset(priv->dev, aspeed_peci_disable_clk, priv->clk);
>+ if (ret)
>+ return ret;
>+
>+ aspeed_peci_device_property_sanitize(priv);
>+
>+ aspeed_peci_init_regs(priv);
>+
>+ return 0;
>+}
>+
>+static int aspeed_peci_probe(struct platform_device *pdev)
>+{
>+ struct aspeed_peci *priv;
>+ int ret;
>+
>+ priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
>+ if (!priv)
>+ return -ENOMEM;
>+
>+ priv->dev = &pdev->dev;
>+ dev_set_drvdata(priv->dev, priv);
>+
>+ priv->base = devm_platform_ioremap_resource(pdev, 0);
>+ if (IS_ERR(priv->base))
>+ return PTR_ERR(priv->base);
>+
>+ priv->irq = platform_get_irq(pdev, 0);
>+ if (!priv->irq)
>+ return priv->irq;
>+
>+ ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,
>+ 0, "peci-aspeed-irq", priv);
Might as well drop the "-irq" suffix here? (Seems a bit redundant, and
a quick glance through /proc/interrupts on the systems I have at hand
doesn't show anything else following that convention.)
>+ if (ret)
>+ return ret;
>+
>+ init_completion(&priv->xfer_complete);
>+ spin_lock_init(&priv->lock);
>+
>+ priv->controller.xfer = aspeed_peci_xfer;
>+
>+ priv->rst = devm_reset_control_get(&pdev->dev, NULL);
>+ if (IS_ERR(priv->rst)) {
>+ dev_err(&pdev->dev, "Missing or invalid reset controller entry\n");
>+ return PTR_ERR(priv->rst);
>+ }
>+ reset_control_deassert(priv->rst);
>+
>+ ret = aspeed_peci_init_ctrl(priv);
>+ if (ret)
>+ return ret;
>+
>+ return peci_controller_add(&priv->controller, priv->dev);
>+}
>+
>+static int aspeed_peci_remove(struct platform_device *pdev)
>+{
>+ struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
>+
>+ peci_controller_remove(&priv->controller);
>+ reset_control_assert(priv->rst);
>+
>+ return 0;
>+}
>+
>+static const struct of_device_id aspeed_peci_of_table[] = {
>+ { .compatible = "aspeed,ast2400-peci", },
>+ { .compatible = "aspeed,ast2500-peci", },
>+ { .compatible = "aspeed,ast2600-peci", },
>+ { }
>+};
>+MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
>+
>+static struct platform_driver aspeed_peci_driver = {
>+ .probe = aspeed_peci_probe,
>+ .remove = aspeed_peci_remove,
>+ .driver = {
>+ .name = "peci-aspeed",
>+ .of_match_table = aspeed_peci_of_table,
>+ },
>+};
>+module_platform_driver(aspeed_peci_driver);
>+
>+MODULE_AUTHOR("Ryan Chen <[email protected]>");
>+MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
>+MODULE_DESCRIPTION("ASPEED PECI driver");
>+MODULE_LICENSE("GPL");
>+MODULE_IMPORT_NS(PECI);
>--
>2.31.1
>
Iwona Winiarska wrote:
> +static const struct peci_device_id peci_cpu_device_ids[] = {
> + { /* Haswell Xeon */
> + .family = 6,
> + .model = INTEL_FAM6_HASWELL_X,
> + .data = "hsx",
> + },
> + { /* Broadwell Xeon */
> + .family = 6,
> + .model = INTEL_FAM6_BROADWELL_X,
> + .data = "bdx",
> + },
> + { /* Broadwell Xeon D */
> + .family = 6,
> + .model = INTEL_FAM6_BROADWELL_D,
> + .data = "skxd",
I think this should read "bdxd", as "skxd" does not exist in the
cputemp/dimmtemp drivers.
On Mon, Jul 12, 2021 at 05:04:41PM CDT, Iwona Winiarska wrote:
>Since PECI devices are discoverable, we can dynamically detect devices
>that are actually available in the system.
>
>This change complements the earlier implementation by rescanning PECI
>bus to detect available devices. For this purpose, it also introduces the
>minimal API for PECI requests.
>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> drivers/peci/Makefile | 2 +-
> drivers/peci/core.c | 13 ++++-
> drivers/peci/device.c | 111 ++++++++++++++++++++++++++++++++++++++++
> drivers/peci/internal.h | 15 ++++++
> drivers/peci/request.c | 74 +++++++++++++++++++++++++++
> drivers/peci/sysfs.c | 34 ++++++++++++
> 6 files changed, 246 insertions(+), 3 deletions(-)
> create mode 100644 drivers/peci/device.c
> create mode 100644 drivers/peci/request.c
>
>diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
>index 621a993e306a..917f689e147a 100644
>--- a/drivers/peci/Makefile
>+++ b/drivers/peci/Makefile
>@@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
>
> # Core functionality
>-peci-y := core.o sysfs.o
>+peci-y := core.o request.o device.o sysfs.o
> obj-$(CONFIG_PECI) += peci.o
>
> # Hardware specific bus drivers
>diff --git a/drivers/peci/core.c b/drivers/peci/core.c
>index 0ad00110459d..ae7a9572cdf3 100644
>--- a/drivers/peci/core.c
>+++ b/drivers/peci/core.c
>@@ -31,7 +31,15 @@ struct device_type peci_controller_type = {
>
> int peci_controller_scan_devices(struct peci_controller *controller)
> {
>- /* Just a stub, no support for actual devices yet */
>+ int ret;
>+ u8 addr;
>+
>+ for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX; addr++) {
>+ ret = peci_device_create(controller, addr);
>+ if (ret)
>+ return ret;
>+ }
>+
> return 0;
> }
>
>@@ -106,7 +114,8 @@ EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
>
> static int _unregister(struct device *dev, void *dummy)
> {
>- /* Just a stub, no support for actual devices yet */
>+ peci_device_destroy(to_peci_device(dev));
>+
> return 0;
> }
>
>diff --git a/drivers/peci/device.c b/drivers/peci/device.c
>new file mode 100644
>index 000000000000..1124862211e2
>--- /dev/null
>+++ b/drivers/peci/device.c
>@@ -0,0 +1,111 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+// Copyright (c) 2018-2021 Intel Corporation
>+
>+#include <linux/peci.h>
>+#include <linux/slab.h>
>+
>+#include "internal.h"
>+
>+static int peci_detect(struct peci_controller *controller, u8 addr)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_request_alloc(NULL, 0, 0);
>+ if (!req)
>+ return -ENOMEM;
>+
Might be worth a brief comment here noting that an empty request happens
to be the format of a PECI ping command (and/or change the name of the
function to peci_ping()).
>+ mutex_lock(&controller->bus_lock);
>+ ret = controller->xfer(controller, addr, req);
>+ mutex_unlock(&controller->bus_lock);
>+
>+ peci_request_free(req);
>+
>+ return ret;
>+}
>+
>+static bool peci_addr_valid(u8 addr)
>+{
>+ return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX;
>+}
>+
>+static int peci_dev_exists(struct device *dev, void *data)
>+{
>+ struct peci_device *device = to_peci_device(dev);
>+ u8 *addr = data;
>+
>+ if (device->addr == *addr)
>+ return -EBUSY;
>+
>+ return 0;
>+}
>+
>+int peci_device_create(struct peci_controller *controller, u8 addr)
>+{
>+ struct peci_device *device;
>+ int ret;
>+
>+ if (WARN_ON(!peci_addr_valid(addr)))
>+ return -EINVAL;
Wondering about the necessity of this check (and the peci_addr_valid()
function) -- as of the end of this patch series, there's only one caller
of peci_device_create(), and it's peci_controller_scan_devices() looping
from PECI_BASE_ADDR to PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX, so
checking that the address is in that range seems a bit redundant. Do we
anticipate that we might gain additional callers in the future that
could run a non-zero risk of passing a bad address?
>+
>+ /* Check if we have already detected this device before. */
>+ ret = device_for_each_child(&controller->dev, &addr, peci_dev_exists);
>+ if (ret)
>+ return 0;
>+
>+ ret = peci_detect(controller, addr);
>+ if (ret) {
>+ /*
>+ * Device not present or host state doesn't allow successful
>+ * detection at this time.
>+ */
>+ if (ret == -EIO || ret == -ETIMEDOUT)
>+ return 0;
Do we really want to be ignoring EIO here? From a look at
aspeed_peci_xfer(), it looks like the only path that would produce that
is the non-timeout, non-CMD_DONE case, which I guess happens on
contention or FCS errors and such. Should we maybe have some automatic
(limited) retry loop for cases like those?
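A minimal sketch of a bounded retry, in userspace C. DETECT_RETRIES_MAX,
detect_with_retry() and the fake_bus/fake_xfer pair are all hypothetical; the
retry count of 3 is arbitrary. Only -EIO is retried, on the assumption that a
timeout more likely means the device is absent or the host is off.

```c
#include <errno.h>

#define DETECT_RETRIES_MAX 3 /* hypothetical limit, not from the patch */

typedef int (*xfer_fn)(void *ctx);

/* Retry the transfer only on -EIO; any other result is returned immediately. */
static int detect_with_retry(xfer_fn xfer, void *ctx)
{
	int ret = -EIO;

	for (int i = 0; i < DETECT_RETRIES_MAX; i++) {
		ret = xfer(ctx);
		if (ret != -EIO)
			break;
	}

	return ret;
}

/* Fake bus for illustration: fails failures_left times with err, then succeeds. */
struct fake_bus {
	int failures_left;
	int err;
	int calls;
};

static int fake_xfer(void *ctx)
{
	struct fake_bus *bus = ctx;

	bus->calls++;
	if (bus->failures_left > 0) {
		bus->failures_left--;
		return bus->err;
	}

	return 0;
}
```

If -EIO still comes back after the last attempt, the caller could then decide
whether to treat it as "not present" (as the patch does now) or to propagate
it.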
>+
>+ return ret;
>+ }
>+
>+ device = kzalloc(sizeof(*device), GFP_KERNEL);
>+ if (!device)
>+ return -ENOMEM;
>+
>+ device->controller = controller;
>+ device->addr = addr;
>+ device->dev.parent = &device->controller->dev;
>+ device->dev.bus = &peci_bus_type;
>+ device->dev.type = &peci_device_type;
>+
>+ ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
>+ if (ret)
>+ goto err_free;
>+
>+ ret = device_register(&device->dev);
>+ if (ret)
>+ goto err_put;
>+
>+ return 0;
>+
>+err_put:
>+ put_device(&device->dev);
>+err_free:
>+ kfree(device);
>+
>+ return ret;
>+}
>+
>+void peci_device_destroy(struct peci_device *device)
>+{
>+ device_unregister(&device->dev);
>+}
>+
>+static void peci_device_release(struct device *dev)
>+{
>+ struct peci_device *device = to_peci_device(dev);
>+
>+ kfree(device);
>+}
>+
>+struct device_type peci_device_type = {
>+ .groups = peci_device_groups,
>+ .release = peci_device_release,
>+};
>diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
>index 80c61bcdfc6b..6b139adaf6b8 100644
>--- a/drivers/peci/internal.h
>+++ b/drivers/peci/internal.h
>@@ -9,6 +9,21 @@
>
> struct peci_controller;
> struct attribute_group;
>+struct peci_device;
>+struct peci_request;
>+
>+/* PECI CPU address range 0x30-0x37 */
>+#define PECI_BASE_ADDR 0x30
>+#define PECI_DEVICE_NUM_MAX 8
>+
>+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
>+void peci_request_free(struct peci_request *req);
>+
>+extern struct device_type peci_device_type;
>+extern const struct attribute_group *peci_device_groups[];
>+
>+int peci_device_create(struct peci_controller *controller, u8 addr);
>+void peci_device_destroy(struct peci_device *device);
>
> extern struct bus_type peci_bus_type;
> extern const struct attribute_group *peci_bus_groups[];
>diff --git a/drivers/peci/request.c b/drivers/peci/request.c
>new file mode 100644
>index 000000000000..78cee51dfae1
>--- /dev/null
>+++ b/drivers/peci/request.c
>@@ -0,0 +1,74 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+// Copyright (c) 2021 Intel Corporation
>+
>+#include <linux/export.h>
>+#include <linux/peci.h>
>+#include <linux/slab.h>
>+#include <linux/types.h>
>+
>+#include "internal.h"
>+
>+/**
>+ * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
>+ * @device: PECI device to which request is going to be sent
>+ * @tx_len: requested TX buffer length
>+ * @rx_len: requested RX buffer length
>+ *
>+ * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
>+ */
>+struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
>+{
>+ struct peci_request *req;
>+ u8 *tx_buf, *rx_buf;
>+
>+ req = kzalloc(sizeof(*req), GFP_KERNEL);
>+ if (!req)
>+ return NULL;
>+
>+ req->device = device;
>+
>+ /*
>+ * PECI controllers that we are using now don't support DMA, this
>+ * should be converted to DMA API once support for controllers that do
>+ * allow it is added to avoid an extra copy.
>+ */
>+ if (tx_len) {
>+ tx_buf = kzalloc(tx_len, GFP_KERNEL);
>+ if (!tx_buf)
>+ goto err_free_req;
>+
>+ req->tx.buf = tx_buf;
>+ req->tx.len = tx_len;
>+ }
>+
>+ if (rx_len) {
>+ rx_buf = kzalloc(rx_len, GFP_KERNEL);
>+ if (!rx_buf)
>+ goto err_free_tx;
>+
>+ req->rx.buf = rx_buf;
>+ req->rx.len = rx_len;
>+ }
>+
As long as we're punting on DMA support, could we do the whole thing in
a single allocation instead of three? It'd add some pointer arithmetic,
but would also simplify the error-handling/deallocation paths a bit.
Or, given that the one controller we're currently supporting has a
hardware limit of 32 bytes per transfer anyway, maybe just inline
fixed-size rx/tx buffers into struct peci_request and have callers keep
them on the stack instead of kmalloc()-ing them?
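A userspace sketch of the single-allocation variant, using a flexible array
member for the payload. The struct layout and the request_alloc() /
request_free() names are hypothetical and only loosely modelled on
struct peci_request; calloc() stands in for kzalloc().

```c
#include <stdint.h>
#include <stdlib.h>

struct req_buf {
	uint8_t *buf;
	uint8_t len;
};

/* Hypothetical layout: header followed by tx bytes, then rx bytes. */
struct request {
	struct req_buf tx;
	struct req_buf rx;
	uint8_t data[];
};

/* One zeroed allocation covers the header and both buffers. */
static struct request *request_alloc(uint8_t tx_len, uint8_t rx_len)
{
	struct request *req;

	req = calloc(1, sizeof(*req) + tx_len + rx_len);
	if (!req)
		return NULL;

	if (tx_len) {
		req->tx.buf = req->data;
		req->tx.len = tx_len;
	}

	if (rx_len) {
		req->rx.buf = req->data + tx_len;
		req->rx.len = rx_len;
	}

	return req;
}

/* A single free releases everything. */
static void request_free(struct request *req)
{
	free(req);
}
```

The error labels disappear because there is only one point of failure. In the
kernel the size computation would presumably go through something like
struct_size() rather than open-coded addition.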
>+ return req;
>+
>+err_free_tx:
>+ kfree(req->tx.buf);
>+err_free_req:
>+ kfree(req);
>+
>+ return NULL;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
>+
>+/**
>+ * peci_request_free() - free peci_request
>+ * @req: the PECI request to be freed
>+ */
>+void peci_request_free(struct peci_request *req)
>+{
>+ kfree(req->rx.buf);
>+ kfree(req->tx.buf);
>+ kfree(req);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
>diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
>index 36c5e2a18a92..db9ef05776e3 100644
>--- a/drivers/peci/sysfs.c
>+++ b/drivers/peci/sysfs.c
>@@ -1,6 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0-only
> // Copyright (c) 2021 Intel Corporation
>
>+#include <linux/device.h>
>+#include <linux/kernel.h>
> #include <linux/peci.h>
>
> #include "internal.h"
>@@ -46,3 +48,35 @@ const struct attribute_group *peci_bus_groups[] = {
> &peci_bus_group,
> NULL
> };
>+
>+static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
>+ const char *buf, size_t count)
>+{
>+ struct peci_device *device = to_peci_device(dev);
>+ bool res;
>+ int ret;
>+
>+ ret = kstrtobool(buf, &res);
>+ if (ret)
>+ return ret;
>+
>+ if (res && device_remove_file_self(dev, attr))
>+ peci_device_destroy(device);
>+
>+ return count;
>+}
>+static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
>+
>+static struct attribute *peci_device_attrs[] = {
>+ &dev_attr_remove.attr,
>+ NULL
>+};
>+
>+static const struct attribute_group peci_device_group = {
>+ .attrs = peci_device_attrs,
>+};
>+
>+const struct attribute_group *peci_device_groups[] = {
>+ &peci_device_group,
>+ NULL
>+};
>--
>2.31.1
>
On Mon, Jul 12, 2021 at 05:04:42PM CDT, Iwona Winiarska wrote:
>Here we're adding support for PECI device drivers, which unlike PECI
>controller drivers are actually able to provide functionalities to
>userspace.
>
>We're also extending peci_request API to allow querying more details
>about PECI device (e.g. model/family), that's going to be used to find
>a compatible peci_driver.
>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> drivers/peci/Kconfig | 1 +
> drivers/peci/core.c | 49 +++++++++
> drivers/peci/device.c | 99 ++++++++++++++++++
> drivers/peci/internal.h | 75 ++++++++++++++
> drivers/peci/request.c | 217 ++++++++++++++++++++++++++++++++++++++++
> include/linux/peci.h | 19 ++++
> lib/Kconfig | 2 +-
> 7 files changed, 461 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
>index 0d0ee8009713..27c31535843c 100644
>--- a/drivers/peci/Kconfig
>+++ b/drivers/peci/Kconfig
>@@ -2,6 +2,7 @@
>
> menuconfig PECI
> tristate "PECI support"
>+ select GENERIC_LIB_X86
> help
> The Platform Environment Control Interface (PECI) is an interface
> that provides a communication channel to Intel processors and
>diff --git a/drivers/peci/core.c b/drivers/peci/core.c
>index ae7a9572cdf3..94426b7f2618 100644
>--- a/drivers/peci/core.c
>+++ b/drivers/peci/core.c
>@@ -143,8 +143,57 @@ void peci_controller_remove(struct peci_controller *controller)
> }
> EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
>
>+static const struct peci_device_id *
>+peci_bus_match_device_id(const struct peci_device_id *id, struct peci_device *device)
>+{
>+ while (id->family != 0) {
>+ if (id->family == device->info.family &&
>+ id->model == device->info.model)
>+ return id;
>+ id++;
>+ }
>+
>+ return NULL;
>+}
>+
>+static int peci_bus_device_match(struct device *dev, struct device_driver *drv)
>+{
>+ struct peci_device *device = to_peci_device(dev);
>+ struct peci_driver *peci_drv = to_peci_driver(drv);
>+
>+ if (dev->type != &peci_device_type)
>+ return 0;
>+
>+ if (peci_bus_match_device_id(peci_drv->id_table, device))
>+ return 1;
>+
>+ return 0;
>+}
>+
>+static int peci_bus_device_probe(struct device *dev)
>+{
>+ struct peci_device *device = to_peci_device(dev);
>+ struct peci_driver *driver = to_peci_driver(dev->driver);
>+
>+ return driver->probe(device, peci_bus_match_device_id(driver->id_table, device));
>+}
>+
>+static int peci_bus_device_remove(struct device *dev)
>+{
>+ struct peci_device *device = to_peci_device(dev);
>+ struct peci_driver *driver = to_peci_driver(dev->driver);
>+
>+ if (driver->remove)
>+ driver->remove(device);
>+
>+ return 0;
>+}
>+
> struct bus_type peci_bus_type = {
> .name = "peci",
>+ .match = peci_bus_device_match,
>+ .probe = peci_bus_device_probe,
>+ .remove = peci_bus_device_remove,
> .bus_groups = peci_bus_groups,
> };
>
>diff --git a/drivers/peci/device.c b/drivers/peci/device.c
>index 1124862211e2..8c4bd1ebbc29 100644
>--- a/drivers/peci/device.c
>+++ b/drivers/peci/device.c
>@@ -1,11 +1,79 @@
> // SPDX-License-Identifier: GPL-2.0-only
> // Copyright (c) 2018-2021 Intel Corporation
>
>+#include <linux/bitfield.h>
> #include <linux/peci.h>
> #include <linux/slab.h>
>+#include <linux/x86/cpu.h>
>
> #include "internal.h"
>
>+#define REVISION_NUM_MASK GENMASK(15, 8)
>+static int peci_get_revision(struct peci_device *device, u8 *revision)
>+{
>+ struct peci_request *req;
>+ u64 dib;
>+
>+ req = peci_get_dib(device);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ dib = peci_request_data_dib(req);
>+ if (dib == 0) {
>+ peci_request_free(req);
>+ return -EIO;
Any particular reason to check for zero specifically here? It looks
like that would be a case where the host CPU responds and everything's
otherwise fine, but the host just happened to send back a bunch of zeros
for whatever reason -- which may not be a valid PECI revision number,
but if it sent back a bunch of 0xff bytes instead wouldn't that be
equally invalid?
Also, given that the docs (the ones I have, at least) describe the DIB
as a collection of individual bytes, dealing with it as a combined u64
seems a bit confusing to me -- could we just return req->rx.buf[1]
instead?
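To show that the two approaches agree, here is a small userspace sketch. It
assumes peci_request_data_dib() assembles the eight response bytes in
little-endian order, which is not visible in this part of the patch; the
helper names and the sample bytes are hypothetical.

```c
#include <stdint.h>

#define REVISION_NUM_SHIFT 8
#define REVISION_NUM_MASK (0xffull << REVISION_NUM_SHIFT) /* bits 15:8 */

/* Assumed behaviour of the u64 accessor: little-endian assembly of 8 bytes. */
static uint64_t le64_from_bytes(const uint8_t *b)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];

	return v;
}

/* Approach in the patch: mask the revision out of the combined u64. */
static uint8_t revision_from_u64(const uint8_t *rx)
{
	return (uint8_t)((le64_from_bytes(rx) & REVISION_NUM_MASK) >> REVISION_NUM_SHIFT);
}

/* Approach suggested above: read the second response byte directly. */
static uint8_t revision_from_byte(const uint8_t *rx)
{
	return rx[1];
}
```

Under the little-endian assumption both functions return the same byte, so the
direct index would drop the mask definition and the u64 round trip.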
>+ }
>+
>+ *revision = FIELD_GET(REVISION_NUM_MASK, dib);
>+
>+ peci_request_free(req);
>+
>+ return 0;
>+}
>+
>+static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_pkg_cfg_readl(device, PECI_PCS_PKG_ID, PECI_PKG_ID_CPU_ID);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ ret = peci_request_status(req);
>+ if (ret)
>+ goto out_req_free;
>+
>+ *cpu_id = peci_request_data_readl(req);
>+out_req_free:
As suggested on patch #8, I think it might be cleaner to stack-allocate
struct peci_request, which would obviate the need for explicit free
calls in functions like this and hence might simplify it away entirely,
but if this does remain like this we could just do

	if (!ret)
		*cpu_id = peci_request_data_readl(req);

instead of using a goto to skip a single line.
>+ peci_request_free(req);
>+
>+ return ret;
>+}
>+
>+static int peci_device_info_init(struct peci_device *device)
>+{
>+ u8 revision;
>+ u32 cpu_id;
>+ int ret;
>+
>+ ret = peci_get_cpu_id(device, &cpu_id);
>+ if (ret)
>+ return ret;
>+
>+ device->info.family = x86_family(cpu_id);
>+ device->info.model = x86_model(cpu_id);
>+
>+ ret = peci_get_revision(device, &revision);
>+ if (ret)
>+ return ret;
>+ device->info.peci_revision = revision;
>+
>+ device->info.socket_id = device->addr - PECI_BASE_ADDR;
>+
>+ return 0;
>+}
>+
> static int peci_detect(struct peci_controller *controller, u8 addr)
> {
> struct peci_request *req;
>@@ -75,6 +143,10 @@ int peci_device_create(struct peci_controller *controller, u8 addr)
> device->dev.bus = &peci_bus_type;
> device->dev.type = &peci_device_type;
>
>+ ret = peci_device_info_init(device);
>+ if (ret)
>+ goto err_free;
>+
> ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
> if (ret)
> goto err_free;
>@@ -98,6 +170,33 @@ void peci_device_destroy(struct peci_device *device)
> device_unregister(&device->dev);
> }
>
>+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
>+ const char *mod_name)
>+{
>+ driver->driver.bus = &peci_bus_type;
>+ driver->driver.owner = owner;
>+ driver->driver.mod_name = mod_name;
>+
>+ if (!driver->probe) {
>+ pr_err("peci: trying to register driver without probe callback\n");
>+ return -EINVAL;
>+ }
>+
>+ if (!driver->id_table) {
>+ pr_err("peci: trying to register driver without device id table\n");
>+ return -EINVAL;
>+ }
>+
>+ return driver_register(&driver->driver);
>+}
>+EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
>+
>+void peci_driver_unregister(struct peci_driver *driver)
>+{
>+ driver_unregister(&driver->driver);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
>+
> static void peci_device_release(struct device *dev)
> {
> struct peci_device *device = to_peci_device(dev);
>diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
>index 6b139adaf6b8..c891c93e077a 100644
>--- a/drivers/peci/internal.h
>+++ b/drivers/peci/internal.h
>@@ -19,6 +19,34 @@ struct peci_request;
> struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
> void peci_request_free(struct peci_request *req);
>
>+int peci_request_status(struct peci_request *req);
>+u64 peci_request_data_dib(struct peci_request *req);
>+
>+u8 peci_request_data_readb(struct peci_request *req);
>+u16 peci_request_data_readw(struct peci_request *req);
>+u32 peci_request_data_readl(struct peci_request *req);
>+u64 peci_request_data_readq(struct peci_request *req);
>+
>+struct peci_request *peci_get_dib(struct peci_device *device);
>+struct peci_request *peci_get_temp(struct peci_device *device);
>+
>+struct peci_request *peci_pkg_cfg_readb(struct peci_device *device, u8 index, u16 param);
>+struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8 index, u16 param);
>+struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
>+struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
>+
>+/**
>+ * struct peci_device_id - PECI device data to match
>+ * @data: pointer to driver private data specific to device
>+ * @family: device family
>+ * @model: device model
>+ */
>+struct peci_device_id {
>+ const void *data;
>+ u16 family;
>+ u8 model;
>+};
>+
> extern struct device_type peci_device_type;
> extern const struct attribute_group *peci_device_groups[];
>
>@@ -28,6 +56,53 @@ void peci_device_destroy(struct peci_device *device);
> extern struct bus_type peci_bus_type;
> extern const struct attribute_group *peci_bus_groups[];
>
>+/**
>+ * struct peci_driver - PECI driver
>+ * @driver: inherit device driver
>+ * @probe: probe callback
>+ * @remove: remove callback
>+ * @id_table: PECI device match table to decide which device to bind
>+ */
>+struct peci_driver {
>+ struct device_driver driver;
>+ int (*probe)(struct peci_device *device, const struct peci_device_id *id);
>+ void (*remove)(struct peci_device *device);
>+ const struct peci_device_id *id_table;
>+};
>+
>+static inline struct peci_driver *to_peci_driver(struct device_driver *d)
>+{
>+ return container_of(d, struct peci_driver, driver);
>+}
>+
>+int __peci_driver_register(struct peci_driver *driver, struct module *owner,
>+ const char *mod_name);
>+/**
>+ * peci_driver_register() - register PECI driver
>+ * @driver: the driver to be registered
>+ * @owner: owner module of the driver being registered
>+ * @mod_name: module name string
>+ *
>+ * PECI drivers that don't need to do anything special in module init should
>+ * use the convenience "module_peci_driver" macro instead
>+ *
>+ * Return: zero on success, else a negative error code.
>+ */
>+#define peci_driver_register(driver) \
>+ __peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
>+void peci_driver_unregister(struct peci_driver *driver);
>+
>+/**
>+ * module_peci_driver() - Helper macro for registering a modular PECI driver
>+ * @__peci_driver: peci_driver struct
>+ *
>+ * Helper macro for PECI drivers which do not do anything special in module
>+ * init/exit. This eliminates a lot of boilerplate. Each module may only
>+ * use this macro once, and calling it replaces module_init() and module_exit()
>+ */
>+#define module_peci_driver(__peci_driver) \
>+ module_driver(__peci_driver, peci_driver_register, peci_driver_unregister)
>+
> extern struct device_type peci_controller_type;
>
> int peci_controller_scan_devices(struct peci_controller *controller);
>diff --git a/drivers/peci/request.c b/drivers/peci/request.c
>index 78cee51dfae1..48354455b554 100644
>--- a/drivers/peci/request.c
>+++ b/drivers/peci/request.c
>@@ -1,13 +1,142 @@
> // SPDX-License-Identifier: GPL-2.0-only
> // Copyright (c) 2021 Intel Corporation
>
>+#include <linux/bug.h>
> #include <linux/export.h>
> #include <linux/peci.h>
> #include <linux/slab.h>
> #include <linux/types.h>
>
>+#include <asm/unaligned.h>
>+
> #include "internal.h"
>
>+#define PECI_GET_DIB_CMD 0xf7
>+#define PECI_GET_DIB_WR_LEN 1
>+#define PECI_GET_DIB_RD_LEN 8
>+
>+#define PECI_RDPKGCFG_CMD 0xa1
>+#define PECI_RDPKGCFG_WRITE_LEN 5
>+#define PECI_RDPKGCFG_READ_LEN_BASE 1
>+#define PECI_WRPKGCFG_CMD 0xa5
>+#define PECI_WRPKGCFG_WRITE_LEN_BASE 6
>+#define PECI_WRPKGCFG_READ_LEN 1
>+
>+/* Device Specific Completion Code (CC) Definition */
>+#define PECI_CC_SUCCESS 0x40
>+#define PECI_CC_NEED_RETRY 0x80
>+#define PECI_CC_OUT_OF_RESOURCE 0x81
>+#define PECI_CC_UNAVAIL_RESOURCE 0x82
>+#define PECI_CC_INVALID_REQ 0x90
>+#define PECI_CC_MCA_ERROR 0x91
>+#define PECI_CC_CATASTROPHIC_MCA_ERROR 0x93
>+#define PECI_CC_FATAL_MCA_ERROR 0x94
>+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB 0x98
>+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR 0x9B
>+#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA 0x9C
>+
>+#define PECI_RETRY_BIT BIT(0)
>+
>+#define PECI_RETRY_TIMEOUT msecs_to_jiffies(700)
>+#define PECI_RETRY_INTERVAL_MIN msecs_to_jiffies(1)
>+#define PECI_RETRY_INTERVAL_MAX msecs_to_jiffies(128)
>+
>+static u8 peci_request_data_cc(struct peci_request *req)
>+{
>+ return req->rx.buf[0];
>+}
>+
>+/**
>+ * peci_request_status() - return -errno based on PECI completion code
>+ * @req: the PECI request that contains response data with completion code
>+ *
>+ * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands we
>+ * don't expect completion code in the response.
>+ *
>+ * Return: -errno
>+ */
>+int peci_request_status(struct peci_request *req)
>+{
>+ u8 cc = peci_request_data_cc(req);
>+
>+ if (cc != PECI_CC_SUCCESS)
>+ dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
>+
>+ switch (cc) {
>+ case PECI_CC_SUCCESS:
>+ return 0;
>+ case PECI_CC_NEED_RETRY:
>+ case PECI_CC_OUT_OF_RESOURCE:
>+ case PECI_CC_UNAVAIL_RESOURCE:
>+ return -EAGAIN;
>+ case PECI_CC_INVALID_REQ:
>+ return -EINVAL;
>+ case PECI_CC_MCA_ERROR:
>+ case PECI_CC_CATASTROPHIC_MCA_ERROR:
>+ case PECI_CC_FATAL_MCA_ERROR:
>+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
>+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
>+ case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
>+ return -EIO;
>+ }
>+
>+ WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
>+
>+ return -EIO;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
>+
>+static int peci_request_xfer(struct peci_request *req)
>+{
>+ struct peci_device *device = req->device;
>+ struct peci_controller *controller = device->controller;
>+ int ret;
>+
>+ mutex_lock(&controller->bus_lock);
>+ ret = controller->xfer(controller, device->addr, req);
>+ mutex_unlock(&controller->bus_lock);
>+
>+ return ret;
>+}
>+
>+static int peci_request_xfer_retry(struct peci_request *req)
>+{
>+ long wait_interval = PECI_RETRY_INTERVAL_MIN;
>+ struct peci_device *device = req->device;
>+ struct peci_controller *controller = device->controller;
>+ unsigned long start = jiffies;
>+ int ret;
>+
>+ /* Don't try to use it for ping */
>+ if (WARN_ON(!req->rx.buf))
>+ return 0;
>+
>+ do {
>+ ret = peci_request_xfer(req);
>+ if (ret) {
>+ dev_dbg(&controller->dev, "xfer error: %d\n", ret);
>+ return ret;
>+ }
>+
>+ if (peci_request_status(req) != -EAGAIN)
>+ return 0;
>+
>+ /* Set the retry bit to indicate a retry attempt */
>+ req->tx.buf[1] |= PECI_RETRY_BIT;
>+
>+ if (schedule_timeout_interruptible(wait_interval))
>+ return -ERESTARTSYS;
>+
>+ wait_interval *= 2;
>+ if (wait_interval > PECI_RETRY_INTERVAL_MAX)
>+ wait_interval = PECI_RETRY_INTERVAL_MAX;

	wait_interval = min(wait_interval * 2, PECI_RETRY_INTERVAL_MAX) ?

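As a sanity check that the one-liner behaves like the two-step version,
here is a small user-space sketch. The constants and min_long() are
stand-ins I made up for the jiffies-based values in the patch. Note that
the in-kernel min() is picky about operand types; since wait_interval is
a long and msecs_to_jiffies() returns unsigned long, min_t(long, ...)
may be what's actually needed, depending on the kernel version.

```c
/* Hypothetical user-space stand-ins for the jiffies-based constants. */
#define RETRY_INTERVAL_MIN 1L
#define RETRY_INTERVAL_MAX 128L

/* Plain helper used here in place of the kernel's min()/min_t(). */
static long min_long(long a, long b)
{
	return a < b ? a : b;
}

/* Double the interval, saturating at RETRY_INTERVAL_MAX. */
static long next_wait_interval(long wait_interval)
{
	return min_long(wait_interval * 2, RETRY_INTERVAL_MAX);
}
```

Starting from RETRY_INTERVAL_MIN, repeated calls double the interval
until it reaches the cap and then stay there.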
>+ } while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
>+
>+ dev_dbg(&controller->dev, "request timed out\n");
>+
>+ return -ETIMEDOUT;
>+}
>+
> /**
> * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
> * @device: PECI device to which request is going to be sent
>@@ -72,3 +201,91 @@ void peci_request_free(struct peci_request *req)
> kfree(req);
> }
> EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
>+
>+struct peci_request *peci_get_dib(struct peci_device *device)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN, PECI_GET_DIB_RD_LEN);
>+ if (!req)
>+ return ERR_PTR(-ENOMEM);
>+
>+ req->tx.buf[0] = PECI_GET_DIB_CMD;
>+
>+ ret = peci_request_xfer(req);
>+ if (ret) {
>+ peci_request_free(req);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return req;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
>+
>+static struct peci_request *
>+__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_request_alloc(device, PECI_RDPKGCFG_WRITE_LEN,
>+ PECI_RDPKGCFG_READ_LEN_BASE + len);
>+ if (!req)
>+ return ERR_PTR(-ENOMEM);
>+
>+ req->tx.buf[0] = PECI_RDPKGCFG_CMD;
>+ req->tx.buf[1] = 0;
>+ req->tx.buf[2] = index;
>+ put_unaligned_le16(param, &req->tx.buf[3]);
>+
>+ ret = peci_request_xfer_retry(req);
>+ if (ret) {
>+ peci_request_free(req);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return req;
>+}
>+
>+u8 peci_request_data_readb(struct peci_request *req)
>+{
>+ return req->rx.buf[1];
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
>+
>+u16 peci_request_data_readw(struct peci_request *req)
>+{
>+ return get_unaligned_le16(&req->rx.buf[1]);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
>+
>+u32 peci_request_data_readl(struct peci_request *req)
>+{
>+ return get_unaligned_le32(&req->rx.buf[1]);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
>+
>+u64 peci_request_data_readq(struct peci_request *req)
>+{
>+ return get_unaligned_le64(&req->rx.buf[1]);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
>+
>+u64 peci_request_data_dib(struct peci_request *req)
>+{
>+ return get_unaligned_le64(&req->rx.buf[0]);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
>+
>+#define __read_pkg_config(x, type) \
>+struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
>+{ \
>+ return __pkg_cfg_read(device, index, param, sizeof(type)); \
>+} \

Is there a reason for this particular API? I'd think a more natural one
that would offload a bit of boilerplate from callers would look more like

	int peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param, type *outp),

returning peci_request_status() and writing the requested data to *outp
if that status is zero.

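Roughly what I have in mind, as a user-space sketch rather than a
drop-in replacement. The mock_ names are hypothetical; the only things
taken from the patch are the 0x40 success code and the layout with the
completion code in rx[0] followed by little-endian data.

```c
#include <errno.h>
#include <stdint.h>

#define MOCK_CC_SUCCESS 0x40	/* mirrors PECI_CC_SUCCESS from the patch */

/* Hypothetical completed response: rx[0] is the completion code, data follows. */
struct mock_response {
	uint8_t rx[9];
};

/* Simplified stand-in for peci_request_status(): success or -EIO only. */
static int mock_status(const struct mock_response *rsp)
{
	return rsp->rx[0] == MOCK_CC_SUCCESS ? 0 : -EIO;
}

/* Portable little-endian 32-bit load, in place of get_unaligned_le32(). */
static uint32_t mock_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Proposed shape: return the status, write the data to *outp only on success. */
static int mock_pkg_cfg_readl(const struct mock_response *rsp, uint32_t *outp)
{
	int ret = mock_status(rsp);

	if (!ret)
		*outp = mock_le32(&rsp->rx[1]);

	return ret;
}
```

Callers then get a plain int to check and never have to touch (or free)
the request themselves.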
>+EXPORT_SYMBOL_NS_GPL(peci_pkg_cfg_##x, PECI)
>+
>+__read_pkg_config(readb, u8);
>+__read_pkg_config(readw, u16);
>+__read_pkg_config(readl, u32);
>+__read_pkg_config(readq, u64);
>diff --git a/include/linux/peci.h b/include/linux/peci.h
>index cdf3008321fd..f9f37b874011 100644
>--- a/include/linux/peci.h
>+++ b/include/linux/peci.h
>@@ -9,6 +9,14 @@
> #include <linux/mutex.h>
> #include <linux/types.h>
>
>+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
>+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
>+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
>+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
>+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
>+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
>+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
>+
> struct peci_request;
>
> /**
>@@ -41,6 +49,11 @@ static inline struct peci_controller *to_peci_controller(void *d)
> * struct peci_device - PECI device
> * @dev: device object to register PECI device to the device model
> * @controller: manages the bus segment hosting this PECI device
>+ * @info: PECI device characteristics
>+ * @info.family: device family
>+ * @info.model: device model
>+ * @info.peci_revision: PECI revision supported by the PECI device
>+ * @info.socket_id: the socket ID represented by the PECI device
> * @addr: address used on the PECI bus connected to the parent controller
> *
> * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
>@@ -50,6 +63,12 @@ static inline struct peci_controller *to_peci_controller(void *d)
> struct peci_device {
> struct device dev;
> struct peci_controller *controller;
>+ struct {
>+ u16 family;
>+ u8 model;
>+ u8 peci_revision;

This field gets set but doesn't seem to end up used anywhere; is it
useful?

>+ u8 socket_id;
>+ } info;
> u8 addr;
> };
>
>diff --git a/lib/Kconfig b/lib/Kconfig
>index cc28bc1f2d84..a74e6c0fa75c 100644
>--- a/lib/Kconfig
>+++ b/lib/Kconfig
>@@ -721,5 +721,5 @@ config ASN1_ENCODER
>
> config GENERIC_LIB_X86
> bool
>- depends on X86
>+ depends on X86 || PECI
> default n
>--
>2.31.1
>
On 7/27/21 1:10 PM, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:42PM CDT, Iwona Winiarska wrote:
>> Here we're adding support for PECI device drivers, which unlike PECI
>> controller drivers are actually able to provide functionalities to
>> userspace.
>>
>> We're also extending peci_request API to allow querying more details
>> about PECI device (e.g. model/family), that's going to be used to find
>> a compatible peci_driver.
>>
>> Signed-off-by: Iwona Winiarska <[email protected]>
>> Reviewed-by: Pierre-Louis Bossart <[email protected]>
>> ---
>> drivers/peci/Kconfig | 1 +
>> drivers/peci/core.c | 49 +++++++++
>> drivers/peci/device.c | 99 ++++++++++++++++++
>> drivers/peci/internal.h | 75 ++++++++++++++
>> drivers/peci/request.c | 217 ++++++++++++++++++++++++++++++++++++++++
>> include/linux/peci.h | 19 ++++
>> lib/Kconfig | 2 +-
>> 7 files changed, 461 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
>> index 0d0ee8009713..27c31535843c 100644
>> --- a/drivers/peci/Kconfig
>> +++ b/drivers/peci/Kconfig
>> @@ -2,6 +2,7 @@
>>
>> menuconfig PECI
>> tristate "PECI support"
>> + select GENERIC_LIB_X86
>> help
>> The Platform Environment Control Interface (PECI) is an interface
>> that provides a communication channel to Intel processors and
>> diff --git a/drivers/peci/core.c b/drivers/peci/core.c
>> index ae7a9572cdf3..94426b7f2618 100644
>> --- a/drivers/peci/core.c
>> +++ b/drivers/peci/core.c
>> @@ -143,8 +143,57 @@ void peci_controller_remove(struct peci_controller *controller)
>> }
>> EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
>>
>> +static const struct peci_device_id *
>> +peci_bus_match_device_id(const struct peci_device_id *id, struct peci_device *device)
>> +{
>> + while (id->family != 0) {
>> + if (id->family == device->info.family &&
>> + id->model == device->info.model)
>> + return id;
>> + id++;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static int peci_bus_device_match(struct device *dev, struct device_driver *drv)
>> +{
>> + struct peci_device *device = to_peci_device(dev);
>> + struct peci_driver *peci_drv = to_peci_driver(drv);
>> +
>> + if (dev->type != &peci_device_type)
>> + return 0;
>> +
>> + if (peci_bus_match_device_id(peci_drv->id_table, device))
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +static int peci_bus_device_probe(struct device *dev)
>> +{
>> + struct peci_device *device = to_peci_device(dev);
>> + struct peci_driver *driver = to_peci_driver(dev->driver);
>> +
>> + return driver->probe(device, peci_bus_match_device_id(driver->id_table, device));
>> +}
>> +
>> +static int peci_bus_device_remove(struct device *dev)
>> +{
>> + struct peci_device *device = to_peci_device(dev);
>> + struct peci_driver *driver = to_peci_driver(dev->driver);
>> +
>> + if (driver->remove)
>> + driver->remove(device);
>> +
>> + return 0;
>> +}
>> +
>> struct bus_type peci_bus_type = {
>> .name = "peci",
>> + .match = peci_bus_device_match,
>> + .probe = peci_bus_device_probe,
>> + .remove = peci_bus_device_remove,
>> .bus_groups = peci_bus_groups,
>> };
>>
>> diff --git a/drivers/peci/device.c b/drivers/peci/device.c
>> index 1124862211e2..8c4bd1ebbc29 100644
>> --- a/drivers/peci/device.c
>> +++ b/drivers/peci/device.c
>> @@ -1,11 +1,79 @@
>> // SPDX-License-Identifier: GPL-2.0-only
>> // Copyright (c) 2018-2021 Intel Corporation
>>
>> +#include <linux/bitfield.h>
>> #include <linux/peci.h>
>> #include <linux/slab.h>
>> +#include <linux/x86/cpu.h>
>>
>> #include "internal.h"
>>
>> +#define REVISION_NUM_MASK GENMASK(15, 8)
>> +static int peci_get_revision(struct peci_device *device, u8 *revision)
>> +{
>> + struct peci_request *req;
>> + u64 dib;
>> +
>> + req = peci_get_dib(device);
>> + if (IS_ERR(req))
>> + return PTR_ERR(req);
>> +
>> + dib = peci_request_data_dib(req);
>> + if (dib == 0) {
>> + peci_request_free(req);
>> + return -EIO;
>
> Any particular reason to check for zero specifically here? It looks
> like that would be a case where the host CPU responds and everything's
> otherwise fine, but the host just happened to send back a bunch of zeros
> for whatever reason -- which may not be a valid PECI revision number,
> but if it sent back a bunch of 0xff bytes instead wouldn't that be
> equally invalid?
>
> Also, given that the docs (the ones I have, at least) describe the DIB
> as a collection of individual bytes, dealing with it as a combined u64
> seems a bit confusing to me -- could we just return req->rx.buf[1]
> instead?
>
>> + }
>> +
>> + *revision = FIELD_GET(REVISION_NUM_MASK, dib);
>> +
>> + peci_request_free(req);
>> +
>> + return 0;
>> +}
>> +
>> +static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
>> +{
>> + struct peci_request *req;
>> + int ret;
>> +
>> + req = peci_pkg_cfg_readl(device, PECI_PCS_PKG_ID, PECI_PKG_ID_CPU_ID);
>> + if (IS_ERR(req))
>> + return PTR_ERR(req);
>> +
>> + ret = peci_request_status(req);
>> + if (ret)
>> + goto out_req_free;
>> +
>> + *cpu_id = peci_request_data_readl(req);
>> +out_req_free:
>
> As suggested on patch #8, I think it might be cleaner to stack-allocate
> struct peci_request, which would obviate the need for explicit free
> calls in functions like this and hence might simplify it away entirely,
> but if this does remain like this we could just do
>
> if (!ret)
> *cpu_id = peci_request_data_readl(req);
>
> instead of using a goto to skip a single line.
>
As a maintainer I would ask submitters to follow
Documentation/process/coding-style.rst, chapter 7 ("Centralized exiting
of functions").

Guenter

>> + peci_request_free(req);
>> +
>> + return ret;
>> +}
>> +
>> +static int peci_device_info_init(struct peci_device *device)
>> +{
>> + u8 revision;
>> + u32 cpu_id;
>> + int ret;
>> +
>> + ret = peci_get_cpu_id(device, &cpu_id);
>> + if (ret)
>> + return ret;
>> +
>> + device->info.family = x86_family(cpu_id);
>> + device->info.model = x86_model(cpu_id);
>> +
>> + ret = peci_get_revision(device, &revision);
>> + if (ret)
>> + return ret;
>> + device->info.peci_revision = revision;
>> +
>> + device->info.socket_id = device->addr - PECI_BASE_ADDR;
>> +
>> + return 0;
>> +}
>> +
>> static int peci_detect(struct peci_controller *controller, u8 addr)
>> {
>> struct peci_request *req;
>> @@ -75,6 +143,10 @@ int peci_device_create(struct peci_controller *controller, u8 addr)
>> device->dev.bus = &peci_bus_type;
>> device->dev.type = &peci_device_type;
>>
>> + ret = peci_device_info_init(device);
>> + if (ret)
>> + goto err_free;
>> +
>> ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device->addr);
>> if (ret)
>> goto err_free;
>> @@ -98,6 +170,33 @@ void peci_device_destroy(struct peci_device *device)
>> device_unregister(&device->dev);
>> }
>>
>> +int __peci_driver_register(struct peci_driver *driver, struct module *owner,
>> + const char *mod_name)
>> +{
>> + driver->driver.bus = &peci_bus_type;
>> + driver->driver.owner = owner;
>> + driver->driver.mod_name = mod_name;
>> +
>> + if (!driver->probe) {
>> + pr_err("peci: trying to register driver without probe callback\n");
>> + return -EINVAL;
>> + }
>> +
>> + if (!driver->id_table) {
>> + pr_err("peci: trying to register driver without device id table\n");
>> + return -EINVAL;
>> + }
>> +
>> + return driver_register(&driver->driver);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
>> +
>> +void peci_driver_unregister(struct peci_driver *driver)
>> +{
>> + driver_unregister(&driver->driver);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
>> +
>> static void peci_device_release(struct device *dev)
>> {
>> struct peci_device *device = to_peci_device(dev);
>> diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
>> index 6b139adaf6b8..c891c93e077a 100644
>> --- a/drivers/peci/internal.h
>> +++ b/drivers/peci/internal.h
>> @@ -19,6 +19,34 @@ struct peci_request;
>> struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len);
>> void peci_request_free(struct peci_request *req);
>>
>> +int peci_request_status(struct peci_request *req);
>> +u64 peci_request_data_dib(struct peci_request *req);
>> +
>> +u8 peci_request_data_readb(struct peci_request *req);
>> +u16 peci_request_data_readw(struct peci_request *req);
>> +u32 peci_request_data_readl(struct peci_request *req);
>> +u64 peci_request_data_readq(struct peci_request *req);
>> +
>> +struct peci_request *peci_get_dib(struct peci_device *device);
>> +struct peci_request *peci_get_temp(struct peci_device *device);
>> +
>> +struct peci_request *peci_pkg_cfg_readb(struct peci_device *device, u8 index, u16 param);
>> +struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8 index, u16 param);
>> +struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
>> +struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
>> +
>> +/**
>> + * struct peci_device_id - PECI device data to match
>> + * @data: pointer to driver private data specific to device
>> + * @family: device family
>> + * @model: device model
>> + */
>> +struct peci_device_id {
>> + const void *data;
>> + u16 family;
>> + u8 model;
>> +};
>> +
>> extern struct device_type peci_device_type;
>> extern const struct attribute_group *peci_device_groups[];
>>
>> @@ -28,6 +56,53 @@ void peci_device_destroy(struct peci_device *device);
>> extern struct bus_type peci_bus_type;
>> extern const struct attribute_group *peci_bus_groups[];
>>
>> +/**
>> + * struct peci_driver - PECI driver
>> + * @driver: inherit device driver
>> + * @probe: probe callback
>> + * @remove: remove callback
>> + * @id_table: PECI device match table to decide which device to bind
>> + */
>> +struct peci_driver {
>> + struct device_driver driver;
>> + int (*probe)(struct peci_device *device, const struct peci_device_id *id);
>> + void (*remove)(struct peci_device *device);
>> + const struct peci_device_id *id_table;
>> +};
>> +
>> +static inline struct peci_driver *to_peci_driver(struct device_driver *d)
>> +{
>> + return container_of(d, struct peci_driver, driver);
>> +}
>> +
>> +int __peci_driver_register(struct peci_driver *driver, struct module *owner,
>> + const char *mod_name);
>> +/**
>> + * peci_driver_register() - register PECI driver
>> + * @driver: the driver to be registered
>> + * @owner: owner module of the driver being registered
>> + * @mod_name: module name string
>> + *
>> + * PECI drivers that don't need to do anything special in module init should
>> + * use the convenience "module_peci_driver" macro instead
>> + *
>> + * Return: zero on success, else a negative error code.
>> + */
>> +#define peci_driver_register(driver) \
>> + __peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
>> +void peci_driver_unregister(struct peci_driver *driver);
>> +
>> +/**
>> + * module_peci_driver() - Helper macro for registering a modular PECI driver
>> + * @__peci_driver: peci_driver struct
>> + *
>> + * Helper macro for PECI drivers which do not do anything special in module
>> + * init/exit. This eliminates a lot of boilerplate. Each module may only
>> + * use this macro once, and calling it replaces module_init() and module_exit()
>> + */
>> +#define module_peci_driver(__peci_driver) \
>> + module_driver(__peci_driver, peci_driver_register, peci_driver_unregister)
>> +
>> extern struct device_type peci_controller_type;
>>
>> int peci_controller_scan_devices(struct peci_controller *controller);
>> diff --git a/drivers/peci/request.c b/drivers/peci/request.c
>> index 78cee51dfae1..48354455b554 100644
>> --- a/drivers/peci/request.c
>> +++ b/drivers/peci/request.c
>> @@ -1,13 +1,142 @@
>> // SPDX-License-Identifier: GPL-2.0-only
>> // Copyright (c) 2021 Intel Corporation
>>
>> +#include <linux/bug.h>
>> #include <linux/export.h>
>> #include <linux/peci.h>
>> #include <linux/slab.h>
>> #include <linux/types.h>
>>
>> +#include <asm/unaligned.h>
>> +
>> #include "internal.h"
>>
>> +#define PECI_GET_DIB_CMD 0xf7
>> +#define PECI_GET_DIB_WR_LEN 1
>> +#define PECI_GET_DIB_RD_LEN 8
>> +
>> +#define PECI_RDPKGCFG_CMD 0xa1
>> +#define PECI_RDPKGCFG_WRITE_LEN 5
>> +#define PECI_RDPKGCFG_READ_LEN_BASE 1
>> +#define PECI_WRPKGCFG_CMD 0xa5
>> +#define PECI_WRPKGCFG_WRITE_LEN_BASE 6
>> +#define PECI_WRPKGCFG_READ_LEN 1
>> +
>> +/* Device Specific Completion Code (CC) Definition */
>> +#define PECI_CC_SUCCESS 0x40
>> +#define PECI_CC_NEED_RETRY 0x80
>> +#define PECI_CC_OUT_OF_RESOURCE 0x81
>> +#define PECI_CC_UNAVAIL_RESOURCE 0x82
>> +#define PECI_CC_INVALID_REQ 0x90
>> +#define PECI_CC_MCA_ERROR 0x91
>> +#define PECI_CC_CATASTROPHIC_MCA_ERROR 0x93
>> +#define PECI_CC_FATAL_MCA_ERROR 0x94
>> +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB 0x98
>> +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR 0x9B
>> +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA 0x9C
>> +
>> +#define PECI_RETRY_BIT BIT(0)
>> +
>> +#define PECI_RETRY_TIMEOUT msecs_to_jiffies(700)
>> +#define PECI_RETRY_INTERVAL_MIN msecs_to_jiffies(1)
>> +#define PECI_RETRY_INTERVAL_MAX msecs_to_jiffies(128)
>> +
>> +static u8 peci_request_data_cc(struct peci_request *req)
>> +{
>> + return req->rx.buf[0];
>> +}
>> +
>> +/**
>> + * peci_request_status() - return -errno based on PECI completion code
>> + * @req: the PECI request that contains response data with completion code
>> + *
>> + * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands we
>> + * don't expect completion code in the response.
>> + *
>> + * Return: -errno
>> + */
>> +int peci_request_status(struct peci_request *req)
>> +{
>> + u8 cc = peci_request_data_cc(req);
>> +
>> + if (cc != PECI_CC_SUCCESS)
>> + dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
>> +
>> + switch (cc) {
>> + case PECI_CC_SUCCESS:
>> + return 0;
>> + case PECI_CC_NEED_RETRY:
>> + case PECI_CC_OUT_OF_RESOURCE:
>> + case PECI_CC_UNAVAIL_RESOURCE:
>> + return -EAGAIN;
>> + case PECI_CC_INVALID_REQ:
>> + return -EINVAL;
>> + case PECI_CC_MCA_ERROR:
>> + case PECI_CC_CATASTROPHIC_MCA_ERROR:
>> + case PECI_CC_FATAL_MCA_ERROR:
>> + case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
>> + case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
>> + case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
>> + return -EIO;
>> + }
>> +
>> + WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
>> +
>> + return -EIO;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
>> +
>> +static int peci_request_xfer(struct peci_request *req)
>> +{
>> + struct peci_device *device = req->device;
>> + struct peci_controller *controller = device->controller;
>> + int ret;
>> +
>> + mutex_lock(&controller->bus_lock);
>> + ret = controller->xfer(controller, device->addr, req);
>> + mutex_unlock(&controller->bus_lock);
>> +
>> + return ret;
>> +}
>> +
>> +static int peci_request_xfer_retry(struct peci_request *req)
>> +{
>> + long wait_interval = PECI_RETRY_INTERVAL_MIN;
>> + struct peci_device *device = req->device;
>> + struct peci_controller *controller = device->controller;
>> + unsigned long start = jiffies;
>> + int ret;
>> +
>> + /* Don't try to use it for ping */
>> + if (WARN_ON(!req->rx.buf))
>> + return 0;
>> +
>> + do {
>> + ret = peci_request_xfer(req);
>> + if (ret) {
>> + dev_dbg(&controller->dev, "xfer error: %d\n", ret);
>> + return ret;
>> + }
>> +
>> + if (peci_request_status(req) != -EAGAIN)
>> + return 0;
>> +
>> + /* Set the retry bit to indicate a retry attempt */
>> + req->tx.buf[1] |= PECI_RETRY_BIT;
>> +
>> + if (schedule_timeout_interruptible(wait_interval))
>> + return -ERESTARTSYS;
>> +
>> + wait_interval *= 2;
>> + if (wait_interval > PECI_RETRY_INTERVAL_MAX)
>> + wait_interval = PECI_RETRY_INTERVAL_MAX;
>
> wait_interval = min(wait_interval * 2, PECI_RETRY_INTERVAL_MAX) ?
>
>> + } while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
>> +
>> + dev_dbg(&controller->dev, "request timed out\n");
>> +
>> + return -ETIMEDOUT;
>> +}
>> +
>> /**
>> * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
>> * @device: PECI device to which request is going to be sent
>> @@ -72,3 +201,91 @@ void peci_request_free(struct peci_request *req)
>> kfree(req);
>> }
>> EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
>> +
>> +struct peci_request *peci_get_dib(struct peci_device *device)
>> +{
>> + struct peci_request *req;
>> + int ret;
>> +
>> + req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN, PECI_GET_DIB_RD_LEN);
>> + if (!req)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + req->tx.buf[0] = PECI_GET_DIB_CMD;
>> +
>> + ret = peci_request_xfer(req);
>> + if (ret) {
>> + peci_request_free(req);
>> + return ERR_PTR(ret);
>> + }
>> +
>> + return req;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
>> +
>> +static struct peci_request *
>> +__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
>> +{
>> + struct peci_request *req;
>> + int ret;
>> +
>> + req = peci_request_alloc(device, PECI_RDPKGCFG_WRITE_LEN,
>> + PECI_RDPKGCFG_READ_LEN_BASE + len);
>> + if (!req)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + req->tx.buf[0] = PECI_RDPKGCFG_CMD;
>> + req->tx.buf[1] = 0;
>> + req->tx.buf[2] = index;
>> + put_unaligned_le16(param, &req->tx.buf[3]);
>> +
>> + ret = peci_request_xfer_retry(req);
>> + if (ret) {
>> + peci_request_free(req);
>> + return ERR_PTR(ret);
>> + }
>> +
>> + return req;
>> +}
>> +
>> +u8 peci_request_data_readb(struct peci_request *req)
>> +{
>> + return req->rx.buf[1];
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
>> +
>> +u16 peci_request_data_readw(struct peci_request *req)
>> +{
>> + return get_unaligned_le16(&req->rx.buf[1]);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
>> +
>> +u32 peci_request_data_readl(struct peci_request *req)
>> +{
>> + return get_unaligned_le32(&req->rx.buf[1]);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
>> +
>> +u64 peci_request_data_readq(struct peci_request *req)
>> +{
>> + return get_unaligned_le64(&req->rx.buf[1]);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
>> +
>> +u64 peci_request_data_dib(struct peci_request *req)
>> +{
>> + return get_unaligned_le64(&req->rx.buf[0]);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
>> +
>> +#define __read_pkg_config(x, type) \
>> +struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
>> +{ \
>> + return __pkg_cfg_read(device, index, param, sizeof(type)); \
>> +} \
>
> Is there a reason for this particular API? I'd think a more natural one
> that would offload a bit of boilerplate from callers would look more like
>
> int peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param, type *outp),
>
> returning peci_request_status() and writing the requested data to *outp
> if that status is zero.
>
>> +EXPORT_SYMBOL_NS_GPL(peci_pkg_cfg_##x, PECI)
>> +
>> +__read_pkg_config(readb, u8);
>> +__read_pkg_config(readw, u16);
>> +__read_pkg_config(readl, u32);
>> +__read_pkg_config(readq, u64);
>> diff --git a/include/linux/peci.h b/include/linux/peci.h
>> index cdf3008321fd..f9f37b874011 100644
>> --- a/include/linux/peci.h
>> +++ b/include/linux/peci.h
>> @@ -9,6 +9,14 @@
>> #include <linux/mutex.h>
>> #include <linux/types.h>
>>
>> +#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
>> +#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
>> +#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
>> +#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
>> +#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
>> +#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
>> +#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
>> +
>> struct peci_request;
>>
>> /**
>> @@ -41,6 +49,11 @@ static inline struct peci_controller *to_peci_controller(void *d)
>> * struct peci_device - PECI device
>> * @dev: device object to register PECI device to the device model
>> * @controller: manages the bus segment hosting this PECI device
>> + * @info: PECI device characteristics
>> + * @info.family: device family
>> + * @info.model: device model
>> + * @info.peci_revision: PECI revision supported by the PECI device
>> + * @info.socket_id: the socket ID represented by the PECI device
>> * @addr: address used on the PECI bus connected to the parent controller
>> *
>> * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
>> @@ -50,6 +63,12 @@ static inline struct peci_controller *to_peci_controller(void *d)
>> struct peci_device {
>> struct device dev;
>> struct peci_controller *controller;
>> + struct {
>> + u16 family;
>> + u8 model;
>> + u8 peci_revision;
>
> This field gets set but doesn't seem to end up used anywhere; is it
> useful?
>
>> + u8 socket_id;
>> + } info;
>> u8 addr;
>> };
>>
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index cc28bc1f2d84..a74e6c0fa75c 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -721,5 +721,5 @@ config ASN1_ENCODER
>>
>> config GENERIC_LIB_X86
>> bool
>> - depends on X86
>> + depends on X86 || PECI
>> default n
>> --
>> 2.31.1
>>
>
On Mon, Jul 12, 2021 at 05:04:43PM CDT, Iwona Winiarska wrote:
>PECI is an interface that may be used by different types of devices.
>Here we're adding a peci-cpu driver compatible with Intel processors.
>The driver is responsible for handling auxiliary devices that can
>subsequently be used by other drivers (e.g. hwmons).
>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> MAINTAINERS | 1 +
> drivers/peci/Kconfig | 15 ++
> drivers/peci/Makefile | 2 +
> drivers/peci/cpu.c | 347 +++++++++++++++++++++++++++++++++++++++
> drivers/peci/device.c | 1 +
> drivers/peci/internal.h | 27 +++
> drivers/peci/request.c | 211 ++++++++++++++++++++++++
> include/linux/peci-cpu.h | 38 +++++
> include/linux/peci.h | 8 -
> 9 files changed, 642 insertions(+), 8 deletions(-)
> create mode 100644 drivers/peci/cpu.c
> create mode 100644 include/linux/peci-cpu.h
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 4ba874afa2fa..f47b5f634293 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -14511,6 +14511,7 @@ L: [email protected] (moderated for non-subscribers)
> S: Supported
> F: Documentation/devicetree/bindings/peci/
> F: drivers/peci/
>+F: include/linux/peci-cpu.h
> F: include/linux/peci.h
>
> PENSANDO ETHERNET DRIVERS
>diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
>index 27c31535843c..9e17e06fda90 100644
>--- a/drivers/peci/Kconfig
>+++ b/drivers/peci/Kconfig
>@@ -16,6 +16,21 @@ menuconfig PECI
>
> if PECI
>
>+config PECI_CPU
>+ tristate "PECI CPU"
>+ select AUXILIARY_BUS
>+ help
>+ This option enables peci-cpu driver for Intel processors. It is
>+ responsible for creating auxiliary devices that can subsequently
>+ be used by other drivers in order to perform various
>+ functionalities such as e.g. temperature monitoring.
>+
>+ Additional drivers must be enabled in order to use the functionality
>+ of the device.
>+
>+ This driver can also be built as a module. If so, the module
>+ will be called peci-cpu.
>+
> source "drivers/peci/controller/Kconfig"
>
> endif # PECI
>diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
>index 917f689e147a..7de18137e738 100644
>--- a/drivers/peci/Makefile
>+++ b/drivers/peci/Makefile
>@@ -3,6 +3,8 @@
> # Core functionality
> peci-y := core.o request.o device.o sysfs.o
> obj-$(CONFIG_PECI) += peci.o
>+peci-cpu-y := cpu.o
>+obj-$(CONFIG_PECI_CPU) += peci-cpu.o
>
> # Hardware specific bus drivers
> obj-y += controller/
>diff --git a/drivers/peci/cpu.c b/drivers/peci/cpu.c
>new file mode 100644
>index 000000000000..8d130a9a71ad
>--- /dev/null
>+++ b/drivers/peci/cpu.c
>@@ -0,0 +1,347 @@
>+// SPDX-License-Identifier: GPL-2.0-only
>+// Copyright (c) 2021 Intel Corporation
>+
>+#include <linux/auxiliary_bus.h>
>+#include <linux/module.h>
>+#include <linux/peci.h>
>+#include <linux/peci-cpu.h>
>+#include <linux/slab.h>
>+#include <linux/x86/intel-family.h>
>+
>+#include "internal.h"
>+
>+/**
>+ * peci_temp_read() - read the maximum die temperature from PECI target device
>+ * @device: PECI device to which request is going to be sent
>+ * @temp_raw: where to store the read temperature
>+ *
>+ * It uses GetTemp PECI command.
>+ *
>+ * Return: 0 if succeeded, other values in case errors.
>+ */
>+int peci_temp_read(struct peci_device *device, s16 *temp_raw)
>+{
>+ struct peci_request *req;
>+
>+ req = peci_get_temp(device);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ *temp_raw = peci_request_data_temp(req);
>+
>+ peci_request_free(req);
>+
>+ return 0;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_temp_read, PECI_CPU);
>+
>+/**
>+ * peci_pcs_read() - read PCS register
>+ * @device: PECI device to which request is going to be sent
>+ * @index: PCS index
>+ * @param: PCS parameter
>+ * @data: where to store the read data
>+ *
>+ * It uses RdPkgConfig PECI command.
>+ *
>+ * Return: 0 if succeeded, other values in case errors.
>+ */
>+int peci_pcs_read(struct peci_device *device, u8 index, u16 param, u32 *data)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_pkg_cfg_readl(device, index, param);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ ret = peci_request_status(req);
>+ if (ret)
>+ goto out_req_free;
>+
>+ *data = peci_request_data_readl(req);
>+out_req_free:
As in patch 9, this control flow could be rewritten as just

	if (!ret)
		*data = peci_request_data_readl(req);

and avoid the goto.
>+ peci_request_free(req);
>+
>+ return ret;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_pcs_read, PECI_CPU);
>+
>+/**
>+ * peci_pci_local_read() - read 32-bit memory location using raw address
>+ * @device: PECI device to which request is going to be sent
>+ * @bus: bus
>+ * @dev: device
>+ * @func: function
>+ * @reg: register
>+ * @data: where to store the read data
>+ *
>+ * It uses RdPCIConfigLocal PECI command.
>+ *
>+ * Return: 0 if succeeded, other values in case errors.
>+ */
>+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func,
>+ u16 reg, u32 *data)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_pci_cfg_local_readl(device, bus, dev, func, reg);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ ret = peci_request_status(req);
>+ if (ret)
>+ goto out_req_free;
>+
>+ *data = peci_request_data_readl(req);
>+out_req_free:
>+ peci_request_free(req);
>+
>+ return ret;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_pci_local_read, PECI_CPU);
>+
>+/**
>+ * peci_ep_pci_local_read() - read 32-bit memory location using raw address
>+ * @device: PECI device to which request is going to be sent
>+ * @seg: PCI segment
>+ * @bus: bus
>+ * @dev: device
>+ * @func: function
>+ * @reg: register
>+ * @data: where to store the read data
>+ *
>+ * Like &peci_pci_local_read, but it uses RdEndpointConfig PECI command.
>+ *
>+ * Return: 0 if succeeded, other values in case errors.
>+ */
>+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_ep_pci_cfg_local_readl(device, seg, bus, dev, func, reg);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ ret = peci_request_status(req);
>+ if (ret)
>+ goto out_req_free;
>+
>+ *data = peci_request_data_readl(req);
>+out_req_free:
>+ peci_request_free(req);
>+
>+ return ret;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_ep_pci_local_read, PECI_CPU);
>+
>+/**
>+ * peci_mmio_read() - read 32-bit memory location using 64-bit bar offset address
>+ * @device: PECI device to which request is going to be sent
>+ * @bar: PCI bar
>+ * @seg: PCI segment
>+ * @bus: bus
>+ * @dev: device
>+ * @func: function
>+ * @address: 64-bit MMIO address
>+ * @data: where to store the read data
>+ *
>+ * It uses RdEndpointConfig PECI command.
>+ *
>+ * Return: 0 if succeeded, other values in case errors.
>+ */
>+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
>+ u8 bus, u8 dev, u8 func, u64 address, u32 *data)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_ep_mmio64_readl(device, bar, seg, bus, dev, func, address);
>+ if (IS_ERR(req))
>+ return PTR_ERR(req);
>+
>+ ret = peci_request_status(req);
>+ if (ret)
>+ goto out_req_free;
>+
>+ *data = peci_request_data_readl(req);
>+out_req_free:
>+ peci_request_free(req);
>+
>+ return ret;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_mmio_read, PECI_CPU);
>+
>+struct peci_cpu {
>+ struct peci_device *device;
>+ const struct peci_device_id *id;
>+ struct auxiliary_device **aux_devices;
Given that the number of entries is a compile-time constant, should we
just inline this as 'struct auxiliary_device
*aux_devices[ARRAY_SIZE(type)]' and avoid the devm_kcalloc() in
peci_cpu_add_adevices()? (type[] would have to move above the struct.)
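Roughly what I have in mind, as a sketch only. The stub struct
auxiliary_device and the local ARRAY_SIZE() are stand-ins for the kernel
definitions so that this compiles on its own, and
peci_cpu_num_adevices() is a hypothetical helper that exists purely to
show the count is known at compile time:

```c
#include <stddef.h>

/* Local stand-in for the kernel's ARRAY_SIZE() */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Stubs for the kernel types; only their names matter here */
struct auxiliary_device { const char *name; };
struct peci_device;
struct peci_device_id;

/* Declared ahead of the struct so ARRAY_SIZE() can see it */
static const char * const type[] = {
	"cputemp",
	"dimmtemp",
};

struct peci_cpu {
	struct peci_device *device;
	const struct peci_device_id *id;
	/* Fixed-size array embedded in priv; no separate allocation */
	struct auxiliary_device *aux_devices[ARRAY_SIZE(type)];
};

/* Hypothetical helper, only to show the count is a compile-time constant */
static inline size_t peci_cpu_num_adevices(const struct peci_cpu *priv)
{
	return ARRAY_SIZE(priv->aux_devices);
}
```

Since priv comes from devm_kzalloc(), the slots would start out NULL,
which is what the 'if (adev)' check in peci_cpu_remove() already relies
on.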
>+};
>+
>+static const char * const type[] = {
A slightly more descriptive name might be good -- maybe something like
'peci_adevice_types'?
>+ "cputemp",
>+ "dimmtemp",
>+};
>+
>+static void adev_release(struct device *dev)
>+{
>+ struct auxiliary_device *adev = to_auxiliary_dev(dev);
>+
>+ kfree(adev->name);
>+ kfree(adev);
>+}
>+
>+static struct auxiliary_device *add_adev(struct peci_cpu *priv, int idx)
>+{
>+ struct peci_controller *controller = priv->device->controller;
>+ struct auxiliary_device *adev;
>+ const char *name;
>+ int ret;
>+
>+ adev = kzalloc(sizeof(*adev), GFP_KERNEL);
>+ if (!adev)
>+ return ERR_PTR(-ENOMEM);
>+
>+ name = kasprintf(GFP_KERNEL, "%s.%s", type[idx], (const char *)priv->id->data);
>+ if (!name) {
>+ ret = -ENOMEM;
>+ goto free_adev;
>+ }
>+
>+ adev->name = name;
>+ adev->dev.parent = &priv->device->dev;
>+ adev->dev.release = adev_release;
>+ adev->id = (controller->id << 16) | (priv->device->addr);
>+
>+ ret = auxiliary_device_init(adev);
>+ if (ret)
>+ goto free_name;
>+
>+ ret = auxiliary_device_add(adev);
>+ if (ret) {
>+ auxiliary_device_uninit(adev);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return adev;
>+
>+free_name:
>+ kfree(name);
>+free_adev:
>+ kfree(adev);
>+ return ERR_PTR(ret);
>+}
>+
>+static void del_adev(struct auxiliary_device *adev)
>+{
>+ auxiliary_device_delete(adev);
>+ auxiliary_device_uninit(adev);
>+}
>+
>+static int peci_cpu_add_adevices(struct peci_cpu *priv)
>+{
>+ struct device *dev = &priv->device->dev;
>+ struct auxiliary_device *adev;
>+ int i;
>+
>+ priv->aux_devices = devm_kcalloc(dev, ARRAY_SIZE(type),
>+ sizeof(*priv->aux_devices),
>+ GFP_KERNEL);
>+ if (!priv->aux_devices)
>+ return -ENOMEM;
>+
>+ for (i = 0; i < ARRAY_SIZE(type); i++) {
>+ adev = add_adev(priv, i);
>+ if (IS_ERR(adev)) {
>+ dev_warn(dev, "Failed to add PECI auxiliary: %s, ret = %ld\n",
>+ type[i], PTR_ERR(adev));
>+ continue;
>+ }
>+
>+ priv->aux_devices[i] = adev;
>+ }
>+ return 0;
>+}
>+
>+static int
>+peci_cpu_probe(struct peci_device *device, const struct peci_device_id *id)
>+{
>+ struct device *dev = &device->dev;
>+ struct peci_cpu *priv;
>+
>+ priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
>+ if (!priv)
>+ return -ENOMEM;
>+
>+ dev_set_drvdata(dev, priv);
>+ priv->device = device;
>+ priv->id = id;
>+
>+ return peci_cpu_add_adevices(priv);
>+}
>+
>+static void peci_cpu_remove(struct peci_device *device)
>+{
>+ struct peci_cpu *priv = dev_get_drvdata(&device->dev);
>+ int i;
>+
>+ for (i = 0; i < ARRAY_SIZE(type); i++) {
>+ struct auxiliary_device *adev = priv->aux_devices[i];
>+
>+ if (adev)
>+ del_adev(adev);
>+ }
>+}
>+
>+static const struct peci_device_id peci_cpu_device_ids[] = {
>+ { /* Haswell Xeon */
>+ .family = 6,
>+ .model = INTEL_FAM6_HASWELL_X,
>+ .data = "hsx",
>+ },
>+ { /* Broadwell Xeon */
>+ .family = 6,
>+ .model = INTEL_FAM6_BROADWELL_X,
>+ .data = "bdx",
>+ },
>+ { /* Broadwell Xeon D */
>+ .family = 6,
>+ .model = INTEL_FAM6_BROADWELL_D,
>+ .data = "skxd",
>+ },
>+ { /* Skylake Xeon */
>+ .family = 6,
>+ .model = INTEL_FAM6_SKYLAKE_X,
>+ .data = "skx",
>+ },
>+ { /* Icelake Xeon */
>+ .family = 6,
>+ .model = INTEL_FAM6_ICELAKE_X,
>+ .data = "icx",
>+ },
>+ { /* Icelake Xeon D */
>+ .family = 6,
>+ .model = INTEL_FAM6_ICELAKE_D,
>+ .data = "icxd",
>+ },
>+ { }
>+};
>+MODULE_DEVICE_TABLE(peci, peci_cpu_device_ids);
>+
>+static struct peci_driver peci_cpu_driver = {
>+ .probe = peci_cpu_probe,
>+ .remove = peci_cpu_remove,
>+ .id_table = peci_cpu_device_ids,
>+ .driver = {
>+ .name = "peci-cpu",
>+ },
>+};
>+module_peci_driver(peci_cpu_driver);
>+
>+MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
>+MODULE_DESCRIPTION("PECI CPU driver");
>+MODULE_LICENSE("GPL");
>+MODULE_IMPORT_NS(PECI);
>diff --git a/drivers/peci/device.c b/drivers/peci/device.c
>index 8c4bd1ebbc29..c278c9ea166c 100644
>--- a/drivers/peci/device.c
>+++ b/drivers/peci/device.c
>@@ -3,6 +3,7 @@
>
> #include <linux/bitfield.h>
> #include <linux/peci.h>
>+#include <linux/peci-cpu.h>
> #include <linux/slab.h>
> #include <linux/x86/cpu.h>
>
>diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
>index c891c93e077a..1d39483a8acf 100644
>--- a/drivers/peci/internal.h
>+++ b/drivers/peci/internal.h
>@@ -21,6 +21,7 @@ void peci_request_free(struct peci_request *req);
>
> int peci_request_status(struct peci_request *req);
> u64 peci_request_data_dib(struct peci_request *req);
>+s16 peci_request_data_temp(struct peci_request *req);
>
> u8 peci_request_data_readb(struct peci_request *req);
> u16 peci_request_data_readw(struct peci_request *req);
>@@ -35,6 +36,32 @@ struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8 index, u1
> struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
> struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
>
>+struct peci_request *peci_pci_cfg_local_readb(struct peci_device *device,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+struct peci_request *peci_pci_cfg_local_readw(struct peci_device *device,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+struct peci_request *peci_pci_cfg_local_readl(struct peci_device *device,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+
>+struct peci_request *peci_ep_pci_cfg_local_readb(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+struct peci_request *peci_ep_pci_cfg_local_readw(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+struct peci_request *peci_ep_pci_cfg_local_readl(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+
>+struct peci_request *peci_ep_pci_cfg_readb(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+struct peci_request *peci_ep_pci_cfg_readw(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+struct peci_request *peci_ep_pci_cfg_readl(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg);
>+
>+struct peci_request *peci_ep_mmio32_readl(struct peci_device *device, u8 bar, u8 seg,
>+ u8 bus, u8 dev, u8 func, u64 offset);
>+
>+struct peci_request *peci_ep_mmio64_readl(struct peci_device *device, u8 bar, u8 seg,
>+ u8 bus, u8 dev, u8 func, u64 offset);
> /**
> * struct peci_device_id - PECI device data to match
> * @data: pointer to driver private data specific to device
>diff --git a/drivers/peci/request.c b/drivers/peci/request.c
>index 48354455b554..c5d39f7e8142 100644
>--- a/drivers/peci/request.c
>+++ b/drivers/peci/request.c
>@@ -3,6 +3,7 @@
>
> #include <linux/bug.h>
> #include <linux/export.h>
>+#include <linux/pci.h>
> #include <linux/peci.h>
> #include <linux/slab.h>
> #include <linux/types.h>
>@@ -15,6 +16,10 @@
> #define PECI_GET_DIB_WR_LEN 1
> #define PECI_GET_DIB_RD_LEN 8
>
>+#define PECI_GET_TEMP_CMD 0x01
>+#define PECI_GET_TEMP_WR_LEN 1
>+#define PECI_GET_TEMP_RD_LEN 2
>+
> #define PECI_RDPKGCFG_CMD 0xa1
> #define PECI_RDPKGCFG_WRITE_LEN 5
> #define PECI_RDPKGCFG_READ_LEN_BASE 1
>@@ -22,6 +27,44 @@
> #define PECI_WRPKGCFG_WRITE_LEN_BASE 6
> #define PECI_WRPKGCFG_READ_LEN 1
>
>+#define PECI_RDIAMSR_CMD 0xb1
>+#define PECI_RDIAMSR_WRITE_LEN 5
>+#define PECI_RDIAMSR_READ_LEN 9
>+#define PECI_WRIAMSR_CMD 0xb5
>+#define PECI_RDIAMSREX_CMD 0xd1
>+#define PECI_RDIAMSREX_WRITE_LEN 6
>+#define PECI_RDIAMSREX_READ_LEN 9
>+
>+#define PECI_RDPCICFG_CMD 0x61
>+#define PECI_RDPCICFG_WRITE_LEN 6
>+#define PECI_RDPCICFG_READ_LEN 5
>+#define PECI_RDPCICFG_READ_LEN_MAX 24
>+#define PECI_WRPCICFG_CMD 0x65
>+
>+#define PECI_RDPCICFGLOCAL_CMD 0xe1
>+#define PECI_RDPCICFGLOCAL_WRITE_LEN 5
>+#define PECI_RDPCICFGLOCAL_READ_LEN_BASE 1
>+#define PECI_WRPCICFGLOCAL_CMD 0xe5
>+#define PECI_WRPCICFGLOCAL_WRITE_LEN_BASE 6
>+#define PECI_WRPCICFGLOCAL_READ_LEN 1
>+
>+#define PECI_ENDPTCFG_TYPE_LOCAL_PCI 0x03
>+#define PECI_ENDPTCFG_TYPE_PCI 0x04
>+#define PECI_ENDPTCFG_TYPE_MMIO 0x05
>+#define PECI_ENDPTCFG_ADDR_TYPE_PCI 0x04
>+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_D 0x05
>+#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q 0x06
>+#define PECI_RDENDPTCFG_CMD 0xc1
>+#define PECI_RDENDPTCFG_PCI_WRITE_LEN 12
>+#define PECI_RDENDPTCFG_MMIO_D_WRITE_LEN 14
>+#define PECI_RDENDPTCFG_MMIO_Q_WRITE_LEN 18
>+#define PECI_RDENDPTCFG_READ_LEN_BASE 1
>+#define PECI_WRENDPTCFG_CMD 0xc5
>+#define PECI_WRENDPTCFG_PCI_WRITE_LEN_BASE 13
>+#define PECI_WRENDPTCFG_MMIO_D_WRITE_LEN_BASE 15
>+#define PECI_WRENDPTCFG_MMIO_Q_WRITE_LEN_BASE 19
>+#define PECI_WRENDPTCFG_READ_LEN 1
>+
> /* Device Specific Completion Code (CC) Definition */
> #define PECI_CC_SUCCESS 0x40
> #define PECI_CC_NEED_RETRY 0x80
>@@ -223,6 +266,27 @@ struct peci_request *peci_get_dib(struct peci_device *device)
> }
> EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
>
>+struct peci_request *peci_get_temp(struct peci_device *device)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_request_alloc(device, PECI_GET_TEMP_WR_LEN, PECI_GET_TEMP_RD_LEN);
>+ if (!req)
>+ return ERR_PTR(-ENOMEM);
>+
>+ req->tx.buf[0] = PECI_GET_TEMP_CMD;
>+
>+ ret = peci_request_xfer(req);
>+ if (ret) {
>+ peci_request_free(req);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return req;
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_get_temp, PECI);
>+
> static struct peci_request *
> __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
> {
>@@ -248,6 +312,108 @@ __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
> return req;
> }
>
>+static u32 __get_pci_addr(u8 bus, u8 dev, u8 func, u16 reg)
>+{
>+ return reg | PCI_DEVID(bus, PCI_DEVFN(dev, func)) << 12;
>+}
>+
>+static struct peci_request *
>+__pci_cfg_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg, u8 len)
>+{
>+ struct peci_request *req;
>+ u32 pci_addr;
>+ int ret;
>+
>+ req = peci_request_alloc(device, PECI_RDPCICFGLOCAL_WRITE_LEN,
>+ PECI_RDPCICFGLOCAL_READ_LEN_BASE + len);
>+ if (!req)
>+ return ERR_PTR(-ENOMEM);
>+
>+ pci_addr = __get_pci_addr(bus, dev, func, reg);
>+
>+ req->tx.buf[0] = PECI_RDPCICFGLOCAL_CMD;
>+ req->tx.buf[1] = 0;
>+ put_unaligned_le24(pci_addr, &req->tx.buf[2]);
>+
>+ ret = peci_request_xfer_retry(req);
>+ if (ret) {
>+ peci_request_free(req);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return req;
>+}
>+
>+static struct peci_request *
>+__ep_pci_cfg_read(struct peci_device *device, u8 msg_type, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg, u8 len)
>+{
>+ struct peci_request *req;
>+ u32 pci_addr;
>+ int ret;
>+
>+ req = peci_request_alloc(device, PECI_RDENDPTCFG_PCI_WRITE_LEN,
>+ PECI_RDENDPTCFG_READ_LEN_BASE + len);
>+ if (!req)
>+ return ERR_PTR(-ENOMEM);
>+
>+ pci_addr = __get_pci_addr(bus, dev, func, reg);
>+
>+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
>+ req->tx.buf[1] = 0;
>+ req->tx.buf[2] = msg_type;
>+ req->tx.buf[3] = 0;
>+ req->tx.buf[4] = 0;
>+ req->tx.buf[5] = 0;
>+ req->tx.buf[6] = PECI_ENDPTCFG_ADDR_TYPE_PCI;
>+ req->tx.buf[7] = seg; /* PCI Segment */
>+ put_unaligned_le32(pci_addr, &req->tx.buf[8]);
>+
>+ ret = peci_request_xfer_retry(req);
>+ if (ret) {
>+ peci_request_free(req);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return req;
>+}
>+
>+static struct peci_request *
>+__ep_mmio_read(struct peci_device *device, u8 bar, u8 addr_type, u8 seg,
>+ u8 bus, u8 dev, u8 func, u64 offset, u8 tx_len, u8 len)
>+{
>+ struct peci_request *req;
>+ int ret;
>+
>+ req = peci_request_alloc(device, tx_len, PECI_RDENDPTCFG_READ_LEN_BASE + len);
>+ if (!req)
>+ return ERR_PTR(-ENOMEM);
>+
>+ req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
>+ req->tx.buf[1] = 0;
>+ req->tx.buf[2] = PECI_ENDPTCFG_TYPE_MMIO;
>+ req->tx.buf[3] = 0; /* Endpoint ID */
>+ req->tx.buf[4] = 0; /* Reserved */
>+ req->tx.buf[5] = bar;
>+ req->tx.buf[6] = addr_type;
>+ req->tx.buf[7] = seg; /* PCI Segment */
>+ req->tx.buf[8] = PCI_DEVFN(dev, func);
>+ req->tx.buf[9] = bus; /* PCI Bus */
>+
>+ if (addr_type == PECI_ENDPTCFG_ADDR_TYPE_MMIO_D)
>+ put_unaligned_le32(offset, &req->tx.buf[10]);
>+ else
>+ put_unaligned_le64(offset, &req->tx.buf[10]);
>+
>+ ret = peci_request_xfer_retry(req);
>+ if (ret) {
>+ peci_request_free(req);
>+ return ERR_PTR(ret);
>+ }
>+
>+ return req;
>+}
>+
> u8 peci_request_data_readb(struct peci_request *req)
> {
> return req->rx.buf[1];
>@@ -278,6 +444,12 @@ u64 peci_request_data_dib(struct peci_request *req)
> }
> EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
>
>+s16 peci_request_data_temp(struct peci_request *req)
>+{
>+ return get_unaligned_le16(&req->rx.buf[0]);
>+}
>+EXPORT_SYMBOL_NS_GPL(peci_request_data_temp, PECI);
>+
> #define __read_pkg_config(x, type) \
> struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
> { \
>@@ -289,3 +461,42 @@ __read_pkg_config(readb, u8);
> __read_pkg_config(readw, u16);
> __read_pkg_config(readl, u32);
> __read_pkg_config(readq, u64);
>+
>+#define __read_pci_config_local(x, type) \
>+struct peci_request * \
>+peci_pci_cfg_local_##x(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg) \
>+{ \
>+ return __pci_cfg_local_read(device, bus, dev, func, reg, sizeof(type)); \
>+} \
As with peci_pkg_cfg_*() in patch 9, it seems like this could relieve
callers of some busy-work by returning a status int and writing the data
to a 'type*' pointer instead of returning a struct peci_request*.
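For illustration, a userspace mock-up of the shape I'm suggesting. None
of this is meant as the real implementation: the ERR_PTR helpers, the
struct layouts, __pci_cfg_local_read() and peci_request_status() below
are simplified stand-ins (the mock status check only knows about
PECI_CC_SUCCESS, 0x40), and the only part that matters is the body of
the macro:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Userspace stand-ins for the kernel's ERR_PTR helpers */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Mock device: canned completion code plus little-endian payload */
struct peci_device { u8 cc; u8 payload[4]; };
struct peci_request { struct { u8 buf[5]; } rx; };

/* Mock transfer: no bus traffic, just copies the canned bytes */
static struct peci_request *
__pci_cfg_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func,
		     u16 reg, u8 len)
{
	struct peci_request *req = calloc(1, sizeof(*req));

	(void)bus; (void)dev; (void)func; (void)reg;
	if (!req)
		return ERR_PTR(-ENOMEM);
	req->rx.buf[0] = device->cc;
	memcpy(&req->rx.buf[1], device->payload, len);
	return req;
}

static void peci_request_free(struct peci_request *req) { free(req); }

/* Mock: anything other than PECI_CC_SUCCESS (0x40) is reported as -EIO */
static int peci_request_status(struct peci_request *req)
{
	return req->rx.buf[0] == 0x40 ? 0 : -EIO;
}

static u8 peci_request_data_readb(struct peci_request *req)
{
	return req->rx.buf[1];
}

static u16 peci_request_data_readw(struct peci_request *req)
{
	return req->rx.buf[1] | (u16)req->rx.buf[2] << 8;
}

static u32 peci_request_data_readl(struct peci_request *req)
{
	return (u32)req->rx.buf[1] | (u32)req->rx.buf[2] << 8 |
	       (u32)req->rx.buf[3] << 16 | (u32)req->rx.buf[4] << 24;
}

/* Suggested shape: return the status, store the data only on success */
#define __read_pci_config_local(x, type)				\
int peci_pci_cfg_local_##x(struct peci_device *device, u8 bus, u8 dev,	\
			   u8 func, u16 reg, type *outp)		\
{									\
	struct peci_request *req;					\
	int ret;							\
									\
	req = __pci_cfg_local_read(device, bus, dev, func, reg,	\
				   sizeof(type));			\
	if (IS_ERR(req))						\
		return PTR_ERR(req);					\
									\
	ret = peci_request_status(req);					\
	if (!ret)							\
		*outp = peci_request_data_##x(req);			\
									\
	peci_request_free(req);						\
	return ret;							\
}

__read_pci_config_local(readb, u8)
__read_pci_config_local(readw, u16)
__read_pci_config_local(readl, u32)
```

With that, a wrapper like peci_pci_local_read() in cpu.c would reduce to
a single call, and the request lifetime would stay inside request.c.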
>+EXPORT_SYMBOL_NS_GPL(peci_pci_cfg_local_##x, PECI)
>+
>+__read_pci_config_local(readb, u8);
>+__read_pci_config_local(readw, u16);
>+__read_pci_config_local(readl, u32);
>+
>+#define __read_ep_pci_config(x, msg_type, type) \
>+struct peci_request * \
>+peci_ep_pci_cfg_##x(struct peci_device *device, u8 seg, u8 bus, u8 dev, u8 func, u16 reg) \
>+{ \
>+ return __ep_pci_cfg_read(device, msg_type, seg, bus, dev, func, reg, sizeof(type)); \
>+} \
Likewise here.
>+EXPORT_SYMBOL_NS_GPL(peci_ep_pci_cfg_##x, PECI)
>+
>+__read_ep_pci_config(local_readb, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u8);
>+__read_ep_pci_config(local_readw, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u16);
>+__read_ep_pci_config(local_readl, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u32);
>+__read_ep_pci_config(readb, PECI_ENDPTCFG_TYPE_PCI, u8);
>+__read_ep_pci_config(readw, PECI_ENDPTCFG_TYPE_PCI, u16);
>+__read_ep_pci_config(readl, PECI_ENDPTCFG_TYPE_PCI, u32);
>+
>+#define __read_ep_mmio(x, y, addr_type, type1, type2) \
>+struct peci_request *peci_ep_mmio##y##_##x(struct peci_device *device, u8 bar, u8 seg, \
>+ u8 bus, u8 dev, u8 func, u64 offset) \
>+{ \
>+ return __ep_mmio_read(device, bar, addr_type, seg, bus, dev, func, \
>+ offset, 10 + sizeof(type1), sizeof(type2)); \
>+} \
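And here (I think).

Also, the bare '10 +' is a bit of a magic number. As far as I can tell
it's the number of tx bytes ahead of the offset (tx.buf[0] through
tx.buf[9] in __ep_mmio_read()). Could it get a named macro, or could
this use the PECI_RDENDPTCFG_MMIO_{D,Q}_WRITE_LEN defines added above,
which already work out to 14 and 18?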
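A small sketch of what I mean. The two *_WRITE_LEN values are copied
from the defines this patch adds to request.c; the
PECI_RDENDPTCFG_MMIO_WRITE_LEN_BASE name and the peci_mmio_tx_len()
helper are made up for illustration:

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Values as defined earlier in this patch */
#define PECI_RDENDPTCFG_MMIO_D_WRITE_LEN	14
#define PECI_RDENDPTCFG_MMIO_Q_WRITE_LEN	18

/* Hypothetical name: tx.buf[0..9], i.e. everything ahead of the offset */
#define PECI_RDENDPTCFG_MMIO_WRITE_LEN_BASE	10

/* The named base plus the offset width matches the existing defines */
_Static_assert(PECI_RDENDPTCFG_MMIO_WRITE_LEN_BASE + sizeof(u32) ==
	       PECI_RDENDPTCFG_MMIO_D_WRITE_LEN, "32-bit offset tx length");
_Static_assert(PECI_RDENDPTCFG_MMIO_WRITE_LEN_BASE + sizeof(u64) ==
	       PECI_RDENDPTCFG_MMIO_Q_WRITE_LEN, "64-bit offset tx length");

/* Hypothetical helper showing the same computation at run time */
static inline unsigned int peci_mmio_tx_len(unsigned int offset_size)
{
	return PECI_RDENDPTCFG_MMIO_WRITE_LEN_BASE + offset_size;
}
```

Either spelling would do; the point is just that the reader shouldn't
have to count buffer indices to work out where the 10 comes from.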
>+EXPORT_SYMBOL_NS_GPL(peci_ep_mmio##y##_##x, PECI)
>+
>+__read_ep_mmio(readl, 32, PECI_ENDPTCFG_ADDR_TYPE_MMIO_D, u32, u32);
>+__read_ep_mmio(readl, 64, PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q, u64, u32);
>diff --git a/include/linux/peci-cpu.h b/include/linux/peci-cpu.h
>new file mode 100644
>index 000000000000..d1b307ec2429
>--- /dev/null
>+++ b/include/linux/peci-cpu.h
>@@ -0,0 +1,38 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+/* Copyright (c) 2021 Intel Corporation */
>+
>+#ifndef __LINUX_PECI_CPU_H
>+#define __LINUX_PECI_CPU_H
>+
>+#include <linux/types.h>
>+
>+#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
>+#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
>+#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
>+#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
>+#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
>+#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
>+#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
>+#define PECI_PCS_MODULE_TEMP 9 /* Per Core DTS Temperature Read */
>+#define PECI_PCS_THERMAL_MARGIN 10 /* DTS thermal margin */
>+#define PECI_PCS_DDR_DIMM_TEMP 14 /* DDR DIMM Temperature */
>+#define PECI_PCS_TEMP_TARGET 16 /* Temperature Target Read */
>+#define PECI_PCS_TDP_UNITS 30 /* Units for power/energy registers */
>+
>+struct peci_device;
>+
>+int peci_temp_read(struct peci_device *device, s16 *temp_raw);
>+
>+int peci_pcs_read(struct peci_device *device, u8 index,
>+ u16 param, u32 *data);
>+
>+int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev,
>+ u8 func, u16 reg, u32 *data);
>+
>+int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
>+ u8 bus, u8 dev, u8 func, u16 reg, u32 *data);
>+
>+int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
>+ u8 bus, u8 dev, u8 func, u64 address, u32 *data);
>+
>+#endif /* __LINUX_PECI_CPU_H */
>diff --git a/include/linux/peci.h b/include/linux/peci.h
>index f9f37b874011..31f9e628fd11 100644
>--- a/include/linux/peci.h
>+++ b/include/linux/peci.h
>@@ -9,14 +9,6 @@
> #include <linux/mutex.h>
> #include <linux/types.h>
>
>-#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
>-#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
>-#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
>-#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
>-#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
>-#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
>-#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
>-
> struct peci_request;
>
> /**
>--
>2.31.1
>
On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
>From: Jae Hyun Yoo <[email protected]>
>
>Add documentation for peci-cputemp driver that provides DTS thermal
>readings for CPU packages and CPU cores and peci-dimmtemp driver that
>provides DTS thermal readings for DIMMs.
>
>Signed-off-by: Jae Hyun Yoo <[email protected]>
>Co-developed-by: Iwona Winiarska <[email protected]>
>Signed-off-by: Iwona Winiarska <[email protected]>
>Reviewed-by: Pierre-Louis Bossart <[email protected]>
>---
> Documentation/hwmon/index.rst | 2 +
> Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
> Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
> MAINTAINERS | 2 +
> 4 files changed, 155 insertions(+)
> create mode 100644 Documentation/hwmon/peci-cputemp.rst
> create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
>
>diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
>index bc01601ea81a..cc76b5b3f791 100644
>--- a/Documentation/hwmon/index.rst
>+++ b/Documentation/hwmon/index.rst
>@@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
> pcf8591
> pim4328
> pm6764tr
>+ peci-cputemp
>+ peci-dimmtemp
> pmbus
> powr1220
> pxe1610
>diff --git a/Documentation/hwmon/peci-cputemp.rst b/Documentation/hwmon/peci-cputemp.rst
>new file mode 100644
>index 000000000000..d3a218ba810a
>--- /dev/null
>+++ b/Documentation/hwmon/peci-cputemp.rst
>@@ -0,0 +1,93 @@
>+.. SPDX-License-Identifier: GPL-2.0-only
>+
>+Kernel driver peci-cputemp
>+==========================
>+
>+Supported chips:
>+ One of Intel server CPUs listed below which is connected to a PECI bus.
>+ * Intel Xeon E5/E7 v3 server processors
>+ Intel Xeon E5-14xx v3 family
>+ Intel Xeon E5-24xx v3 family
>+ Intel Xeon E5-16xx v3 family
>+ Intel Xeon E5-26xx v3 family
>+ Intel Xeon E5-46xx v3 family
>+ Intel Xeon E7-48xx v3 family
>+ Intel Xeon E7-88xx v3 family
>+ * Intel Xeon E5/E7 v4 server processors
>+ Intel Xeon E5-16xx v4 family
>+ Intel Xeon E5-26xx v4 family
>+ Intel Xeon E5-46xx v4 family
>+ Intel Xeon E7-48xx v4 family
>+ Intel Xeon E7-88xx v4 family
>+ * Intel Xeon Scalable server processors
>+ Intel Xeon D family
>+ Intel Xeon Bronze family
>+ Intel Xeon Silver family
>+ Intel Xeon Gold family
>+ Intel Xeon Platinum family
>+
>+ Datasheet: Available from http://www.intel.com/design/literature.htm
>+
>+Author: Jae Hyun Yoo <[email protected]>
>+
>+Description
>+-----------
>+
>+This driver implements a generic PECI hwmon feature which provides Digital
>+Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
>+accessible via the processor PECI interface.
>+
>+All temperature values are given in millidegree Celsius and will be measurable
>+only when the target CPU is powered on.
>+
>+Sysfs interface
>+-------------------
>+
>+======================= =======================================================
>+temp1_label "Die"
>+temp1_input Provides current die temperature of the CPU package.
>+temp1_max Provides thermal control temperature of the CPU package
>+ which is also known as Tcontrol.
>+temp1_crit Provides shutdown temperature of the CPU package which
>+ is also known as the maximum processor junction
>+ temperature, Tjmax or Tprochot.
>+temp1_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
>+ the CPU package.
>+
>+temp2_label "DTS"
>+temp2_input Provides current DTS temperature of the CPU package.
Would this be a good place to note the slightly counter-intuitive nature
of DTS readings? i.e. add something along the lines of "The DTS sensor
produces a delta relative to Tjmax, so negative values are normal and
values approaching zero are hot." (In my experience people who aren't
already familiar with it tend to think something's wrong when a CPU
temperature reading shows -50C.)
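To put numbers on that, a small sketch of the arithmetic. The raw format
used here (signed, 1/64 degree C per count, expressed relative to Tjmax)
is my assumption and not something taken from this patch, and both
function names are made up for illustration:

```c
#include <stdint.h>

/*
 * Assumed raw format: signed 16-bit count, 1/64 degree C per count,
 * relative to Tjmax. Result is in millidegrees Celsius.
 */
static inline int32_t peci_dts_raw_to_mdeg(int16_t raw)
{
	return (int32_t)raw * 1000 / 64;
}

/* Absolute temperature is Tjmax plus the (normally negative) delta */
static inline int32_t peci_dts_to_abs_mdeg(int32_t tjmax_mdeg, int16_t raw)
{
	return tjmax_mdeg + peci_dts_raw_to_mdeg(raw);
}
```

Under those assumptions, a raw value of -3200 is a delta of -50 C, which
with a Tjmax of 100 C corresponds to an absolute 50 C; a delta of zero
means the part is sitting at Tjmax.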
>+temp2_max Provides thermal control temperature of the CPU package
>+ which is also known as Tcontrol.
>+temp2_crit Provides shutdown temperature of the CPU package which
>+ is also known as the maximum processor junction
>+ temperature, Tjmax or Tprochot.
>+temp2_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
>+ the CPU package.
>+
>+temp3_label "Tcontrol"
>+temp3_input Provides current Tcontrol temperature of the CPU
>+ package which is also known as Fan Temperature target.
>+ Indicates the relative value from thermal monitor trip
>+ temperature at which fans should be engaged.
>+temp3_crit Provides Tcontrol critical value of the CPU package
>+ which is same to Tjmax.
>+
>+temp4_label "Tthrottle"
>+temp4_input Provides current Tthrottle temperature of the CPU
>+ package. Used for throttling temperature. If this value
>+ is allowed and lower than Tjmax - the throttle will
>+ occur and reported at lower than Tjmax.
>+
>+temp5_label "Tjmax"
>+temp5_input Provides the maximum junction temperature, Tjmax of the
>+ CPU package.
>+
>+temp[6-N]_label Provides string "Core X", where X is resolved core
>+ number.
>+temp[6-N]_input Provides current temperature of each core.
>+temp[6-N]_max Provides thermal control temperature of the core.
>+temp[6-N]_crit Provides shutdown temperature of the core.
>+temp[6-N]_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
>+ the core.
I only see *_label and *_input for the per-core temperature sensors, no
*_max, *_crit, or *_crit_hyst.
>+
>+======================= =======================================================
>diff --git a/Documentation/hwmon/peci-dimmtemp.rst b/Documentation/hwmon/peci-dimmtemp.rst
>new file mode 100644
>index 000000000000..1778d9317e43
>--- /dev/null
>+++ b/Documentation/hwmon/peci-dimmtemp.rst
>@@ -0,0 +1,58 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+Kernel driver peci-dimmtemp
>+===========================
>+
>+Supported chips:
>+ One of Intel server CPUs listed below which is connected to a PECI bus.
>+ * Intel Xeon E5/E7 v3 server processors
>+ Intel Xeon E5-14xx v3 family
>+ Intel Xeon E5-24xx v3 family
>+ Intel Xeon E5-16xx v3 family
>+ Intel Xeon E5-26xx v3 family
>+ Intel Xeon E5-46xx v3 family
>+ Intel Xeon E7-48xx v3 family
>+ Intel Xeon E7-88xx v3 family
>+ * Intel Xeon E5/E7 v4 server processors
>+ Intel Xeon E5-16xx v4 family
>+ Intel Xeon E5-26xx v4 family
>+ Intel Xeon E5-46xx v4 family
>+ Intel Xeon E7-48xx v4 family
>+ Intel Xeon E7-88xx v4 family
>+ * Intel Xeon Scalable server processors
>+ Intel Xeon D family
>+ Intel Xeon Bronze family
>+ Intel Xeon Silver family
>+ Intel Xeon Gold family
>+ Intel Xeon Platinum family
>+
>+ Datasheet: Available from http://www.intel.com/design/literature.htm
>+
>+Author: Jae Hyun Yoo <[email protected]>
>+
>+Description
>+-----------
>+
>+This driver implements a generic PECI hwmon feature which provides Digital
>+Thermal Sensor (DTS) thermal readings of DIMM components that are accessible
>+via the processor PECI interface.
I had thought "DTS" referred to a fairly specific sensor in the CPU; is
the same term also used for DIMM temp sensors or is the mention of it
here a copy/paste error?
>+
>+All temperature values are given in millidegree Celsius and will be measurable
>+only when the target CPU is powered on.
>+
>+Sysfs interface
>+-------------------
>+
>+======================= =======================================================
>+
>+temp[N]_label Provides string "DIMM CI", where C is DIMM channel and
>+ I is DIMM index of the populated DIMM.
>+temp[N]_input Provides current temperature of the populated DIMM.
>+temp[N]_max Provides thermal control temperature of the DIMM.
>+temp[N]_crit Provides shutdown temperature of the DIMM.
>+
>+======================= =======================================================
>+
>+Note:
>+ DIMM temperature attributes will appear when the client CPU's BIOS
>+ completes memory training and testing.
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 35ba9e3646bd..d16da127bbdc 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -14509,6 +14509,8 @@ M: Iwona Winiarska <[email protected]>
> R: Jae Hyun Yoo <[email protected]>
> L: [email protected]
> S: Supported
>+F: Documentation/hwmon/peci-cputemp.rst
>+F: Documentation/hwmon/peci-dimmtemp.rst
> F: drivers/hwmon/peci/
>
> PECI SUBSYSTEM
>--
>2.31.1
>
On 7/27/21 3:58 PM, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
>> From: Jae Hyun Yoo <[email protected]>
>>
>> Add documentation for peci-cputemp driver that provides DTS thermal
>> readings for CPU packages and CPU cores and peci-dimmtemp driver that
>> provides DTS thermal readings for DIMMs.
>>
>> Signed-off-by: Jae Hyun Yoo <[email protected]>
>> Co-developed-by: Iwona Winiarska <[email protected]>
>> Signed-off-by: Iwona Winiarska <[email protected]>
>> Reviewed-by: Pierre-Louis Bossart <[email protected]>
>> ---
>> Documentation/hwmon/index.rst | 2 +
>> Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
>> Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
>> MAINTAINERS | 2 +
>> 4 files changed, 155 insertions(+)
>> create mode 100644 Documentation/hwmon/peci-cputemp.rst
>> create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
>>
>> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
>> index bc01601ea81a..cc76b5b3f791 100644
>> --- a/Documentation/hwmon/index.rst
>> +++ b/Documentation/hwmon/index.rst
>> @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
>> pcf8591
>> pim4328
>> pm6764tr
>> + peci-cputemp
>> + peci-dimmtemp
>> pmbus
>> powr1220
>> pxe1610
>> diff --git a/Documentation/hwmon/peci-cputemp.rst b/Documentation/hwmon/peci-cputemp.rst
>> new file mode 100644
>> index 000000000000..d3a218ba810a
>> --- /dev/null
>> +++ b/Documentation/hwmon/peci-cputemp.rst
>> @@ -0,0 +1,93 @@
>> +.. SPDX-License-Identifier: GPL-2.0-only
>> +
>> +Kernel driver peci-cputemp
>> +==========================
>> +
>> +Supported chips:
>> + One of Intel server CPUs listed below which is connected to a PECI bus.
>> + * Intel Xeon E5/E7 v3 server processors
>> + Intel Xeon E5-14xx v3 family
>> + Intel Xeon E5-24xx v3 family
>> + Intel Xeon E5-16xx v3 family
>> + Intel Xeon E5-26xx v3 family
>> + Intel Xeon E5-46xx v3 family
>> + Intel Xeon E7-48xx v3 family
>> + Intel Xeon E7-88xx v3 family
>> + * Intel Xeon E5/E7 v4 server processors
>> + Intel Xeon E5-16xx v4 family
>> + Intel Xeon E5-26xx v4 family
>> + Intel Xeon E5-46xx v4 family
>> + Intel Xeon E7-48xx v4 family
>> + Intel Xeon E7-88xx v4 family
>> + * Intel Xeon Scalable server processors
>> + Intel Xeon D family
>> + Intel Xeon Bronze family
>> + Intel Xeon Silver family
>> + Intel Xeon Gold family
>> + Intel Xeon Platinum family
>> +
>> + Datasheet: Available from http://www.intel.com/design/literature.htm
>> +
>> +Author: Jae Hyun Yoo <[email protected]>
>> +
>> +Description
>> +-----------
>> +
>> +This driver implements a generic PECI hwmon feature which provides Digital
>> +Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
>> +accessible via the processor PECI interface.
>> +
>> +All temperature values are given in millidegree Celsius and will be measurable
>> +only when the target CPU is powered on.
>> +
>> +Sysfs interface
>> +-------------------
>> +
>> +======================= =======================================================
>> +temp1_label "Die"
>> +temp1_input Provides current die temperature of the CPU package.
>> +temp1_max Provides thermal control temperature of the CPU package
>> + which is also known as Tcontrol.
>> +temp1_crit Provides shutdown temperature of the CPU package which
>> + is also known as the maximum processor junction
>> + temperature, Tjmax or Tprochot.
>> +temp1_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
>> + the CPU package.
>> +
>> +temp2_label "DTS"
>> +temp2_input Provides current DTS temperature of the CPU package.
>
> Would this be a good place to note the slightly counter-intuitive nature
> of DTS readings? i.e. add something along the lines of "The DTS sensor
> produces a delta relative to Tjmax, so negative values are normal and
> values approaching zero are hot." (In my experience people who aren't
> already familiar with it tend to think something's wrong when a CPU
> temperature reading shows -50C.)
>
All attributes shall follow the ABI, and the driver must translate reported
values to degrees C. If those sensors do not follow the ABI and report something
else, I won't accept the driver.
Guenter
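
For illustration only, a minimal sketch of the translation being asked for, assuming (as Zev describes above) that the raw DTS value is a delta relative to Tjmax. The helper name and its millidegree units are hypothetical and not taken from the series:

```c
#include <stdint.h>

/*
 * Hypothetical helper: turn a DTS margin (delta relative to Tjmax,
 * normally negative) into an absolute temperature.
 * Both arguments and the result are in millidegrees Celsius.
 */
static int32_t dts_margin_to_millicelsius(int32_t tjmax_mc, int32_t margin_mc)
{
	return tjmax_mc + margin_mc;
}
```

With a Tjmax of 100000 and a raw margin of -50000, this would report 50000, i.e. an absolute 50 degrees C rather than the "-50C" that confuses readers.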
>> +temp2_max Provides thermal control temperature of the CPU package
>> + which is also known as Tcontrol.
>> +temp2_crit Provides shutdown temperature of the CPU package which
>> + is also known as the maximum processor junction
>> + temperature, Tjmax or Tprochot.
>> +temp2_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
>> + the CPU package.
>> +
>> +temp3_label "Tcontrol"
>> +temp3_input Provides current Tcontrol temperature of the CPU
>> + package which is also known as Fan Temperature target.
>> + Indicates the relative value from thermal monitor trip
>> + temperature at which fans should be engaged.
>> +temp3_crit Provides Tcontrol critical value of the CPU package
>> + which is same to Tjmax.
>> +
>> +temp4_label "Tthrottle"
>> +temp4_input Provides current Tthrottle temperature of the CPU
>> + package. Used for throttling temperature. If this value
>> + is allowed and lower than Tjmax - the throttle will
>> + occur and reported at lower than Tjmax.
>> +
>> +temp5_label "Tjmax"
>> +temp5_input Provides the maximum junction temperature, Tjmax of the
>> + CPU package.
>> +
>> +temp[6-N]_label Provides string "Core X", where X is resolved core
>> + number.
>> +temp[6-N]_input Provides current temperature of each core.
>> +temp[6-N]_max Provides thermal control temperature of the core.
>> +temp[6-N]_crit Provides shutdown temperature of the core.
>> +temp[6-N]_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
>> + the core.
>
> I only see *_label and *_input for the per-core temperature sensors, no
> *_max, *_crit, or *_crit_hyst.
>
>> +
>> +======================= =======================================================
>> diff --git a/Documentation/hwmon/peci-dimmtemp.rst b/Documentation/hwmon/peci-dimmtemp.rst
>> new file mode 100644
>> index 000000000000..1778d9317e43
>> --- /dev/null
>> +++ b/Documentation/hwmon/peci-dimmtemp.rst
>> @@ -0,0 +1,58 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +Kernel driver peci-dimmtemp
>> +===========================
>> +
>> +Supported chips:
>> + One of Intel server CPUs listed below which is connected to a PECI bus.
>> + * Intel Xeon E5/E7 v3 server processors
>> + Intel Xeon E5-14xx v3 family
>> + Intel Xeon E5-24xx v3 family
>> + Intel Xeon E5-16xx v3 family
>> + Intel Xeon E5-26xx v3 family
>> + Intel Xeon E5-46xx v3 family
>> + Intel Xeon E7-48xx v3 family
>> + Intel Xeon E7-88xx v3 family
>> + * Intel Xeon E5/E7 v4 server processors
>> + Intel Xeon E5-16xx v4 family
>> + Intel Xeon E5-26xx v4 family
>> + Intel Xeon E5-46xx v4 family
>> + Intel Xeon E7-48xx v4 family
>> + Intel Xeon E7-88xx v4 family
>> + * Intel Xeon Scalable server processors
>> + Intel Xeon D family
>> + Intel Xeon Bronze family
>> + Intel Xeon Silver family
>> + Intel Xeon Gold family
>> + Intel Xeon Platinum family
>> +
>> + Datasheet: Available from http://www.intel.com/design/literature.htm
>> +
>> +Author: Jae Hyun Yoo <[email protected]>
>> +
>> +Description
>> +-----------
>> +
>> +This driver implements a generic PECI hwmon feature which provides Digital
>> +Thermal Sensor (DTS) thermal readings of DIMM components that are accessible
>> +via the processor PECI interface.
>
> I had thought "DTS" referred to a fairly specific sensor in the CPU; is
> the same term also used for DIMM temp sensors or is the mention of it
> here a copy/paste error?
>
>> +
>> +All temperature values are given in millidegree Celsius and will be measurable
>> +only when the target CPU is powered on.
>> +
>> +Sysfs interface
>> +-------------------
>> +
>> +======================= =======================================================
>> +
>> +temp[N]_label Provides string "DIMM CI", where C is DIMM channel and
>> + I is DIMM index of the populated DIMM.
>> +temp[N]_input Provides current temperature of the populated DIMM.
>> +temp[N]_max Provides thermal control temperature of the DIMM.
>> +temp[N]_crit Provides shutdown temperature of the DIMM.
>> +
>> +======================= =======================================================
>> +
>> +Note:
>> + DIMM temperature attributes will appear when the client CPU's BIOS
>> + completes memory training and testing.
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 35ba9e3646bd..d16da127bbdc 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -14509,6 +14509,8 @@ M: Iwona Winiarska <[email protected]>
>> R: Jae Hyun Yoo <[email protected]>
>> L: [email protected]
>> S: Supported
>> +F: Documentation/hwmon/peci-cputemp.rst
>> +F: Documentation/hwmon/peci-dimmtemp.rst
>> F: drivers/hwmon/peci/
>>
>> PECI SUBSYSTEM
>> --
>> 2.31.1
>>
>
On Tue, 2021-07-27 at 08:49 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:40PM CDT, Iwona Winiarska wrote:
> > From: Jae Hyun Yoo <[email protected]>
> >
> > ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical
> > interface (a.k.a PECI wire).
> >
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Co-developed-by: Iwona Winiarska <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > MAINTAINERS | 9 +
> > drivers/peci/Kconfig | 6 +
> > drivers/peci/Makefile | 3 +
> > drivers/peci/controller/Kconfig | 12 +
> > drivers/peci/controller/Makefile | 3 +
> > drivers/peci/controller/peci-aspeed.c | 501 ++++++++++++++++++++++++++
> > 6 files changed, 534 insertions(+)
> > create mode 100644 drivers/peci/controller/Kconfig
> > create mode 100644 drivers/peci/controller/Makefile
> > create mode 100644 drivers/peci/controller/peci-aspeed.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 47411e2b6336..4ba874afa2fa 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2865,6 +2865,15 @@ S: Maintained
> > F: Documentation/hwmon/asc7621.rst
> > F: drivers/hwmon/asc7621.c
> >
> > +ASPEED PECI CONTROLLER
> > +M: Iwona Winiarska <[email protected]>
> > +M: Jae Hyun Yoo <[email protected]>
> > +L: [email protected] (moderated for non-subscribers)
> > +L: [email protected] (moderated for non-subscribers)
> > +S: Supported
> > +F: Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> > +F: drivers/peci/controller/peci-aspeed.c
> > +
> > ASPEED PINCTRL DRIVERS
> > M: Andrew Jeffery <[email protected]>
> > L: [email protected] (moderated for non-subscribers)
> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > index 601cc3c3c852..0d0ee8009713 100644
> > --- a/drivers/peci/Kconfig
> > +++ b/drivers/peci/Kconfig
> > @@ -12,3 +12,9 @@ menuconfig PECI
> >
> > This support is also available as a module. If so, the module
> > will be called peci.
> > +
> > +if PECI
> > +
> > +source "drivers/peci/controller/Kconfig"
> > +
> > +endif # PECI
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > index 2bb2f51bcda7..621a993e306a 100644
> > --- a/drivers/peci/Makefile
> > +++ b/drivers/peci/Makefile
> > @@ -3,3 +3,6 @@
> > # Core functionality
> > peci-y := core.o sysfs.o
> > obj-$(CONFIG_PECI) += peci.o
> > +
> > +# Hardware specific bus drivers
> > +obj-y += controller/
> > diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig
> > new file mode 100644
> > index 000000000000..8ddbe494677f
> > --- /dev/null
> > +++ b/drivers/peci/controller/Kconfig
> > @@ -0,0 +1,12 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config PECI_ASPEED
> > + tristate "ASPEED PECI support"
> > + depends on ARCH_ASPEED || COMPILE_TEST
> > + depends on OF
> > + depends on HAS_IOMEM
> > + help
> > + Enable this driver if you want to support ASPEED PECI controller.
> > +
> > + This driver can be also build as a module. If so, the module
> > + will be called peci-aspeed.
> > diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile
> > new file mode 100644
> > index 000000000000..022c28ef1bf0
> > --- /dev/null
> > +++ b/drivers/peci/controller/Makefile
> > @@ -0,0 +1,3 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +obj-$(CONFIG_PECI_ASPEED) += peci-aspeed.o
> > diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c
> > new file mode 100644
> > index 000000000000..888b46383ea4
> > --- /dev/null
> > +++ b/drivers/peci/controller/peci-aspeed.c
> > @@ -0,0 +1,501 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (C) 2012-2017 ASPEED Technology Inc.
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/clk.h>
> > +#include <linux/delay.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/io.h>
> > +#include <linux/iopoll.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
> > +#include <linux/peci.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/reset.h>
> > +
> > +#include <asm/unaligned.h>
> > +
> > +/* ASPEED PECI Registers */
> > +/* Control Register */
> > +#define ASPEED_PECI_CTRL 0x00
> > +#define ASPEED_PECI_CTRL_SAMPLING_MASK GENMASK(19, 16)
> > +#define ASPEED_PECI_CTRL_READ_MODE_MASK GENMASK(13, 12)
> > +#define ASPEED_PECI_CTRL_READ_MODE_COUNT BIT(12)
> > +#define ASPEED_PECI_CTRL_READ_MODE_DBG BIT(13)
>
> Nitpick: might be nice to keep things in a consistent descending order
> here (13 then 12).
>
Sure, I'll change it in v2.
> > +#define ASPEED_PECI_CTRL_CLK_SOURCE_MASK BIT(11)
>
> _MASK suffix seems out of place on this one.
Ack.
>
> > +#define ASPEED_PECI_CTRL_CLK_DIV_MASK GENMASK(10, 8)
> > +#define ASPEED_PECI_CTRL_INVERT_OUT BIT(7)
> > +#define ASPEED_PECI_CTRL_INVERT_IN BIT(6)
> > +#define ASPEED_PECI_CTRL_BUS_CONTENT_EN BIT(5)
>
> It *is* already kind of a long macro name, but abbreviating "contention"
> to "content" seems a bit confusing; I'd suggest keeping the extra three
> characters (or maybe drop the _EN suffix if you want to avoid making it
> even longer).
>
You're right - it'll be renamed properly in v2.
> > +#define ASPEED_PECI_CTRL_PECI_EN BIT(4)
> > +#define ASPEED_PECI_CTRL_PECI_CLK_EN BIT(0)
> > +
> > +/* Timing Negotiation Register */
> > +#define ASPEED_PECI_TIMING_NEGOTIATION 0x04
> > +#define ASPEED_PECI_TIMING_MESSAGE_MASK GENMASK(15, 8)
> > +#define ASPEED_PECI_TIMING_ADDRESS_MASK GENMASK(7, 0)
> > +
> > +/* Command Register */
> > +#define ASPEED_PECI_CMD 0x08
> > +#define ASPEED_PECI_CMD_PIN_MON BIT(31)
> > +#define ASPEED_PECI_CMD_STS_MASK GENMASK(27, 24)
> > +#define ASPEED_PECI_CMD_STS_ADDR_T_NEGO 0x3
> > +#define ASPEED_PECI_CMD_IDLE_MASK \
> > + (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MON)
> > +#define ASPEED_PECI_CMD_FIRE BIT(0)
> > +
> > +/* Read/Write Length Register */
> > +#define ASPEED_PECI_RW_LENGTH 0x0c
> > +#define ASPEED_PECI_AW_FCS_EN BIT(31)
> > +#define ASPEED_PECI_READ_LEN_MASK GENMASK(23, 16)
> > +#define ASPEED_PECI_WRITE_LEN_MASK GENMASK(15, 8)
> > +#define ASPEED_PECI_TAGET_ADDR_MASK GENMASK(7, 0)
>
> s/TAGET/TARGET/
>
Ack.
> > +
> > +/* Expected FCS Data Register */
> > +#define ASPEED_PECI_EXP_FCS 0x10
> > +#define ASPEED_PECI_EXP_READ_FCS_MASK GENMASK(23, 16)
> > +#define ASPEED_PECI_EXP_AW_FCS_AUTO_MASK GENMASK(15, 8)
> > +#define ASPEED_PECI_EXP_WRITE_FCS_MASK GENMASK(7, 0)
> > +
> > +/* Captured FCS Data Register */
> > +#define ASPEED_PECI_CAP_FCS 0x14
> > +#define ASPEED_PECI_CAP_READ_FCS_MASK GENMASK(23, 16)
> > +#define ASPEED_PECI_CAP_WRITE_FCS_MASK GENMASK(7, 0)
> > +
> > +/* Interrupt Register */
> > +#define ASPEED_PECI_INT_CTRL 0x18
> > +#define ASPEED_PECI_TIMING_NEGO_SEL_MASK GENMASK(31, 30)
> > +#define ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO 0
> > +#define ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO 1
> > +#define ASPEED_PECI_MESSAGE_NEGO 2
> > +#define ASPEED_PECI_INT_MASK GENMASK(4, 0)
> > +#define ASPEED_PECI_INT_BUS_TIMEOUT BIT(4)
> > +#define ASPEED_PECI_INT_BUS_CONNECT BIT(3)
>
> s/CONNECT/CONTENTION/
Ack.
>
> > +#define ASPEED_PECI_INT_W_FCS_BAD BIT(2)
> > +#define ASPEED_PECI_INT_W_FCS_ABORT BIT(1)
> > +#define ASPEED_PECI_INT_CMD_DONE BIT(0)
> > +
> > +/* Interrupt Status Register */
> > +#define ASPEED_PECI_INT_STS 0x1c
> > +#define ASPEED_PECI_INT_TIMING_RESULT_MASK GENMASK(29, 16)
> > + /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
> > +
> > +/* Rx/Tx Data Buffer Registers */
> > +#define ASPEED_PECI_W_DATA0 0x20
> > +#define ASPEED_PECI_W_DATA1 0x24
> > +#define ASPEED_PECI_W_DATA2 0x28
> > +#define ASPEED_PECI_W_DATA3 0x2c
> > +#define ASPEED_PECI_R_DATA0 0x30
> > +#define ASPEED_PECI_R_DATA1 0x34
> > +#define ASPEED_PECI_R_DATA2 0x38
> > +#define ASPEED_PECI_R_DATA3 0x3c
> > +#define ASPEED_PECI_W_DATA4 0x40
> > +#define ASPEED_PECI_W_DATA5 0x44
> > +#define ASPEED_PECI_W_DATA6 0x48
> > +#define ASPEED_PECI_W_DATA7 0x4c
> > +#define ASPEED_PECI_R_DATA4 0x50
> > +#define ASPEED_PECI_R_DATA5 0x54
> > +#define ASPEED_PECI_R_DATA6 0x58
> > +#define ASPEED_PECI_R_DATA7 0x5c
> > +#define ASPEED_PECI_DATA_BUF_SIZE_MAX 32
> > +
> > +/* Timing Negotiation */
> > +#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT 8
> > +#define ASPEED_PECI_RD_SAMPLING_POINT_MAX (BIT(4) - 1)
> > +#define ASPEED_PECI_CLK_DIV_DEFAULT 0
> > +#define ASPEED_PECI_CLK_DIV_MAX (BIT(3) - 1)
> > +#define ASPEED_PECI_MSG_TIMING_DEFAULT 1
> > +#define ASPEED_PECI_MSG_TIMING_MAX (BIT(8) - 1)
> > +#define ASPEED_PECI_ADDR_TIMING_DEFAULT 1
> > +#define ASPEED_PECI_ADDR_TIMING_MAX (BIT(8) - 1)
> > +
> > +/* Timeout */
> > +#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US (50 * USEC_PER_MSEC)
> > +#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US (10 * USEC_PER_MSEC)
> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT (1000)
> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX (1000)
> > +
> > +struct aspeed_peci {
> > + struct peci_controller controller;
> > + struct device *dev;
> > + void __iomem *base;
> > + struct clk *clk;
> > + struct reset_control *rst;
> > + int irq;
> > + spinlock_t lock; /* to sync completion status handling */
> > + struct completion xfer_complete;
> > + u32 status;
> > + u32 cmd_timeout_ms;
> > + u32 msg_timing;
> > + u32 addr_timing;
> > + u32 rd_sampling_point;
> > + u32 clk_div;
> > +};
> > +
> > +static inline struct aspeed_peci *to_aspeed_peci(struct peci_controller *a)
> > +{
> > + return container_of(a, struct aspeed_peci, controller);
> > +}
> > +
> > +static void aspeed_peci_init_regs(struct aspeed_peci *priv)
> > +{
> > + u32 val;
> > +
> > + val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);
> > + val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> > + writel(val, priv->base + ASPEED_PECI_CTRL);
> > + /*
> > + * Timing negotiation period setting.
> > + * The unit of the programmed value is 4 times of PECI clock period.
> > + */
> > + val = FIELD_PREP(ASPEED_PECI_TIMING_MESSAGE_MASK, priv->msg_timing);
> > + val |= FIELD_PREP(ASPEED_PECI_TIMING_ADDRESS_MASK, priv->addr_timing);
> > + writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
> > +
> > + /* Clear interrupts */
> > + val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;
>
> This should be & instead of |, I'm guessing?
>
I believe the idea is to unconditionally clear all known interrupt status bits
(irrespective of what value is already set in the registers), and the HW expects
this to be done by writing 1 to the corresponding bits.
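
A small userspace model of that assumption may help. It assumes write-1-to-clear (W1C) semantics as stated in the reply, not anything verified against the ASPEED datasheet. The names w1c_write() and clear_all_known() are hypothetical:

```c
#include <stdint.h>

/* Mirrors ASPEED_PECI_INT_MASK, i.e. GENMASK(4, 0), from the patch. */
#define INT_MASK 0x1fu

/*
 * Hypothetical model of a write-1-to-clear (W1C) status register:
 * every bit written as 1 is cleared, bits written as 0 are untouched.
 */
static uint32_t w1c_write(uint32_t reg, uint32_t val)
{
	return reg & ~val;
}

/* What the quoted init code does: read status, OR in the mask, write back. */
static uint32_t clear_all_known(uint32_t reg)
{
	return w1c_write(reg, reg | INT_MASK);
}
```

Under this model, writing `status & MASK` would clear only the bits that were pending when the register was read, while `status | MASK` writes 1 to every known bit regardless of what was read.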
> > + writel(val, priv->base + ASPEED_PECI_INT_STS);
> > +
> > + /* Set timing negotiation mode and enable interrupts */
> > + val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
> > + val |= ASPEED_PECI_INT_MASK;
> > + writel(val, priv->base + ASPEED_PECI_INT_CTRL);
> > +
> > + val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);
> > + val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
> > + val |= ASPEED_PECI_CTRL_PECI_EN;
> > + val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> > + writel(val, priv->base + ASPEED_PECI_CTRL);
> > +}
> > +
> > +static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
> > +{
> > + u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
> > +
> > + if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
> > + aspeed_peci_init_regs(priv);
> > +
> > + return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
> > + cmd_sts,
> > + !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
> > + ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
> > + ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
> > +}
> > +
> > +static int aspeed_peci_xfer(struct peci_controller *controller,
> > + u8 addr, struct peci_request *req)
> > +{
> > + struct aspeed_peci *priv = to_aspeed_peci(controller);
> > + unsigned long flags, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);
> > + u32 peci_head;
> > + int ret;
> > +
> > + if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
> > + req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
> > + return -EINVAL;
> > +
> > + /* Check command sts and bus idle state */
> > + ret = aspeed_peci_check_idle(priv);
> > + if (ret)
> > + return ret; /* -ETIMEDOUT */
> > +
> > + spin_lock_irqsave(&priv->lock, flags);
> > + reinit_completion(&priv->xfer_complete);
> > +
> > + peci_head = FIELD_PREP(ASPEED_PECI_TAGET_ADDR_MASK, addr) |
> > + FIELD_PREP(ASPEED_PECI_WRITE_LEN_MASK, req->tx.len) |
> > + FIELD_PREP(ASPEED_PECI_READ_LEN_MASK, req->rx.len);
> > +
> > + writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
> > +
> > + memcpy_toio(priv->base + ASPEED_PECI_W_DATA0, req->tx.buf,
> > + req->tx.len > 16 ? 16 : req->tx.len);
>
> min(req->tx.len, 16) for the third argument there might be a bit
> clearer.
Ack.
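
As a rough userspace sketch of the suggested form: the 32-byte buffer is treated as two 16-byte banks, matching the W_DATA0..3 and W_DATA4..7 register groups in the patch. The function and macro names below are hypothetical, and plain memcpy() stands in for memcpy_toio():

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 16u	/* bytes covered by W_DATA0..W_DATA3 */
#define BUF_MAX   32u	/* mirrors ASPEED_PECI_DATA_BUF_SIZE_MAX */

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Copy up to BUF_MAX bytes from src into two separate 16-byte banks.
 * Returns 0 on success, -1 if len does not fit.
 */
static int split_copy(uint8_t lo[BANK_SIZE], uint8_t hi[BANK_SIZE],
		      const uint8_t *src, size_t len)
{
	size_t first;

	if (len > BUF_MAX)
		return -1;

	first = min_size(len, BANK_SIZE);
	memcpy(lo, src, first);
	if (len > BANK_SIZE)
		memcpy(hi, src + first, len - first);

	return 0;
}
```

In the driver itself, the kernel's type-checked min() may need min_t() if the length field and the constant have different types. That detail is an assumption about the final code, not something settled in this thread.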
>
> > + if (req->tx.len > 16)
> > + memcpy_toio(priv->base + ASPEED_PECI_W_DATA4, req->tx.buf + 16,
> > + req->tx.len - 16);
> > +
> > + dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
> > + print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);
> > +
> > + priv->status = 0;
> > + writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
> > + spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > + ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);
> > + if (ret < 0)
> > + return ret;
> > +
> > + if (ret == 0) {
> > + dev_dbg(priv->dev, "Timeout waiting for a response!\n");
> > + return -ETIMEDOUT;
> > + }
> > +
> > + spin_lock_irqsave(&priv->lock, flags);
> > +
> > + writel(0, priv->base + ASPEED_PECI_CMD);
> > +
> > + if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
> > + spin_unlock_irqrestore(&priv->lock, flags);
> > + dev_dbg(priv->dev, "No valid response!\n");
> > + return -EIO;
> > + }
> > +
> > + spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > + memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_R_DATA0,
> > + req->rx.len > 16 ? 16 : req->rx.len);
>
> Likewise, min(req->rx.len, 16) here.
Ack.
>
> > + if (req->rx.len > 16)
> > + memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_R_DATA4,
> > + req->rx.len - 16);
> > +
> > + print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);
> > +
> > + return 0;
> > +}
> > +
> > +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
> > +{
> > + struct aspeed_peci *priv = arg;
> > + u32 status;
> > +
> > + spin_lock(&priv->lock);
> > + status = readl(priv->base + ASPEED_PECI_INT_STS);
> > + writel(status, priv->base + ASPEED_PECI_INT_STS);
> > + priv->status |= (status & ASPEED_PECI_INT_MASK);
> > +
> > + /*
> > + * In most cases, interrupt bits will be set one by one but also note
> > + * that multiple interrupt bits could be set at the same time.
> > + */
> > + if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_TIMEOUT\n");
> > +
> > + if (status & ASPEED_PECI_INT_BUS_CONNECT)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_CONNECT\n");
>
> s/CONNECT/CONTENTION/ here too (in the message string).
Ack.
>
> > +
> > + if (status & ASPEED_PECI_INT_W_FCS_BAD)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_BAD\n");
> > +
> > + if (status & ASPEED_PECI_INT_W_FCS_ABORT)
> > + dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_W_FCS_ABORT\n");
>
> Bus contention can of course arise legitimately, and I suppose an
> offline host CPU might result in a timeout, so dbg seems fine for those
> (though as Dan suggests, making some counters available seems like a
> good idea, especially for contention). Are the FCS error cases
> significant enough to warrant something less likely to go unnoticed
> though? (e.g. dev_warn_ratelimited() or something?)
It's a similar story for FCS errors (they can occur legitimately).
We can hit ASPEED_PECI_INT_W_FCS_BAD in completely valid scenarios, e.g. an
unsuccessful detect during rescan.
In the case of ASPEED_PECI_INT_W_FCS_ABORT, the caller can hit this by providing
e.g. a malformed command. Since we do return -EIO in this case, the caller can
print its own log. In other words, it's not always an error condition in
peci-aspeed (or the HW). Moreover, if we ever expose more direct PECI access to
userspace (pecidev, or something similar), this warning would be
user-triggerable.
I would prefer to keep this at debug level for now.
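
If counters along the lines Dan and Zev mention were added alongside the debug messages, the accounting could look roughly like the sketch below. The bit positions are copied from the ASPEED_PECI_INT_* defines in the patch. The struct and function names are hypothetical and not part of this series:

```c
#include <stdint.h>

/* Bit positions copied from the ASPEED_PECI_INT_* defines in the patch. */
#define INT_BUS_TIMEOUT    (1u << 4)
#define INT_BUS_CONTENTION (1u << 3)
#define INT_W_FCS_BAD      (1u << 2)
#define INT_W_FCS_ABORT    (1u << 1)

/* Hypothetical per-controller event counters. */
struct peci_irq_stats {
	uint32_t bus_timeout;
	uint32_t bus_contention;
	uint32_t w_fcs_bad;
	uint32_t w_fcs_abort;
};

/* Bump one counter per status bit; several bits may be set at once. */
static void peci_account_status(struct peci_irq_stats *stats, uint32_t status)
{
	if (status & INT_BUS_TIMEOUT)
		stats->bus_timeout++;
	if (status & INT_BUS_CONTENTION)
		stats->bus_contention++;
	if (status & INT_W_FCS_BAD)
		stats->w_fcs_bad++;
	if (status & INT_W_FCS_ABORT)
		stats->w_fcs_abort++;
}
```

Counters like these would keep legitimate events out of the log while still letting an operator notice an unusual rate of contention or FCS failures.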
>
> > +
> > + /*
> > + * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE bit
> > + * set even in an error case.
> > + */
> > + if (status & ASPEED_PECI_INT_CMD_DONE)
> > + complete(&priv->xfer_complete);
> > +
> > + spin_unlock(&priv->lock);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static void __sanitize_clock_divider(struct aspeed_peci *priv)
> > +{
> > + u32 clk_div;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "clock-divider", &clk_div);
> > + if (ret) {
> > + clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
> > + } else if (clk_div > ASPEED_PECI_CLK_DIV_MAX) {
> > + dev_warn(priv->dev, "Invalid clock-divider: %u, Using default: %u\n",
> > + clk_div, ASPEED_PECI_CLK_DIV_DEFAULT);
> > +
> > + clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
> > + }
> > +
> > + priv->clk_div = clk_div;
> > +}
> > +
>
> The naming of these __sanitize_*() functions is a bit inconsistent with
> the rest of the driver -- though given how similar they all look, could
> they instead be refactored into a single helper function taking
> property-name, default-value, and max-value parameters?
You're right - we can have a single helper.
Regarding naming, the idea was to have a simple "inner" helper function to be
called by the more appropriately named aspeed_peci_device_property_sanitize().
Do you think I should use "aspeed_peci_" prefix in this function name or just
remove "__" and name it "sanitize_property()"?
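
For reference, a userspace sketch of the single-helper shape Zev suggests, taking property name, default value and maximum value. The fake_prop table and fake_prop_read_u32() are hypothetical stand-ins for device_property_read_u32(), and the helper name is just one of the candidates discussed above:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a firmware/devicetree property. */
struct fake_prop {
	const char *name;
	uint32_t value;
};

/* Stand-in for device_property_read_u32(): 0 if found, -1 otherwise. */
static int fake_prop_read_u32(const struct fake_prop *props, size_t count,
			      const char *name, uint32_t *out)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*out = props[i].value;
			return 0;
		}
	}
	return -1;
}

/* Return the property value, or def if it is absent or above max. */
static uint32_t sanitize_property(const struct fake_prop *props, size_t count,
				  const char *name, uint32_t def, uint32_t max)
{
	uint32_t val;

	if (fake_prop_read_u32(props, count, name, &val))
		return def;

	if (val > max) {
		fprintf(stderr,
			"Invalid %s: %" PRIu32 ", using default: %" PRIu32 "\n",
			name, val, def);
		return def;
	}

	return val;
}
```

Note that the quoted cmd-timeout-ms check also rejects 0, so a real helper would likely need a minimum bound as well, or a separate check at that call site.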
>
> > +static void __sanitize_msg_timing(struct aspeed_peci *priv)
> > +{
> > + u32 msg_timing;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "msg-timing", &msg_timing);
> > + if (ret) {
> > + msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
> > + } else if (msg_timing > ASPEED_PECI_MSG_TIMING_MAX) {
> > + dev_warn(priv->dev, "Invalid msg-timing : %u, Use default : %u\n",
> > + msg_timing, ASPEED_PECI_MSG_TIMING_DEFAULT);
> > +
> > + msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
> > + }
> > +
> > + priv->msg_timing = msg_timing;
> > +}
> > +
> > +static void __sanitize_addr_timing(struct aspeed_peci *priv)
> > +{
> > + u32 addr_timing;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "addr-timing", &addr_timing);
> > + if (ret) {
> > + addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
> > + } else if (addr_timing > ASPEED_PECI_ADDR_TIMING_MAX) {
> > + dev_warn(priv->dev, "Invalid addr-timing : %u, Use default : %u\n",
> > + addr_timing, ASPEED_PECI_ADDR_TIMING_DEFAULT);
> > +
> > + addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
> > + }
> > +
> > + priv->addr_timing = addr_timing;
> > +}
> > +
> > +static void __sanitize_rd_sampling_point(struct aspeed_peci *priv)
> > +{
> > + u32 rd_sampling_point;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "rd-sampling-point", &rd_sampling_point);
> > + if (ret) {
> > + rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
> > + } else if (rd_sampling_point > ASPEED_PECI_RD_SAMPLING_POINT_MAX) {
> > + dev_warn(priv->dev, "Invalid rd-sampling-point: %u, Use default : %u\n",
> > + rd_sampling_point, ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT);
> > +
> > + rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
> > + }
> > +
> > + priv->rd_sampling_point = rd_sampling_point;
> > +}
> > +
> > +static void __sanitize_cmd_timeout(struct aspeed_peci *priv)
> > +{
> > + u32 timeout;
> > + int ret;
> > +
> > + ret = device_property_read_u32(priv->dev, "cmd-timeout-ms", &timeout);
> > + if (ret) {
> > + timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
> > + } else if (timeout > ASPEED_PECI_CMD_TIMEOUT_MS_MAX || timeout == 0) {
> > + dev_warn(priv->dev, "Invalid cmd-timeout-ms: %u, Use default: %u\n",
> > + timeout, ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT);
> > +
> > + timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
> > + }
> > +
> > + priv->cmd_timeout_ms = timeout;
> > +}
> > +
> > +static void aspeed_peci_device_property_sanitize(struct aspeed_peci *priv)
> > +{
> > + __sanitize_clock_divider(priv);
> > + __sanitize_msg_timing(priv);
> > + __sanitize_addr_timing(priv);
> > + __sanitize_rd_sampling_point(priv);
> > + __sanitize_cmd_timeout(priv);
> > +}
> > +
> > +static void aspeed_peci_disable_clk(void *data)
> > +{
> > + clk_disable_unprepare(data);
> > +}
> > +
> > +static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
> > +{
> > + int ret;
> > +
> > + priv->clk = devm_clk_get(priv->dev, NULL);
> > + if (IS_ERR(priv->clk))
> > + return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "Failed
> > to get clk source\n");
> > +
> > + ret = clk_prepare_enable(priv->clk);
> > + if (ret) {
> > + dev_err(priv->dev, "Failed to enable clock\n");
> > + return ret;
> > + }
> > +
> > + ret = devm_add_action_or_reset(priv->dev, aspeed_peci_disable_clk,
> > priv->clk);
> > + if (ret)
> > + return ret;
> > +
> > + aspeed_peci_device_property_sanitize(priv);
> > +
> > + aspeed_peci_init_regs(priv);
> > +
> > + return 0;
> > +}
> > +
> > +static int aspeed_peci_probe(struct platform_device *pdev)
> > +{
> > + struct aspeed_peci *priv;
> > + int ret;
> > +
> > + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> > + if (!priv)
> > + return -ENOMEM;
> > +
> > + priv->dev = &pdev->dev;
> > + dev_set_drvdata(priv->dev, priv);
> > +
> > + priv->base = devm_platform_ioremap_resource(pdev, 0);
> > + if (IS_ERR(priv->base))
> > + return PTR_ERR(priv->base);
> > +
> > + priv->irq = platform_get_irq(pdev, 0);
> > + if (!priv->irq)
> > + return priv->irq;
> > +
> > + ret = devm_request_irq(&pdev->dev, priv->irq,
> > aspeed_peci_irq_handler,
> > + 0, "peci-aspeed-irq", priv);
>
> Might as well drop the "-irq" suffix here? (Seems a bit redundant, and
> a quick glance through /proc/interrupts on the systems I have at hand
> doesn't show anything else following that convention.)
I'll remove it.
Thank you
-Iwona
>
> > + if (ret)
> > + return ret;
> > +
> > + init_completion(&priv->xfer_complete);
> > + spin_lock_init(&priv->lock);
> > +
> > + priv->controller.xfer = aspeed_peci_xfer;
> > +
> > + priv->rst = devm_reset_control_get(&pdev->dev, NULL);
> > + if (IS_ERR(priv->rst)) {
> > + dev_err(&pdev->dev, "Missing or invalid reset controller
> > entry\n");
> > + return PTR_ERR(priv->rst);
> > + }
> > + reset_control_deassert(priv->rst);
> > +
> > + ret = aspeed_peci_init_ctrl(priv);
> > + if (ret)
> > + return ret;
> > +
> > + return peci_controller_add(&priv->controller, priv->dev);
> > +}
> > +
> > +static int aspeed_peci_remove(struct platform_device *pdev)
> > +{
> > + struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
> > +
> > + peci_controller_remove(&priv->controller);
> > + reset_control_assert(priv->rst);
> > +
> > + return 0;
> > +}
> > +
> > +static const struct of_device_id aspeed_peci_of_table[] = {
> > + { .compatible = "aspeed,ast2400-peci", },
> > + { .compatible = "aspeed,ast2500-peci", },
> > + { .compatible = "aspeed,ast2600-peci", },
> > + { }
> > +};
> > +MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
> > +
> > +static struct platform_driver aspeed_peci_driver = {
> > + .probe = aspeed_peci_probe,
> > + .remove = aspeed_peci_remove,
> > + .driver = {
> > + .name = "peci-aspeed",
> > + .of_match_table = aspeed_peci_of_table,
> > + },
> > +};
> > +module_platform_driver(aspeed_peci_driver);
> > +
> > +MODULE_AUTHOR("Ryan Chen <[email protected]>");
> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
> > +MODULE_DESCRIPTION("ASPEED PECI driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI);
> > --
> > 2.31.1
On Thu, Jul 29, 2021 at 09:03:28AM CDT, Winiarska, Iwona wrote:
>On Tue, 2021-07-27 at 08:49 +0000, Zev Weiss wrote:
>> On Mon, Jul 12, 2021 at 05:04:40PM CDT, Iwona Winiarska wrote:
>> > From: Jae Hyun Yoo <[email protected]>
>> >
>> > ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical
>> > interface (a.k.a PECI wire).
>> >
>> > Signed-off-by: Jae Hyun Yoo <[email protected]>
>> > Co-developed-by: Iwona Winiarska <[email protected]>
>> > Signed-off-by: Iwona Winiarska <[email protected]>
>> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
>> > ---
>> > MAINTAINERS                           |   9 +
>> > drivers/peci/Kconfig                  |   6 +
>> > drivers/peci/Makefile                 |   3 +
>> > drivers/peci/controller/Kconfig       |  12 +
>> > drivers/peci/controller/Makefile      |   3 +
>> > drivers/peci/controller/peci-aspeed.c | 501 ++++++++++++++++++++++++++
>> > 6 files changed, 534 insertions(+)
>> > create mode 100644 drivers/peci/controller/Kconfig
>> > create mode 100644 drivers/peci/controller/Makefile
>> > create mode 100644 drivers/peci/controller/peci-aspeed.c
>> >
>> > diff --git a/MAINTAINERS b/MAINTAINERS
>> > index 47411e2b6336..4ba874afa2fa 100644
>> > --- a/MAINTAINERS
>> > +++ b/MAINTAINERS
>> > @@ -2865,6 +2865,15 @@ S:	Maintained
>> > F:	Documentation/hwmon/asc7621.rst
>> > F:	drivers/hwmon/asc7621.c
>> >
>> > +ASPEED PECI CONTROLLER
>> > +M:	Iwona Winiarska <[email protected]>
>> > +M:	Jae Hyun Yoo <[email protected]>
>> > +L:	[email protected] (moderated for non-subscribers)
>> > +L:	[email protected] (moderated for non-subscribers)
>> > +S:	Supported
>> > +F:	Documentation/devicetree/bindings/peci/peci-aspeed.yaml
>> > +F:	drivers/peci/controller/peci-aspeed.c
>> > +
>> > ASPEED PINCTRL DRIVERS
>> > M:	Andrew Jeffery <[email protected]>
>> > L:	[email protected] (moderated for non-subscribers)
>> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
>> > index 601cc3c3c852..0d0ee8009713 100644
>> > --- a/drivers/peci/Kconfig
>> > +++ b/drivers/peci/Kconfig
>> > @@ -12,3 +12,9 @@ menuconfig PECI
>> >
>> > 	  This support is also available as a module. If so, the module
>> > 	  will be called peci.
>> > +
>> > +if PECI
>> > +
>> > +source "drivers/peci/controller/Kconfig"
>> > +
>> > +endif # PECI
>> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
>> > index 2bb2f51bcda7..621a993e306a 100644
>> > --- a/drivers/peci/Makefile
>> > +++ b/drivers/peci/Makefile
>> > @@ -3,3 +3,6 @@
>> > # Core functionality
>> > peci-y := core.o sysfs.o
>> > obj-$(CONFIG_PECI) += peci.o
>> > +
>> > +# Hardware specific bus drivers
>> > +obj-y += controller/
>> > diff --git a/drivers/peci/controller/Kconfig
>> > b/drivers/peci/controller/Kconfig
>> > new file mode 100644
>> > index 000000000000..8ddbe494677f
>> > --- /dev/null
>> > +++ b/drivers/peci/controller/Kconfig
>> > @@ -0,0 +1,12 @@
>> > +# SPDX-License-Identifier: GPL-2.0-only
>> > +
>> > +config PECI_ASPEED
>> > +	tristate "ASPEED PECI support"
>> > +	depends on ARCH_ASPEED || COMPILE_TEST
>> > +	depends on OF
>> > +	depends on HAS_IOMEM
>> > +	help
>> > +	  Enable this driver if you want to support ASPEED PECI controller.
>> > +
>> > +	  This driver can be also build as a module. If so, the module
>> > +	  will be called peci-aspeed.
>> > diff --git a/drivers/peci/controller/Makefile
>> > b/drivers/peci/controller/Makefile
>> > new file mode 100644
>> > index 000000000000..022c28ef1bf0
>> > --- /dev/null
>> > +++ b/drivers/peci/controller/Makefile
>> > @@ -0,0 +1,3 @@
>> > +# SPDX-License-Identifier: GPL-2.0-only
>> > +
>> > +obj-$(CONFIG_PECI_ASPEED)	+= peci-aspeed.o
>> > diff --git a/drivers/peci/controller/peci-aspeed.c
>> > b/drivers/peci/controller/peci-aspeed.c
>> > new file mode 100644
>> > index 000000000000..888b46383ea4
>> > --- /dev/null
>> > +++ b/drivers/peci/controller/peci-aspeed.c
>> > @@ -0,0 +1,501 @@
>> > +// SPDX-License-Identifier: GPL-2.0-only
>> > +// Copyright (C) 2012-2017 ASPEED Technology Inc.
>> > +// Copyright (c) 2018-2021 Intel Corporation
>> > +
>> > +#include <linux/bitfield.h>
>> > +#include <linux/clk.h>
>> > +#include <linux/delay.h>
>> > +#include <linux/interrupt.h>
>> > +#include <linux/io.h>
>> > +#include <linux/iopoll.h>
>> > +#include <linux/jiffies.h>
>> > +#include <linux/module.h>
>> > +#include <linux/of.h>
>> > +#include <linux/peci.h>
>> > +#include <linux/platform_device.h>
>> > +#include <linux/reset.h>
>> > +
>> > +#include <asm/unaligned.h>
>> > +
>> > +/* ASPEED PECI Registers */
>> > +/* Control Register */
>> > +#define ASPEED_PECI_CTRL			0x00
>> > +#define   ASPEED_PECI_CTRL_SAMPLING_MASK	GENMASK(19, 16)
>> > +#define   ASPEED_PECI_CTRL_READ_MODE_MASK	GENMASK(13, 12)
>> > +#define   ASPEED_PECI_CTRL_READ_MODE_COUNT	BIT(12)
>> > +#define   ASPEED_PECI_CTRL_READ_MODE_DBG	BIT(13)
>>
>> Nitpick: might be nice to keep things in a consistent descending order
>> here (13 then 12).
>>
>
>Sure, I'll change it in v2.
>
>> > +#define   ASPEED_PECI_CTRL_CLK_SOURCE_MASK	BIT(11)
>>
>> _MASK suffix seems out of place on this one.
>
>Ack.
>
>>
>> > +#define   ASPEED_PECI_CTRL_CLK_DIV_MASK		GENMASK(10, 8)
>> > +#define   ASPEED_PECI_CTRL_INVERT_OUT		BIT(7)
>> > +#define   ASPEED_PECI_CTRL_INVERT_IN		BIT(6)
>> > +#define   ASPEED_PECI_CTRL_BUS_CONTENT_EN	BIT(5)
>>
>> It *is* already kind of a long macro name, but abbreviating "contention"
>> to "content" seems a bit confusing; I'd suggest keeping the extra three
>> characters (or maybe drop the _EN suffix if you want to avoid making it
>> even longer).
>>
>
>You're right - it'll be renamed properly in v2.
>
>> > +#define   ASPEED_PECI_CTRL_PECI_EN		BIT(4)
>> > +#define   ASPEED_PECI_CTRL_PECI_CLK_EN		BIT(0)
>> > +
>> > +/* Timing Negotiation Register */
>> > +#define ASPEED_PECI_TIMING_NEGOTIATION		0x04
>> > +#define   ASPEED_PECI_TIMING_MESSAGE_MASK	GENMASK(15, 8)
>> > +#define   ASPEED_PECI_TIMING_ADDRESS_MASK	GENMASK(7, 0)
>> > +
>> > +/* Command Register */
>> > +#define ASPEED_PECI_CMD				0x08
>> > +#define   ASPEED_PECI_CMD_PIN_MON		BIT(31)
>> > +#define   ASPEED_PECI_CMD_STS_MASK		GENMASK(27, 24)
>> > +#define     ASPEED_PECI_CMD_STS_ADDR_T_NEGO	0x3
>> > +#define   ASPEED_PECI_CMD_IDLE_MASK		\
>> > +	  (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MON)
>> > +#define   ASPEED_PECI_CMD_FIRE			BIT(0)
>> > +
>> > +/* Read/Write Length Register */
>> > +#define ASPEED_PECI_RW_LENGTH			0x0c
>> > +#define   ASPEED_PECI_AW_FCS_EN			BIT(31)
>> > +#define   ASPEED_PECI_READ_LEN_MASK		GENMASK(23, 16)
>> > +#define   ASPEED_PECI_WRITE_LEN_MASK		GENMASK(15, 8)
>> > +#define   ASPEED_PECI_TAGET_ADDR_MASK		GENMASK(7, 0)
>>
>> s/TAGET/TARGET/
>>
>
>Ack.
>
>> > +
>> > +/* Expected FCS Data Register */
>> > +#define ASPEED_PECI_EXP_FCS			0x10
>> > +#define   ASPEED_PECI_EXP_READ_FCS_MASK		GENMASK(23, 16)
>> > +#define   ASPEED_PECI_EXP_AW_FCS_AUTO_MASK	GENMASK(15, 8)
>> > +#define   ASPEED_PECI_EXP_WRITE_FCS_MASK	GENMASK(7, 0)
>> > +
>> > +/* Captured FCS Data Register */
>> > +#define ASPEED_PECI_CAP_FCS			0x14
>> > +#define   ASPEED_PECI_CAP_READ_FCS_MASK		GENMASK(23, 16)
>> > +#define   ASPEED_PECI_CAP_WRITE_FCS_MASK	GENMASK(7, 0)
>> > +
>> > +/* Interrupt Register */
>> > +#define ASPEED_PECI_INT_CTRL			0x18
>> > +#define   ASPEED_PECI_TIMING_NEGO_SEL_MASK	GENMASK(31, 30)
>> > +#define     ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO	0
>> > +#define     ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO	1
>> > +#define     ASPEED_PECI_MESSAGE_NEGO		2
>> > +#define   ASPEED_PECI_INT_MASK			GENMASK(4, 0)
>> > +#define   ASPEED_PECI_INT_BUS_TIMEOUT		BIT(4)
>> > +#define   ASPEED_PECI_INT_BUS_CONNECT		BIT(3)
>>
>> s/CONNECT/CONTENTION/
>
>Ack.
>
>>
>> > +#define   ASPEED_PECI_INT_W_FCS_BAD		BIT(2)
>> > +#define   ASPEED_PECI_INT_W_FCS_ABORT		BIT(1)
>> > +#define   ASPEED_PECI_INT_CMD_DONE		BIT(0)
>> > +
>> > +/* Interrupt Status Register */
>> > +#define ASPEED_PECI_INT_STS			0x1c
>> > +#define   ASPEED_PECI_INT_TIMING_RESULT_MASK	GENMASK(29, 16)
>> > +	  /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
>> > +
>> > +/* Rx/Tx Data Buffer Registers */
>> > +#define ASPEED_PECI_W_DATA0			0x20
>> > +#define ASPEED_PECI_W_DATA1			0x24
>> > +#define ASPEED_PECI_W_DATA2			0x28
>> > +#define ASPEED_PECI_W_DATA3			0x2c
>> > +#define ASPEED_PECI_R_DATA0			0x30
>> > +#define ASPEED_PECI_R_DATA1			0x34
>> > +#define ASPEED_PECI_R_DATA2			0x38
>> > +#define ASPEED_PECI_R_DATA3			0x3c
>> > +#define ASPEED_PECI_W_DATA4			0x40
>> > +#define ASPEED_PECI_W_DATA5			0x44
>> > +#define ASPEED_PECI_W_DATA6			0x48
>> > +#define ASPEED_PECI_W_DATA7			0x4c
>> > +#define ASPEED_PECI_R_DATA4			0x50
>> > +#define ASPEED_PECI_R_DATA5			0x54
>> > +#define ASPEED_PECI_R_DATA6			0x58
>> > +#define ASPEED_PECI_R_DATA7			0x5c
>> > +#define   ASPEED_PECI_DATA_BUF_SIZE_MAX		32
>> > +
>> > +/* Timing Negotiation */
>> > +#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT	8
>> > +#define ASPEED_PECI_RD_SAMPLING_POINT_MAX	(BIT(4) - 1)
>> > +#define ASPEED_PECI_CLK_DIV_DEFAULT		0
>> > +#define ASPEED_PECI_CLK_DIV_MAX			(BIT(3) - 1)
>> > +#define ASPEED_PECI_MSG_TIMING_DEFAULT		1
>> > +#define ASPEED_PECI_MSG_TIMING_MAX		(BIT(8) - 1)
>> > +#define ASPEED_PECI_ADDR_TIMING_DEFAULT		1
>> > +#define ASPEED_PECI_ADDR_TIMING_MAX		(BIT(8) - 1)
>> > +
>> > +/* Timeout */
>> > +#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US	(50 * USEC_PER_MSEC)
>> > +#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US	(10 * USEC_PER_MSEC)
>> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT	(1000)
>> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX		(1000)
>> > +
>> > +struct aspeed_peci {
>> > +	struct peci_controller controller;
>> > +	struct device *dev;
>> > +	void __iomem *base;
>> > +	struct clk *clk;
>> > +	struct reset_control *rst;
>> > +	int irq;
>> > +	spinlock_t lock; /* to sync completion status handling */
>> > +	struct completion xfer_complete;
>> > +	u32 status;
>> > +	u32 cmd_timeout_ms;
>> > +	u32 msg_timing;
>> > +	u32 addr_timing;
>> > +	u32 rd_sampling_point;
>> > +	u32 clk_div;
>> > +};
>> > +
>> > +static inline struct aspeed_peci *to_aspeed_peci(struct peci_controller *a)
>> > +{
>> > +	return container_of(a, struct aspeed_peci, controller);
>> > +}
>> > +
>> > +static void aspeed_peci_init_regs(struct aspeed_peci *priv)
>> > +{
>> > +	u32 val;
>> > +
>> > +	val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK,
>> > ASPEED_PECI_CLK_DIV_DEFAULT);
>> > +	val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
>> > +	writel(val, priv->base + ASPEED_PECI_CTRL);
>> > +	/*
>> > +	 * Timing negotiation period setting.
>> > +	 * The unit of the programmed value is 4 times of PECI clock period.
>> > +	 */
>> > +	val = FIELD_PREP(ASPEED_PECI_TIMING_MESSAGE_MASK, priv->msg_timing);
>> > +	val |= FIELD_PREP(ASPEED_PECI_TIMING_ADDRESS_MASK, priv-
>> > >addr_timing);
>> > +	writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
>> > +
>> > +	/* Clear interrupts */
>> > +	val = readl(priv->base + ASPEED_PECI_INT_STS) |
>> > ASPEED_PECI_INT_MASK;
>>
>> This should be & instead of |, I'm guessing?
>>
>
>I believe the idea is to unconditionally clear all known interrupt status bits,
>(irrelevant of what value is already set in regs), and the HW expects that this
>is done by writing 1 to corresponding bits.
>
Ah -- I had been thinking we needed to ensure that we were only writing
zeros to reserved or RO bits, but I suppose re-writing whatever bit
pattern they provide on a read is probably okay (I'm having trouble
finding any explicit statement either way in the datasheet I've got).
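
To make the two readings of that line concrete, here is a small user-space
model of a write-1-to-clear status register. It is only a sketch: struct
w1c_reg and the helper names are hypothetical, and it assumes that bits
outside the low five interrupt bits simply ignore writes, which is exactly
the part the datasheet does not spell out.

```c
#include <stdint.h>

#define INT_MASK 0x1fu /* bits[4..0], same field as GENMASK(4, 0) in the patch */

/* Hypothetical model of the interrupt status register. */
struct w1c_reg {
	uint32_t value;
};

static uint32_t w1c_read(const struct w1c_reg *reg)
{
	return reg->value;
}

/*
 * Assumption: only the interrupt bits are write-1-to-clear; writes to any
 * other bit are ignored by the (modelled) hardware.
 */
static void w1c_write(struct w1c_reg *reg, uint32_t val)
{
	reg->value &= ~(val & INT_MASK);
}

/* Form used in the patch: OR in every known interrupt bit, then write back. */
static void clear_all_interrupts(struct w1c_reg *reg)
{
	w1c_write(reg, w1c_read(reg) | INT_MASK);
}

/* The '&' form: acknowledge only the bits that were pending at read time. */
static void clear_pending_interrupts(struct w1c_reg *reg)
{
	w1c_write(reg, w1c_read(reg) & INT_MASK);
}
```

Under that assumption both forms leave the interrupt bits clear; the only
difference is what gets written to the bits outside the mask, which matters
only if the hardware reacts to writes there.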
>> > +	writel(val, priv->base + ASPEED_PECI_INT_STS);
>> > +
>> > +	/* Set timing negotiation mode and enable interrupts */
>> > +	val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK,
>> > ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
>> > +	val |= ASPEED_PECI_INT_MASK;
>> > +	writel(val, priv->base + ASPEED_PECI_INT_CTRL);
>> > +
>> > +	val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv-
>> > >rd_sampling_point);
>> > +	val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
>> > +	val |= ASPEED_PECI_CTRL_PECI_EN;
>> > +	val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
>> > +	writel(val, priv->base + ASPEED_PECI_CTRL);
>> > +}
>> > +
>> > +static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
>> > +{
>> > +	u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
>> > +
>> > +	if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) ==
>> > ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
>> > +		aspeed_peci_init_regs(priv);
>> > +
>> > +	return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
>> > +				  cmd_sts,
>> > +				  !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
>> > +				  ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
>> > +				  ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
>> > +}
>> > +
>> > +static int aspeed_peci_xfer(struct peci_controller *controller,
>> > +			    u8 addr, struct peci_request *req)
>> > +{
>> > +	struct aspeed_peci *priv = to_aspeed_peci(controller);
>> > +	unsigned long flags, timeout = msecs_to_jiffies(priv-
>> > >cmd_timeout_ms);
>> > +	u32 peci_head;
>> > +	int ret;
>> > +
>> > +	if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
>> > +	    req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
>> > +		return -EINVAL;
>> > +
>> > +	/* Check command sts and bus idle state */
>> > +	ret = aspeed_peci_check_idle(priv);
>> > +	if (ret)
>> > +		return ret; /* -ETIMEDOUT */
>> > +
>> > +	spin_lock_irqsave(&priv->lock, flags);
>> > +	reinit_completion(&priv->xfer_complete);
>> > +
>> > +	peci_head = FIELD_PREP(ASPEED_PECI_TAGET_ADDR_MASK, addr) |
>> > +		    FIELD_PREP(ASPEED_PECI_WRITE_LEN_MASK, req->tx.len) |
>> > +		    FIELD_PREP(ASPEED_PECI_READ_LEN_MASK, req->rx.len);
>> > +
>> > +	writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
>> > +
>> > +	memcpy_toio(priv->base + ASPEED_PECI_W_DATA0, req->tx.buf,
>> > +		    req->tx.len > 16 ? 16 : req->tx.len);
>>
>> min(req->tx.len, 16) for the third argument there might be a bit
>> clearer.
>
>Ack.
>
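
For reference, the min() form of that two-window copy might look roughly
like the following in plain user-space C. This is a sketch only: split_copy(),
min_size() and WINDOW_SIZE are hypothetical names, memcpy() stands in for
memcpy_toio(), and the caller is assumed to have already rejected lengths
above 32, as the driver does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE 16 /* each data register window holds 16 bytes */

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Copy len bytes (at most 2 * WINDOW_SIZE) from src into two separate
 * 16-byte windows: the first 16 bytes go to win0, any remainder to win1.
 */
static void split_copy(uint8_t *win0, uint8_t *win1,
		       const uint8_t *src, size_t len)
{
	memcpy(win0, src, min_size(len, WINDOW_SIZE));

	if (len > WINDOW_SIZE)
		memcpy(win1, src + WINDOW_SIZE, len - WINDOW_SIZE);
}
```

The same shape works for the read side with the source and destination
roles swapped.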
>>
>> > +	if (req->tx.len > 16)
>> > +		memcpy_toio(priv->base + ASPEED_PECI_W_DATA4, req->tx.buf +
>> > 16,
>> > +			    req->tx.len - 16);
>> > +
>> > +	dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
>> > +	print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req-
>> > >tx.len);
>> > +
>> > +	priv->status = 0;
>> > +	writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
>> > +	spin_unlock_irqrestore(&priv->lock, flags);
>> > +
>> > +	ret = wait_for_completion_interruptible_timeout(&priv-
>> > >xfer_complete, timeout);
>> > +	if (ret < 0)
>> > +		return ret;
>> > +
>> > +	if (ret == 0) {
>> > +		dev_dbg(priv->dev, "Timeout waiting for a response!\n");
>> > +		return -ETIMEDOUT;
>> > +	}
>> > +
>> > +	spin_lock_irqsave(&priv->lock, flags);
>> > +
>> > +	writel(0, priv->base + ASPEED_PECI_CMD);
>> > +
>> > +	if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
>> > +		spin_unlock_irqrestore(&priv->lock, flags);
>> > +		dev_dbg(priv->dev, "No valid response!\n");
>> > +		return -EIO;
>> > +	}
>> > +
>> > +	spin_unlock_irqrestore(&priv->lock, flags);
>> > +
>> > +	memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_R_DATA0,
>> > +		      req->rx.len > 16 ? 16 : req->rx.len);
>>
>> Likewise, min(req->rx.len, 16) here.
>
>Ack.
>
>>
>> > +	if (req->rx.len > 16)
>> > +		memcpy_fromio(req->rx.buf + 16, priv->base +
>> > ASPEED_PECI_R_DATA4,
>> > +			      req->rx.len - 16);
>> > +
>> > +	print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req-
>> > >rx.len);
>> > +
>> > +	return 0;
>> > +}
>> > +
>> > +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
>> > +{
>> > +	struct aspeed_peci *priv = arg;
>> > +	u32 status;
>> > +
>> > +	spin_lock(&priv->lock);
>> > +	status = readl(priv->base + ASPEED_PECI_INT_STS);
>> > +	writel(status, priv->base + ASPEED_PECI_INT_STS);
>> > +	priv->status |= (status & ASPEED_PECI_INT_MASK);
>> > +
>> > +	/*
>> > +	 * In most cases, interrupt bits will be set one by one but also
>> > note
>> > +	 * that multiple interrupt bits could be set at the same time.
>> > +	 */
>> > +	if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
>> > +		dev_dbg_ratelimited(priv->dev,
>> > "ASPEED_PECI_INT_BUS_TIMEOUT\n");
>> > +
>> > +	if (status & ASPEED_PECI_INT_BUS_CONNECT)
>> > +		dev_dbg_ratelimited(priv->dev,
>> > "ASPEED_PECI_INT_BUS_CONNECT\n");
>>
>> s/CONNECT/CONTENTION/ here too (in the message string).
>
>Ack.
>
>>
>> > +
>> > +	if (status & ASPEED_PECI_INT_W_FCS_BAD)
>> > +		dev_dbg_ratelimited(priv->dev,
>> > "ASPEED_PECI_INT_W_FCS_BAD\n");
>> > +
>> > +	if (status & ASPEED_PECI_INT_W_FCS_ABORT)
>> > +		dev_dbg_ratelimited(priv->dev,
>> > "ASPEED_PECI_INT_W_FCS_ABORT\n");
>>
>> Bus contention can of course arise legitimately, and I suppose an
>> offline host CPU might result in a timeout, so dbg seems fine for those
>> (though as Dan suggests, making some counters available seems like a
>> good idea, especially for contention).  Are the FCS error cases
>> significant enough to warrant something less likely to go unnoticed
>> though?  (e.g. dev_warn_ratelimited() or something?)
>
>It's similar story for FCS errors (can occur legitimately).
>We can hit ASPEED_PECI_INT_W_FCS_BAD in completely valid scenarios, e.g.
>unsuccessful detect during rescan.
>In case of ASPEED_PECI_INT_W_FCS_ABORT - caller can hit this by providing e.g.
>malformed command. Since we do return -EIO in this case, caller can print its
>own log. In other words, it's not always an error condition in peci-aspeed (or
>HW). Moreover, if we ever expose more direct PECI access to userspace (pecidev,
>or something similar) this warn would be user triggerable.
>
>I would prefer to keep this at debug level for now.
>
Okay, I guess that's alright -- counters of some sort (e.g. in sysfs)
would be a nice thing to supplement that with for diagnosing problems
though, I think.
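
Roughly what I have in mind for the counters, as a user-space sketch. The
bit positions follow the ASPEED_PECI_INT_* defines in the patch; struct
peci_irq_stats and peci_irq_stats_update() are hypothetical names, and how
the counts would be exposed (sysfs, debugfs, ...) is left open here.

```c
#include <stdint.h>

/* Interrupt bit positions, matching the ASPEED_PECI_INT_* defines. */
enum peci_irq_bit {
	PECI_IRQ_CMD_DONE = 0,
	PECI_IRQ_W_FCS_ABORT = 1,
	PECI_IRQ_W_FCS_BAD = 2,
	PECI_IRQ_BUS_CONTENTION = 3,
	PECI_IRQ_BUS_TIMEOUT = 4,
	PECI_IRQ_NUM_BITS
};

/* Hypothetical per-controller statistics, one counter per interrupt cause. */
struct peci_irq_stats {
	uint64_t count[PECI_IRQ_NUM_BITS];
};

/*
 * Bump one counter for every cause set in a status snapshot. In the driver
 * this would run in the IRQ handler, under the existing spinlock, next to
 * the dev_dbg_ratelimited() calls.
 */
static void peci_irq_stats_update(struct peci_irq_stats *stats, uint32_t status)
{
	for (unsigned int bit = 0; bit < PECI_IRQ_NUM_BITS; bit++) {
		if (status & (1u << bit))
			stats->count[bit]++;
	}
}
```

That keeps the log quiet while still leaving a trail for diagnosing a bus
that sees frequent contention or FCS failures.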
>>
>> > +
>> > +	/*
>> > +	 * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE
>> > bit
>> > +	 * set even in an error case.
>> > +	 */
>> > +	if (status & ASPEED_PECI_INT_CMD_DONE)
>> > +		complete(&priv->xfer_complete);
>> > +
>> > +	spin_unlock(&priv->lock);
>> > +
>> > +	return IRQ_HANDLED;
>> > +}
>> > +
>> > +static void __sanitize_clock_divider(struct aspeed_peci *priv)
>> > +{
>> > +	u32 clk_div;
>> > +	int ret;
>> > +
>> > +	ret = device_property_read_u32(priv->dev, "clock-divider",
>> > &clk_div);
>> > +	if (ret) {
>> > +		clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
>> > +	} else if (clk_div > ASPEED_PECI_CLK_DIV_MAX) {
>> > +		dev_warn(priv->dev, "Invalid clock-divider: %u, Using
>> > default: %u\n",
>> > +			 clk_div, ASPEED_PECI_CLK_DIV_DEFAULT);
>> > +
>> > +		clk_div = ASPEED_PECI_CLK_DIV_DEFAULT;
>> > +	}
>> > +
>> > +	priv->clk_div = clk_div;
>> > +}
>> > +
>>
>> The naming of these __sanitize_*() functions is a bit inconsistent with
>> the rest of the driver -- though given how similar they all look, could
>> they instead be refactored into a single helper function taking
>> property-name, default-value, and max-value parameters?
>
>You're right - we can have a single helper.
>
>Regarding naming, the idea was to have a simple "inner" helper function to be
>called by the more appropriately named aspeed_peci_device_property_sanitize().
>
>Do you think I should use "aspeed_peci_" prefix in this function name or just
>remove "__" and name it "sanitize_property()"?
>
I just think it's generally nice to have function names give some
indication as to what part of the kernel they belong to -- that way when
they show up in a stack backtrace or a symbol list (e.g. for ftrace
usage and such) it's clearer what they are (and reduces the likelihood
of name collisions and ensuing confusion).
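
As for the single helper, something along these lines is what I was
picturing. It is a user-space sketch: fake_property_read_u32() is a
hypothetical stand-in for device_property_read_u32() (returns 0 on success,
nonzero when the property is absent), aspeed_peci_property_sanitize() is
just a suggested name, and I've added a minimum alongside the maximum so
the cmd-timeout-ms "must not be zero" check fits the same shape.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical property table standing in for the device tree node. */
struct fake_prop {
	const char *name;
	uint32_t value;
};

/* Stand-in for device_property_read_u32(): 0 on success, -1 if absent. */
static int fake_property_read_u32(const struct fake_prop *props, size_t n,
				  const char *name, uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*out = props[i].value;
			return 0;
		}
	}

	return -1;
}

/*
 * Return the property value if present and within [min, max]; otherwise
 * fall back to def, warning only when a value was given but out of range.
 */
static uint32_t aspeed_peci_property_sanitize(const struct fake_prop *props,
					      size_t n, const char *name,
					      uint32_t min, uint32_t max,
					      uint32_t def)
{
	uint32_t val;

	if (fake_property_read_u32(props, n, name, &val))
		return def;

	if (val < min || val > max) {
		fprintf(stderr, "Invalid %s: %u, using default: %u\n",
			name, (unsigned int)val, (unsigned int)def);
		return def;
	}

	return val;
}
```

Each of the five __sanitize_*() functions then collapses to one call with
the property name and its existing _DEFAULT and _MAX constants.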
>>
>> > +static void __sanitize_msg_timing(struct aspeed_peci *priv)
>> > +{
>> > +	u32 msg_timing;
>> > +	int ret;
>> > +
>> > +	ret = device_property_read_u32(priv->dev, "msg-timing",
>> > &msg_timing);
>> > +	if (ret) {
>> > +		msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
>> > +	} else if (msg_timing > ASPEED_PECI_MSG_TIMING_MAX) {
>> > +		dev_warn(priv->dev, "Invalid msg-timing : %u, Use default :
>> > %u\n",
>> > +			 msg_timing, ASPEED_PECI_MSG_TIMING_DEFAULT);
>> > +
>> > +		msg_timing = ASPEED_PECI_MSG_TIMING_DEFAULT;
>> > +	}
>> > +
>> > +	priv->msg_timing = msg_timing;
>> > +}
>> > +
>> > +static void __sanitize_addr_timing(struct aspeed_peci *priv)
>> > +{
>> > +	u32 addr_timing;
>> > +	int ret;
>> > +
>> > +	ret = device_property_read_u32(priv->dev, "addr-timing",
>> > &addr_timing);
>> > +	if (ret) {
>> > +		addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
>> > +	} else if (addr_timing > ASPEED_PECI_ADDR_TIMING_MAX) {
>> > +		dev_warn(priv->dev, "Invalid addr-timing : %u, Use default :
>> > %u\n",
>> > +			 addr_timing, ASPEED_PECI_ADDR_TIMING_DEFAULT);
>> > +
>> > +		addr_timing = ASPEED_PECI_ADDR_TIMING_DEFAULT;
>> > +	}
>> > +
>> > +	priv->addr_timing = addr_timing;
>> > +}
>> > +
>> > +static void __sanitize_rd_sampling_point(struct aspeed_peci *priv)
>> > +{
>> > +	u32 rd_sampling_point;
>> > +	int ret;
>> > +
>> > +	ret = device_property_read_u32(priv->dev, "rd-sampling-point",
>> > &rd_sampling_point);
>> > +	if (ret) {
>> > +		rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
>> > +	} else if (rd_sampling_point > ASPEED_PECI_RD_SAMPLING_POINT_MAX) {
>> > +		dev_warn(priv->dev, "Invalid rd-sampling-point: %u, Use
>> > default : %u\n",
>> > +			 rd_sampling_point,
>> > ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT);
>> > +
>> > +		rd_sampling_point = ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT;
>> > +	}
>> > +
>> > +	priv->rd_sampling_point = rd_sampling_point;
>> > +}
>> > +
>> > +static void __sanitize_cmd_timeout(struct aspeed_peci *priv)
>> > +{
>> > +	u32 timeout;
>> > +	int ret;
>> > +
>> > +	ret = device_property_read_u32(priv->dev, "cmd-timeout-ms",
>> > &timeout);
>> > +	if (ret) {
>> > +		timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
>> > +	} else if (timeout > ASPEED_PECI_CMD_TIMEOUT_MS_MAX || timeout == 0)
>> > {
>> > +		dev_warn(priv->dev, "Invalid cmd-timeout-ms: %u, Use
>> > default: %u\n",
>> > +			 timeout, ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT);
>> > +
>> > +		timeout = ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT;
>> > +	}
>> > +
>> > +	priv->cmd_timeout_ms = timeout;
>> > +}
>> > +
>> > +static void aspeed_peci_device_property_sanitize(struct aspeed_peci *priv)
>> > +{
>> > +	__sanitize_clock_divider(priv);
>> > +	__sanitize_msg_timing(priv);
>> > +	__sanitize_addr_timing(priv);
>> > +	__sanitize_rd_sampling_point(priv);
>> > +	__sanitize_cmd_timeout(priv);
>> > +}
>> > +
>> > +static void aspeed_peci_disable_clk(void *data)
>> > +{
>> > +	clk_disable_unprepare(data);
>> > +}
>> > +
>> > +static int aspeed_peci_init_ctrl(struct aspeed_peci *priv)
>> > +{
>> > +	int ret;
>> > +
>> > +	priv->clk = devm_clk_get(priv->dev, NULL);
>> > +	if (IS_ERR(priv->clk))
>> > +		return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "Failed
>> > to get clk source\n");
>> > +
>> > +	ret = clk_prepare_enable(priv->clk);
>> > +	if (ret) {
>> > +		dev_err(priv->dev, "Failed to enable clock\n");
>> > +		return ret;
>> > +	}
>> > +
>> > +	ret = devm_add_action_or_reset(priv->dev, aspeed_peci_disable_clk,
>> > priv->clk);
>> > +	if (ret)
>> > +		return ret;
>> > +
>> > +	aspeed_peci_device_property_sanitize(priv);
>> > +
>> > +	aspeed_peci_init_regs(priv);
>> > +
>> > +	return 0;
>> > +}
>> > +
>> > +static int aspeed_peci_probe(struct platform_device *pdev)
>> > +{
>> > +	struct aspeed_peci *priv;
>> > +	int ret;
>> > +
>> > +	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
>> > +	if (!priv)
>> > +		return -ENOMEM;
>> > +
>> > +	priv->dev = &pdev->dev;
>> > +	dev_set_drvdata(priv->dev, priv);
>> > +
>> > +	priv->base = devm_platform_ioremap_resource(pdev, 0);
>> > +	if (IS_ERR(priv->base))
>> > +		return PTR_ERR(priv->base);
>> > +
>> > +	priv->irq = platform_get_irq(pdev, 0);
>> > +	if (!priv->irq)
>> > +		return priv->irq;
>> > +
>> > +	ret = devm_request_irq(&pdev->dev, priv->irq,
>> > aspeed_peci_irq_handler,
>> > +			       0, "peci-aspeed-irq", priv);
>>
>> Might as well drop the "-irq" suffix here?  (Seems a bit redundant, and
>> a quick glance through /proc/interrupts on the systems I have at hand
>> doesn't show anything else following that convention.)
>
>I'll remove it.
>
>Thank you
>-Iwona
>
>>
>> > +	if (ret)
>> > +		return ret;
>> > +
>> > +	init_completion(&priv->xfer_complete);
>> > +	spin_lock_init(&priv->lock);
>> > +
>> > +	priv->controller.xfer = aspeed_peci_xfer;
>> > +
>> > +	priv->rst = devm_reset_control_get(&pdev->dev, NULL);
>> > +	if (IS_ERR(priv->rst)) {
>> > +		dev_err(&pdev->dev, "Missing or invalid reset controller
>> > entry\n");
>> > +		return PTR_ERR(priv->rst);
>> > +	}
>> > +	reset_control_deassert(priv->rst);
>> > +
>> > +	ret = aspeed_peci_init_ctrl(priv);
>> > +	if (ret)
>> > +		return ret;
>> > +
>> > +	return peci_controller_add(&priv->controller, priv->dev);
>> > +}
>> > +
>> > +static int aspeed_peci_remove(struct platform_device *pdev)
>> > +{
>> > +	struct aspeed_peci *priv = dev_get_drvdata(&pdev->dev);
>> > +
>> > +	peci_controller_remove(&priv->controller);
>> > +	reset_control_assert(priv->rst);
>> > +
>> > +	return 0;
>> > +}
>> > +
>> > +static const struct of_device_id aspeed_peci_of_table[] = {
>> > +	{ .compatible = "aspeed,ast2400-peci", },
>> > +	{ .compatible = "aspeed,ast2500-peci", },
>> > +	{ .compatible = "aspeed,ast2600-peci", },
>> > +	{ }
>> > +};
>> > +MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
>> > +
>> > +static struct platform_driver aspeed_peci_driver = {
>> > +	.probe  = aspeed_peci_probe,
>> > +	.remove = aspeed_peci_remove,
>> > +	.driver = {
>> > +		.name           = "peci-aspeed",
>> > +		.of_match_table = aspeed_peci_of_table,
>> > +	},
>> > +};
>> > +module_platform_driver(aspeed_peci_driver);
>> > +
>> > +MODULE_AUTHOR("Ryan Chen <[email protected]>");
>> > +MODULE_AUTHOR("Jae Hyun Yoo <[email protected]>");
>> > +MODULE_DESCRIPTION("ASPEED PECI driver");
>> > +MODULE_LICENSE("GPL");
>> > +MODULE_IMPORT_NS(PECI);
>> > --
>> > 2.31.1
>
On Tue, 2021-07-27 at 17:49 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:41PM CDT, Iwona Winiarska wrote:
> > Since PECI devices are discoverable, we can dynamically detect devices
> > that are actually available in the system.
> >
> > This change complements the earlier implementation by rescanning PECI
> > bus to detect available devices. For this purpose, it also introduces the
> > minimal API for PECI requests.
> >
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > drivers/peci/Makefile | 2 +-
> > drivers/peci/core.c | 13 ++++-
> > drivers/peci/device.c | 111 ++++++++++++++++++++++++++++++++++++++++
> > drivers/peci/internal.h | 15 ++++++
> > drivers/peci/request.c | 74 +++++++++++++++++++++++++++
> > drivers/peci/sysfs.c | 34 ++++++++++++
> > 6 files changed, 246 insertions(+), 3 deletions(-)
> > create mode 100644 drivers/peci/device.c
> > create mode 100644 drivers/peci/request.c
> >
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > index 621a993e306a..917f689e147a 100644
> > --- a/drivers/peci/Makefile
> > +++ b/drivers/peci/Makefile
> > @@ -1,7 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> >
> > # Core functionality
> > -peci-y := core.o sysfs.o
> > +peci-y := core.o request.o device.o sysfs.o
> > obj-$(CONFIG_PECI) += peci.o
> >
> > # Hardware specific bus drivers
> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> > index 0ad00110459d..ae7a9572cdf3 100644
> > --- a/drivers/peci/core.c
> > +++ b/drivers/peci/core.c
> > @@ -31,7 +31,15 @@ struct device_type peci_controller_type = {
> >
> > int peci_controller_scan_devices(struct peci_controller *controller)
> > {
> > - /* Just a stub, no support for actual devices yet */
> > + int ret;
> > + u8 addr;
> > +
> > + for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR +
> > PECI_DEVICE_NUM_MAX; addr++) {
> > + ret = peci_device_create(controller, addr);
> > + if (ret)
> > + return ret;
> > + }
> > +
> > return 0;
> > }
> >
> > @@ -106,7 +114,8 @@ EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
> >
> > static int _unregister(struct device *dev, void *dummy)
> > {
> > - /* Just a stub, no support for actual devices yet */
> > + peci_device_destroy(to_peci_device(dev));
> > +
> > return 0;
> > }
> >
> > diff --git a/drivers/peci/device.c b/drivers/peci/device.c
> > new file mode 100644
> > index 000000000000..1124862211e2
> > --- /dev/null
> > +++ b/drivers/peci/device.c
> > @@ -0,0 +1,111 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/peci.h>
> > +#include <linux/slab.h>
> > +
> > +#include "internal.h"
> > +
> > +static int peci_detect(struct peci_controller *controller, u8 addr)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_request_alloc(NULL, 0, 0);
> > + if (!req)
> > + return -ENOMEM;
> > +
>
> Might be worth a brief comment here noting that an empty request happens
> to be the format of a PECI ping command (and/or change the name of the
> function to peci_ping()).
I'll add a comment:
"We are using the PECI Ping command to detect the presence of PECI devices."
>
> > + mutex_lock(&controller->bus_lock);
> > + ret = controller->xfer(controller, addr, req);
> > + mutex_unlock(&controller->bus_lock);
> > +
> > + peci_request_free(req);
> > +
> > + return ret;
> > +}
> > +
> > +static bool peci_addr_valid(u8 addr)
> > +{
> > + return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR +
> > PECI_DEVICE_NUM_MAX;
> > +}
> > +
> > +static int peci_dev_exists(struct device *dev, void *data)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + u8 *addr = data;
> > +
> > + if (device->addr == *addr)
> > + return -EBUSY;
> > +
> > + return 0;
> > +}
> > +
> > +int peci_device_create(struct peci_controller *controller, u8 addr)
> > +{
> > + struct peci_device *device;
> > + int ret;
> > +
> > + if (WARN_ON(!peci_addr_valid(addr)))
> > + return -EINVAL;
>
> Wondering about the necessity of this check (and the peci_addr_valid()
> function) -- as of the end of this patch series, there's only one caller
> of peci_device_create(), and it's peci_controller_scan_devices() looping
> from PECI_BASE_ADDR to PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX, so
> checking that the address is in that range seems a bit redundant. Do we
> anticipate that we might gain additional callers in the future that
> could run a non-zero risk of passing a bad address?
It's just a sanity check to avoid any surprises if the code changes in the
future.
>
> > +
> > + /* Check if we have already detected this device before. */
> > + ret = device_for_each_child(&controller->dev, &addr,
> > peci_dev_exists);
> > + if (ret)
> > + return 0;
> > +
> > + ret = peci_detect(controller, addr);
> > + if (ret) {
> > + /*
> > + * Device not present or host state doesn't allow successful
> > + * detection at this time.
> > + */
> > + if (ret == -EIO || ret == -ETIMEDOUT)
> > + return 0;
>
> Do we really want to be ignoring EIO here? From a look at
> aspeed_peci_xfer(), it looks like the only path that would produce that
> is the non-timeout, non-CMD_DONE case, which I guess happens on
> contention or FCS errors and such. Should we maybe have some automatic
> (limited) retry loop for cases like those?
Yes, we want to ignore EIO here.
It may be returned when we get "Bad Write FCS" after we try to ping a
non-existent PECI device.
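To make the effect of that decision concrete, here is a minimal userspace sketch of how the return value of the ping is interpreted in peci_device_create(). The enum and the function name classify_detect() are hypothetical and exist only for illustration; the patch itself simply returns 0 for the "absent" case.

```c
#include <errno.h>

/* Hypothetical names, used only to label the three outcomes. */
enum detect_outcome {
	DETECT_PRESENT,	/* ping succeeded, go on and create the device */
	DETECT_ABSENT,	/* nothing answered, skip this address quietly */
	DETECT_ERROR,	/* anything else is reported to the caller */
};

/*
 * Mirrors the check in the patch: -EIO (e.g. "Bad Write FCS" when
 * pinging an address with no device behind it) and -ETIMEDOUT both
 * mean "no device here right now", not a scan failure.
 */
static enum detect_outcome classify_detect(int ret)
{
	if (ret == 0)
		return DETECT_PRESENT;
	if (ret == -EIO || ret == -ETIMEDOUT)
		return DETECT_ABSENT;
	return DETECT_ERROR;
}
```

With this reading, a scan over all eight addresses only aborts on errors such as -ENOMEM, while empty sockets are skipped.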
>
> > +
> > + return ret;
> > + }
> > +
> > + device = kzalloc(sizeof(*device), GFP_KERNEL);
> > + if (!device)
> > + return -ENOMEM;
> > +
> > + device->controller = controller;
> > + device->addr = addr;
> > + device->dev.parent = &device->controller->dev;
> > + device->dev.bus = &peci_bus_type;
> > + device->dev.type = &peci_device_type;
> > +
> > + ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device-
> > >addr);
> > + if (ret)
> > + goto err_free;
> > +
> > + ret = device_register(&device->dev);
> > + if (ret)
> > + goto err_put;
> > +
> > + return 0;
> > +
> > +err_put:
> > + put_device(&device->dev);
> > +err_free:
> > + kfree(device);
> > +
> > + return ret;
> > +}
> > +
> > +void peci_device_destroy(struct peci_device *device)
> > +{
> > + device_unregister(&device->dev);
> > +}
> > +
> > +static void peci_device_release(struct device *dev)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > +
> > + kfree(device);
> > +}
> > +
> > +struct device_type peci_device_type = {
> > + .groups = peci_device_groups,
> > + .release = peci_device_release,
> > +};
> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> > index 80c61bcdfc6b..6b139adaf6b8 100644
> > --- a/drivers/peci/internal.h
> > +++ b/drivers/peci/internal.h
> > @@ -9,6 +9,21 @@
> >
> > struct peci_controller;
> > struct attribute_group;
> > +struct peci_device;
> > +struct peci_request;
> > +
> > +/* PECI CPU address range 0x30-0x37 */
> > +#define PECI_BASE_ADDR 0x30
> > +#define PECI_DEVICE_NUM_MAX 8
> > +
> > +struct peci_request *peci_request_alloc(struct peci_device *device, u8
> > tx_len, u8 rx_len);
> > +void peci_request_free(struct peci_request *req);
> > +
> > +extern struct device_type peci_device_type;
> > +extern const struct attribute_group *peci_device_groups[];
> > +
> > +int peci_device_create(struct peci_controller *controller, u8 addr);
> > +void peci_device_destroy(struct peci_device *device);
> >
> > extern struct bus_type peci_bus_type;
> > extern const struct attribute_group *peci_bus_groups[];
> > diff --git a/drivers/peci/request.c b/drivers/peci/request.c
> > new file mode 100644
> > index 000000000000..78cee51dfae1
> > --- /dev/null
> > +++ b/drivers/peci/request.c
> > @@ -0,0 +1,74 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2021 Intel Corporation
> > +
> > +#include <linux/export.h>
> > +#include <linux/peci.h>
> > +#include <linux/slab.h>
> > +#include <linux/types.h>
> > +
> > +#include "internal.h"
> > +
> > +/**
> > + * peci_request_alloc() - allocate &struct peci_request with buffers with
> > given lengths
> > + * @device: PECI device to which request is going to be sent
> > + * @tx_len: requested TX buffer length
> > + * @rx_len: requested RX buffer length
> > + *
> > + * Return: A pointer to a newly allocated &struct peci_request on success
> > or NULL otherwise.
> > + */
> > +struct peci_request *peci_request_alloc(struct peci_device *device, u8
> > tx_len, u8 rx_len)
> > +{
> > + struct peci_request *req;
> > + u8 *tx_buf, *rx_buf;
> > +
> > + req = kzalloc(sizeof(*req), GFP_KERNEL);
> > + if (!req)
> > + return NULL;
> > +
> > + req->device = device;
> > +
> > + /*
> > + * PECI controllers that we are using now don't support DMA, this
> > + * should be converted to DMA API once support for controllers that
> > do
> > + * allow it is added to avoid an extra copy.
> > + */
> > + if (tx_len) {
> > + tx_buf = kzalloc(tx_len, GFP_KERNEL);
> > + if (!tx_buf)
> > + goto err_free_req;
> > +
> > + req->tx.buf = tx_buf;
> > + req->tx.len = tx_len;
> > + }
> > +
> > + if (rx_len) {
> > + rx_buf = kzalloc(rx_len, GFP_KERNEL);
> > + if (!rx_buf)
> > + goto err_free_tx;
> > +
> > + req->rx.buf = rx_buf;
> > + req->rx.len = rx_len;
> > + }
> > +
>
> As long as we're punting on DMA support, could we do the whole thing in
> a single allocation instead of three? It'd add some pointer arithmetic,
> but would also simplify the error-handling/deallocation paths a bit.
>
> Or, given that the one controller we're currently supporting has a
> hardware limit of 32 bytes per transfer anyway, maybe just inline
> fixed-size rx/tx buffers into struct peci_request and have callers keep
> them on the stack instead of kmalloc()-ing them?
I disagree on the error handling (it's not complicated). However, one argument
for doing a single alloc (or moving the buffers into struct peci_request as
fixed-size arrays) is that a single kzalloc is going to be faster than three.
But I don't expect it to show up in any perf profiles for now (the PECI wire
interface is not a speed demon).
I wanted to avoid defining a max size for TX and RX in peci-core.
Do you have a strong opinion against multiple allocations? If so, I can go with
fixed-size arrays inside struct peci_request.
Thanks
-Iwona
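For reference, the single-allocation variant discussed above could look roughly like the following userspace sketch. It uses calloc() in place of kzalloc(), and the struct layout (struct request, struct xfer_buf, the trailing data[] member) is an assumption made for illustration, not the definition of struct peci_request from the series.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout; the real struct peci_request may differ. */
struct xfer_buf {
	uint8_t *buf;
	uint8_t len;
};

struct request {
	struct xfer_buf tx;
	struct xfer_buf rx;
	uint8_t data[];	/* tx bytes first, rx bytes right after */
};

/* One zeroed allocation holds the header and both buffers. */
static struct request *request_alloc(uint8_t tx_len, uint8_t rx_len)
{
	struct request *req;

	req = calloc(1, sizeof(*req) + tx_len + rx_len);
	if (!req)
		return NULL;

	if (tx_len) {
		req->tx.buf = req->data;
		req->tx.len = tx_len;
	}
	if (rx_len) {
		req->rx.buf = req->data + tx_len;
		req->rx.len = rx_len;
	}

	return req;
}

/* A single free releases everything, so there is no unwind path. */
static void request_free(struct request *req)
{
	free(req);
}
```

The trade-off is the pointer arithmetic in request_alloc() against the two extra error labels in the three-allocation version. A zero-length request (the ping case) still leaves both buffer pointers NULL.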
>
> > + return req;
> > +
> > +err_free_tx:
> > + kfree(req->tx.buf);
> > +err_free_req:
> > + kfree(req);
> > +
> > + return NULL;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
> > +
> > +/**
> > + * peci_request_free() - free peci_request
> > + * @req: the PECI request to be freed
> > + */
> > +void peci_request_free(struct peci_request *req)
> > +{
> > + kfree(req->rx.buf);
> > + kfree(req->tx.buf);
> > + kfree(req);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
> > diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
> > index 36c5e2a18a92..db9ef05776e3 100644
> > --- a/drivers/peci/sysfs.c
> > +++ b/drivers/peci/sysfs.c
> > @@ -1,6 +1,8 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > // Copyright (c) 2021 Intel Corporation
> >
> > +#include <linux/device.h>
> > +#include <linux/kernel.h>
> > #include <linux/peci.h>
> >
> > #include "internal.h"
> > @@ -46,3 +48,35 @@ const struct attribute_group *peci_bus_groups[] = {
> > &peci_bus_group,
> > NULL
> > };
> > +
> > +static ssize_t remove_store(struct device *dev, struct device_attribute
> > *attr,
> > + const char *buf, size_t count)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + bool res;
> > + int ret;
> > +
> > + ret = kstrtobool(buf, &res);
> > + if (ret)
> > + return ret;
> > +
> > + if (res && device_remove_file_self(dev, attr))
> > + peci_device_destroy(device);
> > +
> > + return count;
> > +}
> > +static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
> > +
> > +static struct attribute *peci_device_attrs[] = {
> > + &dev_attr_remove.attr,
> > + NULL
> > +};
> > +
> > +static const struct attribute_group peci_device_group = {
> > + .attrs = peci_device_attrs,
> > +};
> > +
> > +const struct attribute_group *peci_device_groups[] = {
> > + &peci_device_group,
> > + NULL
> > +};
> > --
> > 2.31.1
On Tue, 2021-07-27 at 20:10 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:42PM CDT, Iwona Winiarska wrote:
> > Here we're adding support for PECI device drivers, which unlike PECI
> > controller drivers are actually able to provide functionalities to
> > userspace.
> >
> > We're also extending peci_request API to allow querying more details
> > about PECI device (e.g. model/family), that's going to be used to find
> > a compatible peci_driver.
> >
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > drivers/peci/Kconfig | 1 +
> > drivers/peci/core.c | 49 +++++++++
> > drivers/peci/device.c | 99 ++++++++++++++++++
> > drivers/peci/internal.h | 75 ++++++++++++++
> > drivers/peci/request.c | 217 ++++++++++++++++++++++++++++++++++++++++
> > include/linux/peci.h | 19 ++++
> > lib/Kconfig | 2 +-
> > 7 files changed, 461 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > index 0d0ee8009713..27c31535843c 100644
> > --- a/drivers/peci/Kconfig
> > +++ b/drivers/peci/Kconfig
> > @@ -2,6 +2,7 @@
> >
> > menuconfig PECI
> > tristate "PECI support"
> > + select GENERIC_LIB_X86
> > help
> > The Platform Environment Control Interface (PECI) is an interface
> > that provides a communication channel to Intel processors and
> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
> > index ae7a9572cdf3..94426b7f2618 100644
> > --- a/drivers/peci/core.c
> > +++ b/drivers/peci/core.c
> > @@ -143,8 +143,57 @@ void peci_controller_remove(struct peci_controller
> > *controller)
> > }
> > EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
> >
> > +static const struct peci_device_id *
> > +peci_bus_match_device_id(const struct peci_device_id *id, struct
> > peci_device *device)
> > +{
> > + while (id->family != 0) {
> > + if (id->family == device->info.family &&
> > + id->model == device->info.model)
> > + return id;
> > + id++;
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +static int peci_bus_device_match(struct device *dev, struct device_driver
> > *drv)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + struct peci_driver *peci_drv = to_peci_driver(drv);
> > +
> > + if (dev->type != &peci_device_type)
> > + return 0;
> > +
> > + if (peci_bus_match_device_id(peci_drv->id_table, device))
> > + return 1;
> > +
> > + return 0;
> > +}
> > +
> > +static int peci_bus_device_probe(struct device *dev)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + struct peci_driver *driver = to_peci_driver(dev->driver);
> > +
> > + return driver->probe(device, peci_bus_match_device_id(driver-
> > >id_table, device));
> > +}
> > +
> > +static int peci_bus_device_remove(struct device *dev)
> > +{
> > + struct peci_device *device = to_peci_device(dev);
> > + struct peci_driver *driver = to_peci_driver(dev->driver);
> > +
> > + if (driver->remove)
> > + driver->remove(device);
> > +
> > + return 0;
> > +}
> > +
> > struct bus_type peci_bus_type = {
> > .name = "peci",
> > + .match = peci_bus_device_match,
> > + .probe = peci_bus_device_probe,
> > + .remove = peci_bus_device_remove,
> > .bus_groups = peci_bus_groups,
> > };
> >
> > diff --git a/drivers/peci/device.c b/drivers/peci/device.c
> > index 1124862211e2..8c4bd1ebbc29 100644
> > --- a/drivers/peci/device.c
> > +++ b/drivers/peci/device.c
> > @@ -1,11 +1,79 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > // Copyright (c) 2018-2021 Intel Corporation
> >
> > +#include <linux/bitfield.h>
> > #include <linux/peci.h>
> > #include <linux/slab.h>
> > +#include <linux/x86/cpu.h>
> >
> > #include "internal.h"
> >
> > +#define REVISION_NUM_MASK GENMASK(15, 8)
> > +static int peci_get_revision(struct peci_device *device, u8 *revision)
> > +{
> > + struct peci_request *req;
> > + u64 dib;
> > +
> > + req = peci_get_dib(device);
> > + if (IS_ERR(req))
> > + return PTR_ERR(req);
> > +
> > + dib = peci_request_data_dib(req);
> > + if (dib == 0) {
> > + peci_request_free(req);
> > + return -EIO;
>
> Any particular reason to check for zero specifically here? It looks
> like that would be a case where the host CPU responds and everything's
> otherwise fine, but the host just happened to send back a bunch of zeros
> for whatever reason -- which may not be a valid PECI revision number,
> but if it sent back a bunch of 0xff bytes instead wouldn't that be
> equally invalid?
An all-zero response is possible (and defined) in certain device states. If
that happens, we don't want to continue adding the device (with "invalid"
revision 0); we just want to return an error.
>
> Also, given that the docs (the ones I have, at least) describe the DIB
> as a collection of individual bytes, dealing with it as a combined u64
> seems a bit confusing to me -- could we just return req->rx.buf[1]
> instead?
GetDIB returns an 8-byte response, which is why we're treating it this way
(similar to other commands). We pull out the whole response and use
FIELD_GET to obtain the data we need.
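The two views of the response describe the same byte. A minimal userspace sketch, assuming the response is read as a little-endian 64-bit value as peci_request_data_dib() does in the patch: bits 15:8 of that value are the second byte of the receive buffer. The helpers load_le64() and dib_revision() are hypothetical stand-ins for get_unaligned_le64() and FIELD_GET().

```c
#include <stdint.h>

/* Same field as REVISION_NUM_MASK, i.e. GENMASK(15, 8), in the patch. */
#define REVISION_NUM_SHIFT	8
#define REVISION_NUM_MASK	(UINT64_C(0xff) << REVISION_NUM_SHIFT)

/* Stand-in for get_unaligned_le64(): byte 0 ends up in bits 7:0. */
static uint64_t load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];

	return v;
}

/* Stand-in for FIELD_GET(REVISION_NUM_MASK, dib). */
static uint8_t dib_revision(const uint8_t rx[8])
{
	uint64_t dib = load_le64(rx);

	return (uint8_t)((dib & REVISION_NUM_MASK) >> REVISION_NUM_SHIFT);
}
```

For any 8-byte buffer rx, dib_revision(rx) equals rx[1], so the choice between the mask-based form and direct indexing is about consistency with the other commands rather than about the result.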
>
> > + }
> > +
> > + *revision = FIELD_GET(REVISION_NUM_MASK, dib);
> > +
> > + peci_request_free(req);
> > +
> > + return 0;
> > +}
> > +
> > +static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_pkg_cfg_readl(device, PECI_PCS_PKG_ID,
> > PECI_PKG_ID_CPU_ID);
> > + if (IS_ERR(req))
> > + return PTR_ERR(req);
> > +
> > + ret = peci_request_status(req);
> > + if (ret)
> > + goto out_req_free;
> > +
> > + *cpu_id = peci_request_data_readl(req);
> > +out_req_free:
>
> As suggested on patch #8, I think it might be cleaner to stack-allocate
> struct peci_request, which would obviate the need for explicit free
> calls in functions like this and hence might simplify it away entirely,
> but if this does remain like this we could just do
>
> if (!ret)
> *cpu_id = peci_request_data_readl(req);
>
> instead of using a goto to skip a single line.
Please see my response on patch 8.
For PECI requests, I would prefer to operate on allocated objects rather than
on local variables.
>
> > + peci_request_free(req);
> > +
> > + return ret;
> > +}
> > +
> > +static int peci_device_info_init(struct peci_device *device)
> > +{
> > + u8 revision;
> > + u32 cpu_id;
> > + int ret;
> > +
> > + ret = peci_get_cpu_id(device, &cpu_id);
> > + if (ret)
> > + return ret;
> > +
> > + device->info.family = x86_family(cpu_id);
> > + device->info.model = x86_model(cpu_id);
> > +
> > + ret = peci_get_revision(device, &revision);
> > + if (ret)
> > + return ret;
> > + device->info.peci_revision = revision;
> > +
> > + device->info.socket_id = device->addr - PECI_BASE_ADDR;
> > +
> > + return 0;
> > +}
> > +
> > static int peci_detect(struct peci_controller *controller, u8 addr)
> > {
> > struct peci_request *req;
> > @@ -75,6 +143,10 @@ int peci_device_create(struct peci_controller
> > *controller, u8 addr)
> > device->dev.bus = &peci_bus_type;
> > device->dev.type = &peci_device_type;
> >
> > + ret = peci_device_info_init(device);
> > + if (ret)
> > + goto err_free;
> > +
> > ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device-
> > >addr);
> > if (ret)
> > goto err_free;
> > @@ -98,6 +170,33 @@ void peci_device_destroy(struct peci_device *device)
> > device_unregister(&device->dev);
> > }
> >
> > +int __peci_driver_register(struct peci_driver *driver, struct module
> > *owner,
> > + const char *mod_name)
> > +{
> > + driver->driver.bus = &peci_bus_type;
> > + driver->driver.owner = owner;
> > + driver->driver.mod_name = mod_name;
> > +
> > + if (!driver->probe) {
> > + pr_err("peci: trying to register driver without probe
> > callback\n");
> > + return -EINVAL;
> > + }
> > +
> > + if (!driver->id_table) {
> > + pr_err("peci: trying to register driver without device id
> > table\n");
> > + return -EINVAL;
> > + }
> > +
> > + return driver_register(&driver->driver);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
> > +
> > +void peci_driver_unregister(struct peci_driver *driver)
> > +{
> > + driver_unregister(&driver->driver);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
> > +
> > static void peci_device_release(struct device *dev)
> > {
> > struct peci_device *device = to_peci_device(dev);
> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> > index 6b139adaf6b8..c891c93e077a 100644
> > --- a/drivers/peci/internal.h
> > +++ b/drivers/peci/internal.h
> > @@ -19,6 +19,34 @@ struct peci_request;
> > struct peci_request *peci_request_alloc(struct peci_device *device, u8
> > tx_len, u8 rx_len);
> > void peci_request_free(struct peci_request *req);
> >
> > +int peci_request_status(struct peci_request *req);
> > +u64 peci_request_data_dib(struct peci_request *req);
> > +
> > +u8 peci_request_data_readb(struct peci_request *req);
> > +u16 peci_request_data_readw(struct peci_request *req);
> > +u32 peci_request_data_readl(struct peci_request *req);
> > +u64 peci_request_data_readq(struct peci_request *req);
> > +
> > +struct peci_request *peci_get_dib(struct peci_device *device);
> > +struct peci_request *peci_get_temp(struct peci_device *device);
> > +
> > +struct peci_request *peci_pkg_cfg_readb(struct peci_device *device, u8
> > index, u16 param);
> > +struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8
> > index, u16 param);
> > +struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8
> > index, u16 param);
> > +struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8
> > index, u16 param);
> > +
> > +/**
> > + * struct peci_device_id - PECI device data to match
> > + * @data: pointer to driver private data specific to device
> > + * @family: device family
> > + * @model: device model
> > + */
> > +struct peci_device_id {
> > + const void *data;
> > + u16 family;
> > + u8 model;
> > +};
> > +
> > extern struct device_type peci_device_type;
> > extern const struct attribute_group *peci_device_groups[];
> >
> > @@ -28,6 +56,53 @@ void peci_device_destroy(struct peci_device *device);
> > extern struct bus_type peci_bus_type;
> > extern const struct attribute_group *peci_bus_groups[];
> >
> > +/**
> > + * struct peci_driver - PECI driver
> > + * @driver: inherit device driver
> > + * @probe: probe callback
> > + * @remove: remove callback
> > + * @id_table: PECI device match table to decide which device to bind
> > + */
> > +struct peci_driver {
> > + struct device_driver driver;
> > + int (*probe)(struct peci_device *device, const struct peci_device_id
> > *id);
> > + void (*remove)(struct peci_device *device);
> > + const struct peci_device_id *id_table;
> > +};
> > +
> > +static inline struct peci_driver *to_peci_driver(struct device_driver *d)
> > +{
> > + return container_of(d, struct peci_driver, driver);
> > +}
> > +
> > +int __peci_driver_register(struct peci_driver *driver, struct module
> > *owner,
> > + const char *mod_name);
> > +/**
> > + * peci_driver_register() - register PECI driver
> > + * @driver: the driver to be registered
> > + * @owner: owner module of the driver being registered
> > + * @mod_name: module name string
> > + *
> > + * PECI drivers that don't need to do anything special in module init
> > should
> > + * use the convenience "module_peci_driver" macro instead
> > + *
> > + * Return: zero on success, else a negative error code.
> > + */
> > +#define peci_driver_register(driver) \
> > + __peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
> > +void peci_driver_unregister(struct peci_driver *driver);
> > +
> > +/**
> > + * module_peci_driver() - Helper macro for registering a modular PECI
> > driver
> > + * @__peci_driver: peci_driver struct
> > + *
> > + * Helper macro for PECI drivers which do not do anything special in module
> > + * init/exit. This eliminates a lot of boilerplate. Each module may only
> > + * use this macro once, and calling it replaces module_init() and
> > module_exit()
> > + */
> > +#define module_peci_driver(__peci_driver) \
> > + module_driver(__peci_driver, peci_driver_register,
> > peci_driver_unregister)
> > +
> > extern struct device_type peci_controller_type;
> >
> > int peci_controller_scan_devices(struct peci_controller *controller);
> > diff --git a/drivers/peci/request.c b/drivers/peci/request.c
> > index 78cee51dfae1..48354455b554 100644
> > --- a/drivers/peci/request.c
> > +++ b/drivers/peci/request.c
> > @@ -1,13 +1,142 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > // Copyright (c) 2021 Intel Corporation
> >
> > +#include <linux/bug.h>
> > #include <linux/export.h>
> > #include <linux/peci.h>
> > #include <linux/slab.h>
> > #include <linux/types.h>
> >
> > +#include <asm/unaligned.h>
> > +
> > #include "internal.h"
> >
> > +#define PECI_GET_DIB_CMD 0xf7
> > +#define PECI_GET_DIB_WR_LEN 1
> > +#define PECI_GET_DIB_RD_LEN 8
> > +
> > +#define PECI_RDPKGCFG_CMD 0xa1
> > +#define PECI_RDPKGCFG_WRITE_LEN 5
> > +#define PECI_RDPKGCFG_READ_LEN_BASE 1
> > +#define PECI_WRPKGCFG_CMD 0xa5
> > +#define PECI_WRPKGCFG_WRITE_LEN_BASE 6
> > +#define PECI_WRPKGCFG_READ_LEN 1
> > +
> > +/* Device Specific Completion Code (CC) Definition */
> > +#define PECI_CC_SUCCESS 0x40
> > +#define PECI_CC_NEED_RETRY 0x80
> > +#define PECI_CC_OUT_OF_RESOURCE 0x81
> > +#define PECI_CC_UNAVAIL_RESOURCE 0x82
> > +#define PECI_CC_INVALID_REQ 0x90
> > +#define PECI_CC_MCA_ERROR 0x91
> > +#define PECI_CC_CATASTROPHIC_MCA_ERROR 0x93
> > +#define PECI_CC_FATAL_MCA_ERROR 0x94
> > +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB 0x98
> > +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR 0x9B
> > +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA 0x9C
> > +
> > +#define PECI_RETRY_BIT BIT(0)
> > +
> > +#define PECI_RETRY_TIMEOUT msecs_to_jiffies(700)
> > +#define PECI_RETRY_INTERVAL_MIN msecs_to_jiffies(1)
> > +#define PECI_RETRY_INTERVAL_MAX msecs_to_jiffies(128)
> > +
> > +static u8 peci_request_data_cc(struct peci_request *req)
> > +{
> > + return req->rx.buf[0];
> > +}
> > +
> > +/**
> > + * peci_request_status() - return -errno based on PECI completion code
> > + * @req: the PECI request that contains response data with completion code
> > + *
> > + * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands
> > we
> > + * don't expect completion code in the response.
> > + *
> > + * Return: -errno
> > + */
> > +int peci_request_status(struct peci_request *req)
> > +{
> > + u8 cc = peci_request_data_cc(req);
> > +
> > + if (cc != PECI_CC_SUCCESS)
> > + dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
> > +
> > + switch (cc) {
> > + case PECI_CC_SUCCESS:
> > + return 0;
> > + case PECI_CC_NEED_RETRY:
> > + case PECI_CC_OUT_OF_RESOURCE:
> > + case PECI_CC_UNAVAIL_RESOURCE:
> > + return -EAGAIN;
> > + case PECI_CC_INVALID_REQ:
> > + return -EINVAL;
> > + case PECI_CC_MCA_ERROR:
> > + case PECI_CC_CATASTROPHIC_MCA_ERROR:
> > + case PECI_CC_FATAL_MCA_ERROR:
> > + case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
> > + case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
> > + case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
> > + return -EIO;
> > + }
> > +
> > + WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
> > +
> > + return -EIO;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
> > +
> > +static int peci_request_xfer(struct peci_request *req)
> > +{
> > + struct peci_device *device = req->device;
> > + struct peci_controller *controller = device->controller;
> > + int ret;
> > +
> > + mutex_lock(&controller->bus_lock);
> > + ret = controller->xfer(controller, device->addr, req);
> > + mutex_unlock(&controller->bus_lock);
> > +
> > + return ret;
> > +}
> > +
> > +static int peci_request_xfer_retry(struct peci_request *req)
> > +{
> > + long wait_interval = PECI_RETRY_INTERVAL_MIN;
> > + struct peci_device *device = req->device;
> > + struct peci_controller *controller = device->controller;
> > + unsigned long start = jiffies;
> > + int ret;
> > +
> > + /* Don't try to use it for ping */
> > + if (WARN_ON(!req->rx.buf))
> > + return 0;
> > +
> > + do {
> > + ret = peci_request_xfer(req);
> > + if (ret) {
> > + dev_dbg(&controller->dev, "xfer error: %d\n", ret);
> > + return ret;
> > + }
> > +
> > + if (peci_request_status(req) != -EAGAIN)
> > + return 0;
> > +
> > + /* Set the retry bit to indicate a retry attempt */
> > + req->tx.buf[1] |= PECI_RETRY_BIT;
> > +
> > + if (schedule_timeout_interruptible(wait_interval))
> > + return -ERESTARTSYS;
> > +
> > + wait_interval *= 2;
> > + if (wait_interval > PECI_RETRY_INTERVAL_MAX)
> > + wait_interval = PECI_RETRY_INTERVAL_MAX;
>
> wait_interval = min(wait_interval * 2, PECI_RETRY_INTERVAL_MAX) ?
Ack.
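As a sanity check of the suggested form, a minimal sketch of the clamped doubling follows. It works in plain milliseconds instead of jiffies to stay self-contained, and next_wait_interval() is a hypothetical helper name; in the kernel, the min() macro would replace the conditional.

```c
/* Bounds from the patch, expressed in milliseconds for this sketch. */
#define RETRY_INTERVAL_MIN_MS	1L
#define RETRY_INTERVAL_MAX_MS	128L

/* Double the interval, but never exceed the maximum. */
static long next_wait_interval(long cur)
{
	long doubled = cur * 2;

	return doubled < RETRY_INTERVAL_MAX_MS ? doubled : RETRY_INTERVAL_MAX_MS;
}
```

Starting from the minimum, the interval runs 1, 2, 4, ... up to 128 and then stays at 128 for every later retry, which is the same sequence the two-statement version in the patch produces.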
>
> > + } while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
> > +
> > + dev_dbg(&controller->dev, "request timed out\n");
> > +
> > + return -ETIMEDOUT;
> > +}
> > +
> > /**
> > * peci_request_alloc() - allocate &struct peci_request with buffers with
> > given lengths
> > * @device: PECI device to which request is going to be sent
> > @@ -72,3 +201,91 @@ void peci_request_free(struct peci_request *req)
> > kfree(req);
> > }
> > EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
> > +
> > +struct peci_request *peci_get_dib(struct peci_device *device)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN,
> > PECI_GET_DIB_RD_LEN);
> > + if (!req)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + req->tx.buf[0] = PECI_GET_DIB_CMD;
> > +
> > + ret = peci_request_xfer(req);
> > + if (ret) {
> > + peci_request_free(req);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return req;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
> > +
> > +static struct peci_request *
> > +__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_request_alloc(device, PECI_RDPKGCFG_WRITE_LEN,
> > + PECI_RDPKGCFG_READ_LEN_BASE + len);
> > + if (!req)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + req->tx.buf[0] = PECI_RDPKGCFG_CMD;
> > + req->tx.buf[1] = 0;
> > + req->tx.buf[2] = index;
> > + put_unaligned_le16(param, &req->tx.buf[3]);
> > +
> > + ret = peci_request_xfer_retry(req);
> > + if (ret) {
> > + peci_request_free(req);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return req;
> > +}
> > +
> > +u8 peci_request_data_readb(struct peci_request *req)
> > +{
> > + return req->rx.buf[1];
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
> > +
> > +u16 peci_request_data_readw(struct peci_request *req)
> > +{
> > + return get_unaligned_le16(&req->rx.buf[1]);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
> > +
> > +u32 peci_request_data_readl(struct peci_request *req)
> > +{
> > + return get_unaligned_le32(&req->rx.buf[1]);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
> > +
> > +u64 peci_request_data_readq(struct peci_request *req)
> > +{
> > + return get_unaligned_le64(&req->rx.buf[1]);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
> > +
> > +u64 peci_request_data_dib(struct peci_request *req)
> > +{
> > + return get_unaligned_le64(&req->rx.buf[0]);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
> > +
> > +#define __read_pkg_config(x, type) \
> > +struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index,
> > u16 param) \
> > +{ \
> > + return __pkg_cfg_read(device, index, param, sizeof(type)); \
> > +} \
>
> Is there a reason for this particular API? I'd think a more natural one
> that would offload a bit of boilerplate from callers would look more like
>
> int peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param, type
> *outp),
>
> returning peci_request_status() and writing the requested data to *outp
> if that status is zero.
We provide a consistent lower-level API for "internal" use (code in
drivers/peci) that operates on requests and gives access to the full request,
including the completion code.
We then wrap that with an "external" API (e.g. include/linux/peci-cpu.h), which
is the "more natural" one: it pulls the necessary data out of requests and
handles errors by converting completion codes to errno values, abstracting away
the PECI-specific details.
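To illustrate the layering described above, here is a minimal userspace sketch.
It is not the code from the series: every mock_* name is hypothetical, the
transfer is faked with canned data, and the completion-code value 0x40 and the
mapping of every other code to -EIO are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed "success" completion code; illustrative only. */
#define MOCK_CC_SUCCESS 0x40

/* Hypothetical stand-in for struct peci_request (RX side only). */
struct mock_request {
	uint8_t rx[5];	/* rx[0] = completion code, rx[1..4] = LE data */
};

/* "Internal" layer: hands back the whole request, completion code included. */
struct mock_request mock_pkg_cfg_readl(uint8_t cc, uint32_t value)
{
	struct mock_request req = { .rx = {
		cc,
		(uint8_t)(value & 0xff),
		(uint8_t)((value >> 8) & 0xff),
		(uint8_t)((value >> 16) & 0xff),
		(uint8_t)((value >> 24) & 0xff),
	} };

	return req;
}

/* "Internal" layer: translate the completion code into an errno value. */
int mock_request_status(const struct mock_request *req)
{
	return req->rx[0] == MOCK_CC_SUCCESS ? 0 : -EIO;
}

/* "Internal" layer: pull the little-endian payload out of the request. */
uint32_t mock_request_data_readl(const struct mock_request *req)
{
	return (uint32_t)req->rx[1] |
	       (uint32_t)req->rx[2] << 8 |
	       (uint32_t)req->rx[3] << 16 |
	       (uint32_t)req->rx[4] << 24;
}

/* "External" layer: errno-style return, data written to *out on success. */
int mock_cpu_read_u32(uint8_t cc, uint32_t value, uint32_t *out)
{
	struct mock_request req = mock_pkg_cfg_readl(cc, value);
	int ret = mock_request_status(&req);

	if (ret)
		return ret;

	*out = mock_request_data_readl(&req);
	return 0;
}
```

A caller of the external layer never sees the request or the completion code;
it only checks the return value and, on zero, uses *out.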
>
> > +EXPORT_SYMBOL_NS_GPL(peci_pkg_cfg_##x, PECI)
> > +
> > +__read_pkg_config(readb, u8);
> > +__read_pkg_config(readw, u16);
> > +__read_pkg_config(readl, u32);
> > +__read_pkg_config(readq, u64);
> > diff --git a/include/linux/peci.h b/include/linux/peci.h
> > index cdf3008321fd..f9f37b874011 100644
> > --- a/include/linux/peci.h
> > +++ b/include/linux/peci.h
> > @@ -9,6 +9,14 @@
> > #include <linux/mutex.h>
> > #include <linux/types.h>
> >
> > +#define PECI_PCS_PKG_ID 0 /* Package Identifier
> > Read */
> > +#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
> > +#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
> > +#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
> > +#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
> > +#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update
> > Revision */
> > +#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
> > +
> > struct peci_request;
> >
> > /**
> > @@ -41,6 +49,11 @@ static inline struct peci_controller
> > *to_peci_controller(void *d)
> > * struct peci_device - PECI device
> > * @dev: device object to register PECI device to the device model
> > * @controller: manages the bus segment hosting this PECI device
> > + * @info: PECI device characteristics
> > + * @info.family: device family
> > + * @info.model: device model
> > + * @info.peci_revision: PECI revision supported by the PECI device
> > + * @info.socket_id: the socket ID represented by the PECI device
> > * @addr: address used on the PECI bus connected to the parent controller
> > *
> > * A peci_device identifies a single device (i.e. CPU) connected to a PECI
> > bus.
> > @@ -50,6 +63,12 @@ static inline struct peci_controller
> > *to_peci_controller(void *d)
> > struct peci_device {
> > struct device dev;
> > struct peci_controller *controller;
> > + struct {
> > + u16 family;
> > + u8 model;
> > + u8 peci_revision;
>
> This field gets set but doesn't seem to end up used anywhere; is it
> useful?
The idea was to have a mechanism that validates the revision number retrieved
via GetDIB against the revision expected by the driver (since it uses commands
that are PECI revision dependent), and warns if there's a mismatch.
It seems I dropped the "validate and warn" part when splitting the series.
Good catch - I'll fix this in v2.
Thanks
-Iwona
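For reference, the "validate and warn" step could look roughly like the
userspace sketch below. The function name is hypothetical, and treating any
inequality as a mismatch is an assumption; the actual v2 check may compare
revisions differently.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: compare the revision reported by the device (from
 * GetDIB) with the one the driver was written against, and warn on mismatch.
 * Assumption: "mismatch" means the two values are not equal.
 */
bool mock_peci_revision_ok(uint8_t device_rev, uint8_t expected_rev)
{
	if (device_rev == expected_rev)
		return true;

	fprintf(stderr,
		"peci: device reports revision 0x%02x, driver expects 0x%02x\n",
		(unsigned int)device_rev, (unsigned int)expected_rev);
	return false;
}
```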
>
> > + u8 socket_id;
> > + } info;
> > u8 addr;
> > };
> >
> > diff --git a/lib/Kconfig b/lib/Kconfig
> > index cc28bc1f2d84..a74e6c0fa75c 100644
> > --- a/lib/Kconfig
> > +++ b/lib/Kconfig
> > @@ -721,5 +721,5 @@ config ASN1_ENCODER
> >
> > config GENERIC_LIB_X86
> > bool
> > - depends on X86
> > + depends on X86 || PECI
> > default n
> > --
> > 2.31.1
On Thu, Jul 29, 2021 at 01:55:19PM CDT, Winiarska, Iwona wrote:
>On Tue, 2021-07-27 at 17:49 +0000, Zev Weiss wrote:
>> On Mon, Jul 12, 2021 at 05:04:41PM CDT, Iwona Winiarska wrote:
>> > Since PECI devices are discoverable, we can dynamically detect devices
>> > that are actually available in the system.
>> >
>> > This change complements the earlier implementation by rescanning PECI
>> > bus to detect available devices. For this purpose, it also introduces the
>> > minimal API for PECI requests.
>> >
>> > Signed-off-by: Iwona Winiarska <[email protected]>
>> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
>> > ---
>> > drivers/peci/Makefile   |   2 +-
>> > drivers/peci/core.c     |  13 ++++-
>> > drivers/peci/device.c   | 111 ++++++++++++++++++++++++++++++++++++++++
>> > drivers/peci/internal.h |  15 ++++++
>> > drivers/peci/request.c  |  74 +++++++++++++++++++++++++++
>> > drivers/peci/sysfs.c    |  34 ++++++++++++
>> > 6 files changed, 246 insertions(+), 3 deletions(-)
>> > create mode 100644 drivers/peci/device.c
>> > create mode 100644 drivers/peci/request.c
>> >
>> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
>> > index 621a993e306a..917f689e147a 100644
>> > --- a/drivers/peci/Makefile
>> > +++ b/drivers/peci/Makefile
>> > @@ -1,7 +1,7 @@
>> > # SPDX-License-Identifier: GPL-2.0-only
>> >
>> > # Core functionality
>> > -peci-y := core.o sysfs.o
>> > +peci-y := core.o request.o device.o sysfs.o
>> > obj-$(CONFIG_PECI) += peci.o
>> >
>> > # Hardware specific bus drivers
>> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
>> > index 0ad00110459d..ae7a9572cdf3 100644
>> > --- a/drivers/peci/core.c
>> > +++ b/drivers/peci/core.c
>> > @@ -31,7 +31,15 @@ struct device_type peci_controller_type = {
>> >
>> > int peci_controller_scan_devices(struct peci_controller *controller)
>> > {
>> > -       /* Just a stub, no support for actual devices yet */
>> > +       int ret;
>> > +       u8 addr;
>> > +
>> > +       for (addr = PECI_BASE_ADDR; addr < PECI_BASE_ADDR +
>> > PECI_DEVICE_NUM_MAX; addr++) {
>> > +               ret = peci_device_create(controller, addr);
>> > +               if (ret)
>> > +                       return ret;
>> > +       }
>> > +
>> >         return 0;
>> > }
>> >
>> > @@ -106,7 +114,8 @@ EXPORT_SYMBOL_NS_GPL(peci_controller_add, PECI);
>> >
>> > static int _unregister(struct device *dev, void *dummy)
>> > {
>> > -       /* Just a stub, no support for actual devices yet */
>> > +       peci_device_destroy(to_peci_device(dev));
>> > +
>> >         return 0;
>> > }
>> >
>> > diff --git a/drivers/peci/device.c b/drivers/peci/device.c
>> > new file mode 100644
>> > index 000000000000..1124862211e2
>> > --- /dev/null
>> > +++ b/drivers/peci/device.c
>> > @@ -0,0 +1,111 @@
>> > +// SPDX-License-Identifier: GPL-2.0-only
>> > +// Copyright (c) 2018-2021 Intel Corporation
>> > +
>> > +#include <linux/peci.h>
>> > +#include <linux/slab.h>
>> > +
>> > +#include "internal.h"
>> > +
>> > +static int peci_detect(struct peci_controller *controller, u8 addr)
>> > +{
>> > +       struct peci_request *req;
>> > +       int ret;
>> > +
>> > +       req = peci_request_alloc(NULL, 0, 0);
>> > +       if (!req)
>> > +               return -ENOMEM;
>> > +
>>
>> Might be worth a brief comment here noting that an empty request happens
>> to be the format of a PECI ping command (and/or change the name of the
>> function to peci_ping()).
>
>I'll add a comment:
>"We are using PECI Ping command to detect presence of PECI devices."
>
Well, what I was more aiming to get at was that to someone not
intimately familiar with the PECI protocol it's not immediately obvious
from the code that it in fact implements a ping (there's no 'msg->cmd =
PECI_CMD_PING' or anything), so I was hoping for something that would
just make that slightly more explicit.
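One way to make that explicit, sketched in userspace C: give the zero-length
request a name. The mock_* types and functions below are hypothetical
stand-ins, not the peci_request API from the patch; the only fact taken from
the discussion is that a ping is a request with empty TX and RX buffers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct peci_request. */
struct mock_buf {
	uint8_t *buf;
	uint8_t len;
};

struct mock_request {
	struct mock_buf tx;
	struct mock_buf rx;
};

void mock_request_free(struct mock_request *req)
{
	if (!req)
		return;
	free(req->rx.buf);
	free(req->tx.buf);
	free(req);
}

struct mock_request *mock_request_alloc(uint8_t tx_len, uint8_t rx_len)
{
	struct mock_request *req = calloc(1, sizeof(*req));

	if (!req)
		return NULL;

	if (tx_len) {
		req->tx.buf = calloc(tx_len, 1);
		if (!req->tx.buf)
			goto err;
		req->tx.len = tx_len;
	}

	if (rx_len) {
		req->rx.buf = calloc(rx_len, 1);
		if (!req->rx.buf)
			goto err;
		req->rx.len = rx_len;
	}

	return req;
err:
	mock_request_free(req);
	return NULL;
}

/* A PECI Ping carries no command byte and no payload: both lengths are 0. */
#define MOCK_PING_TX_LEN 0
#define MOCK_PING_RX_LEN 0

/* Naming the empty request documents the intent at the call site. */
struct mock_request *mock_ping_alloc(void)
{
	return mock_request_alloc(MOCK_PING_TX_LEN, MOCK_PING_RX_LEN);
}
```

With something like this, the detect path reads as "allocate a ping, transfer
it", and the reader does not need to know the wire format to follow it.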
>>
>> > +       mutex_lock(&controller->bus_lock);
>> > +       ret = controller->xfer(controller, addr, req);
>> > +       mutex_unlock(&controller->bus_lock);
>> > +
>> > +       peci_request_free(req);
>> > +
>> > +       return ret;
>> > +}
>> > +
>> > +static bool peci_addr_valid(u8 addr)
>> > +{
>> > +       return addr >= PECI_BASE_ADDR && addr < PECI_BASE_ADDR +
>> > PECI_DEVICE_NUM_MAX;
>> > +}
>> > +
>> > +static int peci_dev_exists(struct device *dev, void *data)
>> > +{
>> > +       struct peci_device *device = to_peci_device(dev);
>> > +       u8 *addr = data;
>> > +
>> > +       if (device->addr == *addr)
>> > +               return -EBUSY;
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +int peci_device_create(struct peci_controller *controller, u8 addr)
>> > +{
>> > +       struct peci_device *device;
>> > +       int ret;
>> > +
>> > +       if (WARN_ON(!peci_addr_valid(addr)))
>> > +               return -EINVAL;
>>
>> Wondering about the necessity of this check (and the peci_addr_valid()
>> function) -- as of the end of this patch series, there's only one caller
>> of peci_device_create(), and it's peci_controller_scan_devices() looping
>> from PECI_BASE_ADDR to PECI_BASE_ADDR + PECI_DEVICE_NUM_MAX, so
>> checking that the address is in that range seems a bit redundant. Do we
>> anticipate that we might gain additional callers in the future that
>> could run a non-zero risk of passing a bad address?
>
>It's just a sanity check to avoid any surprises if the code changes in the
>future.
>
>>
>> > +
>> > +       /* Check if we have already detected this device before. */
>> > +       ret = device_for_each_child(&controller->dev, &addr,
>> > peci_dev_exists);
>> > +       if (ret)
>> > +               return 0;
>> > +
>> > +       ret = peci_detect(controller, addr);
>> > +       if (ret) {
>> > +               /*
>> > +                * Device not present or host state doesn't allow successful
>> > +                * detection at this time.
>> > +                */
>> > +               if (ret == -EIO || ret == -ETIMEDOUT)
>> > +                       return 0;
>>
>> Do we really want to be ignoring EIO here? From a look at
>> aspeed_peci_xfer(), it looks like the only path that would produce that
>> is the non-timeout, non-CMD_DONE case, which I guess happens on
>> contention or FCS errors and such. Should we maybe have some automatic
>> (limited) retry loop for cases like those?
>
>Yes, we want to ignore EIO here.
>It may be returned when we get "Bad Write FCS", after we try to ping non-
>existing PECI device.
>
>>
>> > +
>> > +               return ret;
>> > +       }
>> > +
>> > +       device = kzalloc(sizeof(*device), GFP_KERNEL);
>> > +       if (!device)
>> > +               return -ENOMEM;
>> > +
>> > +       device->controller = controller;
>> > +       device->addr = addr;
>> > +       device->dev.parent = &device->controller->dev;
>> > +       device->dev.bus = &peci_bus_type;
>> > +       device->dev.type = &peci_device_type;
>> > +
>> > +       ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device-
>> > >addr);
>> > +       if (ret)
>> > +               goto err_free;
>> > +
>> > +       ret = device_register(&device->dev);
>> > +       if (ret)
>> > +               goto err_put;
>> > +
>> > +       return 0;
>> > +
>> > +err_put:
>> > +       put_device(&device->dev);
>> > +err_free:
>> > +       kfree(device);
>> > +
>> > +       return ret;
>> > +}
>> > +
>> > +void peci_device_destroy(struct peci_device *device)
>> > +{
>> > +       device_unregister(&device->dev);
>> > +}
>> > +
>> > +static void peci_device_release(struct device *dev)
>> > +{
>> > +       struct peci_device *device = to_peci_device(dev);
>> > +
>> > +       kfree(device);
>> > +}
>> > +
>> > +struct device_type peci_device_type = {
>> > +       .groups         = peci_device_groups,
>> > +       .release        = peci_device_release,
>> > +};
>> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
>> > index 80c61bcdfc6b..6b139adaf6b8 100644
>> > --- a/drivers/peci/internal.h
>> > +++ b/drivers/peci/internal.h
>> > @@ -9,6 +9,21 @@
>> >
>> > struct peci_controller;
>> > struct attribute_group;
>> > +struct peci_device;
>> > +struct peci_request;
>> > +
>> > +/* PECI CPU address range 0x30-0x37 */
>> > +#define PECI_BASE_ADDR         0x30
>> > +#define PECI_DEVICE_NUM_MAX    8
>> > +
>> > +struct peci_request *peci_request_alloc(struct peci_device *device, u8
>> > tx_len, u8 rx_len);
>> > +void peci_request_free(struct peci_request *req);
>> > +
>> > +extern struct device_type peci_device_type;
>> > +extern const struct attribute_group *peci_device_groups[];
>> > +
>> > +int peci_device_create(struct peci_controller *controller, u8 addr);
>> > +void peci_device_destroy(struct peci_device *device);
>> >
>> > extern struct bus_type peci_bus_type;
>> > extern const struct attribute_group *peci_bus_groups[];
>> > diff --git a/drivers/peci/request.c b/drivers/peci/request.c
>> > new file mode 100644
>> > index 000000000000..78cee51dfae1
>> > --- /dev/null
>> > +++ b/drivers/peci/request.c
>> > @@ -0,0 +1,74 @@
>> > +// SPDX-License-Identifier: GPL-2.0-only
>> > +// Copyright (c) 2021 Intel Corporation
>> > +
>> > +#include <linux/export.h>
>> > +#include <linux/peci.h>
>> > +#include <linux/slab.h>
>> > +#include <linux/types.h>
>> > +
>> > +#include "internal.h"
>> > +
>> > +/**
>> > + * peci_request_alloc() - allocate &struct peci_request with buffers with
>> > given lengths
>> > + * @device: PECI device to which request is going to be sent
>> > + * @tx_len: requested TX buffer length
>> > + * @rx_len: requested RX buffer length
>> > + *
>> > + * Return: A pointer to a newly allocated &struct peci_request on success
>> > or NULL otherwise.
>> > + */
>> > +struct peci_request *peci_request_alloc(struct peci_device *device, u8
>> > tx_len, u8 rx_len)
>> > +{
>> > +       struct peci_request *req;
>> > +       u8 *tx_buf, *rx_buf;
>> > +
>> > +       req = kzalloc(sizeof(*req), GFP_KERNEL);
>> > +       if (!req)
>> > +               return NULL;
>> > +
>> > +       req->device = device;
>> > +
>> > +       /*
>> > +        * PECI controllers that we are using now don't support DMA, this
>> > +        * should be converted to DMA API once support for controllers that
>> > do
>> > +        * allow it is added to avoid an extra copy.
>> > +        */
>> > +       if (tx_len) {
>> > +               tx_buf = kzalloc(tx_len, GFP_KERNEL);
>> > +               if (!tx_buf)
>> > +                       goto err_free_req;
>> > +
>> > +               req->tx.buf = tx_buf;
>> > +               req->tx.len = tx_len;
>> > +       }
>> > +
>> > +       if (rx_len) {
>> > +               rx_buf = kzalloc(rx_len, GFP_KERNEL);
>> > +               if (!rx_buf)
>> > +                       goto err_free_tx;
>> > +
>> > +               req->rx.buf = rx_buf;
>> > +               req->rx.len = rx_len;
>> > +       }
>> > +
>>
>> As long as we're punting on DMA support, could we do the whole thing in
>> a single allocation instead of three? It'd add some pointer arithmetic,
>> but would also simplify the error-handling/deallocation paths a bit.
>>
>> Or, given that the one controller we're currently supporting has a
>> hardware limit of 32 bytes per transfer anyway, maybe just inline
>> fixed-size rx/tx buffers into struct peci_request and have callers keep
>> them on the stack instead of kmalloc()-ing them?
>
>I disagree on error handling (it's not complicated) - however, one argument for
>doing a single alloc (or moving the buffers as fixed-size arrays inside struct
>peci_request) is that single kzalloc is going to be faster than 3. But I don't
>expect it to show up on any perf profiles for now (since peci-wire interface is
>not a speed demon).
>
>I wanted to avoid defining max size for TX and RX in peci-core.
>Do you have a strong opinion against multiple alloc? If yes, I can go with
>fixed-size arrays inside struct peci_request.
>
As is it's certainly not terribly complicated in an absolute sense, but
comparatively speaking the cleanup path for a single allocation is still
simpler, no?
Making it more efficient would definitely be a nice benefit too (perhaps
a more significant one) -- in a typical deployment I'd guess this code
path will see roughly socket_count + total_core_count executions per
second? On a big multi-socket system that could end up being a
reasonably large number (>100), so while it may not end up as a major
hot spot in a system-wide profile, it seems like it might be worth
having it do 1/3 as many allocations if it's reasonably easy to do.
(And while I don't think the kernel is generally at fault for this, from
what I've seen of OpenBMC as a whole I think it might benefit from a bit
more overall frugality with CPU cycles.)
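To make the single-allocation idea concrete, here is a userspace sketch using a
flexible array member, with calloc()/free() standing in for kzalloc()/kfree().
The struct layout is hypothetical and simplified (no device pointer); it is
meant only to show how the pointer arithmetic and cleanup would look, not how
the series should implement it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified request: header plus both buffers in one block. */
struct mock_request {
	struct {
		uint8_t *buf;
		uint8_t len;
	} tx, rx;
	uint8_t data[];	/* tx bytes first, rx bytes right after them */
};

struct mock_request *mock_request_alloc(uint8_t tx_len, uint8_t rx_len)
{
	struct mock_request *req;

	/* One zeroed allocation covers the struct and both buffers. */
	req = calloc(1, sizeof(*req) + (size_t)tx_len + (size_t)rx_len);
	if (!req)
		return NULL;

	/* Zero-length buffers keep a NULL pointer, as in the patch. */
	req->tx.buf = tx_len ? req->data : NULL;
	req->tx.len = tx_len;
	req->rx.buf = rx_len ? req->data + tx_len : NULL;
	req->rx.len = rx_len;

	return req;
}

/* The error path and the free path both collapse to a single call. */
void mock_request_free(struct mock_request *req)
{
	free(req);
}
```

Compared with three allocations, there is no partially constructed state to
unwind, so the err_free_tx/err_free_req labels disappear.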
As for a fixed max request size and inlined buffers, I definitely
understand not wanting to put a cap on that in the generic PECI core --
and actually, looking at the peci-npcm code from previous iterations of
the PECI patchset, it looks like the Nuvoton hardware has significantly
larger size limits (127 bytes if I'm reading things right) that might be
a bit bulky for on-stack allocation. So while that's appealing
efficiency-wise and (IMO) aesthetically, perhaps it's not ultimately
really viable.
Hmm, though (thinking out loud) I suppose we could also get down to a
zero-allocation common case by having the driver hold on to a request
struct and reuse it across transfers, given that they're all serialized
by a mutex anyway?
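A rough userspace sketch of that reuse idea follows. Everything here is
hypothetical: the mock_controller type, the mock_request_get() helper, and the
choice to size the buffers from a per-controller limit (32 bytes, the Aspeed
figure mentioned earlier in this thread). Locking is deliberately omitted; the
caller is assumed to hold the lock that already serializes transfers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed per-controller transfer cap (the Aspeed limit quoted above). */
#define MOCK_XFER_MAX 32

struct mock_request {
	uint8_t tx[MOCK_XFER_MAX];
	uint8_t rx[MOCK_XFER_MAX];
	uint8_t tx_len;
	uint8_t rx_len;
};

/* The driver owns one request and recycles it for every transfer. */
struct mock_controller {
	struct mock_request req;
};

/*
 * Hand out the controller's request, reset for a new transfer.
 * Caller must hold the bus lock (not modelled here), so at most one
 * user touches the request at a time.
 */
struct mock_request *mock_request_get(struct mock_controller *ctrl,
				      uint8_t tx_len, uint8_t rx_len)
{
	if (tx_len > MOCK_XFER_MAX || rx_len > MOCK_XFER_MAX)
		return NULL;

	memset(&ctrl->req, 0, sizeof(ctrl->req));
	ctrl->req.tx_len = tx_len;
	ctrl->req.rx_len = rx_len;

	return &ctrl->req;
}
```

The size cap lives with the controller rather than in the generic core, which
sidesteps the concern about hard-coding a maximum there; the cost is that a
request cannot outlive the locked section.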
>Thanks
>-Iwona
>
>>
>> > +       return req;
>> > +
>> > +err_free_tx:
>> > +       kfree(req->tx.buf);
>> > +err_free_req:
>> > +       kfree(req);
>> > +
>> > +       return NULL;
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_alloc, PECI);
>> > +
>> > +/**
>> > + * peci_request_free() - free peci_request
>> > + * @req: the PECI request to be freed
>> > + */
>> > +void peci_request_free(struct peci_request *req)
>> > +{
>> > +       kfree(req->rx.buf);
>> > +       kfree(req->tx.buf);
>> > +       kfree(req);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
>> > diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c
>> > index 36c5e2a18a92..db9ef05776e3 100644
>> > --- a/drivers/peci/sysfs.c
>> > +++ b/drivers/peci/sysfs.c
>> > @@ -1,6 +1,8 @@
>> > // SPDX-License-Identifier: GPL-2.0-only
>> > // Copyright (c) 2021 Intel Corporation
>> >
>> > +#include <linux/device.h>
>> > +#include <linux/kernel.h>
>> > #include <linux/peci.h>
>> >
>> > #include "internal.h"
>> > @@ -46,3 +48,35 @@ const struct attribute_group *peci_bus_groups[] = {
>> >         &peci_bus_group,
>> >         NULL
>> > };
>> > +
>> > +static ssize_t remove_store(struct device *dev, struct device_attribute
>> > *attr,
>> > +                           const char *buf, size_t count)
>> > +{
>> > +       struct peci_device *device = to_peci_device(dev);
>> > +       bool res;
>> > +       int ret;
>> > +
>> > +       ret = kstrtobool(buf, &res);
>> > +       if (ret)
>> > +               return ret;
>> > +
>> > +       if (res && device_remove_file_self(dev, attr))
>> > +               peci_device_destroy(device);
>> > +
>> > +       return count;
>> > +}
>> > +static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);
>> > +
>> > +static struct attribute *peci_device_attrs[] = {
>> > +       &dev_attr_remove.attr,
>> > +       NULL
>> > +};
>> > +
>> > +static const struct attribute_group peci_device_group = {
>> > +       .attrs = peci_device_attrs,
>> > +};
>> > +
>> > +const struct attribute_group *peci_device_groups[] = {
>> > +       &peci_device_group,
>> > +       NULL
>> > +};
>> > --
>> > 2.31.1
>
On Thu, Jul 29, 2021 at 04:17:06PM CDT, Winiarska, Iwona wrote:
>On Tue, 2021-07-27 at 20:10 +0000, Zev Weiss wrote:
>> On Mon, Jul 12, 2021 at 05:04:42PM CDT, Iwona Winiarska wrote:
>> > Here we're adding support for PECI device drivers, which unlike PECI
>> > controller drivers are actually able to provide functionalities to
>> > userspace.
>> >
>> > We're also extending peci_request API to allow querying more details
>> > about PECI device (e.g. model/family), that's going to be used to find
>> > a compatible peci_driver.
>> >
>> > Signed-off-by: Iwona Winiarska <[email protected]>
>> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
>> > ---
>> > drivers/peci/Kconfig    |   1 +
>> > drivers/peci/core.c     |  49 +++++++++
>> > drivers/peci/device.c   |  99 ++++++++++++++++++
>> > drivers/peci/internal.h |  75 ++++++++++++++
>> > drivers/peci/request.c  | 217 ++++++++++++++++++++++++++++++++++++++++
>> > include/linux/peci.h    |  19 ++++
>> > lib/Kconfig             |   2 +-
>> > 7 files changed, 461 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
>> > index 0d0ee8009713..27c31535843c 100644
>> > --- a/drivers/peci/Kconfig
>> > +++ b/drivers/peci/Kconfig
>> > @@ -2,6 +2,7 @@
>> >
>> > menuconfig PECI
>> >         tristate "PECI support"
>> > +       select GENERIC_LIB_X86
>> >         help
>> >           The Platform Environment Control Interface (PECI) is an interface
>> >           that provides a communication channel to Intel processors and
>> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c
>> > index ae7a9572cdf3..94426b7f2618 100644
>> > --- a/drivers/peci/core.c
>> > +++ b/drivers/peci/core.c
>> > @@ -143,8 +143,57 @@ void peci_controller_remove(struct peci_controller
>> > *controller)
>> > }
>> > EXPORT_SYMBOL_NS_GPL(peci_controller_remove, PECI);
>> >
>> > +static const struct peci_device_id *
>> > +peci_bus_match_device_id(const struct peci_device_id *id, struct
>> > peci_device *device)
>> > +{
>> > +       while (id->family != 0) {
>> > +               if (id->family == device->info.family &&
>> > +                   id->model == device->info.model)
>> > +                       return id;
>> > +               id++;
>> > +       }
>> > +
>> > +       return NULL;
>> > +}
>> > +
>> > +static int peci_bus_device_match(struct device *dev, struct device_driver
>> > *drv)
>> > +{
>> > +       struct peci_device *device = to_peci_device(dev);
>> > +       struct peci_driver *peci_drv = to_peci_driver(drv);
>> > +
>> > +       if (dev->type != &peci_device_type)
>> > +               return 0;
>> > +
>> > +       if (peci_bus_match_device_id(peci_drv->id_table, device))
>> > +               return 1;
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +static int peci_bus_device_probe(struct device *dev)
>> > +{
>> > +       struct peci_device *device = to_peci_device(dev);
>> > +       struct peci_driver *driver = to_peci_driver(dev->driver);
>> > +
>> > +       return driver->probe(device, peci_bus_match_device_id(driver-
>> > >id_table, device));
>> > +}
>> > +
>> > +static int peci_bus_device_remove(struct device *dev)
>> > +{
>> > +       struct peci_device *device = to_peci_device(dev);
>> > +       struct peci_driver *driver = to_peci_driver(dev->driver);
>> > +
>> > +       if (driver->remove)
>> > +               driver->remove(device);
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > struct bus_type peci_bus_type = {
>> >         .name           = "peci",
>> > +       .match          = peci_bus_device_match,
>> > +       .probe          = peci_bus_device_probe,
>> > +       .remove         = peci_bus_device_remove,
>> >         .bus_groups     = peci_bus_groups,
>> > };
>> >
>> > diff --git a/drivers/peci/device.c b/drivers/peci/device.c
>> > index 1124862211e2..8c4bd1ebbc29 100644
>> > --- a/drivers/peci/device.c
>> > +++ b/drivers/peci/device.c
>> > @@ -1,11 +1,79 @@
>> > // SPDX-License-Identifier: GPL-2.0-only
>> > // Copyright (c) 2018-2021 Intel Corporation
>> >
>> > +#include <linux/bitfield.h>
>> > #include <linux/peci.h>
>> > #include <linux/slab.h>
>> > +#include <linux/x86/cpu.h>
>> >
>> > #include "internal.h"
>> >
>> > +#define REVISION_NUM_MASK GENMASK(15, 8)
>> > +static int peci_get_revision(struct peci_device *device, u8 *revision)
>> > +{
>> > +       struct peci_request *req;
>> > +       u64 dib;
>> > +
>> > +       req = peci_get_dib(device);
>> > +       if (IS_ERR(req))
>> > +               return PTR_ERR(req);
>> > +
>> > +       dib = peci_request_data_dib(req);
>> > +       if (dib == 0) {
>> > +               peci_request_free(req);
>> > +               return -EIO;
>>
>> Any particular reason to check for zero specifically here? It looks
>> like that would be a case where the host CPU responds and everything's
>> otherwise fine, but the host just happened to send back a bunch of zeros
>> for whatever reason -- which may not be a valid PECI revision number,
>> but if it sent back a bunch of 0xff bytes instead wouldn't that be
>> equally invalid?
>
>The response with all 0's is possible (and defined) in certain device states. If
>that happens - we don't want to continue adding the device (with "invalid"
>revision 0), we just want to return error.
>
Okay, that's reasonable -- maybe worth a brief comment though.
>>
>> Also, given that the docs (the ones I have, at least) describe the DIB
>> as a collection of individual bytes, dealing with it as a combined u64
>> seems a bit confusing to me -- could we just return req->rx.buf[1]
>> instead?
>
>GetDIB returns 8-byte response, which is why we're treating it in this way
>(similar to other commands). We're pulling out the whole response and use
>FIELD_GET to obtain the data we need.
>
Sure -- but since the 8 bytes that GetDIB retrieves are a device info
byte, a revision number byte, and six reserved bytes (at least as of the
documentation I have access to), I'm not sure why we want to pack that
all up into a u64 only to unpack a single byte from it a moment later
with FIELD_GET(), when we've already got it in a convenient
array-of-bytes form (req->rx.buf). I could understand wanting a u64 if
the 8 bytes were all a single value, or if it had sub-fields that
spanned byte boundaries in awkward ways or something, but given the
format of the response data a byte array seems like the most natural way
of dealing with it.
I suppose it facilitates an easy zero check, but that could also be
written as !memchr_inv(req->rx.buf, 0, 8) in the non-u64 case.
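To show that the two approaches pick out the same byte, here is a small
userspace comparison. The helper names are hypothetical, the loop in
mock_dib_is_zero() stands in for the kernel's memchr_inv(), and the manual
shift-and-mask stands in for get_unaligned_le64() plus
FIELD_GET(GENMASK(15, 8), ...) from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_DIB_LEN 8

/* Userspace stand-in for !memchr_inv(buf, 0, 8): true if all bytes are 0. */
bool mock_dib_is_zero(const uint8_t *dib)
{
	for (size_t i = 0; i < MOCK_DIB_LEN; i++)
		if (dib[i])
			return false;

	return true;
}

/* Byte-array route: the revision number is simply byte 1 of the response. */
int mock_dib_revision(const uint8_t *dib, uint8_t *revision)
{
	if (mock_dib_is_zero(dib))
		return -EIO;

	*revision = dib[1];
	return 0;
}

/* u64 route, as in the patch: assemble little-endian, then take bits 15:8. */
uint8_t mock_dib_revision_u64(const uint8_t *dib)
{
	uint64_t v = 0;

	for (size_t i = 0; i < MOCK_DIB_LEN; i++)
		v |= (uint64_t)dib[i] << (8 * i);

	return (uint8_t)((v >> 8) & 0xff);
}
```

Both routes return the same value for any input; the byte-array version just
skips the pack/unpack round trip.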
>>
>> > +       }
>> > +
>> > +       *revision = FIELD_GET(REVISION_NUM_MASK, dib);
>> > +
>> > +       peci_request_free(req);
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +static int peci_get_cpu_id(struct peci_device *device, u32 *cpu_id)
>> > +{
>> > +       struct peci_request *req;
>> > +       int ret;
>> > +
>> > +       req = peci_pkg_cfg_readl(device, PECI_PCS_PKG_ID,
>> > PECI_PKG_ID_CPU_ID);
>> > +       if (IS_ERR(req))
>> > +               return PTR_ERR(req);
>> > +
>> > +       ret = peci_request_status(req);
>> > +       if (ret)
>> > +               goto out_req_free;
>> > +
>> > +       *cpu_id = peci_request_data_readl(req);
>> > +out_req_free:
>>
>> As suggested on patch #8, I think it might be cleaner to stack-allocate
>> struct peci_request, which would obviate the need for explicit free
>> calls in functions like this and hence might simplify it away entirely,
>> but if this does remain like this we could just do
>>
>>         if (!ret)
>>                 *cpu_id = peci_request_data_readl(req);
>>
>> instead of using a goto to skip a single line.
>
>Please, see my response on patch 8.
>
>I would prefer to operate on allocated objects rather than on local variables in
>case of peci requests.
>
>>
>> > +       peci_request_free(req);
>> > +
>> > +       return ret;
>> > +}
>> > +
>> > +static int peci_device_info_init(struct peci_device *device)
>> > +{
>> > +       u8 revision;
>> > +       u32 cpu_id;
>> > +       int ret;
>> > +
>> > +       ret = peci_get_cpu_id(device, &cpu_id);
>> > +       if (ret)
>> > +               return ret;
>> > +
>> > +       device->info.family = x86_family(cpu_id);
>> > +       device->info.model = x86_model(cpu_id);
>> > +
>> > +       ret = peci_get_revision(device, &revision);
>> > +       if (ret)
>> > +               return ret;
>> > +       device->info.peci_revision = revision;
>> > +
>> > +       device->info.socket_id = device->addr - PECI_BASE_ADDR;
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > static int peci_detect(struct peci_controller *controller, u8 addr)
>> > {
>> >         struct peci_request *req;
>> > @@ -75,6 +143,10 @@ int peci_device_create(struct peci_controller
>> > *controller, u8 addr)
>> >         device->dev.bus = &peci_bus_type;
>> >         device->dev.type = &peci_device_type;
>> >
>> > +       ret = peci_device_info_init(device);
>> > +       if (ret)
>> > +               goto err_free;
>> > +
>> >         ret = dev_set_name(&device->dev, "%d-%02x", controller->id, device-
>> > >addr);
>> >         if (ret)
>> >                 goto err_free;
>> > @@ -98,6 +170,33 @@ void peci_device_destroy(struct peci_device *device)
>> >         device_unregister(&device->dev);
>> > }
>> >
>> > +int __peci_driver_register(struct peci_driver *driver, struct module
>> > *owner,
>> > +                          const char *mod_name)
>> > +{
>> > +       driver->driver.bus = &peci_bus_type;
>> > +       driver->driver.owner = owner;
>> > +       driver->driver.mod_name = mod_name;
>> > +
>> > +       if (!driver->probe) {
>> > +               pr_err("peci: trying to register driver without probe
>> > callback\n");
>> > +               return -EINVAL;
>> > +       }
>> > +
>> > +       if (!driver->id_table) {
>> > +               pr_err("peci: trying to register driver without device id
>> > table\n");
>> > +               return -EINVAL;
>> > +       }
>> > +
>> > +       return driver_register(&driver->driver);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(__peci_driver_register, PECI);
>> > +
>> > +void peci_driver_unregister(struct peci_driver *driver)
>> > +{
>> > +       driver_unregister(&driver->driver);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_driver_unregister, PECI);
>> > +
>> > static void peci_device_release(struct device *dev)
>> > {
>> >         struct peci_device *device = to_peci_device(dev);
>> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
>> > index 6b139adaf6b8..c891c93e077a 100644
>> > --- a/drivers/peci/internal.h
>> > +++ b/drivers/peci/internal.h
>> > @@ -19,6 +19,34 @@ struct peci_request;
>> > struct peci_request *peci_request_alloc(struct peci_device *device, u8
>> > tx_len, u8 rx_len);
>> > void peci_request_free(struct peci_request *req);
>> >
>> > +int peci_request_status(struct peci_request *req);
>> > +u64 peci_request_data_dib(struct peci_request *req);
>> > +
>> > +u8 peci_request_data_readb(struct peci_request *req);
>> > +u16 peci_request_data_readw(struct peci_request *req);
>> > +u32 peci_request_data_readl(struct peci_request *req);
>> > +u64 peci_request_data_readq(struct peci_request *req);
>> > +
>> > +struct peci_request *peci_get_dib(struct peci_device *device);
>> > +struct peci_request *peci_get_temp(struct peci_device *device);
>> > +
>> > +struct peci_request *peci_pkg_cfg_readb(struct peci_device *device, u8
>> > index, u16 param);
>> > +struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8
>> > index, u16 param);
>> > +struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8
>> > index, u16 param);
>> > +struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8
>> > index, u16 param);
>> > +
>> > +/**
>> > + * struct peci_device_id - PECI device data to match
>> > + * @data: pointer to driver private data specific to device
>> > + * @family: device family
>> > + * @model: device model
>> > + */
>> > +struct peci_device_id {
>> > +       const void *data;
>> > +       u16 family;
>> > +       u8 model;
>> > +};
>> > +
>> > extern struct device_type peci_device_type;
>> > extern const struct attribute_group *peci_device_groups[];
>> >
>> > @@ -28,6 +56,53 @@ void peci_device_destroy(struct peci_device *device);
>> > extern struct bus_type peci_bus_type;
>> > extern const struct attribute_group *peci_bus_groups[];
>> >
>> > +/**
>> > + * struct peci_driver - PECI driver
>> > + * @driver: inherit device driver
>> > + * @probe: probe callback
>> > + * @remove: remove callback
>> > + * @id_table: PECI device match table to decide which device to bind
>> > + */
>> > +struct peci_driver {
>> > +       struct device_driver driver;
>> > +       int (*probe)(struct peci_device *device, const struct peci_device_id
>> > *id);
>> > +       void (*remove)(struct peci_device *device);
>> > +       const struct peci_device_id *id_table;
>> > +};
>> > +
>> > +static inline struct peci_driver *to_peci_driver(struct device_driver *d)
>> > +{
>> > +       return container_of(d, struct peci_driver, driver);
>> > +}
>> > +
>> > +int __peci_driver_register(struct peci_driver *driver, struct module
>> > *owner,
>> > +                          const char *mod_name);
>> > +/**
>> > + * peci_driver_register() - register PECI driver
>> > + * @driver: the driver to be registered
>> > + * @owner: owner module of the driver being registered
>> > + * @mod_name: module name string
>> > + *
>> > + * PECI drivers that don't need to do anything special in module init should
>> > + * use the convenience "module_peci_driver" macro instead
>> > + *
>> > + * Return: zero on success, else a negative error code.
>> > + */
>> > +#define peci_driver_register(driver) \
>> > +	__peci_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
>> > +void peci_driver_unregister(struct peci_driver *driver);
>> > +
>> > +/**
>> > + * module_peci_driver() - Helper macro for registering a modular PECI driver
>> > + * @__peci_driver: peci_driver struct
>> > + *
>> > + * Helper macro for PECI drivers which do not do anything special in module
>> > + * init/exit. This eliminates a lot of boilerplate. Each module may only
>> > + * use this macro once, and calling it replaces module_init() and module_exit()
>> > + */
>> > +#define module_peci_driver(__peci_driver) \
>> > +	module_driver(__peci_driver, peci_driver_register, peci_driver_unregister)
>> > +
>> > extern struct device_type peci_controller_type;
>> >
>> > int peci_controller_scan_devices(struct peci_controller *controller);
>> > diff --git a/drivers/peci/request.c b/drivers/peci/request.c
>> > index 78cee51dfae1..48354455b554 100644
>> > --- a/drivers/peci/request.c
>> > +++ b/drivers/peci/request.c
>> > @@ -1,13 +1,142 @@
>> > // SPDX-License-Identifier: GPL-2.0-only
>> > // Copyright (c) 2021 Intel Corporation
>> >
>> > +#include <linux/bug.h>
>> > #include <linux/export.h>
>> > #include <linux/peci.h>
>> > #include <linux/slab.h>
>> > #include <linux/types.h>
>> >
>> > +#include <asm/unaligned.h>
>> > +
>> > #include "internal.h"
>> >
>> > +#define PECI_GET_DIB_CMD		0xf7
>> > +#define  PECI_GET_DIB_WR_LEN		1
>> > +#define  PECI_GET_DIB_RD_LEN		8
>> > +
>> > +#define PECI_RDPKGCFG_CMD		0xa1
>> > +#define  PECI_RDPKGCFG_WRITE_LEN	5
>> > +#define  PECI_RDPKGCFG_READ_LEN_BASE	1
>> > +#define PECI_WRPKGCFG_CMD		0xa5
>> > +#define  PECI_WRPKGCFG_WRITE_LEN_BASE	6
>> > +#define  PECI_WRPKGCFG_READ_LEN		1
>> > +
>> > +/* Device Specific Completion Code (CC) Definition */
>> > +#define PECI_CC_SUCCESS				0x40
>> > +#define PECI_CC_NEED_RETRY			0x80
>> > +#define PECI_CC_OUT_OF_RESOURCE			0x81
>> > +#define PECI_CC_UNAVAIL_RESOURCE		0x82
>> > +#define PECI_CC_INVALID_REQ			0x90
>> > +#define PECI_CC_MCA_ERROR			0x91
>> > +#define PECI_CC_CATASTROPHIC_MCA_ERROR		0x93
>> > +#define PECI_CC_FATAL_MCA_ERROR			0x94
>> > +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB		0x98
>> > +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR	0x9B
>> > +#define PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA	0x9C
>> > +
>> > +#define PECI_RETRY_BIT			BIT(0)
>> > +
>> > +#define PECI_RETRY_TIMEOUT		msecs_to_jiffies(700)
>> > +#define PECI_RETRY_INTERVAL_MIN		msecs_to_jiffies(1)
>> > +#define PECI_RETRY_INTERVAL_MAX		msecs_to_jiffies(128)
>> > +
>> > +static u8 peci_request_data_cc(struct peci_request *req)
>> > +{
>> > +	return req->rx.buf[0];
>> > +}
>> > +
>> > +/**
>> > + * peci_request_status() - return -errno based on PECI completion code
>> > + * @req: the PECI request that contains response data with completion code
>> > + *
>> > + * It can't be used for Ping(), GetDIB() and GetTemp() - for those commands we
>> > + * don't expect completion code in the response.
>> > + *
>> > + * Return: -errno
>> > + */
>> > +int peci_request_status(struct peci_request *req)
>> > +{
>> > +	u8 cc = peci_request_data_cc(req);
>> > +
>> > +	if (cc != PECI_CC_SUCCESS)
>> > +		dev_dbg(&req->device->dev, "ret: %#02x\n", cc);
>> > +
>> > +	switch (cc) {
>> > +	case PECI_CC_SUCCESS:
>> > +		return 0;
>> > +	case PECI_CC_NEED_RETRY:
>> > +	case PECI_CC_OUT_OF_RESOURCE:
>> > +	case PECI_CC_UNAVAIL_RESOURCE:
>> > +		return -EAGAIN;
>> > +	case PECI_CC_INVALID_REQ:
>> > +		return -EINVAL;
>> > +	case PECI_CC_MCA_ERROR:
>> > +	case PECI_CC_CATASTROPHIC_MCA_ERROR:
>> > +	case PECI_CC_FATAL_MCA_ERROR:
>> > +	case PECI_CC_PARITY_ERR_GPSB_OR_PMSB:
>> > +	case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_IERR:
>> > +	case PECI_CC_PARITY_ERR_GPSB_OR_PMSB_MCA:
>> > +		return -EIO;
>> > +	}
>> > +
>> > +	WARN_ONCE(1, "Unknown PECI completion code: %#02x\n", cc);
>> > +
>> > +	return -EIO;
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_status, PECI);
>> > +
>> > +static int peci_request_xfer(struct peci_request *req)
>> > +{
>> > +	struct peci_device *device = req->device;
>> > +	struct peci_controller *controller = device->controller;
>> > +	int ret;
>> > +
>> > +	mutex_lock(&controller->bus_lock);
>> > +	ret = controller->xfer(controller, device->addr, req);
>> > +	mutex_unlock(&controller->bus_lock);
>> > +
>> > +	return ret;
>> > +}
>> > +
>> > +static int peci_request_xfer_retry(struct peci_request *req)
>> > +{
>> > +	long wait_interval = PECI_RETRY_INTERVAL_MIN;
>> > +	struct peci_device *device = req->device;
>> > +	struct peci_controller *controller = device->controller;
>> > +	unsigned long start = jiffies;
>> > +	int ret;
>> > +
>> > +	/* Don't try to use it for ping */
>> > +	if (WARN_ON(!req->rx.buf))
>> > +		return 0;
>> > +
>> > +	do {
>> > +		ret = peci_request_xfer(req);
>> > +		if (ret) {
>> > +			dev_dbg(&controller->dev, "xfer error: %d\n", ret);
>> > +			return ret;
>> > +		}
>> > +
>> > +		if (peci_request_status(req) != -EAGAIN)
>> > +			return 0;
>> > +
>> > +		/* Set the retry bit to indicate a retry attempt */
>> > +		req->tx.buf[1] |= PECI_RETRY_BIT;
>> > +
>> > +		if (schedule_timeout_interruptible(wait_interval))
>> > +			return -ERESTARTSYS;
>> > +
>> > +		wait_interval *= 2;
>> > +		if (wait_interval > PECI_RETRY_INTERVAL_MAX)
>> > +			wait_interval = PECI_RETRY_INTERVAL_MAX;
>>
>> wait_interval = min(wait_interval * 2, PECI_RETRY_INTERVAL_MAX) ?
>
>Ack.
>
>>
>> > +	} while (time_before(jiffies, start + PECI_RETRY_TIMEOUT));
>> > +
>> > +	dev_dbg(&controller->dev, "request timed out\n");
>> > +
>> > +	return -ETIMEDOUT;
>> > +}
>> > +
>> > /**
>> >  * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
>> >  * @device: PECI device to which request is going to be sent
>> > @@ -72,3 +201,91 @@ void peci_request_free(struct peci_request *req)
>> > 	kfree(req);
>> > }
>> > EXPORT_SYMBOL_NS_GPL(peci_request_free, PECI);
>> > +
>> > +struct peci_request *peci_get_dib(struct peci_device *device)
>> > +{
>> > +	struct peci_request *req;
>> > +	int ret;
>> > +
>> > +	req = peci_request_alloc(device, PECI_GET_DIB_WR_LEN, PECI_GET_DIB_RD_LEN);
>> > +	if (!req)
>> > +		return ERR_PTR(-ENOMEM);
>> > +
>> > +	req->tx.buf[0] = PECI_GET_DIB_CMD;
>> > +
>> > +	ret = peci_request_xfer(req);
>> > +	if (ret) {
>> > +		peci_request_free(req);
>> > +		return ERR_PTR(ret);
>> > +	}
>> > +
>> > +	return req;
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
>> > +
>> > +static struct peci_request *
>> > +__pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
>> > +{
>> > +	struct peci_request *req;
>> > +	int ret;
>> > +
>> > +	req = peci_request_alloc(device, PECI_RDPKGCFG_WRITE_LEN,
>> > +				 PECI_RDPKGCFG_READ_LEN_BASE + len);
>> > +	if (!req)
>> > +		return ERR_PTR(-ENOMEM);
>> > +
>> > +	req->tx.buf[0] = PECI_RDPKGCFG_CMD;
>> > +	req->tx.buf[1] = 0;
>> > +	req->tx.buf[2] = index;
>> > +	put_unaligned_le16(param, &req->tx.buf[3]);
>> > +
>> > +	ret = peci_request_xfer_retry(req);
>> > +	if (ret) {
>> > +		peci_request_free(req);
>> > +		return ERR_PTR(ret);
>> > +	}
>> > +
>> > +	return req;
>> > +}
>> > +
>> > +u8 peci_request_data_readb(struct peci_request *req)
>> > +{
>> > +	return req->rx.buf[1];
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readb, PECI);
>> > +
>> > +u16 peci_request_data_readw(struct peci_request *req)
>> > +{
>> > +	return get_unaligned_le16(&req->rx.buf[1]);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readw, PECI);
>> > +
>> > +u32 peci_request_data_readl(struct peci_request *req)
>> > +{
>> > +	return get_unaligned_le32(&req->rx.buf[1]);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readl, PECI);
>> > +
>> > +u64 peci_request_data_readq(struct peci_request *req)
>> > +{
>> > +	return get_unaligned_le64(&req->rx.buf[1]);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_readq, PECI);
>> > +
>> > +u64 peci_request_data_dib(struct peci_request *req)
>> > +{
>> > +	return get_unaligned_le64(&req->rx.buf[0]);
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
>> > +
>> > +#define __read_pkg_config(x, type) \
>> > +struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
>> > +{ \
>> > +	return __pkg_cfg_read(device, index, param, sizeof(type)); \
>> > +} \
>>
>> Is there a reason for this particular API? I'd think a more natural one
>> that would offload a bit of boilerplate from callers would look more like
>>
>> int peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param, type *outp),
>>
>> returning peci_request_status() and writing the requested data to *outp
>> if that status is zero.
>
>We provide a consistent lower-level API for "internal" usage (for code in
>drivers/peci), operating on requests and allowing access to full request,
>including completion code, etc.
>Then - we wrap that with "external" API (e.g. include/linux/peci-cpu.h) which is
>the "more natural" one - it pulls out the necessary data from requests, deals
>with error handling in an appropriate way converting completion codes to errno
>values (abstracting away the PECI-specific details).
>
>>
>> > +EXPORT_SYMBOL_NS_GPL(peci_pkg_cfg_##x, PECI)
>> > +
>> > +__read_pkg_config(readb, u8);
>> > +__read_pkg_config(readw, u16);
>> > +__read_pkg_config(readl, u32);
>> > +__read_pkg_config(readq, u64);
>> > diff --git a/include/linux/peci.h b/include/linux/peci.h
>> > index cdf3008321fd..f9f37b874011 100644
>> > --- a/include/linux/peci.h
>> > +++ b/include/linux/peci.h
>> > @@ -9,6 +9,14 @@
>> > #include <linux/mutex.h>
>> > #include <linux/types.h>
>> >
>> > +#define PECI_PCS_PKG_ID			0  /* Package Identifier Read */
>> > +#define  PECI_PKG_ID_CPU_ID		0x0000  /* CPUID Info */
>> > +#define  PECI_PKG_ID_PLATFORM_ID	0x0001  /* Platform ID */
>> > +#define  PECI_PKG_ID_DEVICE_ID		0x0002  /* Uncore Device ID */
>> > +#define  PECI_PKG_ID_MAX_THREAD_ID	0x0003  /* Max Thread ID */
>> > +#define  PECI_PKG_ID_MICROCODE_REV	0x0004  /* CPU Microcode Update Revision */
>> > +#define  PECI_PKG_ID_MCA_ERROR_LOG	0x0005  /* Machine Check Status */
>> > +
>> > struct peci_request;
>> >
>> > /**
>> > @@ -41,6 +49,11 @@ static inline struct peci_controller *to_peci_controller(void *d)
>> >  * struct peci_device - PECI device
>> >  * @dev: device object to register PECI device to the device model
>> >  * @controller: manages the bus segment hosting this PECI device
>> > + * @info: PECI device characteristics
>> > + * @info.family: device family
>> > + * @info.model: device model
>> > + * @info.peci_revision: PECI revision supported by the PECI device
>> > + * @info.socket_id: the socket ID represented by the PECI device
>> >  * @addr: address used on the PECI bus connected to the parent controller
>> >  *
>> >  * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus.
>> > @@ -50,6 +63,12 @@ static inline struct peci_controller *to_peci_controller(void *d)
>> > struct peci_device {
>> > 	struct device dev;
>> > 	struct peci_controller *controller;
>> > +	struct {
>> > +		u16 family;
>> > +		u8 model;
>> > +		u8 peci_revision;
>>
>> This field gets set but doesn't seem to end up used anywhere; is it
>> useful?
>
>The idea was to have a mechanism to validate the revision number retrieved via
>GetDIB against the revision expected by the driver (since it uses commands that
>are PECI revision dependent), and to warn if there's a mismatch.
>It seems I dropped the "validate and warn" part when splitting the series.
>Good catch - I'll fix this in v2.
>
>Thanks
>-Iwona
>
>>
>> > +		u8 socket_id;
>> > +	} info;
>> > 	u8 addr;
>> > };
>> >
>> > diff --git a/lib/Kconfig b/lib/Kconfig
>> > index cc28bc1f2d84..a74e6c0fa75c 100644
>> > --- a/lib/Kconfig
>> > +++ b/lib/Kconfig
>> > @@ -721,5 +721,5 @@ config ASN1_ENCODER
>> >
>> > config GENERIC_LIB_X86
>> > 	bool
>> > -	depends on X86
>> > +	depends on X86 || PECI
>> > 	default n
>> > --
>> > 2.31.1
>
On Thu, 2021-07-29 at 20:50 +0000, Zev Weiss wrote:
> On Thu, Jul 29, 2021 at 01:55:19PM CDT, Winiarska, Iwona wrote:
> > On Tue, 2021-07-27 at 17:49 +0000, Zev Weiss wrote:
> > > On Mon, Jul 12, 2021 at 05:04:41PM CDT, Iwona Winiarska wrote:
> > > >
> > > > +
> > > > +static int peci_detect(struct peci_controller *controller, u8 addr)
> > > > +{
> > > > +	struct peci_request *req;
> > > > +	int ret;
> > > > +
> > > > +	req = peci_request_alloc(NULL, 0, 0);
> > > > +	if (!req)
> > > > +		return -ENOMEM;
> > > > +
> > >
> > > Might be worth a brief comment here noting that an empty request happens
> > > to be the format of a PECI ping command (and/or change the name of the
> > > function to peci_ping()).
> >
> > I'll add a comment:
> > "We are using PECI Ping command to detect presence of PECI devices."
> >
>
> Well, what I was more aiming to get at was that to someone not
> intimately familiar with the PECI protocol it's not immediately obvious
> from the code that it in fact implements a ping (there's no 'msg->cmd =
> PECI_CMD_PING' or anything), so I was hoping for something that would
> just make that slightly more explicit.
/*
* PECI Ping is a command encoded by tx_len = 0, rx_len = 0.
* We expect correct Write FCS if the device at the target address is
* able to respond.
*/
I would like to avoid doing a peci_ping wrapper that doesn't operate on
peci_device - note that at this point we don't have a struct peci_device yet,
we're using ping to figure out whether we should create one.
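To make the encoding concrete, here is a minimal userspace sketch (not the
kernel code). struct demo_request and demo_request_is_ping() are made-up names
used only for illustration; the one thing taken from the comment above is that
a Ping carries neither a TX nor an RX payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct peci_request, for illustration only. */
struct demo_request {
	struct { uint8_t *buf; uint8_t len; } tx;
	struct { uint8_t *buf; uint8_t len; } rx;
};

/* A Ping has no command byte at all: both payload lengths are zero. */
static bool demo_request_is_ping(const struct demo_request *req)
{
	return req->tx.len == 0 && req->rx.len == 0;
}
```

Any request that carries a command byte (GetDIB, for instance, has a 1-byte TX
payload) is therefore never mistaken for a Ping.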
> > > > +
> > > > +/**
> > > > > + * peci_request_alloc() - allocate &struct peci_request with buffers with given lengths
> > > > > + * @device: PECI device to which request is going to be sent
> > > > > + * @tx_len: requested TX buffer length
> > > > > + * @rx_len: requested RX buffer length
> > > > > + *
> > > > > + * Return: A pointer to a newly allocated &struct peci_request on success or NULL otherwise.
> > > > > + */
> > > > > +struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u8 rx_len)
> > > > > +{
> > > > > +	struct peci_request *req;
> > > > > +	u8 *tx_buf, *rx_buf;
> > > > > +
> > > > > +	req = kzalloc(sizeof(*req), GFP_KERNEL);
> > > > > +	if (!req)
> > > > > +		return NULL;
> > > > > +
> > > > > +	req->device = device;
> > > > > +
> > > > > +	/*
> > > > > +	 * PECI controllers that we are using now don't support DMA, this
> > > > > +	 * should be converted to DMA API once support for controllers that do
> > > > > +	 * allow it is added to avoid an extra copy.
> > > > > +	 */
> > > > > +	if (tx_len) {
> > > > > +		tx_buf = kzalloc(tx_len, GFP_KERNEL);
> > > > > +		if (!tx_buf)
> > > > > +			goto err_free_req;
> > > > > +
> > > > > +		req->tx.buf = tx_buf;
> > > > > +		req->tx.len = tx_len;
> > > > > +	}
> > > > > +
> > > > > +	if (rx_len) {
> > > > > +		rx_buf = kzalloc(rx_len, GFP_KERNEL);
> > > > > +		if (!rx_buf)
> > > > > +			goto err_free_tx;
> > > > > +
> > > > > +		req->rx.buf = rx_buf;
> > > > > +		req->rx.len = rx_len;
> > > > > +	}
> > > > +
> > >
> > > As long as we're punting on DMA support, could we do the whole thing in
> > > a single allocation instead of three? It'd add some pointer arithmetic,
> > > but would also simplify the error-handling/deallocation paths a bit.
> > >
> > > Or, given that the one controller we're currently supporting has a
> > > hardware limit of 32 bytes per transfer anyway, maybe just inline
> > > fixed-size rx/tx buffers into struct peci_request and have callers keep
> > > them on the stack instead of kmalloc()-ing them?
> >
> > > I disagree on error handling (it's not complicated) - however, one argument
> > > for doing a single alloc (or moving the buffers as fixed-size arrays inside
> > > struct peci_request) is that single kzalloc is going to be faster than 3.
> > > But I don't expect it to show up on any perf profiles for now (since
> > > peci-wire interface is not a speed demon).
> >
> > I wanted to avoid defining max size for TX and RX in peci-core.
> > Do you have a strong opinion against multiple alloc? If yes, I can go with
> > fixed-size arrays inside struct peci_request.
> >
>
> As is it's certainly not terribly complicated in an absolute sense, but
> comparatively speaking the cleanup path for a single allocation is still
> simpler, no?
>
> Making it more efficient would definitely be a nice benefit too (perhaps
> a more significant one) -- in a typical deployment I'd guess this code
> path will see roughly socket_count + total_core_count executions per
> second? On a big multi-socket system that could end up being a
> reasonably large number (>100), so while it may not end up as a major
> hot spot in a system-wide profile, it seems like it might be worth
> having it do 1/3 as many allocations if it's reasonably easy to do.
> (And while I don't think the kernel is generally at fault for this, from
> what I've seen of OpenBMC as a whole I think it might benefit from a bit
> more overall frugality with CPU cycles.)
>
> As for a fixed max request size and inlined buffers, I definitely
> understand not wanting to put a cap on that in the generic PECI core --
> and actually, looking at the peci-npcm code from previous iterations of
> the PECI patchset, it looks like the Nuvoton hardware has significantly
> larger size limits (127 bytes if I'm reading things right) that might be
> a bit bulky for on-stack allocation. So while that's appealing
> efficiency-wise and (IMO) aesthetically, perhaps it's not ultimately
> really viable.
>
> Hmm, though (thinking out loud) I suppose we could also get down to a
> zero-allocation common case by having the driver hold on to a request
> struct and reuse it across transfers, given that they're all serialized
> by a mutex anyway?
With the "zero-allocation" case we still need some memory to copy the necessary
data from the "request area" (now "global" - per-controller).
After more consideration, I think this doesn't have to rely on controller
capabilities: we can just define a max value based on the commands we're using
and use that with a single alloc (with rx and tx being fixed-size arrays).
I'll change it in v2.
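As a rough illustration of the single-allocation layout (a userspace sketch,
not the v2 code): calloc() stands in for kzalloc(), all demo_* names are
hypothetical, and DEMO_REQUEST_MAX_BUF_SIZE with its value of 32 is a
placeholder, since the real maximum would be derived from the commands in use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Placeholder value, an assumption made for this sketch only. */
#define DEMO_REQUEST_MAX_BUF_SIZE 32

/* Hypothetical request layout with the buffers inlined as fixed-size arrays. */
struct demo_request {
	struct {
		uint8_t buf[DEMO_REQUEST_MAX_BUF_SIZE];
		uint8_t len;
	} rx, tx;
};

/* One allocation covers the request and both buffers. */
static struct demo_request *demo_request_alloc(uint8_t tx_len, uint8_t rx_len)
{
	struct demo_request *req;

	if (tx_len > DEMO_REQUEST_MAX_BUF_SIZE || rx_len > DEMO_REQUEST_MAX_BUF_SIZE)
		return NULL;

	req = calloc(1, sizeof(*req));
	if (!req)
		return NULL;

	req->tx.len = tx_len;
	req->rx.len = rx_len;

	return req;
}

/* The cleanup path collapses to a single free. */
static void demo_request_free(struct demo_request *req)
{
	free(req);
}
```

With this shape there are no err_free_tx/err_free_req labels left to unwind,
at the cost of always reserving the maximum buffer size per request.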
Thank you
-Iwona
>
On Tue, 2021-07-27 at 13:16 +0200, David Müller (ELSOFT AG) wrote:
> Iwona Winiarska wrote:
>
> > +static const struct peci_device_id peci_cpu_device_ids[] = {
> > +	{ /* Haswell Xeon */
> > +		.family = 6,
> > +		.model = INTEL_FAM6_HASWELL_X,
> > +		.data = "hsx",
> > +	},
> > +	{ /* Broadwell Xeon */
> > +		.family = 6,
> > +		.model = INTEL_FAM6_BROADWELL_X,
> > +		.data = "bdx",
> > +	},
> > +	{ /* Broadwell Xeon D */
> > +		.family = 6,
> > +		.model = INTEL_FAM6_BROADWELL_D,
> > +		.data = "skxd",
>
> I think this should read "bdxd" as "skxd" does not exist in the
> cputemp/dimmtemp drivers.
It should be "bdxd" - I'll fix it in v2.
Thank you
-Iwona
On Thu, 2021-07-29 at 23:22 +0000, Zev Weiss wrote:
> On Thu, Jul 29, 2021 at 04:17:06PM CDT, Winiarska, Iwona wrote:
> > On Tue, 2021-07-27 at 20:10 +0000, Zev Weiss wrote:
> > > On Mon, Jul 12, 2021 at 05:04:42PM CDT, Iwona Winiarska wrote:
> > > >
> > > >
> > > > +#define REVISION_NUM_MASK GENMASK(15, 8)
> > > > +static int peci_get_revision(struct peci_device *device, u8 *revision)
> > > > +{
> > > > +	struct peci_request *req;
> > > > +	u64 dib;
> > > > +
> > > > +	req = peci_get_dib(device);
> > > > +	if (IS_ERR(req))
> > > > +		return PTR_ERR(req);
> > > > +
> > > > +	dib = peci_request_data_dib(req);
> > > > +	if (dib == 0) {
> > > > +		peci_request_free(req);
> > > > +		return -EIO;
> > >
> > > Any particular reason to check for zero specifically here? It looks
> > > like that would be a case where the host CPU responds and everything's
> > > otherwise fine, but the host just happened to send back a bunch of zeros
> > > for whatever reason -- which may not be a valid PECI revision number,
> > > but if it sent back a bunch of 0xff bytes instead wouldn't that be
> > > equally invalid?
> >
> > > The response with all 0's is possible (and defined) in certain device
> > > states. If that happens - we don't want to continue adding the device
> > > (with "invalid" revision 0), we just want to return error.
> >
>
> Okay, that's reasonable -- maybe worth a brief comment though.
/*
* PECI device may be in a state where it is unable to return a proper DIB,
* in which case it returns 0 as DIB value.
* Let's treat this as an error to avoid carrying on with the detection using
* invalid revision.
*/
>
> > >
> > > Also, given that the docs (the ones I have, at least) describe the DIB
> > > as a collection of individual bytes, dealing with it as a combined u64
> > > seems a bit confusing to me -- could we just return req->rx.buf[1]
> > > instead?
> >
> > GetDIB returns 8-byte response, which is why we're treating it in this way
> > (similar to other commands). We're pulling out the whole response and use
> > FIELD_GET to obtain the data we need.
> >
>
> Sure -- but since the 8 bytes that GetDIB retrieves are a device info
> byte, a revision number byte, and six reserved bytes (at least as of the
> documentation I have access to), I'm not sure why we want to pack that
> all up into a u64 only to unpack a single byte from it a moment later
> with FIELD_GET(), when we've already got it in a convenient
> array-of-bytes form (req->rx.buf). I could understand wanting a u64 if
> the 8 bytes were all a single value, or if it had sub-fields that
> spanned byte boundaries in awkward ways or something, but given the
> format of the response data a byte array seems like the most natural way
> of dealing with it.
>
> I suppose it facilitates an easy zero check, but that could also be
> written as !memchr_inv(req->rx.buf, 0, 8) in the non-u64 case.
What you suggest would look like this:
static int peci_get_revision(struct peci_device *device, u8 *revision)
{
	struct peci_request *req;

	req = peci_get_dib(device);
	if (IS_ERR(req))
		return PTR_ERR(req);

	if (!memchr_inv(req->rx.buf, 0, PECI_GET_DIB_RD_LEN)) {
		peci_request_free(req);
		return -EIO;
	}

	*revision = req->rx.buf[1];

	peci_request_free(req);

	return 0;
}
Note that the caller (device.c) now needs to know the read length,
PECI_GET_DIB_RD_LEN (which is currently internal to request.c), and digs into
rx.buf directly (rather than using a helper from internal.h).
By forcing the callers to use helper functions, we can keep things consistent
across the various commands and avoid exporting everything to everyone through
a giant header with all the definitions.
I would prefer to keep peci_get_revision() as is.
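For comparison, the helper-based variant can be modelled in userspace roughly
as below. This is only a sketch: the demo_* names are hypothetical,
demo_get_le64() stands in for get_unaligned_le64(), plain mask-and-shift stands
in for FIELD_GET(), and -1 stands in for -EIO. The mask mirrors
REVISION_NUM_MASK, i.e. GENMASK(15, 8).

```c
#include <stdint.h>

#define DEMO_GET_DIB_RD_LEN	8
/* Bits 15:8 of the little-endian DIB, mirroring REVISION_NUM_MASK. */
#define DEMO_REVISION_NUM_MASK	0xff00ULL
#define DEMO_REVISION_NUM_SHIFT	8

/* Userspace equivalent of get_unaligned_le64(). */
static uint64_t demo_get_le64(const uint8_t *buf)
{
	uint64_t val = 0;
	int i;

	for (i = DEMO_GET_DIB_RD_LEN - 1; i >= 0; i--)
		val = (val << 8) | buf[i];

	return val;
}

/* Returns 0 and stores the revision, or -1 if the DIB is all zeros. */
static int demo_get_revision(const uint8_t *rx_buf, uint8_t *revision)
{
	uint64_t dib = demo_get_le64(rx_buf);

	if (dib == 0)
		return -1;

	*revision = (dib & DEMO_REVISION_NUM_MASK) >> DEMO_REVISION_NUM_SHIFT;

	return 0;
}
```

Both approaches yield the same byte (rx_buf[1]); the difference is only in
which file has to know the response layout.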
Thanks
-Iwona
On Tue, 2021-07-27 at 21:33 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:43PM CDT, Iwona Winiarska wrote:
> > PECI is an interface that may be used by different types of devices.
> > Here we're adding a peci-cpu driver compatible with Intel processors.
> > The driver is responsible for handling auxiliary devices that can
> > subsequently be used by other drivers (e.g. hwmons).
> >
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > >  MAINTAINERS              |   1 +
> > >  drivers/peci/Kconfig     |  15 ++
> > >  drivers/peci/Makefile    |   2 +
> > >  drivers/peci/cpu.c       | 347 +++++++++++++++++++++++++++++++++++++++
> > >  drivers/peci/device.c    |   1 +
> > >  drivers/peci/internal.h  |  27 +++
> > >  drivers/peci/request.c   | 211 ++++++++++++++++++++++++
> > >  include/linux/peci-cpu.h |  38 +++++
> > >  include/linux/peci.h     |   8 -
> > >  9 files changed, 642 insertions(+), 8 deletions(-)
> > >  create mode 100644 drivers/peci/cpu.c
> > >  create mode 100644 include/linux/peci-cpu.h
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 4ba874afa2fa..f47b5f634293 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14511,6 +14511,7 @@ L: [email protected] (moderated for non-
> > subscribers)
> > S: Supported
> > F: Documentation/devicetree/bindings/peci/
> > F: drivers/peci/
> > +F: include/linux/peci-cpu.h
> > F: include/linux/peci.h
> >
> > PENSANDO ETHERNET DRIVERS
> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > index 27c31535843c..9e17e06fda90 100644
> > --- a/drivers/peci/Kconfig
> > +++ b/drivers/peci/Kconfig
> > @@ -16,6 +16,21 @@ menuconfig PECI
> >
> > if PECI
> >
> > +config PECI_CPU
> > > +	tristate "PECI CPU"
> > > +	select AUXILIARY_BUS
> > > +	help
> > > +	  This option enables peci-cpu driver for Intel processors. It is
> > > +	  responsible for creating auxiliary devices that can subsequently
> > > +	  be used by other drivers in order to perform various
> > > +	  functionalities such as e.g. temperature monitoring.
> > > +
> > > +	  Additional drivers must be enabled in order to use the functionality
> > > +	  of the device.
> > > +
> > > +	  This driver can also be built as a module. If so, the module
> > > +	  will be called peci-cpu.
> > +
> > source "drivers/peci/controller/Kconfig"
> >
> > endif # PECI
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > index 917f689e147a..7de18137e738 100644
> > --- a/drivers/peci/Makefile
> > +++ b/drivers/peci/Makefile
> > @@ -3,6 +3,8 @@
> > # Core functionality
> > peci-y := core.o request.o device.o sysfs.o
> > obj-$(CONFIG_PECI) += peci.o
> > +peci-cpu-y := cpu.o
> > +obj-$(CONFIG_PECI_CPU) += peci-cpu.o
> >
> > # Hardware specific bus drivers
> > obj-y += controller/
> > diff --git a/drivers/peci/cpu.c b/drivers/peci/cpu.c
> > new file mode 100644
> > index 000000000000..8d130a9a71ad
> > --- /dev/null
> > +++ b/drivers/peci/cpu.c
> > @@ -0,0 +1,347 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/slab.h>
> > +#include <linux/x86/intel-family.h>
> > +
> > +#include "internal.h"
> > +
> > +/**
> > > + * peci_temp_read() - read the maximum die temperature from PECI target device
> > > + * @device: PECI device to which request is going to be sent
> > > + * @temp_raw: where to store the read temperature
> > > + *
> > > + * It uses GetTemp PECI command.
> > > + *
> > > + * Return: 0 if succeeded, other values in case errors.
> > > + */
> > > +int peci_temp_read(struct peci_device *device, s16 *temp_raw)
> > > +{
> > > +	struct peci_request *req;
> > > +
> > > +	req = peci_get_temp(device);
> > > +	if (IS_ERR(req))
> > > +		return PTR_ERR(req);
> > > +
> > > +	*temp_raw = peci_request_data_temp(req);
> > > +
> > > +	peci_request_free(req);
> > > +
> > > +	return 0;
> > > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_temp_read, PECI_CPU);
> > +
> > +/**
> > + * peci_pcs_read() - read PCS register
> > + * @device: PECI device to which request is going to be sent
> > + * @index: PCS index
> > + * @param: PCS parameter
> > + * @data: where to store the read data
> > + *
> > + * It uses RdPkgConfig PECI command.
> > + *
> > + * Return: 0 if succeeded, other values in case errors.
> > + */
> > > +int peci_pcs_read(struct peci_device *device, u8 index, u16 param, u32 *data)
> > > +{
> > > +	struct peci_request *req;
> > > +	int ret;
> > > +
> > > +	req = peci_pkg_cfg_readl(device, index, param);
> > > +	if (IS_ERR(req))
> > > +		return PTR_ERR(req);
> > > +
> > > +	ret = peci_request_status(req);
> > > +	if (ret)
> > > +		goto out_req_free;
> > > +
> > > +	*data = peci_request_data_readl(req);
> > > +out_req_free:
>
> As in patch 9, this control flow could be rewritten as just
>
> if (!ret)
> *data = peci_request_data_readl(req);
>
> and avoid the goto.
I think explicit error handling just reads better (and is a more common pattern
in kernel code).
In order to save a single line of code, doing:

	if (non-error)
		do-the-regular-flow

where readers are used to the inverse:

	if (error)
		handle-error
	do-the-regular-flow

may make the reader confused (it's easy to mix up error handling with regular
flow).
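A self-contained userspace sketch of the "error first, then regular flow"
shape follows. It is an illustration only: struct demo_request and demo_read()
are hypothetical, malloc()/free() stand in for the request alloc/free helpers,
and the status is passed in by the caller instead of coming from a completion
code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request type; status 0 means success, non-zero is an error. */
struct demo_request {
	int status;
	uint32_t value;
};

static int demo_read(int status, uint32_t value, uint32_t *data)
{
	struct demo_request *req;
	int ret;

	req = malloc(sizeof(*req));
	if (!req)
		return -1;

	req->status = status;
	req->value = value;

	ret = req->status;
	if (ret)
		goto out_req_free;	/* error: skip the regular flow */

	*data = req->value;		/* regular flow */
out_req_free:
	free(req);

	return ret;
}
```

On the error path *data is left untouched and the request is still freed
exactly once, which is the property the goto label is there to guarantee.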
>
> > > +	peci_request_free(req);
> > > +
> > > +	return ret;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_pcs_read, PECI_CPU);
> > +
> > +/**
> > + * peci_pci_local_read() - read 32-bit memory location using raw address
> > + * @device: PECI device to which request is going to be sent
> > + * @bus: bus
> > + * @dev: device
> > + * @func: function
> > + * @reg: register
> > + * @data: where to store the read data
> > + *
> > + * It uses RdPCIConfigLocal PECI command.
> > + *
> > + * Return: 0 if succeeded, other values in case errors.
> > + */
> > > +int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func,
> > > +			u16 reg, u32 *data)
> > > +{
> > > +	struct peci_request *req;
> > > +	int ret;
> > > +
> > > +	req = peci_pci_cfg_local_readl(device, bus, dev, func, reg);
> > > +	if (IS_ERR(req))
> > > +		return PTR_ERR(req);
> > > +
> > > +	ret = peci_request_status(req);
> > > +	if (ret)
> > > +		goto out_req_free;
> > > +
> > > +	*data = peci_request_data_readl(req);
> > > +out_req_free:
> > > +	peci_request_free(req);
> > > +
> > > +	return ret;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_pci_local_read, PECI_CPU);
> > +
> > +/**
> > + * peci_ep_pci_local_read() - read 32-bit memory location using raw address
> > + * @device: PECI device to which request is going to be sent
> > + * @seg: PCI segment
> > + * @bus: bus
> > + * @dev: device
> > + * @func: function
> > + * @reg: register
> > + * @data: where to store the read data
> > + *
> > + * Like &peci_pci_local_read, but it uses RdEndpointConfig PECI command.
> > + *
> > + * Return: 0 if succeeded, other values in case errors.
> > + */
> > > +int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
> > > +			   u8 bus, u8 dev, u8 func, u16 reg, u32 *data)
> > > +{
> > > +	struct peci_request *req;
> > > +	int ret;
> > > +
> > > +	req = peci_ep_pci_cfg_local_readl(device, seg, bus, dev, func, reg);
> > > +	if (IS_ERR(req))
> > > +		return PTR_ERR(req);
> > > +
> > > +	ret = peci_request_status(req);
> > > +	if (ret)
> > > +		goto out_req_free;
> > > +
> > > +	*data = peci_request_data_readl(req);
> > > +out_req_free:
> > > +	peci_request_free(req);
> > > +
> > > +	return ret;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_ep_pci_local_read, PECI_CPU);
> > +
> > +/**
> > > + * peci_mmio_read() - read 32-bit memory location using 64-bit bar offset address
> > + * @device: PECI device to which request is going to be sent
> > + * @bar: PCI bar
> > + * @seg: PCI segment
> > + * @bus: bus
> > + * @dev: device
> > + * @func: function
> > + * @address: 64-bit MMIO address
> > + * @data: where to store the read data
> > + *
> > + * It uses RdEndpointConfig PECI command.
> > + *
> > + * Return: 0 if succeeded, other values in case errors.
> > + */
> > > +int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
> > > +		   u8 bus, u8 dev, u8 func, u64 address, u32 *data)
> > > +{
> > > +	struct peci_request *req;
> > > +	int ret;
> > > +
> > > +	req = peci_ep_mmio64_readl(device, bar, seg, bus, dev, func, address);
> > > +	if (IS_ERR(req))
> > > +		return PTR_ERR(req);
> > > +
> > > +	ret = peci_request_status(req);
> > > +	if (ret)
> > > +		goto out_req_free;
> > > +
> > > +	*data = peci_request_data_readl(req);
> > > +out_req_free:
> > > +	peci_request_free(req);
> > > +
> > > +	return ret;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_mmio_read, PECI_CPU);
> > +
> > +struct peci_cpu {
> > + struct peci_device *device;
> > + const struct peci_device_id *id;
> > + struct auxiliary_device **aux_devices;
>
> Given that the size for this allocation is a compile-time constant,
> should we just inline this as 'struct auxiliary_device
> *aux_devices[ARRAY_SIZE(type)]' and avoid some kmalloc work in
> peci_cpu_add_adevices()?
Ack.
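For illustration, a minimal userspace sketch of the inlined array. The opaque
stand-in types, the ARRAY_SIZE() re-definition and the name
'peci_adevice_types' are assumptions made for this example only, not the final
patch:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Opaque stand-ins for the kernel types (illustration only). */
struct peci_device;
struct peci_device_id;
struct auxiliary_device;

/* Hypothetical, more descriptive name for the 'type' table. */
static const char * const peci_adevice_types[] = {
	"cputemp",
	"dimmtemp",
};

struct peci_cpu {
	struct peci_device *device;
	const struct peci_device_id *id;
	/* Sized at compile time, so no separate devm_kcalloc() is needed. */
	struct auxiliary_device *aux_devices[ARRAY_SIZE(peci_adevice_types)];
};
```

Since priv is already allocated with devm_kzalloc(), the array would start out
zeroed, which keeps the "if (adev)" check in the remove path meaningful.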
>
> > +};
> > +
> > +static const char * const type[] = {
>
> A slightly more descriptive name might be good -- maybe something like
> 'peci_adevice_types'?
I'll rename it to something more descriptive.
>
> > + "cputemp",
> > + "dimmtemp",
> > +};
> > +
> > +static void adev_release(struct device *dev)
> > +{
> > + struct auxiliary_device *adev = to_auxiliary_dev(dev);
> > +
> > + kfree(adev->name);
> > + kfree(adev);
> > +}
> > +
> > +static struct auxiliary_device *add_adev(struct peci_cpu *priv, int idx)
> > +{
> > + struct peci_controller *controller = priv->device->controller;
> > + struct auxiliary_device *adev;
> > + const char *name;
> > + int ret;
> > +
> > + adev = kzalloc(sizeof(*adev), GFP_KERNEL);
> > + if (!adev)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + name = kasprintf(GFP_KERNEL, "%s.%s", type[idx], (const char *)priv->id->data);
> > + if (!name) {
> > + ret = -ENOMEM;
> > + goto free_adev;
> > + }
> > +
> > + adev->name = name;
> > + adev->dev.parent = &priv->device->dev;
> > + adev->dev.release = adev_release;
> > + adev->id = (controller->id << 16) | (priv->device->addr);
> > +
> > + ret = auxiliary_device_init(adev);
> > + if (ret)
> > + goto free_name;
> > +
> > + ret = auxiliary_device_add(adev);
> > + if (ret) {
> > + auxiliary_device_uninit(adev);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return adev;
> > +
> > +free_name:
> > + kfree(name);
> > +free_adev:
> > + kfree(adev);
> > + return ERR_PTR(ret);
> > +}
> > +
> > +static void del_adev(struct auxiliary_device *adev)
> > +{
> > + auxiliary_device_delete(adev);
> > + auxiliary_device_uninit(adev);
> > +}
> > +
> > +static int peci_cpu_add_adevices(struct peci_cpu *priv)
> > +{
> > + struct device *dev = &priv->device->dev;
> > + struct auxiliary_device *adev;
> > + int i;
> > +
> > + priv->aux_devices = devm_kcalloc(dev, ARRAY_SIZE(type),
> > + sizeof(*priv->aux_devices),
> > + GFP_KERNEL);
> > + if (!priv->aux_devices)
> > + return -ENOMEM;
> > +
> > + for (i = 0; i < ARRAY_SIZE(type); i++) {
> > + adev = add_adev(priv, i);
> > + if (IS_ERR(adev)) {
> > + dev_warn(dev, "Failed to add PECI auxiliary: %s, ret = %ld\n",
> > + type[i], PTR_ERR(adev));
> > + continue;
> > + }
> > +
> > + priv->aux_devices[i] = adev;
> > + }
> > + return 0;
> > +}
> > +
> > +static int
> > +peci_cpu_probe(struct peci_device *device, const struct peci_device_id *id)
> > +{
> > + struct device *dev = &device->dev;
> > + struct peci_cpu *priv;
> > +
> > + priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > + if (!priv)
> > + return -ENOMEM;
> > +
> > + dev_set_drvdata(dev, priv);
> > + priv->device = device;
> > + priv->id = id;
> > +
> > + return peci_cpu_add_adevices(priv);
> > +}
> > +
> > +static void peci_cpu_remove(struct peci_device *device)
> > +{
> > + struct peci_cpu *priv = dev_get_drvdata(&device->dev);
> > + int i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(type); i++) {
> > + struct auxiliary_device *adev = priv->aux_devices[i];
> > +
> > + if (adev)
> > + del_adev(adev);
> > + }
> > +}
> > +
> > +static const struct peci_device_id peci_cpu_device_ids[] = {
> > + { /* Haswell Xeon */
> > + .family = 6,
> > + .model = INTEL_FAM6_HASWELL_X,
> > + .data = "hsx",
> > + },
> > + { /* Broadwell Xeon */
> > + .family = 6,
> > + .model = INTEL_FAM6_BROADWELL_X,
> > + .data = "bdx",
> > + },
> > + { /* Broadwell Xeon D */
> > + .family = 6,
> > + .model = INTEL_FAM6_BROADWELL_D,
> > + .data = "bdxd",
> > + },
> > + { /* Skylake Xeon */
> > + .family = 6,
> > + .model = INTEL_FAM6_SKYLAKE_X,
> > + .data = "skx",
> > + },
> > + { /* Icelake Xeon */
> > + .family = 6,
> > + .model = INTEL_FAM6_ICELAKE_X,
> > + .data = "icx",
> > + },
> > + { /* Icelake Xeon D */
> > + .family = 6,
> > + .model = INTEL_FAM6_ICELAKE_D,
> > + .data = "icxd",
> > + },
> > + { }
> > +};
> > +MODULE_DEVICE_TABLE(peci, peci_cpu_device_ids);
> > +
> > +static struct peci_driver peci_cpu_driver = {
> > + .probe = peci_cpu_probe,
> > + .remove = peci_cpu_remove,
> > + .id_table = peci_cpu_device_ids,
> > + .driver = {
> > + .name = "peci-cpu",
> > + },
> > +};
> > +module_peci_driver(peci_cpu_driver);
> > +
> > +MODULE_AUTHOR("Iwona Winiarska <[email protected]>");
> > +MODULE_DESCRIPTION("PECI CPU driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI);
> > diff --git a/drivers/peci/device.c b/drivers/peci/device.c
> > index 8c4bd1ebbc29..c278c9ea166c 100644
> > --- a/drivers/peci/device.c
> > +++ b/drivers/peci/device.c
> > @@ -3,6 +3,7 @@
> >
> > #include <linux/bitfield.h>
> > #include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > #include <linux/slab.h>
> > #include <linux/x86/cpu.h>
> >
> > diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h
> > index c891c93e077a..1d39483a8acf 100644
> > --- a/drivers/peci/internal.h
> > +++ b/drivers/peci/internal.h
> > @@ -21,6 +21,7 @@ void peci_request_free(struct peci_request *req);
> >
> > int peci_request_status(struct peci_request *req);
> > u64 peci_request_data_dib(struct peci_request *req);
> > +s16 peci_request_data_temp(struct peci_request *req);
> >
> > u8 peci_request_data_readb(struct peci_request *req);
> > u16 peci_request_data_readw(struct peci_request *req);
> > @@ -35,6 +36,32 @@ struct peci_request *peci_pkg_cfg_readw(struct peci_device *device, u8 index, u1
> > struct peci_request *peci_pkg_cfg_readl(struct peci_device *device, u8 index, u16 param);
> > struct peci_request *peci_pkg_cfg_readq(struct peci_device *device, u8 index, u16 param);
> >
> > +struct peci_request *peci_pci_cfg_local_readb(struct peci_device *device,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +struct peci_request *peci_pci_cfg_local_readw(struct peci_device *device,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +struct peci_request *peci_pci_cfg_local_readl(struct peci_device *device,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +
> > +struct peci_request *peci_ep_pci_cfg_local_readb(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +struct peci_request *peci_ep_pci_cfg_local_readw(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +struct peci_request *peci_ep_pci_cfg_local_readl(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +
> > +struct peci_request *peci_ep_pci_cfg_readb(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +struct peci_request *peci_ep_pci_cfg_readw(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +struct peci_request *peci_ep_pci_cfg_readl(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg);
> > +
> > +struct peci_request *peci_ep_mmio32_readl(struct peci_device *device, u8 bar, u8 seg,
> > + u8 bus, u8 dev, u8 func, u64 offset);
> > +
> > +struct peci_request *peci_ep_mmio64_readl(struct peci_device *device, u8 bar, u8 seg,
> > + u8 bus, u8 dev, u8 func, u64 offset);
> > /**
> > * struct peci_device_id - PECI device data to match
> > * @data: pointer to driver private data specific to device
> > diff --git a/drivers/peci/request.c b/drivers/peci/request.c
> > index 48354455b554..c5d39f7e8142 100644
> > --- a/drivers/peci/request.c
> > +++ b/drivers/peci/request.c
> > @@ -3,6 +3,7 @@
> >
> > #include <linux/bug.h>
> > #include <linux/export.h>
> > +#include <linux/pci.h>
> > #include <linux/peci.h>
> > #include <linux/slab.h>
> > #include <linux/types.h>
> > @@ -15,6 +16,10 @@
> > #define PECI_GET_DIB_WR_LEN 1
> > #define PECI_GET_DIB_RD_LEN 8
> >
> > +#define PECI_GET_TEMP_CMD 0x01
> > +#define PECI_GET_TEMP_WR_LEN 1
> > +#define PECI_GET_TEMP_RD_LEN 2
> > +
> > #define PECI_RDPKGCFG_CMD 0xa1
> > #define PECI_RDPKGCFG_WRITE_LEN 5
> > #define PECI_RDPKGCFG_READ_LEN_BASE 1
> > @@ -22,6 +27,44 @@
> > #define PECI_WRPKGCFG_WRITE_LEN_BASE 6
> > #define PECI_WRPKGCFG_READ_LEN 1
> >
> > +#define PECI_RDIAMSR_CMD 0xb1
> > +#define PECI_RDIAMSR_WRITE_LEN 5
> > +#define PECI_RDIAMSR_READ_LEN 9
> > +#define PECI_WRIAMSR_CMD 0xb5
> > +#define PECI_RDIAMSREX_CMD 0xd1
> > +#define PECI_RDIAMSREX_WRITE_LEN 6
> > +#define PECI_RDIAMSREX_READ_LEN 9
> > +
> > +#define PECI_RDPCICFG_CMD 0x61
> > +#define PECI_RDPCICFG_WRITE_LEN 6
> > +#define PECI_RDPCICFG_READ_LEN 5
> > +#define PECI_RDPCICFG_READ_LEN_MAX 24
> > +#define PECI_WRPCICFG_CMD 0x65
> > +
> > +#define PECI_RDPCICFGLOCAL_CMD 0xe1
> > +#define PECI_RDPCICFGLOCAL_WRITE_LEN 5
> > +#define PECI_RDPCICFGLOCAL_READ_LEN_BASE 1
> > +#define PECI_WRPCICFGLOCAL_CMD 0xe5
> > +#define PECI_WRPCICFGLOCAL_WRITE_LEN_BASE 6
> > +#define PECI_WRPCICFGLOCAL_READ_LEN 1
> > +
> > +#define PECI_ENDPTCFG_TYPE_LOCAL_PCI 0x03
> > +#define PECI_ENDPTCFG_TYPE_PCI 0x04
> > +#define PECI_ENDPTCFG_TYPE_MMIO 0x05
> > +#define PECI_ENDPTCFG_ADDR_TYPE_PCI 0x04
> > +#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_D 0x05
> > +#define PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q 0x06
> > +#define PECI_RDENDPTCFG_CMD 0xc1
> > +#define PECI_RDENDPTCFG_PCI_WRITE_LEN 12
> > +#define PECI_RDENDPTCFG_MMIO_D_WRITE_LEN 14
> > +#define PECI_RDENDPTCFG_MMIO_Q_WRITE_LEN 18
> > +#define PECI_RDENDPTCFG_READ_LEN_BASE 1
> > +#define PECI_WRENDPTCFG_CMD 0xc5
> > +#define PECI_WRENDPTCFG_PCI_WRITE_LEN_BASE 13
> > +#define PECI_WRENDPTCFG_MMIO_D_WRITE_LEN_BASE 15
> > +#define PECI_WRENDPTCFG_MMIO_Q_WRITE_LEN_BASE 19
> > +#define PECI_WRENDPTCFG_READ_LEN 1
> > +
> > /* Device Specific Completion Code (CC) Definition */
> > #define PECI_CC_SUCCESS 0x40
> > #define PECI_CC_NEED_RETRY 0x80
> > @@ -223,6 +266,27 @@ struct peci_request *peci_get_dib(struct peci_device *device)
> > }
> > EXPORT_SYMBOL_NS_GPL(peci_get_dib, PECI);
> >
> > +struct peci_request *peci_get_temp(struct peci_device *device)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_request_alloc(device, PECI_GET_TEMP_WR_LEN, PECI_GET_TEMP_RD_LEN);
> > + if (!req)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + req->tx.buf[0] = PECI_GET_TEMP_CMD;
> > +
> > + ret = peci_request_xfer(req);
> > + if (ret) {
> > + peci_request_free(req);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return req;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_get_temp, PECI);
> > +
> > static struct peci_request *
> > __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
> > {
> > @@ -248,6 +312,108 @@ __pkg_cfg_read(struct peci_device *device, u8 index, u16 param, u8 len)
> > return req;
> > }
> >
> > +static u32 __get_pci_addr(u8 bus, u8 dev, u8 func, u16 reg)
> > +{
> > + return reg | PCI_DEVID(bus, PCI_DEVFN(dev, func)) << 12;
> > +}
> > +
> > +static struct peci_request *
> > +__pci_cfg_local_read(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg, u8 len)
> > +{
> > + struct peci_request *req;
> > + u32 pci_addr;
> > + int ret;
> > +
> > + req = peci_request_alloc(device, PECI_RDPCICFGLOCAL_WRITE_LEN,
> > + PECI_RDPCICFGLOCAL_READ_LEN_BASE + len);
> > + if (!req)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + pci_addr = __get_pci_addr(bus, dev, func, reg);
> > +
> > + req->tx.buf[0] = PECI_RDPCICFGLOCAL_CMD;
> > + req->tx.buf[1] = 0;
> > + put_unaligned_le24(pci_addr, &req->tx.buf[2]);
> > +
> > + ret = peci_request_xfer_retry(req);
> > + if (ret) {
> > + peci_request_free(req);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return req;
> > +}
> > +
> > +static struct peci_request *
> > +__ep_pci_cfg_read(struct peci_device *device, u8 msg_type, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg, u8 len)
> > +{
> > + struct peci_request *req;
> > + u32 pci_addr;
> > + int ret;
> > +
> > + req = peci_request_alloc(device, PECI_RDENDPTCFG_PCI_WRITE_LEN,
> > + PECI_RDENDPTCFG_READ_LEN_BASE + len);
> > + if (!req)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + pci_addr = __get_pci_addr(bus, dev, func, reg);
> > +
> > + req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
> > + req->tx.buf[1] = 0;
> > + req->tx.buf[2] = msg_type;
> > + req->tx.buf[3] = 0;
> > + req->tx.buf[4] = 0;
> > + req->tx.buf[5] = 0;
> > + req->tx.buf[6] = PECI_ENDPTCFG_ADDR_TYPE_PCI;
> > + req->tx.buf[7] = seg; /* PCI Segment */
> > + put_unaligned_le32(pci_addr, &req->tx.buf[8]);
> > +
> > + ret = peci_request_xfer_retry(req);
> > + if (ret) {
> > + peci_request_free(req);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return req;
> > +}
> > +
> > +static struct peci_request *
> > +__ep_mmio_read(struct peci_device *device, u8 bar, u8 addr_type, u8 seg,
> > + u8 bus, u8 dev, u8 func, u64 offset, u8 tx_len, u8 len)
> > +{
> > + struct peci_request *req;
> > + int ret;
> > +
> > + req = peci_request_alloc(device, tx_len, PECI_RDENDPTCFG_READ_LEN_BASE + len);
> > + if (!req)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + req->tx.buf[0] = PECI_RDENDPTCFG_CMD;
> > + req->tx.buf[1] = 0;
> > + req->tx.buf[2] = PECI_ENDPTCFG_TYPE_MMIO;
> > + req->tx.buf[3] = 0; /* Endpoint ID */
> > + req->tx.buf[4] = 0; /* Reserved */
> > + req->tx.buf[5] = bar;
> > + req->tx.buf[6] = addr_type;
> > + req->tx.buf[7] = seg; /* PCI Segment */
> > + req->tx.buf[8] = PCI_DEVFN(dev, func);
> > + req->tx.buf[9] = bus; /* PCI Bus */
> > +
> > + if (addr_type == PECI_ENDPTCFG_ADDR_TYPE_MMIO_D)
> > + put_unaligned_le32(offset, &req->tx.buf[10]);
> > + else
> > + put_unaligned_le64(offset, &req->tx.buf[10]);
> > +
> > + ret = peci_request_xfer_retry(req);
> > + if (ret) {
> > + peci_request_free(req);
> > + return ERR_PTR(ret);
> > + }
> > +
> > + return req;
> > +}
> > +
> > u8 peci_request_data_readb(struct peci_request *req)
> > {
> > return req->rx.buf[1];
> > @@ -278,6 +444,12 @@ u64 peci_request_data_dib(struct peci_request *req)
> > }
> > EXPORT_SYMBOL_NS_GPL(peci_request_data_dib, PECI);
> >
> > +s16 peci_request_data_temp(struct peci_request *req)
> > +{
> > + return get_unaligned_le16(&req->rx.buf[0]);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(peci_request_data_temp, PECI);
> > +
> > #define __read_pkg_config(x, type) \
> > struct peci_request *peci_pkg_cfg_##x(struct peci_device *device, u8 index, u16 param) \
> > { \
> > @@ -289,3 +461,42 @@ __read_pkg_config(readb, u8);
> > __read_pkg_config(readw, u16);
> > __read_pkg_config(readl, u32);
> > __read_pkg_config(readq, u64);
> > +
> > +#define __read_pci_config_local(x, type) \
> > +struct peci_request * \
> > +peci_pci_cfg_local_##x(struct peci_device *device, u8 bus, u8 dev, u8 func, u16 reg) \
> > +{ \
> > + return __pci_cfg_local_read(device, bus, dev, func, reg, sizeof(type)); \
> > +} \
>
> As with peci_pkg_cfg_*() in patch 9, it seems like this could relieve
> callers of some busy-work by returning a status int and writing the data
> to a 'type*' pointer instead of returning a struct peci_request*.
Callers that expect such behavior (getting the data directly without dealing
with requests, PECI completion codes, and so on) are supposed to use the API
exposed by their "parent" driver (e.g. peci_pci_local_read).
>
> > +EXPORT_SYMBOL_NS_GPL(peci_pci_cfg_local_##x, PECI)
> > +
> > +__read_pci_config_local(readb, u8);
> > +__read_pci_config_local(readw, u16);
> > +__read_pci_config_local(readl, u32);
> > +
> > +#define __read_ep_pci_config(x, msg_type, type) \
> > +struct peci_request * \
> > +peci_ep_pci_cfg_##x(struct peci_device *device, u8 seg, u8 bus, u8 dev, u8 func, u16 reg) \
> > +{ \
> > + return __ep_pci_cfg_read(device, msg_type, seg, bus, dev, func, reg, sizeof(type)); \
> > +} \
>
> Likewise here.
>
> > +EXPORT_SYMBOL_NS_GPL(peci_ep_pci_cfg_##x, PECI)
> > +
> > +__read_ep_pci_config(local_readb, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u8);
> > +__read_ep_pci_config(local_readw, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u16);
> > +__read_ep_pci_config(local_readl, PECI_ENDPTCFG_TYPE_LOCAL_PCI, u32);
> > +__read_ep_pci_config(readb, PECI_ENDPTCFG_TYPE_PCI, u8);
> > +__read_ep_pci_config(readw, PECI_ENDPTCFG_TYPE_PCI, u16);
> > +__read_ep_pci_config(readl, PECI_ENDPTCFG_TYPE_PCI, u32);
> > +
> > +#define __read_ep_mmio(x, y, addr_type, type1, type2) \
> > +struct peci_request *peci_ep_mmio##y##_##x(struct peci_device *device, u8 bar, u8 seg, \
> > + u8 bus, u8 dev, u8 func, u64 offset) \
> > +{ \
> > + return __ep_mmio_read(device, bar, addr_type, seg, bus, dev, func, \
> > + offset, 10 + sizeof(type1), sizeof(type2)); \
> > +} \
>
> And here (I think).
>
> Also, the '10 +' looks a bit magical/mysterious. Could that be
> clarified a bit with a macro or something?
Makes sense - I'll define it.
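Roughly along these lines. The macro name PECI_RDENDPTCFG_MMIO_WR_LEN_BASE and
the helper are hypothetical and only illustrate the idea; the two *_WRITE_LEN
values are the ones already in the patch. The base of 10 covers tx.buf[0]
through tx.buf[9], which __ep_mmio_read() fills in before the offset:

```c
#include <stddef.h>
#include <stdint.h>

/* Values quoted from the patch. */
#define PECI_RDENDPTCFG_MMIO_D_WRITE_LEN	14
#define PECI_RDENDPTCFG_MMIO_Q_WRITE_LEN	18

/*
 * Hypothetical name: number of bytes (tx.buf[0]..tx.buf[9]) that precede
 * the MMIO offset in a RdEndpointConfig MMIO request.
 */
#define PECI_RDENDPTCFG_MMIO_WR_LEN_BASE	10

/* Hypothetical helper: total write length for a given offset width. */
static inline size_t peci_mmio_tx_len(size_t offset_size)
{
	return PECI_RDENDPTCFG_MMIO_WR_LEN_BASE + offset_size;
}
```

With a 32-bit offset this gives 14 and with a 64-bit offset 18, matching the
existing MMIO_D and MMIO_Q write lengths.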
Thank you
-Iwona
>
> > +EXPORT_SYMBOL_NS_GPL(peci_ep_mmio##y##_##x, PECI)
> > +
> > +__read_ep_mmio(readl, 32, PECI_ENDPTCFG_ADDR_TYPE_MMIO_D, u32, u32);
> > +__read_ep_mmio(readl, 64, PECI_ENDPTCFG_ADDR_TYPE_MMIO_Q, u64, u32);
> > diff --git a/include/linux/peci-cpu.h b/include/linux/peci-cpu.h
> > new file mode 100644
> > index 000000000000..d1b307ec2429
> > --- /dev/null
> > +++ b/include/linux/peci-cpu.h
> > @@ -0,0 +1,38 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2021 Intel Corporation */
> > +
> > +#ifndef __LINUX_PECI_CPU_H
> > +#define __LINUX_PECI_CPU_H
> > +
> > +#include <linux/types.h>
> > +
> > +#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
> > +#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
> > +#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
> > +#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
> > +#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
> > +#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
> > +#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
> > +#define PECI_PCS_MODULE_TEMP 9 /* Per Core DTS Temperature Read */
> > +#define PECI_PCS_THERMAL_MARGIN 10 /* DTS thermal margin */
> > +#define PECI_PCS_DDR_DIMM_TEMP 14 /* DDR DIMM Temperature */
> > +#define PECI_PCS_TEMP_TARGET 16 /* Temperature Target Read */
> > +#define PECI_PCS_TDP_UNITS 30 /* Units for power/energy registers */
> > +
> > +struct peci_device;
> > +
> > +int peci_temp_read(struct peci_device *device, s16 *temp_raw);
> > +
> > +int peci_pcs_read(struct peci_device *device, u8 index,
> > + u16 param, u32 *data);
> > +
> > +int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev,
> > + u8 func, u16 reg, u32 *data);
> > +
> > +int peci_ep_pci_local_read(struct peci_device *device, u8 seg,
> > + u8 bus, u8 dev, u8 func, u16 reg, u32 *data);
> > +
> > +int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg,
> > + u8 bus, u8 dev, u8 func, u64 address, u32 *data);
> > +
> > +#endif /* __LINUX_PECI_CPU_H */
> > diff --git a/include/linux/peci.h b/include/linux/peci.h
> > index f9f37b874011..31f9e628fd11 100644
> > --- a/include/linux/peci.h
> > +++ b/include/linux/peci.h
> > @@ -9,14 +9,6 @@
> > #include <linux/mutex.h>
> > #include <linux/types.h>
> >
> > -#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */
> > -#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */
> > -#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */
> > -#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */
> > -#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */
> > -#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */
> > -#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */
> > -
> > struct peci_request;
> >
> > /**
> > --
> > 2.31.1
On Tue, 2021-07-27 at 07:06 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:44PM CDT, Iwona Winiarska wrote:
> > Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of the processor package and processor cores that are
> > accessible via the PECI interface.
> >
> > The main use case for the driver (and PECI interface) is out-of-band
> > management, where we're able to obtain the DTS readings from an external
> > entity connected with PECI, e.g. BMC on server platforms.
> >
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > MAINTAINERS | 7 +
> > drivers/hwmon/Kconfig | 2 +
> > drivers/hwmon/Makefile | 1 +
> > drivers/hwmon/peci/Kconfig | 18 ++
> > drivers/hwmon/peci/Makefile | 5 +
> > drivers/hwmon/peci/common.h | 46 ++++
> > drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++++++++++++++++++++
> > 7 files changed, 582 insertions(+)
> > create mode 100644 drivers/hwmon/peci/Kconfig
> > create mode 100644 drivers/hwmon/peci/Makefile
> > create mode 100644 drivers/hwmon/peci/common.h
> > create mode 100644 drivers/hwmon/peci/cputemp.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index f47b5f634293..35ba9e3646bd 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14504,6 +14504,13 @@ L: [email protected]
> > S: Maintained
> > F: drivers/platform/x86/peaq-wmi.c
> >
> > +PECI HARDWARE MONITORING DRIVERS
> > +M: Iwona Winiarska <[email protected]>
> > +R: Jae Hyun Yoo <[email protected]>
> > +L: [email protected]
> > +S: Supported
> > +F: drivers/hwmon/peci/
> > +
> > PECI SUBSYSTEM
> > M: Iwona Winiarska <[email protected]>
> > R: Jae Hyun Yoo <[email protected]>
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index e3675377bc5d..61c0e3404415 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
> > These devices are hard to detect and rarely found on mainstream
> > hardware. If unsure, say N.
> >
> > +source "drivers/hwmon/peci/Kconfig"
> > +
> > source "drivers/hwmon/pmbus/Kconfig"
> >
> > config SENSORS_PWM_FAN
> > diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> > index d712c61c1f5e..f52331f212ed 100644
> > --- a/drivers/hwmon/Makefile
> > +++ b/drivers/hwmon/Makefile
> > @@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
> > obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
> >
> > obj-$(CONFIG_SENSORS_OCC) += occ/
> > +obj-$(CONFIG_SENSORS_PECI) += peci/
> > obj-$(CONFIG_PMBUS) += pmbus/
> >
> > ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> > diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> > new file mode 100644
> > index 000000000000..e10eed68d70a
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Kconfig
> > @@ -0,0 +1,18 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config SENSORS_PECI_CPUTEMP
> > + tristate "PECI CPU temperature monitoring client"
> > + depends on PECI
> > + select SENSORS_PECI
> > + select PECI_CPU
> > + help
> > + If you say yes here you get support for the generic Intel PECI
> > + cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> > + readings of the CPU package and CPU cores that are accessible via
> > + the processor PECI interface.
> > +
> > + This driver can also be built as a module. If so, the module
> > + will be called peci-cputemp.
> > +
> > +config SENSORS_PECI
> > + tristate
> > diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> > new file mode 100644
> > index 000000000000..e8a0ada5ab1f
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +peci-cputemp-y := cputemp.o
> > +
> > +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> > diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> > new file mode 100644
> > index 000000000000..54580c100d06
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/common.h
> > @@ -0,0 +1,46 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2021 Intel Corporation */
> > +
> > +#include <linux/types.h>
> > +
> > +#ifndef __PECI_HWMON_COMMON_H
> > +#define __PECI_HWMON_COMMON_H
> > +
> > +#define UPDATE_INTERVAL_DEFAULT HZ
> > +
> > +/**
> > + * struct peci_sensor_data - PECI sensor information
> > + * @valid: flag to indicate the sensor value is valid
> > + * @value: sensor value in milli units
> > + * @last_updated: time of the last update in jiffies
> > + */
> > +struct peci_sensor_data {
> > + unsigned int valid;
>
> From what I can see it looks like the 'valid' member here is strictly a
> one-shot has-this-value-ever-been-set indicator, which seems a bit
> wasteful to keep around forever post initialization; couldn't the same
> information be inferred from checking last_updated != 0 or something?
last_updated is expressed in jiffies, which means it can wrap around and
legitimately hold 0 (we're just unlikely to hit it - but IIUC it can happen).
Inferring validity from last_updated != 0 would require making sure that
last_updated is never set to 0 in the code that does the update. I don't think
it's worth adding more complexity there just to save a couple of bytes.
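To make the concern concrete, here is a small userspace sketch. It re-creates
the wraparound-safe comparison behind the kernel's time_after() and keeps a
separate valid flag; the struct and function names are made up for the example
and the current time is passed in explicitly instead of reading jiffies:

```c
#include <stdbool.h>

/* Userspace re-creation of the kernel's wraparound-safe time_after(a, b). */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the timestamp part of struct peci_sensor_data. */
struct sensor_stamp {
	bool valid;			/* set once the first update happened */
	unsigned long last_updated;	/* tick counter value, may wrap to 0 */
};

/* Same logic as peci_sensor_need_update(), with 'now' passed in. */
static bool need_update(const struct sensor_stamp *s, unsigned long now,
			unsigned long interval)
{
	return !s->valid || time_after_ul(now, s->last_updated + interval);
}
```

A sensor updated at tick 0 is still treated as fresh until the interval has
passed, which a "last_updated != 0" test could not express without
special-casing the update path.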
>
> > + s32 value;
> > + unsigned long last_updated;
> > +};
> > +
> > +/**
> > + * peci_sensor_need_update() - check whether sensor update is needed or not
> > + * @sensor: pointer to sensor data struct
> > + *
> > + * Return: true if update is needed, false if not.
> > + */
> > +
> > +static inline bool peci_sensor_need_update(struct peci_sensor_data *sensor)
> > +{
> > + return !sensor->valid ||
> > + time_after(jiffies, sensor->last_updated + UPDATE_INTERVAL_DEFAULT);
> > +}
> > +
> > +/**
> > + * peci_sensor_mark_updated() - mark the sensor as updated
> > + * @sensor: pointer to sensor data struct
> > + */
> > +static inline void peci_sensor_mark_updated(struct peci_sensor_data *sensor)
> > +{
> > + sensor->valid = 1;
> > + sensor->last_updated = jiffies;
> > +}
> > +
> > +#endif /* __PECI_HWMON_COMMON_H */
> > diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> > new file mode 100644
> > index 000000000000..56a526471687
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/cputemp.c
> > @@ -0,0 +1,503 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/units.h>
> > +#include <linux/x86/intel-family.h>
> > +
> > +#include "common.h"
> > +
> > +#define CORE_NUMS_MAX 64
> > +
> > +#define DEFAULT_CHANNEL_NUMS 5
>
> DEFAULT_ seems like a slightly odd prefix for this (it's not something
> that can really be overridden or anything); would BASE_ perhaps be a bit
> more appropriate?
Ack.
>
> > +#define CORETEMP_CHANNEL_NUMS CORE_NUMS_MAX
> > +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
> > +
> > +#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
> > +#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
> > +#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
> > +
> > +#define DTS_MARGIN_MASK GENMASK(15, 0)
> > +#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
> > +
> > +#define DTS_FIXED_POINT_FRACTION 64
> > +
> > +struct resolved_cores_reg {
> > + u8 bus;
> > + u8 dev;
> > + u8 func;
> > + u8 offset;
> > +};
> > +
> > +struct cpu_info {
> > + struct resolved_cores_reg *reg;
> > + u8 min_peci_revision;
>
> As with the dimmtemp driver, min_peci_revision appears unused here,
> though in this case if it were removed there'd only be one (pointer)
> member left in struct cpu_info, so we could perhaps remove it as well
> and then also a level of indirection in peci_cputemp_ids/cpu_{hsx,icx}
> too?
As I mentioned in my reply to the comment on the previous patch, it'll be used
to validate whether the PECI device revision matches the driver's requirements.
>
> > +};
> > +
> > +struct peci_cputemp {
> > + struct peci_device *peci_dev;
> > + struct device *dev;
> > + const char *name;
> > + const struct cpu_info *gen_info;
> > + struct {
> > + struct peci_sensor_data die;
> > + struct peci_sensor_data dts;
> > + struct peci_sensor_data tcontrol;
> > + struct peci_sensor_data tthrottle;
> > + struct peci_sensor_data tjmax;
> > + struct peci_sensor_data core[CORETEMP_CHANNEL_NUMS];
> > + } temp;
> > + const char **coretemp_label;
> > + DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> > +};
> > +
> > +enum cputemp_channels {
> > + channel_die,
> > + channel_dts,
> > + channel_tcontrol,
> > + channel_tthrottle,
> > + channel_tjmax,
> > + channel_core,
> > +};
> > +
> > +static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
>
> static const char * const cputemp_label? (That is, const pointer to
> const char, rather than non-const pointer to const char.)
Ack.
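Something like the following, shown as a standalone sketch. BASE_CHANNEL_NUMS
assumes the rename discussed above, and cputemp_label_get() is a made-up helper
that exists only to exercise the table:

```c
#include <stddef.h>

#define BASE_CHANNEL_NUMS 5	/* assumed rename of DEFAULT_CHANNEL_NUMS */

/*
 * Const pointers to const chars: neither the strings nor the array slots
 * can be modified, so the whole table can be placed in read-only data.
 */
static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
	"Die",
	"DTS",
	"Tcontrol",
	"Tthrottle",
	"Tjmax",
};

/* Hypothetical accessor: returns NULL for channels outside the base set. */
static const char *cputemp_label_get(size_t channel)
{
	return channel < BASE_CHANNEL_NUMS ? cputemp_label[channel] : NULL;
}
```

With the second const in place, an accidental assignment such as
cputemp_label[0] = "x" is rejected at compile time.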
>
> > + "Die",
> > + "DTS",
> > + "Tcontrol",
> > + "Tthrottle",
> > + "Tjmax",
> > +};
> > +
> > +static int get_temp_targets(struct peci_cputemp *priv)
> > +{
> > + s32 tthrottle_offset, tcontrol_margin;
> > + u32 pcs;
> > + int ret;
> > +
> > + /*
> > + * Use only the tcontrol marker to determine whether the target values
> > + * need an update.
> > + */
> > + if (!peci_sensor_need_update(&priv->temp.tcontrol))
> > + return 0;
> > +
> > + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> > +
> > + tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> > + tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
> > +
> > + tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
> > +
> > + peci_sensor_mark_updated(&priv->temp.tcontrol);
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Processors return a value of DTS reading in S10.6 fixed point format
> > + * (sign, 10 bits signed integer value, 6 bits fractional).
>
> This parenthetical reads to me like it's describing 17 bits -- I'm not a
> PECI expert, but from my reading of the (somewhat skimpy) docs I've got
> on it I'd suggest a description more like "sign, 9-bit magnitude, 6-bit
> fraction".
You're right, adding "sign" here was not intentional.
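For reference, this is how I read the layout, as a userspace sketch rather
than the driver code. It assumes the 16-bit value is two's complement with the
low 6 bits as the fraction, and it ignores any reserved or error encodings; the
function name is made up for the example:

```c
#include <stdint.h>

#define DTS_FIXED_POINT_FRACTION	64	/* 2^6, as in the patch */
#define MILLIDEGREE_PER_DEGREE		1000

/*
 * Hypothetical helper: convert a raw 16-bit reading (10-bit signed integer
 * part, 6-bit fraction) to millidegrees Celsius.
 */
static int32_t dts_raw_to_millidegree(uint16_t raw)
{
	int16_t fixed = (int16_t)raw;	/* reinterpret bit 15 as the sign */

	return (int32_t)fixed * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
}
```

So 0x0040 decodes to 1000 (1 degree), 0x0020 to 500, and 0xffc0 to -1000,
which is 16 bits in total with no separate sign bit on top.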
I'll change it to:
"16 bits: sign, 9-bit magnitude, 6-bit fraction"
or
"16 bits: 10-bit signed magnitude, 6-bit fraction"
Thanks
-Iwona
On 7/30/21 2:51 PM, Winiarska, Iwona wrote:
> On Tue, 2021-07-27 at 07:06 +0000, Zev Weiss wrote:
>> On Mon, Jul 12, 2021 at 05:04:44PM CDT, Iwona Winiarska wrote:
>>> Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
>>> readings of the processor package and processor cores that are
>>> accessible via the PECI interface.
>>>
>>> The main use case for the driver (and PECI interface) is out-of-band
>>> management, where we're able to obtain the DTS readings from an external
>>> entity connected with PECI, e.g. BMC on server platforms.
>>>
>>> Co-developed-by: Jae Hyun Yoo <[email protected]>
>>> Signed-off-by: Jae Hyun Yoo <[email protected]>
>>> Signed-off-by: Iwona Winiarska <[email protected]>
>>> Reviewed-by: Pierre-Louis Bossart <[email protected]>
>>> ---
>>> MAINTAINERS | 7 +
>>> drivers/hwmon/Kconfig | 2 +
>>> drivers/hwmon/Makefile | 1 +
>>> drivers/hwmon/peci/Kconfig | 18 ++
>>> drivers/hwmon/peci/Makefile | 5 +
>>> drivers/hwmon/peci/common.h | 46 ++++
>>> drivers/hwmon/peci/cputemp.c | 503 +++++++++++++++++++++++++++++++++++
>>> 7 files changed, 582 insertions(+)
>>> create mode 100644 drivers/hwmon/peci/Kconfig
>>> create mode 100644 drivers/hwmon/peci/Makefile
>>> create mode 100644 drivers/hwmon/peci/common.h
>>> create mode 100644 drivers/hwmon/peci/cputemp.c
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index f47b5f634293..35ba9e3646bd 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -14504,6 +14504,13 @@ L: [email protected]
>>> S: Maintained
>>> F: drivers/platform/x86/peaq-wmi.c
>>>
>>> +PECI HARDWARE MONITORING DRIVERS
>>> +M: Iwona Winiarska <[email protected]>
>>> +R: Jae Hyun Yoo <[email protected]>
>>> +L: [email protected]
>>> +S: Supported
>>> +F: drivers/hwmon/peci/
>>> +
>>> PECI SUBSYSTEM
>>> M: Iwona Winiarska <[email protected]>
>>> R: Jae Hyun Yoo <[email protected]>
>>> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
>>> index e3675377bc5d..61c0e3404415 100644
>>> --- a/drivers/hwmon/Kconfig
>>> +++ b/drivers/hwmon/Kconfig
>>> @@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
>>> These devices are hard to detect and rarely found on mainstream
>>> hardware. If unsure, say N.
>>>
>>> +source "drivers/hwmon/peci/Kconfig"
>>> +
>>> source "drivers/hwmon/pmbus/Kconfig"
>>>
>>> config SENSORS_PWM_FAN
>>> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
>>> index d712c61c1f5e..f52331f212ed 100644
>>> --- a/drivers/hwmon/Makefile
>>> +++ b/drivers/hwmon/Makefile
>>> @@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350) += wm8350-hwmon.o
>>> obj-$(CONFIG_SENSORS_XGENE) += xgene-hwmon.o
>>>
>>> obj-$(CONFIG_SENSORS_OCC) += occ/
>>> +obj-$(CONFIG_SENSORS_PECI) += peci/
>>> obj-$(CONFIG_PMBUS) += pmbus/
>>>
>>> ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
>>> diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
>>> new file mode 100644
>>> index 000000000000..e10eed68d70a
>>> --- /dev/null
>>> +++ b/drivers/hwmon/peci/Kconfig
>>> @@ -0,0 +1,18 @@
>>> +# SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +config SENSORS_PECI_CPUTEMP
>>> + tristate "PECI CPU temperature monitoring client"
>>> + depends on PECI
>>> + select SENSORS_PECI
>>> + select PECI_CPU
>>> + help
>>> + If you say yes here you get support for the generic Intel PECI
>>> + cputemp driver which provides Digital Thermal Sensor (DTS) thermal
>>> + readings of the CPU package and CPU cores that are accessible via
>>> + the processor PECI interface.
>>> +
>>> + This driver can also be built as a module. If so, the module
>>> + will be called peci-cputemp.
>>> +
>>> +config SENSORS_PECI
>>> + tristate
>>> diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
>>> new file mode 100644
>>> index 000000000000..e8a0ada5ab1f
>>> --- /dev/null
>>> +++ b/drivers/hwmon/peci/Makefile
>>> @@ -0,0 +1,5 @@
>>> +# SPDX-License-Identifier: GPL-2.0-only
>>> +
>>> +peci-cputemp-y := cputemp.o
>>> +
>>> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
>>> diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
>>> new file mode 100644
>>> index 000000000000..54580c100d06
>>> --- /dev/null
>>> +++ b/drivers/hwmon/peci/common.h
>>> @@ -0,0 +1,46 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/* Copyright (c) 2021 Intel Corporation */
>>> +
>>> +#include <linux/types.h>
>>> +
>>> +#ifndef __PECI_HWMON_COMMON_H
>>> +#define __PECI_HWMON_COMMON_H
>>> +
>>> +#define UPDATE_INTERVAL_DEFAULT HZ
>>> +
>>> +/**
>>> + * struct peci_sensor_data - PECI sensor information
>>> + * @valid: flag to indicate the sensor value is valid
>>> + * @value: sensor value in milli units
>>> + * @last_updated: time of the last update in jiffies
>>> + */
>>> +struct peci_sensor_data {
>>> + unsigned int valid;
>>
>> From what I can see it looks like the 'valid' member here is strictly a
>> one-shot has-this-value-ever-been-set indicator, which seems a bit
>> wasteful to keep around forever post initialization; couldn't the same
>> information be inferred from checking last_updated != 0 or something?
>
> That's just expressed in jiffies, which means it can overflow (we're just
> unlikely to hit it - but IIUC it can happen).
> Doing it this way would require making sure that last_updated is never set to 0
> in code that does the update. I don't think it's worth to add more complexity
> there just to save a couple of bytes.
>
Correct. There are ways around that (e.g. by setting 'last_updated' to some time
in the past), but that isn't really worth the trouble.
'valid' should be bool, though, not "unsigned int".
Guenter
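
The point about jiffies wrapping can be illustrated with a small userspace sketch. It keeps a separate bool flag, as suggested above, and uses a wrap-safe comparison in the style of the kernel's time_after(). The names sensor_state, ticks_after() and sensor_need_update() are hypothetical stand-ins, and the tick counter is passed in explicitly so the logic can be exercised outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct peci_sensor_data. */
struct sensor_state {
	bool valid;			/* set once the first reading has been stored */
	unsigned long last_updated;	/* tick count of the last update; 0 is a legal value */
};

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool sensor_need_update(const struct sensor_state *s,
			       unsigned long now, unsigned long interval)
{
	return !s->valid || ticks_after(now, s->last_updated + interval);
}
```

Because last_updated may legitimately be 0 after the counter wraps, using it as a "never updated" sentinel would need extra care in the update path; the separate flag avoids that.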
>>
>>> + s32 value;
>>> + unsigned long last_updated;
>>> +};
>>> +
>>> +/**
>>> + * peci_sensor_need_update() - check whether sensor update is needed or not
>>> + * @sensor: pointer to sensor data struct
>>> + *
>>> + * Return: true if update is needed, false if not.
>>> + */
>>> +
>>> +static inline bool peci_sensor_need_update(struct peci_sensor_data *sensor)
>>> +{
>>> + return !sensor->valid ||
>>> + time_after(jiffies, sensor->last_updated + UPDATE_INTERVAL_DEFAULT);
>>> +}
>>> +
>>> +/**
>>> + * peci_sensor_mark_updated() - mark the sensor as updated
>>> + * @sensor: pointer to sensor data struct
>>> + */
>>> +static inline void peci_sensor_mark_updated(struct peci_sensor_data *sensor)
>>> +{
>>> + sensor->valid = 1;
>>> + sensor->last_updated = jiffies;
>>> +}
>>> +
>>> +#endif /* __PECI_HWMON_COMMON_H */
>>> diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
>>> new file mode 100644
>>> index 000000000000..56a526471687
>>> --- /dev/null
>>> +++ b/drivers/hwmon/peci/cputemp.c
>>> @@ -0,0 +1,503 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +// Copyright (c) 2018-2021 Intel Corporation
>>> +
>>> +#include <linux/auxiliary_bus.h>
>>> +#include <linux/bitfield.h>
>>> +#include <linux/bitops.h>
>>> +#include <linux/hwmon.h>
>>> +#include <linux/jiffies.h>
>>> +#include <linux/module.h>
>>> +#include <linux/peci.h>
>>> +#include <linux/peci-cpu.h>
>>> +#include <linux/units.h>
>>> +#include <linux/x86/intel-family.h>
>>> +
>>> +#include "common.h"
>>> +
>>> +#define CORE_NUMS_MAX 64
>>> +
>>> +#define DEFAULT_CHANNEL_NUMS 5
>>
>> DEFAULT_ seems like a slightly odd prefix for this (it's not something
>> that can really be overridden or anything); would BASE_ perhaps be a bit
>> more appropriate?
>
> Ack.
>
>>
>>> +#define CORETEMP_CHANNEL_NUMS CORE_NUMS_MAX
>>> +#define CPUTEMP_CHANNEL_NUMS (DEFAULT_CHANNEL_NUMS + CORETEMP_CHANNEL_NUMS)
>>> +
>>> +#define TEMP_TARGET_FAN_TEMP_MASK GENMASK(15, 8)
>>> +#define TEMP_TARGET_REF_TEMP_MASK GENMASK(23, 16)
>>> +#define TEMP_TARGET_TJ_OFFSET_MASK GENMASK(29, 24)
>>> +
>>> +#define DTS_MARGIN_MASK GENMASK(15, 0)
>>> +#define PCS_MODULE_TEMP_MASK GENMASK(15, 0)
>>> +
>>> +#define DTS_FIXED_POINT_FRACTION 64
>>> +
>>> +struct resolved_cores_reg {
>>> + u8 bus;
>>> + u8 dev;
>>> + u8 func;
>>> + u8 offset;
>>> +};
>>> +
>>> +struct cpu_info {
>>> + struct resolved_cores_reg *reg;
>>> + u8 min_peci_revision;
>>
>> As with the dimmtemp driver, min_peci_revision appears unused here,
>> though in this case if it were removed there'd only be one (pointer)
>> member left in struct cpu_info, so we could perhaps remove it as well
>> and then also a level of indirection in peci_cputemp_ids/cpu_{hsx,icx}
>> too?
>
> As I mentioned in reply to previous patch comment, it'll be used to validate if
> PECI device revision matches driver requirements.
>
>>
>>> +};
>>> +
>>> +struct peci_cputemp {
>>> + struct peci_device *peci_dev;
>>> + struct device *dev;
>>> + const char *name;
>>> + const struct cpu_info *gen_info;
>>> + struct {
>>> + struct peci_sensor_data die;
>>> + struct peci_sensor_data dts;
>>> + struct peci_sensor_data tcontrol;
>>> + struct peci_sensor_data tthrottle;
>>> + struct peci_sensor_data tjmax;
>>> + struct peci_sensor_data core[CORETEMP_CHANNEL_NUMS];
>>> + } temp;
>>> + const char **coretemp_label;
>>> + DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
>>> +};
>>> +
>>> +enum cputemp_channels {
>>> + channel_die,
>>> + channel_dts,
>>> + channel_tcontrol,
>>> + channel_tthrottle,
>>> + channel_tjmax,
>>> + channel_core,
>>> +};
>>> +
>>> +static const char *cputemp_label[DEFAULT_CHANNEL_NUMS] = {
>>
>> static const char * const cputemp_label? (That is, const pointer to
>> const char, rather than non-const pointer to const char.)
>
> Ack.
>
>>
>>> + "Die",
>>> + "DTS",
>>> + "Tcontrol",
>>> + "Tthrottle",
>>> + "Tjmax",
>>> +};
>>> +
>>> +static int get_temp_targets(struct peci_cputemp *priv)
>>> +{
>>> + s32 tthrottle_offset, tcontrol_margin;
>>> + u32 pcs;
>>> + int ret;
>>> +
>>> + /*
>>> + * Just use only the tcontrol marker to determine if target values need
>>> + * update.
>>> + */
>>> + if (!peci_sensor_need_update(&priv->temp.tcontrol))
>>> + return 0;
>>> +
>>> + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + priv->temp.tjmax.value = FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
>>> +
>>> + tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
>>> + tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
>>> + priv->temp.tcontrol.value = priv->temp.tjmax.value - tcontrol_margin;
>>> +
>>> + tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
>>> + priv->temp.tthrottle.value = priv->temp.tjmax.value - tthrottle_offset;
>>> +
>>> + peci_sensor_mark_updated(&priv->temp.tcontrol);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/*
>>> + * Processors return a value of DTS reading in S10.6 fixed point format
>>> + * (sign, 10 bits signed integer value, 6 bits fractional).
>>
>> This parenthetical reads to me like it's describing 17 bits -- I'm not a
>> PECI expert, but from my reading of the (somewhat skimpy) docs I've got
>> on it I'd suggest a description more like "sign, 9-bit magnitude, 6-bit
>> fraction".
>
> You're right, adding "sign" here was not intentional.
> I'll change it to:
> "16 bits: sign, 9-bit magnitude, 6-bit fraction"
> or
> "16 bits: 10-bit signed magnitude, 6-bit fraction"
>
> Thanks
> -Iwona
>
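
The arithmetic in the quoted get_temp_targets() can be sketched in plain userspace C as follows. The bit positions follow the GENMASK() definitions in the patch (fan temperature margin in bits 15:8, reference temperature in bits 23:16, Tj offset in bits 29:24); the struct temp_targets type and the decode_temp_target() name are hypothetical, and the shifts and masks stand in for the kernel's FIELD_GET() and sign_extend32().

```c
#include <stdint.h>

#define MILLIDEGREE_PER_DEGREE 1000

/* Hypothetical container for the three derived values, in millidegrees. */
struct temp_targets {
	int32_t tjmax;
	int32_t tcontrol;
	int32_t tthrottle;
};

static struct temp_targets decode_temp_target(uint32_t pcs)
{
	struct temp_targets t;
	int32_t margin = (pcs >> 8) & 0xff;	/* bits 15:8, signed 8-bit margin */
	int32_t offset = (pcs >> 24) & 0x3f;	/* bits 29:24, unsigned offset */

	/* Sign-extend the 8-bit margin, as sign_extend32(x, 7) does. */
	if (margin & 0x80)
		margin -= 0x100;

	t.tjmax = (int32_t)((pcs >> 16) & 0xff) * MILLIDEGREE_PER_DEGREE;
	t.tcontrol = t.tjmax - margin * MILLIDEGREE_PER_DEGREE;
	t.tthrottle = t.tjmax - offset * MILLIDEGREE_PER_DEGREE;

	return t;
}
```

For a register value of 0x02640A00 this yields Tjmax = 100000, Tcontrol = 90000 and Tthrottle = 98000 millidegrees.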
On Mon, 2021-07-26 at 22:08 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:45PM CDT, Iwona Winiarska wrote:
> > Add peci-dimmtemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of DIMMs that are accessible via the processor PECI interface.
> >
> > The main use case for the driver (and PECI interface) is out-of-band
> > management, where we're able to obtain the DTS readings from an external
> > entity connected with PECI, e.g. BMC on server platforms.
> >
> > Co-developed-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > drivers/hwmon/peci/Kconfig | 13 +
> > drivers/hwmon/peci/Makefile | 2 +
> > drivers/hwmon/peci/dimmtemp.c | 508 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 523 insertions(+)
> > create mode 100644 drivers/hwmon/peci/dimmtemp.c
> >
> > diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> > index e10eed68d70a..f2d57efa508b 100644
> > --- a/drivers/hwmon/peci/Kconfig
> > +++ b/drivers/hwmon/peci/Kconfig
> > @@ -14,5 +14,18 @@ config SENSORS_PECI_CPUTEMP
> > This driver can also be built as a module. If so, the module
> > will be called peci-cputemp.
> >
> > +config SENSORS_PECI_DIMMTEMP
> > + tristate "PECI DIMM temperature monitoring client"
> > + depends on PECI
> > + select SENSORS_PECI
> > + select PECI_CPU
> > + help
> > + If you say yes here you get support for the generic Intel PECI hwmon
> > + driver which provides Digital Thermal Sensor (DTS) thermal readings of
> > + DIMM components that are accessible via the processor PECI interface.
> > +
> > + This driver can also be built as a module. If so, the module
> > + will be called peci-dimmtemp.
> > +
> > config SENSORS_PECI
> > tristate
> > diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> > index e8a0ada5ab1f..191cfa0227f3 100644
> > --- a/drivers/hwmon/peci/Makefile
> > +++ b/drivers/hwmon/peci/Makefile
> > @@ -1,5 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> >
> > peci-cputemp-y := cputemp.o
> > +peci-dimmtemp-y := dimmtemp.o
> >
> > obj-$(CONFIG_SENSORS_PECI_CPUTEMP) += peci-cputemp.o
> > +obj-$(CONFIG_SENSORS_PECI_DIMMTEMP) += peci-dimmtemp.o
> > diff --git a/drivers/hwmon/peci/dimmtemp.c b/drivers/hwmon/peci/dimmtemp.c
> > new file mode 100644
> > index 000000000000..2fcb8607137a
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/dimmtemp.c
> > @@ -0,0 +1,508 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/units.h>
> > +#include <linux/workqueue.h>
> > +#include <linux/x86/intel-family.h>
> > +
> > +#include "common.h"
> > +
> > +#define DIMM_MASK_CHECK_DELAY_JIFFIES msecs_to_jiffies(5000)
> > +#define DIMM_MASK_CHECK_RETRY_MAX 60 /* 60 x 5 secs = 5 minutes */
> > +
> > +/* Max number of channel ranks and DIMM index per channel */
> > +#define CHAN_RANK_MAX_ON_HSX 8
> > +#define DIMM_IDX_MAX_ON_HSX 3
> > +#define CHAN_RANK_MAX_ON_BDX 4
> > +#define DIMM_IDX_MAX_ON_BDX 3
> > +#define CHAN_RANK_MAX_ON_BDXD 2
> > +#define DIMM_IDX_MAX_ON_BDXD 2
> > +#define CHAN_RANK_MAX_ON_SKX 6
> > +#define DIMM_IDX_MAX_ON_SKX 2
> > +#define CHAN_RANK_MAX_ON_ICX 8
> > +#define DIMM_IDX_MAX_ON_ICX 2
> > +#define CHAN_RANK_MAX_ON_ICXD 4
> > +#define DIMM_IDX_MAX_ON_ICXD 2
> > +
> > +#define CHAN_RANK_MAX CHAN_RANK_MAX_ON_HSX
> > +#define DIMM_IDX_MAX DIMM_IDX_MAX_ON_HSX
> > +#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)
>
> Should we perhaps have a static_assert(DIMM_NUMS_MAX <= 64) so that
> check_populated_dimms() doesn't silently break if we ever have a system
> with > 64 dimms? (Not sure how far off that might be, but doesn't seem
> *that* wildly inconceivable, anyway.)
Ack, I'll add an assert and warn in check_populated_dimms():
BUILD_BUG_ON(DIMM_NUMS_MAX > 64);
if (chan_rank_max * dimm_idx_max > DIMM_NUMS_MAX) {
WARN_ONCE(1, "Unsupported number of DIMMs");
return -EINVAL;
}
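
A userspace sketch of the same two-level check, for illustration only: C11 _Static_assert stands in for the kernel's BUILD_BUG_ON()/static_assert(), and the runtime part returns -EINVAL without the WARN_ONCE(). The function name check_dimm_geometry() is hypothetical; the constants are the Haswell-X maxima from the patch.

```c
#include <errno.h>

#define CHAN_RANK_MAX 8		/* CHAN_RANK_MAX_ON_HSX in the patch */
#define DIMM_IDX_MAX 3		/* DIMM_IDX_MAX_ON_HSX in the patch */
#define DIMM_NUMS_MAX (CHAN_RANK_MAX * DIMM_IDX_MAX)

/* Compile-time guard: the populated-DIMM mask is built in a 64-bit variable. */
_Static_assert(DIMM_NUMS_MAX <= 64, "DIMM mask must fit in 64 bits");

/* Hypothetical helper: reject per-generation limits larger than the arrays. */
static int check_dimm_geometry(int chan_rank_max, int dimm_idx_max)
{
	if (chan_rank_max * dimm_idx_max > DIMM_NUMS_MAX)
		return -EINVAL;

	return 0;
}
```

The compile-time check catches a change to the global maxima, while the runtime check catches a new per-CPU entry whose limits were not reflected in CHAN_RANK_MAX and DIMM_IDX_MAX.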
>
> On a similar note, it'd be nice if there were some neat way of
> automating the maintenance of CHAN_RANK_MAX and DIMM_IDX_MAX, but I
> don't know of any great solutions for that offhand. (Shrug.)
With the added WARN, it should be easy enough to catch without it becoming an
issue.
>
> > +
> > +#define CPU_SEG_MASK GENMASK(23, 16)
> > +#define GET_CPU_SEG(x) (((x) & CPU_SEG_MASK) >> 16)
> > +#define CPU_BUS_MASK GENMASK(7, 0)
> > +#define GET_CPU_BUS(x) ((x) & CPU_BUS_MASK)
> > +
> > +#define DIMM_TEMP_MAX GENMASK(15, 8)
> > +#define DIMM_TEMP_CRIT GENMASK(23, 16)
> > +#define GET_TEMP_MAX(x) (((x) & DIMM_TEMP_MAX) >> 8)
> > +#define GET_TEMP_CRIT(x) (((x) & DIMM_TEMP_CRIT) >> 16)
> > +
> > +struct dimm_info {
> > + int chan_rank_max;
> > + int dimm_idx_max;
> > + u8 min_peci_revision;
>
> This field doesn't seem to be used for anything that I can see; is it
> really needed?
Just like in cputemp - for device sanity check.
>
> > +};
> > +
> > +struct peci_dimmtemp {
> > + struct peci_device *peci_dev;
> > + struct device *dev;
> > + const char *name;
> > + const struct dimm_info *gen_info;
> > + struct delayed_work detect_work;
> > + struct peci_sensor_data temp[DIMM_NUMS_MAX];
> > + long temp_max[DIMM_NUMS_MAX];
> > + long temp_crit[DIMM_NUMS_MAX];
> > + int retry_count;
> > + char **dimmtemp_label;
> > + DECLARE_BITMAP(dimm_mask, DIMM_NUMS_MAX);
> > +};
> > +
> > +static u8 __dimm_temp(u32 reg, int dimm_order)
> > +{
> > + return (reg >> (dimm_order * 8)) & 0xff;
> > +}
> > +
> > +static int get_dimm_temp(struct peci_dimmtemp *priv, int dimm_no)
> > +{
> > + int dimm_order = dimm_no % priv->gen_info->dimm_idx_max;
> > + int chan_rank = dimm_no / priv->gen_info->dimm_idx_max;
> > + struct peci_device *peci_dev = priv->peci_dev;
> > + u8 cpu_seg, cpu_bus, dev, func;
> > + u64 offset;
> > + u32 data;
> > + u16 reg;
> > + int ret;
> > +
> > + if (!peci_sensor_need_update(&priv->temp[dimm_no]))
> > + return 0;
> > +
> > + ret = peci_pcs_read(peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp[dimm_no].value = __dimm_temp(data, dimm_order) * MILLIDEGREE_PER_DEGREE;
> > +
> > + switch (peci_dev->info.model) {
> > + case INTEL_FAM6_ICELAKE_X:
> > + case INTEL_FAM6_ICELAKE_D:
> > + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd4, &data);
> > + if (ret || !(data & BIT(31)))
> > + break; /* Use default or previous value */
> > +
> > + ret = peci_ep_pci_local_read(peci_dev, 0, 13, 0, 2, 0xd0, &data);
> > + if (ret)
> > + break; /* Use default or previous value */
> > +
> > + cpu_seg = GET_CPU_SEG(data);
> > + cpu_bus = GET_CPU_BUS(data);
> > +
> > + /*
> > + * Device 26, Offset 224e0: IMC 0 channel 0 -> rank 0
> > + * Device 26, Offset 264e0: IMC 0 channel 1 -> rank 1
> > + * Device 27, Offset 224e0: IMC 1 channel 0 -> rank 2
> > + * Device 27, Offset 264e0: IMC 1 channel 1 -> rank 3
> > + * Device 28, Offset 224e0: IMC 2 channel 0 -> rank 4
> > + * Device 28, Offset 264e0: IMC 2 channel 1 -> rank 5
> > + * Device 29, Offset 224e0: IMC 3 channel 0 -> rank 6
> > + * Device 29, Offset 264e0: IMC 3 channel 1 -> rank 7
> > + */
> > + dev = 0x1a + chan_rank / 2;
> > + offset = 0x224e0 + dimm_order * 4;
> > + if (chan_rank % 2)
> > + offset += 0x4000;
> > +
> > + ret = peci_mmio_read(peci_dev, 0, cpu_seg, cpu_bus, dev, 0, offset, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
>
> These two lines look identical in all (non-default) cases; should we
> deduplicate them by just moving them to after the switch?
I'll refactor this.
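
One possible shape for that deduplication, sketched in userspace C: each switch case would only fetch the raw threshold register, and a single helper would decode it afterwards. This is an assumption about how the refactor could look, not the author's actual change. The bit positions follow DIMM_TEMP_MAX (bits 15:8) and DIMM_TEMP_CRIT (bits 23:16) from the patch; struct dimm_thresholds and decode_dimm_thresholds() are hypothetical names.

```c
#include <stdint.h>

#define MILLIDEGREE_PER_DEGREE 1000

/* Hypothetical pair of decoded limits, in millidegrees Celsius. */
struct dimm_thresholds {
	long max;
	long crit;
};

/* Decode both limits from the raw register value read in any switch case. */
static struct dimm_thresholds decode_dimm_thresholds(uint32_t data)
{
	struct dimm_thresholds t;

	t.max = (long)((data >> 8) & 0xff) * MILLIDEGREE_PER_DEGREE;	/* bits 15:8 */
	t.crit = (long)((data >> 16) & 0xff) * MILLIDEGREE_PER_DEGREE;	/* bits 23:16 */

	return t;
}
```

One detail to watch in such a refactor: the Ice Lake cases can break out early to keep the default or previous values, so the shared decode after the switch would have to be skipped on that path.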
>
> > +
> > + break;
> > + case INTEL_FAM6_SKYLAKE_X:
> > + /*
> > + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
> > + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
> > + * Device 11, Function 2: IMC 0 channel 2 -> rank 2
> > + * Device 12, Function 2: IMC 1 channel 0 -> rank 3
> > + * Device 12, Function 6: IMC 1 channel 1 -> rank 4
> > + * Device 13, Function 2: IMC 1 channel 2 -> rank 5
> > + */
> > + dev = 10 + chan_rank / 3 * 2 + (chan_rank % 3 == 2 ? 1 : 0);
> > + func = chan_rank % 3 == 1 ? 6 : 2;
> > + reg = 0x120 + dimm_order * 4;
> > +
> > + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + case INTEL_FAM6_BROADWELL_D:
> > + /*
> > + * Device 10, Function 2: IMC 0 channel 0 -> rank 0
> > + * Device 10, Function 6: IMC 0 channel 1 -> rank 1
> > + * Device 12, Function 2: IMC 1 channel 0 -> rank 2
> > + * Device 12, Function 6: IMC 1 channel 1 -> rank 3
> > + */
> > + dev = 10 + chan_rank / 2 * 2;
> > + func = (chan_rank % 2) ? 6 : 2;
> > + reg = 0x120 + dimm_order * 4;
> > +
> > + ret = peci_pci_local_read(peci_dev, 2, dev, func, reg, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + case INTEL_FAM6_HASWELL_X:
> > + case INTEL_FAM6_BROADWELL_X:
> > + /*
> > + * Device 20, Function 0: IMC 0 channel 0 -> rank 0
> > + * Device 20, Function 1: IMC 0 channel 1 -> rank 1
> > + * Device 21, Function 0: IMC 0 channel 2 -> rank 2
> > + * Device 21, Function 1: IMC 0 channel 3 -> rank 3
> > + * Device 23, Function 0: IMC 1 channel 0 -> rank 4
> > + * Device 23, Function 1: IMC 1 channel 1 -> rank 5
> > + * Device 24, Function 0: IMC 1 channel 2 -> rank 6
> > + * Device 24, Function 1: IMC 1 channel 3 -> rank 7
> > + */
> > + dev = 20 + chan_rank / 2 + chan_rank / 4;
> > + func = chan_rank % 2;
> > + reg = 0x120 + dimm_order * 4;
> > +
> > + ret = peci_pci_local_read(peci_dev, 1, dev, func, reg, &data);
> > + if (ret)
> > + return ret;
> > +
> > + priv->temp_max[dimm_no] = GET_TEMP_MAX(data) * MILLIDEGREE_PER_DEGREE;
> > + priv->temp_crit[dimm_no] = GET_TEMP_CRIT(data) * MILLIDEGREE_PER_DEGREE;
> > +
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + peci_sensor_mark_updated(&priv->temp[dimm_no]);
> > +
> > + return 0;
> > +}
> > +
> > +static int dimmtemp_read_string(struct device *dev,
> > + enum hwmon_sensor_types type,
> > + u32 attr, int channel, const char **str)
> > +{
> > + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
> > +
> > + if (attr != hwmon_temp_label)
> > + return -EOPNOTSUPP;
> > +
> > + *str = (const char *)priv->dimmtemp_label[channel];
> > +
> > + return 0;
> > +}
> > +
> > +static int dimmtemp_read(struct device *dev, enum hwmon_sensor_types type,
> > + u32 attr, int channel, long *val)
> > +{
> > + struct peci_dimmtemp *priv = dev_get_drvdata(dev);
> > + int ret;
> > +
> > + ret = get_dimm_temp(priv, channel);
> > + if (ret)
> > + return ret;
> > +
> > + switch (attr) {
> > + case hwmon_temp_input:
> > + *val = priv->temp[channel].value;
> > + break;
> > + case hwmon_temp_max:
> > + *val = priv->temp_max[channel];
> > + break;
> > + case hwmon_temp_crit:
> > + *val = priv->temp_crit[channel];
> > + break;
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static umode_t dimmtemp_is_visible(const void *data, enum hwmon_sensor_types type,
> > + u32 attr, int channel)
> > +{
> > + const struct peci_dimmtemp *priv = data;
> > +
> > + if (test_bit(channel, priv->dimm_mask))
> > + return 0444;
> > +
> > + return 0;
> > +}
> > +
> > +static const struct hwmon_ops peci_dimmtemp_ops = {
> > + .is_visible = dimmtemp_is_visible,
> > + .read_string = dimmtemp_read_string,
> > + .read = dimmtemp_read,
> > +};
> > +
> > +static int check_populated_dimms(struct peci_dimmtemp *priv)
> > +{
> > + int chan_rank_max = priv->gen_info->chan_rank_max;
> > + int dimm_idx_max = priv->gen_info->dimm_idx_max;
> > + int chan_rank, dimm_idx, ret;
> > + u64 dimm_mask = 0;
> > + u32 pcs;
> > +
> > + for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++) {
> > + ret = peci_pcs_read(priv->peci_dev, PECI_PCS_DDR_DIMM_TEMP, chan_rank, &pcs);
> > + if (ret) {
> > + /*
> > + * Overall, we expect either success or -EINVAL in
> > + * order to determine whether DIMM is populated or not.
> > + * For anything else - we fall back to deferring the
> > + * detection to be performed at a later point in time.
> > + */
> > + if (ret == -EINVAL)
> > + continue;
> > + else
> > + return -EAGAIN;
> > + }
> > +
> > + for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
> > + if (__dimm_temp(pcs, dimm_idx))
> > + dimm_mask |= BIT(chan_rank * dimm_idx_max + dimm_idx);
> > + }
> > + /*
> > + * It's possible that memory training is not done yet. In this case we
> > + * defer the detection to be performed at a later point in time.
> > + */
> > + if (!dimm_mask)
> > + return -EAGAIN;
> > +
> > + dev_dbg(priv->dev, "Scanned populated DIMMs: %#llx\n", dimm_mask);
>
> Hmm, though aside from this one debug print it seems like this function
> could just as easily operate directly on priv->dimm_mask if we wanted to
> make it safe for >64 dimms (I have no particular objection to keeping it
> as-is for now though).
I'll leave it as is for now.
Thanks
-Iwona
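
The mask-building loop discussed in this message can be sketched in userspace C as below. The per-channel register values are passed in as an array instead of being read over PECI, and the function names dimm_temp_byte() and build_dimm_mask() are hypothetical. A 64-bit constant is used for the shift so that bit positions of 32 and above stay well-defined on 32-bit targets; that is a choice made for this sketch, not something taken from the patch.

```c
#include <stdint.h>

/* Same extraction as __dimm_temp() in the patch: one temperature byte per DIMM. */
static uint8_t dimm_temp_byte(uint32_t reg, int dimm_order)
{
	return (reg >> (dimm_order * 8)) & 0xff;
}

/*
 * Hypothetical helper: pcs[] holds one already-read register value per
 * channel rank. A non-zero temperature byte is treated as "DIMM populated".
 */
static uint64_t build_dimm_mask(const uint32_t *pcs, int chan_rank_max,
				int dimm_idx_max)
{
	uint64_t mask = 0;
	int chan_rank, dimm_idx;

	for (chan_rank = 0; chan_rank < chan_rank_max; chan_rank++)
		for (dimm_idx = 0; dimm_idx < dimm_idx_max; dimm_idx++)
			if (dimm_temp_byte(pcs[chan_rank], dimm_idx))
				mask |= UINT64_C(1) << (chan_rank * dimm_idx_max + dimm_idx);

	return mask;
}
```

With three channel ranks of two DIMMs each and readings { 0x00001E00, 0, 0x0000001F }, the mask is 0x12: DIMM 1 on rank 0 and DIMM 0 on rank 2. An all-zero mask corresponds to the "memory training not done yet" case in which the driver defers detection.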
On Tue, 2021-07-27 at 22:58 +0000, Zev Weiss wrote:
> On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
> > From: Jae Hyun Yoo <[email protected]>
> >
> > Add documentation for peci-cputemp driver that provides DTS thermal
> > readings for CPU packages and CPU cores and peci-dimmtemp driver that
> > provides DTS thermal readings for DIMMs.
> >
> > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > Co-developed-by: Iwona Winiarska <[email protected]>
> > Signed-off-by: Iwona Winiarska <[email protected]>
> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > ---
> > Documentation/hwmon/index.rst | 2 +
> > Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
> > Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
> > MAINTAINERS | 2 +
> > 4 files changed, 155 insertions(+)
> > create mode 100644 Documentation/hwmon/peci-cputemp.rst
> > create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> >
> > diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> > index bc01601ea81a..cc76b5b3f791 100644
> > --- a/Documentation/hwmon/index.rst
> > +++ b/Documentation/hwmon/index.rst
> > @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
> > pcf8591
> > pim4328
> > pm6764tr
> > + peci-cputemp
> > + peci-dimmtemp
> > pmbus
> > powr1220
> > pxe1610
> > diff --git a/Documentation/hwmon/peci-cputemp.rst b/Documentation/hwmon/peci-cputemp.rst
> > new file mode 100644
> > index 000000000000..d3a218ba810a
> > --- /dev/null
> > +++ b/Documentation/hwmon/peci-cputemp.rst
> > @@ -0,0 +1,93 @@
> > +.. SPDX-License-Identifier: GPL-2.0-only
> > +
> > +Kernel driver peci-cputemp
> > +==========================
> > +
> > +Supported chips:
> > + One of Intel server CPUs listed below which is connected to a PECI bus.
> > + * Intel Xeon E5/E7 v3 server processors
> > + Intel Xeon E5-14xx v3 family
> > + Intel Xeon E5-24xx v3 family
> > + Intel Xeon E5-16xx v3 family
> > + Intel Xeon E5-26xx v3 family
> > + Intel Xeon E5-46xx v3 family
> > + Intel Xeon E7-48xx v3 family
> > + Intel Xeon E7-88xx v3 family
> > + * Intel Xeon E5/E7 v4 server processors
> > + Intel Xeon E5-16xx v4 family
> > + Intel Xeon E5-26xx v4 family
> > + Intel Xeon E5-46xx v4 family
> > + Intel Xeon E7-48xx v4 family
> > + Intel Xeon E7-88xx v4 family
> > + * Intel Xeon Scalable server processors
> > + Intel Xeon D family
> > + Intel Xeon Bronze family
> > + Intel Xeon Silver family
> > + Intel Xeon Gold family
> > + Intel Xeon Platinum family
> > +
> > + Datasheet: Available from http://www.intel.com/design/literature.htm
> > +
> > +Author: Jae Hyun Yoo <[email protected]>
> > +
> > +Description
> > +-----------
> > +
> > +This driver implements a generic PECI hwmon feature which provides Digital
> > +Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
> > +accessible via the processor PECI interface.
> > +
> > +All temperature values are given in millidegree Celsius and will be measurable
> > +only when the target CPU is powered on.
> > +
> > +Sysfs interface
> > +-------------------
> > +
> > +======================= =======================================================
> > +temp1_label "Die"
> > +temp1_input Provides current die temperature of the CPU package.
> > +temp1_max Provides thermal control temperature of the CPU package
> > + which is also known as Tcontrol.
> > +temp1_crit Provides shutdown temperature of the CPU package which
> > + is also known as the maximum processor junction
> > + temperature, Tjmax or Tprochot.
> > +temp1_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
> > + the CPU package.
> > +
> > +temp2_label "DTS"
> > +temp2_input Provides current DTS temperature of the CPU package.
>
> Would this be a good place to note the slightly counter-intuitive nature
> of DTS readings? i.e. add something along the lines of "The DTS sensor
> produces a delta relative to Tjmax, so negative values are normal and
> values approaching zero are hot." (In my experience people who aren't
> already familiar with it tend to think something's wrong when a CPU
> temperature reading shows -50C.)
I believe that what you're referring to is a result of "GetTemp", and we're
using it to calculate "Die" sensor values (temp1).
The sensor value is absolute - we don't expose "raw" thermal sensor value
(delta) anywhere.
DTS sensor is exposing temperature value scaled to fit DTS 2.0 thermal profile:
https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-thermal-guide.html
(section 5.2.3.2)
Similar to "Die" sensor - it's also exposed in absolute form.
I'll try to change description to avoid confusion.
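
To make the distinction concrete: assuming the "Die" value is obtained by adding the (normally negative) GetTemp margin to Tjmax, the conversion to an absolute temperature is a single addition. This is an assumption about the calculation for illustration, not a quote of the driver, and die_temp_millidegree() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: both arguments are in millidegrees Celsius.
 * margin_mdeg is the delta relative to Tjmax, so it is zero or negative
 * during normal operation.
 */
static int32_t die_temp_millidegree(int32_t tjmax_mdeg, int32_t margin_mdeg)
{
	return tjmax_mdeg + margin_mdeg;
}
```

With Tjmax at 100000 and a raw margin of -50000, the exposed value would be 50000, which is why a reading of "-50C" is not expected to show up in sysfs.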
>
> > +temp2_max Provides thermal control temperature of the CPU package
> > + which is also known as Tcontrol.
> > +temp2_crit Provides shutdown temperature of the CPU package which
> > + is also known as the maximum processor junction
> > + temperature, Tjmax or Tprochot.
> > +temp2_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
> > + the CPU package.
> > +
> > +temp3_label "Tcontrol"
> > +temp3_input Provides current Tcontrol temperature of the CPU
> > + package which is also known as Fan Temperature target.
> > + Indicates the relative value from thermal monitor trip
> > + temperature at which fans should be engaged.
> > +temp3_crit Provides Tcontrol critical value of the CPU package
> > + which is the same as Tjmax.
> > +
> > +temp4_label "Tthrottle"
> > +temp4_input Provides current Tthrottle temperature of the CPU
> > + package. Used for throttling temperature. If this value
> > + is allowed and lower than Tjmax - the throttle will
> > + occur and reported at lower than Tjmax.
> > +
> > +temp5_label "Tjmax"
> > +temp5_input Provides the maximum junction temperature, Tjmax of the
> > + CPU package.
> > +
> > +temp[6-N]_label Provides string "Core X", where X is resolved core
> > + number.
> > +temp[6-N]_input Provides current temperature of each core.
> > +temp[6-N]_max Provides thermal control temperature of the core.
> > +temp[6-N]_crit Provides shutdown temperature of the core.
> > +temp[6-N]_crit_hyst Provides the hysteresis value from Tcontrol to Tjmax of
> > + the core.
>
> I only see *_label and *_input for the per-core temperature sensors, no
> *_max, *_crit, or *_crit_hyst.
You're right - this should be removed from documentation.
>
> > +
> > +======================= =======================================================
> > diff --git a/Documentation/hwmon/peci-dimmtemp.rst b/Documentation/hwmon/peci-dimmtemp.rst
> > new file mode 100644
> > index 000000000000..1778d9317e43
> > --- /dev/null
> > +++ b/Documentation/hwmon/peci-dimmtemp.rst
> > @@ -0,0 +1,58 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +Kernel driver peci-dimmtemp
> > +===========================
> > +
> > +Supported chips:
> > + One of Intel server CPUs listed below which is connected to a PECI bus.
> > + * Intel Xeon E5/E7 v3 server processors
> > + Intel Xeon E5-14xx v3 family
> > + Intel Xeon E5-24xx v3 family
> > + Intel Xeon E5-16xx v3 family
> > + Intel Xeon E5-26xx v3 family
> > + Intel Xeon E5-46xx v3 family
> > + Intel Xeon E7-48xx v3 family
> > + Intel Xeon E7-88xx v3 family
> > + * Intel Xeon E5/E7 v4 server processors
> > + Intel Xeon E5-16xx v4 family
> > + Intel Xeon E5-26xx v4 family
> > + Intel Xeon E5-46xx v4 family
> > + Intel Xeon E7-48xx v4 family
> > + Intel Xeon E7-88xx v4 family
> > + * Intel Xeon Scalable server processors
> > + Intel Xeon D family
> > + Intel Xeon Bronze family
> > + Intel Xeon Silver family
> > + Intel Xeon Gold family
> > + Intel Xeon Platinum family
> > +
> > + Datasheet: Available from http://www.intel.com/design/literature.htm
> > +
> > +Author: Jae Hyun Yoo <[email protected]>
> > +
> > +Description
> > +-----------
> > +
> > +This driver implements a generic PECI hwmon feature which provides Digital
> > +Thermal Sensor (DTS) thermal readings of DIMM components that are accessible
> > +via the processor PECI interface.
>
> I had thought "DTS" referred to a fairly specific sensor in the CPU; is
> the same term also used for DIMM temp sensors or is the mention of it
> here a copy/paste error?
Yeah - it should be "Temperature Sensor on DIMM".
Thanks
-Iwona
>
> > +
> > +All temperature values are given in millidegree Celsius and will be measurable
> > +only when the target CPU is powered on.
> > +
> > +Sysfs interface
> > +-------------------
> > +
> > +=======================
> > =======================================================
> > +
> > +temp[N]_label Provides string "DIMM CI", where C is DIMM channel and
> > + I is DIMM index of the populated DIMM.
> > +temp[N]_input Provides current temperature of the populated DIMM.
> > +temp[N]_max Provides thermal control temperature of the DIMM.
> > +temp[N]_crit Provides shutdown temperature of the DIMM.
> > +
> > +=======================
> > =======================================================
> > +
> > +Note:
> > + DIMM temperature attributes will appear when the client CPU's BIOS
> > + completes memory training and testing.
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 35ba9e3646bd..d16da127bbdc 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14509,6 +14509,8 @@ M: Iwona Winiarska <[email protected]>
> > R: Jae Hyun Yoo <[email protected]>
> > L: [email protected]
> > S: Supported
> > +F: Documentation/hwmon/peci-cputemp.rst
> > +F: Documentation/hwmon/peci-dimmtemp.rst
> > F: drivers/hwmon/peci/
> >
> > PECI SUBSYSTEM
> > --
> > 2.31.1
On Tue, 2021-07-27 at 17:49 -0700, Guenter Roeck wrote:
> On 7/27/21 3:58 PM, Zev Weiss wrote:
> > On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
> > > From: Jae Hyun Yoo <[email protected]>
> > >
> > > Add documentation for peci-cputemp driver that provides DTS thermal
> > > readings for CPU packages and CPU cores and peci-dimmtemp driver that
> > > provides DTS thermal readings for DIMMs.
> > >
> > > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > > Co-developed-by: Iwona Winiarska <[email protected]>
> > > Signed-off-by: Iwona Winiarska <[email protected]>
> > > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > > ---
> > > Documentation/hwmon/index.rst | 2 +
> > > Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
> > > Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
> > > MAINTAINERS | 2 +
> > > 4 files changed, 155 insertions(+)
> > > create mode 100644 Documentation/hwmon/peci-cputemp.rst
> > > create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> > >
> > > diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> > > index bc01601ea81a..cc76b5b3f791 100644
> > > --- a/Documentation/hwmon/index.rst
> > > +++ b/Documentation/hwmon/index.rst
> > > @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
> > > pcf8591
> > > pim4328
> > > pm6764tr
> > > + peci-cputemp
> > > + peci-dimmtemp
> > > pmbus
> > > powr1220
> > > pxe1610
> > > diff --git a/Documentation/hwmon/peci-cputemp.rst
> > > b/Documentation/hwmon/peci-cputemp.rst
> > > new file mode 100644
> > > index 000000000000..d3a218ba810a
> > > --- /dev/null
> > > +++ b/Documentation/hwmon/peci-cputemp.rst
> > > @@ -0,0 +1,93 @@
> > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > +
> > > +Kernel driver peci-cputemp
> > > +==========================
> > > +
> > > +Supported chips:
> > > + One of Intel server CPUs listed below which is connected to a PECI
> > > bus.
> > > + * Intel Xeon E5/E7 v3 server processors
> > > + Intel Xeon E5-14xx v3 family
> > > + Intel Xeon E5-24xx v3 family
> > > + Intel Xeon E5-16xx v3 family
> > > + Intel Xeon E5-26xx v3 family
> > > + Intel Xeon E5-46xx v3 family
> > > + Intel Xeon E7-48xx v3 family
> > > + Intel Xeon E7-88xx v3 family
> > > + * Intel Xeon E5/E7 v4 server processors
> > > + Intel Xeon E5-16xx v4 family
> > > + Intel Xeon E5-26xx v4 family
> > > + Intel Xeon E5-46xx v4 family
> > > + Intel Xeon E7-48xx v4 family
> > > + Intel Xeon E7-88xx v4 family
> > > + * Intel Xeon Scalable server processors
> > > + Intel Xeon D family
> > > + Intel Xeon Bronze family
> > > + Intel Xeon Silver family
> > > + Intel Xeon Gold family
> > > + Intel Xeon Platinum family
> > > +
> > > + Datasheet: Available from http://www.intel.com/design/literature.htm
> > > +
> > > +Author: Jae Hyun Yoo <[email protected]>
> > > +
> > > +Description
> > > +-----------
> > > +
> > > +This driver implements a generic PECI hwmon feature which provides Digital
> > > +Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that
> > > are
> > > +accessible via the processor PECI interface.
> > > +
> > > +All temperature values are given in millidegree Celsius and will be
> > > measurable
> > > +only when the target CPU is powered on.
> > > +
> > > +Sysfs interface
> > > +-------------------
> > > +
> > > +=======================
> > > =======================================================
> > > +temp1_label "Die"
> > > +temp1_input Provides current die temperature of the CPU package.
> > > +temp1_max Provides thermal control temperature of the CPU
> > > package
> > > + which is also known as Tcontrol.
> > > +temp1_crit Provides shutdown temperature of the CPU package
> > > which
> > > + is also known as the maximum processor junction
> > > + temperature, Tjmax or Tprochot.
> > > +temp1_crit_hyst Provides the hysteresis value from Tcontrol
> > > to Tjmax of
> > > + the CPU package.
> > > +
> > > +temp2_label "DTS"
> > > +temp2_input Provides current DTS temperature of the CPU package.
> >
> > Would this be a good place to note the slightly counter-intuitive nature
> > of DTS readings? i.e. add something along the lines of "The DTS sensor
> > produces a delta relative to Tjmax, so negative values are normal and
> > values approaching zero are hot." (In my experience people who aren't
> > already familiar with it tend to think something's wrong when a CPU
> > temperature reading shows -50C.)
> >
>
> All attributes shall follow the ABI, and the driver must translate reported
> values to degrees C. If those sensors do not follow the ABI and report something
> else, I won't accept the driver.
>
> Guenter
Sure, I believe all attributes already follow the ABI and the reported values
are in millidegree Celsius.
Thanks
-Iwona
>
On Mon, Aug 02, 2021 at 06:37:30AM CDT, Winiarska, Iwona wrote:
>On Tue, 2021-07-27 at 22:58 +0000, Zev Weiss wrote:
>> On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
>> > From: Jae Hyun Yoo <[email protected]>
>> >
>> > Add documentation for peci-cputemp driver that provides DTS thermal
>> > readings for CPU packages and CPU cores and peci-dimmtemp driver that
>> > provides DTS thermal readings for DIMMs.
>> >
>> > Signed-off-by: Jae Hyun Yoo <[email protected]>
>> > Co-developed-by: Iwona Winiarska <[email protected]>
>> > Signed-off-by: Iwona Winiarska <[email protected]>
>> > Reviewed-by: Pierre-Louis Bossart <[email protected]>
>> > ---
>> > Documentation/hwmon/index.rst         |  2 +
>> > Documentation/hwmon/peci-cputemp.rst  | 93 +++++++++++++++++++++++++++
>> > Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
>> > MAINTAINERS                           |  2 +
>> > 4 files changed, 155 insertions(+)
>> > create mode 100644 Documentation/hwmon/peci-cputemp.rst
>> > create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
>> >
>> > diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
>> > index bc01601ea81a..cc76b5b3f791 100644
>> > --- a/Documentation/hwmon/index.rst
>> > +++ b/Documentation/hwmon/index.rst
>> > @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
>> >    pcf8591
>> >    pim4328
>> >    pm6764tr
>> > +   peci-cputemp
>> > +   peci-dimmtemp
>> >    pmbus
>> >    powr1220
>> >    pxe1610
>> > diff --git a/Documentation/hwmon/peci-cputemp.rst
>> > b/Documentation/hwmon/peci-cputemp.rst
>> > new file mode 100644
>> > index 000000000000..d3a218ba810a
>> > --- /dev/null
>> > +++ b/Documentation/hwmon/peci-cputemp.rst
>> > @@ -0,0 +1,93 @@
>> > +.. SPDX-License-Identifier: GPL-2.0-only
>> > +
>> > +Kernel driver peci-cputemp
>> > +==========================
>> > +
>> > +Supported chips:
>> > +       One of Intel server CPUs listed below which is connected to a PECI
>> > bus.
>> > +               * Intel Xeon E5/E7 v3 server processors
>> > +                       Intel Xeon E5-14xx v3 family
>> > +                       Intel Xeon E5-24xx v3 family
>> > +                       Intel Xeon E5-16xx v3 family
>> > +                       Intel Xeon E5-26xx v3 family
>> > +                       Intel Xeon E5-46xx v3 family
>> > +                       Intel Xeon E7-48xx v3 family
>> > +                       Intel Xeon E7-88xx v3 family
>> > +               * Intel Xeon E5/E7 v4 server processors
>> > +                       Intel Xeon E5-16xx v4 family
>> > +                       Intel Xeon E5-26xx v4 family
>> > +                       Intel Xeon E5-46xx v4 family
>> > +                       Intel Xeon E7-48xx v4 family
>> > +                       Intel Xeon E7-88xx v4 family
>> > +               * Intel Xeon Scalable server processors
>> > +                       Intel Xeon D family
>> > +                       Intel Xeon Bronze family
>> > +                       Intel Xeon Silver family
>> > +                       Intel Xeon Gold family
>> > +                       Intel Xeon Platinum family
>> > +
>> > +       Datasheet: Available from http://www.intel.com/design/literature.htm
>> > +
>> > +Author: Jae Hyun Yoo <[email protected]>
>> > +
>> > +Description
>> > +-----------
>> > +
>> > +This driver implements a generic PECI hwmon feature which provides Digital
>> > +Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that
>> > are
>> > +accessible via the processor PECI interface.
>> > +
>> > +All temperature values are given in millidegree Celsius and will be
>> > measurable
>> > +only when the target CPU is powered on.
>> > +
>> > +Sysfs interface
>> > +-------------------
>> > +
>> > +=======================
>> > =======================================================
>> > +temp1_label            "Die"
>> > +temp1_input            Provides current die temperature of the CPU package.
>> > +temp1_max              Provides thermal control temperature of the CPU
>> > package
>> > +                       which is also known as Tcontrol.
>> > +temp1_crit             Provides shutdown temperature of the CPU package
>> > which
>> > +                       is also known as the maximum processor junction
>> > +                       temperature, Tjmax or Tprochot.
>> > +temp1_crit_hyst                Provides the hysteresis value from Tcontrol
>> > to Tjmax of
>> > +                       the CPU package.
>> > +
>> > +temp2_label            "DTS"
>> > +temp2_input            Provides current DTS temperature of the CPU package.
>>
>> Would this be a good place to note the slightly counter-intuitive nature
>> of DTS readings? i.e. add something along the lines of "The DTS sensor
>> produces a delta relative to Tjmax, so negative values are normal and
>> values approaching zero are hot." (In my experience people who aren't
>> already familiar with it tend to think something's wrong when a CPU
>> temperature reading shows -50C.)
>
>I believe that what you're referring to is a result of "GetTemp", and we're
>using it to calculate "Die" sensor values (temp1).
>The sensor value is absolute - we don't expose "raw" thermal sensor value
>(delta) anywhere.
>
>DTS sensor is exposing temperature value scaled to fit DTS 2.0 thermal profile:
>https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-thermal-guide.html
>(section 5.2.3.2)
>
>Similar to "Die" sensor - it's also exposed in absolute form.
>
>I'll try to change description to avoid confusion.
>
When I tested the patch series by applying it to my OpenBMC kernel, the
temp2_input sysfs file produced negative numbers (as has been the case
with previous iterations of the PECI patchset). Is that expected? From
what Guenter has said it sounds like that's going to need to change so
that the temperature readings are all in "normal" millidegrees C
(that is, relative to the freezing point of water).
Zev
On 8/4/21 10:52 AM, Zev Weiss wrote:
> On Mon, Aug 02, 2021 at 06:37:30AM CDT, Winiarska, Iwona wrote:
>> On Tue, 2021-07-27 at 22:58 +0000, Zev Weiss wrote:
>>> On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
>>>> From: Jae Hyun Yoo <[email protected]>
>>>>
>>>> Add documentation for peci-cputemp driver that provides DTS thermal
>>>> readings for CPU packages and CPU cores and peci-dimmtemp driver that
>>>> provides DTS thermal readings for DIMMs.
>>>>
>>>> Signed-off-by: Jae Hyun Yoo <[email protected]>
>>>> Co-developed-by: Iwona Winiarska <[email protected]>
>>>> Signed-off-by: Iwona Winiarska <[email protected]>
>>>> Reviewed-by: Pierre-Louis Bossart <[email protected]>
>>>> ---
>>>> Documentation/hwmon/index.rst | 2 +
>>>> Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
>>>> Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
>>>> MAINTAINERS | 2 +
>>>> 4 files changed, 155 insertions(+)
>>>> create mode 100644 Documentation/hwmon/peci-cputemp.rst
>>>> create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
>>>>
>>>> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
>>>> index bc01601ea81a..cc76b5b3f791 100644
>>>> --- a/Documentation/hwmon/index.rst
>>>> +++ b/Documentation/hwmon/index.rst
>>>> @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
>>>> pcf8591
>>>> pim4328
>>>> pm6764tr
>>>> + peci-cputemp
>>>> + peci-dimmtemp
>>>> pmbus
>>>> powr1220
>>>> pxe1610
>>>> diff --git a/Documentation/hwmon/peci-cputemp.rst
>>>> b/Documentation/hwmon/peci-cputemp.rst
>>>> new file mode 100644
>>>> index 000000000000..d3a218ba810a
>>>> --- /dev/null
>>>> +++ b/Documentation/hwmon/peci-cputemp.rst
>>>> @@ -0,0 +1,93 @@
>>>> +.. SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +Kernel driver peci-cputemp
>>>> +==========================
>>>> +
>>>> +Supported chips:
>>>> + One of Intel server CPUs listed below which is connected to a PECI
>>>> bus.
>>>> + * Intel Xeon E5/E7 v3 server processors
>>>> + Intel Xeon E5-14xx v3 family
>>>> + Intel Xeon E5-24xx v3 family
>>>> + Intel Xeon E5-16xx v3 family
>>>> + Intel Xeon E5-26xx v3 family
>>>> + Intel Xeon E5-46xx v3 family
>>>> + Intel Xeon E7-48xx v3 family
>>>> + Intel Xeon E7-88xx v3 family
>>>> + * Intel Xeon E5/E7 v4 server processors
>>>> + Intel Xeon E5-16xx v4 family
>>>> + Intel Xeon E5-26xx v4 family
>>>> + Intel Xeon E5-46xx v4 family
>>>> + Intel Xeon E7-48xx v4 family
>>>> + Intel Xeon E7-88xx v4 family
>>>> + * Intel Xeon Scalable server processors
>>>> + Intel Xeon D family
>>>> + Intel Xeon Bronze family
>>>> + Intel Xeon Silver family
>>>> + Intel Xeon Gold family
>>>> + Intel Xeon Platinum family
>>>> +
>>>> + Datasheet: Available from http://www.intel.com/design/literature.htm
>>>> +
>>>> +Author: Jae Hyun Yoo <[email protected]>
>>>> +
>>>> +Description
>>>> +-----------
>>>> +
>>>> +This driver implements a generic PECI hwmon feature which provides Digital
>>>> +Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that
>>>> are
>>>> +accessible via the processor PECI interface.
>>>> +
>>>> +All temperature values are given in millidegree Celsius and will be
>>>> measurable
>>>> +only when the target CPU is powered on.
>>>> +
>>>> +Sysfs interface
>>>> +-------------------
>>>> +
>>>> +=======================
>>>> =======================================================
>>>> +temp1_label "Die"
>>>> +temp1_input Provides current die temperature of the CPU package.
>>>> +temp1_max Provides thermal control temperature of the CPU
>>>> package
>>>> + which is also known as Tcontrol.
>>>> +temp1_crit Provides shutdown temperature of the CPU package
>>>> which
>>>> + is also known as the maximum processor junction
>>>> + temperature, Tjmax or Tprochot.
>>>> +temp1_crit_hyst Provides the hysteresis value from Tcontrol
>>>> to Tjmax of
>>>> + the CPU package.
>>>> +
>>>> +temp2_label "DTS"
>>>> +temp2_input Provides current DTS temperature of the CPU package.
>>>
>>> Would this be a good place to note the slightly counter-intuitive nature
>>> of DTS readings? i.e. add something along the lines of "The DTS sensor
>>> produces a delta relative to Tjmax, so negative values are normal and
>>> values approaching zero are hot." (In my experience people who aren't
>>> already familiar with it tend to think something's wrong when a CPU
>>> temperature reading shows -50C.)
>>
>> I believe that what you're referring to is a result of "GetTemp", and we're
>> using it to calculate "Die" sensor values (temp1).
>> The sensor value is absolute - we don't expose "raw" thermal sensor value
>> (delta) anywhere.
>>
>> DTS sensor is exposing temperature value scaled to fit DTS 2.0 thermal profile:
>> https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-thermal-guide.html
>> (section 5.2.3.2)
>>
>> Similar to "Die" sensor - it's also exposed in absolute form.
>>
>> I'll try to change description to avoid confusion.
>>
>
> When I tested the patch series by applying it to my OpenBMC kernel, the
> temp2_input sysfs file produced negative numbers (as has been the case
> with previous iterations of the PECI patchset). Is that expected? From
> what Guenter has said it sounds like that's going to need to change so
> that the temperature readings are all in "normal" millidegrees C
> (that is, relative to the freezing point of water).
>
Correct, the temperature is expected to be reported in millidegrees C
per hwmon ABI. Everything else is unacceptable. That makes me wonder what
"raw" and "absolute" means. Negative numbers suggest that, whatever is
reported today, it is not millidegrees C.
Guenter
On Wed, 2021-08-04 at 11:05 -0700, Guenter Roeck wrote:
> On 8/4/21 10:52 AM, Zev Weiss wrote:
> > On Mon, Aug 02, 2021 at 06:37:30AM CDT, Winiarska, Iwona wrote:
> > > On Tue, 2021-07-27 at 22:58 +0000, Zev Weiss wrote:
> > > > On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
> > > > > From: Jae Hyun Yoo <[email protected]>
> > > > >
> > > > > Add documentation for peci-cputemp driver that provides DTS thermal
> > > > > readings for CPU packages and CPU cores and peci-dimmtemp driver that
> > > > > provides DTS thermal readings for DIMMs.
> > > > >
> > > > > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > > > > Co-developed-by: Iwona Winiarska <[email protected]>
> > > > > Signed-off-by: Iwona Winiarska <[email protected]>
> > > > > Reviewed-by: Pierre-Louis Bossart
> > > > > <[email protected]>
> > > > > ---
> > > > > Documentation/hwmon/index.rst | 2 +
> > > > > Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
> > > > > Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
> > > > > MAINTAINERS | 2 +
> > > > > 4 files changed, 155 insertions(+)
> > > > > create mode 100644 Documentation/hwmon/peci-cputemp.rst
> > > > > create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> > > > >
> > > > > diff --git a/Documentation/hwmon/index.rst
> > > > > b/Documentation/hwmon/index.rst
> > > > > index bc01601ea81a..cc76b5b3f791 100644
> > > > > --- a/Documentation/hwmon/index.rst
> > > > > +++ b/Documentation/hwmon/index.rst
> > > > > @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
> > > > > pcf8591
> > > > > pim4328
> > > > > pm6764tr
> > > > > + peci-cputemp
> > > > > + peci-dimmtemp
> > > > > pmbus
> > > > > powr1220
> > > > > pxe1610
> > > > > diff --git a/Documentation/hwmon/peci-cputemp.rst
> > > > > b/Documentation/hwmon/peci-cputemp.rst
> > > > > new file mode 100644
> > > > > index 000000000000..d3a218ba810a
> > > > > --- /dev/null
> > > > > +++ b/Documentation/hwmon/peci-cputemp.rst
> > > > > @@ -0,0 +1,93 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > > > +
> > > > > +Kernel driver peci-cputemp
> > > > > +==========================
> > > > > +
> > > > > +Supported chips:
> > > > > + One of Intel server CPUs listed below which is connected to a
> > > > > PECI
> > > > > bus.
> > > > > + * Intel Xeon E5/E7 v3 server processors
> > > > > + Intel Xeon E5-14xx v3 family
> > > > > + Intel Xeon E5-24xx v3 family
> > > > > + Intel Xeon E5-16xx v3 family
> > > > > + Intel Xeon E5-26xx v3 family
> > > > > + Intel Xeon E5-46xx v3 family
> > > > > + Intel Xeon E7-48xx v3 family
> > > > > + Intel Xeon E7-88xx v3 family
> > > > > + * Intel Xeon E5/E7 v4 server processors
> > > > > + Intel Xeon E5-16xx v4 family
> > > > > + Intel Xeon E5-26xx v4 family
> > > > > + Intel Xeon E5-46xx v4 family
> > > > > + Intel Xeon E7-48xx v4 family
> > > > > + Intel Xeon E7-88xx v4 family
> > > > > + * Intel Xeon Scalable server processors
> > > > > + Intel Xeon D family
> > > > > + Intel Xeon Bronze family
> > > > > + Intel Xeon Silver family
> > > > > + Intel Xeon Gold family
> > > > > + Intel Xeon Platinum family
> > > > > +
> > > > > + Datasheet: Available from
> > > > > http://www.intel.com/design/literature.htm
> > > > > +
> > > > > +Author: Jae Hyun Yoo <[email protected]>
> > > > > +
> > > > > +Description
> > > > > +-----------
> > > > > +
> > > > > +This driver implements a generic PECI hwmon feature which provides
> > > > > Digital
> > > > > +Thermal Sensor (DTS) thermal readings of the CPU package and CPU
> > > > > cores that
> > > > > are
> > > > > +accessible via the processor PECI interface.
> > > > > +
> > > > > +All temperature values are given in millidegree Celsius and will be
> > > > > measurable
> > > > > +only when the target CPU is powered on.
> > > > > +
> > > > > +Sysfs interface
> > > > > +-------------------
> > > > > +
> > > > > +=======================
> > > > > =======================================================
> > > > > +temp1_label "Die"
> > > > > +temp1_input Provides current die temperature of the CPU
> > > > > package.
> > > > > +temp1_max Provides thermal control temperature of the
> > > > > CPU
> > > > > package
> > > > > + which is also known as Tcontrol.
> > > > > +temp1_crit Provides shutdown temperature of the CPU
> > > > > package
> > > > > which
> > > > > + is also known as the maximum processor
> > > > > junction
> > > > > + temperature, Tjmax or Tprochot.
> > > > > +temp1_crit_hyst Provides the hysteresis value from
> > > > > Tcontrol
> > > > > to Tjmax of
> > > > > + the CPU package.
> > > > > +
> > > > > +temp2_label "DTS"
> > > > > +temp2_input Provides current DTS temperature of the CPU
> > > > > package.
> > > >
> > > > Would this be a good place to note the slightly counter-intuitive nature
> > > > of DTS readings? i.e. add something along the lines of "The DTS sensor
> > > > produces a delta relative to Tjmax, so negative values are normal and
> > > > values approaching zero are hot." (In my experience people who aren't
> > > > already familiar with it tend to think something's wrong when a CPU
> > > > temperature reading shows -50C.)
> > >
> > > I believe that what you're referring to is a result of "GetTemp", and
> > > we're
> > > using it to calculate "Die" sensor values (temp1).
> > > The sensor value is absolute - we don't expose "raw" thermal sensor value
> > > (delta) anywhere.
> > >
> > > DTS sensor is exposing temperature value scaled to fit DTS 2.0 thermal
> > > profile:
> > > https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-thermal-guide.html
> > > (section 5.2.3.2)
> > >
> > > Similar to "Die" sensor - it's also exposed in absolute form.
> > >
> > > I'll try to change description to avoid confusion.
> > >
> >
> > When I tested the patch series by applying it to my OpenBMC kernel, the
> > temp2_input sysfs file produced negative numbers (as has been the case
> > with previous iterations of the PECI patchset). Is that expected? From
> > what Guenter has said it sounds like that's going to need to change so
> > that the temperature readings are all in "normal" millidegrees C
> > (that is, relative to the freezing point of water).
> >
>
> Correct, the temperature is expected to be reported in millidegrees C
> per hwmon ABI. Everything else is unacceptable. That makes me wonder what
> "raw" and "absolute" means. Negative numbers suggest that, whatever is
> reported today, it is not millidegrees C.
Let's say we have two values: "base" and "delta". Both are in millidegrees C.
"absolute" means that the sensor value exposed to userspace is calculated as:
base - delta (or base + delta, depending on the sensor).
"relative" would mean that we expose "delta" to userspace as the sensor value.
For peci-cputemp (and dimmtemp) we're exposing sensors in "absolute" form.
I contacted Zev and we found that the platform he uses has a different format
for the "raw" value ("delta" in the example above) of this particular sensor
(S8.8 instead of S10.6), which means that we're subtracting a significantly
larger number than we should, resulting in the sensor value going negative.
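To make the effect of the format mismatch concrete, here is a rough sketch.
It is an illustration only, not the peci-cputemp code: the function names and
the sample numbers are hypothetical, and it assumes the usual reading of
S10.6 and S8.8 as signed 16-bit fixed-point values with 6 and 8 fractional
bits respectively.

```c
#include <stdint.h>

/*
 * Illustration only: not the peci-cputemp code. Names and sample numbers
 * are hypothetical.
 *
 * Convert a signed 16-bit fixed-point reading with `frac_bits` fractional
 * bits (6 for S10.6, 8 for S8.8) to millidegrees C.
 */
static int32_t fixed_to_millidegrees(int16_t raw, unsigned int frac_bits)
{
	return ((int32_t)raw * 1000) / (1 << frac_bits);
}

/* "absolute" form: base minus the decoded delta, both in millidegrees C. */
static int32_t absolute_millidegrees(int32_t base, int16_t raw_delta,
				     unsigned int frac_bits)
{
	return base - fixed_to_millidegrees(raw_delta, frac_bits);
}
```

With a made-up base of 90000 and a raw delta of 15360 (60 degrees C encoded
as S8.8), decoding with 8 fractional bits gives 90000 - 60000 = 30000.
Decoding the same raw value as S10.6 yields a delta four times too large
(240000), so the result becomes -150000.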
On the platform I'm using for development purposes, sampling Die and DTS values
returned:
Die 26344
DTS 26329
The platform that Zev used is currently not supported by peci-cpu, however, I
went through the specs, and it looks like some of the older supported platforms
are also using S8.8.
I'll fix this in v3.
Thanks
-Iwona
>
> Guenter