On Tue, 2021-07-13 at 00:04 +0200, Iwona Winiarska wrote:
> Note: All changes to arch/x86 are contained within patches 01-02.

Hi Iwona,

One meta question first: who is this submission "To:"? Is there an
existing upstream maintainer path for OpenBMC changes? Are you
expecting contributions to this subsystem from others? While Greg
sometimes ends up as default maintainer for new stuff, I wonder if
someone from the OpenBMC community should step up to fill this role?

>
> The Platform Environment Control Interface (PECI) is a communication
> interface between Intel processors and management controllers (e.g.
> Baseboard Management Controller, BMC).
>
> This series adds a PECI subsystem and introduces drivers which run in
> the Linux instance on the management controller (not the main Intel
> processor) and is intended to be used by the OpenBMC [1], a Linux
> distribution for BMC devices.
> The information exposed over PECI (like processor and DIMM
> temperature) refers to the Intel processor and can be consumed by
> daemons running on the BMC to, for example, display the processor
> temperature in its web interface.
>
> The PECI bus is collection of code that provides interface support
> between PECI devices (that actually represent processors) and PECI
> controllers (such as the "peci-aspeed" controller) that allow to
> access physical PECI interface. PECI devices are bound to PECI
> drivers that provides access to PECI services. This series introduces
> a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
> and "dimmtemp" using the auxiliary bus.
>
> Exposing "raw" PECI to userspace, either to write userspace drivers or
> for debug/testing purpose was left out of this series to encourage
> writing kernel drivers instead, but may be pursued in the future.
>
> Introducing PECI to upstream Linux was already attempted before [2].
> Since it's been over a year since last revision, and the series
> changed quite a bit in the meantime, I've decided to start from v1.
>
> I would also like to give credit to everyone who helped me with
> different aspects of preliminary review:
> - Pierre-Louis Bossart,
> - Tony Luck,
> - Andy Shevchenko,
> - Dave Hansen.
>
> [1] https://github.com/openbmc/openbmc
> [2] https://lore.kernel.org/openbmc/[email protected]/
>
> Iwona Winiarska (12):
> x86/cpu: Move intel-family to arch-independent headers
> x86/cpu: Extract cpuid helpers to arch-independent
> dt-bindings: Add generic bindings for PECI
> dt-bindings: Add bindings for peci-aspeed
> ARM: dts: aspeed: Add PECI controller nodes
> peci: Add core infrastructure
> peci: Add device detection
> peci: Add support for PECI device drivers
> peci: Add peci-cpu driver
> hwmon: peci: Add cputemp driver
> hwmon: peci: Add dimmtemp driver
> docs: Add PECI documentation
>
> Jae Hyun Yoo (2):
> peci: Add peci-aspeed controller driver
> docs: hwmon: Document PECI drivers
>
> .../devicetree/bindings/peci/peci-aspeed.yaml | 111 ++++
> .../bindings/peci/peci-controller.yaml        | 28 +
> Documentation/hwmon/index.rst                 |   2 +
> Documentation/hwmon/peci-cputemp.rst          | 93 ++++
> Documentation/hwmon/peci-dimmtemp.rst         | 58 ++
> Documentation/index.rst                       |   1 +
> Documentation/peci/index.rst                  | 16 +
> Documentation/peci/peci.rst                   | 48 ++
> MAINTAINERS                                   | 32 ++
> arch/arm/boot/dts/aspeed-g4.dtsi              | 14 +
> arch/arm/boot/dts/aspeed-g5.dtsi              | 14 +
> arch/arm/boot/dts/aspeed-g6.dtsi              | 14 +
> arch/x86/Kconfig                              |   1 +
> arch/x86/include/asm/cpu.h                    |   3 -
> arch/x86/include/asm/intel-family.h           | 141 +----
> arch/x86/include/asm/microcode.h              |   2 +-
> arch/x86/kvm/cpuid.h                          |   3 +-
> arch/x86/lib/Makefile                         |   2 +-
> drivers/Kconfig                               |   3 +
> drivers/Makefile                              |   1 +
> drivers/edac/mce_amd.c                        |   3 +-
> drivers/hwmon/Kconfig                         |   2 +
> drivers/hwmon/Makefile                        |   1 +
> drivers/hwmon/peci/Kconfig                    | 31 ++
> drivers/hwmon/peci/Makefile                   |   7 +
> drivers/hwmon/peci/common.h                   | 46 ++
> drivers/hwmon/peci/cputemp.c                  | 503 +++++++++++++++++
> drivers/hwmon/peci/dimmtemp.c                 | 508 ++++++++++++++++++
> drivers/peci/Kconfig                          | 36 ++
> drivers/peci/Makefile                         | 10 +
> drivers/peci/controller/Kconfig               | 12 +
> drivers/peci/controller/Makefile              |   3 +
> drivers/peci/controller/peci-aspeed.c         | 501 +++++++++++++++++
> drivers/peci/core.c                           | 224 ++++++++
> drivers/peci/cpu.c                            | 347 ++++++++++++
> drivers/peci/device.c                         | 211 ++++++++
> drivers/peci/internal.h                       | 137 +++++
> drivers/peci/request.c                        | 502 +++++++++++++++++
> drivers/peci/sysfs.c                          | 82 +++
> include/linux/peci-cpu.h                      | 38 ++
> include/linux/peci.h                          | 93 ++++
> include/linux/x86/cpu.h                       |   9 +
> include/linux/x86/intel-family.h              | 146 +++++
> lib/Kconfig                                   |   5 +
> lib/Makefile                                  |   2 +
> lib/x86/Makefile                              |   3 +
> {arch/x86/lib => lib/x86}/cpu.c               |   2 +-
> 47 files changed, 3902 insertions(+), 149 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
> create mode 100644 Documentation/hwmon/peci-cputemp.rst
> create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> create mode 100644 Documentation/peci/index.rst
> create mode 100644 Documentation/peci/peci.rst
> create mode 100644 drivers/hwmon/peci/Kconfig
> create mode 100644 drivers/hwmon/peci/Makefile
> create mode 100644 drivers/hwmon/peci/common.h
> create mode 100644 drivers/hwmon/peci/cputemp.c
> create mode 100644 drivers/hwmon/peci/dimmtemp.c
> create mode 100644 drivers/peci/Kconfig
> create mode 100644 drivers/peci/Makefile
> create mode 100644 drivers/peci/controller/Kconfig
> create mode 100644 drivers/peci/controller/Makefile
> create mode 100644 drivers/peci/controller/peci-aspeed.c
> create mode 100644 drivers/peci/core.c
> create mode 100644 drivers/peci/cpu.c
> create mode 100644 drivers/peci/device.c
> create mode 100644 drivers/peci/internal.h
> create mode 100644 drivers/peci/request.c
> create mode 100644 drivers/peci/sysfs.c
> create mode 100644 include/linux/peci-cpu.h
> create mode 100644 include/linux/peci.h
> create mode 100644 include/linux/x86/cpu.h
> create mode 100644 include/linux/x86/intel-family.h
> create mode 100644 lib/x86/Makefile
> rename {arch/x86/lib => lib/x86}/cpu.c (95%)
>

2021-07-14 16:57:15

by Dan Williams

On Wed, 2021-08-04 at 11:05 -0700, Guenter Roeck wrote:
> On 8/4/21 10:52 AM, Zev Weiss wrote:
> > On Mon, Aug 02, 2021 at 06:37:30AM CDT, Winiarska, Iwona wrote:
> > > On Tue, 2021-07-27 at 22:58 +0000, Zev Weiss wrote:
> > > > On Mon, Jul 12, 2021 at 05:04:46PM CDT, Iwona Winiarska wrote:
> > > > > From: Jae Hyun Yoo <[email protected]>
> > > > >
> > > > > Add documentation for peci-cputemp driver that provides DTS thermal
> > > > > readings for CPU packages and CPU cores and peci-dimmtemp driver that
> > > > > provides DTS thermal readings for DIMMs.
> > > > >
> > > > > Signed-off-by: Jae Hyun Yoo <[email protected]>
> > > > > Co-developed-by: Iwona Winiarska <[email protected]>
> > > > > Signed-off-by: Iwona Winiarska <[email protected]>
> > > > > > Reviewed-by: Pierre-Louis Bossart <[email protected]>
> > > > > ---
> > > > > Documentation/hwmon/index.rst         | 2 +
> > > > > Documentation/hwmon/peci-cputemp.rst | 93 +++++++++++++++++++++++++++
> > > > > Documentation/hwmon/peci-dimmtemp.rst | 58 +++++++++++++++++
> > > > > MAINTAINERS                           | 2 +
> > > > > 4 files changed, 155 insertions(+)
> > > > > create mode 100644 Documentation/hwmon/peci-cputemp.rst
> > > > > create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
> > > > >
> > > > > > diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> > > > > index bc01601ea81a..cc76b5b3f791 100644
> > > > > --- a/Documentation/hwmon/index.rst
> > > > > +++ b/Documentation/hwmon/index.rst
> > > > > @@ -154,6 +154,8 @@ Hardware Monitoring Kernel Drivers
> > > > >     pcf8591
> > > > >     pim4328
> > > > >     pm6764tr
> > > > > +   peci-cputemp
> > > > > +   peci-dimmtemp
> > > > >     pmbus
> > > > >     powr1220
> > > > >     pxe1610
> > > > > > diff --git a/Documentation/hwmon/peci-cputemp.rst b/Documentation/hwmon/peci-cputemp.rst
> > > > > new file mode 100644
> > > > > index 000000000000..d3a218ba810a
> > > > > --- /dev/null
> > > > > +++ b/Documentation/hwmon/peci-cputemp.rst
> > > > > @@ -0,0 +1,93 @@
> > > > > +.. SPDX-License-Identifier: GPL-2.0-only
> > > > > +
> > > > > +Kernel driver peci-cputemp
> > > > > +==========================
> > > > > +
> > > > > +Supported chips:
> > > > > > +       One of Intel server CPUs listed below which is connected to a PECI bus.
> > > > > +               * Intel Xeon E5/E7 v3 server processors
> > > > > +                       Intel Xeon E5-14xx v3 family
> > > > > +                       Intel Xeon E5-24xx v3 family
> > > > > +                       Intel Xeon E5-16xx v3 family
> > > > > +                       Intel Xeon E5-26xx v3 family
> > > > > +                       Intel Xeon E5-46xx v3 family
> > > > > +                       Intel Xeon E7-48xx v3 family
> > > > > +                       Intel Xeon E7-88xx v3 family
> > > > > +               * Intel Xeon E5/E7 v4 server processors
> > > > > +                       Intel Xeon E5-16xx v4 family
> > > > > +                       Intel Xeon E5-26xx v4 family
> > > > > +                       Intel Xeon E5-46xx v4 family
> > > > > +                       Intel Xeon E7-48xx v4 family
> > > > > +                       Intel Xeon E7-88xx v4 family
> > > > > +               * Intel Xeon Scalable server processors
> > > > > +                       Intel Xeon D family
> > > > > +                       Intel Xeon Bronze family
> > > > > +                       Intel Xeon Silver family
> > > > > +                       Intel Xeon Gold family
> > > > > +                       Intel Xeon Platinum family
> > > > > +
> > > > > > +       Datasheet: Available from http://www.intel.com/design/literature.htm
> > > > > +
> > > > > +Author: Jae Hyun Yoo <[email protected]>
> > > > > +
> > > > > +Description
> > > > > +-----------
> > > > > +
> > > > > > +This driver implements a generic PECI hwmon feature which provides Digital
> > > > > > +Thermal Sensor (DTS) thermal readings of the CPU package and CPU cores that are
> > > > > > +accessible via the processor PECI interface.
> > > > > > +
> > > > > > +All temperature values are given in millidegree Celsius and will be measurable
> > > > > > +only when the target CPU is powered on.
> > > > > +
> > > > > +Sysfs interface
> > > > > +-------------------
> > > > > +
> > > > > +=======================
> > > > > =======================================================
> > > > > +temp1_label            "Die"
> > > > > +temp1_input            Provides current die temperature of the CPU
> > > > > package.
> > > > > +temp1_max              Provides thermal control temperature of the
> > > > > CPU
> > > > > package
> > > > > +                       which is also known as Tcontrol.
> > > > > +temp1_crit             Provides shutdown temperature of the CPU
> > > > > package
> > > > > which
> > > > > +                       is also known as the maximum processor
> > > > > junction
> > > > > +                       temperature, Tjmax or Tprochot.
> > > > > +temp1_crit_hyst                Provides the hysteresis value from
> > > > > Tcontrol
> > > > > to Tjmax of
> > > > > +                       the CPU package.
> > > > > +
> > > > > +temp2_label            "DTS"
> > > > > +temp2_input            Provides current DTS temperature of the CPU
> > > > > package.
> > > >
> > > > Would this be a good place to note the slightly counter-intuitive nature
> > > > of DTS readings? i.e. add something along the lines of "The DTS sensor
> > > > produces a delta relative to Tjmax, so negative values are normal and
> > > > values approaching zero are hot." (In my experience people who aren't
> > > > already familiar with it tend to think something's wrong when a CPU
> > > > temperature reading shows -50C.)
> > >
> > > > I believe that what you're referring to is a result of "GetTemp", and we're
> > > > using it to calculate "Die" sensor values (temp1).
> > > The sensor value is absolute - we don't expose "raw" thermal sensor value
> > > (delta) anywhere.
> > >
> > > DTS sensor is exposing temperature value scaled to fit DTS 2.0 thermal
> > > profile:
> > > https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-thermal-guide.html
> > > (section 5.2.3.2)
> > >
> > > Similar to "Die" sensor - it's also exposed in absolute form.
> > >
> > > I'll try to change description to avoid confusion.
> > >
> >
> > When I tested the patch series by applying it to my OpenBMC kernel, the
> > temp2_input sysfs file produced negative numbers (as has been the case
> > with previous iterations of the PECI patchset). Is that expected? From
> > what Guenter has said it sounds like that's going to need to change so
> > that the temperature readings are all in "normal" millidegrees C
> > (that is, relative to the freezing point of water).
> >
>
> Correct, the temperature is expected to be reported in millidegrees C
> per hwmon ABI. Everything else is unacceptable. That makes me wonder what
> "raw" and "absolute" means. Negative numbers suggest that, whatever is
> reported today, it is not millidegrees C.

Let's say we have two values: "base" and "delta". Both are in millidegrees C.
"absolute" means that the sensor value exposed to userspace is calculated as:
base - delta (or base + delta, depending on sensor).
"relative" would mean that we expose "delta" to userspace as sensor value.

For peci-cputemp (and dimmtemp) we're exposing sensors in "absolute" form.
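
As a rough illustration of the two forms (a sketch only, not code from this
series; the function names are hypothetical, and only the "base - delta" case
is shown):

```c
/* Hypothetical helpers; all values are in millidegrees Celsius. */

/* "absolute": combine base and delta before exposing the value. */
static long sensor_absolute_mdeg(long base_mdeg, long delta_mdeg)
{
	return base_mdeg - delta_mdeg;
}

/* "relative": expose the delta itself as the sensor value. */
static long sensor_relative_mdeg(long delta_mdeg)
{
	return delta_mdeg;
}
```

With an assumed base of 100000 and an assumed delta of 60000, the absolute
form yields 40000 (40 C), while the relative form would yield 60000.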

I contacted Zev and we found that the platform he uses has a different format
for the "raw" value ("delta" in the example above) of this particular sensor
(S8.8 instead of S10.6), which means that we're subtracting a significantly
larger number than we should, resulting in the sensor value going negative.
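
To make the format mismatch concrete, here is a minimal sketch (not the
driver's code; the helper names, the raw value and the base temperature are
made up for illustration). It assumes that "S8.8" and "S10.6" denote signed
16-bit fixed-point values with 8 and 6 fractional bits respectively:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a signed 16-bit fixed-point value with
 * frac_bits fractional bits into millidegrees Celsius.
 */
static long fixed_to_mdeg(int16_t raw, unsigned int frac_bits)
{
	return (long)raw * 1000 / (1L << frac_bits);
}

/* Hypothetical: "absolute" reading for a sensor that uses base - delta. */
static long absolute_mdeg(long base_mdeg, int16_t raw_delta,
			  unsigned int frac_bits)
{
	return base_mdeg - fixed_to_mdeg(raw_delta, frac_bits);
}
```

For a raw delta of 0x1900 that the hardware encodes as S8.8, decoding with 8
fractional bits gives 25000 (25 C), while decoding it as S10.6 gives 100000
(100 C), four times too large. With an assumed base of 90000, the result is
65000 in the first case and -10000 in the second.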

On the platform I'm using for development, sampling the Die and DTS values
returned:
Die 26344
DTS 26329

The platform that Zev used is currently not supported by peci-cpu. However, I
went through the specs, and it looks like some of the older supported platforms
also use S8.8.
I'll fix this in v3.

Thanks
-Iwona

>
> Guenter