This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
this rewrite will:
Overview
Modern Intel memory controllers host an SMBus controller and connection to
DIMMs and their thermal sensors. The memory controller firmware has three modes
of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
Throttling (OLTT) and none.
- CLTT: The memory controller firmware is periodically accessing the DIMM
temperature sensor over the SMBus.
- OLTT: The memory controller firmware is not accessing the DIMM temperature
sensor over the SMBus but approximates/guesses its temperature.
Depending on the temperature, the memory controller firmware may throttle the
memory bandwidth and alike.
Only one mode of operation can be used at a time. Intel recommends CLTT. This
is also the default on our BIOS.
Original Driver and its Rewrite
The original driver i2c-imc.c was an iMC SMBus controller that provided access
to the DIMM thermal sensors. A second driver dimm-bus.c, also part of Andy's
patch-set, instantiated the thermal sensors.
The original driver was written for the memory controller found in Sandy Bridge
CPUs. Either the Sandy Bridge documentation is incomplete or the functionality
is limited. It was not possible to use this driver while the memory controller
was in CLTT mode as the driver and firmware were both accessing the memory
controller without arbitration. We ran this driver on our Broadwell CPU and the
driver's internal consistency check failed every 30 min or so.
We rewrote this driver to support Broadwell's memory controller 8086.6fa8. Over
time, support for more memory controllers should be added.
Our documentation (Intel Xeon Processor D-1500 Product Family External Design
Specification (EDS), Volume Two: Core and Uncore Registers Volume 2 of 5 Rev.
2.3) hints how to make OS drivers and firmware co-exist in CLTT mode. In short:
- don't (necessarily) disable CLTT mode, but set tsod_polling_interval to 0
- wait 10 ms to drain a potential in-flight firmware CLTT transaction
- OS has now exclusive access to the smb bus
- set tsod_polling_interval to the previous value
Our patch provides proper arbitration between OS and firmware on Broadwell.
The original patch-set also provided an additional driver, dimm-bus.c, to
instantiate the temperature sensors. It had some draw-backs:
- the probe function i2c_scan_dimm_bus() blindly enumerates potential DIMM
sensor i2c addresses causing the SBE bit to be set 6 times on our system.
That is dangerous (see comment in i2c-imc.c: if (stat & SMBSTAT_SBE)). The
i2c addresses of the actual temperature sensors are known to the memory
controller (when in CLTT mode) and don't need to be blindly enumerated.
- the probe function i2c_scan_dimm_bus() instantiates blindly 10 temperature
sensors, although our system had only 2 DIMMs (with 1 temperature sensor
each). The remaining 8 temperature sensors returned 0.
- as already pointed out, the instantiations happen in a further driver
dimm-bus.c. The iMC SMBus driver i2c-imc.c is calling dimm-bus.c to do its
job. That does not feel right. I don't know how to do it better and even move
for now the instantiations into the iMC SMBus driver itself
(imc_instantiate_sensors(()). Please advice here.
The mapping of dimm to i2c adapter and addresses is confusing at best. From the
smb_stat_0 and from Andy's dimm-bus.c driver, I gain the impression the mapping
may be
channel 00 slot 00 i2c-1 0x18 (if there is a dimm)
channel 00 slot 01 i2c-1 0x19 (if there is a dimm)
channel 00 slot 02 i2c-1 0x1a (if there is a dimm)
channel 00 slot 03 i2c-1 0x1b (if there is a dimm)
channel 01 slot 00 i2c-1 0x1c (if there is a dimm)
channel 01 slot 01 i2c-1 0x1d (if there is a dimm)
channel 01 slot 02 i2c-1 0x1e (if there is a dimm)
channel 01 slot 03 i2c-1 0x1f (if there is a dimm)
channel 02 slot 00 i2c-2 0x18 (if there is a dimm)
channel 02 slot 01 i2c-2 0x19 (if there is a dimm)
channel 02 slot 02 i2c-2 0x1a (if there is a dimm)
channel 02 slot 03 i2c-2 0x1b (if there is a dimm)
channel 03 slot 00 i2c-2 0x1c (if there is a dimm)
channel 03 slot 01 i2c-2 0x1d (if there is a dimm)
channel 03 slot 02 i2c-2 0x1e (if there is a dimm)
channel 03 slot 03 i2c-2 0x1f (if there is a dimm)
Experimentally, I gain the impression it's rather
channel 00 slot 00 i2c-1 0x18 (if there is a dimm)
channel 00 slot 01 i2c-1 0x19 (if there is a dimm)
channel 01 slot 00 i2c-1 0x1a (if there is a dimm)
channel 01 slot 01 i2c-1 0x1b (if there is a dimm)
channel 02 slot 00 i2c-2 0x18 (if there is a dimm)
channel 02 slot 01 i2c-2 0x19 (if there is a dimm)
channel 03 slot 00 i2c-2 0x1a (if there is a dimm)
channel 03 slot 01 i2c-2 0x1b (if there is a dimm)
Why? Because we see on our system temperature sensors on i2c address i2c-1 0x18
and ic2-1 0x1a and BIOS and EDAC tell us we have DIMMs on channel 0:slot 0 and
channel 1:slot 0.
[ 9.522781] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 16, rank: 2, row: 0x10000, col: 0x400
[ 9.522786] EDAC DEBUG: __populate_dimms: mc#0: ha 0 channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 16, rank: 2, row: 0x10000, col: 0x400
When in OLTT mode, the sensors need to be manually instantiated, e.g.
# echo jc42 0x18 > /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/new_device
# echo jc42 0x1a > /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/new_device
In CLTT mode - we expect almost everyone to configure CLTT mode in their BIOS -
the new driver knows where DIMMs are populated (see arguments to
imc_instantiate_sensor()) and instantiates the sensors. For this magic to
happen, we don't need to understand the mapping.
Unit Test
I had access to two systems with these memory configurations:
System 1: DIMM at channel 1, slot 0.
System 2: DIMM at channel 0, slot 0. DIMM at channel 1, slot 0.
I had no access to a system with DIMMs on channel 2 or 3.
We read the temperature sensors for 8 hours while having CLTT enabled. Next we
read the temperature sensors for 8 hours while having OLTT enabled. We always
get sane data. The internal sanity check always passes and dmesg is clean. The
grep at the end filters out sane temperature values in the 20C to 39C range so
we can focus on abnormal temperature values and error messages.
First we stress-tested the driver (for 8 hours).
System 1:
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
System 2:
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-0018/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-0018/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; done | grep -v ^[23] &
Next, we gave firmware polling a better chance to start and added a sleep of 2
seconds (for 8 hours).
System 1 and System 2:
while true; do cat /sys/devices/pci0000:ff/0000:ff:13.0/i2c-1/1-001a/hwmon/hwmon?/temp1_input; sleep 2; done | grep -v ^[23] &
~ Stefan
Stefan Schaeckeler (1):
i2c: imc: Add support for Intel iMC SMBus host controller.
MAINTAINERS | 5 +
drivers/i2c/busses/Kconfig | 15 ++
drivers/i2c/busses/Makefile | 1 +
drivers/i2c/busses/i2c-imc.c | 515 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 536 insertions(+)
create mode 100644 drivers/i2c/busses/i2c-imc.c
--
2.11.0
Modern Intel memory controllers have an SMBus connection to DIMMs and their
eeproms and thermal sensors.
Initially support Broadwell memory controllers and DIMM thermal sensors,
only.
Automatically expose the thermal sensors.
Signed-off-by: Stefan Schaeckeler <[email protected]>
---
MAINTAINERS | 5 +
drivers/i2c/busses/Kconfig | 15 ++
drivers/i2c/busses/Makefile | 1 +
drivers/i2c/busses/i2c-imc.c | 515 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 536 insertions(+)
create mode 100644 drivers/i2c/busses/i2c-imc.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 4beb8dc4c7eb..8de6bffd4772 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7895,6 +7895,11 @@ F: drivers/i2c/busses/i2c-sis96x.c
F: drivers/i2c/busses/i2c-via.c
F: drivers/i2c/busses/i2c-viapro.c
+I2C/SMBUS IMC DRIVER
+M: Stefan Schaeckeler <[email protected]>
+L: [email protected]
+F: drivers/i2c/busses/i2c-imc.c
+
I2C/SMBUS INTEL CHT WHISKEY COVE PMIC DRIVER
M: Hans de Goede <[email protected]>
L: [email protected]
diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index 2ddca08f8a76..87c5f8055ca4 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -150,6 +150,21 @@ config I2C_I801
This driver can also be built as a module. If so, the module
will be called i2c-i801.
+config I2C_IMC
+ tristate "Intel iMC SMBus Controller"
+ depends on PCI && X86 && SENSORS_JC42
+ help
+ If you say yes to this option, support will be included for the Intel
+ iMC SMBus host controller.
+
+ Currently, there is support for Broadwell, only.
+
+ Behind the iMC SMBus are DIMMs with eeproms and temperature sensors.
+ This driver is also instantiating the temperature sensors.
+
+ This driver can also be built as a module. If so, the module will be
+ called i2c-imc.
+
config I2C_ISCH
tristate "Intel SCH SMBus 1.0"
depends on PCI
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 25d60889713c..31446bdcf9c3 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_I2C_AMD756_S4882) += i2c-amd756-s4882.o
obj-$(CONFIG_I2C_AMD8111) += i2c-amd8111.o
obj-$(CONFIG_I2C_CHT_WC) += i2c-cht-wc.o
obj-$(CONFIG_I2C_I801) += i2c-i801.o
+obj-$(CONFIG_I2C_IMC) += i2c-imc.o
obj-$(CONFIG_I2C_ISCH) += i2c-isch.o
obj-$(CONFIG_I2C_ISMT) += i2c-ismt.o
obj-$(CONFIG_I2C_NFORCE2) += i2c-nforce2.o
diff --git a/drivers/i2c/busses/i2c-imc.c b/drivers/i2c/busses/i2c-imc.c
new file mode 100644
index 000000000000..653684b9c4c7
--- /dev/null
+++ b/drivers/i2c/busses/i2c-imc.c
@@ -0,0 +1,515 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Intel Memory Controller iMC SMBus Driver to DIMMs.
+ *
+ * Copyright (c) 2013-2016 Andrew Lutomirski <[email protected]>
+ * Copyright (c) 2020 Stefan Schaeckeler <[email protected]>, Cisco Systems
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/i2c.h>
+
+/* iMC Main, PCI dev 0x13, fn 0, 8086.6fa8 */
+#define PCI_DEVICE_ID_INTEL_BROADWELL_IMC_TA 0x6fa8
+
+/* Register offsets for channel pairs 0+1 and 2+3 */
+#define SMBSTAT(i) (0x180 + 0x10*(i))
+#define SMBCMD(i) (0x184 + 0x10*(i))
+#define SMBCNTL(i) (0x188 + 0x10*(i))
+
+/* SMBSTAT fields */
+#define SMBSTAT_RDO (1U << 31) /* Read Data Valid */
+#define SMBSTAT_WOD (1U << 30) /* Write Operation Done */
+#define SMBSTAT_SBE (1U << 29) /* SMBus Error */
+#define SMBSTAT_SMB_BUSY (1U << 28) /* SMBus Busy State */
+#define SMBSTAT_RDATA_MASK 0xffff /* Result of a read */
+
+/* SMBCMD fields */
+#define SMBCMD_TRIGGER (1U << 31) /* CMD Trigger */
+#define SMBCMD_WORD_ACCESS (1U << 29) /* Word (vs byte) access */
+#define SMBCMD_TYPE_READ (0U << 27) /* Read */
+#define SMBCMD_TYPE_WRITE (1U << 27) /* Write */
+#define SMBCMD_SA_SHIFT 24
+#define SMBCMD_BA_SHIFT 16
+
+/* SMBCNTL fields */
+#define SMBCNTL_DTI_MASK 0xf0000000 /* Slave Address low bits */
+#define SMBCNTL_DTI_SHIFT 28 /* Slave Address low bits */
+#define SMBCNTL_DIS_WRT (1U << 26) /* Disable Write */
+#define SMBCNTL_TSOD_PRES_MASK 0xff /* DIMM Present mask */
+
+/* For sanity check: bits that might randomly change if we race with firmware */
+#define SMBCMD_OUR_BITS (~(u32)SMBCMD_TRIGGER)
+#define SMBCNTL_OUR_BITS (SMBCNTL_DTI_MASK)
+
+
+/* System Address Decoder, PCI dev 0xf fn 5, 8086.6ffd */
+#define PCI_DEVICE_ID_INTEL_BROADWELL_IMC_SAD 0x6ffd
+
+/* Register offsets */
+#define SADCNTL 0xf4
+
+/* SADCNTL fields */
+#define SADCNTL_LOCAL_NODEID_MASK 0xf /* Local NodeID of socket */
+
+
+/* Power Control Unit, PCI dev 0x1e fn 1, 8086.6f99 */
+#define PCI_DEVICE_ID_INTEL_BROADWELL_IMC_PCU 0x6f99
+
+/* Register offsets */
+#define TSODCNTL 0xe0
+
+/* TSODCNTL fields */
+
+
+/* DIMMs hold jc42 thermal sensors starting at i2c address 0x18 */
+#define DIMM_SENSOR_DRV "jc42"
+#define DIMM_SENSOR_BASE_ADR 0x18
+
+
+#define sanitycheck 1
+
+struct imc_channelpair {
+ struct i2c_adapter adapter;
+ bool can_write, cltt;
+};
+
+struct imc_pcu {
+ struct pci_dev *pci_dev;
+ u32 tsod_polling_interval;
+ struct mutex mutex; /* see imc_channelpair_claim() */
+};
+
+struct imc_priv {
+ struct pci_dev *pci_dev;
+ struct imc_channelpair channelpair[2];
+ struct imc_pcu pcu;
+ bool suspended;
+};
+
+static int imc_channelpair_claim(struct imc_priv *priv, int i)
+{
+ if (priv->suspended)
+ return -EIO;
+
+ /*
+ * i2c controllers need exclusive access to a psu register and wait
+ * then for 10ms before starting their transaction.
+ *
+ * Possible optimization: Once an i2c controller modified the psu
+ * register and waits, the other controller does not need to wait for
+ * the whole 10ms, but then only this other controller has to clean up
+ * the psu register.
+ */
+ mutex_lock(&priv->pcu.mutex);
+
+ if (priv->channelpair[i].cltt) {
+ pci_write_config_dword(priv->pcu.pci_dev, TSODCNTL, 0);
+ usleep_range(10000, 10500);
+ }
+ return 0;
+}
+
+static void imc_channelpair_release(struct imc_priv *priv, int i)
+{
+ if (priv->channelpair[i].cltt) {
+ /* set tosd_control.tsod_polling_interval to previous value */
+ pci_write_config_dword(priv->pcu.pci_dev, TSODCNTL,
+ priv->pcu.tsod_polling_interval);
+ }
+ mutex_unlock(&priv->pcu.mutex);
+}
+
+static bool imc_wait_for_transaction(struct imc_priv *priv, int i, u32 *stat)
+{
+ int j;
+ static int busywaits = 1;
+
+ /*
+ * Distribution of transaction time from 10000 collected samples:
+ *
+ * 70us: 1, 80us: 12, 90us: 34, 100us: 132, 110us: 424, 120us: 1138,
+ * 130us: 5224, 140us: 3035.
+ *
+ */
+ usleep_range(131, 140);
+
+ /* Don't give up, yet */
+ for (j = 0; j < 20; j++) {
+ pci_read_config_dword(priv->pci_dev, SMBSTAT(i), stat);
+ if (!(*stat & SMBSTAT_SMB_BUSY)) {
+ if (j > busywaits) {
+ busywaits = j;
+ dev_warn(&priv->pci_dev->dev,
+ "Discovered surprisingly long transaction time (%d)\n",
+ busywaits);
+ }
+ return true;
+ }
+ udelay(9);
+ }
+ return false;
+}
+
+/*
+ * The iMC supports five access types. The terminology is rather inconsistent.
+ * These are the types:
+ *
+ * "Write to pointer register SMBus": I2C_SMBUS_WRITE, I2C_SMBUS_BYTE
+ *
+ * Read byte/word: I2C_SMBUS_READ, I2C_SMBUS_{BYTE|WORD}_DATA
+ *
+ * Write byte/word: I2C_SMBUS_WRITE, I2C_SMBUS_{BYTE|WORD}_DATA
+ */
+
+static u32 imc_func(struct i2c_adapter *adapter)
+{
+ int i;
+ struct imc_channelpair *cp;
+ struct imc_priv *priv = i2c_get_adapdata(adapter);
+
+ i = (adapter == &priv->channelpair[0].adapter ? 0 : 1);
+ cp = &priv->channelpair[i];
+
+ if (cp->can_write)
+ return I2C_FUNC_SMBUS_BYTE_DATA | I2C_FUNC_SMBUS_WORD_DATA;
+ else
+ return I2C_FUNC_SMBUS_READ_BYTE_DATA |
+ I2C_FUNC_SMBUS_READ_WORD_DATA;
+}
+
+static s32 imc_smbus_xfer(struct i2c_adapter *adap, u16 addr,
+ unsigned short flags, char read_write, u8 command,
+ int size, union i2c_smbus_data *data)
+{
+ int ret, i;
+ u32 cmd = 0, cntl, stat;
+#ifdef sanitycheck
+ u32 final_cmd, final_cntl;
+#endif
+ struct imc_channelpair *cp;
+ struct imc_priv *priv = i2c_get_adapdata(adap);
+
+ i = (adap == &priv->channelpair[0].adapter ? 0 : 1);
+ cp = &priv->channelpair[i];
+
+ /* Encode CMD part of addresses and access size */
+ cmd |= ((u32)addr & 0x7) << SMBCMD_SA_SHIFT;
+ cmd |= ((u32)command) << SMBCMD_BA_SHIFT;
+ if (size == I2C_SMBUS_WORD_DATA)
+ cmd |= SMBCMD_WORD_ACCESS;
+
+ /* Encode read/write and data to write */
+ if (read_write == I2C_SMBUS_READ) {
+ cmd |= SMBCMD_TYPE_READ;
+ } else {
+ cmd |= SMBCMD_TYPE_WRITE;
+ cmd |= (size == I2C_SMBUS_WORD_DATA
+ ? swab16(data->word)
+ : data->byte);
+ }
+
+ ret = imc_channelpair_claim(priv, i);
+ if (ret)
+ return ret;
+
+ pci_read_config_dword(priv->pci_dev, SMBCNTL(i), &cntl);
+ cntl &= ~SMBCNTL_DTI_MASK;
+ cntl |= ((u32)addr >> 3) << SMBCNTL_DTI_SHIFT;
+ pci_write_config_dword(priv->pci_dev, SMBCNTL(i), cntl);
+
+ cmd |= SMBCMD_TRIGGER;
+ pci_write_config_dword(priv->pci_dev, SMBCMD(i), cmd);
+
+ if (!imc_wait_for_transaction(priv, i, &stat)) {
+ dev_warn(&priv->pci_dev->dev, "smbus transaction did not complete.\n");
+ ret = -ETIMEDOUT;
+ goto xfer_out_release;
+ }
+
+#ifdef sanitycheck /* This is a young driver. Keep the checks for now */
+ pci_read_config_dword(priv->pci_dev, SMBCMD(i), &final_cmd);
+ pci_read_config_dword(priv->pci_dev, SMBCNTL(i), &final_cntl);
+ if (((cmd ^ final_cmd) & SMBCMD_OUR_BITS) ||
+ ((cntl ^ final_cntl) & SMBCNTL_OUR_BITS)) {
+ dev_err(&priv->pci_dev->dev,
+ "Access to channel pair %d-%d raced with hardware: cmd 0x%08X->0x%08X, cntl 0x%08X->0x%08X\n",
+ 2*i, 2*i+1, cmd, final_cmd, cntl, final_cntl);
+ ret = -EIO;
+ goto xfer_out_release;
+ }
+#endif
+
+ if (stat & SMBSTAT_SBE) {
+ /*
+ * While SBE is set hardware TSOD polling is disabled. This is
+ * very bad as this bit is RO-V and will only be cleared after
+ * a further software initiated transaction finishes
+ * successfully.
+ */
+ dev_err(&priv->pci_dev->dev,
+ "smbus error: sbe is set 0x%x\n", stat);
+ ret = -ENXIO;
+ goto xfer_out_release;
+ }
+
+ if (read_write == I2C_SMBUS_READ) {
+ if (!(stat & SMBSTAT_RDO)) {
+ dev_warn(&priv->pci_dev->dev,
+ "Unexpected read status 0x%08X\n", stat);
+ ret = -EIO;
+ goto xfer_out_release;
+ }
+ /*
+ * The iMC SMBus controller thinks of SMBus words as being
+ * big-endian (MSB first). Linux treats them as little-endian,
+ * so we need to swap them.
+ */
+ if (size == I2C_SMBUS_WORD_DATA)
+ data->word = swab16(stat & SMBSTAT_RDATA_MASK);
+ else
+ data->byte = stat & 0xFF;
+ } else {
+ if (!(stat & SMBSTAT_WOD)) {
+ dev_warn(&priv->pci_dev->dev,
+ "Unexpected write status 0x%08X\n", stat);
+ ret = -EIO;
+ }
+ }
+
+xfer_out_release:
+ imc_channelpair_release(priv, i);
+
+ return ret;
+}
+
+static const struct i2c_algorithm imc_smbus_algorithm = {
+ .smbus_xfer = imc_smbus_xfer,
+ .functionality = imc_func,
+};
+
+static void imc_instantiate_sensors(struct i2c_adapter *adapter, u8 presence)
+{
+ struct i2c_board_info info = {};
+
+ strcpy(info.type, DIMM_SENSOR_DRV);
+ info.addr = DIMM_SENSOR_BASE_ADR;
+
+ /*
+ * Presence is a bit vector. Bits from right to left map into i2c slave
+ * addresses starting 0x18.
+ */
+ while (presence) {
+ if (presence & 0x1)
+ i2c_new_device(adapter, &info);
+ info.addr++;
+ presence >>= 1;
+ }
+}
+
+static int imc_init_channelpair(struct imc_priv *priv, int i, int socket)
+{
+ int err;
+ u32 val;
+ struct imc_channelpair *cp = &priv->channelpair[i];
+
+ i2c_set_adapdata(&cp->adapter, priv);
+ cp->adapter.owner = THIS_MODULE;
+ cp->adapter.algo = &imc_smbus_algorithm;
+ cp->adapter.dev.parent = &priv->pci_dev->dev;
+
+ pci_read_config_dword(priv->pci_dev, SMBCNTL(i), &val);
+ cp->can_write = !(val & SMBCNTL_DIS_WRT);
+
+ /*
+ * A TSOD polling interval of > 0 tells us if CLTT mode is enabled on
+ * some channel pair.
+ *
+ * Is there a better way to check for CLTT mode? In particular, is
+ * there a way to distingush the mode on a channel pair basis?
+ */
+ cp->cltt = (priv->pcu.tsod_polling_interval > 0);
+
+ snprintf(cp->adapter.name, sizeof(cp->adapter.name),
+ "iMC socket %d for channel pair %d-%d", socket, 2*i, 2*i+1);
+ err = i2c_add_adapter(&cp->adapter);
+ if (err)
+ return err;
+
+ /* For reasons unknown, TSOD_PRES_MASK is only set in CLTT mode. */
+ if (cp->cltt) {
+ dev_info(&priv->pci_dev->dev,
+ "CLTT is enabled on channel pair %d-%d. Thermal sensors will be automatically enabled\n",
+ 2*i, 2*i+1);
+ } else {
+ dev_info(&priv->pci_dev->dev,
+ "CLTT is disabled on channel pair %d-%d. Thermal sensors need to be manually enabled\n",
+ 2*i, 2*i+1);
+ }
+
+ imc_instantiate_sensors(&cp->adapter, val & SMBCNTL_TSOD_PRES_MASK);
+
+ return 0;
+}
+
+static void imc_free_channelpair(struct imc_priv *priv, int i)
+{
+ struct imc_channelpair *cp = &priv->channelpair[i];
+
+ i2c_del_adapter(&cp->adapter);
+}
+
+static struct pci_dev *imc_get_related_device(struct pci_bus *bus,
+ unsigned int devfn, u16 devid)
+{
+ struct pci_dev *dev = pci_get_slot(bus, devfn);
+
+ if (!dev)
+ return NULL;
+
+ if (dev->vendor != PCI_VENDOR_ID_INTEL || dev->device != devid) {
+ pci_dev_put(dev);
+ return NULL;
+ }
+ return dev;
+}
+
+static int imc_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+ int i, j, err;
+ struct imc_priv *priv;
+ struct pci_dev *sad; /* System Address Decoder */
+ u32 sadcntl;
+
+ /* Sanity check. This device is always at 0x13.0 */
+ if (dev->devfn != PCI_DEVFN(0x13, 0))
+ return -ENODEV;
+
+ priv = devm_kzalloc(&dev->dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+ priv->pci_dev = dev;
+ pci_set_drvdata(dev, priv);
+
+ /*
+ * From sad, we learn the local node id of the socket.
+ *
+ * The socket will not change at runtime and so we throw away sad.
+ */
+ sad = imc_get_related_device(dev->bus, PCI_DEVFN(0x0f, 5),
+ PCI_DEVICE_ID_INTEL_BROADWELL_IMC_SAD);
+ if (!sad) {
+ err = -ENODEV;
+ goto probe_out_free;
+ }
+ pci_read_config_dword(sad, SADCNTL, &sadcntl);
+ pci_dev_put(sad);
+
+ /*
+ * From pcu, we access the CLTT polling interval.
+ *
+ * The polling interval is set by BIOS. We assume it will not change at
+ * runtime and cache the initial value.
+ */
+ priv->pcu.pci_dev = imc_get_related_device(dev->bus, PCI_DEVFN(0x1e, 1),
+ PCI_DEVICE_ID_INTEL_BROADWELL_IMC_PCU);
+ if (!priv->pcu.pci_dev) {
+ err = -ENODEV;
+ goto probe_out_free;
+ }
+ pci_read_config_dword(priv->pcu.pci_dev, TSODCNTL,
+ &priv->pcu.tsod_polling_interval);
+
+ mutex_init(&priv->pcu.mutex);
+
+ for (i = 0; i < 2; i++) {
+ err = imc_init_channelpair(priv, i,
+ sadcntl & SADCNTL_LOCAL_NODEID_MASK);
+ if (err)
+ goto probe_out_free_channelpair;
+ }
+
+ return 0;
+
+probe_out_free_channelpair:
+ for (j = 0; j < i; j++)
+ imc_free_channelpair(priv, j);
+
+ mutex_destroy(&priv->pcu.mutex);
+
+probe_out_free:
+ kfree(priv);
+ return err;
+}
+
+static void imc_remove(struct pci_dev *dev)
+{
+ int i;
+ struct imc_priv *priv = pci_get_drvdata(dev);
+
+ for (i = 0; i < 2; i++)
+ imc_free_channelpair(priv, i);
+
+ /* set tosd_control.tsod_polling_interval to initial value */
+ pci_write_config_dword(priv->pcu.pci_dev, TSODCNTL,
+ priv->pcu.tsod_polling_interval);
+
+ mutex_destroy(&priv->pcu.mutex);
+}
+
+static int imc_suspend(struct pci_dev *dev, pm_message_t mesg)
+{
+ struct imc_priv *priv = pci_get_drvdata(dev);
+
+ /* BIOS is in charge. We should finish any pending transaction */
+ priv->suspended = true;
+
+ return 0;
+}
+
+static int imc_resume(struct pci_dev *dev)
+{
+ struct imc_priv *priv = pci_get_drvdata(dev);
+
+ priv->suspended = false;
+
+ return 0;
+}
+
+static const struct pci_device_id imc_ids[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_INTEL,
+ PCI_DEVICE_ID_INTEL_BROADWELL_IMC_TA) },
+ { 0, }
+};
+MODULE_DEVICE_TABLE(pci, imc_ids);
+
+static struct pci_driver imc_pci_driver = {
+ .name = "imc_smbus",
+ .id_table = imc_ids,
+ .probe = imc_probe,
+ .remove = imc_remove,
+ .suspend = imc_suspend,
+ .resume = imc_resume,
+};
+
+static int __init i2c_imc_init(void)
+{
+ return pci_register_driver(&imc_pci_driver);
+}
+module_init(i2c_imc_init);
+
+static void __exit i2c_imc_exit(void)
+{
+ pci_unregister_driver(&imc_pci_driver);
+}
+module_exit(i2c_imc_exit);
+
+MODULE_AUTHOR("Stefan Schaeckeler <[email protected]>");
+MODULE_DESCRIPTION("iMC SMBus driver");
+MODULE_LICENSE("GPL v2");
--
2.11.0
On Mon, Feb 24, 2020 at 12:54 AM Stefan Schaeckeler <[email protected]> wrote:
>
> This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> this rewrite will:
Thanks for the patch!
I'll review the code later.
I think the better to have a documentation file where you describe
stuff like enumeration and so on for this drivers (under Documentation
folder).
> Stefan Schaeckeler (1):
> i2c: imc: Add support for Intel iMC SMBus host controller.
>
> MAINTAINERS | 5 +
> drivers/i2c/busses/Kconfig | 15 ++
> drivers/i2c/busses/Makefile | 1 +
> drivers/i2c/busses/i2c-imc.c | 515 +++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 536 insertions(+)
> create mode 100644 drivers/i2c/busses/i2c-imc.c
>
> --
> 2.11.0
>
--
With Best Regards,
Andy Shevchenko
On Sun, Feb 23, 2020 at 2:52 PM Stefan Schaeckeler <[email protected]> wrote:
>
> This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> this rewrite will:
>
>
> Overview
>
> Modern Intel memory controllers host an SMBus controller and connection to
> DIMMs and their thermal sensors. The memory controller firmware has three modes
> of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> Throttling (OLTT) and none.
>
> - CLTT: The memory controller firmware is periodically accessing the DIMM
> temperature sensor over the SMBus.
>
I think this is great! One question, though: what happens if the
system is in CLTT mode but you disable CLTT and claim the bus for too
long? For example, if there's an infinite loop or other lockup which
you have the tsod polling interval set to 0? Does the system catch
fire or does the system do something intelligent like temporarily
switching to open loop?
Hello Any,
> > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > this rewrite will:
> >
> >
> > Overview
> >
> > Modern Intel memory controllers host an SMBus controller and connection to
> > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > Throttling (OLTT) and none.
> >
> > - CLTT: The memory controller firmware is periodically accessing the DIMM
> > temperature sensor over the SMBus.
> >
>
>
> I think this is great! One question, though: what happens if the
> system is in CLTT mode but you disable CLTT and claim the bus for too
> long? For example, if there's an infinite loop or other lockup which
> you have the tsod polling interval set to 0? Does the system catch
> fire or does the system do something intelligent like temporarily
> switching to open loop?
I don't know. Most likely, the current memory throttling rate will be kept.
That might not be enough for the forthcoming workload and, ehm, the system may
catch fire.
I assume our use-case is the most common use-case for this driver: our embedded
system comes with its own environmental management software. It monitors, among
other sensor values, the DIMM temperatures and takes action on abnormal values.
If one is concerned about your scenario, then the environmental management
software needs to consider blocked reads on the sysfs node as a worst case
scenario and reboot the system.
Nothing can really go wrong while the polling interval is set to 0, though:
- reading and setting pci configuration space registers.
- calling dev_err, dev_warn and alike.
- usleep_range(131,140) and up to 20 udelay(9).
What is not clear to me is what if imc_smbus_xfer() is executing while the
driver is rmmod-ed. Defensively, I set in the driver's remove function the
tsod_polling_interval back to its original value.
~ Stefan
On Tue, Feb 25, 2020 at 01:49:34PM -0800, Andy Lutomirski wrote:
> On Sun, Feb 23, 2020 at 2:52 PM Stefan Schaeckeler <[email protected]> wrote:
> >
> > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > this rewrite will:
> >
> >
> > Overview
> >
> > Modern Intel memory controllers host an SMBus controller and connection to
> > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > Throttling (OLTT) and none.
> >
> > - CLTT: The memory controller firmware is periodically accessing the DIMM
> > temperature sensor over the SMBus.
> >
>
>
> I think this is great! One question, though: what happens if the
> system is in CLTT mode but you disable CLTT and claim the bus for too
> long? For example, if there's an infinite loop or other lockup which
> you have the tsod polling interval set to 0? Does the system catch
> fire or does the system do something intelligent like temporarily
> switching to open loop?
Any news on this question?
Hello Wolfram,
> > > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > > this rewrite will:
> > >
> > >
> > > Overview
> > >
> > > Modern Intel memory controllers host an SMBus controller and connection to
> > > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > > Throttling (OLTT) and none.
> > >
> > > - CLTT: The memory controller firmware is periodically accessing the DIMM
> > > temperature sensor over the SMBus.
> > >
> >
> >
> > I think this is great! One question, though: what happens if the
> > system is in CLTT mode but you disable CLTT and claim the bus for too
> > long? For example, if there's an infinite loop or other lockup which
> > you have the tsod polling interval set to 0? Does the system catch
> > fire or does the system do something intelligent like temporarily
> > switching to open loop?
>
> Any news on this question?
Thank you for your interest in this patch. You can read my reply here
https://lkml.org/lkml/2020/3/1/216
Stefan
On Sun, Apr 5, 2020 at 2:41 PM Stefan Schaeckeler <[email protected]> wrote:
>
> Hello Wolfram,
>
> > > > This patch is based on Andy Lutomirski's iMC SMBus driver patch-set
> > > > https://lkml.org/lkml/2016/4/28/926. It never made it into the kernel. I hope
> > > > this rewrite will:
> > > >
> > > >
> > > > Overview
> > > >
> > > > Modern Intel memory controllers host an SMBus controller and connection to
> > > > DIMMs and their thermal sensors. The memory controller firmware has three modes
> > > > of operation: Closed Loop Thermal Throttling (CLTT), Open Loop Thermal
> > > > Throttling (OLTT) and none.
> > > >
> > > > - CLTT: The memory controller firmware is periodically accessing the DIMM
> > > > temperature sensor over the SMBus.
> > > >
> > >
> > >
> > > I think this is great! One question, though: what happens if the
> > > system is in CLTT mode but you disable CLTT and claim the bus for too
> > > long? For example, if there's an infinite loop or other lockup which
> > > you have the tsod polling interval set to 0? Does the system catch
> > > fire or does the system do something intelligent like temporarily
> > > switching to open loop?
> >
> > Any news on this question?
>
> Thank you for your interest in this patch. You can read my reply here
> https://lkml.org/lkml/2020/3/1/216
I think it could make sense to upstream this driver but to require a
scary boot-time option to enable it. Maybe i2c_imc.dangerous=1?
>
> Stefan