optoe is an i2c based driver that supports read/write access to all
the pages (tables) of MSA standard SFP and similar devices (conforming
to the SFF-8472 spec) and MSA standard QSFP and similar devices
(conforming to the SFF-8436 spec).
These devices provide identification, operational status and control
registers via an EEPROM model. These devices support one or 3 fixed
pages (128 bytes) of data, and one page that is selected via a page
register on the first fixed page. Thus the driver's main task is
to map these pages onto a simple linear address space for user space
management applications. See the driver code for a detailed layout.
EEPROM data is accessible via a bin_attribute file called 'eeprom',
e.g. /sys/bus/i2c/devices/24-0050/eeprom.
Signed-off-by: Don Bollinger <[email protected]>
---
Why should this driver be in the Linux kernel? SFP and QSFP devices plug
into switches to convert electrical to optical signals and drive the
optical signal over fiber optic cables. They provide status and control
registers through an i2c interface similar to to other EEPROMS. However,
they have a paging mechanism that is unique, which requires a different
driver from (for example) at24. Various drivers have been developed for
this purpose, none of them support both SFP and QSFP, provide both read
and write access, and access all 256 architected pages. optoe does all
of these.
optoe has been adopted and is shipping to customers as a base module,
available to all platforms (switches) and used by multiple vendors and
platforms on both ONL (Open Network Linux) and SONiC (Microsoft's
'Software for Open Networking in the Cloud').
This patch has been built on the latest staging-testing kernel. It has
built and tested with SFP and QSFP devices on an ARM platform with a 4.9
kernel, and an x86 switch with a 3.16 kernel. This patch should install
and build clean on any kernel from 3.16 up to the latest (as of 6/10/2018).
Documentation/misc-devices/optoe.txt | 56 ++
drivers/misc/eeprom/Kconfig | 18 +
drivers/misc/eeprom/Makefile | 1 +
drivers/misc/eeprom/optoe.c | 1141 ++++++++++++++++++++++++++++++++++
4 files changed, 1216 insertions(+)
create mode 100644 Documentation/misc-devices/optoe.txt
create mode 100644 drivers/misc/eeprom/optoe.c
diff --git a/Documentation/misc-devices/optoe.txt b/Documentation/misc-devices/optoe.txt
new file mode 100644
index 000000000000..496134940147
--- /dev/null
+++ b/Documentation/misc-devices/optoe.txt
@@ -0,0 +1,56 @@
+optoe driver
+
+Author Don Bollinger ([email protected])
+
+Optoe is an i2c based driver that supports read/write access to all
+the pages (tables) of MSA standard SFP and similar devices (conforming
+to the SFF-8472 spec) and MSA standard QSFP and similar devices
+(conforming to the SFF-8436 spec).
+
+i2c based optoelectronic transceivers (SPF, QSFP, etc) provide identification,
+operational status, and control registers via an EEPROM model. Unlike the
+EEPROMs that at24 supports, these devices access data beyond byte 256 via
+a page select register, which must be managed by the driver. See the driver
+code for a detailed explanation of how the linear address space provided
+by the driver maps to the paged address space provided by the devices.
+
+The EEPROM data is accessible via a bin_attribute file called 'eeprom',
+e.g. /sys/bus/i2c/devices/24-0050/eeprom
+
+This driver also reports the port number for each device, via a sysfs
+attribute: 'port_name'. This is a read/write attribute. It should be
+explicitly set as part of system initialization, ideally at the same time
+the device is instantiated. Write an appropriate port name (any string, up
+to 19 characters) to initialize. If not initialized explicitly, all ports
+will have the port_name of 'unitialized'. Alternatively, if the driver is
+called with platform_data, the port_name will be read from eeprom_data->label
+(if the EEPROM CLASS driver is configured) or from platform_data.port_name.
+
+This driver can be instantiated with 'new_device', per the convention
+described in Documentation/i2c/instantiating-devices. It wants one of
+two possible device identifiers. Use 'optoe1' to indicate this is a device
+with just one i2c address (all QSFP type devices). Use 'optoe2' to indicate
+this is a device with two i2c addresses (all SFP type devices).
+
+Example:
+# echo optoe1 0x50 > /sys/bus/i2c/devices/i2c-64/new_device
+# echo port54 > /sys/bus/i2c/devices/i2c-64/port_name
+
+This will add a QSFP type device to i2c bus i2c-64, and name it 'port54'
+
+Example:
+# echo optoe2 0x50 > /sys/bus/i2c/devices/i2c-11/new_device
+# echo port1 > /sys/bus/i2c/devices/i2c-11/port_name
+
+This will add an SFP type device to i2c bus i2c-11, and name it 'port1'
+
+The second parameter to new_device is an i2c address, and MUST be 0x50 for
+this driver to work properly. This is part of the spec for these devices.
+(It is not necessary to create a device at 0x51 for SFP type devices, the
+driver does that automatically.)
+
+Note that SFP type and QSFP type devices are not plug-compatible. The
+driver expects the correct ID for each port (each i2c device). It does
+not check because the port will often be empty, and the only way to check
+is to interrogate the device. Incorrect choice of ID will lead to correct
+data being reported for the first 256 bytes, incorrect data after that.
diff --git a/drivers/misc/eeprom/Kconfig b/drivers/misc/eeprom/Kconfig
index 68a1ac929917..9a08e12756ee 100644
--- a/drivers/misc/eeprom/Kconfig
+++ b/drivers/misc/eeprom/Kconfig
@@ -111,4 +111,22 @@ config EEPROM_IDT_89HPESX
This driver can also be built as a module. If so, the module
will be called idt_89hpesx.
+config EEPROM_OPTOE
+ tristate "read/write access to SFP* & QSFP* EEPROMs"
+ depends on I2C && SYSFS
+ help
+ If you say yes here you get support for read and write access to
+ the EEPROM of SFP and QSFP type optical and copper transceivers.
+ Includes all devices which conform to the sff-8436 and sff-8472
+ spec including SFP, SFP+, SFP28, SFP-DWDM, QSFP, QSFP+, QSFP28
+ or later. These devices are usually found in network switches.
+
+ This driver only manages read/write access to the EEPROM, all
+ other features should be accessed via i2c-dev.
+
+ This driver can also be built as a module. If so, the module
+ will be called optoe.
+
+ If unsure, say N.
+
endmenu
diff --git a/drivers/misc/eeprom/Makefile b/drivers/misc/eeprom/Makefile
index 2aab60ef3e3e..00288d669017 100644
--- a/drivers/misc/eeprom/Makefile
+++ b/drivers/misc/eeprom/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_EEPROM_93CX6) += eeprom_93cx6.o
obj-$(CONFIG_EEPROM_93XX46) += eeprom_93xx46.o
obj-$(CONFIG_EEPROM_DIGSY_MTC_CFG) += digsy_mtc_eeprom.o
obj-$(CONFIG_EEPROM_IDT_89HPESX) += idt_89hpesx.o
+obj-$(CONFIG_EEPROM_OPTOE) += optoe.o
diff --git a/drivers/misc/eeprom/optoe.c b/drivers/misc/eeprom/optoe.c
new file mode 100644
index 000000000000..7cdf1a0a5299
--- /dev/null
+++ b/drivers/misc/eeprom/optoe.c
@@ -0,0 +1,1141 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * optoe.c - A driver to read and write the EEPROM on optical transceivers
+ * (SFP, QSFP and similar I2C based devices)
+ *
+ * Copyright (C) 2014 Cumulus networks Inc.
+ * Copyright (C) 2017 Finisar Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Freeoftware Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+/*
+ * Description:
+ * a) Optical transceiver EEPROM read/write transactions are just like
+ * the at24 eeproms managed by the at24.c i2c driver
+ * b) The register/memory layout is up to 256 128 byte pages defined by
+ * a "pages valid" register and switched via a "page select"
+ * register as explained in below diagram.
+ * c) 256 bytes are mapped at a time. 'Lower page 00h' is the first 128
+ * bytes of address space, and always references the same
+ * location, independent of the page select register.
+ * All mapped pages are mapped into the upper 128 bytes
+ * (offset 128-255) of the i2c address.
+ * d) Devices with one I2C address (eg QSFP) use I2C address 0x50
+ * (A0h in the spec), and map all pages in the upper 128 bytes
+ * of that address.
+ * e) Devices with two I2C addresses (eg SFP) have 256 bytes of data
+ * at I2C address 0x50, and 256 bytes of data at I2C address
+ * 0x51 (A2h in the spec). Page selection and paged access
+ * only apply to this second I2C address (0x51).
+ * e) The address space is presented, by the driver, as a linear
+ * address space. For devices with one I2C client at address
+ * 0x50 (eg QSFP), offset 0-127 are in the lower
+ * half of address 50/A0h/client[0]. Offset 128-255 are in
+ * page 0, 256-383 are page 1, etc. More generally, offset
+ * 'n' resides in page (n/128)-1. ('page -1' is the lower
+ * half, offset 0-127).
+ * f) For devices with two I2C clients at address 0x50 and 0x51 (eg SFP),
+ * the address space places offset 0-127 in the lower
+ * half of 50/A0/client[0], offset 128-255 in the upper
+ * half. Offset 256-383 is in the lower half of 51/A2/client[1].
+ * Offset 384-511 is in page 0, in the upper half of 51/A2/...
+ * Offset 512-639 is in page 1, in the upper half of 51/A2/...
+ * Offset 'n' is in page (n/128)-3 (for n > 383)
+ *
+ * One I2c addressed (eg QSFP) Memory Map
+ *
+ * 2-Wire Serial Address: 1010000x
+ *
+ * Lower Page 00h (128 bytes)
+ * =====================
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * | |
+ * |Page Select Byte(127)|
+ * =====================
+ * |
+ * |
+ * |
+ * |
+ * V
+ * ------------------------------------------------------------
+ * | | | |
+ * | | | |
+ * | | | |
+ * | | | |
+ * | | | |
+ * | | | |
+ * | | | |
+ * | | | |
+ * | | | |
+ * V V V V
+ * ------------ -------------- --------------- --------------
+ * | | | | | | | |
+ * | Upper | | Upper | | Upper | | Upper |
+ * | Page 00h | | Page 01h | | Page 02h | | Page 03h |
+ * | | | (Optional) | | (Optional) | | (Optional |
+ * | | | | | | | for Cable |
+ * | | | | | | | Assemblies) |
+ * | ID | | AST | | User | | |
+ * | Fields | | Table | | EEPROM Data | | |
+ * | | | | | | | |
+ * | | | | | | | |
+ * | | | | | | | |
+ * ------------ -------------- --------------- --------------
+ *
+ * The SFF 8436 (QSFP) spec only defines the 4 pages described above.
+ * In anticipation of future applications and devices, this driver
+ * supports access to the full architected range, 256 pages.
+ *
+ **/
+
+/* #define DEBUG 1 */
+
+#undef EEPROM_CLASS
+#ifdef CONFIG_EEPROM_CLASS
+#define EEPROM_CLASS
+#endif
+#ifdef CONFIG_EEPROM_CLASS_MODULE
+#define EEPROM_CLASS
+#endif
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/mutex.h>
+#include <linux/sysfs.h>
+#include <linux/jiffies.h>
+#include <linux/i2c.h>
+
+#ifdef EEPROM_CLASS
+#include <linux/eeprom_class.h>
+#endif
+
+#include <linux/types.h>
+
+/* The maximum length of a port name */
+#define MAX_PORT_NAME_LEN 20
+
+struct optoe_platform_data {
+ u32 byte_len; /* size (sum of all addr) */
+ u16 page_size; /* for writes */
+ u8 flags;
+ void *dummy1; /* backward compatibility */
+ void *dummy2; /* backward compatibility */
+
+#ifdef EEPROM_CLASS
+ struct eeprom_platform_data *eeprom_data;
+#endif
+ char port_name[MAX_PORT_NAME_LEN];
+};
+
+/* fundamental unit of addressing for EEPROM */
+#define OPTOE_PAGE_SIZE 128
+/*
+ * Single address devices (eg QSFP) have 256 pages, plus the unpaged
+ * low 128 bytes. If the device does not support paging, it is
+ * only 2 'pages' long.
+ */
+#define OPTOE_ARCH_PAGES 256
+#define ONE_ADDR_EEPROM_SIZE ((1 + OPTOE_ARCH_PAGES) * OPTOE_PAGE_SIZE)
+#define ONE_ADDR_EEPROM_UNPAGED_SIZE (2 * OPTOE_PAGE_SIZE)
+/*
+ * Dual address devices (eg SFP) have 256 pages, plus the unpaged
+ * low 128 bytes, plus 256 bytes at 0x50. If the device does not
+ * support paging, it is 4 'pages' long.
+ */
+#define TWO_ADDR_EEPROM_SIZE ((3 + OPTOE_ARCH_PAGES) * OPTOE_PAGE_SIZE)
+#define TWO_ADDR_EEPROM_UNPAGED_SIZE (4 * OPTOE_PAGE_SIZE)
+#define TWO_ADDR_NO_0X51_SIZE (2 * OPTOE_PAGE_SIZE)
+
+/* a few constants to find our way around the EEPROM */
+#define OPTOE_PAGE_SELECT_REG 0x7F
+#define ONE_ADDR_PAGEABLE_REG 0x02
+#define ONE_ADDR_NOT_PAGEABLE BIT(2)
+#define TWO_ADDR_PAGEABLE_REG 0x40
+#define TWO_ADDR_PAGEABLE BIT(4)
+#define TWO_ADDR_0X51_REG 92
+#define TWO_ADDR_0X51_SUPP BIT(6)
+#define OPTOE_ID_REG 0
+#define OPTOE_READ_OP 0
+#define OPTOE_WRITE_OP 1
+#define OPTOE_EOF 0 /* used for access beyond end of device */
+
+struct optoe_data {
+ struct optoe_platform_data chip;
+ int use_smbus;
+ char port_name[MAX_PORT_NAME_LEN];
+
+ /*
+ * Lock protects against activities from other Linux tasks,
+ * but not from changes by other I2C masters.
+ */
+ struct mutex lock;
+ struct bin_attribute bin;
+ struct attribute_group attr_group;
+
+ u8 *writebuf;
+ unsigned int write_max;
+
+ unsigned int num_addresses;
+
+#ifdef EEPROM_CLASS
+ struct eeprom_device *eeprom_dev;
+#endif
+
+ /* dev_class: ONE_ADDR (QSFP) or TWO_ADDR (SFP) */
+ int dev_class;
+
+ struct i2c_client *client[2];
+};
+
+/*
+ * This parameter is to help this driver avoid blocking other drivers out
+ * of I2C for potentially troublesome amounts of time. With a 100 kHz I2C
+ * clock, one 256 byte read takes about 1/43 second which is excessive;
+ * but the 1/170 second it takes at 400 kHz may be quite reasonable; and
+ * at 1 MHz (Fm+) a 1/430 second delay could easily be invisible.
+ *
+ * This value is forced to be a power of two so that writes align on pages.
+ */
+static unsigned int io_limit = OPTOE_PAGE_SIZE;
+
+/*
+ * specs often allow 5 msec for a page write, sometimes 20 msec;
+ * it's important to recover from write timeouts.
+ */
+static unsigned int write_timeout = 25;
+
+/*
+ * flags to distinguish one-address (QSFP family) from two-address (SFP family)
+ * If the family is not known, figure it out when the device is accessed
+ */
+#define ONE_ADDR 1
+#define TWO_ADDR 2
+
+static const struct i2c_device_id optoe_ids[] = {
+ { "optoe1", ONE_ADDR },
+ { "optoe2", TWO_ADDR },
+ { "sff8436", ONE_ADDR },
+ { "24c04", TWO_ADDR },
+ { /* END OF LIST */ }
+};
+MODULE_DEVICE_TABLE(i2c, optoe_ids);
+
+/*-------------------------------------------------------------------------*/
+/*
+ * This routine computes the addressing information to be used for
+ * a given r/w request.
+ *
+ * Task is to calculate the client (0 = i2c addr 50, 1 = i2c addr 51),
+ * the page, and the offset.
+ *
+ * Handles both single address (eg QSFP) and two address (eg SFP).
+ * For SFP, offset 0-255 are on client[0], >255 is on client[1]
+ * Offset 256-383 are on the lower half of client[1]
+ * Pages are accessible on the upper half of client[1].
+ * Offset >383 are in 128 byte pages mapped into the upper half
+ *
+ * For QSFP, all offsets are on client[0]
+ * offset 0-127 are on the lower half of client[0] (no paging)
+ * Pages are accessible on the upper half of client[1].
+ * Offset >127 are in 128 byte pages mapped into the upper half
+ *
+ * Callers must not read/write beyond the end of a client or a page
+ * without recomputing the client/page. Hence offset (within page)
+ * plus length must be less than or equal to 128. (Note that this
+ * routine does not have access to the length of the call, hence
+ * cannot do the validity check.)
+ *
+ * Offset within Lower Page 00h and Upper Page 00h are not recomputed
+ */
+
+static uint8_t optoe_translate_offset(struct optoe_data *optoe,
+ loff_t *offset,
+ struct i2c_client **client)
+{
+ unsigned int page = 0;
+
+ *client = optoe->client[0];
+
+ /* if SFP style, offset > 255, shift to i2c addr 0x51 */
+ if (optoe->dev_class == TWO_ADDR) {
+ if (*offset > 255) {
+ /* like QSFP, but shifted to client[1] */
+ *client = optoe->client[1];
+ *offset -= 256;
+ }
+ }
+
+ /*
+ * if offset is in the range 0-128...
+ * page doesn't matter (using lower half), return 0.
+ * offset is already correct (don't add 128 to get to paged area)
+ */
+ if (*offset < OPTOE_PAGE_SIZE)
+ return page;
+
+ /* note, page will always be positive since *offset >= 128 */
+ page = (*offset >> 7) - 1;
+ /* 0x80 places the offset in the top half, offset is last 7 bits */
+ *offset = OPTOE_PAGE_SIZE + (*offset & 0x7f);
+
+ return page; /* note also returning client and offset */
+}
+
+static ssize_t optoe_eeprom_read(struct optoe_data *optoe,
+ struct i2c_client *client,
+ char *buf, unsigned int offset, size_t count)
+{
+ struct i2c_msg msg[2];
+ u8 msgbuf[2];
+ unsigned long timeout, read_time;
+ int status, i;
+
+ memset(msg, 0, sizeof(msg));
+
+ switch (optoe->use_smbus) {
+ case I2C_SMBUS_I2C_BLOCK_DATA:
+ /*smaller eeproms can work given some SMBus extension calls */
+ if (count > I2C_SMBUS_BLOCK_MAX)
+ count = I2C_SMBUS_BLOCK_MAX;
+ break;
+ case I2C_SMBUS_WORD_DATA:
+ /* Check for odd length transaction */
+ count = (count == 1) ? 1 : 2;
+ break;
+ case I2C_SMBUS_BYTE_DATA:
+ count = 1;
+ break;
+ default:
+ /*
+ * When we have a better choice than SMBus calls, use a
+ * combined I2C message. Write address; then read up to
+ * io_limit data bytes. msgbuf is u8 and will cast to our
+ * needs.
+ */
+ i = 0;
+ msgbuf[i++] = offset;
+
+ msg[0].addr = client->addr;
+ msg[0].buf = msgbuf;
+ msg[0].len = i;
+
+ msg[1].addr = client->addr;
+ msg[1].flags = I2C_M_RD;
+ msg[1].buf = buf;
+ msg[1].len = count;
+ }
+
+ /*
+ * Reads fail if the previous write didn't complete yet. We may
+ * loop a few times until this one succeeds, waiting at least
+ * long enough for one entire page write to work.
+ */
+ timeout = jiffies + msecs_to_jiffies(write_timeout);
+ do {
+ read_time = jiffies;
+
+ switch (optoe->use_smbus) {
+ case I2C_SMBUS_I2C_BLOCK_DATA:
+ status = i2c_smbus_read_i2c_block_data(client, offset,
+ count, buf);
+ break;
+ case I2C_SMBUS_WORD_DATA:
+ status = i2c_smbus_read_word_data(client, offset);
+ if (status >= 0) {
+ buf[0] = status & 0xff;
+ if (count == 2)
+ buf[1] = status >> 8;
+ status = count;
+ }
+ break;
+ case I2C_SMBUS_BYTE_DATA:
+ status = i2c_smbus_read_byte_data(client, offset);
+ if (status >= 0) {
+ buf[0] = status;
+ status = count;
+ }
+ break;
+ default:
+ status = i2c_transfer(client->adapter, msg, 2);
+ if (status == 2)
+ status = count;
+ }
+
+ dev_dbg(&client->dev, "eeprom read %zu@%d --> %d (%ld)\n",
+ count, offset, status, jiffies);
+
+ if (status == count) /* happy path */
+ return count;
+
+ if (status == -ENXIO) /* no module present */
+ return status;
+
+ /* REVISIT: at HZ=100, this is sloooow */
+ usleep_range(1000, 2000);
+ } while (time_before(read_time, timeout));
+
+ return -ETIMEDOUT;
+}
+
+static ssize_t optoe_eeprom_write(struct optoe_data *optoe,
+ struct i2c_client *client,
+ const char *buf,
+ unsigned int offset, size_t count)
+{
+ struct i2c_msg msg;
+ ssize_t status;
+ unsigned long timeout, write_time;
+ unsigned int next_page_start;
+ int i = 0;
+ u16 writeword;
+
+ /* write max is at most a page
+ * (In this driver, write_max is actually one byte!)
+ */
+ if (count > optoe->write_max)
+ count = optoe->write_max;
+
+ /* shorten count if necessary to avoid crossing page boundary */
+ next_page_start = roundup(offset + 1, OPTOE_PAGE_SIZE);
+ if (offset + count > next_page_start)
+ count = next_page_start - offset;
+
+ switch (optoe->use_smbus) {
+ case I2C_SMBUS_I2C_BLOCK_DATA:
+ /*smaller eeproms can work given some SMBus extension calls */
+ if (count > I2C_SMBUS_BLOCK_MAX)
+ count = I2C_SMBUS_BLOCK_MAX;
+ break;
+ case I2C_SMBUS_WORD_DATA:
+ /* Check for odd length transaction */
+ count = (count == 1) ? 1 : 2;
+ break;
+ case I2C_SMBUS_BYTE_DATA:
+ count = 1;
+ break;
+ default:
+ /* If we'll use I2C calls for I/O, set up the message */
+ msg.addr = client->addr;
+ msg.flags = 0;
+
+ /* msg.buf is u8 and casts will mask the values */
+ msg.buf = optoe->writebuf;
+
+ msg.buf[i++] = offset;
+ memcpy(&msg.buf[i], buf, count);
+ msg.len = i + count;
+ break;
+ }
+
+ /*
+ * Reads fail if the previous write didn't complete yet. We may
+ * loop a few times until this one succeeds, waiting at least
+ * long enough for one entire page write to work.
+ */
+ timeout = jiffies + msecs_to_jiffies(write_timeout);
+ do {
+ write_time = jiffies;
+
+ switch (optoe->use_smbus) {
+ case I2C_SMBUS_I2C_BLOCK_DATA:
+ status = i2c_smbus_write_i2c_block_data(client,
+ offset,
+ count,
+ buf);
+ if (status == 0)
+ status = count;
+ break;
+ case I2C_SMBUS_WORD_DATA:
+ if (count == 2) {
+ writeword = (buf[1] << 8) | buf[0];
+ status = i2c_smbus_write_word_data(client,
+ offset,
+ writeword);
+ } else {
+ /* count = 1 */
+ status = i2c_smbus_write_byte_data(client,
+ offset,
+ buf[0]);
+ }
+ if (status == 0)
+ status = count;
+ break;
+ case I2C_SMBUS_BYTE_DATA:
+ status = i2c_smbus_write_byte_data(client, offset,
+ buf[0]);
+ if (status == 0)
+ status = count;
+ break;
+ default:
+ status = i2c_transfer(client->adapter, &msg, 1);
+ if (status == 1)
+ status = count;
+ break;
+ }
+
+ dev_dbg(&client->dev, "eeprom write %zu@%d --> %ld (%lu)\n",
+ count, offset, (long int)status, jiffies);
+
+ if (status == count)
+ return count;
+
+ /* REVISIT: at HZ=100, this is sloooow */
+ usleep_range(1000, 2000);
+ } while (time_before(write_time, timeout));
+
+ return -ETIMEDOUT;
+}
+
+static ssize_t optoe_eeprom_update_client(struct optoe_data *optoe,
+ char *buf, loff_t off,
+ size_t count, int opcode)
+{
+ struct i2c_client *client;
+ ssize_t retval = 0;
+ u8 page = 0;
+ loff_t phy_offset = off;
+ int ret = 0;
+
+ page = optoe_translate_offset(optoe, &phy_offset, &client);
+ dev_dbg(&client->dev,
+ "%s off %lld page:%d phy_offset:%lld, count:%ld, opcode:%d\n",
+ __func__, off, page, phy_offset, (long int)count, opcode);
+ if (page > 0) {
+ ret = optoe_eeprom_write(optoe, client, &page,
+ OPTOE_PAGE_SELECT_REG, 1);
+ if (ret < 0) {
+ dev_dbg(&client->dev,
+ "Write page register for page %d failed ret:%d!\n",
+ page, ret);
+ return ret;
+ }
+ }
+
+ while (count) {
+ ssize_t status;
+
+ if (opcode == OPTOE_READ_OP) {
+ status = optoe_eeprom_read(optoe, client, buf,
+ phy_offset, count);
+ } else {
+ status = optoe_eeprom_write(optoe, client, buf,
+ phy_offset, count);
+ }
+ if (status <= 0) {
+ if (retval == 0)
+ retval = status;
+ break;
+ }
+ buf += status;
+ phy_offset += status;
+ count -= status;
+ retval += status;
+ }
+
+ if (page > 0) {
+ /* return the page register to page 0 (why?) */
+ page = 0;
+ ret = optoe_eeprom_write(optoe, client, &page,
+ OPTOE_PAGE_SELECT_REG, 1);
+ if (ret < 0) {
+ dev_err(&client->dev,
+ "Restore page register to 0 failed:%d!\n", ret);
+ /* error only if nothing has been transferred */
+ if (retval == 0)
+ retval = ret;
+ }
+ }
+ return retval;
+}
+
+/*
+ * Figure out if this access is within the range of supported pages.
+ * Note this is called on every access because we don't know if the
+ * module has been replaced since the last call.
+ * If/when modules support more pages, this is the routine to update
+ * to validate and allow access to additional pages.
+ *
+ * Returns updated len for this access:
+ * - entire access is legal, original len is returned.
+ * - access begins legal but is too long, len is truncated to fit.
+ * - initial offset exceeds supported pages, return OPTOE_EOF (zero)
+ */
+static ssize_t optoe_page_legal(struct optoe_data *optoe,
+ loff_t off, size_t len)
+{
+ struct i2c_client *client = optoe->client[0];
+ u8 regval;
+ int status;
+ size_t maxlen;
+
+ if (off < 0)
+ return -EINVAL;
+ if (optoe->dev_class == TWO_ADDR) {
+ /* SFP case */
+ /* if only using addr 0x50 (first 256 bytes) we're good */
+ if ((off + len) <= TWO_ADDR_NO_0X51_SIZE)
+ return len;
+ /* if offset exceeds possible pages, we're not good */
+ if (off >= TWO_ADDR_EEPROM_SIZE)
+ return OPTOE_EOF;
+ /* in between, are pages supported? */
+ status = optoe_eeprom_read(optoe, client, ®val,
+ TWO_ADDR_PAGEABLE_REG, 1);
+ if (status < 0)
+ return status; /* error out (no module?) */
+ if (regval & TWO_ADDR_PAGEABLE) {
+ /* Pages supported, trim len to the end of pages */
+ maxlen = TWO_ADDR_EEPROM_SIZE - off;
+ } else {
+ /* pages not supported, trim len to unpaged size */
+ if (off >= TWO_ADDR_EEPROM_UNPAGED_SIZE)
+ return OPTOE_EOF;
+
+ /* will be accessing addr 0x51, is that supported? */
+ /* byte 92, bit 6 implies DDM support, 0x51 support */
+ status = optoe_eeprom_read(optoe, client, ®val,
+ TWO_ADDR_0X51_REG, 1);
+ if (status < 0)
+ return status;
+ if (regval & TWO_ADDR_0X51_SUPP) {
+ /* addr 0x51 is OK */
+ maxlen = TWO_ADDR_EEPROM_UNPAGED_SIZE - off;
+ } else {
+ /* addr 0x51 NOT supported, trim to 256 max */
+ if (off >= TWO_ADDR_NO_0X51_SIZE)
+ return OPTOE_EOF;
+ maxlen = TWO_ADDR_NO_0X51_SIZE - off;
+ }
+ }
+ len = (len > maxlen) ? maxlen : len;
+ dev_dbg(&client->dev,
+ "page_legal, SFP, off %lld len %ld\n",
+ off, (long int)len);
+ } else {
+ /* QSFP case */
+ /* if no pages needed, we're good */
+ if ((off + len) <= ONE_ADDR_EEPROM_UNPAGED_SIZE)
+ return len;
+ /* if offset exceeds possible pages, we're not good */
+ if (off >= ONE_ADDR_EEPROM_SIZE)
+ return OPTOE_EOF;
+ /* in between, are pages supported? */
+ status = optoe_eeprom_read(optoe, client, ®val,
+ ONE_ADDR_PAGEABLE_REG, 1);
+ if (status < 0)
+ return status; /* error out (no module?) */
+ if (regval & ONE_ADDR_NOT_PAGEABLE) {
+ /* pages not supported, trim len to unpaged size */
+ if (off >= ONE_ADDR_EEPROM_UNPAGED_SIZE)
+ return OPTOE_EOF;
+ maxlen = ONE_ADDR_EEPROM_UNPAGED_SIZE - off;
+ } else {
+ /* Pages supported, trim len to the end of pages */
+ maxlen = ONE_ADDR_EEPROM_SIZE - off;
+ }
+ len = (len > maxlen) ? maxlen : len;
+ dev_dbg(&client->dev,
+ "page_legal, QSFP, off %lld len %ld\n",
+ off, (long int)len);
+ }
+ return len;
+}
+
+static ssize_t optoe_read_write(struct optoe_data *optoe,
+ char *buf, loff_t off,
+ size_t len, int opcode)
+{
+ struct i2c_client *client = optoe->client[0];
+ int chunk;
+ int status = 0;
+ ssize_t retval;
+ size_t pending_len = 0, chunk_len = 0;
+ loff_t chunk_offset = 0, chunk_start_offset = 0;
+
+ dev_dbg(&client->dev,
+ "%s: off %lld len:%ld, opcode:%s\n",
+ __func__, off, (long int)len,
+ (opcode == OPTOE_READ_OP) ? "r" : "w");
+ if (unlikely(!len))
+ return len;
+
+ /*
+ * Read data from chip, protecting against concurrent updates
+ * from this host, but not from other I2C masters.
+ */
+ mutex_lock(&optoe->lock);
+
+ /*
+ * Confirm this access fits within the device supported addr range
+ */
+ status = optoe_page_legal(optoe, off, len);
+ if (status == OPTOE_EOF || status < 0) {
+ mutex_unlock(&optoe->lock);
+ return status;
+ }
+ len = status;
+
+ /*
+ * For each (128 byte) chunk involved in this request, issue a
+ * separate call to sff_eeprom_update_client(), to
+ * ensure that each access recalculates the client/page
+ * and writes the page register as needed.
+ * Note that chunk to page mapping is confusing, is different for
+ * QSFP and SFP, and never needs to be done. Don't try!
+ */
+ pending_len = len; /* amount remaining to transfer */
+ retval = 0; /* amount transferred */
+ for (chunk = off >> 7; chunk <= (off + len - 1) >> 7; chunk++) {
+ /*
+ * Compute the offset and number of bytes to be read/write
+ *
+ * 1. start at offset 0 (within the chunk), and read/write
+ * the entire chunk
+ * 2. start at offset 0 (within the chunk) and read/write less
+ * than entire chunk
+ * 3. start at an offset not equal to 0 and read/write the rest
+ * of the chunk
+ * 4. start at an offset not equal to 0 and read/write less than
+ * (end of chunk - offset)
+ */
+ chunk_start_offset = chunk * OPTOE_PAGE_SIZE;
+
+ if (chunk_start_offset < off) {
+ chunk_offset = off;
+ if ((off + pending_len) < (chunk_start_offset +
+ OPTOE_PAGE_SIZE))
+ chunk_len = pending_len;
+ else
+ chunk_len = OPTOE_PAGE_SIZE - off;
+ } else {
+ chunk_offset = chunk_start_offset;
+ if (pending_len > OPTOE_PAGE_SIZE)
+ chunk_len = OPTOE_PAGE_SIZE;
+ else
+ chunk_len = pending_len;
+ }
+
+ dev_dbg(&client->dev,
+ "sff_r/w: off %lld, len %ld, chunk_start_offset %lld, chunk_offset %lld, chunk_len %ld, pending_len %ld\n",
+ off, (long int)len, chunk_start_offset, chunk_offset,
+ (long int)chunk_len, (long int)pending_len);
+
+ /*
+ * note: chunk_offset is from the start of the EEPROM,
+ * not the start of the chunk
+ */
+ status = optoe_eeprom_update_client(optoe, buf, chunk_offset,
+ chunk_len, opcode);
+ if (status != chunk_len) {
+ /* This is another 'no device present' path */
+ dev_dbg(&client->dev,
+ "o_u_c: chunk %d c_offset %lld c_len %ld failed %d!\n",
+ chunk, chunk_offset, (long int)chunk_len, status);
+ if (status > 0)
+ retval += status;
+ if (retval == 0)
+ retval = status;
+ break;
+ }
+ buf += status;
+ pending_len -= status;
+ retval += status;
+ }
+ mutex_unlock(&optoe->lock);
+
+ return retval;
+}
+
+static ssize_t optoe_bin_read(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *attr,
+ char *buf, loff_t off, size_t count)
+{
+ struct i2c_client *client = to_i2c_client(container_of(kobj,
+ struct device, kobj));
+ struct optoe_data *optoe = i2c_get_clientdata(client);
+
+ return optoe_read_write(optoe, buf, off, count, OPTOE_READ_OP);
+}
+
+static ssize_t optoe_bin_write(struct file *filp, struct kobject *kobj,
+ struct bin_attribute *attr,
+ char *buf, loff_t off, size_t count)
+{
+ struct i2c_client *client = to_i2c_client(container_of(kobj,
+ struct device, kobj));
+ struct optoe_data *optoe = i2c_get_clientdata(client);
+
+ return optoe_read_write(optoe, buf, off, count, OPTOE_WRITE_OP);
+}
+
+static int optoe_remove(struct i2c_client *client)
+{
+ struct optoe_data *optoe;
+
+ optoe = i2c_get_clientdata(client);
+ sysfs_remove_group(&client->dev.kobj, &optoe->attr_group);
+ sysfs_remove_bin_file(&client->dev.kobj, &optoe->bin);
+
+ if (optoe->num_addresses == 2)
+ i2c_unregister_device(optoe->client[1]);
+
+#ifdef EEPROM_CLASS
+ eeprom_device_unregister(optoe->eeprom_dev);
+#endif
+
+ kfree(optoe->writebuf);
+ kfree(optoe);
+ return 0;
+}
+
+static ssize_t dev_class_show(struct device *dev,
+ struct device_attribute *dattr, char *buf)
+{
+ struct i2c_client *client = to_i2c_client(dev);
+ struct optoe_data *optoe = i2c_get_clientdata(client);
+ ssize_t count;
+
+ mutex_lock(&optoe->lock);
+ count = sprintf(buf, "%d\n", optoe->dev_class);
+ mutex_unlock(&optoe->lock);
+
+ return count;
+}
+
+static ssize_t dev_class_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct i2c_client *client = to_i2c_client(dev);
+ struct optoe_data *optoe = i2c_get_clientdata(client);
+ int dev_class;
+
+ /*
+ * dev_class is actually the number of i2c addresses used, thus
+ * legal values are "1" (QSFP class) and "2" (SFP class)
+ */
+
+ if (kstrtoint(buf, 0, &dev_class) != 0 ||
+ dev_class < 1 || dev_class > 2)
+ return -EINVAL;
+
+ mutex_lock(&optoe->lock);
+ optoe->dev_class = dev_class;
+ mutex_unlock(&optoe->lock);
+
+ return count;
+}
+
+/*
+ * if using the EEPROM CLASS driver, we don't report a port_name,
+ * the EEPROM CLASS drive handles that. Hence all this code is
+ * only compiled if we are NOT using the EEPROM CLASS driver.
+ */
+#ifndef EEPROM_CLASS
+
+static ssize_t port_name_show(struct device *dev,
+ struct device_attribute *dattr, char *buf)
+{
+ struct i2c_client *client = to_i2c_client(dev);
+ struct optoe_data *optoe = i2c_get_clientdata(client);
+ ssize_t count;
+
+ mutex_lock(&optoe->lock);
+ count = sprintf(buf, "%s\n", optoe->port_name);
+ mutex_unlock(&optoe->lock);
+
+ return count;
+}
+
+static ssize_t port_name_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct i2c_client *client = to_i2c_client(dev);
+ struct optoe_data *optoe = i2c_get_clientdata(client);
+ char port_name[MAX_PORT_NAME_LEN];
+
+ /* no checking, this value is not used except by port_name_show */
+
+ if (sscanf(buf, "%19s", port_name) != 1)
+ return -EINVAL;
+
+ mutex_lock(&optoe->lock);
+ strcpy(optoe->port_name, port_name);
+ mutex_unlock(&optoe->lock);
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(port_name);
+#endif /* if NOT defined EEPROM_CLASS, the common case */
+
+static DEVICE_ATTR_RW(dev_class);
+
+static struct attribute *optoe_attrs[] = {
+#ifndef EEPROM_CLASS
+ &dev_attr_port_name.attr,
+#endif
+ &dev_attr_dev_class.attr,
+ NULL,
+};
+
+static struct attribute_group optoe_attr_group = {
+ .attrs = optoe_attrs,
+};
+
+static int optoe_probe(struct i2c_client *client,
+ const struct i2c_device_id *id)
+{
+ int err;
+ int use_smbus = 0;
+ struct optoe_platform_data chip;
+ struct optoe_data *optoe;
+ int num_addresses;
+ char port_name[MAX_PORT_NAME_LEN];
+
+ if (client->addr != 0x50) {
+ dev_dbg(&client->dev, "probe, bad i2c addr: 0x%x\n",
+ client->addr);
+ err = -EINVAL;
+ goto exit;
+ }
+
+ if (client->dev.platform_data) {
+ chip = *(struct optoe_platform_data *)client->dev.platform_data;
+ /* take the port name from the supplied platform data */
+#ifdef EEPROM_CLASS
+ strncpy(port_name, chip.eeprom_data->label, MAX_PORT_NAME_LEN);
+#else
+ memcpy(port_name, chip.port_name, MAX_PORT_NAME_LEN);
+#endif
+ dev_dbg(&client->dev,
+ "probe, chip provided, flags:0x%x; name: %s\n",
+ chip.flags, client->name);
+ } else {
+ if (!id->driver_data) {
+ err = -ENODEV;
+ goto exit;
+ }
+ dev_dbg(&client->dev, "probe, building chip\n");
+ strcpy(port_name, "unitialized");
+ chip.flags = 0;
+#ifdef EEPROM_CLASS
+ chip.eeprom_data = NULL;
+#endif
+ }
+
+ /* Use I2C operations unless we're stuck with SMBus extensions. */
+ if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) {
+ if (i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_READ_I2C_BLOCK)) {
+ use_smbus = I2C_SMBUS_I2C_BLOCK_DATA;
+ } else if (i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_READ_WORD_DATA)) {
+ use_smbus = I2C_SMBUS_WORD_DATA;
+ } else if (i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_READ_BYTE_DATA)) {
+ use_smbus = I2C_SMBUS_BYTE_DATA;
+ } else {
+ err = -EPFNOSUPPORT;
+ goto exit;
+ }
+ }
+
+ optoe = kzalloc(sizeof(*optoe), GFP_KERNEL);
+ if (!optoe) {
+ err = -ENOMEM;
+ goto exit;
+ }
+
+ mutex_init(&optoe->lock);
+
+ /* determine whether this is a one-address or two-address module */
+ if ((strcmp(client->name, "optoe1") == 0) ||
+ (strcmp(client->name, "sff8436") == 0)) {
+ /* one-address (eg QSFP) family */
+ optoe->dev_class = ONE_ADDR;
+ chip.byte_len = ONE_ADDR_EEPROM_SIZE;
+ num_addresses = 1;
+ } else if ((strcmp(client->name, "optoe2") == 0) ||
+ (strcmp(client->name, "24c04") == 0)) {
+ /* SFP family */
+ optoe->dev_class = TWO_ADDR;
+ chip.byte_len = TWO_ADDR_EEPROM_SIZE;
+ num_addresses = 2;
+ } else { /* those were the only two choices */
+ err = -EINVAL;
+ goto exit;
+ }
+
+ dev_dbg(&client->dev, "dev_class: %d\n", optoe->dev_class);
+ optoe->use_smbus = use_smbus;
+ optoe->chip = chip;
+ optoe->num_addresses = num_addresses;
+ memcpy(optoe->port_name, port_name, MAX_PORT_NAME_LEN);
+
+ /*
+ * Export the EEPROM bytes through sysfs, since that's convenient.
+ * By default, only root should see the data (maybe passwords etc)
+ */
+ sysfs_bin_attr_init(&optoe->bin);
+ optoe->bin.attr.name = "eeprom";
+ optoe->bin.attr.mode = 0444;
+ optoe->bin.read = optoe_bin_read;
+ optoe->bin.size = chip.byte_len;
+
+ if (!use_smbus ||
+ i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_WRITE_I2C_BLOCK) ||
+ i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_WRITE_WORD_DATA) ||
+ i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_WRITE_BYTE_DATA)) {
+ /*
+ * NOTE: AN-2079
+ * Finisar recommends that the host implement 1 byte writes
+ * only since this module only supports 32 byte page boundaries.
+ * 2 byte writes are acceptable for PE and Vout changes per
+ * Application Note AN-2071.
+ */
+ unsigned int write_max = 1;
+
+ optoe->bin.write = optoe_bin_write;
+ optoe->bin.attr.mode |= 0200;
+
+ if (write_max > io_limit)
+ write_max = io_limit;
+ if (use_smbus && write_max > I2C_SMBUS_BLOCK_MAX)
+ write_max = I2C_SMBUS_BLOCK_MAX;
+ optoe->write_max = write_max;
+
+ /* buffer (data + address at the beginning) */
+ optoe->writebuf = kmalloc(write_max + 2, GFP_KERNEL);
+ if (!optoe->writebuf) {
+ err = -ENOMEM;
+ goto exit_kfree;
+ }
+ } else {
+ dev_warn(&client->dev,
+ "cannot write due to controller restrictions.");
+ }
+
+ optoe->client[0] = client;
+
+ /* SFF-8472 spec requires that the second I2C address be 0x51 */
+ if (num_addresses == 2) {
+ optoe->client[1] = i2c_new_dummy(client->adapter, 0x51);
+ if (!optoe->client[1]) {
+ dev_err(&client->dev, "address 0x51 unavailable\n");
+ err = -EADDRINUSE;
+ goto err_struct;
+ }
+ }
+
+ /* create the sysfs eeprom file */
+ err = sysfs_create_bin_file(&client->dev.kobj, &optoe->bin);
+ if (err)
+ goto err_struct;
+
+ optoe->attr_group = optoe_attr_group;
+
+ err = sysfs_create_group(&client->dev.kobj, &optoe->attr_group);
+ if (err) {
+ dev_err(&client->dev, "failed to create sysfs attribute group.\n");
+ goto err_struct;
+ }
+
+#ifdef EEPROM_CLASS
+ optoe->eeprom_dev = eeprom_device_register(&client->dev,
+ chip.eeprom_data);
+ if (IS_ERR(optoe->eeprom_dev)) {
+ dev_err(&client->dev, "error registering eeprom device.\n");
+ err = PTR_ERR(optoe->eeprom_dev);
+ goto err_sysfs_cleanup;
+ }
+#endif
+
+ i2c_set_clientdata(client, optoe);
+
+ dev_info(&client->dev, "%zu byte %s EEPROM, %s\n",
+ optoe->bin.size, client->name,
+ optoe->bin.write ? "read/write" : "read-only");
+
+ if (use_smbus == I2C_SMBUS_WORD_DATA ||
+ use_smbus == I2C_SMBUS_BYTE_DATA) {
+ dev_notice(&client->dev,
+ "Falling back to %s reads, performance will suffer\n",
+ use_smbus == I2C_SMBUS_WORD_DATA ? "word" : "byte");
+ }
+
+ return 0;
+
+#ifdef EEPROM_CLASS
+err_sysfs_cleanup:
+ sysfs_remove_group(&client->dev.kobj, &optoe->attr_group);
+ sysfs_remove_bin_file(&client->dev.kobj, &optoe->bin);
+#endif
+
+err_struct:
+ if (num_addresses == 2) {
+ if (optoe->client[1])
+ i2c_unregister_device(optoe->client[1]);
+ }
+
+ kfree(optoe->writebuf);
+exit_kfree:
+ kfree(optoe);
+exit:
+ dev_dbg(&client->dev, "probe error %d\n", err);
+
+ return err;
+}
+
+/*-------------------------------------------------------------------------*/
+
+static struct i2c_driver optoe_driver = {
+ .driver = {
+ .name = "optoe",
+ .owner = THIS_MODULE,
+ },
+ .probe = optoe_probe,
+ .remove = optoe_remove,
+ .id_table = optoe_ids,
+};
+
+static int __init optoe_init(void)
+{
+ if (!io_limit) {
+ pr_err("optoe: io_limit must not be 0!\n");
+ return -EINVAL;
+ }
+
+ io_limit = rounddown_pow_of_two(io_limit);
+ return i2c_add_driver(&optoe_driver);
+}
+module_init(optoe_init);
+
+static void __exit optoe_exit(void)
+{
+ i2c_del_driver(&optoe_driver);
+}
+module_exit(optoe_exit);
+
+MODULE_DESCRIPTION("Driver for optical transceiver (SFP, QSFP, ...) EEPROMs");
+MODULE_AUTHOR("DON BOLLINGER <[email protected]>");
+MODULE_LICENSE("GPL");
--
2.11.0
Hi,
On 06/10/2018 09:25 PM, Don Bollinger wrote:
> diff --git a/drivers/misc/eeprom/Kconfig b/drivers/misc/eeprom/Kconfig
> index 68a1ac929917..9a08e12756ee 100644
> --- a/drivers/misc/eeprom/Kconfig
> +++ b/drivers/misc/eeprom/Kconfig
> @@ -111,4 +111,22 @@ config EEPROM_IDT_89HPESX
> This driver can also be built as a module. If so, the module
> will be called idt_89hpesx.
>
> +config EEPROM_OPTOE
> + tristate "read/write access to SFP* & QSFP* EEPROMs"
> + depends on I2C && SYSFS
> + help
> + If you say yes here you get support for read and write access to
> + the EEPROM of SFP and QSFP type optical and copper transceivers.
> + Includes all devices which conform to the sff-8436 and sff-8472
This includes
and then please s/sff/SFF/ (2 places).
> + spec including SFP, SFP+, SFP28, SFP-DWDM, QSFP, QSFP+, QSFP28
> + or later. These devices are usually found in network switches.
> +
> + This driver only manages read/write access to the EEPROM, all
> + other features should be accessed via i2c-dev.
> +
> + This driver can also be built as a module. If so, the module
> + will be called optoe.
> +
> + If unsure, say N.
> +
> endmenu
thanks,
--
~Randy
On Mon, Jun 11, 2018 at 6:25 AM, Don Bollinger <[email protected]> wrote:
> optoe is an i2c based driver that supports read/write access to all
> the pages (tables) of MSA standard SFP and similar devices (conforming
> to the SFF-8472 spec) and MSA standard QSFP and similar devices
> (conforming to the SFF-8436 spec).
>
> These devices provide identification, operational status and control
> registers via an EEPROM model. These devices support one or 3 fixed
> pages (128 bytes) of data, and one page that is selected via a page
> register on the first fixed page. Thus the driver's main task is
> to map these pages onto a simple linear address space for user space
> management applications. See the driver code for a detailed layout.
>
> EEPROM data is accessible via a bin_attribute file called 'eeprom',
> e.g. /sys/bus/i2c/devices/24-0050/eeprom.
>
> Signed-off-by: Don Bollinger <[email protected]>
> +
> +#undef EEPROM_CLASS
> +#ifdef CONFIG_EEPROM_CLASS
> +#define EEPROM_CLASS
> +#endif
> +#ifdef CONFIG_EEPROM_CLASS_MODULE
> +#define EEPROM_CLASS
> +#endif
I don't understand this part: I see some older patches introducing an
EEPROM_CLASS, but nothing ever seems to have made it into the
mainline kernel.
If that class isn't there, this code shouldn't be either. You can always
add it back in case we decide to introduce that class later, but then
I wouldn't make it a compile-time option but just a hard dependency
instead.
> +struct optoe_platform_data {
> + u32 byte_len; /* size (sum of all addr) */
> + u16 page_size; /* for writes */
> + u8 flags;
> + void *dummy1; /* backward compatibility */
> + void *dummy2; /* backward compatibility */
> +
> +#ifdef EEPROM_CLASS
> + struct eeprom_platform_data *eeprom_data;
> +#endif
> + char port_name[MAX_PORT_NAME_LEN];
> +};
What is the backward-compatibility for? Normally we don't do it
like that, we only keep compatibility with things that have already
been part of the kernel, especially when you also have an #ifdef
that makes it incompatible again ;-)
> +struct optoe_data {
> + struct optoe_platform_data chip;
Maybe merge the two structures into one?
Arnd
On 6/10/2018 11:25 PM, Don Bollinger wrote:
> optoe is an i2c based driver that supports read/write access to all
> the pages (tables) of MSA standard SFP and similar devices (conforming
> to the SFF-8472 spec) and MSA standard QSFP and similar devices
> (conforming to the SFF-8436 spec).
>
> These devices provide identification, operational status and control
> registers via an EEPROM model. These devices support one or 3 fixed
> pages (128 bytes) of data, and one page that is selected via a page
> register on the first fixed page. Thus the driver's main task is
> to map these pages onto a simple linear address space for user space
> management applications. See the driver code for a detailed layout.
>
> EEPROM data is accessible via a bin_attribute file called 'eeprom',
> e.g. /sys/bus/i2c/devices/24-0050/eeprom.
>
> Signed-off-by: Don Bollinger <[email protected]>
> ---
>
> Why should this driver be in the Linux kernel? SFP and QSFP devices plug
> into switches to convert electrical to optical signals and drive the
> optical signal over fiber optic cables. They provide status and control
> registers through an i2c interface similar to to other EEPROMS. However,
> they have a paging mechanism that is unique, which requires a different
> driver from (for example) at24. Various drivers have been developed for
> this purpose, none of them support both SFP and QSFP, provide both read
> and write access, and access all 256 architected pages. optoe does all
> of these.
>
> optoe has been adopted and is shipping to customers as a base module,
> available to all platforms (switches) and used by multiple vendors and
> platforms on both ONL (Open Network Linux) and SONiC (Microsoft's
> 'Software for Open Networking in the Cloud').
>
> This patch has been built on the latest staging-testing kernel. It has
> built and tested with SFP and QSFP devices on an ARM platform with a 4.9
> kernel, and an x86 switch with a 3.16 kernel. This patch should install
> and build clean on any kernel from 3.16 up to the latest (as of 6/10/2018).
>
>
> Documentation/misc-devices/optoe.txt | 56 ++
> drivers/misc/eeprom/Kconfig | 18 +
> drivers/misc/eeprom/Makefile | 1 +
> drivers/misc/eeprom/optoe.c | 1141 ++++++++++++++++++++++++++++++++++
> 4 files changed, 1216 insertions(+)
> create mode 100644 Documentation/misc-devices/optoe.txt
> create mode 100644 drivers/misc/eeprom/optoe.c
>
There's an SFP driver under drivers/net/phy. Can that driver be extended
to provide this support? Adding Russel King who developed sfp.c, as well
at the netdev mailing list.
Thanks,
Tom
> diff --git a/Documentation/misc-devices/optoe.txt b/Documentation/misc-devices/optoe.txt
> new file mode 100644
> index 000000000000..496134940147
> --- /dev/null
> +++ b/Documentation/misc-devices/optoe.txt
> @@ -0,0 +1,56 @@
> +optoe driver
> +
> +Author Don Bollinger ([email protected])
> +
> +Optoe is an i2c based driver that supports read/write access to all
> +the pages (tables) of MSA standard SFP and similar devices (conforming
> +to the SFF-8472 spec) and MSA standard QSFP and similar devices
> +(conforming to the SFF-8436 spec).
> +
> +i2c based optoelectronic transceivers (SPF, QSFP, etc) provide identification,
> +operational status, and control registers via an EEPROM model. Unlike the
> +EEPROMs that at24 supports, these devices access data beyond byte 256 via
> +a page select register, which must be managed by the driver. See the driver
> +code for a detailed explanation of how the linear address space provided
> +by the driver maps to the paged address space provided by the devices.
> +
> +The EEPROM data is accessible via a bin_attribute file called 'eeprom',
> +e.g. /sys/bus/i2c/devices/24-0050/eeprom
> +
> +This driver also reports the port number for each device, via a sysfs
> +attribute: 'port_name'. This is a read/write attribute. It should be
> +explicitly set as part of system initialization, ideally at the same time
> +the device is instantiated. Write an appropriate port name (any string, up
> +to 19 characters) to initialize. If not initialized explicitly, all ports
> +will have the port_name of 'unitialized'. Alternatively, if the driver is
> +called with platform_data, the port_name will be read from eeprom_data->label
> +(if the EEPROM CLASS driver is configured) or from platform_data.port_name.
> +
> +This driver can be instantiated with 'new_device', per the convention
> +described in Documentation/i2c/instantiating-devices. It wants one of
> +two possible device identifiers. Use 'optoe1' to indicate this is a device
> +with just one i2c address (all QSFP type devices). Use 'optoe2' to indicate
> +this is a device with two i2c addresses (all SFP type devices).
> +
> +Example:
> +# echo optoe1 0x50 > /sys/bus/i2c/devices/i2c-64/new_device
> +# echo port54 > /sys/bus/i2c/devices/i2c-64/port_name
> +
> +This will add a QSFP type device to i2c bus i2c-64, and name it 'port54'
> +
> +Example:
> +# echo optoe2 0x50 > /sys/bus/i2c/devices/i2c-11/new_device
> +# echo port1 > /sys/bus/i2c/devices/i2c-11/port_name
> +
> +This will add an SFP type device to i2c bus i2c-11, and name it 'port1'
> +
> +The second parameter to new_device is an i2c address, and MUST be 0x50 for
> +this driver to work properly. This is part of the spec for these devices.
> +(It is not necessary to create a device at 0x51 for SFP type devices, the
> +driver does that automatically.)
> +
> +Note that SFP type and QSFP type devices are not plug-compatible. The
> +driver expects the correct ID for each port (each i2c device). It does
> +not check because the port will often be empty, and the only way to check
> +is to interrogate the device. Incorrect choice of ID will lead to correct
> +data being reported for the first 256 bytes, incorrect data after that.
> diff --git a/drivers/misc/eeprom/Kconfig b/drivers/misc/eeprom/Kconfig
> index 68a1ac929917..9a08e12756ee 100644
> --- a/drivers/misc/eeprom/Kconfig
> +++ b/drivers/misc/eeprom/Kconfig
> @@ -111,4 +111,22 @@ config EEPROM_IDT_89HPESX
> This driver can also be built as a module. If so, the module
> will be called idt_89hpesx.
>
> +config EEPROM_OPTOE
> + tristate "read/write access to SFP* & QSFP* EEPROMs"
> + depends on I2C && SYSFS
> + help
> + If you say yes here you get support for read and write access to
> + the EEPROM of SFP and QSFP type optical and copper transceivers.
> + Includes all devices which conform to the sff-8436 and sff-8472
> + spec including SFP, SFP+, SFP28, SFP-DWDM, QSFP, QSFP+, QSFP28
> + or later. These devices are usually found in network switches.
> +
> + This driver only manages read/write access to the EEPROM, all
> + other features should be accessed via i2c-dev.
> +
> + This driver can also be built as a module. If so, the module
> + will be called optoe.
> +
> + If unsure, say N.
> +
> endmenu
> diff --git a/drivers/misc/eeprom/Makefile b/drivers/misc/eeprom/Makefile
> index 2aab60ef3e3e..00288d669017 100644
> --- a/drivers/misc/eeprom/Makefile
> +++ b/drivers/misc/eeprom/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_EEPROM_93CX6) += eeprom_93cx6.o
> obj-$(CONFIG_EEPROM_93XX46) += eeprom_93xx46.o
> obj-$(CONFIG_EEPROM_DIGSY_MTC_CFG) += digsy_mtc_eeprom.o
> obj-$(CONFIG_EEPROM_IDT_89HPESX) += idt_89hpesx.o
> +obj-$(CONFIG_EEPROM_OPTOE) += optoe.o
> diff --git a/drivers/misc/eeprom/optoe.c b/drivers/misc/eeprom/optoe.c
> new file mode 100644
> index 000000000000..7cdf1a0a5299
> --- /dev/null
> +++ b/drivers/misc/eeprom/optoe.c
> @@ -0,0 +1,1141 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * optoe.c - A driver to read and write the EEPROM on optical transceivers
> + * (SFP, QSFP and similar I2C based devices)
> + *
> + * Copyright (C) 2014 Cumulus networks Inc.
> + * Copyright (C) 2017 Finisar Corp.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Freeoftware Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +/*
> + * Description:
> + * a) Optical transceiver EEPROM read/write transactions are just like
> + * the at24 eeproms managed by the at24.c i2c driver
> + * b) The register/memory layout is up to 256 128 byte pages defined by
> + * a "pages valid" register and switched via a "page select"
> + * register as explained in below diagram.
> + * c) 256 bytes are mapped at a time. 'Lower page 00h' is the first 128
> + * bytes of address space, and always references the same
> + * location, independent of the page select register.
> + * All mapped pages are mapped into the upper 128 bytes
> + * (offset 128-255) of the i2c address.
> + * d) Devices with one I2C address (eg QSFP) use I2C address 0x50
> + * (A0h in the spec), and map all pages in the upper 128 bytes
> + * of that address.
> + * e) Devices with two I2C addresses (eg SFP) have 256 bytes of data
> + * at I2C address 0x50, and 256 bytes of data at I2C address
> + * 0x51 (A2h in the spec). Page selection and paged access
> + * only apply to this second I2C address (0x51).
> + * e) The address space is presented, by the driver, as a linear
> + * address space. For devices with one I2C client at address
> + * 0x50 (eg QSFP), offset 0-127 are in the lower
> + * half of address 50/A0h/client[0]. Offset 128-255 are in
> + * page 0, 256-383 are page 1, etc. More generally, offset
> + * 'n' resides in page (n/128)-1. ('page -1' is the lower
> + * half, offset 0-127).
> + * f) For devices with two I2C clients at address 0x50 and 0x51 (eg SFP),
> + * the address space places offset 0-127 in the lower
> + * half of 50/A0/client[0], offset 128-255 in the upper
> + * half. Offset 256-383 is in the lower half of 51/A2/client[1].
> + * Offset 384-511 is in page 0, in the upper half of 51/A2/...
> + * Offset 512-639 is in page 1, in the upper half of 51/A2/...
> + * Offset 'n' is in page (n/128)-3 (for n > 383)
> + *
> + * One I2c addressed (eg QSFP) Memory Map
> + *
> + * 2-Wire Serial Address: 1010000x
> + *
> + * Lower Page 00h (128 bytes)
> + * =====================
> + * | |
> + * | |
> + * | |
> + * | |
> + * | |
> + * | |
> + * | |
> + * | |
> + * | |
> + * | |
> + * |Page Select Byte(127)|
> + * =====================
> + * |
> + * |
> + * |
> + * |
> + * V
> + * ------------------------------------------------------------
> + * | | | |
> + * | | | |
> + * | | | |
> + * | | | |
> + * | | | |
> + * | | | |
> + * | | | |
> + * | | | |
> + * | | | |
> + * V V V V
> + * ------------ -------------- --------------- --------------
> + * | | | | | | | |
> + * | Upper | | Upper | | Upper | | Upper |
> + * | Page 00h | | Page 01h | | Page 02h | | Page 03h |
> + * | | | (Optional) | | (Optional) | | (Optional |
> + * | | | | | | | for Cable |
> + * | | | | | | | Assemblies) |
> + * | ID | | AST | | User | | |
> + * | Fields | | Table | | EEPROM Data | | |
> + * | | | | | | | |
> + * | | | | | | | |
> + * | | | | | | | |
> + * ------------ -------------- --------------- --------------
> + *
> + * The SFF 8436 (QSFP) spec only defines the 4 pages described above.
> + * In anticipation of future applications and devices, this driver
> + * supports access to the full architected range, 256 pages.
> + *
> + **/
> +
> +/* #define DEBUG 1 */
> +
> +#undef EEPROM_CLASS
> +#ifdef CONFIG_EEPROM_CLASS
> +#define EEPROM_CLASS
> +#endif
> +#ifdef CONFIG_EEPROM_CLASS_MODULE
> +#define EEPROM_CLASS
> +#endif
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/delay.h>
> +#include <linux/mutex.h>
> +#include <linux/sysfs.h>
> +#include <linux/jiffies.h>
> +#include <linux/i2c.h>
> +
> +#ifdef EEPROM_CLASS
> +#include <linux/eeprom_class.h>
> +#endif
> +
> +#include <linux/types.h>
> +
> +/* The maximum length of a port name */
> +#define MAX_PORT_NAME_LEN 20
> +
> +struct optoe_platform_data {
> + u32 byte_len; /* size (sum of all addr) */
> + u16 page_size; /* for writes */
> + u8 flags;
> + void *dummy1; /* backward compatibility */
> + void *dummy2; /* backward compatibility */
> +
> +#ifdef EEPROM_CLASS
> + struct eeprom_platform_data *eeprom_data;
> +#endif
> + char port_name[MAX_PORT_NAME_LEN];
> +};
> +
> +/* fundamental unit of addressing for EEPROM */
> +#define OPTOE_PAGE_SIZE 128
> +/*
> + * Single address devices (eg QSFP) have 256 pages, plus the unpaged
> + * low 128 bytes. If the device does not support paging, it is
> + * only 2 'pages' long.
> + */
> +#define OPTOE_ARCH_PAGES 256
> +#define ONE_ADDR_EEPROM_SIZE ((1 + OPTOE_ARCH_PAGES) * OPTOE_PAGE_SIZE)
> +#define ONE_ADDR_EEPROM_UNPAGED_SIZE (2 * OPTOE_PAGE_SIZE)
> +/*
> + * Dual address devices (eg SFP) have 256 pages, plus the unpaged
> + * low 128 bytes, plus 256 bytes at 0x50. If the device does not
> + * support paging, it is 4 'pages' long.
> + */
> +#define TWO_ADDR_EEPROM_SIZE ((3 + OPTOE_ARCH_PAGES) * OPTOE_PAGE_SIZE)
> +#define TWO_ADDR_EEPROM_UNPAGED_SIZE (4 * OPTOE_PAGE_SIZE)
> +#define TWO_ADDR_NO_0X51_SIZE (2 * OPTOE_PAGE_SIZE)
> +
> +/* a few constants to find our way around the EEPROM */
> +#define OPTOE_PAGE_SELECT_REG 0x7F
> +#define ONE_ADDR_PAGEABLE_REG 0x02
> +#define ONE_ADDR_NOT_PAGEABLE BIT(2)
> +#define TWO_ADDR_PAGEABLE_REG 0x40
> +#define TWO_ADDR_PAGEABLE BIT(4)
> +#define TWO_ADDR_0X51_REG 92
> +#define TWO_ADDR_0X51_SUPP BIT(6)
> +#define OPTOE_ID_REG 0
> +#define OPTOE_READ_OP 0
> +#define OPTOE_WRITE_OP 1
> +#define OPTOE_EOF 0 /* used for access beyond end of device */
> +
> +struct optoe_data {
> + struct optoe_platform_data chip;
> + int use_smbus;
> + char port_name[MAX_PORT_NAME_LEN];
> +
> + /*
> + * Lock protects against activities from other Linux tasks,
> + * but not from changes by other I2C masters.
> + */
> + struct mutex lock;
> + struct bin_attribute bin;
> + struct attribute_group attr_group;
> +
> + u8 *writebuf;
> + unsigned int write_max;
> +
> + unsigned int num_addresses;
> +
> +#ifdef EEPROM_CLASS
> + struct eeprom_device *eeprom_dev;
> +#endif
> +
> + /* dev_class: ONE_ADDR (QSFP) or TWO_ADDR (SFP) */
> + int dev_class;
> +
> + struct i2c_client *client[2];
> +};
> +
> +/*
> + * This parameter is to help this driver avoid blocking other drivers out
> + * of I2C for potentially troublesome amounts of time. With a 100 kHz I2C
> + * clock, one 256 byte read takes about 1/43 second which is excessive;
> + * but the 1/170 second it takes at 400 kHz may be quite reasonable; and
> + * at 1 MHz (Fm+) a 1/430 second delay could easily be invisible.
> + *
> + * This value is forced to be a power of two so that writes align on pages.
> + */
> +static unsigned int io_limit = OPTOE_PAGE_SIZE;
> +
> +/*
> + * specs often allow 5 msec for a page write, sometimes 20 msec;
> + * it's important to recover from write timeouts.
> + */
> +static unsigned int write_timeout = 25;
> +
> +/*
> + * flags to distinguish one-address (QSFP family) from two-address (SFP family)
> + * If the family is not known, figure it out when the device is accessed
> + */
> +#define ONE_ADDR 1
> +#define TWO_ADDR 2
> +
> +static const struct i2c_device_id optoe_ids[] = {
> + { "optoe1", ONE_ADDR },
> + { "optoe2", TWO_ADDR },
> + { "sff8436", ONE_ADDR },
> + { "24c04", TWO_ADDR },
> + { /* END OF LIST */ }
> +};
> +MODULE_DEVICE_TABLE(i2c, optoe_ids);
> +
> +/*-------------------------------------------------------------------------*/
> +/*
> + * This routine computes the addressing information to be used for
> + * a given r/w request.
> + *
> + * Task is to calculate the client (0 = i2c addr 50, 1 = i2c addr 51),
> + * the page, and the offset.
> + *
> + * Handles both single address (eg QSFP) and two address (eg SFP).
> + * For SFP, offset 0-255 are on client[0], >255 is on client[1]
> + * Offset 256-383 are on the lower half of client[1]
> + * Pages are accessible on the upper half of client[1].
> + * Offset >383 are in 128 byte pages mapped into the upper half
> + *
> + * For QSFP, all offsets are on client[0]
> + * offset 0-127 are on the lower half of client[0] (no paging)
> + * Pages are accessible on the upper half of client[1].
> + * Offset >127 are in 128 byte pages mapped into the upper half
> + *
> + * Callers must not read/write beyond the end of a client or a page
> + * without recomputing the client/page. Hence offset (within page)
> + * plus length must be less than or equal to 128. (Note that this
> + * routine does not have access to the length of the call, hence
> + * cannot do the validity check.)
> + *
> + * Offset within Lower Page 00h and Upper Page 00h are not recomputed
> + */
> +
> +static uint8_t optoe_translate_offset(struct optoe_data *optoe,
> + loff_t *offset,
> + struct i2c_client **client)
> +{
> + unsigned int page = 0;
> +
> + *client = optoe->client[0];
> +
> + /* if SFP style, offset > 255, shift to i2c addr 0x51 */
> + if (optoe->dev_class == TWO_ADDR) {
> + if (*offset > 255) {
> + /* like QSFP, but shifted to client[1] */
> + *client = optoe->client[1];
> + *offset -= 256;
> + }
> + }
> +
> + /*
> + * if offset is in the range 0-128...
> + * page doesn't matter (using lower half), return 0.
> + * offset is already correct (don't add 128 to get to paged area)
> + */
> + if (*offset < OPTOE_PAGE_SIZE)
> + return page;
> +
> + /* note, page will always be positive since *offset >= 128 */
> + page = (*offset >> 7) - 1;
> + /* 0x80 places the offset in the top half, offset is last 7 bits */
> + *offset = OPTOE_PAGE_SIZE + (*offset & 0x7f);
> +
> + return page; /* note also returning client and offset */
> +}
> +
> +static ssize_t optoe_eeprom_read(struct optoe_data *optoe,
> + struct i2c_client *client,
> + char *buf, unsigned int offset, size_t count)
> +{
> + struct i2c_msg msg[2];
> + u8 msgbuf[2];
> + unsigned long timeout, read_time;
> + int status, i;
> +
> + memset(msg, 0, sizeof(msg));
> +
> + switch (optoe->use_smbus) {
> + case I2C_SMBUS_I2C_BLOCK_DATA:
> + /*smaller eeproms can work given some SMBus extension calls */
> + if (count > I2C_SMBUS_BLOCK_MAX)
> + count = I2C_SMBUS_BLOCK_MAX;
> + break;
> + case I2C_SMBUS_WORD_DATA:
> + /* Check for odd length transaction */
> + count = (count == 1) ? 1 : 2;
> + break;
> + case I2C_SMBUS_BYTE_DATA:
> + count = 1;
> + break;
> + default:
> + /*
> + * When we have a better choice than SMBus calls, use a
> + * combined I2C message. Write address; then read up to
> + * io_limit data bytes. msgbuf is u8 and will cast to our
> + * needs.
> + */
> + i = 0;
> + msgbuf[i++] = offset;
> +
> + msg[0].addr = client->addr;
> + msg[0].buf = msgbuf;
> + msg[0].len = i;
> +
> + msg[1].addr = client->addr;
> + msg[1].flags = I2C_M_RD;
> + msg[1].buf = buf;
> + msg[1].len = count;
> + }
> +
> + /*
> + * Reads fail if the previous write didn't complete yet. We may
> + * loop a few times until this one succeeds, waiting at least
> + * long enough for one entire page write to work.
> + */
> + timeout = jiffies + msecs_to_jiffies(write_timeout);
> + do {
> + read_time = jiffies;
> +
> + switch (optoe->use_smbus) {
> + case I2C_SMBUS_I2C_BLOCK_DATA:
> + status = i2c_smbus_read_i2c_block_data(client, offset,
> + count, buf);
> + break;
> + case I2C_SMBUS_WORD_DATA:
> + status = i2c_smbus_read_word_data(client, offset);
> + if (status >= 0) {
> + buf[0] = status & 0xff;
> + if (count == 2)
> + buf[1] = status >> 8;
> + status = count;
> + }
> + break;
> + case I2C_SMBUS_BYTE_DATA:
> + status = i2c_smbus_read_byte_data(client, offset);
> + if (status >= 0) {
> + buf[0] = status;
> + status = count;
> + }
> + break;
> + default:
> + status = i2c_transfer(client->adapter, msg, 2);
> + if (status == 2)
> + status = count;
> + }
> +
> + dev_dbg(&client->dev, "eeprom read %zu@%d --> %d (%ld)\n",
> + count, offset, status, jiffies);
> +
> + if (status == count) /* happy path */
> + return count;
> +
> + if (status == -ENXIO) /* no module present */
> + return status;
> +
> + /* REVISIT: at HZ=100, this is sloooow */
> + usleep_range(1000, 2000);
> + } while (time_before(read_time, timeout));
> +
> + return -ETIMEDOUT;
> +}
> +
> +static ssize_t optoe_eeprom_write(struct optoe_data *optoe,
> + struct i2c_client *client,
> + const char *buf,
> + unsigned int offset, size_t count)
> +{
> + struct i2c_msg msg;
> + ssize_t status;
> + unsigned long timeout, write_time;
> + unsigned int next_page_start;
> + int i = 0;
> + u16 writeword;
> +
> + /* write max is at most a page
> + * (In this driver, write_max is actually one byte!)
> + */
> + if (count > optoe->write_max)
> + count = optoe->write_max;
> +
> + /* shorten count if necessary to avoid crossing page boundary */
> + next_page_start = roundup(offset + 1, OPTOE_PAGE_SIZE);
> + if (offset + count > next_page_start)
> + count = next_page_start - offset;
> +
> + switch (optoe->use_smbus) {
> + case I2C_SMBUS_I2C_BLOCK_DATA:
> + /*smaller eeproms can work given some SMBus extension calls */
> + if (count > I2C_SMBUS_BLOCK_MAX)
> + count = I2C_SMBUS_BLOCK_MAX;
> + break;
> + case I2C_SMBUS_WORD_DATA:
> + /* Check for odd length transaction */
> + count = (count == 1) ? 1 : 2;
> + break;
> + case I2C_SMBUS_BYTE_DATA:
> + count = 1;
> + break;
> + default:
> + /* If we'll use I2C calls for I/O, set up the message */
> + msg.addr = client->addr;
> + msg.flags = 0;
> +
> + /* msg.buf is u8 and casts will mask the values */
> + msg.buf = optoe->writebuf;
> +
> + msg.buf[i++] = offset;
> + memcpy(&msg.buf[i], buf, count);
> + msg.len = i + count;
> + break;
> + }
> +
> + /*
> + * Reads fail if the previous write didn't complete yet. We may
> + * loop a few times until this one succeeds, waiting at least
> + * long enough for one entire page write to work.
> + */
> + timeout = jiffies + msecs_to_jiffies(write_timeout);
> + do {
> + write_time = jiffies;
> +
> + switch (optoe->use_smbus) {
> + case I2C_SMBUS_I2C_BLOCK_DATA:
> + status = i2c_smbus_write_i2c_block_data(client,
> + offset,
> + count,
> + buf);
> + if (status == 0)
> + status = count;
> + break;
> + case I2C_SMBUS_WORD_DATA:
> + if (count == 2) {
> + writeword = (buf[1] << 8) | buf[0];
> + status = i2c_smbus_write_word_data(client,
> + offset,
> + writeword);
> + } else {
> + /* count = 1 */
> + status = i2c_smbus_write_byte_data(client,
> + offset,
> + buf[0]);
> + }
> + if (status == 0)
> + status = count;
> + break;
> + case I2C_SMBUS_BYTE_DATA:
> + status = i2c_smbus_write_byte_data(client, offset,
> + buf[0]);
> + if (status == 0)
> + status = count;
> + break;
> + default:
> + status = i2c_transfer(client->adapter, &msg, 1);
> + if (status == 1)
> + status = count;
> + break;
> + }
> +
> + dev_dbg(&client->dev, "eeprom write %zu@%d --> %ld (%lu)\n",
> + count, offset, (long int)status, jiffies);
> +
> + if (status == count)
> + return count;
> +
> + /* REVISIT: at HZ=100, this is sloooow */
> + usleep_range(1000, 2000);
> + } while (time_before(write_time, timeout));
> +
> + return -ETIMEDOUT;
> +}
> +
> +static ssize_t optoe_eeprom_update_client(struct optoe_data *optoe,
> + char *buf, loff_t off,
> + size_t count, int opcode)
> +{
> + struct i2c_client *client;
> + ssize_t retval = 0;
> + u8 page = 0;
> + loff_t phy_offset = off;
> + int ret = 0;
> +
> + page = optoe_translate_offset(optoe, &phy_offset, &client);
> + dev_dbg(&client->dev,
> + "%s off %lld page:%d phy_offset:%lld, count:%ld, opcode:%d\n",
> + __func__, off, page, phy_offset, (long int)count, opcode);
> + if (page > 0) {
> + ret = optoe_eeprom_write(optoe, client, &page,
> + OPTOE_PAGE_SELECT_REG, 1);
> + if (ret < 0) {
> + dev_dbg(&client->dev,
> + "Write page register for page %d failed ret:%d!\n",
> + page, ret);
> + return ret;
> + }
> + }
> +
> + while (count) {
> + ssize_t status;
> +
> + if (opcode == OPTOE_READ_OP) {
> + status = optoe_eeprom_read(optoe, client, buf,
> + phy_offset, count);
> + } else {
> + status = optoe_eeprom_write(optoe, client, buf,
> + phy_offset, count);
> + }
> + if (status <= 0) {
> + if (retval == 0)
> + retval = status;
> + break;
> + }
> + buf += status;
> + phy_offset += status;
> + count -= status;
> + retval += status;
> + }
> +
> + if (page > 0) {
> + /* return the page register to page 0 (why?) */
> + page = 0;
> + ret = optoe_eeprom_write(optoe, client, &page,
> + OPTOE_PAGE_SELECT_REG, 1);
> + if (ret < 0) {
> + dev_err(&client->dev,
> + "Restore page register to 0 failed:%d!\n", ret);
> + /* error only if nothing has been transferred */
> + if (retval == 0)
> + retval = ret;
> + }
> + }
> + return retval;
> +}
> +
> +/*
> + * Figure out if this access is within the range of supported pages.
> + * Note this is called on every access because we don't know if the
> + * module has been replaced since the last call.
> + * If/when modules support more pages, this is the routine to update
> + * to validate and allow access to additional pages.
> + *
> + * Returns updated len for this access:
> + * - entire access is legal, original len is returned.
> + * - access begins legal but is too long, len is truncated to fit.
> + * - initial offset exceeds supported pages, return OPTOE_EOF (zero)
> + */
> +static ssize_t optoe_page_legal(struct optoe_data *optoe,
> + loff_t off, size_t len)
> +{
> + struct i2c_client *client = optoe->client[0];
> + u8 regval;
> + int status;
> + size_t maxlen;
> +
> + if (off < 0)
> + return -EINVAL;
> + if (optoe->dev_class == TWO_ADDR) {
> + /* SFP case */
> + /* if only using addr 0x50 (first 256 bytes) we're good */
> + if ((off + len) <= TWO_ADDR_NO_0X51_SIZE)
> + return len;
> + /* if offset exceeds possible pages, we're not good */
> + if (off >= TWO_ADDR_EEPROM_SIZE)
> + return OPTOE_EOF;
> + /* in between, are pages supported? */
> + status = optoe_eeprom_read(optoe, client, ®val,
> + TWO_ADDR_PAGEABLE_REG, 1);
> + if (status < 0)
> + return status; /* error out (no module?) */
> + if (regval & TWO_ADDR_PAGEABLE) {
> + /* Pages supported, trim len to the end of pages */
> + maxlen = TWO_ADDR_EEPROM_SIZE - off;
> + } else {
> + /* pages not supported, trim len to unpaged size */
> + if (off >= TWO_ADDR_EEPROM_UNPAGED_SIZE)
> + return OPTOE_EOF;
> +
> + /* will be accessing addr 0x51, is that supported? */
> + /* byte 92, bit 6 implies DDM support, 0x51 support */
> + status = optoe_eeprom_read(optoe, client, ®val,
> + TWO_ADDR_0X51_REG, 1);
> + if (status < 0)
> + return status;
> + if (regval & TWO_ADDR_0X51_SUPP) {
> + /* addr 0x51 is OK */
> + maxlen = TWO_ADDR_EEPROM_UNPAGED_SIZE - off;
> + } else {
> + /* addr 0x51 NOT supported, trim to 256 max */
> + if (off >= TWO_ADDR_NO_0X51_SIZE)
> + return OPTOE_EOF;
> + maxlen = TWO_ADDR_NO_0X51_SIZE - off;
> + }
> + }
> + len = (len > maxlen) ? maxlen : len;
> + dev_dbg(&client->dev,
> + "page_legal, SFP, off %lld len %ld\n",
> + off, (long int)len);
> + } else {
> + /* QSFP case */
> + /* if no pages needed, we're good */
> + if ((off + len) <= ONE_ADDR_EEPROM_UNPAGED_SIZE)
> + return len;
> + /* if offset exceeds possible pages, we're not good */
> + if (off >= ONE_ADDR_EEPROM_SIZE)
> + return OPTOE_EOF;
> + /* in between, are pages supported? */
> + status = optoe_eeprom_read(optoe, client, ®val,
> + ONE_ADDR_PAGEABLE_REG, 1);
> + if (status < 0)
> + return status; /* error out (no module?) */
> + if (regval & ONE_ADDR_NOT_PAGEABLE) {
> + /* pages not supported, trim len to unpaged size */
> + if (off >= ONE_ADDR_EEPROM_UNPAGED_SIZE)
> + return OPTOE_EOF;
> + maxlen = ONE_ADDR_EEPROM_UNPAGED_SIZE - off;
> + } else {
> + /* Pages supported, trim len to the end of pages */
> + maxlen = ONE_ADDR_EEPROM_SIZE - off;
> + }
> + len = (len > maxlen) ? maxlen : len;
> + dev_dbg(&client->dev,
> + "page_legal, QSFP, off %lld len %ld\n",
> + off, (long int)len);
> + }
> + return len;
> +}
> +
> +static ssize_t optoe_read_write(struct optoe_data *optoe,
> + char *buf, loff_t off,
> + size_t len, int opcode)
> +{
> + struct i2c_client *client = optoe->client[0];
> + int chunk;
> + int status = 0;
> + ssize_t retval;
> + size_t pending_len = 0, chunk_len = 0;
> + loff_t chunk_offset = 0, chunk_start_offset = 0;
> +
> + dev_dbg(&client->dev,
> + "%s: off %lld len:%ld, opcode:%s\n",
> + __func__, off, (long int)len,
> + (opcode == OPTOE_READ_OP) ? "r" : "w");
> + if (unlikely(!len))
> + return len;
> +
> + /*
> + * Read data from chip, protecting against concurrent updates
> + * from this host, but not from other I2C masters.
> + */
> + mutex_lock(&optoe->lock);
> +
> + /*
> + * Confirm this access fits within the device supported addr range
> + */
> + status = optoe_page_legal(optoe, off, len);
> + if (status == OPTOE_EOF || status < 0) {
> + mutex_unlock(&optoe->lock);
> + return status;
> + }
> + len = status;
> +
> + /*
> + * For each (128 byte) chunk involved in this request, issue a
> + * separate call to sff_eeprom_update_client(), to
> + * ensure that each access recalculates the client/page
> + * and writes the page register as needed.
> + * Note that chunk to page mapping is confusing, is different for
> + * QSFP and SFP, and never needs to be done. Don't try!
> + */
> + pending_len = len; /* amount remaining to transfer */
> + retval = 0; /* amount transferred */
> + for (chunk = off >> 7; chunk <= (off + len - 1) >> 7; chunk++) {
> + /*
> + * Compute the offset and number of bytes to be read/write
> + *
> + * 1. start at offset 0 (within the chunk), and read/write
> + * the entire chunk
> + * 2. start at offset 0 (within the chunk) and read/write less
> + * than entire chunk
> + * 3. start at an offset not equal to 0 and read/write the rest
> + * of the chunk
> + * 4. start at an offset not equal to 0 and read/write less than
> + * (end of chunk - offset)
> + */
> + chunk_start_offset = chunk * OPTOE_PAGE_SIZE;
> +
> + if (chunk_start_offset < off) {
> + chunk_offset = off;
> + if ((off + pending_len) < (chunk_start_offset +
> + OPTOE_PAGE_SIZE))
> + chunk_len = pending_len;
> + else
> + chunk_len = OPTOE_PAGE_SIZE - off;
> + } else {
> + chunk_offset = chunk_start_offset;
> + if (pending_len > OPTOE_PAGE_SIZE)
> + chunk_len = OPTOE_PAGE_SIZE;
> + else
> + chunk_len = pending_len;
> + }
> +
> + dev_dbg(&client->dev,
> + "sff_r/w: off %lld, len %ld, chunk_start_offset %lld, chunk_offset %lld, chunk_len %ld, pending_len %ld\n",
> + off, (long int)len, chunk_start_offset, chunk_offset,
> + (long int)chunk_len, (long int)pending_len);
> +
> + /*
> + * note: chunk_offset is from the start of the EEPROM,
> + * not the start of the chunk
> + */
> + status = optoe_eeprom_update_client(optoe, buf, chunk_offset,
> + chunk_len, opcode);
> + if (status != chunk_len) {
> + /* This is another 'no device present' path */
> + dev_dbg(&client->dev,
> + "o_u_c: chunk %d c_offset %lld c_len %ld failed %d!\n",
> + chunk, chunk_offset, (long int)chunk_len, status);
> + if (status > 0)
> + retval += status;
> + if (retval == 0)
> + retval = status;
> + break;
> + }
> + buf += status;
> + pending_len -= status;
> + retval += status;
> + }
> + mutex_unlock(&optoe->lock);
> +
> + return retval;
> +}
> +
> +static ssize_t optoe_bin_read(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *attr,
> + char *buf, loff_t off, size_t count)
> +{
> + struct i2c_client *client = to_i2c_client(container_of(kobj,
> + struct device, kobj));
> + struct optoe_data *optoe = i2c_get_clientdata(client);
> +
> + return optoe_read_write(optoe, buf, off, count, OPTOE_READ_OP);
> +}
> +
> +static ssize_t optoe_bin_write(struct file *filp, struct kobject *kobj,
> + struct bin_attribute *attr,
> + char *buf, loff_t off, size_t count)
> +{
> + struct i2c_client *client = to_i2c_client(container_of(kobj,
> + struct device, kobj));
> + struct optoe_data *optoe = i2c_get_clientdata(client);
> +
> + return optoe_read_write(optoe, buf, off, count, OPTOE_WRITE_OP);
> +}
> +
> +static int optoe_remove(struct i2c_client *client)
> +{
> + struct optoe_data *optoe;
> +
> + optoe = i2c_get_clientdata(client);
> + sysfs_remove_group(&client->dev.kobj, &optoe->attr_group);
> + sysfs_remove_bin_file(&client->dev.kobj, &optoe->bin);
> +
> + if (optoe->num_addresses == 2)
> + i2c_unregister_device(optoe->client[1]);
> +
> +#ifdef EEPROM_CLASS
> + eeprom_device_unregister(optoe->eeprom_dev);
> +#endif
> +
> + kfree(optoe->writebuf);
> + kfree(optoe);
> + return 0;
> +}
> +
> +static ssize_t dev_class_show(struct device *dev,
> + struct device_attribute *dattr, char *buf)
> +{
> + struct i2c_client *client = to_i2c_client(dev);
> + struct optoe_data *optoe = i2c_get_clientdata(client);
> + ssize_t count;
> +
> + mutex_lock(&optoe->lock);
> + count = sprintf(buf, "%d\n", optoe->dev_class);
> + mutex_unlock(&optoe->lock);
> +
> + return count;
> +}
> +
> +static ssize_t dev_class_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct i2c_client *client = to_i2c_client(dev);
> + struct optoe_data *optoe = i2c_get_clientdata(client);
> + int dev_class;
> +
> + /*
> + * dev_class is actually the number of i2c addresses used, thus
> + * legal values are "1" (QSFP class) and "2" (SFP class)
> + */
> +
> + if (kstrtoint(buf, 0, &dev_class) != 0 ||
> + dev_class < 1 || dev_class > 2)
> + return -EINVAL;
> +
> + mutex_lock(&optoe->lock);
> + optoe->dev_class = dev_class;
> + mutex_unlock(&optoe->lock);
> +
> + return count;
> +}
> +
> +/*
> + * if using the EEPROM CLASS driver, we don't report a port_name,
> + * the EEPROM CLASS drive handles that. Hence all this code is
> + * only compiled if we are NOT using the EEPROM CLASS driver.
> + */
> +#ifndef EEPROM_CLASS
> +
> +static ssize_t port_name_show(struct device *dev,
> + struct device_attribute *dattr, char *buf)
> +{
> + struct i2c_client *client = to_i2c_client(dev);
> + struct optoe_data *optoe = i2c_get_clientdata(client);
> + ssize_t count;
> +
> + mutex_lock(&optoe->lock);
> + count = sprintf(buf, "%s\n", optoe->port_name);
> + mutex_unlock(&optoe->lock);
> +
> + return count;
> +}
> +
> +static ssize_t port_name_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct i2c_client *client = to_i2c_client(dev);
> + struct optoe_data *optoe = i2c_get_clientdata(client);
> + char port_name[MAX_PORT_NAME_LEN];
> +
> + /* no checking, this value is not used except by port_name_show */
> +
> + if (sscanf(buf, "%19s", port_name) != 1)
> + return -EINVAL;
> +
> + mutex_lock(&optoe->lock);
> + strcpy(optoe->port_name, port_name);
> + mutex_unlock(&optoe->lock);
> +
> + return count;
> +}
> +
> +static DEVICE_ATTR_RW(port_name);
> +#endif /* if NOT defined EEPROM_CLASS, the common case */
> +
> +static DEVICE_ATTR_RW(dev_class);
> +
> +static struct attribute *optoe_attrs[] = {
> +#ifndef EEPROM_CLASS
> + &dev_attr_port_name.attr,
> +#endif
> + &dev_attr_dev_class.attr,
> + NULL,
> +};
> +
> +static struct attribute_group optoe_attr_group = {
> + .attrs = optoe_attrs,
> +};
> +
> +static int optoe_probe(struct i2c_client *client,
> + const struct i2c_device_id *id)
> +{
> + int err;
> + int use_smbus = 0;
> + struct optoe_platform_data chip;
> + struct optoe_data *optoe;
> + int num_addresses;
> + char port_name[MAX_PORT_NAME_LEN];
> +
> + if (client->addr != 0x50) {
> + dev_dbg(&client->dev, "probe, bad i2c addr: 0x%x\n",
> + client->addr);
> + err = -EINVAL;
> + goto exit;
> + }
> +
> + if (client->dev.platform_data) {
> + chip = *(struct optoe_platform_data *)client->dev.platform_data;
> + /* take the port name from the supplied platform data */
> +#ifdef EEPROM_CLASS
> + strncpy(port_name, chip.eeprom_data->label, MAX_PORT_NAME_LEN);
> +#else
> + memcpy(port_name, chip.port_name, MAX_PORT_NAME_LEN);
> +#endif
> + dev_dbg(&client->dev,
> + "probe, chip provided, flags:0x%x; name: %s\n",
> + chip.flags, client->name);
> + } else {
> + if (!id->driver_data) {
> + err = -ENODEV;
> + goto exit;
> + }
> + dev_dbg(&client->dev, "probe, building chip\n");
> + strcpy(port_name, "unitialized");
> + chip.flags = 0;
> +#ifdef EEPROM_CLASS
> + chip.eeprom_data = NULL;
> +#endif
> + }
> +
> + /* Use I2C operations unless we're stuck with SMBus extensions. */
> + if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) {
> + if (i2c_check_functionality(client->adapter,
> + I2C_FUNC_SMBUS_READ_I2C_BLOCK)) {
> + use_smbus = I2C_SMBUS_I2C_BLOCK_DATA;
> + } else if (i2c_check_functionality(client->adapter,
> + I2C_FUNC_SMBUS_READ_WORD_DATA)) {
> + use_smbus = I2C_SMBUS_WORD_DATA;
> + } else if (i2c_check_functionality(client->adapter,
> + I2C_FUNC_SMBUS_READ_BYTE_DATA)) {
> + use_smbus = I2C_SMBUS_BYTE_DATA;
> + } else {
> + err = -EPFNOSUPPORT;
> + goto exit;
> + }
> + }
> +
> + optoe = kzalloc(sizeof(*optoe), GFP_KERNEL);
> + if (!optoe) {
> + err = -ENOMEM;
> + goto exit;
> + }
> +
> + mutex_init(&optoe->lock);
> +
> + /* determine whether this is a one-address or two-address module */
> + if ((strcmp(client->name, "optoe1") == 0) ||
> + (strcmp(client->name, "sff8436") == 0)) {
> + /* one-address (eg QSFP) family */
> + optoe->dev_class = ONE_ADDR;
> + chip.byte_len = ONE_ADDR_EEPROM_SIZE;
> + num_addresses = 1;
> + } else if ((strcmp(client->name, "optoe2") == 0) ||
> + (strcmp(client->name, "24c04") == 0)) {
> + /* SFP family */
> + optoe->dev_class = TWO_ADDR;
> + chip.byte_len = TWO_ADDR_EEPROM_SIZE;
> + num_addresses = 2;
> + } else { /* those were the only two choices */
> + err = -EINVAL;
> + goto exit;
> + }
> +
> + dev_dbg(&client->dev, "dev_class: %d\n", optoe->dev_class);
> + optoe->use_smbus = use_smbus;
> + optoe->chip = chip;
> + optoe->num_addresses = num_addresses;
> + memcpy(optoe->port_name, port_name, MAX_PORT_NAME_LEN);
> +
> + /*
> + * Export the EEPROM bytes through sysfs, since that's convenient.
> + * By default, only root should see the data (maybe passwords etc)
> + */
> + sysfs_bin_attr_init(&optoe->bin);
> + optoe->bin.attr.name = "eeprom";
> + optoe->bin.attr.mode = 0444;
> + optoe->bin.read = optoe_bin_read;
> + optoe->bin.size = chip.byte_len;
> +
> + if (!use_smbus ||
> + i2c_check_functionality(client->adapter,
> + I2C_FUNC_SMBUS_WRITE_I2C_BLOCK) ||
> + i2c_check_functionality(client->adapter,
> + I2C_FUNC_SMBUS_WRITE_WORD_DATA) ||
> + i2c_check_functionality(client->adapter,
> + I2C_FUNC_SMBUS_WRITE_BYTE_DATA)) {
> + /*
> + * NOTE: AN-2079
> + * Finisar recommends that the host implement 1 byte writes
> + * only since this module only supports 32 byte page boundaries.
> + * 2 byte writes are acceptable for PE and Vout changes per
> + * Application Note AN-2071.
> + */
> + unsigned int write_max = 1;
> +
> + optoe->bin.write = optoe_bin_write;
> + optoe->bin.attr.mode |= 0200;
> +
> + if (write_max > io_limit)
> + write_max = io_limit;
> + if (use_smbus && write_max > I2C_SMBUS_BLOCK_MAX)
> + write_max = I2C_SMBUS_BLOCK_MAX;
> + optoe->write_max = write_max;
> +
> + /* buffer (data + address at the beginning) */
> + optoe->writebuf = kmalloc(write_max + 2, GFP_KERNEL);
> + if (!optoe->writebuf) {
> + err = -ENOMEM;
> + goto exit_kfree;
> + }
> + } else {
> + dev_warn(&client->dev,
> + "cannot write due to controller restrictions.");
> + }
> +
> + optoe->client[0] = client;
> +
> + /* SFF-8472 spec requires that the second I2C address be 0x51 */
> + if (num_addresses == 2) {
> + optoe->client[1] = i2c_new_dummy(client->adapter, 0x51);
> + if (!optoe->client[1]) {
> + dev_err(&client->dev, "address 0x51 unavailable\n");
> + err = -EADDRINUSE;
> + goto err_struct;
> + }
> + }
> +
> + /* create the sysfs eeprom file */
> + err = sysfs_create_bin_file(&client->dev.kobj, &optoe->bin);
> + if (err)
> + goto err_struct;
> +
> + optoe->attr_group = optoe_attr_group;
> +
> + err = sysfs_create_group(&client->dev.kobj, &optoe->attr_group);
> + if (err) {
> + dev_err(&client->dev, "failed to create sysfs attribute group.\n");
> + goto err_struct;
> + }
> +
> +#ifdef EEPROM_CLASS
> + optoe->eeprom_dev = eeprom_device_register(&client->dev,
> + chip.eeprom_data);
> + if (IS_ERR(optoe->eeprom_dev)) {
> + dev_err(&client->dev, "error registering eeprom device.\n");
> + err = PTR_ERR(optoe->eeprom_dev);
> + goto err_sysfs_cleanup;
> + }
> +#endif
> +
> + i2c_set_clientdata(client, optoe);
> +
> + dev_info(&client->dev, "%zu byte %s EEPROM, %s\n",
> + optoe->bin.size, client->name,
> + optoe->bin.write ? "read/write" : "read-only");
> +
> + if (use_smbus == I2C_SMBUS_WORD_DATA ||
> + use_smbus == I2C_SMBUS_BYTE_DATA) {
> + dev_notice(&client->dev,
> + "Falling back to %s reads, performance will suffer\n",
> + use_smbus == I2C_SMBUS_WORD_DATA ? "word" : "byte");
> + }
> +
> + return 0;
> +
> +#ifdef EEPROM_CLASS
> +err_sysfs_cleanup:
> + sysfs_remove_group(&client->dev.kobj, &optoe->attr_group);
> + sysfs_remove_bin_file(&client->dev.kobj, &optoe->bin);
> +#endif
> +
> +err_struct:
> + if (num_addresses == 2) {
> + if (optoe->client[1])
> + i2c_unregister_device(optoe->client[1]);
> + }
> +
> + kfree(optoe->writebuf);
> +exit_kfree:
> + kfree(optoe);
> +exit:
> + dev_dbg(&client->dev, "probe error %d\n", err);
> +
> + return err;
> +}
> +
> +/*-------------------------------------------------------------------------*/
> +
> +static struct i2c_driver optoe_driver = {
> + .driver = {
> + .name = "optoe",
> + .owner = THIS_MODULE,
> + },
> + .probe = optoe_probe,
> + .remove = optoe_remove,
> + .id_table = optoe_ids,
> +};
> +
> +static int __init optoe_init(void)
> +{
> + if (!io_limit) {
> + pr_err("optoe: io_limit must not be 0!\n");
> + return -EINVAL;
> + }
> +
> + io_limit = rounddown_pow_of_two(io_limit);
> + return i2c_add_driver(&optoe_driver);
> +}
> +module_init(optoe_init);
> +
> +static void __exit optoe_exit(void)
> +{
> + i2c_del_driver(&optoe_driver);
> +}
> +module_exit(optoe_exit);
> +
> +MODULE_DESCRIPTION("Driver for optical transceiver (SFP, QSFP, ...) EEPROMs");
> +MODULE_AUTHOR("DON BOLLINGER <[email protected]>");
> +MODULE_LICENSE("GPL");
>
On Sun, Jun 10, 2018 at 09:56:00PM -0700, Randy Dunlap wrote:
> Hi,
>
> On 06/10/2018 09:25 PM, Don Bollinger wrote:
> > diff --git a/drivers/misc/eeprom/Kconfig b/drivers/misc/eeprom/Kconfig
> > index 68a1ac929917..9a08e12756ee 100644
> > --- a/drivers/misc/eeprom/Kconfig
> > +++ b/drivers/misc/eeprom/Kconfig
> > @@ -111,4 +111,22 @@ config EEPROM_IDT_89HPESX
> > This driver can also be built as a module. If so, the module
> > will be called idt_89hpesx.
> >
> > +config EEPROM_OPTOE
> > + tristate "read/write access to SFP* & QSFP* EEPROMs"
> > + depends on I2C && SYSFS
> > + help
> > + If you say yes here you get support for read and write access to
> > + the EEPROM of SFP and QSFP type optical and copper transceivers.
> > + Includes all devices which conform to the sff-8436 and sff-8472
>
> This includes
> and then please s/sff/SFF/ (2 places).
Got it. I'll make the changes in the next version.
Don
On Mon, Jun 11, 2018 at 01:33:07PM -0500, Tom Lendacky wrote:
> On 6/10/2018 11:25 PM, Don Bollinger wrote:
> > optoe is an i2c based driver that supports read/write access to all
> > the pages (tables) of MSA standard SFP and similar devices (conforming
> > to the SFF-8472 spec) and MSA standard QSFP and similar devices
> > (conforming to the SFF-8436 spec).
> >
> > These devices provide identification, operational status and control
> > registers via an EEPROM model. These devices support one or 3 fixed
> > pages (128 bytes) of data, and one page that is selected via a page
> > register on the first fixed page. Thus the driver's main task is
> > to map these pages onto a simple linear address space for user space
> > management applications. See the driver code for a detailed layout.
> >
> > EEPROM data is accessible via a bin_attribute file called 'eeprom',
> > e.g. /sys/bus/i2c/devices/24-0050/eeprom.
> >
> > Signed-off-by: Don Bollinger <[email protected]>
> > ---
> >
> > Why should this driver be in the Linux kernel? SFP and QSFP devices plug
> > into switches to convert electrical to optical signals and drive the
> > optical signal over fiber optic cables. They provide status and control
> > registers through an i2c interface similar to to other EEPROMS. However,
> > they have a paging mechanism that is unique, which requires a different
> > driver from (for example) at24. Various drivers have been developed for
> > this purpose, none of them support both SFP and QSFP, provide both read
> > and write access, and access all 256 architected pages. optoe does all
> > of these.
> >
> > optoe has been adopted and is shipping to customers as a base module,
> > available to all platforms (switches) and used by multiple vendors and
> > platforms on both ONL (Open Network Linux) and SONiC (Microsoft's
> > 'Software for Open Networking in the Cloud').
> >
> > This patch has been built on the latest staging-testing kernel. It has
> > built and tested with SFP and QSFP devices on an ARM platform with a 4.9
> > kernel, and an x86 switch with a 3.16 kernel. This patch should install
> > and build clean on any kernel from 3.16 up to the latest (as of 6/10/2018).
> >
> >
> > Documentation/misc-devices/optoe.txt | 56 ++
> > drivers/misc/eeprom/Kconfig | 18 +
> > drivers/misc/eeprom/Makefile | 1 +
> > drivers/misc/eeprom/optoe.c | 1141 ++++++++++++++++++++++++++++++++++
> > 4 files changed, 1216 insertions(+)
> > create mode 100644 Documentation/misc-devices/optoe.txt
> > create mode 100644 drivers/misc/eeprom/optoe.c
> >
>
> There's an SFP driver under drivers/net/phy. Can that driver be extended
> to provide this support? Adding Russel King who developed sfp.c, as well
> at the netdev mailing list.
>
> Thanks,
> Tom
I don't think that would work very well. It's software, anything is
possible, but...
sfp.c appears to be working with mdio, gpio and 'sfp_bus', in addition
to i2c. optoe only access the EEPROM, and only via i2c. The linux
standard i2c interfaces are the only dependencies for optoe.
Also, sfp.c does not support QSFP (best I can tell), and it does not
support additional pages beyond the 256 bytes at i2c address 0x50 and
0x51.
Best I can tell, sfp.c supports SFP devices on NIC cards. Those cards
have a very different hardware architecture from the network switches
that optoe is targeting.
> There's an SFP driver under drivers/net/phy. Can that driver be extended
> to provide this support? Adding Russel King who developed sfp.c, as well
> at the netdev mailing list.
I agree, the current SFP code should be used.
My observations seem to be there are two different ways {Q}SFP are used:
1) The Linux kernel has full control, as assumed by the devlink/SFP
frame work. We parse the SFP data to find the capabilities of the SFP
and use it to program the MAC to use the correct mode. The MAC can be
a NIC, but it can also be a switch. DSA is gaining support for
PHYLINK, so SFP modules should just work with most switches which DSA
support. And there is no reason a plain switchdev switch can not use
PHYLINK.
2) Firmware is in control of the PHY layer, but there is a wish to
expose some of the data which is available via i2c from the {Q}SFP to
linux.
It appears this optoe supports this second case. It does not appear to
support any in kernel API to actually make use of the SFP data in the
kernel.
We should not be duplicating code. We should share the SFP code for
both use cases above. There is also a Linux standard API for getting
access to this information. ethtool -m/--module-info. Anything which
is exporting {Q}SFP data needs to use this API.
Andrew
On Mon, Jun 11, 2018 at 03:43:02PM +0200, Arnd Bergmann wrote:
> On Mon, Jun 11, 2018 at 6:25 AM, Don Bollinger <[email protected]> wrote:
> > optoe is an i2c based driver that supports read/write access to all
> > the pages (tables) of MSA standard SFP and similar devices (conforming
> > to the SFF-8472 spec) and MSA standard QSFP and similar devices
> > (conforming to the SFF-8436 spec).
> >
> >
> > Signed-off-by: Don Bollinger <[email protected]>
>
> > +
> > +#undef EEPROM_CLASS
> > +#ifdef CONFIG_EEPROM_CLASS
> > +#define EEPROM_CLASS
> > +#endif
> > +#ifdef CONFIG_EEPROM_CLASS_MODULE
> > +#define EEPROM_CLASS
> > +#endif
>
> I don't understand this part: I see some older patches introducing an
> EEPROM_CLASS, but nothing ever seems to have made it into the
> mainline kernel.
>
> If that class isn't there, this code shouldn't be either. You can always
> add it back in case we decide to introduce that class later, but then
> I wouldn't make it a compile-time option but just a hard dependency
> instead.
Thanks for the feedback.
Some background will explain how optoe got here... My goal is to make
the EEPROM data and controls in {Q}SFP devices more accessible. (I'm
working for Finisar, an optics vendor.)
The ecosystem where optoe operates includes switch vendors, NOS vendors,
optics vendors and switch ASIC vendors, competing and collaborating to
build a bunch of different complete white box switch solutions.
The NOS (Network Operating System) vendors start with a Linux distro,
then add a bunch of their own patches and 'value add'. Then they
include platform drivers from the switch vendors. And they integrate
the chosen switch ASIC SDK.
Here's the key: I don't have access to all of those pieces. I can
build an environment to build my driver for many of those systems, but I
can't change other elements of that NOS.
The first NOS I targeted (Cumulus Linux) happens to be the author of the
EEPROM_CLASS patches you cited. I began with their driver (the best
available), and morphed it into optoe. To fit their environment, I kept
the EEPROM_CLASS code. They use their own platform driver (one for each
switch model) to instantiate their predecessor to the optoe driver. So,
since I don't have access to their platform driver, I kept the data
structure they pass to the probe function, where I get initialization
data. That is now struct optoe_platform_data. The *dummy items in that
struct maintain compatibility, while making it clear that they aren't
really needed.
When I targeted the next NOS, it did not have the EEPROM_CLASS code. I
figured out I could #ifdef that code, enabling me to create a driver that
works in both environments.
>
> > +struct optoe_platform_data {
> > + u32 byte_len; /* size (sum of all addr) */
> > + u16 page_size; /* for writes */
> > + u8 flags;
> > + void *dummy1; /* backward compatibility */
> > + void *dummy2; /* backward compatibility */
> > +
> > +#ifdef EEPROM_CLASS
> > + struct eeprom_platform_data *eeprom_data;
> > +#endif
> > + char port_name[MAX_PORT_NAME_LEN];
> > +};
>
> What is the backward-compatibility for? Normally we don't do it
> like that, we only keep compatibility with things that have already
> been part of the kernel, especially when you also have an #ifdef
> that makes it incompatible again ;-)
So it is actually compatible with two different kernel compilations. If
EEPROM_CLASS is configured, optoe supports that model. If not, optoe
implements a sysfs file to identify which port this device is on. It
works.
>
> > +struct optoe_data {
> > + struct optoe_platform_data chip;
>
> Maybe merge the two structures into one?
optoe_platform_data is passed in when optoe gets a probe. It is only
part of the data I need to maintain internally. I inherited this
pattern from at24.c (the predecessor ^2 to optoe).
>
> Arnd
>
All that said... I am working with my partners to see if we can remove
both the EEPROM_CLASS code and the compatibility anomalies. Technically
it would be easy. Logistically it is probably manageable. The next
step is mine, I will either remove the offending code or I will lobby to
keep it in there.
Don
On Thu, Jun 14, 2018 at 2:40 AM, Don Bollinger <[email protected]> wrote:
> On Mon, Jun 11, 2018 at 03:43:02PM +0200, Arnd Bergmann wrote:
>> On Mon, Jun 11, 2018 at 6:25 AM, Don Bollinger <[email protected]> wrote:
>>
>> I don't understand this part: I see some older patches introducing an
>> EEPROM_CLASS, but nothing ever seems to have made it into the
>> mainline kernel.
>>
>> If that class isn't there, this code shouldn't be either. You can always
>> add it back in case we decide to introduce that class later, but then
>> I wouldn't make it a compile-time option but just a hard dependency
>> instead.
>
> Thanks for the feedback.
>
> Some background will explain how optoe got here... My goal is to make
> the EEPROM data and controls in {Q}SFP devices more accessible. (I'm
> working for Finisar, an optics vendor.)
>
> The ecosystem where optoe operates includes switch vendors, NOS vendors,
> optics vendors and switch ASIC vendors, competing and collaborating to
> build a bunch of different complete white box switch solutions.
>
> The NOS (Network Operating System) vendors start with a Linux distro,
> then add a bunch of their own patches and 'value add'. Then they
> include platform drivers from the switch vendors. And they integrate
> the chosen switch ASIC SDK.
>
> Here's the key: I don't have access to all of those pieces. I can
> build an environment to build my driver for many of those systems, but I
> can't change other elements of that NOS.
>
> The first NOS I targeted (Cumulus Linux) happens to be the author of the
> EEPROM_CLASS patches you cited. I began with their driver (the best
> available), and morphed it into optoe. To fit their environment, I kept
> the EEPROM_CLASS code. They use their own platform driver (one for each
> switch model) to instantiate their predecessor to the optoe driver. So,
> since I don't have access to their platform driver, I kept the data
> structure they pass to the probe function, where I get initialization
> data. That is now struct optoe_platform_data. The *dummy items in that
> struct maintain compatibility, while making it clear that they aren't
> really needed.
>
> When I targeted the next NOS, it did not have the EEPROM_CLASS code. I
> figured out I could #ifdef that code, enabling me to create a driver that
> works in both environments.
Ok, I see. For the upstream submission of course, none of the forked
kernel code bases matter at all, what we want is a driver that makes
sense by itself, and none of it should depend on any third party code.
>> > +struct optoe_platform_data {
>> > + u32 byte_len; /* size (sum of all addr) */
>> > + u16 page_size; /* for writes */
>> > + u8 flags;
>> > + void *dummy1; /* backward compatibility */
>> > + void *dummy2; /* backward compatibility */
>> > +
>> > +#ifdef EEPROM_CLASS
>> > + struct eeprom_platform_data *eeprom_data;
>> > +#endif
>> > + char port_name[MAX_PORT_NAME_LEN];
>> > +};
>>
>> What is the backward-compatibility for? Normally we don't do it
>> like that, we only keep compatibility with things that have already
>> been part of the kernel, especially when you also have an #ifdef
>> that makes it incompatible again ;-)
>
> So it is actually compatible with two different kernel compilations. If
> EEPROM_CLASS is configured, optoe supports that model. If not, optoe
> implements a sysfs file to identify which port this device is on. It
> works.
I mean the 'dummy1' and 'dummy2' variables. They don't seem to make
any sense, and neither does the comment here.
>> > +struct optoe_data {
>> > + struct optoe_platform_data chip;
>>
>> Maybe merge the two structures into one?
>
> optoe_platform_data is passed in when optoe gets a probe. It is only
> part of the data I need to maintain internally. I inherited this
> pattern from at24.c (the predecessor ^2 to optoe).
Passed in from where? The structure is declared locally in this
file, so no other driver has access to the definition.
For traditional devices, we would use a header in
include/linux/platform_data/, but a more modern way of doing this
would be to use named device properties that are either put
in the devicetree file (on embedded machines) or added through
the .properties field when statically declaring an i2c device from
a PCI device parent.
Arnd
On Tue, Jun 12, 2018 at 08:11:09PM +0200, Andrew Lunn wrote:
> > There's an SFP driver under drivers/net/phy. Can that driver be extended
> > to provide this support? Adding Russel King who developed sfp.c, as well
> > at the netdev mailing list.
>
> I agree, the current SFP code should be used.
>
> My observations seem to be there are two different ways {Q}SFP are used:
>
> 1) The Linux kernel has full control, as assumed by the devlink/SFP
> frame work. We parse the SFP data to find the capabilities of the SFP
> and use it to program the MAC to use the correct mode. The MAC can be
> a NIC, but it can also be a switch. DSA is gaining support for
> PHYLINK, so SFP modules should just work with most switches which DSA
> support. And there is no reason a plain switchdev switch can not use
> PHYLINK.
>
> 2) Firmware is in control of the PHY layer, but there is a wish to
> expose some of the data which is available via i2c from the {Q}SFP to
> linux.
>
> It appears this optoe supports this second case. It does not appear to
> support any in kernel API to actually make use of the SFP data in the
> kernel.
>
> We should not be duplicating code. We should share the SFP code for
> both use cases above. There is also a Linux standard API for getting
> access to this information. ethtool -m/--module-info. Anything which
> is exporting {Q}SFP data needs to use this API.
>
> Andrew
Actually this is better described by a third use case. The target
switches are PHY-less (see various designs at
http://www.compute.org/wiki/Networking/SpecsAndDesigns). The AS5712 for example
says "The AS5712-54X is a PHY-Less design with the SFP+ and QSFP+
connections directly attaching to the Serdes interfaces of the Broadcom
BCM56854 720G Trident 2 switching silicon..."
The electrical controls of the {Q}SFP devices (TxDisable for example)
are organized in a platform dependent way, through CPLD devices, and
managed by a platform specific CPLD driver.
The i2c bus is muxed from the CPU to all of the {Q}SFP devices, which
are set up as standard linux i2c devices
(/sys/bus/i2c/devices/i2c-xxxx).
There is no MDIO bus between the CPU and the {Q}SFP devices.
> 2) Firmware is in control of the PHY layer, but there is a wish to
> expose some of the data which is available via i2c from the {Q}SFP to
> linux.
So the switch silicon is in control of the PHY layer. The platform
driver is in control of the electrical interfaces. And the EEPROM data
is available via I2C.
And, there isn't actually 'a wish to expose' the EEPROM data to linux
(the kernel). It turns out that none of the NOS partners I'm working
with use that data *in the kernel*. It is all managed from user space.
More generally, I think sfp.c and optoe are not actually trying to
accomplish the same thing at all. sfp.c combines all three functions
(PHY, electrical control, EEPROM access). optoe is only providing EEPROM
access, and only to user space. This is a real need in the white box
switch environment, and is not met by sfp.c. optoe isn't better, sfp.c
isn't better, they're just different.
Finally, sfp.c does not recognize that SFP devices have data beyond 512
bytes, accessible via a page register. It also does not recognize QSFP
devices at all. QSFP devices have only 256 bytes accessible (one i2c
address) before switching to paged access for the remaining data. The
first design requirement for optoe was to access all the available
pages, because there is information and controls that we (optics
vendors) want to make available to network management applications.
If sfp.c creates a standard linux i2c client for each SFP device, it
should be possible to create an optoe managed device 'under' sfp.c to
provide access to the full EEPROM address space:
# echo optoe2 0x50 >/sys/bus/i2c/devices/i2c-xx/new_device
This might prove useful to user space consumers of that data. We could
also easily add a kernel API (eg the nvmem framework) to optoe to provide
kernel access. In other words, sfp.c could assign EEPROM management to
optoe, while managing the electrical interfaces. (This is actually
pretty close to how the platfom drivers work in the switch environment.)
sfp.c would get SFP page support and QSFP EEPROM access 'for free'.
> There is also a Linux standard API for getting
> access to this information. ethtool -m/--module-info. Anything which
> is exporting {Q}SFP data needs to use this API.
optoe simply provides direct access from user space to the full EEPROM
data. There is more data there than ethtool knows about, and in some
devices there are proprietary registers that ethtool will never know
about. optoe does not interpret any of the EEPROM content (except the
bare minimum to access pages correctly). optoe also does not get in the
way of ethtool. It could prove to be a handy way for ethtool to access
new EEPROM fields in the future. QSFP-DD/OSFP are coming soon, they
will have a different (incompatible) set of new fields to be decoded.
Bottom Line: sfp.c is not a useful starting point for the switch
environment I'm working in. The underlying hardware architecture is
quite different. optoe is not a competing alternative. Its only
function is to provide user-space access to the EEPROM data in {Q}SFP
devices.
Don
On Thu, Jun 14, 2018 at 09:46:36AM +0200, Arnd Bergmann wrote:
> On Thu, Jun 14, 2018 at 2:40 AM, Don Bollinger <[email protected]> wrote:
> > On Mon, Jun 11, 2018 at 03:43:02PM +0200, Arnd Bergmann wrote:
> >> On Mon, Jun 11, 2018 at 6:25 AM, Don Bollinger <[email protected]> wrote:
>
> >>
> >> I don't understand this part: I see some older patches introducing an
> >> EEPROM_CLASS, but nothing ever seems to have made it into the
> >> mainline kernel.
> >>
> >> If that class isn't there, this code shouldn't be either. You can always
> >> add it back in case we decide to introduce that class later, but then
> >> I wouldn't make it a compile-time option but just a hard dependency
> >> instead.
> >
> > Thanks for the feedback.
> >
> > Some background will explain how optoe got here...
>
> Ok, I see. For the upstream submission of course, none of the forked
> kernel code bases matter at all, what we want is a driver that makes
> sense by itself, and none of it should depend on any third party code.
Got it.
> For traditional devices, we would use a header in
> include/linux/platform_data/, but a more modern way of doing this
> would be to use named device properties that are either put
> in the devicetree file (on embedded machines) or added through
> the .properties field when statically declaring an i2c device from
> a PCI device parent.
>
> Arnd
Thanks for the guidance. It turns out that getting into mainline makes
it easier for my partners to consume a header in
include/linux/platform_data. I'll restore that file and remove all of the
unnecessary items, which should address the concerns you have raised.
Rev 2 coming soon.
Don
On 06/14/2018 07:26 PM, Don Bollinger wrote:
> On Tue, Jun 12, 2018 at 08:11:09PM +0200, Andrew Lunn wrote:
>>> There's an SFP driver under drivers/net/phy. Can that driver be extended
>>> to provide this support? Adding Russel King who developed sfp.c, as well
>>> at the netdev mailing list.
>>
>> I agree, the current SFP code should be used.
>>
>> My observations seem to be there are two different ways {Q}SFP are used:
>>
>> 1) The Linux kernel has full control, as assumed by the devlink/SFP
>> frame work. We parse the SFP data to find the capabilities of the SFP
>> and use it to program the MAC to use the correct mode. The MAC can be
>> a NIC, but it can also be a switch. DSA is gaining support for
>> PHYLINK, so SFP modules should just work with most switches which DSA
>> support. And there is no reason a plain switchdev switch can not use
>> PHYLINK.
>>
>> 2) Firmware is in control of the PHY layer, but there is a wish to
>> expose some of the data which is available via i2c from the {Q}SFP to
>> linux.
>>
>> It appears this optoe supports this second case. It does not appear to
>> support any in kernel API to actually make use of the SFP data in the
>> kernel.
>>
>> We should not be duplicating code. We should share the SFP code for
>> both use cases above. There is also a Linux standard API for getting
>> access to this information. ethtool -m/--module-info. Anything which
>> is exporting {Q}SFP data needs to use this API.
>>
>> Andrew
>
> Actually this is better described by a third use case. The target
> switches are PHY-less (see various designs at
> http://www.compute.org/wiki/Networking/SpecsAndDesigns). The AS5712 for example
> says "The AS5712-54X is a PHY-Less design with the SFP+ and QSFP+
> connections directly attaching to the Serdes interfaces of the Broadcom
> BCM56854 720G Trident 2 switching silicon..."
>
> The electrical controls of the {Q}SFP devices (TxDisable for example)
> are organized in a platform dependent way, through CPLD devices, and
> managed by a platform specific CPLD driver.
>
> The i2c bus is muxed from the CPU to all of the {Q}SFP devices, which
> are set up as standard linux i2c devices
> (/sys/bus/i2c/devices/i2c-xxxx).
>
> There is no MDIO bus between the CPU and the {Q}SFP devices.
>
>> 2) Firmware is in control of the PHY layer, but there is a wish to
>> expose some of the data which is available via i2c from the {Q}SFP to
>> linux.
>
> So the switch silicon is in control of the PHY layer. The platform
> driver is in control of the electrical interfaces. And the EEPROM data
> is available via I2C.
>
> And, there isn't actually 'a wish to expose' the EEPROM data to linux
> (the kernel). It turns out that none of the NOS partners I'm working
> with use that data *in the kernel*. It is all managed from user space.
>
> More generally, I think sfp.c and optoe are not actually trying to
> accomplish the same thing at all. sfp.c combines all three functions
> (PHY, electrical control, EEPROM access). optoe is only providing EEPROM
> access, and only to user space. This is a real need in the white box
> switch environment, and is not met by sfp.c. optoe isn't better, sfp.c
> isn't better, they're just different.
sfp exposes standard ethtool hooks such as get_module_info() which pass
the whole data blob to user-space, e.g: ethtool where all of this is
better interpreted.
>
> Finally, sfp.c does not recognize that SFP devices have data beyond 512
> bytes, accessible via a page register. It also does not recognize QSFP
> devices at all. QSFP devices have only 256 bytes accessible (one i2c
> address) before switching to paged access for the remaining data. The
> first design requirement for optoe was to access all the available
> pages, because there is information and controls that we (optics
> vendors) want to make available to network management applications.
Patches welcome if you wish to extend sfp.c to support QSFP devices for
instances.
>
> If sfp.c creates a standard linux i2c client for each SFP device, it
> should be possible to create an optoe managed device 'under' sfp.c to
> provide access to the full EEPROM address space:
It's the other way around, SFP relies on a standard Linux i2c bus master
to exist such that it can read the EEPROM from the standard slave
address location, same goes with a possibly present PHY.
> # echo optoe2 0x50 >/sys/bus/i2c/devices/i2c-xx/new_device
> This might prove useful to user space consumers of that data. We could
> also easily add a kernel API (eg the nvmem framework) to optoe to provide
> kernel access. In other words, sfp.c could assign EEPROM management to
> optoe, while managing the electrical interfaces. (This is actually
> pretty close to how the platfom drivers work in the switch environment.)
> sfp.c would get SFP page support and QSFP EEPROM access 'for free'.
That sounds like a possibly good approach.
>
>> There is also a Linux standard API for getting
>> access to this information. ethtool -m/--module-info. Anything which
>> is exporting {Q}SFP data needs to use this API.
>
> optoe simply provides direct access from user space to the full EEPROM
> data. There is more data there than ethtool knows about, and in some
> devices there are proprietary registers that ethtool will never know
> about. optoe does not interpret any of the EEPROM content (except the
> bare minimum to access pages correctly). optoe also does not get in the
> way of ethtool. It could prove to be a handy way for ethtool to access
> new EEPROM fields in the future. QSFP-DD/OSFP are coming soon, they
> will have a different (incompatible) set of new fields to be decoded.
sfp is the same it only passes the EEPROM information to user-space and
interprets just what it needs to get the work done.
>
> Bottom Line: sfp.c is not a useful starting point for the switch
> environment I'm working in. The underlying hardware architecture is
> quite different. optoe is not a competing alternative. Its only
> function is to provide user-space access to the EEPROM data in {Q}SFP
> devices.
I just don't understand why would we want optoe when the standard way to
expose both EEPROM and diagnostics modules has been through ethtool's
get_module_info since the basic paradigm is that a network port is a
net_device instance in the kernel. If that basic device driver model
does not exist, then it is unclear to me what are the benefits.
Would I be completely wrong if I wrote that you are likely working with
a switch SDK which primarily runs in user-space and so with lack of a
proper kernel-based device driver for your piece of hardware you are
attempting to create a driver which is some sort of a "tap" for some
specific portion of that larger hardware block?
--
Florian
> Actually this is better described by a third use case. The target
> switches are PHY-less (see various designs at
> http://www.compute.org/wiki/Networking/SpecsAndDesigns). The AS5712 for example
> says "The AS5712-54X is a PHY-Less design with the SFP+ and QSFP+
> connections directly attaching to the Serdes interfaces of the Broadcom
> BCM56854 720G Trident 2 switching silicon..."
We consider the SFP+ and QSFP+ as being the PHY. You need something to
control that PHY. Either it is firmware running in the switch, or it
is the Linux kernel, via PHYLINK.
> The i2c bus is muxed from the CPU to all of the {Q}SFP devices, which
> are set up as standard linux i2c devices
> (/sys/bus/i2c/devices/i2c-xxxx).
Having a standard i2c bus driver is correct. This is what PHYLINK
assumes. It knows about the different addresses the SFP uses on the
i2c bus.
> There is no MDIO bus between the CPU and the {Q}SFP devices.
There is no physical MDIO bus for SFP devices. If the SFP module
implements copper 1G, there is often MDIO tunnelled over i2c. PHYLINK
knows how to do this, and will instantiate a normal Linux MDIO bus
driver, and then you can use the Linux kernel copper PHY state
machines as normal.
> And, there isn't actually 'a wish to expose' the EEPROM data to linux
> (the kernel). It turns out that none of the NOS partners I'm working
> with use that data *in the kernel*. It is all managed from user space.
Ah. O.K. We can stop here then.
If you are using Linux as a boot loader, i doubt you will find any
network kernel developers who are willing to consider this driver. The
kernel community as decided switchdev is how the Linux kernel supports
switches. We are unlikely to add drivers for supporting user space
drivers of switches.
NACK.
Andrew
On Thu, Jun 14, 2018 at 08:24:34PM -0700, Florian Fainelli wrote:
>
>
> On 06/14/2018 07:26 PM, Don Bollinger wrote:
> > On Tue, Jun 12, 2018 at 08:11:09PM +0200, Andrew Lunn wrote:
> >>> There's an SFP driver under drivers/net/phy. Can that driver be extended
> >>> to provide this support? Adding Russel King who developed sfp.c, as well
> >>> at the netdev mailing list.
> >>
> >> I agree, the current SFP code should be used.
> >>
> >> My observations seem to be there are two different ways {Q}SFP are used:
> >>
> >> 1) The Linux kernel has full control, as assumed by the devlink/SFP
> >> frame work. We parse the SFP data to find the capabilities of the SFP
> >> and use it to program the MAC to use the correct mode. The MAC can be
> >> a NIC, but it can also be a switch. DSA is gaining support for
> >> PHYLINK, so SFP modules should just work with most switches which DSA
> >> support. And there is no reason a plain switchdev switch can not use
> >> PHYLINK.
> >>
> >> 2) Firmware is in control of the PHY layer, but there is a wish to
> >> expose some of the data which is available via i2c from the {Q}SFP to
> >> linux.
> >>
> >> It appears this optoe supports this second case. It does not appear to
> >> support any in kernel API to actually make use of the SFP data in the
> >> kernel.
> >>
> >> We should not be duplicating code. We should share the SFP code for
> >> both use cases above. There is also a Linux standard API for getting
> >> access to this information. ethtool -m/--module-info. Anything which
> >> is exporting {Q}SFP data needs to use this API.
> >>
> >> Andrew
> >
> > Actually this is better described by a third use case. The target
> > switches are PHY-less (see various designs at
> > http://www.compute.org/wiki/Networking/SpecsAndDesigns). The AS5712 for example
> > says "The AS5712-54X is a PHY-Less design with the SFP+ and QSFP+
> > connections directly attaching to the Serdes interfaces of the Broadcom
> > BCM56854 720G Trident 2 switching silicon..."
> >
> > The electrical controls of the {Q}SFP devices (TxDisable for example)
> > are organized in a platform dependent way, through CPLD devices, and
> > managed by a platform specific CPLD driver.
> >
> > The i2c bus is muxed from the CPU to all of the {Q}SFP devices, which
> > are set up as standard linux i2c devices
> > (/sys/bus/i2c/devices/i2c-xxxx).
> >
> > There is no MDIO bus between the CPU and the {Q}SFP devices.
> >
> >> 2) Firmware is in control of the PHY layer, but there is a wish to
> >> expose some of the data which is available via i2c from the {Q}SFP to
> >> linux.
> >
> > So the switch silicon is in control of the PHY layer. The platform
> > driver is in control of the electrical interfaces. And the EEPROM data
> > is available via I2C.
> >
> > And, there isn't actually 'a wish to expose' the EEPROM data to linux
> > (the kernel). It turns out that none of the NOS partners I'm working
> > with use that data *in the kernel*. It is all managed from user space.
> >
> > More generally, I think sfp.c and optoe are not actually trying to
> > accomplish the same thing at all. sfp.c combines all three functions
> > (PHY, electrical control, EEPROM access). optoe is only providing EEPROM
> > access, and only to user space. This is a real need in the white box
> > switch environment, and is not met by sfp.c. optoe isn't better, sfp.c
> > isn't better, they're just different.
>
> sfp exposes standard ethtool hooks such as get_module_info() which pass
> the whole data blob to user-space, e.g: ethtool where all of this is
> better interpreted.
Yes. But ethtool depends on the underlying driver to access the data.
The current sfp.c implementation limits the amount of data that can be
accessed. optoe makes the entire EEPROM accessible. I think the right
solution is to call optoe for access to the EEPROM. A couple of lines
of code in sfp_i2c_read could call optoe to get the data instead of
formatting the i2c_transfer directly. That change would immediately
give the whole SFP framework access to all the address space of both SFP
and QSFP. (Same change to sfp_i2c_write.)
>
> >
> > Finally, sfp.c does not recognize that SFP devices have data beyond 512
> > bytes, accessible via a page register. It also does not recognize QSFP
> > devices at all. QSFP devices have only 256 bytes accessible (one i2c
> > address) before switching to paged access for the remaining data. The
> > first design requirement for optoe was to access all the available
> > pages, because there is information and controls that we (optics
> > vendors) want to make available to network management applications.
>
> Patches welcome if you wish to extend sfp.c to support QSFP devices for
> instances.
I would love to collaborate on this. I don't have an environment
(hardware or software) where I could build or test changes to the sfp
code.
>
> >
> > If sfp.c creates a standard linux i2c client for each SFP device, it
> > should be possible to create an optoe managed device 'under' sfp.c to
> > provide access to the full EEPROM address space:
>
> It's the other way around, SFP relies on a standard Linux i2c bus master
> to exist such that it can read the EEPROM from the standard slave
> address location, same goes with a possibly present PHY.
Great. Then plugging optoe in should be easy.
>
> > # echo optoe2 0x50 >/sys/bus/i2c/devices/i2c-xx/new_device
> > This might prove useful to user space consumers of that data. We could
> > also easily add a kernel API (eg the nvmem framework) to optoe to provide
> > kernel access. In other words, sfp.c could assign EEPROM management to
> > optoe, while managing the electrical interfaces. (This is actually
> > pretty close to how the platfom drivers work in the switch environment.)
> > sfp.c would get SFP page support and QSFP EEPROM access 'for free'.
>
> That sounds like a possibly good approach.
Thanks
>
> >
> >> There is also a Linux standard API for getting
> >> access to this information. ethtool -m/--module-info. Anything which
> >> is exporting {Q}SFP data needs to use this API.
> >
> > optoe simply provides direct access from user space to the full EEPROM
> > data. There is more data there than ethtool knows about, and in some
> > devices there are proprietary registers that ethtool will never know
> > about. optoe does not interpret any of the EEPROM content (except the
> > bare minimum to access pages correctly). optoe also does not get in the
> > way of ethtool. It could prove to be a handy way for ethtool to access
> > new EEPROM fields in the future. QSFP-DD/OSFP are coming soon, they
> > will have a different (incompatible) set of new fields to be decoded.
>
> sfp is the same it only passes the EEPROM information to user-space and
> interprets just what it needs to get the work done.
I offer include/linux/sfp.h as a counter example. Every byte, every bit
in the spec is spec'ed there. That's great, but exceeds the mandate of
optoe. Optoe is just there to get the data in and out of the EEPROM.
>
> >
> > Bottom Line: sfp.c is not a useful starting point for the switch
> > environment I'm working in. The underlying hardware architecture is
> > quite different. optoe is not a competing alternative. Its only
> > function is to provide user-space access to the EEPROM data in {Q}SFP
> > devices.
>
> I just don't understand why would we want optoe when the standard way to
> expose both EEPROM and diagnostics modules has been through ethtool's
> get_module_info since the basic paradigm is that a network port is a
> net_device instance in the kernel. If that basic device driver model
> does not exist, then it is unclear to me what are the benefits.
We want optoe to access regions of the EEPROM which are paged, and to
access QSFP which only has one I2C address and is also paged. It is
just the sfp_read/sfp_write function, but reading and writing the whole
EEPROM. Plugging it into sfp gives that broader access to the ethtool
interface and the rest of the net-device model.
>
> Would I be completely wrong if I wrote that you are likely working with
> a switch SDK which primarily runs in user-space and so with lack of a
> proper kernel-based device driver for your piece of hardware you are
> attempting to create a driver which is some sort of a "tap" for some
> specific portion of that larger hardware block?
Not completely wrong, but biased. The switch ASIC and the switch SDK do
not connect to the i2c bus (at least in the architecture I'm working
with). Whether or not it is in user-space isn't the point. That it
doesn't access i2c, hence doesn't access the EEPROM, is the reason we
have a 'proper kernel-based device driver'.
> --
> Florian
>
Don
On Fri, Jun 15, 2018 at 09:54:17AM +0200, Andrew Lunn wrote:
> > Actually this is better described by a third use case. The target
> > switches are PHY-less (see various designs at
> > http://www.compute.org/wiki/Networking/SpecsAndDesigns). The AS5712 for example
> > says "The AS5712-54X is a PHY-Less design with the SFP+ and QSFP+
> > connections directly attaching to the Serdes interfaces of the Broadcom
> > BCM56854 720G Trident 2 switching silicon..."
>
> We consider the SFP+ and QSFP+ as being the PHY. You need something to
> control that PHY. Either it is firmware running in the switch, or it
> is the Linux kernel, via PHYLINK.
Actually in the environment I'm working in (at least 3 different NOS
vendors, and 3 different switch vendors), the {Q}SFP devices are
controlled by a platform specific driver that pokes CPLD registers. I'm
not sure where the actual control logic is, but it doesn't appear to use
any of the frameworks you are describing.
>
> > The i2c bus is muxed from the CPU to all of the {Q}SFP devices, which
> > are set up as standard linux i2c devices
> > (/sys/bus/i2c/devices/i2c-xxxx).
>
> Having a standard i2c bus driver is correct. This is what PHYLINK
> assumes. It knows about the different addresses the SFP uses on the
> i2c bus.
Great. Then plugging optoe into it should be easy.
>
> > There is no MDIO bus between the CPU and the {Q}SFP devices.
>
> There is no physical MDIO bus for SFP devices. If the SFP module
> implements copper 1G, there is often MDIO tunnelled over i2c. PHYLINK
> knows how to do this, and will instantiate a normal Linux MDIO bus
> driver, and then you can use the Linux kernel copper PHY state
> machines as normal.
>
> > And, there isn't actually 'a wish to expose' the EEPROM data to linux
> > (the kernel). It turns out that none of the NOS partners I'm working
> > with use that data *in the kernel*. It is all managed from user space.
>
> Ah. O.K. We can stop here then.
>
> If you are using Linux as a boot loader, i doubt you will find any
> network kernel developers who are willing to consider this driver. The
It isn't a boot loader. It is the kernel that is running on the switch
when it is doing its switch thing. The kernel hosts the drivers and the
switch SDK and all the apps that configure and manage the networking.
> kernel community as decided switchdev is how the Linux kernel supports
I'm sure switchdev works very well. It is not being used in the
environment I am trying to support. I've checked all 3 NOSs that have
adopted optoe, none of them have switchdev configured in their .config
file:
# CONFIG_NET_SWITCHDEV is not set
> switches. We are unlikely to add drivers for supporting user space
> drivers of switches.
That's not the request. I'm offering an improved driver to access {Q}SFP
EEPROMs. It can be easily called by your framework, so the sfp.c users
can also get improved access to the EEPROMs.
As designed it fits the need in the linux based community I'm
working with. It is in production in two NOSs on three switches. Less
complete variants of this driver are in production on all three NOSs
I've worked with, on dozens of platforms. This is real code, that fits
a real need, and would like the benefits of being maintained as part of
the mainline kernel.
>
> NACK.
>
> Andrew
>
Don
> > If you are using Linux as a boot loader, i doubt you will find any
> > network kernel developers who are willing to consider this driver. The
>
> It isn't a boot loader. It is the kernel that is running on the switch
> when it is doing its switch thing. The kernel hosts the drivers and the
> switch SDK and all the apps that configure and manage the networking.
That is exactly using it as a boot loader. Linux network stack itself
has nothing to do with the switches. It just loads an application
which controls the switch from user space.
> This is real code, that fits a real need, and would like the
> benefits of being maintained as part of the mainline kernel.
Have you ever attended one of the netdev conferences?
https://www.netdevconf.org/2.1/submit-proposal.html
Netdev 2.1 is a community-driven conference geared towards Linux
netheads. Linux kernel networking and user space utilization of the
interfaces to the Linux kernel networking subsystem are the focus. If
you are using Linux as a boot system for proprietary networking, then
this conference _may not be for you_.
That gives you an idea of what the kernel community feels about this
sort of thing. Why should we help maintain code which brings no
benefit to the netdev community?
If you make use of the existing SFP code, extend it so it both
benefits the netdev kernel community and your own, then we might
consider merging your code. But optoe as it is brings no benefit to
the netdev community, so is unlikely to get merged.
Andrew