This patch series removes the write() interface for user access in favor of an
ioctl() based approach. This is in response to the complaint that we had
different handlers for write() and writev() doing different things and expecting
different types of data. See:
http://www.spinics.net/lists/linux-rdma/msg34451.html
In a response to that thread we mentioned some other possible approaches such
as using multiple files, or converting everything to writev(). However after
looking at things more it seemed a cleaner approach to use ioctl().
So each command that was being done through write() has been converted to
ioctl() and the write() interface is removed. We still use writev() for the data
path but control is done totally through ioctl().
The plan is to make the same sort of change in qib as well but we want to get
the opinion of the community on the approach first.
As part of this work I also decided to move the eprom functionality to its own
device (last patch) since it really is its own thing. It uses ioctl() as well,
again because it seems to be a more natural interface for this operation.
There is also a driver software version being exported via a sysfs file. This is
needed so that user space applications (psm) can determine if it needs to do
ioctl() or write().
This patch applies on the latest hfi1 patch set "Fix link and other issues".
Patches can also be viewed on GitHub at:
https://github.com/ddalessa/kernel/tree/for-4.7
---
Dennis Dalessandro (7):
IB/hfi1: Export drivers user sw version via sysfs
IB/hfi1: Remove unused user command
IB/hfi1: Add ioctl() interface for user commands
IB/hfi1: Remove write(), use ioctl() for user cmds
IB/hfi1: Add trace message in user IOCTL handling
IB/hfi1: Consolidate IOCTL defines
IB/hfi1: Move eprom to its own device
drivers/staging/rdma/hfi1/common.h | 3
drivers/staging/rdma/hfi1/diag.c | 170 ++++++++++++++++++---------
drivers/staging/rdma/hfi1/eprom.c | 121 +------------------
drivers/staging/rdma/hfi1/eprom.h | 16 ++-
drivers/staging/rdma/hfi1/file_ops.c | 213 ++++++++++++++--------------------
drivers/staging/rdma/hfi1/hfi.h | 22 +++-
drivers/staging/rdma/hfi1/sysfs.c | 8 +
drivers/staging/rdma/hfi1/trace.c | 1
drivers/staging/rdma/hfi1/trace.h | 1
include/uapi/rdma/hfi/hfi1_user.h | 106 ++++++++++++++++-
10 files changed, 350 insertions(+), 311 deletions(-)
--
-Denny
User space consumers of hfi1 will need to know what version of the
driver they are dealing with. This becomes particularly important when
the write() interface is removed. User's will need to be able to query
the driver information but without asking the driver directly.
Add the driver's software version to sysfs for this exact purpose.
Reviewed-by: Mitko Haralanov <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/sysfs.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/sysfs.c b/drivers/staging/rdma/hfi1/sysfs.c
index 8cd6df8..2ff5a14 100644
--- a/drivers/staging/rdma/hfi1/sysfs.c
+++ b/drivers/staging/rdma/hfi1/sysfs.c
@@ -622,6 +622,12 @@ static ssize_t show_tempsense(struct device *device,
return ret;
}
+static ssize_t show_user_sw_version(struct device *device,
+ struct device_attribute *attr, char *buf)
+{
+ return scnprintf(buf, PAGE_SIZE, "0x%x\n", HFI1_USER_SWVERSION);
+}
+
/*
* end of per-unit (or driver, in some cases, but replicated
* per unit) functions
@@ -636,6 +642,7 @@ static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL);
static DEVICE_ATTR(boardversion, S_IRUGO, show_boardversion, NULL);
static DEVICE_ATTR(tempsense, S_IRUGO, show_tempsense, NULL);
static DEVICE_ATTR(chip_reset, S_IWUSR, NULL, store_chip_reset);
+static DEVICE_ATTR(user_sw_version, S_IRUGO, show_user_sw_version, NULL);
static struct device_attribute *hfi1_attributes[] = {
&dev_attr_hw_rev,
@@ -646,6 +653,7 @@ static struct device_attribute *hfi1_attributes[] = {
&dev_attr_boardversion,
&dev_attr_tempsense,
&dev_attr_chip_reset,
+ &dev_attr_user_sw_version,
};
int hfi1_create_port_files(struct ib_device *ibdev, u8 port_num,
The HFI1_CMD_SDMA_STATUS_UPD command was never implemented it has no
reason to live in the driver. Remove it.
Reviewed-by: Mitko Haralanov <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/file_ops.c | 3 ---
include/uapi/rdma/hfi/hfi1_user.h | 1 -
2 files changed, 0 insertions(+), 4 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
index 8396dc5..00d1b34 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -209,7 +209,6 @@ static ssize_t hfi1_file_write(struct file *fp, const char __user *data,
copy = sizeof(uinfo);
dest = &uinfo;
break;
- case HFI1_CMD_SDMA_STATUS_UPD:
case HFI1_CMD_CREDIT_UPD:
copy = 0;
break;
@@ -284,8 +283,6 @@ static ssize_t hfi1_file_write(struct file *fp, const char __user *data,
ret = get_base_info(fp, (void __user *)(unsigned long)
user_val, cmd.len);
break;
- case HFI1_CMD_SDMA_STATUS_UPD:
- break;
case HFI1_CMD_CREDIT_UPD:
if (uctxt && uctxt->sc)
sc_return_credits(uctxt->sc);
diff --git a/include/uapi/rdma/hfi/hfi1_user.h b/include/uapi/rdma/hfi/hfi1_user.h
index a533cec..46f6caa 100644
--- a/include/uapi/rdma/hfi/hfi1_user.h
+++ b/include/uapi/rdma/hfi/hfi1_user.h
@@ -127,7 +127,6 @@
#define HFI1_CMD_TID_UPDATE 4 /* update expected TID entries */
#define HFI1_CMD_TID_FREE 5 /* free expected TID entries */
#define HFI1_CMD_CREDIT_UPD 6 /* force an update of PIO credit */
-#define HFI1_CMD_SDMA_STATUS_UPD 7 /* force update of SDMA status ring */
#define HFI1_CMD_RECV_CTRL 8 /* control receipt of packets */
#define HFI1_CMD_POLL_TYPE 9 /* set the kind of polling we want */
IOCTL is more suited to what user space commands need to do than the
write() interface. Add IOCTL definitions for all existing write commands
and the handling for those. The write() interface will be removed in a
follow on patch.
Reviewed-by: Mitko Haralanov <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/common.h | 6 +
drivers/staging/rdma/hfi1/diag.c | 13 --
drivers/staging/rdma/hfi1/file_ops.c | 224 ++++++++++++++++++++++++++++++++++
drivers/staging/rdma/hfi1/hfi.h | 14 ++
include/uapi/rdma/hfi/hfi1_user.h | 48 +++++++
5 files changed, 294 insertions(+), 11 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/common.h b/drivers/staging/rdma/hfi1/common.h
index e9b6bb3..37ac229 100644
--- a/drivers/staging/rdma/hfi1/common.h
+++ b/drivers/staging/rdma/hfi1/common.h
@@ -178,7 +178,8 @@
HFI1_CAP_PKEY_CHECK | \
HFI1_CAP_NO_INTEGRITY)
-#define HFI1_USER_SWVERSION ((HFI1_USER_SWMAJOR << 16) | HFI1_USER_SWMINOR)
+#define HFI1_USER_SWVERSION ((HFI1_USER_SWMAJOR << HFI1_SWMAJOR_SHIFT) | \
+ HFI1_USER_SWMINOR)
#ifndef HFI1_KERN_TYPE
#define HFI1_KERN_TYPE 0
@@ -349,6 +350,9 @@ struct hfi1_message_header {
#define HFI1_BECN_MASK 1
#define HFI1_BECN_SMASK BIT(HFI1_BECN_SHIFT)
+#define HFI1_PSM_IOC_BASE_SEQ 0x0
+#define HFI1_SNOOP_IOC_BASE_SEQ 0x80 /* leaves plenty of room for psm */
+
static inline __u64 rhf_to_cpu(const __le32 *rbuf)
{
return __le64_to_cpu(*((__le64 *)rbuf));
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index bb2409a..4cf6e8d 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -168,9 +168,7 @@ struct hfi1_link_info {
*/
#define SNOOP_CAPTURE_VERSION 0x1
-#define IB_IOCTL_MAGIC 0x1b /* See Documentation/ioctl-number.txt */
#define HFI1_SNOOP_IOC_MAGIC IB_IOCTL_MAGIC
-#define HFI1_SNOOP_IOC_BASE_SEQ 0x80
#define HFI1_SNOOP_IOCGETLINKSTATE \
_IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ)
@@ -1040,21 +1038,16 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, unsigned long arg)
struct hfi1_pportdata *ppd = NULL;
unsigned int index;
struct hfi1_link_info link_info;
- int read_cmd, write_cmd, read_ok, write_ok;
dd = hfi1_dd_from_sc_inode(fp->f_inode);
if (!dd)
return -ENODEV;
- mode_flag = dd->hfi1_snoop.mode_flag;
- read_cmd = _IOC_DIR(cmd) & _IOC_READ;
- write_cmd = _IOC_DIR(cmd) & _IOC_WRITE;
- write_ok = access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
- read_ok = access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
-
- if ((read_cmd && !write_ok) || (write_cmd && !read_ok))
+ if (check_ioctl_access(cmd, arg))
return -EFAULT;
+ mode_flag = dd->hfi1_snoop.mode_flag;
+
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
index 00d1b34..fab8676 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -95,6 +95,8 @@ static int user_event_ack(struct hfi1_ctxtdata *, int, unsigned long);
static int set_ctxt_pkey(struct hfi1_ctxtdata *, unsigned, u16);
static int manage_rcvq(struct hfi1_ctxtdata *, unsigned, int);
static int vma_fault(struct vm_area_struct *, struct vm_fault *);
+static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
+ unsigned long arg);
static const struct file_operations hfi1_file_ops = {
.owner = THIS_MODULE,
@@ -102,6 +104,7 @@ static const struct file_operations hfi1_file_ops = {
.write_iter = hfi1_write_iter,
.open = hfi1_file_open,
.release = hfi1_file_close,
+ .unlocked_ioctl = hfi1_file_ioctl,
.poll = hfi1_poll,
.mmap = hfi1_file_mmap,
.llseek = noop_llseek,
@@ -174,6 +177,227 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
return fp->private_data ? 0 : -ENOMEM;
}
+static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct hfi1_filedata *fd = fp->private_data;
+ struct hfi1_ctxtdata *uctxt = fd->uctxt;
+ struct hfi1_user_info uinfo;
+ struct hfi1_tid_info tinfo;
+ struct hfi1_cmd ucmd;
+ int uctxt_required = 1;
+ int ret = 0;
+ unsigned long addr;
+ int uval = 0;
+ unsigned long ul_uval = 0;
+ u16 uval16 = 0;
+
+ ret = check_ioctl_access(cmd, arg);
+ if (ret)
+ return ret;
+
+ switch (cmd) {
+ case HFI1_IOCTL_ASSIGN_CTXT:
+ uctxt_required = 0; /* assigned user context not required */
+ break;
+ case HFI1_IOCTL_EP_INFO:
+ case HFI1_IOCTL_EP_ERASE_CHIP:
+ case HFI1_IOCTL_EP_ERASE_RANGE:
+ case HFI1_IOCTL_EP_READ_RANGE:
+ case HFI1_IOCTL_EP_WRITE_RANGE:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ if (copy_from_user(&ucmd,
+ (struct hfi11_cmd __user *)arg,
+ sizeof(ucmd)))
+ return -EFAULT;
+ return handle_eprom_command(fp, &ucmd);
+ }
+
+ if (uctxt_required && !uctxt)
+ return -EINVAL;
+
+ /* Checked for root/context process the IOCTL */
+ switch (cmd) {
+ case HFI1_IOCTL_ASSIGN_CTXT:
+ if (copy_from_user(&uinfo,
+ (struct hfi1_user_info __user *)arg,
+ sizeof(uinfo)))
+ return -EFAULT;
+
+ ret = assign_ctxt(fp, &uinfo);
+ if (ret < 0)
+ return ret;
+ setup_ctxt(fp);
+ if (ret)
+ return ret;
+ ret = user_init(fp);
+ break;
+ case HFI1_IOCTL_CTXT_INFO:
+ ret = get_ctxt_info(fp, (void __user *)(unsigned long)arg,
+ sizeof(struct hfi1_ctxt_info));
+ break;
+ case HFI1_IOCTL_USER_INFO:
+ ret = get_base_info(fp, (void __user *)(unsigned long)arg,
+ sizeof(struct hfi1_base_info));
+ break;
+ case HFI1_IOCTL_CREDIT_UPD:
+ if (uctxt && uctxt->sc)
+ sc_return_credits(uctxt->sc);
+ break;
+
+ case HFI1_IOCTL_TID_UPDATE:
+ if (copy_from_user(&tinfo,
+ (struct hfi11_tid_info __user *)arg,
+ sizeof(tinfo)))
+ return -EFAULT;
+
+ ret = hfi1_user_exp_rcv_setup(fp, &tinfo);
+ if (!ret) {
+ /*
+ * Copy the number of tidlist entries we used
+ * and the length of the buffer we registered.
+ * These fields are adjacent in the structure so
+ * we can copy them at the same time.
+ */
+ addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
+ if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
+ sizeof(tinfo.tidcnt) +
+ sizeof(tinfo.length)))
+ ret = -EFAULT;
+ }
+ break;
+
+ case HFI1_IOCTL_TID_FREE:
+ if (copy_from_user(&tinfo,
+ (struct hfi11_tid_info __user *)arg,
+ sizeof(tinfo)))
+ return -EFAULT;
+
+ ret = hfi1_user_exp_rcv_clear(fp, &tinfo);
+ if (ret)
+ break;
+ addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
+ if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
+ sizeof(tinfo.tidcnt)))
+ ret = -EFAULT;
+ break;
+
+ case HFI1_IOCTL_TID_INVAL_READ:
+ if (copy_from_user(&tinfo,
+ (struct hfi11_tid_info __user *)arg,
+ sizeof(tinfo)))
+ return -EFAULT;
+
+ ret = hfi1_user_exp_rcv_invalid(fp, &tinfo);
+ if (ret)
+ break;
+ addr = arg + offsetof(struct hfi1_tid_info, tidcnt);
+ if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
+ sizeof(tinfo.tidcnt)))
+ ret = -EFAULT;
+ break;
+
+ case HFI1_IOCTL_RECV_CTRL:
+ ret = __get_user(uval, (int __user *)arg);
+ if (ret != 0)
+ return -EFAULT;
+ ret = manage_rcvq(uctxt, fd->subctxt, uval);
+ break;
+
+ case HFI1_IOCTL_POLL_TYPE:
+ ret = __get_user(uval, (int __user *)arg);
+ if (ret != 0)
+ return -EFAULT;
+ uctxt->poll_type = (typeof(uctxt->poll_type))uval;
+ break;
+
+ case HFI1_IOCTL_ACK_EVENT:
+ ret = __get_user(ul_uval, (unsigned long __user *)arg);
+ if (ret != 0)
+ return -EFAULT;
+ ret = user_event_ack(uctxt, fd->subctxt, ul_uval);
+ break;
+
+ case HFI1_IOCTL_SET_PKEY:
+ ret = __get_user(uval16, (u16 __user *)arg);
+ if (ret != 0)
+ return -EFAULT;
+ if (HFI1_CAP_IS_USET(PKEY_CHECK))
+ ret = set_ctxt_pkey(uctxt, fd->subctxt, uval16);
+ else
+ ret = -EPERM;
+ break;
+
+ case HFI1_IOCTL_CTXT_RESET: {
+ struct send_context *sc;
+ struct hfi1_devdata *dd;
+
+ if (!uctxt || !uctxt->dd || !uctxt->sc) {
+ ret = -EINVAL;
+ break;
+ }
+ /*
+ * There is no protection here. User level has to
+ * guarantee that no one will be writing to the send
+ * context while it is being re-initialized.
+ * If user level breaks that guarantee, it will break
+ * it's own context and no one else's.
+ */
+ dd = uctxt->dd;
+ sc = uctxt->sc;
+ /*
+ * Wait until the interrupt handler has marked the
+ * context as halted or frozen. Report error if we time
+ * out.
+ */
+ wait_event_interruptible_timeout(
+ sc->halt_wait, (sc->flags & SCF_HALTED),
+ msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
+ if (!(sc->flags & SCF_HALTED)) {
+ ret = -ENOLCK;
+ break;
+ }
+ /*
+ * If the send context was halted due to a Freeze,
+ * wait until the device has been "unfrozen" before
+ * resetting the context.
+ */
+ if (sc->flags & SCF_FROZEN) {
+ wait_event_interruptible_timeout(
+ dd->event_queue,
+ !(ACCESS_ONCE(dd->flags) & HFI1_FROZEN),
+ msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
+ if (dd->flags & HFI1_FROZEN) {
+ ret = -ENOLCK;
+ break;
+ }
+ if (dd->flags & HFI1_FORCED_FREEZE) {
+ /*
+ * Don't allow context reset if we are into
+ * forced freeze
+ */
+ ret = -ENODEV;
+ break;
+ }
+ sc_disable(sc);
+ ret = sc_enable(sc);
+ hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB,
+ uctxt->ctxt);
+ } else {
+ ret = sc_restart(sc);
+ }
+ if (!ret)
+ sc_return_credits(sc);
+ break;
+ }
+ default:
+ return -EINVAL;
+ }
+
+ return ret;
+}
+
static ssize_t hfi1_file_write(struct file *fp, const char __user *data,
size_t count, loff_t *offset)
{
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index 7b78d56..7351898 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -1946,4 +1946,18 @@ static inline u32 qsfp_resource(struct hfi1_devdata *dd)
int hfi1_tempsense_rd(struct hfi1_devdata *dd, struct hfi1_temp *temp);
+static inline int check_ioctl_access(unsigned int cmd, unsigned long arg)
+{
+ int read_cmd, write_cmd, read_ok, write_ok;
+
+ read_cmd = _IOC_DIR(cmd) & _IOC_READ;
+ write_cmd = _IOC_DIR(cmd) & _IOC_WRITE;
+ write_ok = access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
+ read_ok = access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
+
+ if ((read_cmd && !write_ok) || (write_cmd && !read_ok))
+ return -EFAULT;
+
+ return 0;
+}
#endif /* _HFI1_KERNEL_H */
diff --git a/include/uapi/rdma/hfi/hfi1_user.h b/include/uapi/rdma/hfi/hfi1_user.h
index 46f6caa..37a9697 100644
--- a/include/uapi/rdma/hfi/hfi1_user.h
+++ b/include/uapi/rdma/hfi/hfi1_user.h
@@ -78,6 +78,11 @@
#define HFI1_USER_SWMINOR 0
/*
+ * We will encode the major/minor inside a single 32bit version number.
+ */
+#define HFI1_SWMAJOR_SHIFT 16
+
+/*
* Set of HW and driver capability/feature bits.
* These bit values are used to configure enabled/disabled HW and
* driver features. The same set of bits are communicated to user
@@ -142,6 +147,48 @@
#define HFI1_CMD_EP_READ_RANGE 76 /* read EPROM range */
#define HFI1_CMD_EP_WRITE_RANGE 77 /* write EPROM range */
+/*
+ * User IOCTLs can not go above 128 if they do then see common.h and change the
+ * base for the snoop ioctl
+ */
+#define IB_IOCTL_MAGIC 0x1b /* See Documentation/ioctl/ioctl-number.txt */
+
+struct hfi1_cmd;
+#define HFI1_IOCTL_ASSIGN_CTXT \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_ASSIGN_CTXT, struct hfi1_user_info)
+#define HFI1_IOCTL_CTXT_INFO \
+ _IOW(IB_IOCTL_MAGIC, HFI1_CMD_CTXT_INFO, struct hfi1_ctxt_info)
+#define HFI1_IOCTL_USER_INFO \
+ _IOW(IB_IOCTL_MAGIC, HFI1_CMD_USER_INFO, struct hfi1_base_info)
+#define HFI1_IOCTL_TID_UPDATE \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_TID_UPDATE, struct hfi1_tid_info)
+#define HFI1_IOCTL_TID_FREE \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_TID_FREE, struct hfi1_tid_info)
+#define HFI1_IOCTL_CREDIT_UPD \
+ _IO(IB_IOCTL_MAGIC, HFI1_CMD_CREDIT_UPD)
+#define HFI1_IOCTL_RECV_CTRL \
+ _IOW(IB_IOCTL_MAGIC, HFI1_CMD_RECV_CTRL, int)
+#define HFI1_IOCTL_POLL_TYPE \
+ _IOW(IB_IOCTL_MAGIC, HFI1_CMD_POLL_TYPE, int)
+#define HFI1_IOCTL_ACK_EVENT \
+ _IOW(IB_IOCTL_MAGIC, HFI1_CMD_ACK_EVENT, unsigned long)
+#define HFI1_IOCTL_SET_PKEY \
+ _IOW(IB_IOCTL_MAGIC, HFI1_CMD_SET_PKEY, __u16)
+#define HFI1_IOCTL_CTXT_RESET \
+ _IO(IB_IOCTL_MAGIC, HFI1_CMD_CTXT_RESET)
+#define HFI1_IOCTL_TID_INVAL_READ \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_TID_INVAL_READ, struct hfi1_tid_info)
+#define HFI1_IOCTL_EP_INFO \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_INFO, struct hfi1_cmd)
+#define HFI1_IOCTL_EP_ERASE_CHIP \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_ERASE_CHIP, struct hfi1_cmd)
+#define HFI1_IOCTL_EP_ERASE_RANGE \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_ERASE_RANGE, struct hfi1_cmd)
+#define HFI1_IOCTL_EP_READ_RANGE \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_READ_RANGE, struct hfi1_cmd)
+#define HFI1_IOCTL_EP_WRITE_RANGE \
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_WRITE_RANGE, struct hfi1_cmd)
+
#define _HFI1_EVENT_FROZEN_BIT 0
#define _HFI1_EVENT_LINKDOWN_BIT 1
#define _HFI1_EVENT_LID_CHANGE_BIT 2
@@ -242,6 +289,7 @@ struct hfi1_tid_info {
__u32 length;
};
+/* hfi1_cmd is used for EPROM commands only */
struct hfi1_cmd {
__u32 type; /* command type */
__u32 len; /* length of struct pointed to by add */
Remove the write() handler for user space commands now that ioctl
handling is available. User apps will need to change to use ioctl from
this point forward.
Reviewed-by: Mitko Haralanov <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/file_ops.c | 245 ----------------------------------
include/uapi/rdma/hfi/hfi1_user.h | 2
2 files changed, 1 insertions(+), 246 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
index fab8676..c8c1acd 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -70,8 +70,6 @@
*/
static int hfi1_file_open(struct inode *, struct file *);
static int hfi1_file_close(struct inode *, struct file *);
-static ssize_t hfi1_file_write(struct file *, const char __user *,
- size_t, loff_t *);
static ssize_t hfi1_write_iter(struct kiocb *, struct iov_iter *);
static unsigned int hfi1_poll(struct file *, struct poll_table_struct *);
static int hfi1_file_mmap(struct file *, struct vm_area_struct *);
@@ -100,7 +98,6 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
static const struct file_operations hfi1_file_ops = {
.owner = THIS_MODULE,
- .write = hfi1_file_write,
.write_iter = hfi1_write_iter,
.open = hfi1_file_open,
.release = hfi1_file_close,
@@ -398,248 +395,6 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
return ret;
}
-static ssize_t hfi1_file_write(struct file *fp, const char __user *data,
- size_t count, loff_t *offset)
-{
- const struct hfi1_cmd __user *ucmd;
- struct hfi1_filedata *fd = fp->private_data;
- struct hfi1_ctxtdata *uctxt = fd->uctxt;
- struct hfi1_cmd cmd;
- struct hfi1_user_info uinfo;
- struct hfi1_tid_info tinfo;
- unsigned long addr;
- ssize_t consumed = 0, copy = 0, ret = 0;
- void *dest = NULL;
- __u64 user_val = 0;
- int uctxt_required = 1;
- int must_be_root = 0;
-
- if (count < sizeof(cmd)) {
- ret = -EINVAL;
- goto bail;
- }
-
- ucmd = (const struct hfi1_cmd __user *)data;
- if (copy_from_user(&cmd, ucmd, sizeof(cmd))) {
- ret = -EFAULT;
- goto bail;
- }
-
- consumed = sizeof(cmd);
-
- switch (cmd.type) {
- case HFI1_CMD_ASSIGN_CTXT:
- uctxt_required = 0; /* assigned user context not required */
- copy = sizeof(uinfo);
- dest = &uinfo;
- break;
- case HFI1_CMD_CREDIT_UPD:
- copy = 0;
- break;
- case HFI1_CMD_TID_UPDATE:
- case HFI1_CMD_TID_FREE:
- case HFI1_CMD_TID_INVAL_READ:
- copy = sizeof(tinfo);
- dest = &tinfo;
- break;
- case HFI1_CMD_USER_INFO:
- case HFI1_CMD_RECV_CTRL:
- case HFI1_CMD_POLL_TYPE:
- case HFI1_CMD_ACK_EVENT:
- case HFI1_CMD_CTXT_INFO:
- case HFI1_CMD_SET_PKEY:
- case HFI1_CMD_CTXT_RESET:
- copy = 0;
- user_val = cmd.addr;
- break;
- case HFI1_CMD_EP_INFO:
- case HFI1_CMD_EP_ERASE_CHIP:
- case HFI1_CMD_EP_ERASE_RANGE:
- case HFI1_CMD_EP_READ_RANGE:
- case HFI1_CMD_EP_WRITE_RANGE:
- uctxt_required = 0; /* assigned user context not required */
- must_be_root = 1; /* validate user */
- copy = 0;
- break;
- default:
- ret = -EINVAL;
- goto bail;
- }
-
- /* If the command comes with user data, copy it. */
- if (copy) {
- if (copy_from_user(dest, (void __user *)cmd.addr, copy)) {
- ret = -EFAULT;
- goto bail;
- }
- consumed += copy;
- }
-
- /*
- * Make sure there is a uctxt when needed.
- */
- if (uctxt_required && !uctxt) {
- ret = -EINVAL;
- goto bail;
- }
-
- /* only root can do these operations */
- if (must_be_root && !capable(CAP_SYS_ADMIN)) {
- ret = -EPERM;
- goto bail;
- }
-
- switch (cmd.type) {
- case HFI1_CMD_ASSIGN_CTXT:
- ret = assign_ctxt(fp, &uinfo);
- if (ret < 0)
- goto bail;
- ret = setup_ctxt(fp);
- if (ret)
- goto bail;
- ret = user_init(fp);
- break;
- case HFI1_CMD_CTXT_INFO:
- ret = get_ctxt_info(fp, (void __user *)(unsigned long)
- user_val, cmd.len);
- break;
- case HFI1_CMD_USER_INFO:
- ret = get_base_info(fp, (void __user *)(unsigned long)
- user_val, cmd.len);
- break;
- case HFI1_CMD_CREDIT_UPD:
- if (uctxt && uctxt->sc)
- sc_return_credits(uctxt->sc);
- break;
- case HFI1_CMD_TID_UPDATE:
- ret = hfi1_user_exp_rcv_setup(fp, &tinfo);
- if (!ret) {
- /*
- * Copy the number of tidlist entries we used
- * and the length of the buffer we registered.
- * These fields are adjacent in the structure so
- * we can copy them at the same time.
- */
- addr = (unsigned long)cmd.addr +
- offsetof(struct hfi1_tid_info, tidcnt);
- if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
- sizeof(tinfo.tidcnt) +
- sizeof(tinfo.length)))
- ret = -EFAULT;
- }
- break;
- case HFI1_CMD_TID_INVAL_READ:
- ret = hfi1_user_exp_rcv_invalid(fp, &tinfo);
- if (ret)
- break;
- addr = (unsigned long)cmd.addr +
- offsetof(struct hfi1_tid_info, tidcnt);
- if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
- sizeof(tinfo.tidcnt)))
- ret = -EFAULT;
- break;
- case HFI1_CMD_TID_FREE:
- ret = hfi1_user_exp_rcv_clear(fp, &tinfo);
- if (ret)
- break;
- addr = (unsigned long)cmd.addr +
- offsetof(struct hfi1_tid_info, tidcnt);
- if (copy_to_user((void __user *)addr, &tinfo.tidcnt,
- sizeof(tinfo.tidcnt)))
- ret = -EFAULT;
- break;
- case HFI1_CMD_RECV_CTRL:
- ret = manage_rcvq(uctxt, fd->subctxt, (int)user_val);
- break;
- case HFI1_CMD_POLL_TYPE:
- uctxt->poll_type = (typeof(uctxt->poll_type))user_val;
- break;
- case HFI1_CMD_ACK_EVENT:
- ret = user_event_ack(uctxt, fd->subctxt, user_val);
- break;
- case HFI1_CMD_SET_PKEY:
- if (HFI1_CAP_IS_USET(PKEY_CHECK))
- ret = set_ctxt_pkey(uctxt, fd->subctxt, user_val);
- else
- ret = -EPERM;
- break;
- case HFI1_CMD_CTXT_RESET: {
- struct send_context *sc;
- struct hfi1_devdata *dd;
-
- if (!uctxt || !uctxt->dd || !uctxt->sc) {
- ret = -EINVAL;
- break;
- }
- /*
- * There is no protection here. User level has to
- * guarantee that no one will be writing to the send
- * context while it is being re-initialized.
- * If user level breaks that guarantee, it will break
- * it's own context and no one else's.
- */
- dd = uctxt->dd;
- sc = uctxt->sc;
- /*
- * Wait until the interrupt handler has marked the
- * context as halted or frozen. Report error if we time
- * out.
- */
- wait_event_interruptible_timeout(
- sc->halt_wait, (sc->flags & SCF_HALTED),
- msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
- if (!(sc->flags & SCF_HALTED)) {
- ret = -ENOLCK;
- break;
- }
- /*
- * If the send context was halted due to a Freeze,
- * wait until the device has been "unfrozen" before
- * resetting the context.
- */
- if (sc->flags & SCF_FROZEN) {
- wait_event_interruptible_timeout(
- dd->event_queue,
- !(ACCESS_ONCE(dd->flags) & HFI1_FROZEN),
- msecs_to_jiffies(SEND_CTXT_HALT_TIMEOUT));
- if (dd->flags & HFI1_FROZEN) {
- ret = -ENOLCK;
- break;
- }
- if (dd->flags & HFI1_FORCED_FREEZE) {
- /*
- * Don't allow context reset if we are into
- * forced freeze
- */
- ret = -ENODEV;
- break;
- }
- sc_disable(sc);
- ret = sc_enable(sc);
- hfi1_rcvctrl(dd, HFI1_RCVCTRL_CTXT_ENB,
- uctxt->ctxt);
- } else {
- ret = sc_restart(sc);
- }
- if (!ret)
- sc_return_credits(sc);
- break;
- }
- case HFI1_CMD_EP_INFO:
- case HFI1_CMD_EP_ERASE_CHIP:
- case HFI1_CMD_EP_ERASE_RANGE:
- case HFI1_CMD_EP_READ_RANGE:
- case HFI1_CMD_EP_WRITE_RANGE:
- ret = handle_eprom_command(fp, &cmd);
- break;
- }
-
- if (ret >= 0)
- ret = consumed;
-bail:
- return ret;
-}
-
static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
{
struct hfi1_filedata *fd = kiocb->ki_filp->private_data;
diff --git a/include/uapi/rdma/hfi/hfi1_user.h b/include/uapi/rdma/hfi/hfi1_user.h
index 37a9697..383d036 100644
--- a/include/uapi/rdma/hfi/hfi1_user.h
+++ b/include/uapi/rdma/hfi/hfi1_user.h
@@ -66,7 +66,7 @@
* The major version changes when data structures change in an incompatible
* way. The driver must be the same for initialization to succeed.
*/
-#define HFI1_USER_SWMAJOR 5
+#define HFI1_USER_SWMAJOR 6
/*
* Minor version differences are always compatible
Add a trace message to HFI1s user IOCTL handling. This allows debugging
of which IOCTLs are being handled by the driver.
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/file_ops.c | 2 ++
drivers/staging/rdma/hfi1/trace.c | 1 +
drivers/staging/rdma/hfi1/trace.h | 1 +
3 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
index c8c1acd..26f3d8f 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -193,6 +193,8 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
if (ret)
return ret;
+ hfi1_cdbg(IOCTL, "IOCTL recv: 0x%x", cmd);
+
switch (cmd) {
case HFI1_IOCTL_ASSIGN_CTXT:
uctxt_required = 0; /* assigned user context not required */
diff --git a/drivers/staging/rdma/hfi1/trace.c b/drivers/staging/rdma/hfi1/trace.c
index 8b62fef..caddb2a 100644
--- a/drivers/staging/rdma/hfi1/trace.c
+++ b/drivers/staging/rdma/hfi1/trace.c
@@ -233,3 +233,4 @@ __hfi1_trace_fn(FIRMWARE);
__hfi1_trace_fn(RCVCTRL);
__hfi1_trace_fn(TID);
__hfi1_trace_fn(MMU);
+__hfi1_trace_fn(IOCTL);
diff --git a/drivers/staging/rdma/hfi1/trace.h b/drivers/staging/rdma/hfi1/trace.h
index 963dc94..3cb0391 100644
--- a/drivers/staging/rdma/hfi1/trace.h
+++ b/drivers/staging/rdma/hfi1/trace.h
@@ -1341,6 +1341,7 @@ __hfi1_trace_def(FIRMWARE);
__hfi1_trace_def(RCVCTRL);
__hfi1_trace_def(TID);
__hfi1_trace_def(MMU);
+__hfi1_trace_def(IOCTL);
#define hfi1_cdbg(which, fmt, ...) \
__hfi1_trace_##which(__func__, fmt, ##__VA_ARGS__)
Consolidate all the IOCTL defines which are supported by the driver into
a common location. This header file is made available to user space and
will now allow user applications to determine IOCTL numbers directly.
Reviewed-by: Mitko Haralanov <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/common.h | 3 --
drivers/staging/rdma/hfi1/diag.c | 50 +-------------------------------
include/uapi/rdma/hfi/hfi1_user.h | 56 +++++++++++++++++++++++++++++++++---
3 files changed, 53 insertions(+), 56 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/common.h b/drivers/staging/rdma/hfi1/common.h
index 37ac229..b729445 100644
--- a/drivers/staging/rdma/hfi1/common.h
+++ b/drivers/staging/rdma/hfi1/common.h
@@ -350,9 +350,6 @@ struct hfi1_message_header {
#define HFI1_BECN_MASK 1
#define HFI1_BECN_SMASK BIT(HFI1_BECN_SHIFT)
-#define HFI1_PSM_IOC_BASE_SEQ 0x0
-#define HFI1_SNOOP_IOC_BASE_SEQ 0x80 /* leaves plenty of room for psm */
-
static inline __u64 rhf_to_cpu(const __le32 *rbuf)
{
return __le64_to_cpu(*((__le64 *)rbuf));
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 4cf6e8d..776ccee 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -64,6 +64,7 @@
#include <linux/uaccess.h>
#include <linux/module.h>
#include <rdma/ib_smi.h>
+#include <rdma/hfi/hfi1_user.h>
#include "hfi.h"
#include "device.h"
#include "common.h"
@@ -142,60 +143,11 @@ static const struct file_operations diagpkt_file_ops = {
};
/*
- * This is used for communication with user space for snoop extended IOCTLs
- */
-struct hfi1_link_info {
- __be64 node_guid;
- u8 port_mode;
- u8 port_state;
- u16 link_speed_active;
- u16 link_width_active;
- u16 vl15_init;
- u8 port_number;
- /*
- * Add padding to make this a full IB SMP payload. Note: changing the
- * size of this structure will make the IOCTLs created with _IOWR
- * change.
- * Be sure to run tests on all IOCTLs when making changes to this
- * structure.
- */
- u8 res[47];
-};
-
-/*
* This starts our ioctl sequence numbers *way* off from the ones
* defined in ib_core.
*/
#define SNOOP_CAPTURE_VERSION 0x1
-#define HFI1_SNOOP_IOC_MAGIC IB_IOCTL_MAGIC
-
-#define HFI1_SNOOP_IOCGETLINKSTATE \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ)
-#define HFI1_SNOOP_IOCSETLINKSTATE \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 1)
-#define HFI1_SNOOP_IOCCLEARQUEUE \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 2)
-#define HFI1_SNOOP_IOCCLEARFILTER \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 3)
-#define HFI1_SNOOP_IOCSETFILTER \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 4)
-#define HFI1_SNOOP_IOCGETVERSION \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 5)
-#define HFI1_SNOOP_IOCSET_OPTS \
- _IO(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 6)
-
-/*
- * These offsets +6/+7 could change, but these are already known and used
- * IOCTL numbers so don't change them without a good reason.
- */
-#define HFI1_SNOOP_IOCGETLINKSTATE_EXTRA \
- _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 6, \
- struct hfi1_link_info)
-#define HFI1_SNOOP_IOCSETLINKSTATE_EXTRA \
- _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 7, \
- struct hfi1_link_info)
-
static int hfi1_snoop_open(struct inode *in, struct file *fp);
static ssize_t hfi1_snoop_read(struct file *fp, char __user *data,
size_t pkt_len, loff_t *off);
diff --git a/include/uapi/rdma/hfi/hfi1_user.h b/include/uapi/rdma/hfi/hfi1_user.h
index 383d036..5503451 100644
--- a/include/uapi/rdma/hfi/hfi1_user.h
+++ b/include/uapi/rdma/hfi/hfi1_user.h
@@ -147,13 +147,11 @@
#define HFI1_CMD_EP_READ_RANGE 76 /* read EPROM range */
#define HFI1_CMD_EP_WRITE_RANGE 77 /* write EPROM range */
-/*
- * User IOCTLs can not go above 128 if they do then see common.h and change the
- * base for the snoop ioctl
- */
#define IB_IOCTL_MAGIC 0x1b /* See Documentation/ioctl/ioctl-number.txt */
struct hfi1_cmd;
+#define HFI1_PSM_IOC_BASE_SEQ 0x0
+
#define HFI1_IOCTL_ASSIGN_CTXT \
_IOWR(IB_IOCTL_MAGIC, HFI1_CMD_ASSIGN_CTXT, struct hfi1_user_info)
#define HFI1_IOCTL_CTXT_INFO \
@@ -189,6 +187,56 @@ struct hfi1_cmd;
#define HFI1_IOCTL_EP_WRITE_RANGE \
_IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_WRITE_RANGE, struct hfi1_cmd)
+#define HFI1_SNOOP_IOC_BASE_SEQ 0x80 /* leaves plenty of room for psm */
+#define HFI1_SNOOP_IOC_MAGIC IB_IOCTL_MAGIC
+
+#define HFI1_SNOOP_IOCGETLINKSTATE \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ)
+#define HFI1_SNOOP_IOCSETLINKSTATE \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 1)
+#define HFI1_SNOOP_IOCCLEARQUEUE \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 2)
+#define HFI1_SNOOP_IOCCLEARFILTER \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 3)
+#define HFI1_SNOOP_IOCSETFILTER \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 4)
+#define HFI1_SNOOP_IOCGETVERSION \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 5)
+#define HFI1_SNOOP_IOCSET_OPTS \
+ _IO(IB_IOCTL_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 6)
+
+/*
+ * This is used for communication with user space for snoop extended IOCTLs
+ */
+struct hfi1_link_info {
+ __be64 node_guid;
+ __u8 port_mode;
+ __u8 port_state;
+ __u16 link_speed_active;
+ __u16 link_width_active;
+ __u16 vl15_init;
+ __u8 port_number;
+ /*
+ * Add padding to make this a full IB SMP payload. Note: changing the
+ * size of this structure will make the IOCTLs created with _IOWR
+ * change.
+ * Be sure to run tests on all IOCTLs when making changes to this
+ * structure.
+ */
+ __u8 res[47];
+};
+
+/*
+ * These offsets +6/+7 could change, but these are already known and used
+ * IOCTL numbers so don't change them without a good reason.
+ */
+#define HFI1_SNOOP_IOCGETLINKSTATE_EXTRA \
+ _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 6, \
+ struct hfi1_link_info)
+#define HFI1_SNOOP_IOCSETLINKSTATE_EXTRA \
+ _IOWR(HFI1_SNOOP_IOC_MAGIC, HFI1_SNOOP_IOC_BASE_SEQ + 7, \
+ struct hfi1_link_info)
+
#define _HFI1_EVENT_FROZEN_BIT 0
#define _HFI1_EVENT_LINKDOWN_BIT 1
#define _HFI1_EVENT_LID_CHANGE_BIT 2
The eprom writing capability of the driver is currently residing in the
same handler for applications that wish to send/receive packets. This is
better suited as a diagnostic device with its own device file.
Create an eprom device file and add its own IOCTL handling.
Reviewed-by: Mitko Haralanov <[email protected]>
Reviewed-by: Mike Marciniszyn <[email protected]>
Reviewed-by: Dean Luick <[email protected]>
Signed-off-by: Dennis Dalessandro <[email protected]>
---
drivers/staging/rdma/hfi1/diag.c | 107 ++++++++++++++++++++++++++++++
drivers/staging/rdma/hfi1/eprom.c | 121 ++--------------------------------
drivers/staging/rdma/hfi1/eprom.h | 16 ++++
drivers/staging/rdma/hfi1/file_ops.c | 23 ------
drivers/staging/rdma/hfi1/hfi.h | 8 ++
include/uapi/rdma/hfi/hfi1_user.h | 21 +++---
6 files changed, 146 insertions(+), 150 deletions(-)
diff --git a/drivers/staging/rdma/hfi1/diag.c b/drivers/staging/rdma/hfi1/diag.c
index 776ccee..0947d29 100644
--- a/drivers/staging/rdma/hfi1/diag.c
+++ b/drivers/staging/rdma/hfi1/diag.c
@@ -70,6 +70,7 @@
#include "common.h"
#include "verbs_txreq.h"
#include "trace.h"
+#include "eprom.h"
#undef pr_fmt
#define pr_fmt(fmt) DRIVER_NAME ": " fmt
@@ -157,6 +158,9 @@ static long hfi1_ioctl(struct file *fp, unsigned int cmd, unsigned long arg);
static unsigned int hfi1_snoop_poll(struct file *fp,
struct poll_table_struct *wait);
static int hfi1_snoop_release(struct inode *in, struct file *fp);
+static int hfi1_eprom_open(struct inode *in, struct file *fp);
+static long hfi1_eprom_ioctl(struct file *fp, unsigned int cmd,
+ unsigned long arg);
struct hfi1_packet_filter_command {
int opcode;
@@ -189,6 +193,12 @@ static const struct file_operations snoop_file_ops = {
.release = hfi1_snoop_release
};
+static const struct file_operations eprom_file_ops = {
+ .owner = THIS_MODULE,
+ .open = hfi1_eprom_open,
+ .unlocked_ioctl = hfi1_eprom_ioctl
+};
+
struct hfi1_filter_array {
int (*filter)(void *, void *, void *);
};
@@ -236,6 +246,13 @@ int hfi1_diag_add(struct hfi1_devdata *dd)
if (ret)
dd_dev_err(dd, "Unable to init snoop/capture device");
+ snprintf(name, sizeof(name), "%s_eprom%d", class_name(),
+ dd->unit);
+ ret = hfi1_cdev_init(HFI1_EPROM_BASE + dd->unit, name,
+ &eprom_file_ops,
+ &dd->hfi1_eprom.cdev, &dd->hfi1_eprom.class_dev,
+ false);
+
snprintf(name, sizeof(name), "%s_diagpkt", class_name());
if (atomic_inc_return(&diagpkt_count) == 1) {
ret = hfi1_cdev_init(HFI1_DIAGPKT_MINOR, name,
@@ -275,6 +292,7 @@ void hfi1_diag_remove(struct hfi1_devdata *dd)
if (atomic_dec_and_test(&diagpkt_count))
hfi1_cdev_cleanup(&diagpkt_cdev, &diagpkt_device);
hfi1_cdev_cleanup(&dd->diag_cdev, &dd->diag_device);
+ hfi1_cdev_cleanup(&dd->hfi1_eprom.cdev, &dd->hfi1_eprom.class_dev);
}
/*
@@ -1868,3 +1886,92 @@ void snoop_inline_pio_send(struct hfi1_devdata *dd, struct pio_buf *pbuf,
inline_pio_out:
pio_copy(dd, pbuf, pbc, from, count);
}
+
+static int hfi1_eprom_open(struct inode *in, struct file *fp)
+{
+ return 0;
+}
+
+static long hfi1_eprom_ioctl(struct file *fp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct hfi1_devdata *dd = NULL;
+ int read_cmd, write_cmd, read_ok, write_ok;
+ struct hfi1_eprom_cmd ec;
+ u32 dev_id, rlen, rstart;
+ int ret;
+ int unit;
+
+ hfi1_cdbg(IOCTL, "IOCTL recv: 0x%x", cmd);
+
+ if (check_ioctl_access(cmd, arg))
+ return -EFAULT;
+
+ read_cmd = _IOC_DIR(cmd) & _IOC_READ;
+ write_cmd = _IOC_DIR(cmd) & _IOC_WRITE;
+ write_ok = access_ok(VERIFY_WRITE, (void __user *)arg, _IOC_SIZE(cmd));
+ read_ok = access_ok(VERIFY_READ, (void __user *)arg, _IOC_SIZE(cmd));
+
+ if ((read_cmd && !write_ok) || (write_cmd && !read_ok))
+ return -EFAULT;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ /* All commands need user struct except erase chip */
+ if (cmd != HFI1_IOCTL_EP_ERASE_CHIP) {
+ memset(&ec, 0, sizeof(ec));
+ if (copy_from_user(&ec, (struct hfi1_eprom_cmd __user *)arg,
+ sizeof(ec)))
+ return -EFAULT;
+ }
+
+ unit = iminor(fp->f_inode) - HFI1_EPROM_BASE;
+ dd = hfi1_lookup(unit);
+ if (!dd)
+ return -ENODEV;
+
+ /* some devices do not have an EPROM */
+ if (!dd->eprom_available)
+ return -EOPNOTSUPP;
+
+ ret = acquire_chip_resource(dd, CR_EPROM, EPROM_TIMEOUT);
+ if (ret) {
+ dd_dev_err(dd, "%s: unable to acquire EPROM resource\n",
+ __func__);
+ return ret;
+ }
+
+ switch (cmd) {
+ case HFI1_IOCTL_EP_INFO:
+ dev_id = hfi1_eprom_read_device_id(dd);
+ if (copy_to_user((void __user *)ec.addr, &dev_id,
+ sizeof(dev_id)))
+ ret = -EFAULT;
+ break;
+ case HFI1_IOCTL_EP_ERASE_CHIP:
+ ret = hfi1_eprom_erase_chip(dd);
+ break;
+ case HFI1_IOCTL_EP_ERASE_RANGE:
+ rlen = ec.length;
+ rstart = ec.start;
+ ret = hfi1_eprom_erase_range(dd, rstart, rlen);
+ break;
+ case HFI1_IOCTL_EP_READ_RANGE:
+ rlen = ec.length;
+ rstart = ec.start;
+ ret = hfi1_eprom_read_length(dd, rstart, rlen, ec.addr);
+ break;
+ case HFI1_IOCTL_EP_WRITE_RANGE:
+ rlen = ec.length;
+ rstart = ec.start;
+ ret = hfi1_eprom_write_length(dd, rstart, rlen, ec.addr);
+ break;
+ default:
+ dd_dev_err(dd, "%s: unexpected command %d\n",
+ __func__, cmd);
+ ret = -EINVAL;
+ }
+ release_chip_resource(dd, CR_EPROM);
+ return ret;
+}
diff --git a/drivers/staging/rdma/hfi1/eprom.c b/drivers/staging/rdma/hfi1/eprom.c
index bd87715..73d7104 100644
--- a/drivers/staging/rdma/hfi1/eprom.c
+++ b/drivers/staging/rdma/hfi1/eprom.c
@@ -102,13 +102,6 @@
#define EPROM_WP_N BIT_ULL(14) /* EPROM write line */
/*
- * How long to wait for the EPROM to become available, in ms.
- * The spec 32 Mb EPROM takes around 40s to erase then write.
- * Double it for safety.
- */
-#define EPROM_TIMEOUT 80000 /* ms */
-
-/*
* Turn on external enable line that allows writing on the flash.
*/
static void write_enable(struct hfi1_devdata *dd)
@@ -166,7 +159,7 @@ static int wait_for_not_busy(struct hfi1_devdata *dd)
/*
* Read the device ID from the SPI controller.
*/
-static u32 read_device_id(struct hfi1_devdata *dd)
+u32 hfi1_eprom_read_device_id(struct hfi1_devdata *dd)
{
/* read the Manufacture Device ID */
write_csr(dd, ASIC_EEP_ADDR_CMD, CMD_READ_MANUF_DEV_ID);
@@ -176,7 +169,7 @@ static u32 read_device_id(struct hfi1_devdata *dd)
/*
* Erase the whole flash.
*/
-static int erase_chip(struct hfi1_devdata *dd)
+int hfi1_eprom_erase_chip(struct hfi1_devdata *dd)
{
int ret;
@@ -194,7 +187,7 @@ static int erase_chip(struct hfi1_devdata *dd)
/*
* Erase a range.
*/
-static int erase_range(struct hfi1_devdata *dd, u32 start, u32 len)
+int hfi1_eprom_erase_range(struct hfi1_devdata *dd, u32 start, u32 len)
{
u32 end = start + len;
int ret = 0;
@@ -257,7 +250,8 @@ static void read_page(struct hfi1_devdata *dd, u32 offset, u32 *result)
/*
* Read length bytes starting at offset. Copy to user address addr.
*/
-static int read_length(struct hfi1_devdata *dd, u32 start, u32 len, u64 addr)
+int hfi1_eprom_read_length(struct hfi1_devdata *dd, u32 start, u32 len,
+ u64 addr)
{
u32 offset;
u32 buffer[EP_PAGE_SIZE / sizeof(u32)];
@@ -300,7 +294,8 @@ static int write_page(struct hfi1_devdata *dd, u32 offset, u32 *data)
/*
* Write length bytes starting at offset. Read from user address addr.
*/
-static int write_length(struct hfi1_devdata *dd, u32 start, u32 len, u64 addr)
+int hfi1_eprom_write_length(struct hfi1_devdata *dd, u32 start, u32 len,
+ u64 addr)
{
u32 offset;
u32 buffer[EP_PAGE_SIZE / sizeof(u32)];
@@ -328,108 +323,6 @@ done:
return ret;
}
-/* convert an range composite to a length, in bytes */
-static inline u32 extract_rlen(u32 composite)
-{
- return (composite & 0xffff) * EP_PAGE_SIZE;
-}
-
-/* convert an range composite to a start, in bytes */
-static inline u32 extract_rstart(u32 composite)
-{
- return (composite >> 16) * EP_PAGE_SIZE;
-}
-
-/*
- * Perform the given operation on the EPROM. Called from user space. The
- * user credentials have already been checked.
- *
- * Return 0 on success, -ERRNO on error
- */
-int handle_eprom_command(struct file *fp, const struct hfi1_cmd *cmd)
-{
- struct hfi1_devdata *dd;
- u32 dev_id;
- u32 rlen; /* range length */
- u32 rstart; /* range start */
- int i_minor;
- int ret = 0;
-
- /*
- * Map the device file to device data using the relative minor.
- * The device file minor number is the unit number + 1. 0 is
- * the generic device file - reject it.
- */
- i_minor = iminor(file_inode(fp)) - HFI1_USER_MINOR_BASE;
- if (i_minor <= 0)
- return -EINVAL;
- dd = hfi1_lookup(i_minor - 1);
- if (!dd) {
- pr_err("%s: cannot find unit %d!\n", __func__, i_minor);
- return -EINVAL;
- }
-
- /* some devices do not have an EPROM */
- if (!dd->eprom_available)
- return -EOPNOTSUPP;
-
- ret = acquire_chip_resource(dd, CR_EPROM, EPROM_TIMEOUT);
- if (ret) {
- dd_dev_err(dd, "%s: unable to acquire EPROM resource\n",
- __func__);
- goto done_asic;
- }
-
- dd_dev_info(dd, "%s: cmd: type %d, len 0x%x, addr 0x%016llx\n",
- __func__, cmd->type, cmd->len, cmd->addr);
-
- switch (cmd->type) {
- case HFI1_CMD_EP_INFO:
- if (cmd->len != sizeof(u32)) {
- ret = -ERANGE;
- break;
- }
- dev_id = read_device_id(dd);
- /* addr points to a u32 user buffer */
- if (copy_to_user((void __user *)cmd->addr, &dev_id,
- sizeof(u32)))
- ret = -EFAULT;
- break;
-
- case HFI1_CMD_EP_ERASE_CHIP:
- ret = erase_chip(dd);
- break;
-
- case HFI1_CMD_EP_ERASE_RANGE:
- rlen = extract_rlen(cmd->len);
- rstart = extract_rstart(cmd->len);
- ret = erase_range(dd, rstart, rlen);
- break;
-
- case HFI1_CMD_EP_READ_RANGE:
- rlen = extract_rlen(cmd->len);
- rstart = extract_rstart(cmd->len);
- ret = read_length(dd, rstart, rlen, cmd->addr);
- break;
-
- case HFI1_CMD_EP_WRITE_RANGE:
- rlen = extract_rlen(cmd->len);
- rstart = extract_rstart(cmd->len);
- ret = write_length(dd, rstart, rlen, cmd->addr);
- break;
-
- default:
- dd_dev_err(dd, "%s: unexpected command %d\n",
- __func__, cmd->type);
- ret = -EINVAL;
- break;
- }
-
- release_chip_resource(dd, CR_EPROM);
-done_asic:
- return ret;
-}
-
/*
* Initialize the EPROM handler.
*/
diff --git a/drivers/staging/rdma/hfi1/eprom.h b/drivers/staging/rdma/hfi1/eprom.h
index d41f0b1..60c1e08 100644
--- a/drivers/staging/rdma/hfi1/eprom.h
+++ b/drivers/staging/rdma/hfi1/eprom.h
@@ -45,8 +45,20 @@
*
*/
-struct hfi1_cmd;
+/*
+ * How long to wait for the EPROM to become available, in ms.
+ * The spec 32 Mb EPROM takes around 40s to erase then write.
+ * Double it for safety.
+ */
+#define EPROM_TIMEOUT 80000 /* ms */
+
struct hfi1_devdata;
int eprom_init(struct hfi1_devdata *dd);
-int handle_eprom_command(struct file *fp, const struct hfi1_cmd *cmd);
+u32 hfi1_eprom_read_device_id(struct hfi1_devdata *dd);
+int hfi1_eprom_erase_chip(struct hfi1_devdata *dd);
+int hfi1_eprom_read_length(struct hfi1_devdata *dd, u32 start, u32 len,
+ u64 addr);
+int hfi1_eprom_write_length(struct hfi1_devdata *dd, u32 start, u32 len,
+ u64 addr);
+int hfi1_eprom_erase_range(struct hfi1_devdata *dd, u32 start, u32 len);
diff --git a/drivers/staging/rdma/hfi1/file_ops.c b/drivers/staging/rdma/hfi1/file_ops.c
index 26f3d8f..752a927 100644
--- a/drivers/staging/rdma/hfi1/file_ops.c
+++ b/drivers/staging/rdma/hfi1/file_ops.c
@@ -181,8 +181,6 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
struct hfi1_ctxtdata *uctxt = fd->uctxt;
struct hfi1_user_info uinfo;
struct hfi1_tid_info tinfo;
- struct hfi1_cmd ucmd;
- int uctxt_required = 1;
int ret = 0;
unsigned long addr;
int uval = 0;
@@ -195,28 +193,9 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
hfi1_cdbg(IOCTL, "IOCTL recv: 0x%x", cmd);
- switch (cmd) {
- case HFI1_IOCTL_ASSIGN_CTXT:
- uctxt_required = 0; /* assigned user context not required */
- break;
- case HFI1_IOCTL_EP_INFO:
- case HFI1_IOCTL_EP_ERASE_CHIP:
- case HFI1_IOCTL_EP_ERASE_RANGE:
- case HFI1_IOCTL_EP_READ_RANGE:
- case HFI1_IOCTL_EP_WRITE_RANGE:
- if (!capable(CAP_SYS_ADMIN))
- return -EPERM;
- if (copy_from_user(&ucmd,
- (struct hfi11_cmd __user *)arg,
- sizeof(ucmd)))
- return -EFAULT;
- return handle_eprom_command(fp, &ucmd);
- }
-
- if (uctxt_required && !uctxt)
+ if (cmd != HFI1_IOCTL_ASSIGN_CTXT && !uctxt)
return -EINVAL;
- /* Checked for root/context process the IOCTL */
switch (cmd) {
case HFI1_IOCTL_ASSIGN_CTXT:
if (copy_from_user(&uinfo,
diff --git a/drivers/staging/rdma/hfi1/hfi.h b/drivers/staging/rdma/hfi1/hfi.h
index 7351898..81b2028 100644
--- a/drivers/staging/rdma/hfi1/hfi.h
+++ b/drivers/staging/rdma/hfi1/hfi.h
@@ -387,6 +387,11 @@ struct hfi1_snoop_data {
u64 dcc_cfg; /* saved value of DCC Cfg register */
};
+struct hfi1_eprom_data {
+ struct cdev cdev;
+ struct device *class_dev;
+};
+
/* snoop mode_flag values */
#define HFI1_PORT_SNOOP_MODE 1U
#define HFI1_PORT_CAPTURE_MODE 2U
@@ -1098,6 +1103,7 @@ struct hfi1_devdata {
size_t portcntrnameslen;
struct hfi1_snoop_data hfi1_snoop;
+ struct hfi1_eprom_data hfi1_eprom;
struct err_info_rcvport err_info_rcvport;
struct err_info_constraint err_info_rcv_constraint;
@@ -1770,8 +1776,8 @@ extern struct mutex hfi1_mutex;
#define HFI1_DIAGPKT_MINOR 128
#define HFI1_DIAG_MINOR_BASE 129
#define HFI1_SNOOP_CAPTURE_BASE 200
+#define HFI1_EPROM_BASE 220
#define HFI1_NMINORS 255
-
#define PCI_VENDOR_ID_INTEL 0x8086
#define PCI_DEVICE_ID_INTEL0 0x24f0
#define PCI_DEVICE_ID_INTEL1 0x24f1
diff --git a/include/uapi/rdma/hfi/hfi1_user.h b/include/uapi/rdma/hfi/hfi1_user.h
index 5503451..a486eec 100644
--- a/include/uapi/rdma/hfi/hfi1_user.h
+++ b/include/uapi/rdma/hfi/hfi1_user.h
@@ -149,7 +149,7 @@
#define IB_IOCTL_MAGIC 0x1b /* See Documentation/ioctl/ioctl-number.txt */
-struct hfi1_cmd;
+struct hfi1_eprom_cmd;
#define HFI1_PSM_IOC_BASE_SEQ 0x0
#define HFI1_IOCTL_ASSIGN_CTXT \
@@ -177,15 +177,15 @@ struct hfi1_cmd;
#define HFI1_IOCTL_TID_INVAL_READ \
_IOWR(IB_IOCTL_MAGIC, HFI1_CMD_TID_INVAL_READ, struct hfi1_tid_info)
#define HFI1_IOCTL_EP_INFO \
- _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_INFO, struct hfi1_cmd)
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_INFO, struct hfi1_eprom_cmd)
#define HFI1_IOCTL_EP_ERASE_CHIP \
- _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_ERASE_CHIP, struct hfi1_cmd)
+ _IO(IB_IOCTL_MAGIC, HFI1_CMD_EP_ERASE_CHIP)
#define HFI1_IOCTL_EP_ERASE_RANGE \
- _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_ERASE_RANGE, struct hfi1_cmd)
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_ERASE_RANGE, struct hfi1_eprom_cmd)
#define HFI1_IOCTL_EP_READ_RANGE \
- _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_READ_RANGE, struct hfi1_cmd)
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_READ_RANGE, struct hfi1_eprom_cmd)
#define HFI1_IOCTL_EP_WRITE_RANGE \
- _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_WRITE_RANGE, struct hfi1_cmd)
+ _IOWR(IB_IOCTL_MAGIC, HFI1_CMD_EP_WRITE_RANGE, struct hfi1_eprom_cmd)
#define HFI1_SNOOP_IOC_BASE_SEQ 0x80 /* leaves plenty of room for psm */
#define HFI1_SNOOP_IOC_MAGIC IB_IOCTL_MAGIC
@@ -337,11 +337,10 @@ struct hfi1_tid_info {
__u32 length;
};
-/* hfi1_cmd is used for EPROM commands only */
-struct hfi1_cmd {
- __u32 type; /* command type */
- __u32 len; /* length of struct pointed to by add */
- __u64 addr; /* pointer to user structure */
+struct hfi1_eprom_cmd {
+ __u32 start; /* start for a range request */
+ __u32 length; /* length of a range request */
+ __u64 addr; /* pointer to user memory */
};
enum hfi1_sdma_comp_state {
On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> This patch series removes the write() interface for user access in favor of an
> ioctl() based approach. This is in response to the complaint that we had
> different handlers for write() and writev() doing different things and expecting
> different types of data. See:
I think we should wait on applying these patches until we globally sort out
what to do with the rdma uapi.
It just doesn't make alot of sense for drivers to have their own personal
char devices. :(
A second char dev for the eeprom? How is that OK? Why aren't you using
the I2C layer for this?
Why is there a snoop interface in here? How is that not something that
belongs in a the core code?
Jason
On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > This patch series removes the write() interface for user access in favor of an
> > ioctl() based approach. This is in response to the complaint that we had
> > different handlers for write() and writev() doing different things and expecting
> > different types of data. See:
>
> I think we should wait on applying these patches until we globally sort out
> what to do with the rdma uapi.
>
> It just doesn't make alot of sense for drivers to have their own personal
> char devices. :(
I'm afraid I have to disagree at this time. Someday we may have "1 char device
to rule them all" but right now we don't have any line of sight to that
solution. It may be _years_ before we can agree to the semantics which will
work for all high speed, kernel bypass, rdma, low latency, network devices.
We need to fix the write/writev problem now.[1]
Ira
[1] https://www.spinics.net/lists/linux-rdma/msg34451.html
On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
>On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
>> This patch series removes the write() interface for user access in favor of an
>> ioctl() based approach. This is in response to the complaint that we had
>> different handlers for write() and writev() doing different things and expecting
>> different types of data. See:
>
>I think we should wait on applying these patches until we globally sort out
>what to do with the rdma uapi.
Perhaps there is a broader change to make to the rdma subsystem, but until
that is fleshed out this patch set achieves our goal of fixing the
write()/writev() problem and should be sufficient to let the driver come out
of staging for 4.7?
>A second char dev for the eeprom? How is that OK? Why aren't you using
>the I2C layer for this?
I moved it because it is totally different in terms of functionality. The
hfi1 device is for send/recv of packets across the wire. The eprom device is
for low level programming of the eprom on the chip. We do not use i2c for
this because the eprom is directly attached to the chip and not accessible
via i2c, requires register access.
>Why is there a snoop interface in here? How is that not something that
>belongs in a the core code?
The snoop interface is a low level diagnostic for the hfi. The intent is to
grab packets before they are handed up to the verbs layer. It also lets us
send all sorts of debug/diagnostic packets for testing.
-Denny
On Thu, Apr 14, 2016 at 01:48:31PM -0400, Ira Weiny wrote:
> On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> > On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > > This patch series removes the write() interface for user access in favor of an
> > > ioctl() based approach. This is in response to the complaint that we had
> > > different handlers for write() and writev() doing different things and expecting
> > > different types of data. See:
> >
> > I think we should wait on applying these patches until we globally sort out
> > what to do with the rdma uapi.
> >
> > It just doesn't make alot of sense for drivers to have their own personal
> > char devices. :(
>
> I'm afraid I have to disagree at this time. Someday we may have "1 char device
> to rule them all" but right now we don't have any line of sight to that
> solution. It may be _years_ before we can agree to the semantics which will
> work for all high speed, kernel bypass, rdma, low latency, network devices.
There are some pretty obvious paths to make this saner that could only
be a few weeks away, we haven't even had the first conversations
yet. I think you are completely wrong there is no 'line of sight'
It certainly can't be years.
There is some rational for a very driver specific thing, but EEPROM
and snoop? Seriously?
Jason
On Thu, Apr 14, 2016 at 12:05:40PM -0600, Jason Gunthorpe wrote:
>There are some pretty obvious paths to make this saner that could only
>be a few weeks away, we haven't even had the first conversations
>yet. I think you are completely wrong there is no 'line of sight'
>
>It certainly can't be years.
Does fixing the current write()/writev() problem have any real impact on how
we proceed for the "1 char dev to rule them all" idea?
>There is some rational for a very driver specific thing, but EEPROM
>and snoop? Seriously?
That's the thing, I think these are very driver specific [1]. I'm not dead
set that the eprom needs to be its own device, it made sense to me, but if
others feel the handling should be back in the hfi1 char device I'm fine
with that.
As for the snoop stuff, perhaps that would be better in rdmavt?
[1] http://marc.info/?l=linux-rdma&m=146065638629146&w=2
-Denny
On Thu, Apr 14, 2016 at 01:52:44PM -0400, Dennis Dalessandro wrote:
> On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> >On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> >>This patch series removes the write() interface for user access in favor of an
> >>ioctl() based approach. This is in response to the complaint that we had
> >>different handlers for write() and writev() doing different things and expecting
> >>different types of data. See:
> >
> >I think we should wait on applying these patches until we globally sort out
> >what to do with the rdma uapi.
>
> Perhaps there is a broader change to make to the rdma subsystem, but until
> that is fleshed out this patch set achieves our goal of fixing the
> write()/writev() problem and should be sufficient to let the driver come out
> of staging for 4.7?
No. Al and Linus have clearly put the kibosh on the idea that a driver
gets a pass on whatever uAPI stuff they want just because it is in a
driver.
If anything adding uAPIs to drivers should be *harder* than adding
them to the core kernel. You nedd a lot more justification why the
core code shouldn't have a well designed version of the function.
You guys need to integrate with the rest of the kernel in some way,
this is just not OK. We catch so much flack from the rest of the
kernel community for our shitty uAPIs, we need to grow up.
I accept the argument that you need special high speed hardware
specific uAPIs for PSM - fine, but that doesn't give hfi1 a free pass
to add whatever other kooky things you find convenient. No to eeprom,
no to snoop.
If you want to migrate out of staging quickly then drop the uAPI from
the driver and submit a sane uAPI later as patches.
IMHO, it was a mistake for Roland to accept ipath with all this uAPI
stuff, and a double mistake to give qib an equal pass. hfi1 is adding
*even more* stuff, with flimsy justification. Enough is enough.
> >A second char dev for the eeprom? How is that OK? Why aren't you using
> >the I2C layer for this?
>
> I moved it because it is totally different in terms of
> functionality. The
Nobody else is doing something like this. It is crazy.
Add a common RDMA API for eeprom. net has one under ethtool, it is
about time we grow something too, all the vendors seem to have various
hacks in this department. Maybe it fits under RDMA's growing netlink
footprint.
> >Why is there a snoop interface in here? How is that not something that
> >belongs in a the core code?
>
> The snoop interface is a low level diagnostic for the hfi. The intent is to
> grab packets before they are handed up to the verbs layer. It also lets us
> send all sorts of debug/diagnostic packets for testing.
So? Why is that unique to hfi1? Packet capture is a well understood
multi-vendor thing.
Nobody else is getting a pass on uAPI design. Thing is, I don't think
it is actually hard to do a good job with the uAPI here, you just
actually have to try. :P
I told John I'd give you guys some design advice. I suggest you give
it a good think, make your wishlist and lets do something sane.
Jason
On Thu, Apr 14, 2016 at 02:42:01PM -0400, Dennis Dalessandro wrote:
> >It certainly can't be years.
>
> Does fixing the current write()/writev() problem have any real
> impact on how we proceed for the "1 char dev to rule them all" idea?
We aren't going to take a bad uAPI into mainline. So how many times do
you want to redo the userspace? I have no objection to the patch
landing, just as long as it stays in staging until we have the uAPI
discussion as a community.
As for the 'one char device', I actually think it would be really
simple.
Add a new uverbs ioctl:
int hfi1_fd = ioctl(uverbs_fd, RDMA_GET_DRIVER_OPS_FD, "psm2.intel.com");
ioctl(hfi1_fd, HFI1_IOCTL_ASSIGN_CTXT, ...);
write(hfi1_fd, ...);
At least that gives us far better options for discovery and versioning
of this stuff than a driver-specific char device.
[eg this would use anon_inode_getfile, like event fds, completion
channels, etc]
You guys need this the most, propose something already.
* driver specific ioctls might be nicer, but people argue that is not
performant enough for what you want... Unclear to me.
Jason
On Thu, Apr 14, 2016 at 01:48:31PM -0400, Ira Weiny wrote:
> On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> > On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > > This patch series removes the write() interface for user access in favor of an
> > > ioctl() based approach. This is in response to the complaint that we had
> > > different handlers for write() and writev() doing different things and expecting
> > > different types of data. See:
> >
> > I think we should wait on applying these patches until we globally sort out
> > what to do with the rdma uapi.
> >
> > It just doesn't make alot of sense for drivers to have their own personal
> > char devices. :(
>
> I'm afraid I have to disagree at this time. Someday we may have "1 char device
> to rule them all" but right now we don't have any line of sight to that
> solution. It may be _years_ before we can agree to the semantics which will
> work for all high speed, kernel bypass, rdma, low latency, network devices.
You didn't ever try to come and work on the solution. We talked about
finite time frame (_months_) which is doable based on knowledge that user
space parts are developed by the same companies and all our future changes
will be in one subsystem.
You were supposed to prepare "wish list" from this new API as an initial
phase. If you do it, you will find that it is very short and in the
initial meeting you will see that it similar to other participants in
linux-rdma community.
>
> We need to fix the write/writev problem now.[1]
No, this driver in staging and the proper way to move it out will be to
converge on common API and one clear path instead of duplicating the
interfaces and "inventing the wheel".
>
> Ira
>
> [1] https://www.spinics.net/lists/linux-rdma/msg34451.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 15, 2016 at 07:01:26AM +0300, Leon Romanovsky wrote:
> On Thu, Apr 14, 2016 at 01:48:31PM -0400, Ira Weiny wrote:
> > On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> > > On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > > > This patch series removes the write() interface for user access in favor of an
> > > > ioctl() based approach. This is in response to the complaint that we had
> > > > different handlers for write() and writev() doing different things and expecting
> > > > different types of data. See:
> > >
> > > I think we should wait on applying these patches until we globally sort out
> > > what to do with the rdma uapi.
> > >
> > > It just doesn't make alot of sense for drivers to have their own personal
> > > char devices. :(
> >
> > I'm afraid I have to disagree at this time. Someday we may have "1 char device
> > to rule them all" but right now we don't have any line of sight to that
> > solution. It may be _years_ before we can agree to the semantics which will
> > work for all high speed, kernel bypass, rdma, low latency, network devices.
>
> You didn't ever try to come and work on the solution. We talked about
> finite time frame (_months_) which is doable based on knowledge that user
> space parts are developed by the same companies and all our future changes
> will be in one subsystem.
How can you say that I am not working on a solution?
We spent most of last week discussing possible solutions and I am in support of
a more common core. But ask yourself this.
If hfi1 did not support verbs at all would this even be an issue?
>
> You were supposed to prepare "wish list" from this new API as an initial
> phase. If you do it, you will find that it is very short and in the
> initial meeting you will see that it similar to other participants in
> linux-rdma community.
The list of operations may be short. But the way in which you do those in a
performant way for each hardware device is _very_ different. This is a problem
which has been debated for years and no one has come up with an elegant
solution. Every solution ends up being, to quote a presenter at last weeks
conference, "shoving a square peg into a round hole".
Until we all admit 2 things.
1) That there are devices which don't operate on QPs
2) That the High Speed interconnect core should present something more
abstract than a QP interface
we are not really creating a common layer.
I do admit Jasons idea has some merit but I'm just not sure it provides so much
benefit that it is worth the effort at this time.
Ira
On Fri, Apr 15, 2016 at 12:17:55PM -0400, Ira Weiny wrote:
> On Fri, Apr 15, 2016 at 07:01:26AM +0300, Leon Romanovsky wrote:
> > On Thu, Apr 14, 2016 at 01:48:31PM -0400, Ira Weiny wrote:
> > > On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> > > > On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > > > > This patch series removes the write() interface for user access in favor of an
> > > > > ioctl() based approach. This is in response to the complaint that we had
> > > > > different handlers for write() and writev() doing different things and expecting
> > > > > different types of data. See:
> > > >
> > > > I think we should wait on applying these patches until we globally sort out
> > > > what to do with the rdma uapi.
> > > >
> > > > It just doesn't make alot of sense for drivers to have their own personal
> > > > char devices. :(
> > >
> > > I'm afraid I have to disagree at this time. Someday we may have "1 char device
> > > to rule them all" but right now we don't have any line of sight to that
> > > solution. It may be _years_ before we can agree to the semantics which will
> > > work for all high speed, kernel bypass, rdma, low latency, network devices.
> >
> > You didn't ever try to come and work on the solution. We talked about
> > finite time frame (_months_) which is doable based on knowledge that user
> > space parts are developed by the same companies and all our future changes
> > will be in one subsystem.
>
> How can you say that I am not working on a solution?
>
> We spent most of last week discussing possible solutions and I am in support of
> a more common core.
Great, did you show it to other RDMA stakeholders except Intel?
I saw nothing posted on ML or proposed for initial discussion, which
will be held in the next week or two.
It is a great opportunity to you guys to start and respect Linux kernel
collaboration development model and to stop to try to do it in your
corporate way.
On Fri, Apr 15, 2016 at 08:30:35PM +0300, Leon Romanovsky wrote:
> Great, did you show it to other RDMA stakeholders except Intel?
> I saw nothing posted on ML or proposed for initial discussion, which
> will be held in the next week or two.
I fear it's kfabrics, which is an entirely crackpot idea and a total
non-starter, but for some reason Intel and their buddies keep wasting
time on it.
> I fear it's kfabrics, which is an entirely crackpot idea and a total non-starter, but for some reason Intel and their buddies keep wasting time on it.
What is being discussed her is not kfabrics. That is a totally different out of kernel pathfinding project at this point.
What is being discussed here is how to best solve the write/writev issue with the PSM interface. The code submitted was to move
to IOCTL instead, but people like Jason have suggested routing the IOCTLs through the verbs layer instead.
> > Great, did you show it to other RDMA stakeholders except Intel?
> > I saw nothing posted on ML or proposed for initial discussion, which
> > will be held in the next week or two.
>
> I fear it's kfabrics, which is an entirely crackpot idea and a total
> non-starter, but for some reason Intel and their buddies keep wasting
> time on it.
There were discussions between several developers from multiple companies around moving away from using writev. That's it. Making random accusations and throwing crap over the wall does nothing to improve the community or instill trust.
On Fri, Apr 15, 2016 at 05:44:48PM +0000, Woodruff, Robert J wrote:
> > I fear it's kfabrics, which is an entirely crackpot idea and a total non-starter, but for some reason Intel and their buddies keep wasting time on it.
>
> What is being discussed her is not kfabrics. That is a totally different out of kernel pathfinding project at this point.
> What is being discussed here is how to best solve the write/writev issue with the PSM interface. The code submitted was to move
> to IOCTL instead, but people like Jason have suggested routing the IOCTLs through the verbs layer instead.
The discussion here is much broader than conversion of PSM interface.
On Fri, Apr 15, 2016 at 10:34:01AM -0700, Christoph Hellwig wrote:
> On Fri, Apr 15, 2016 at 08:30:35PM +0300, Leon Romanovsky wrote:
> > Great, did you show it to other RDMA stakeholders except Intel?
> > I saw nothing posted on ML or proposed for initial discussion, which
> > will be held in the next week or two.
>
> I fear it's kfabrics, which is an entirely crackpot idea and a total
> non-starter, but for some reason Intel and their buddies keep wasting
> time on it.
It is a different thing, during OFA16 conference we were **strongly
advised** to move from old read/write interface in RDMA stack to
something else.
The agreement was that a couple of weeks after the conference,
Liran will organize open web meeting to discuss what we want from
this interface.
It is important to make it open, so all participants will be able to
express their willingness.
Intel as usual decided to do it in their way and the result is presented
on this mailing list.
Thanks.
On Sat, Apr 16, 2016 at 12:23:28AM +0300, Leon Romanovsky wrote:
>
> Intel as usual decided to do it in their way and the result is presented
> on this mailing list.
Excuse me, but this statement is completely unfair. We were specifically asked
by Al and Linus to fix our char device with regards to the write/writev
inconsistency.
https://www.spinics.net/lists/linux-rdma/msg34451.html
Which is _exactly_ what this patch series does.
Do you have a technical reason that this patch series does not fix the
write/writev issue brought up by Al?
Ira
On Sat, Apr 16, 2016 at 12:23:28AM +0300, Leon Romanovsky wrote:
> Intel as usual decided to do it in their way and the result is presented
> on this mailing list.
Dennis was pretty clear he was going to send the patches to address
Al's concern, which he has done.
I was also pretty clear I was looking to get rid of the char dev :)
Not seeing a problem here.
Jason
On Fri, Apr 15, 2016 at 05:37:32PM -0600, Jason Gunthorpe wrote:
> On Sat, Apr 16, 2016 at 12:23:28AM +0300, Leon Romanovsky wrote:
>
> > Intel as usual decided to do it in their way and the result is presented
> > on this mailing list.
>
> Dennis was pretty clear he was going to send the patches to address
> Al's concern, which he has done.
>
> I was also pretty clear I was looking to get rid of the char dev :)
Yes, and I was pretty clear that we need to converge on one common API
prior to converting old code (including drivers in staging) in order to
do it once only.
On Fri, Apr 15, 2016 at 07:28:01PM -0400, Ira Weiny wrote:
> On Sat, Apr 16, 2016 at 12:23:28AM +0300, Leon Romanovsky wrote:
> Do you have a technical reason that this patch series does not fix the
> write/writev issue brought up by Al?
Sure, I truly believe that we can do common API in a months time-frame
and I want to be focused on one transition path only (write/read -> new
API) and not on two parallel paths (ioctl -> new API and write/read ->
new API) plus support of all these intermediate steps.
The original request came after this driver was moved from staging to
RDMA stack, since the driver is still in staging, there is no need to
hurry up now.
>
> Ira
>
On Sat, Apr 16, 2016 at 09:09:40AM +0300, Leon Romanovsky wrote:
>On Fri, Apr 15, 2016 at 07:28:01PM -0400, Ira Weiny wrote:
>> On Sat, Apr 16, 2016 at 12:23:28AM +0300, Leon Romanovsky wrote:
>> Do you have a technical reason that this patch series does not fix the
>> write/writev issue brought up by Al?
>
>Sure, I truly believe that we can do common API in a months time-frame
>and I want to be focused on one transition path only (write/read -> new
>API) and not on two parallel paths (ioctl -> new API and write/read ->
>new API) plus support of all these intermediate steps.
That doesn't say anything about how this patch doesn't address Al and
Linus's complaint, or raise a technical issue with the patch set.
These are two separate issues. I do not see a reason to try and make them
one, and use this to drive the "one-device to rule them all" idea. This
series converts the write() to ioctl() and fixes the problem we set to, as
promised. You don't like the API, that's fine. We'll discuss that on
linux-rdma, but no reason to hold this patch set while that happens.
>The original request came after this driver was moved from staging to
>RDMA stack, since the driver is still in staging, there is no need to
>hurry up now.
There is no need to keep the driver in staging. This is not a driver that
has style problems or is not well tested. It is a driver that has been
heavily tested, performs well and has completed its staging TODO list. We
went ahead and added this write()/writev() fix before making the move
because Al and Linus wanted that issue addressed. For the record:
$ cat drivers/staging/rdma/hfi1/TODO
July, 2015
- Remove unneeded file entries in sysfs
- Remove software processing of IB protocol and place in library for use
by qib, ipath (if still present), hfi1, and eventually soft-roce
Both of those items are complete. The API issue was raised back when the
driver was submitted (almost a year ago), as you can see it did not make the
cut as a staging requirement. Whether you agree with the maintainer's
decision or not. I don't see how it's fair to try and add it again now.
As I mentioned let's discuss the uAPI stuff on linux-rdma. Have the web
meetings that you were mentioning and do whatever we need to in order to
improve the sub-system, but stop trying to tie our driver and moving out of
staging to this much larger issue.
Thanks
-Denny
On Sat, Apr 16, 2016 at 09:00:42AM +0300, Leon Romanovsky wrote:
> On Fri, Apr 15, 2016 at 05:37:32PM -0600, Jason Gunthorpe wrote:
> > On Sat, Apr 16, 2016 at 12:23:28AM +0300, Leon Romanovsky wrote:
> >
> > > Intel as usual decided to do it in their way and the result is presented
> > > on this mailing list.
> >
> > Dennis was pretty clear he was going to send the patches to address
> > Al's concern, which he has done.
> >
> > I was also pretty clear I was looking to get rid of the char dev :)
>
> Yes, and I was pretty clear that we need to converge on one common API
> prior to converting old code (including drivers in staging) in order to
> do it once only.
While we are at it, could the person who'd come up with ui_lseek() be located
and made to stand up and explain the rationale behind the SEEK_END semantics
therein? To quote the manpage (and paraphrase just about any introductory
textbook):
SEEK_END
The file offset is set to the size of the file plus offset bytes.
I'm really curious - which part of "plus" might have lead to
case SEEK_END:
offset = ((dd->kregend - dd->kregbase) + DC8051_DATA_MEM_SIZE) -
offset;
and, if its author has decided that of course it _must_ have meant "minus",
why had he or she failed to post a correction to the manpage? Or, on the
off-chance that this "plus" might have something to do with reality,
experimented with some file, for that matter.
Folks, this is a well-earned "F". And not just for Unix Programming 101 -
the same semantics applies to fseek(3), which is a part of C standard.
Incidentally, lseek(fd, 0, SEEK_END) is "seek to end", not "fail with EINVAL".
As for the use of ioctl... Frankly, considering the above, it does sound like
"that'll make them STFU about the weirdness - ioctl *is* weird, so there!"
Single-consumer APIs stink, film at 11...
On Sat, Apr 16, 2016 at 08:19:17PM +0100, Al Viro wrote:
>While we are at it, could the person who'd come up with ui_lseek() be located
>and made to stand up and explain the rationale behind the SEEK_END semantics
>therein? To quote the manpage (and paraphrase just about any introductory
>textbook):
> SEEK_END
> The file offset is set to the size of the file plus offset bytes.
>
>I'm really curious - which part of "plus" might have lead to
> case SEEK_END:
> offset = ((dd->kregend - dd->kregbase) + DC8051_DATA_MEM_SIZE) -
> offset;
>and, if its author has decided that of course it _must_ have meant "minus",
>why had he or she failed to post a correction to the manpage? Or, on the
Original author of that code confirmed it is just a coding mistake and we
will fix it.
-Denny
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > This patch series removes the write() interface for user access in favor of an
> > ioctl() based approach. This is in response to the complaint that we had
> > different handlers for write() and writev() doing different things and expecting
> > different types of data. See:
>
> I think we should wait on applying these patches until we globally sort out
> what to do with the rdma uapi.
>
> It just doesn't make alot of sense for drivers to have their own personal
> char devices. :(
I looked through the patches I tend to disagree - while we should wait
for a global UAPI for anything that's actually RDMA/verbs related these
seem to be misc little bits specific to the driver that have no business
in any sort of generic RDMA API.
> A second char dev for the eeprom? How is that OK? Why aren't you using
> the I2C layer for this?
... but this is a really good question, although the right layer to
plug this in would be the eeprom code in drivers/nvmem/
On Mon, Apr 18, 2016 at 06:09:09AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 14, 2016 at 10:45:50AM -0600, Jason Gunthorpe wrote:
> > On Thu, Apr 14, 2016 at 08:41:35AM -0700, Dennis Dalessandro wrote:
> > > This patch series removes the write() interface for user access in favor of an
> > > ioctl() based approach. This is in response to the complaint that we had
> > > different handlers for write() and writev() doing different things and expecting
> > > different types of data. See:
> >
> > I think we should wait on applying these patches until we globally sort out
> > what to do with the rdma uapi.
> >
> > It just doesn't make alot of sense for drivers to have their own personal
> > char devices. :(
>
> I looked through the patches I tend to disagree - while we should wait
> for a global UAPI for anything that's actually RDMA/verbs related these
> seem to be misc little bits specific to the driver that have no business
> in any sort of generic RDMA API.
I wasn't arguing this should integrate into verbs in some way, only
that the way to access the driver-specific uAPI of a RDMA device should
be through the RDMA common uAPI and not through a random char dev.
.. and of course that the driver-specific API be subject to a sane
review and use of the normal standards, not just written off as
driver-garbage nobody cares about. :(
For instance, if we had a driver specific channel, it casts this
endless stream of uAPI verbs patches in a different light: maybe they
should go down the driver-specific channel instead.
Jason
On Mon, Apr 18, 2016 at 11:40:47AM -0600, Jason Gunthorpe wrote:
> I wasn't arguing this should integrate into verbs in some way, only
> that the way to access the driver-specific uAPI of a RDMA device should
> be through the RDMA common uAPI and not through a random char dev.
Well, it's stuff not related to our RDMA userspace API (which _is_
Verbs, not counting for the complete crackpot abuse in usnic), but
very device specific.
The stuff the intel driver are doing isn't pretty, but unfortunately
not unusual either - lots of SCSI or network driver have ioctls
like that. Now we could argue if the ioctls should be one the
main node (uverbs) or the a driver private chardev, or not exist
at all and people will have to patch the driver with some vendor
version if they really need it. Examples for either of these
choices exist in the tree.
On Mon, Apr 18, 2016 at 11:24:11AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 11:40:47AM -0600, Jason Gunthorpe wrote:
> > I wasn't arguing this should integrate into verbs in some way, only
> > that the way to access the driver-specific uAPI of a RDMA device should
> > be through the RDMA common uAPI and not through a random char dev.
>
> Well, it's stuff not related to our RDMA userspace API (which _is_
> Verbs, not counting for the complete crackpot abuse in usnic), but
> very device specific.
>
> The stuff the intel driver are doing isn't pretty, but unfortunately
> not unusual either - lots of SCSI or network driver have ioctls
> like that. Now we could argue if the ioctls should be one the
> main node (uverbs) or the a driver private chardev, or not exist
> at all and people will have to patch the driver with some vendor
> version if they really need it. Examples for either of these
> choices exist in the tree.
I'm a bit confused by what you are suggesting that "people will have to patch
the driver with some vendor version if they really need it."?
Could you elaborate?
PSM is the primary performant path for this device. Without it this device is
severely limited in its intended functionality.
We are strongly motivated to have all of our functionality included in the
mainstream kernel. So for eprom/snoop we would really like to find a way to
include all this functionality.
Ira
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 18, 2016 at 11:24:11AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 18, 2016 at 11:40:47AM -0600, Jason Gunthorpe wrote:
> > I wasn't arguing this should integrate into verbs in some way, only
> > that the way to access the driver-specific uAPI of a RDMA device should
> > be through the RDMA common uAPI and not through a random char dev.
>
> Well, it's stuff not related to our RDMA userspace API (which _is_
> Verbs, not counting for the complete crackpot abuse in usnic), but
> very device specific.
It is weakly related, it uses the same device discovery and security
model.
> The stuff the intel driver are doing isn't pretty, but unfortunately
> not unusual either - lots of SCSI or network driver have ioctls
> like that. Now we could argue if the ioctls should be one the
> main node (uverbs) or the a driver private chardev, or not exist
> at all and people will have to patch the driver with some vendor
> version if they really need it. Examples for either of these
> choices exist in the tree.
Right - and the RDMA uAPI has always had an integrated driver-bypass
channel as part of the verb uAPI calls, extending that to allow for
new-driver-specific calls seems very natural.
Jason
On Mon, Apr 18, 2016 at 11:45:49PM -0400, Ira Weiny wrote:
> I'm a bit confused by what you are suggesting that "people will have to patch
> the driver with some vendor version if they really need it."?
>
> Could you elaborate?
There are lots of drivers where we simply did not accept these vendor
specific extensions at all. Especially for networking drivers it's
pretty common. I'm not proposing this here, just saying that we have
lots of examples for it.
On Thu, Apr 14, 2016 at 01:52:44PM -0400, Dennis Dalessandro wrote:
> The eprom device is for low level programming of the eprom on the
> chip. We do not use i2c for this because the eprom is directly
> attached to the chip and not accessible via i2c, requires register
> access.
Okay, but the twsi.c is still not acceptable in a driver, use the i2c
subsystem to talk to the qsfps. Open coding i2c is not OK.
The char dev(s) are also done wrong, there is no set to 'kobj.parent'
and there are empty release functions. There are certainly
use-after-free bugs between close and device removal, someone needs to
audit all of this carefully.
The patch forgets compat_ioctl.
Stuff like this should be a build bug:
/* NOTE: assumes unsigned long is 8 bytes */
The 'goto on success' in hfi1_cdev_init is an anti-pattern, don't do it.
Even if the char dev stays, creating two whole sysfs classes seems
really unnecessary. Surely there is a better place to attach the cdev.
And this is just outrageous:
if (atomic_inc_return(&user_count) == 1) {
ret = hfi1_cdev_init(0, class_name(), &hfi1_file_ops,
&wildcard_cdev, &wildcard_device,
true);
if (ret)
I can see an argument for a per-device char dev (as Christoph says,
that is not entirely uncommon) but this is a multi-device char dev????
It is not OK to create your own private subsystem in a driver! I count
*three* char devs before this series!?!?! One of them marked 0666 even!
I stopped looking at the code.
Jason
On Wed, Apr 20, 2016 at 02:36:16PM -0600, Jason Gunthorpe wrote:
>On Thu, Apr 14, 2016 at 01:52:44PM -0400, Dennis Dalessandro wrote:
>> The eprom device is for low level programming of the eprom on the
>> chip. We do not use i2c for this because the eprom is directly
>> attached to the chip and not accessible via i2c, requires register
>> access.
>
>Okay, but the twsi.c is still not acceptable in a driver, use the i2c
>subsystem to talk to the qsfps. Open coding i2c is not OK.
Yeah, I mentioned this at OFA. We are looking at reworking this.
>The char dev(s) are also done wrong, there is no set to 'kobj.parent'
>and there are empty release functions. There are certainly
>use-after-free bugs between close and device removal, someone needs to
>audit all of this carefully.
>
>The patch forgets compat_ioctl.
We will take a look at this.
>Stuff like this should be a build bug:
> /* NOTE: assumes unsigned long is 8 bytes */
Our Kconfig depends on X86_64. Should we add a BUILD_BUG_ON or something?
>The 'goto on success' in hfi1_cdev_init is an anti-pattern, don't do it.
Can fix.
>Even if the char dev stays, creating two whole sysfs classes seems
>really unnecessary. Surely there is a better place to attach the cdev.
>
>And this is just outrageous:
>
> if (atomic_inc_return(&user_count) == 1) {
> ret = hfi1_cdev_init(0, class_name(), &hfi1_file_ops,
> &wildcard_cdev, &wildcard_device,
> true);
> if (ret)
>
>I can see an argument for a per-device char dev (as Christoph says,
>that is not entirely uncommon) but this is a multi-device char dev????
We are discussing what can be done about this internally. On that note we
are also looking at what we can do about the twsi as mentioned above, also
the eprom, ui, and snoop. We'll send the actual plan for each issue to
linux-rdma for feedback soon.
We are certainly not opposed to improving these areas of the code. So long
as we can maintain the same functionality, which seems doable. It's just
that no one has spoken up about these things in the 9+ months since the
driver was added. We need a bit of time to get it all sorted out.
-Denny
On Fri, Apr 22, 2016 at 02:38:45PM -0400, Dennis Dalessandro wrote:
> >Stuff like this should be a build bug:
> >/* NOTE: assumes unsigned long is 8 bytes */
>
> Our Kconfig depends on X86_64. Should we add a BUILD_BUG_ON or something?
Yes
> as we can maintain the same functionality, which seems doable. It's just
> that no one has spoken up about these things in the 9+ months since the
> driver was added. We need a bit of time to get it all sorted out.
Unfortunately this is a bit of a flaw with the staging process, really
it is not designed to handle uAPI work.
Jason