Hyper-V is adding multiple low speed "specialty" synthetic devices.
Instead of writing a new kernel-level VMBus driver for each device,
make the devices accessible to user space through a UIO-based
hv_vmbus_client driver. Each device can then be supported by a user
space driver. This approach shortens development and gives user space
applications control over the key interactions with the VMBus ring
buffer.
The new synthetic devices are low speed devices that don't support
VMBus monitor bits, and so they must use vmbus_setevent() to notify
the host of ring buffer updates. The new driver provides this
functionality along with a configurable ring buffer size.
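As a rough illustration of the notification path (a sketch, not code
from this series): the UIO core passes a 4-byte write on the UIO
character device to the driver's irqcontrol callback, and this driver
calls vmbus_setevent() when the written value is 1. The helper name
uio_notify_host is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the UIO driver to signal the host.
 * The UIO core forwards a 4-byte write to the driver's irqcontrol
 * callback; uio_hv_vmbus_client calls vmbus_setevent() for value 1.
 */
static int uio_notify_host(int uio_fd)
{
	int32_t irq_state = 1;
	ssize_t n = write(uio_fd, &irq_state, sizeof(irq_state));

	if (n < 0)
		return -errno;
	return n == (ssize_t)sizeof(irq_state) ? 0 : -EIO;
}
```

A user space driver would typically call such a helper after placing a
packet in the outbound ring, using a descriptor opened read/write on
its /dev/uioX node.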
Moreover, this patch series adds a new implementation of the fcopy
application that uses the new UIO driver. The older fcopy driver and
application will be phased out gradually. Development of other similar
userspace drivers is still underway.
[V4]
- Modify comment to remove RingDataStartOffset mention
- Change downward ALIGN macro to upward ALIGN
- Add error check for setting ring_size value in sysfs entry
- Add error handling in fcopy_get_instance_id for instance id not found case
[V3]
- Removed ringbuffer sysfs entry and used uio framework for mmap
- Remove ".id_table = NULL"
- kasprintf -> devm_kasprintf
- Change global variable ring_size to per device
- More checks on value which can be set for ring_size
- Remove driverctl, and used echo command instead for driver documentation
- Remove unnecessary one time use macros
- Change kernel version and date for sysfs documentation
- Update documentation.
- Made ring buffer data offset depend on page size
- remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
- Added legal counsel sign-off
- simplify mmap
- Removed "Link:" tag
- Improve cover letter and commit messages
- Improve debug prints
- Instead of hardcoded instance id, query from class id sysfs
- Set the ring_size value from application
- new application compilation dependent on x86
- Not removing the old driver and application for backward compatibility
[V2]
- Update driver info in Documentation/driver-api/uio-howto.rst
- Update ring_size sysfs info in Documentation/ABI/stable/sysfs-bus-vmbus
- Remove DRIVER_VERSION
- Remove refcnt
- scnprintf -> sysfs_emit
- sysfs_create_file -> ATTRIBUTE_GROUPS + ".driver.groups";
- sysfs_create_bin_file -> device_create_bin_file
- dev_notice -> dev_err
- remove MODULE_VERSION
- simpler sysfs path, less parsing
Saurabh Sengar (3):
uio: Add hv_vmbus_client driver
tools: hv: Add vmbus_bufring
tools: hv: Add new fcopy application based on uio driver
Documentation/ABI/stable/sysfs-bus-vmbus | 10 +
Documentation/driver-api/uio-howto.rst | 54 +++
drivers/uio/Kconfig | 12 +
drivers/uio/Makefile | 1 +
drivers/uio/uio_hv_vmbus_client.c | 218 +++++++++
tools/hv/Build | 2 +
tools/hv/Makefile | 21 +-
tools/hv/hv_fcopy_uio_daemon.c | 587 +++++++++++++++++++++++
tools/hv/vmbus_bufring.c | 297 ++++++++++++
tools/hv/vmbus_bufring.h | 154 ++++++
10 files changed, 1355 insertions(+), 1 deletion(-)
create mode 100644 drivers/uio/uio_hv_vmbus_client.c
create mode 100644 tools/hv/hv_fcopy_uio_daemon.c
create mode 100644 tools/hv/vmbus_bufring.c
create mode 100644 tools/hv/vmbus_bufring.h
--
2.34.1
Provide a userspace interface for userspace drivers or applications to
read/write a VMBus ringbuffer. A significant part of this code is
borrowed from DPDK[1]. The library currently supports only the x86
architecture.
To build this library:
make -C tools/hv libvmbus_bufring.a
Applications using this library can include the vmbus_bufring.h header
file and link statically against libvmbus_bufring.a.
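The index arithmetic the library relies on can be sketched on plain
integers. The functions below mirror vmbus_br_idxinc(),
vmbus_br_availwrite() and vmbus_br_availread() from this patch, but the
ring_* names and the flattened signatures are illustrative only.

```c
#include <stdint.h>

/* Advance a ring index by inc bytes, wrapping at the data size sz. */
static uint32_t ring_idxinc(uint32_t idx, uint32_t inc, uint32_t sz)
{
	idx += inc;
	if (idx >= sz)
		idx -= sz;
	return idx;
}

/* Bytes free for the writer, given the write and read indexes. */
static uint32_t ring_availwrite(uint32_t windex, uint32_t rindex,
				uint32_t dsize)
{
	if (windex >= rindex)
		return dsize - (windex - rindex);
	return rindex - windex;
}

/* Bytes pending for the reader. */
static uint32_t ring_availread(uint32_t windex, uint32_t rindex,
			       uint32_t dsize)
{
	return dsize - ring_availwrite(windex, rindex, dsize);
}
```

Equal indexes mean an empty ring. The writer in this patch refuses a
packet unless the free space is strictly larger than the packet, so the
write index never catches up with the read index.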
[1] https://github.com/DPDK/dpdk/
Signed-off-by: Mary Hardy <[email protected]>
Signed-off-by: Saurabh Sengar <[email protected]>
---
[V4]
- Modify comment to remove RingDataStartOffset mention
- Change downward ALIGN macro to upward ALIGN
[V3]
- Made ring buffer data offset depend on page size
- remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
- Added legal counsel sign-off
- Removed "Link:" tag
- Improve commit messages
- new library compilation dependent on x86
- simplify mmap
[V2]
- simpler sysfs path, less parsing
tools/hv/Build | 1 +
tools/hv/Makefile | 13 +-
tools/hv/vmbus_bufring.c | 297 +++++++++++++++++++++++++++++++++++++++
tools/hv/vmbus_bufring.h | 154 ++++++++++++++++++++
4 files changed, 464 insertions(+), 1 deletion(-)
create mode 100644 tools/hv/vmbus_bufring.c
create mode 100644 tools/hv/vmbus_bufring.h
diff --git a/tools/hv/Build b/tools/hv/Build
index 6cf51fa4b306..2a667d3d94cb 100644
--- a/tools/hv/Build
+++ b/tools/hv/Build
@@ -1,3 +1,4 @@
hv_kvp_daemon-y += hv_kvp_daemon.o
hv_vss_daemon-y += hv_vss_daemon.o
hv_fcopy_daemon-y += hv_fcopy_daemon.o
+vmbus_bufring-y += vmbus_bufring.o
diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index fe770e679ae8..33cf488fd20f 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -11,14 +11,19 @@ srctree := $(patsubst %/,%,$(dir $(CURDIR)))
srctree := $(patsubst %/,%,$(dir $(srctree)))
endif
+include $(srctree)/tools/scripts/Makefile.arch
+
# Do not use make's built-in rules
# (this improves performance and avoids hard-to-debug behaviour);
MAKEFLAGS += -r
override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
+ifeq ($(SRCARCH),x86)
+ALL_LIBS := libvmbus_bufring.a
+endif
ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
-ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
+ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst %,$(OUTPUT)%,$(ALL_LIBS))
ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
@@ -27,6 +32,12 @@ all: $(ALL_PROGRAMS)
export srctree OUTPUT CC LD CFLAGS
include $(srctree)/tools/build/Makefile.include
+HV_VMBUS_BUFRING_IN := $(OUTPUT)vmbus_bufring.o
+$(HV_VMBUS_BUFRING_IN): FORCE
+ $(Q)$(MAKE) $(build)=vmbus_bufring
+$(OUTPUT)libvmbus_bufring.a : vmbus_bufring.o
+ $(AR) rcs $@ $^
+
HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
$(HV_KVP_DAEMON_IN): FORCE
$(Q)$(MAKE) $(build)=hv_kvp_daemon
diff --git a/tools/hv/vmbus_bufring.c b/tools/hv/vmbus_bufring.c
new file mode 100644
index 000000000000..0a0686751b79
--- /dev/null
+++ b/tools/hv/vmbus_bufring.c
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2009-2012,2016,2023 Microsoft Corp.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2012 Citrix Inc.
+ * All rights reserved.
+ */
+
+#include <errno.h>
+#include <emmintrin.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/uio.h>
+#include "vmbus_bufring.h"
+
+#define rte_compiler_barrier() ({ asm volatile ("" : : : "memory"); })
+#define RINGDATA_START_OFFSET (getpagesize())
+#define VMBUS_RQST_ERROR 0xFFFFFFFFFFFFFFFF
+#define ALIGN(val, align) (((val) + ((align) - 1)) & ~((align) - 1))
+
+/* Increase bufring index by inc with wraparound */
+static inline uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz)
+{
+ idx += inc;
+ if (idx >= sz)
+ idx -= sz;
+
+ return idx;
+}
+
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen)
+{
+ br->vbr = buf;
+ br->windex = br->vbr->windex;
+ br->dsize = blen - RINGDATA_START_OFFSET;
+}
+
+static inline __always_inline void
+rte_smp_mb(void)
+{
+ asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
+}
+
+static inline int
+rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
+{
+ uint8_t res;
+
+ asm volatile("lock ; "
+ "cmpxchgl %[src], %[dst];"
+ "sete %[res];"
+ : [res] "=a" (res), /* output */
+ [dst] "=m" (*dst)
+ : [src] "r" (src), /* input */
+ "a" (exp),
+ "m" (*dst)
+ : "memory"); /* clobber list */
+ return res;
+}
+
+static inline uint32_t
+vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
+ const void *src0, uint32_t cplen)
+{
+ uint8_t *br_data = (uint8_t *)tbr->vbr + RINGDATA_START_OFFSET;
+ uint32_t br_dsize = tbr->dsize;
+ const uint8_t *src = src0;
+
+ if (cplen > br_dsize - windex) {
+ uint32_t fraglen = br_dsize - windex;
+
+ /* Wrap-around detected */
+ memcpy(br_data + windex, src, fraglen);
+ memcpy(br_data, src + fraglen, cplen - fraglen);
+ } else {
+ memcpy(br_data + windex, src, cplen);
+ }
+
+ return vmbus_br_idxinc(windex, cplen, br_dsize);
+}
+
+/*
+ * Write scattered channel packet to TX bufring.
+ *
+ * The offset of this channel packet is written as a 64bits value
+ * immediately after this channel packet.
+ *
+ * The write goes through three stages:
+ * 1. Reserve space in ring buffer for the new data.
+ * Writer atomically moves priv_write_index.
+ * 2. Copy the new data into the ring.
+ * 3. Update the tail of the ring (visible to host) that indicates
+ * next read location. Writer updates write_index
+ */
+static int
+vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
+ bool *need_sig)
+{
+ struct vmbus_bufring *vbr = tbr->vbr;
+ uint32_t ring_size = tbr->dsize;
+ uint32_t old_windex, next_windex, windex, total;
+ uint64_t save_windex;
+ int i;
+
+ total = 0;
+ for (i = 0; i < iovlen; i++)
+ total += iov[i].iov_len;
+ total += sizeof(save_windex);
+
+ /* Reserve space in ring */
+ do {
+ uint32_t avail;
+
+ /* Get current free location */
+ old_windex = tbr->windex;
+
+ /* Prevent compiler reordering this with calculation */
+ rte_compiler_barrier();
+
+ avail = vmbus_br_availwrite(tbr, old_windex);
+
+ /* If not enough space in ring, then tell caller. */
+ if (avail <= total)
+ return -EAGAIN;
+
+ next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
+
+ /* Atomic update of next write_index for other threads */
+ } while (!rte_atomic32_cmpset(&tbr->windex, old_windex, next_windex));
+
+ /* Space from old..new is now reserved */
+ windex = old_windex;
+ for (i = 0; i < iovlen; i++)
+ windex = vmbus_txbr_copyto(tbr, windex, iov[i].iov_base, iov[i].iov_len);
+
+ /* Set the offset of the current channel packet. */
+ save_windex = ((uint64_t)old_windex) << 32;
+ windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
+ sizeof(save_windex));
+
+ /* The region reserved should match region used */
+ if (windex != next_windex)
+ return -EINVAL;
+
+ /* Ensure that data is available before updating host index */
+ rte_compiler_barrier();
+
+ /* Checkin for our reservation. wait for our turn to update host */
+ while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
+ _mm_pause();
+
+ return 0;
+}
+
+int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
+ uint32_t dlen, uint32_t flags)
+{
+ struct vmbus_chanpkt pkt;
+ unsigned int pktlen, pad_pktlen;
+ const uint32_t hlen = sizeof(pkt);
+ bool send_evt = false;
+ uint64_t pad = 0;
+ struct iovec iov[3];
+ int error;
+
+ pktlen = hlen + dlen;
+ pad_pktlen = ALIGN(pktlen, sizeof(uint64_t));
+
+ pkt.hdr.type = type;
+ pkt.hdr.flags = flags;
+ pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+ pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+ pkt.hdr.xactid = VMBUS_RQST_ERROR; /* doesn't support multiple requests at same time */
+
+ iov[0].iov_base = &pkt;
+ iov[0].iov_len = hlen;
+ iov[1].iov_base = data;
+ iov[1].iov_len = dlen;
+ iov[2].iov_base = &pad;
+ iov[2].iov_len = pad_pktlen - pktlen;
+
+ error = vmbus_txbr_write(txbr, iov, 3, &send_evt);
+
+ return error;
+}
+
+static inline uint32_t
+vmbus_rxbr_copyfrom(const struct vmbus_br *rbr, uint32_t rindex,
+ void *dst0, size_t cplen)
+{
+ const uint8_t *br_data = (uint8_t *)rbr->vbr + RINGDATA_START_OFFSET;
+ uint32_t br_dsize = rbr->dsize;
+ uint8_t *dst = dst0;
+
+ if (cplen > br_dsize - rindex) {
+ uint32_t fraglen = br_dsize - rindex;
+
+ /* Wrap-around detected. */
+ memcpy(dst, br_data + rindex, fraglen);
+ memcpy(dst + fraglen, br_data, cplen - fraglen);
+ } else {
+ memcpy(dst, br_data + rindex, cplen);
+ }
+
+ return vmbus_br_idxinc(rindex, cplen, br_dsize);
+}
+
+/* Copy data from receive ring but don't change index */
+static int
+vmbus_rxbr_peek(const struct vmbus_br *rbr, void *data, size_t dlen)
+{
+ uint32_t avail;
+
+ /*
+ * The requested data and the 64bits channel packet
+ * offset should be there at least.
+ */
+ avail = vmbus_br_availread(rbr);
+ if (avail < dlen + sizeof(uint64_t))
+ return -EAGAIN;
+
+ vmbus_rxbr_copyfrom(rbr, rbr->vbr->rindex, data, dlen);
+ return 0;
+}
+
+/*
+ * Copy data from receive ring and change index
+ * NOTE:
+ * We assume (dlen + skip) == sizeof(channel packet).
+ */
+static int
+vmbus_rxbr_read(struct vmbus_br *rbr, void *data, size_t dlen, size_t skip)
+{
+ struct vmbus_bufring *vbr = rbr->vbr;
+ uint32_t br_dsize = rbr->dsize;
+ uint32_t rindex;
+
+ if (vmbus_br_availread(rbr) < dlen + skip + sizeof(uint64_t))
+ return -EAGAIN;
+
+ /* Record where host was when we started read (for debug) */
+ rbr->windex = rbr->vbr->windex;
+
+ /*
+ * Copy channel packet from RX bufring.
+ */
+ rindex = vmbus_br_idxinc(rbr->vbr->rindex, skip, br_dsize);
+ rindex = vmbus_rxbr_copyfrom(rbr, rindex, data, dlen);
+
+ /*
+ * Discard this channel packet's 64bits offset, which is useless to us.
+ */
+ rindex = vmbus_br_idxinc(rindex, sizeof(uint64_t), br_dsize);
+
+ /* Update the read index _after_ the channel packet is fetched. */
+ rte_compiler_barrier();
+
+ vbr->rindex = rindex;
+
+ return 0;
+}
+
+int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr,
+ void *data, uint32_t *len)
+{
+ struct vmbus_chanpkt_hdr pkt;
+ uint32_t dlen, bufferlen = *len;
+ int error;
+
+ error = vmbus_rxbr_peek(rxbr, &pkt, sizeof(pkt));
+ if (error)
+ return error;
+
+ if (unlikely(pkt.hlen < VMBUS_CHANPKT_HLEN_MIN))
+ /* XXX this channel is dead actually. */
+ return -EIO;
+
+ if (unlikely(pkt.hlen > pkt.tlen))
+ return -EIO;
+
+ /* Lengths are in quad words */
+ dlen = pkt.tlen << VMBUS_CHANPKT_SIZE_SHIFT;
+ *len = dlen;
+
+ /* If caller buffer is not large enough */
+ if (unlikely(dlen > bufferlen))
+ return -ENOBUFS;
+
+ /* Read data and skip packet header */
+ error = vmbus_rxbr_read(rxbr, data, dlen, 0);
+ if (error)
+ return error;
+
+ /* Return the number of bytes read */
+ return dlen + sizeof(uint64_t);
+}
diff --git a/tools/hv/vmbus_bufring.h b/tools/hv/vmbus_bufring.h
new file mode 100644
index 000000000000..5590df90e971
--- /dev/null
+++ b/tools/hv/vmbus_bufring.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+#ifndef _VMBUS_BUF_H_
+#define _VMBUS_BUF_H_
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#define __packed __attribute__((__packed__))
+#define unlikely(x) __builtin_expect(!!(x), 0)
+
+#define ICMSGHDRFLAG_TRANSACTION 1
+#define ICMSGHDRFLAG_REQUEST 2
+#define ICMSGHDRFLAG_RESPONSE 4
+
+#define IC_VERSION_NEGOTIATION_MAX_VER_COUNT 100
+#define ICMSG_HDR (sizeof(struct vmbuspipe_hdr) + sizeof(struct icmsg_hdr))
+#define ICMSG_NEGOTIATE_PKT_SIZE(icframe_vercnt, icmsg_vercnt) \
+ (ICMSG_HDR + sizeof(struct icmsg_negotiate) + \
+ (((icframe_vercnt) + (icmsg_vercnt)) * sizeof(struct ic_version)))
+
+/*
+ * Channel packets
+ */
+
+/* Channel packet types */
+#define VMBUS_CHANPKT_TYPE_INBAND 0x0006
+#define VMBUS_CHANPKT_TYPE_RXBUF 0x0007
+#define VMBUS_CHANPKT_TYPE_GPA 0x0009
+#define VMBUS_CHANPKT_TYPE_COMP 0x000b
+
+#define VMBUS_CHANPKT_FLAG_NONE 0
+#define VMBUS_CHANPKT_FLAG_RC 0x0001 /* report completion */
+
+#define VMBUS_CHANPKT_SIZE_SHIFT 3
+#define VMBUS_CHANPKT_SIZE_ALIGN (1U << VMBUS_CHANPKT_SIZE_SHIFT)
+#define VMBUS_CHANPKT_HLEN_MIN \
+ (sizeof(struct vmbus_chanpkt_hdr) >> VMBUS_CHANPKT_SIZE_SHIFT)
+
+/*
+ * Buffer ring
+ */
+struct vmbus_bufring {
+ volatile uint32_t windex;
+ volatile uint32_t rindex;
+
+ /*
+ * Interrupt mask {0,1}
+ *
+ * For TX bufring, the host sets this to 1 while it is processing
+ * the TX bufring, so that we can safely skip the TX event
+ * notification to host.
+ *
+ * For RX bufring, once this is set to 1 by us, host will not
+ * further dispatch interrupts to us, even if there are data
+ * pending on the RX bufring. This effectively disables the
+ * interrupt of the channel to which this RX bufring is attached.
+ */
+ volatile uint32_t imask;
+
+ /*
+ * Win8 uses some of the reserved bits to implement
+ * interrupt driven flow management. On the send side
+ * we can request that the receiver interrupt the sender
+ * when the ring transitions from being full to being able
+ * to handle a message of size "pending_send_sz".
+ *
+ * Add necessary state for this enhancement.
+ */
+ volatile uint32_t pending_send;
+ uint32_t reserved1[12];
+
+ union {
+ struct {
+ uint32_t feat_pending_send_sz:1;
+ };
+ uint32_t value;
+ } feature_bits;
+
+ /*
+ * Ring data starts after PAGE_SIZE offset (RINGDATA_START_OFFSET).
+ * !!! DO NOT place any fields below this !!!
+ */
+ uint8_t data[];
+} __packed;
+
+struct vmbus_br {
+ struct vmbus_bufring *vbr;
+ uint32_t dsize;
+ uint32_t windex; /* next available location */
+};
+
+struct vmbus_chanpkt_hdr {
+ uint16_t type; /* VMBUS_CHANPKT_TYPE_ */
+ uint16_t hlen; /* header len, in 8 bytes */
+ uint16_t tlen; /* total len, in 8 bytes */
+ uint16_t flags; /* VMBUS_CHANPKT_FLAG_ */
+ uint64_t xactid;
+} __packed;
+
+struct vmbus_chanpkt {
+ struct vmbus_chanpkt_hdr hdr;
+} __packed;
+
+struct vmbuspipe_hdr {
+ unsigned int flags;
+ unsigned int msgsize;
+} __packed;
+
+struct ic_version {
+ unsigned short major;
+ unsigned short minor;
+} __packed;
+
+struct icmsg_negotiate {
+ unsigned short icframe_vercnt;
+ unsigned short icmsg_vercnt;
+ unsigned int reserved;
+ struct ic_version icversion_data[]; /* any size array */
+} __packed;
+
+struct icmsg_hdr {
+ struct ic_version icverframe;
+ unsigned short icmsgtype;
+ struct ic_version icvermsg;
+ unsigned short icmsgsize;
+ unsigned int status;
+ unsigned char ictransaction_id;
+ unsigned char icflags;
+ unsigned char reserved[2];
+} __packed;
+
+int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr, void *data, uint32_t *len);
+int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
+ uint32_t dlen, uint32_t flags);
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen);
+
+/* Amount of space available for write */
+static inline uint32_t vmbus_br_availwrite(const struct vmbus_br *br, uint32_t windex)
+{
+ uint32_t rindex = br->vbr->rindex;
+
+ if (windex >= rindex)
+ return br->dsize - (windex - rindex);
+ else
+ return rindex - windex;
+}
+
+static inline uint32_t vmbus_br_availread(const struct vmbus_br *br)
+{
+ return br->dsize - vmbus_br_availwrite(br, br->vbr->windex);
+}
+
+#endif /* !_VMBUS_BUF_H_ */
--
2.34.1
Add a new UIO-based driver that generically supports low speed Hyper-V
VMBus devices. This driver can be bound to VMBus devices by user space
drivers that provide device-specific management.
The new driver provides the following core functionality, which is
suitable for low speed devices:
* A single VMBus channel for each device
* Ability to specify the VMBus channel ring buffer size for each device
* Host notification via a hypercall instead of monitor bits
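A minimal user space sketch of the per-device ring size setting,
assuming the rule the driver's ring_size_store() applies (at least two
pages and a multiple of the page size). The helper names and the idea
of passing the sysfs attribute path as a parameter are illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Same rule the driver's ring_size_store() applies to a written value. */
static int ring_size_valid(unsigned long val, unsigned long page_size)
{
	return val >= 2 * page_size && val % page_size == 0;
}

/*
 * Hypothetical helper: write ring_size to the device's sysfs attribute,
 * e.g. /sys/bus/vmbus/devices/<UUID>/ring_size, before opening the
 * UIO device node.
 */
static int set_ring_size(const char *attr_path, unsigned long val)
{
	FILE *f;
	int ret = 0;

	if (!ring_size_valid(val, (unsigned long)getpagesize()))
		return -EINVAL;

	f = fopen(attr_path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%lu\n", val) < 0)
		ret = -EIO;
	/* sysfs store errors are reported when the buffer is flushed */
	if (fclose(f) != 0 && !ret)
		ret = -errno;
	return ret;
}
```

Checking the value in user space first gives a clearer error than the
bare -EINVAL the kernel returns for a rejected write.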
Signed-off-by: Saurabh Sengar <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
---
[V4]
- Added Reviewed-by
[V3]
- Removed ringbuffer sysfs entry and used uio framework for mmap
- Remove ".id_table = NULL"
- kasprintf -> devm_kasprintf
- Change global variable ring_size to per device
- More checks on value which can be set for ring_size
- Remove driverctl, and used echo command instead for driver documentation
- Remove unnecessary one time use macros
- Change kernel version and date for sysfs documentation
- Update documentation
- Better commit message
[V2]
- Update driver info in Documentation/driver-api/uio-howto.rst
- Update ring_size sysfs info in Documentation/ABI/stable/sysfs-bus-vmbus
- Remove DRIVER_VERSION
- Remove refcnt
- scnprintf -> sysfs_emit
- sysfs_create_file -> ATTRIBUTE_GROUPS + ".driver.groups";
- sysfs_create_bin_file -> device_create_bin_file
- dev_notice -> dev_err
- remove MODULE_VERSION
Documentation/ABI/stable/sysfs-bus-vmbus | 10 ++
Documentation/driver-api/uio-howto.rst | 54 ++++++
drivers/uio/Kconfig | 12 ++
drivers/uio/Makefile | 1 +
drivers/uio/uio_hv_vmbus_client.c | 218 +++++++++++++++++++++++
5 files changed, 295 insertions(+)
create mode 100644 drivers/uio/uio_hv_vmbus_client.c
diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
index 3066feae1d8d..7e77eda77be3 100644
--- a/Documentation/ABI/stable/sysfs-bus-vmbus
+++ b/Documentation/ABI/stable/sysfs-bus-vmbus
@@ -153,6 +153,16 @@ Contact: Stephen Hemminger <[email protected]>
Description: Binary file created by uio_hv_generic for ring buffer
Users: Userspace drivers
+What: /sys/bus/vmbus/devices/<UUID>/ring_size
+Date: September 2023
+KernelVersion: 6.6
+Contact: Saurabh Sengar <[email protected]>
+Description: File created by uio_hv_vmbus_client for setting the device
+ ring buffer size. The value written to the file is the total
+ memory allocated for one complete ring buffer, including the
+ ring buffer header of size PAGE_SIZE.
+Users: Userspace drivers
+
What: /sys/bus/vmbus/devices/<UUID>/channels/<N>/intr_in_full
Date: February 2019
KernelVersion: 5.0
diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst
index 907ffa3b38f5..625c2bda369f 100644
--- a/Documentation/driver-api/uio-howto.rst
+++ b/Documentation/driver-api/uio-howto.rst
@@ -722,6 +722,60 @@ For example::
/sys/bus/vmbus/devices/3811fe4d-0fa0-4b62-981a-74fc1084c757/channels/21/ring
+Generic Hyper-V driver for low speed devices
+============================================
+
+The generic driver is a kernel module named uio_hv_vmbus_client. It
+supports low speed devices on the Hyper-V VMBus, much as uio_hv_generic
+does for faster devices. It also allows a custom ring buffer size for
+each device.
+
+Making the driver recognize the device
+--------------------------------------
+
+Since the driver does not declare any device GUIDs, it will not get
+loaded automatically and will not automatically bind to any devices. You
+must load it and allocate an id to the driver yourself. For example, to use
+the fcopy device class GUID::
+
+ modprobe uio_hv_vmbus_client
+ echo "34d14be3-dee4-41c8-9ae7-6b174977c192" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/new_id
+
+If there already is a hardware specific kernel driver for the device,
+the generic driver still won't bind to it. In this case if you want to
+use the generic driver for a userspace library you'll have to manually unbind
+the hardware specific driver and bind the generic driver, using the device
+instance GUID like this::
+
+ echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/hv_utils/unbind
+ echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/bind
+
+You can verify that the device has been bound to the driver by looking
+for it in sysfs, for example like the following::
+
+ ls -l /sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/driver
+
+Which if successful should print::
+
+ .../eb765408-105f-49b6-b4aa-c123b64d17d4/driver -> ../../../bus/vmbus/drivers/uio_hv_vmbus_client
+
+Things to know about uio_hv_vmbus_client
+----------------------------------------
+
+The uio_hv_vmbus_client driver maps the Hyper-V device ring buffer to userspace
+and offers an interface to manage it.
+
+The userspace API for mapping and performing read/write operations on the device
+ring buffer is implemented in tools/hv/vmbus_bufring.c. Userspace applications
+should use this file as a library and build their logic on top of it.
+
+Additionally, the uio_hv_vmbus_client driver offers the "ring_size" sysfs entry
+for setting the device ring buffer size before opening the device.
+
+For example::
+
+ /sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/ring_size
+
Further information
===================
diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 2e16c5338e5b..bd4d27ecfc9a 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -166,6 +166,18 @@ config UIO_HV_GENERIC
If you compile this as a module, it will be called uio_hv_generic.
+config UIO_HV_SLOW_DEVICES
+ tristate "Generic driver for low speed VMBus devices"
+ depends on HYPERV
+ help
+ Generic driver that you can dynamically bind to low speed Hyper-V
+ VMBus devices to allow a user space driver to manage the device.
+ The driver provides a single VMBus channel and uses a hypercall
+ instead of monitor bits to interrupt the host. The driver provides
+ a configurable per-device ring buffer size.
+
+ If you compile this as a module, it will be called uio_hv_vmbus_client.
+
config UIO_DFL
tristate "Generic driver for DFL (Device Feature List) bus"
depends on FPGA_DFL
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index f2f416a14228..44be0f96da34 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -11,4 +11,5 @@ obj-$(CONFIG_UIO_PRUSS) += uio_pruss.o
obj-$(CONFIG_UIO_MF624) += uio_mf624.o
obj-$(CONFIG_UIO_FSL_ELBC_GPCM) += uio_fsl_elbc_gpcm.o
obj-$(CONFIG_UIO_HV_GENERIC) += uio_hv_generic.o
+obj-$(CONFIG_UIO_HV_SLOW_DEVICES) += uio_hv_vmbus_client.o
obj-$(CONFIG_UIO_DFL) += uio_dfl.o
diff --git a/drivers/uio/uio_hv_vmbus_client.c b/drivers/uio/uio_hv_vmbus_client.c
new file mode 100644
index 000000000000..68e5aec92a13
--- /dev/null
+++ b/drivers/uio/uio_hv_vmbus_client.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * uio_hv_vmbus_client - UIO driver for low speed VMBus devices
+ *
+ * Copyright (c) 2023, Microsoft Corporation.
+ *
+ * Authors:
+ * Saurabh Sengar <[email protected]>
+ *
+ * Since the driver does not declare any device ids, userspace code must
+ * allocate an id and bind the device to the driver.
+ *
+ * For example, to associate the fcopy service with this driver:
+ * # echo "34d14be3-dee4-41c8-9ae7-6b174977c192" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/new_id
+ *
+ * If there already is a hardware specific kernel driver for the device,
+ * the generic driver still won't bind to it. In this case if you want to
+ * use the generic driver for a userspace library you'll have to manually unbind
+ * the hardware specific driver and bind the generic driver, using the device
+ * instance GUID like this:
+ * # echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/hv_utils/unbind
+ * # echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/bind
+ */
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/uio_driver.h>
+#include <linux/hyperv.h>
+
+struct uio_hv_vmbus_dev {
+ struct uio_info info;
+ struct hv_device *device;
+ int ring_size;
+};
+
+/*
+ * This is the irqcontrol callback to be registered to uio_info.
+ * It can be used to disable/enable interrupt from user space processes.
+ *
+ * @param info
+ * pointer to uio_info.
+ * @param irq_state
+ * state value. 1 to enable interrupt.
+ */
+static int uio_hv_vmbus_irqcontrol(struct uio_info *info, s32 irq_state)
+{
+ struct uio_hv_vmbus_dev *pdata = info->priv;
+ struct hv_device *hv_dev = pdata->device;
+
+ /* Issue a full memory barrier before triggering the notification */
+ virt_mb();
+
+ if (irq_state == 1)
+ vmbus_setevent(hv_dev->channel);
+
+ return 0;
+}
+
+/*
+ * Callback from vmbus_event when something is in inbound ring.
+ */
+static void uio_hv_vmbus_channel_cb(void *context)
+{
+ struct uio_hv_vmbus_dev *pdata = context;
+
+ /* Issue a full memory barrier before sending the event to userspace */
+ virt_mb();
+
+ uio_event_notify(&pdata->info);
+}
+
+static int uio_hv_vmbus_open(struct uio_info *info, struct inode *inode)
+{
+ struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+ struct hv_device *hv_dev = pdata->device;
+ struct vmbus_channel *channel = hv_dev->channel;
+ void *ring_buffer;
+ int ret;
+
+ ret = vmbus_open(channel, pdata->ring_size, pdata->ring_size, NULL, 0,
+ uio_hv_vmbus_channel_cb, pdata);
+ if (ret) {
+ dev_err(&hv_dev->device, "error %d when opening the channel\n", ret);
+ return ret;
+ }
+ channel->inbound.ring_buffer->interrupt_mask = 0;
+ set_channel_read_mode(channel, HV_CALL_ISR);
+
+ /* set the mem pointer */
+ info->mem[0].name = "txrx_rings";
+ ring_buffer = page_address(channel->ringbuffer_page);
+ info->mem[0].addr = (uintptr_t)virt_to_phys(ring_buffer);
+ info->mem[0].size = channel->ringbuffer_pagecount << PAGE_SHIFT;
+ info->mem[0].memtype = UIO_MEM_IOVA;
+
+ return ret;
+}
+
+static int uio_hv_vmbus_release(struct uio_info *info, struct inode *inode)
+{
+ struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+ struct hv_device *hv_dev = pdata->device;
+
+ vmbus_close(hv_dev->channel);
+
+ /* restore the mem pointer to its original state */
+ info->mem[0].name = NULL;
+ info->mem[0].addr = 0;
+ info->mem[0].size = 1;
+ info->mem[0].memtype = UIO_MEM_NONE;
+
+ return 0;
+}
+
+static ssize_t ring_size_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct uio_info *info = dev_get_drvdata(dev);
+ struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+
+ return sysfs_emit(buf, "%d\n", pdata->ring_size);
+}
+
+static ssize_t ring_size_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ unsigned int val;
+ struct uio_info *info = dev_get_drvdata(dev);
+ struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+
+ if (kstrtouint(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ if (val < 2 * PAGE_SIZE || val % PAGE_SIZE)
+ return -EINVAL;
+
+ pdata->ring_size = val;
+
+ return count;
+}
+
+static DEVICE_ATTR_RW(ring_size);
+
+static struct attribute *uio_hv_vmbus_client_attrs[] = {
+ &dev_attr_ring_size.attr,
+ NULL,
+};
+ATTRIBUTE_GROUPS(uio_hv_vmbus_client);
+
+static int uio_hv_vmbus_probe(struct hv_device *dev, const struct hv_vmbus_device_id *dev_id)
+{
+ struct uio_hv_vmbus_dev *pdata;
+ int ret;
+ char *name = NULL;
+
+ pdata = devm_kzalloc(&dev->device, sizeof(*pdata), GFP_KERNEL);
+ if (!pdata)
+ return -ENOMEM;
+
+ name = devm_kasprintf(&dev->device, GFP_KERNEL, "%pUl", &dev->dev_instance);
+
+ /* Fill general uio info */
+ pdata->info.name = name; /* /sys/class/uio/uioX/name */
+ pdata->info.version = "1";
+ pdata->info.irqcontrol = uio_hv_vmbus_irqcontrol;
+ pdata->info.open = uio_hv_vmbus_open;
+ pdata->info.release = uio_hv_vmbus_release;
+ pdata->info.irq = UIO_IRQ_CUSTOM;
+ pdata->info.priv = pdata;
+ pdata->ring_size = VMBUS_RING_SIZE(3 * HV_HYP_PAGE_SIZE); /* Default ringbuffer size */
+ pdata->device = dev;
+
+ /* dummy value to register the mem pointers which will be updated by open */
+ pdata->info.mem[0].size = 1;
+
+ ret = uio_register_device(&dev->device, &pdata->info);
+ if (ret) {
+ dev_err(&dev->device, "uio_hv_vmbus register failed\n");
+ return ret;
+ }
+
+ hv_set_drvdata(dev, pdata);
+
+ return 0;
+}
+
+static void uio_hv_vmbus_remove(struct hv_device *dev)
+{
+ struct uio_hv_vmbus_dev *pdata = hv_get_drvdata(dev);
+
+ if (pdata)
+ uio_unregister_device(&pdata->info);
+}
+
+static struct hv_driver uio_hv_vmbus_drv = {
+ .driver.dev_groups = uio_hv_vmbus_client_groups,
+ .name = "uio_hv_vmbus_client",
+ .probe = uio_hv_vmbus_probe,
+ .remove = uio_hv_vmbus_remove,
+};
+
+static int __init uio_hv_vmbus_init(void)
+{
+ return vmbus_driver_register(&uio_hv_vmbus_drv);
+}
+
+static void __exit uio_hv_vmbus_exit(void)
+{
+ vmbus_driver_unregister(&uio_hv_vmbus_drv);
+}
+
+module_init(uio_hv_vmbus_init);
+module_exit(uio_hv_vmbus_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Saurabh Sengar <[email protected]>");
+MODULE_DESCRIPTION("Generic UIO driver for low speed VMBus devices");
--
2.34.1
Implement the file copy service for Linux guests on Hyper-V. This
permits the host to copy a file (over VMBus) into the guest. This
facility is part of "guest integration services" supported on the
Hyper-V platform.
Here is a link that provides additional details on this functionality:
http://technet.microsoft.com/en-us/library/dn464282.aspx
This new fcopy application uses the uio_hv_vmbus_client driver, which
makes the earlier hv_util based driver and application obsolete.
Signed-off-by: Saurabh Sengar <[email protected]>
---
[V4]
- Add error check for setting ring_size value in sysfs entry
- Add error handling in fcopy_get_instance_id for instance id not found case
[V3]
- Improve cover commit messages
- Improve debug prints
- Instead of hardcoded instance id, query from class id sysfs
- Set the ring_size value from application
- Update the application to mmap /dev/uio instead of sysfs
- new application compilation dependent on x86
[V2]
- simpler sysfs path
tools/hv/Build | 1 +
tools/hv/Makefile | 10 +-
tools/hv/hv_fcopy_uio_daemon.c | 587 +++++++++++++++++++++++++++++++++
3 files changed, 597 insertions(+), 1 deletion(-)
create mode 100644 tools/hv/hv_fcopy_uio_daemon.c
diff --git a/tools/hv/Build b/tools/hv/Build
index 2a667d3d94cb..efcbb74a0d23 100644
--- a/tools/hv/Build
+++ b/tools/hv/Build
@@ -2,3 +2,4 @@ hv_kvp_daemon-y += hv_kvp_daemon.o
hv_vss_daemon-y += hv_vss_daemon.o
hv_fcopy_daemon-y += hv_fcopy_daemon.o
vmbus_bufring-y += vmbus_bufring.o
+hv_fcopy_uio_daemon-y += hv_fcopy_uio_daemon.o
diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index 33cf488fd20f..678c6c450a53 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -21,8 +21,10 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
ifeq ($(SRCARCH),x86)
ALL_LIBS := libvmbus_bufring.a
-endif
+ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon hv_fcopy_uio_daemon
+else
ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
+endif
ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst %,$(OUTPUT)%,$(ALL_LIBS))
ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
@@ -56,6 +58,12 @@ $(HV_FCOPY_DAEMON_IN): FORCE
$(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+HV_FCOPY_UIO_DAEMON_IN := $(OUTPUT)hv_fcopy_uio_daemon-in.o
+$(HV_FCOPY_UIO_DAEMON_IN): FORCE
+ $(Q)$(MAKE) $(build)=hv_fcopy_uio_daemon
+$(OUTPUT)hv_fcopy_uio_daemon: $(HV_FCOPY_UIO_DAEMON_IN) libvmbus_bufring.a
+ $(QUIET_LINK)$(CC) -lm $< -L. -lvmbus_bufring -o $@
+
clean:
rm -f $(ALL_PROGRAMS)
find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
diff --git a/tools/hv/hv_fcopy_uio_daemon.c b/tools/hv/hv_fcopy_uio_daemon.c
new file mode 100644
index 000000000000..b35737082c91
--- /dev/null
+++ b/tools/hv/hv_fcopy_uio_daemon.c
@@ -0,0 +1,587 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * An implementation of host to guest copy functionality for Linux.
+ *
+ * Copyright (C) 2023, Microsoft, Inc.
+ *
+ * Author : K. Y. Srinivasan <[email protected]>
+ * Author : Saurabh Sengar <[email protected]>
+ *
+ */
+
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <locale.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syslog.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <linux/hyperv.h>
+#include "vmbus_bufring.h"
+
+#define ICMSGTYPE_NEGOTIATE 0
+#define ICMSGTYPE_FCOPY 7
+
+#define WIN8_SRV_MAJOR 1
+#define WIN8_SRV_MINOR 1
+#define WIN8_SRV_VERSION (WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
+
+#define MAX_PATH_LEN 300
+#define MAX_LINE_LEN 40
+#define DEVICES_SYSFS "/sys/bus/vmbus/devices"
+#define FCOPY_CLASS_ID "34d14be3-dee4-41c8-9ae7-6b174977c192"
+
+#define FCOPY_VER_COUNT 1
+static const int fcopy_versions[] = {
+ WIN8_SRV_VERSION
+};
+
+#define FW_VER_COUNT 1
+static const int fw_versions[] = {
+ UTIL_FW_VERSION
+};
+
+#define HV_RING_SIZE (4 * 4096)
+
+unsigned char desc[HV_RING_SIZE];
+
+static int target_fd;
+static char target_fname[PATH_MAX];
+static unsigned long long filesize;
+
+static int hv_fcopy_create_file(char *file_name, char *path_name, __u32 flags)
+{
+ int error = HV_E_FAIL;
+ char *q, *p;
+
+ filesize = 0;
+ p = (char *)path_name;
+ snprintf(target_fname, sizeof(target_fname), "%s/%s",
+ (char *)path_name, (char *)file_name);
+
+ /*
+ * Check to see if the path is already in place; if not,
+ * create if required.
+ */
+ while ((q = strchr(p, '/')) != NULL) {
+ if (q == p) {
+ p++;
+ continue;
+ }
+ *q = '\0';
+ if (access(path_name, F_OK)) {
+ if (flags & CREATE_PATH) {
+ if (mkdir(path_name, 0755)) {
+ syslog(LOG_ERR, "Failed to create %s",
+ path_name);
+ goto done;
+ }
+ } else {
+ syslog(LOG_ERR, "Invalid path: %s", path_name);
+ goto done;
+ }
+ }
+ p = q + 1;
+ *q = '/';
+ }
+
+ if (!access(target_fname, F_OK)) {
+ syslog(LOG_INFO, "File: %s exists", target_fname);
+ if (!(flags & OVER_WRITE)) {
+ error = HV_ERROR_ALREADY_EXISTS;
+ goto done;
+ }
+ }
+
+ target_fd = open(target_fname,
+ O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0744);
+ if (target_fd == -1) {
+ syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
+ goto done;
+ }
+
+ error = 0;
+done:
+ if (error)
+ target_fname[0] = '\0';
+ return error;
+}
+
+static int hv_copy_data(struct hv_do_fcopy *cpmsg)
+{
+ ssize_t bytes_written;
+ int ret = 0;
+
+ bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
+ cpmsg->offset);
+
+ filesize += cpmsg->size;
+ if (bytes_written != cpmsg->size) {
+ switch (errno) {
+ case ENOSPC:
+ ret = HV_ERROR_DISK_FULL;
+ break;
+ default:
+ ret = HV_E_FAIL;
+ break;
+ }
+ syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
+ filesize, (long)bytes_written, strerror(errno));
+ }
+
+ return ret;
+}
+
+/*
+ * Reset target_fname to "" in the two below functions for hibernation: if
+ * the fcopy operation is aborted by hibernation, the daemon should remove the
+ * partially-copied file; to achieve this, the hv_utils driver always fakes a
+ * CANCEL_FCOPY message upon suspend, and later when the VM resumes back,
+ * the daemon calls hv_copy_cancel() to remove the file; if a file is copied
+ * successfully before suspend, hv_copy_finished() must reset target_fname to
+ * avoid that the file can be incorrectly removed upon resume, since the faked
+ * CANCEL_FCOPY message is spurious in this case.
+ */
+static int hv_copy_finished(void)
+{
+ close(target_fd);
+ target_fname[0] = '\0';
+ return 0;
+}
+
+static void print_usage(char *argv[])
+{
+ fprintf(stderr, "Usage: %s [options]\n"
+ "Options are:\n"
+ " -n, --no-daemon stay in foreground, don't daemonize\n"
+ " -h, --help print this help\n", argv[0]);
+}
+
+static bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, unsigned char *buf,
+ unsigned int buflen, const int *fw_version, int fw_vercnt,
+ const int *srv_version, int srv_vercnt,
+ int *nego_fw_version, int *nego_srv_version)
+{
+ int icframe_major, icframe_minor;
+ int icmsg_major, icmsg_minor;
+ int fw_major, fw_minor;
+ int srv_major, srv_minor;
+ int i, j;
+ bool found_match = false;
+ struct icmsg_negotiate *negop;
+
+ /* Check that there's enough space for icframe_vercnt, icmsg_vercnt */
+ if (buflen < ICMSG_HDR + offsetof(struct icmsg_negotiate, reserved)) {
+ syslog(LOG_ERR, "Invalid icmsg negotiate");
+ return false;
+ }
+
+ icmsghdrp->icmsgsize = 0x10;
+ negop = (struct icmsg_negotiate *)&buf[ICMSG_HDR];
+
+ icframe_major = negop->icframe_vercnt;
+ icframe_minor = 0;
+
+ icmsg_major = negop->icmsg_vercnt;
+ icmsg_minor = 0;
+
+ /* Validate negop packet */
+ if (icframe_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
+ icmsg_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
+ ICMSG_NEGOTIATE_PKT_SIZE(icframe_major, icmsg_major) > buflen) {
+ syslog(LOG_ERR, "Invalid icmsg negotiate - icframe_major: %u, icmsg_major: %u\n",
+ icframe_major, icmsg_major);
+ goto fw_error;
+ }
+
+ /*
+ * Select the framework version number we will
+ * support.
+ */
+
+ for (i = 0; i < fw_vercnt; i++) {
+ fw_major = (fw_version[i] >> 16);
+ fw_minor = (fw_version[i] & 0xFFFF);
+
+ for (j = 0; j < negop->icframe_vercnt; j++) {
+ if (negop->icversion_data[j].major == fw_major &&
+ negop->icversion_data[j].minor == fw_minor) {
+ icframe_major = negop->icversion_data[j].major;
+ icframe_minor = negop->icversion_data[j].minor;
+ found_match = true;
+ break;
+ }
+ }
+
+ if (found_match)
+ break;
+ }
+
+ if (!found_match)
+ goto fw_error;
+
+ found_match = false;
+
+ for (i = 0; i < srv_vercnt; i++) {
+ srv_major = (srv_version[i] >> 16);
+ srv_minor = (srv_version[i] & 0xFFFF);
+
+ for (j = negop->icframe_vercnt;
+ (j < negop->icframe_vercnt + negop->icmsg_vercnt);
+ j++) {
+ if (negop->icversion_data[j].major == srv_major &&
+ negop->icversion_data[j].minor == srv_minor) {
+ icmsg_major = negop->icversion_data[j].major;
+ icmsg_minor = negop->icversion_data[j].minor;
+ found_match = true;
+ break;
+ }
+ }
+
+ if (found_match)
+ break;
+ }
+
+ /*
+ * Respond with the framework and service
+ * version numbers we can support.
+ */
+fw_error:
+ if (!found_match) {
+ negop->icframe_vercnt = 0;
+ negop->icmsg_vercnt = 0;
+ } else {
+ negop->icframe_vercnt = 1;
+ negop->icmsg_vercnt = 1;
+ }
+
+ if (nego_fw_version)
+ *nego_fw_version = (icframe_major << 16) | icframe_minor;
+
+ if (nego_srv_version)
+ *nego_srv_version = (icmsg_major << 16) | icmsg_minor;
+
+ negop->icversion_data[0].major = icframe_major;
+ negop->icversion_data[0].minor = icframe_minor;
+ negop->icversion_data[1].major = icmsg_major;
+ negop->icversion_data[1].minor = icmsg_minor;
+
+ return found_match;
+}
+
+static void wcstoutf8(char *dest, const __u16 *src, size_t dest_size)
+{
+ size_t len = 0;
+
+ while (len + 1 < dest_size && src[len]) {
+ if (src[len] < 0x80)
+ dest[len] = (char)src[len];
+ else
+ dest[len] = 'X';
+ len++;
+ }
+ dest[len] = '\0';
+}
+
+static int hv_fcopy_start(struct hv_start_fcopy *smsg_in)
+{
+ setlocale(LC_ALL, "en_US.utf8");
+ size_t file_size, path_size;
+ char *file_name, *path_name;
+ char *in_file_name = (char *)smsg_in->file_name;
+ char *in_path_name = (char *)smsg_in->path_name;
+
+ file_size = wcstombs(NULL, (const wchar_t *restrict)in_file_name, 0) + 1;
+ path_size = wcstombs(NULL, (const wchar_t *restrict)in_path_name, 0) + 1;
+
+ file_name = (char *)malloc(file_size * sizeof(char));
+ path_name = (char *)malloc(path_size * sizeof(char));
+
+ wcstoutf8(file_name, (__u16 *)in_file_name, file_size);
+ wcstoutf8(path_name, (__u16 *)in_path_name, path_size);
+
+ return hv_fcopy_create_file(file_name, path_name, smsg_in->copy_flags);
+}
+
+static int hv_fcopy_send_data(struct hv_fcopy_hdr *fcopy_msg, int recvlen)
+{
+ int operation = fcopy_msg->operation;
+
+ /*
+ * The strings sent from the host are encoded in
+ * utf16; convert it to utf8 strings.
+ * The host assures us that the utf16 strings will not exceed
+ * the max lengths specified. We will however, reserve room
+ * for the string terminating character - in the utf16s_utf8s()
+ * function we limit the size of the buffer where the converted
+ * string is placed to W_MAX_PATH -1 to guarantee
+ * that the strings can be properly terminated!
+ */
+
+ switch (operation) {
+ case START_FILE_COPY:
+ return hv_fcopy_start((struct hv_start_fcopy *)fcopy_msg);
+ case WRITE_TO_FILE:
+ return hv_copy_data((struct hv_do_fcopy *)fcopy_msg);
+ case COMPLETE_FCOPY:
+ return hv_copy_finished();
+ }
+
+ return HV_E_FAIL;
+}
+
+/* Process the packet received from the host */
+static int fcopy_pkt_process(struct vmbus_br *txbr)
+{
+ int ret, offset, pktlen;
+ int fcopy_srv_version;
+ const struct vmbus_chanpkt_hdr *pkt;
+ struct hv_fcopy_hdr *fcopy_msg;
+ struct icmsg_hdr *icmsghdr;
+
+ pkt = (const struct vmbus_chanpkt_hdr *)desc;
+ offset = pkt->hlen << 3;
+ pktlen = (pkt->tlen << 3) - offset;
+ icmsghdr = (struct icmsg_hdr *)&desc[offset + sizeof(struct vmbuspipe_hdr)];
+ icmsghdr->status = HV_E_FAIL;
+
+ if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
+ if (vmbus_prep_negotiate_resp(icmsghdr, desc + offset, pktlen, fw_versions,
+ FW_VER_COUNT, fcopy_versions, FCOPY_VER_COUNT,
+ NULL, &fcopy_srv_version)) {
+ syslog(LOG_INFO, "FCopy IC version %d.%d",
+ fcopy_srv_version >> 16, fcopy_srv_version & 0xFFFF);
+ icmsghdr->status = 0;
+ }
+ } else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
+ /* Ensure pktlen is big enough to contain hv_fcopy_hdr */
+ if (pktlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
+ syslog(LOG_ERR, "Invalid Fcopy hdr. Packet length too small: %u",
+ pktlen);
+ return -ENOBUFS;
+ }
+
+ fcopy_msg = (struct hv_fcopy_hdr *)&desc[offset + ICMSG_HDR];
+ icmsghdr->status = hv_fcopy_send_data(fcopy_msg, pktlen);
+ }
+
+ icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
+ ret = rte_vmbus_chan_send(txbr, 0x6, desc + offset, pktlen, 0);
+ if (ret) {
+ syslog(LOG_ERR, "Write to ringbuffer failed err: %d", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void fcopy_get_first_folder(char *path, char *chan_no)
+{
+ DIR *dir = opendir(path);
+ struct dirent *entry;
+
+ if (!dir) {
+ syslog(LOG_ERR, "Failed to open directory (errno=%s).\n", strerror(errno));
+ return;
+ }
+
+ while ((entry = readdir(dir)) != NULL) {
+ if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".") != 0 &&
+ strcmp(entry->d_name, "..") != 0) {
+ strcpy(chan_no, entry->d_name);
+ break;
+ }
+ }
+
+ closedir(dir);
+}
+
+static void fcopy_set_ring_size(char *path, char *inst, int size)
+{
+ char ring_size_path[MAX_PATH_LEN] = {0};
+ FILE *fd;
+
+ snprintf(ring_size_path, sizeof(ring_size_path), "%s/%s/%s", path, inst, "ring_size");
+ fd = fopen(ring_size_path, "w");
+ if (!fd) {
+ syslog(LOG_WARNING, "Failed to open ring_size file (errno=%s).\n", strerror(errno));
+ return;
+ }
+
+ setvbuf(fd, NULL, _IONBF, 0); /* don't allow buffering to catch sysfs store error */
+ if (fprintf(fd, "%d", size) < 0)
+ syslog(LOG_WARNING, "Failed to set %d as ring size (errno=%s).\n",
+ size, strerror(errno));
+
+ fclose(fd);
+}
+
+static char *fcopy_read_sysfs(char *path, char *buf, int len)
+{
+ FILE *fd;
+ char *ret;
+
+ fd = fopen(path, "r");
+ if (!fd)
+ return NULL;
+
+ ret = fgets(buf, len, fd);
+ fclose(fd);
+
+ return ret;
+}
+
+static int fcopy_get_instance_id(char *path, char *class_id, char *inst)
+{
+ DIR *dir = opendir(path);
+ struct dirent *entry;
+ char tmp_path[MAX_PATH_LEN] = {0};
+ char line[MAX_LINE_LEN];
+ int ret = -EINVAL;
+
+ if (!dir) {
+ syslog(LOG_ERR, "Failed to open directory (errno=%s).", strerror(errno));
+ return ret;
+ }
+
+ while ((entry = readdir(dir)) != NULL) {
+ if (entry->d_type == DT_LNK && strcmp(entry->d_name, ".") != 0 &&
+ strcmp(entry->d_name, "..") != 0) {
+ /* search for the sysfs path with matching class_id */
+ snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
+ path, entry->d_name, "class_id");
+ if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
+ continue;
+
+ /* class id matches, now fetch the instance id from device_id */
+ if (strstr(line, class_id)) {
+ snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
+ path, entry->d_name, "device_id");
+ if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
+ continue;
+ /* remove braces */
+ strncpy(inst, line + 1, strlen(line) - 3);
+ ret = 0;
+ goto closedir;
+ }
+ }
+ }
+
+ syslog(LOG_ERR, "Failed to fetch instance id");
+closedir:
+ closedir(dir);
+ return ret;
+}
+
+int main(int argc, char *argv[])
+{
+ int fcopy_fd = -1, tmp = 1;
+ int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
+ struct vmbus_br txbr, rxbr;
+ void *ring;
+ uint32_t len = HV_RING_SIZE;
+ char uio_name[10] = {0};
+ char uio_dev_path[15] = {0};
+ char uio_path[MAX_PATH_LEN] = {0};
+ char inst[MAX_LINE_LEN] = {0};
+
+ static struct option long_options[] = {
+ {"help", no_argument, 0, 'h' },
+ {"no-daemon", no_argument, 0, 'n' },
+ {0, 0, 0, 0 }
+ };
+
+ while ((opt = getopt_long(argc, argv, "hn", long_options,
+ &long_index)) != -1) {
+ switch (opt) {
+ case 'n':
+ daemonize = 0;
+ break;
+ case 'h':
+ default:
+ print_usage(argv);
+ exit(EXIT_FAILURE);
+ }
+ }
+
+ if (daemonize && daemon(1, 0)) {
+ syslog(LOG_ERR, "daemon() failed; error: %s", strerror(errno));
+ exit(EXIT_FAILURE);
+ }
+
+ openlog("HV_UIO_FCOPY", 0, LOG_USER);
+ syslog(LOG_INFO, "starting; pid is:%d", getpid());
+
+ /* get instance id */
+ if (fcopy_get_instance_id(DEVICES_SYSFS, FCOPY_CLASS_ID, inst))
+ exit(EXIT_FAILURE);
+
+ /* set ring_size value */
+ fcopy_set_ring_size(DEVICES_SYSFS, inst, HV_RING_SIZE);
+
+ /* get /dev/uioX dev path and open it */
+ snprintf(uio_path, sizeof(uio_path), "%s/%s/%s", DEVICES_SYSFS, inst, "uio");
+ fcopy_get_first_folder(uio_path, uio_name);
+ snprintf(uio_dev_path, sizeof(uio_dev_path), "/dev/%s", uio_name);
+ fcopy_fd = open(uio_dev_path, O_RDWR);
+
+ if (fcopy_fd < 0) {
+ syslog(LOG_ERR, "open %s failed; error: %d %s",
+ uio_dev_path, errno, strerror(errno));
+ syslog(LOG_ERR, "Please make sure module uio_hv_vmbus_client is loaded and" \
+ " device is not used by any other application\n");
+ ret = fcopy_fd;
+ exit(EXIT_FAILURE);
+ }
+
+ ring = mmap(NULL, 2 * HV_RING_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fcopy_fd, 0);
+ if (ring == MAP_FAILED) {
+ ret = errno;
+ syslog(LOG_ERR, "mmap ringbuffer failed; error: %d %s", ret, strerror(ret));
+ goto close;
+ }
+ vmbus_br_setup(&txbr, ring, HV_RING_SIZE);
+ vmbus_br_setup(&rxbr, (char *)ring + HV_RING_SIZE, HV_RING_SIZE);
+
+ while (1) {
+ /*
+ * In this loop we process fcopy messages after the
+ * handshake is complete.
+ */
+ ret = pread(fcopy_fd, &tmp, sizeof(int), 0);
+ if (ret < 0) {
+ syslog(LOG_ERR, "pread failed: %s", strerror(errno));
+ continue;
+ }
+
+ len = HV_RING_SIZE;
+ ret = rte_vmbus_chan_recv_raw(&rxbr, desc, &len);
+ if (unlikely(ret <= 0)) {
+ /* This indicates a failure to communicate (or worse) */
+ syslog(LOG_ERR, "VMBus channel recv error: %d", ret);
+ } else {
+ ret = fcopy_pkt_process(&txbr);
+ if (ret < 0)
+ goto close;
+
+ /* Signal host */
+ tmp = 1;
+ if ((write(fcopy_fd, &tmp, sizeof(int))) != sizeof(int)) {
+ ret = errno;
+ syslog(LOG_ERR, "Signal to host failed: %s\n", strerror(ret));
+ goto close;
+ }
+ }
+ }
+close:
+ close(fcopy_fd);
+ return ret;
+}
--
2.34.1
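A side note on fcopy_pkt_process() in the patch above: the hlen and tlen fields of the packet descriptor are expressed in 8-byte units, which is why both are shifted left by 3 before use. The sketch below shows the same arithmetic with a bounds check added in front of it. It is an illustration under stated assumptions, not the daemon's code: struct pkt_lens is a simplified stand-in that models only the two length fields, and pkt_payload_bounds() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the packet descriptor. Only the two length
 * fields used by fcopy_pkt_process() are modeled here.
 */
struct pkt_lens {
	uint16_t hlen;	/* header length, in 8-byte units */
	uint16_t tlen;	/* total packet length, in 8-byte units */
};

/*
 * Hypothetical helper: converts the lengths to bytes and checks them
 * against the size of the receive buffer.
 * Returns 0 and fills *offset and *payload_len on success, -1 otherwise.
 */
static int pkt_payload_bounds(const struct pkt_lens *pkt, size_t buf_len,
			      size_t *offset, size_t *payload_len)
{
	size_t hdr, total;

	assert(pkt && offset && payload_len);
	hdr = (size_t)pkt->hlen << 3;
	total = (size_t)pkt->tlen << 3;

	/* The header must fit in the packet, and the packet in the buffer */
	if (hdr > total || total > buf_len)
		return -1;

	*offset = hdr;
	*payload_len = total - hdr;
	return 0;
}
```

With hlen = 2 and tlen = 10, the payload starts at byte 16 and is 64 bytes long. A descriptor whose header is longer than the whole packet, or whose total length exceeds the receive buffer, is rejected rather than used to index into the buffer.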
On Fri, Aug 04, 2023 at 12:09:53AM -0700, Saurabh Sengar wrote:
> Hyper-V is adding multiple low speed "speciality" synthetic devices.
> Instead of writing a new kernel-level VMBus driver for each device,
> make the devices accessible to user space through a UIO-based
> hv_vmbus_client driver. Each device can then be supported by a user
> space driver. This approach optimizes the development process and
> provides flexibility to user space applications to control the key
> interactions with the VMBus ring buffer.
Why is it faster to write userspace drivers here? Where are those new
drivers, and why can't they be proper kernel drivers? Are all hyper-v
drivers going to move to userspace now?
> The new synthetic devices are low speed devices that don't support
> VMBus monitor bits, and so they must use vmbus_setevent() to notify
> the host of ring buffer updates. The new driver provides this
> functionality along with a configurable ring buffer size.
>
> Moreover, this series of patches incorporates an update to the fcopy
> application, enabling it to seamlessly utilize the new interface. The
> older fcopy driver and application will be phased out gradually.
> Development of other similar userspace drivers is still underway.
>
> Moreover, this patch series adds a new implementation of the fcopy
> application that uses the new UIO driver. The older fcopy driver and
> application will be phased out gradually. Development of other similar
> userspace drivers is still underway.
You are adding a new user api with the "ring buffer" size api, which is
odd for normal UIO drivers as that's not something that UIO was designed
for.
Why not just make your own generic type uiofs type kernel api if you
really want to do all of this type of thing in userspace instead of in
the kernel?
And finally, if this is going to replace the fcopy driver, then it
should be part of this series as well, removing it to show that you
really mean it when you say it will be deleted, otherwise we all know
that will never happen.
thanks,
greg k-h