This series converts kvp/vss daemons to use misc char devices instead of
netlink for userspace/kernel communication and then updates fcopy to be
consistent with kvp/vss.
Userspace/kernel communication via netlink has a number of issues:
- It is hard for userspace to figure out if the kernel part was loaded or not
and this fact can change as there is a way to enable/disable the service from
host side. Racy daemon startup is also a problem.
- When the userspace daemon restarts/dies kernel part doesn't receive a
notification.
- Netlink communication is not stable under heavy load.
- ...
RFC: I'm a bit puzzled on how to split commits 1 and 2 avoiding breakages.
Commit 3 can definitely be split, however, it is consistent with commits 1 and
2 at this moment and I'm not sure such split will simplify the review.
Vitaly Kuznetsov (3):
Drivers: hv: kvp: convert userspace/kernel communication to using char
device
Drivers: hv: vss: convert userspace/kernel communication to using char
device
Drivers: hv: fcopy: make it consistent with vss/kvp
drivers/hv/hv_fcopy.c | 395 +++++++++++++++++++++++++------------------
drivers/hv/hv_kvp.c | 396 +++++++++++++++++++++++++++-----------------
drivers/hv/hv_snapshot.c | 335 +++++++++++++++++++++++++++----------
include/uapi/linux/hyperv.h | 10 ++
tools/hv/hv_fcopy_daemon.c | 48 ++++--
tools/hv/hv_kvp_daemon.c | 187 ++++-----------------
tools/hv/hv_vss_daemon.c | 141 +++-------------
7 files changed, 824 insertions(+), 688 deletions(-)
--
1.9.3
Userspace/kernel communication via netlink has a number of issues:
- It is hard for userspace to figure out if the kernel part was loaded or not
and this fact can change as there is a way to enable/disable the service from
host side. Racy daemon startup is also a problem.
- When the userspace daemon restarts/dies kernel part doesn't receive a
notification.
- Netlink communication is not stable under heavy load.
- ...
Re-implement the communication using misc char device. Use ioctl to do
kernel/userspace version negotiation (doesn't make much sense at this moment
as we're breaking backwards compatibility but can be used in future).
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
drivers/hv/hv_kvp.c | 396 +++++++++++++++++++++++++++-----------------
include/uapi/linux/hyperv.h | 8 +
tools/hv/hv_kvp_daemon.c | 187 ++++-----------------
3 files changed, 287 insertions(+), 304 deletions(-)
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index beb8105..8078b1a 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -22,12 +22,16 @@
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/net.h>
#include <linux/nls.h>
-#include <linux/connector.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
#include <linux/workqueue.h>
+#include <linux/mutex.h>
#include <linux/hyperv.h>
+#include <linux/miscdevice.h>
+#include <linux/poll.h>
+#include <linux/uaccess.h>
/*
* Pre win8 version numbers used in ws2008 and ws 2008 r2 (win7)
@@ -45,46 +49,41 @@
#define WIN8_SRV_VERSION (WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
/*
- * Global state maintained for transaction that is being processed.
- * Note that only one transaction can be active at any point in time.
- *
- * This state is set when we receive a request from the host; we
- * cleanup this state when the transaction is completed - when we respond
- * to the host with the key value.
+ * Global state maintained for the device. Note that only one transaction can
+ * be active at any point in time.
*/
+enum kvp_device_state {
+ KVP_DEVICE_INITIALIZING = 0, /* driver was loaded */
+ KVP_DEVICE_OPENED, /* device was opened */
+ KVP_READY, /* userspace was registered */
+ KVP_HOSTMSG_RECEIVED, /* message from host was received */
+ KVP_USERMSG_READY, /* message for userspace is ready */
+ KVP_USERSPACE_REQ, /* request to userspace was sent */
+ KVP_USERSPACE_RECV, /* reply from userspace was received */
+ KVP_DEVICE_DYING, /* driver unload is in progress */
+};
+
static struct {
- bool active; /* transaction status - active or not */
+ int state; /* kvp_device_state */
int recv_len; /* number of bytes received. */
- struct hv_kvp_msg *kvp_msg; /* current message */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /* request ID. */
void *kvp_context; /* for the channel callback */
-} kvp_transaction;
-
-/*
- * Before we can accept KVP messages from the host, we need
- * to handshake with the user level daemon. This state tracks
- * if we are in the handshake phase.
- */
-static bool in_hand_shake = true;
-
-/*
- * This state maintains the version number registered by the daemon.
- */
-static int dm_reg_value;
+ int dm_reg_value; /* daemon version number */
+ struct mutex lock; /* syncronization */
+ struct hv_kvp_msg user_msg; /* message to/from userspace */
+ struct hv_kvp_msg host_msg; /* message to/from host */
+ wait_queue_head_t proc_list; /* waiting processes */
+} kvp_device;
static void kvp_send_key(struct work_struct *dummy);
-
-
-static void kvp_respond_to_host(struct hv_kvp_msg *msg, int error);
+static void kvp_respond_to_host(int error);
static void kvp_work_func(struct work_struct *dummy);
-static void kvp_register(int);
static DECLARE_DELAYED_WORK(kvp_work, kvp_work_func);
static DECLARE_WORK(kvp_sendkey_work, kvp_send_key);
-static struct cb_id kvp_id = { CN_KVP_IDX, CN_KVP_VAL };
static const char kvp_name[] = "kvp_kernel_module";
static u8 *recv_buffer;
/*
@@ -92,31 +91,8 @@ static u8 *recv_buffer;
* As part of this registration, pass the LIC version number.
* This number has no meaning, it satisfies the registration protocol.
*/
-#define HV_DRV_VERSION "3.1"
-
-static void
-kvp_register(int reg_value)
-{
-
- struct cn_msg *msg;
- struct hv_kvp_msg *kvp_msg;
- char *version;
-
- msg = kzalloc(sizeof(*msg) + sizeof(struct hv_kvp_msg), GFP_ATOMIC);
+#define HV_DRV_VERSION 31
- if (msg) {
- kvp_msg = (struct hv_kvp_msg *)msg->data;
- version = kvp_msg->body.kvp_register.version;
- msg->id.idx = CN_KVP_IDX;
- msg->id.val = CN_KVP_VAL;
-
- kvp_msg->kvp_hdr.operation = reg_value;
- strcpy(version, HV_DRV_VERSION);
- msg->len = sizeof(struct hv_kvp_msg);
- cn_netlink_send(msg, 0, 0, GFP_ATOMIC);
- kfree(msg);
- }
-}
static void
kvp_work_func(struct work_struct *dummy)
{
@@ -124,7 +100,7 @@ kvp_work_func(struct work_struct *dummy)
* If the timer fires, the user-mode component has not responded;
* process the pending transaction.
*/
- kvp_respond_to_host(NULL, HV_E_FAIL);
+ kvp_respond_to_host(HV_E_FAIL);
}
static void poll_channel(struct vmbus_channel *channel)
@@ -138,36 +114,26 @@ static void poll_channel(struct vmbus_channel *channel)
}
-static int kvp_handle_handshake(struct hv_kvp_msg *msg)
+static int kvp_handle_handshake(u32 op)
{
- int ret = 1;
+ int ret = 0;
- switch (msg->kvp_hdr.operation) {
+ switch (op) {
case KVP_OP_REGISTER:
- dm_reg_value = KVP_OP_REGISTER;
+ kvp_device.dm_reg_value = KVP_OP_REGISTER;
pr_info("KVP: IP injection functionality not available\n");
pr_info("KVP: Upgrade the KVP daemon\n");
break;
case KVP_OP_REGISTER1:
- dm_reg_value = KVP_OP_REGISTER1;
+ kvp_device.dm_reg_value = KVP_OP_REGISTER1;
break;
default:
pr_info("KVP: incompatible daemon\n");
pr_info("KVP: KVP version: %d, Daemon version: %d\n",
- KVP_OP_REGISTER1, msg->kvp_hdr.operation);
- ret = 0;
+ KVP_OP_REGISTER1, op);
+ ret = 1;
}
- if (ret) {
- /*
- * We have a compatible daemon; complete the handshake.
- */
- pr_info("KVP: user-mode registering done.\n");
- kvp_register(dm_reg_value);
- kvp_transaction.active = false;
- if (kvp_transaction.kvp_context)
- poll_channel(kvp_transaction.kvp_context);
- }
return ret;
}
@@ -176,25 +142,11 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
* Callback when data is received from user mode.
*/
-static void
-kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
+static void kvp_userwrite_callback(void)
{
- struct hv_kvp_msg *message;
+ struct hv_kvp_msg *message = &kvp_device.user_msg;
struct hv_kvp_msg_enumerate *data;
- int error = 0;
-
- message = (struct hv_kvp_msg *)msg->data;
-
- /*
- * If we are negotiating the version information
- * with the daemon; handle that first.
- */
-
- if (in_hand_shake) {
- if (kvp_handle_handshake(message))
- in_hand_shake = false;
- return;
- }
+ int error = 0;
/*
* Based on the version of the daemon, we propagate errors from the
@@ -203,7 +155,7 @@ kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
data = &message->body.kvp_enum_data;
- switch (dm_reg_value) {
+ switch (kvp_device.dm_reg_value) {
case KVP_OP_REGISTER:
/*
* Null string is used to pass back error condition.
@@ -226,10 +178,9 @@ kvp_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
* to the host. But first, cancel the timeout.
*/
if (cancel_delayed_work_sync(&kvp_work))
- kvp_respond_to_host(message, error);
+ kvp_respond_to_host(error);
}
-
static int process_ob_ipinfo(void *in_msg, void *out_msg, int op)
{
struct hv_kvp_msg *in = in_msg;
@@ -337,32 +288,21 @@ static void process_ib_ipinfo(void *in_msg, void *out_msg, int op)
}
}
-
-
-
static void
kvp_send_key(struct work_struct *dummy)
{
- struct cn_msg *msg;
- struct hv_kvp_msg *message;
- struct hv_kvp_msg *in_msg;
- __u8 operation = kvp_transaction.kvp_msg->kvp_hdr.operation;
- __u8 pool = kvp_transaction.kvp_msg->kvp_hdr.pool;
+ struct hv_kvp_msg *message = &kvp_device.user_msg;
+ struct hv_kvp_msg *in_msg = &kvp_device.host_msg;
+ __u8 operation = in_msg->kvp_hdr.operation;
+ __u8 pool = in_msg->kvp_hdr.pool;
__u32 val32;
__u64 val64;
- int rc;
- msg = kzalloc(sizeof(*msg) + sizeof(struct hv_kvp_msg) , GFP_ATOMIC);
- if (!msg)
- return;
-
- msg->id.idx = CN_KVP_IDX;
- msg->id.val = CN_KVP_VAL;
+ mutex_lock(&kvp_device.lock);
- message = (struct hv_kvp_msg *)msg->data;
+ memset(message, 0, sizeof(struct hv_kvp_msg));
message->kvp_hdr.operation = operation;
message->kvp_hdr.pool = pool;
- in_msg = kvp_transaction.kvp_msg;
/*
* The key/value strings sent from the host are encoded in
@@ -446,15 +386,10 @@ kvp_send_key(struct work_struct *dummy)
break;
}
- msg->len = sizeof(struct hv_kvp_msg);
- rc = cn_netlink_send(msg, 0, 0, GFP_ATOMIC);
- if (rc) {
- pr_debug("KVP: failed to communicate to the daemon: %d\n", rc);
- if (cancel_delayed_work_sync(&kvp_work))
- kvp_respond_to_host(message, HV_E_FAIL);
- }
+ kvp_device.state = KVP_USERMSG_READY;
+ wake_up_interruptible(&kvp_device.proc_list);
- kfree(msg);
+ mutex_unlock(&kvp_device.lock);
return;
}
@@ -463,10 +398,10 @@ kvp_send_key(struct work_struct *dummy)
* Send a response back to the host.
*/
-static void
-kvp_respond_to_host(struct hv_kvp_msg *msg_to_host, int error)
+static void kvp_respond_to_host(int error)
{
struct hv_kvp_msg *kvp_msg;
+ struct hv_kvp_msg *msg_to_host = &kvp_device.user_msg;
struct hv_kvp_exchg_msg_value *kvp_data;
char *key_name;
char *value;
@@ -479,26 +414,13 @@ kvp_respond_to_host(struct hv_kvp_msg *msg_to_host, int error)
int ret;
/*
- * If a transaction is not active; log and return.
- */
-
- if (!kvp_transaction.active) {
- /*
- * This is a spurious call!
- */
- pr_warn("KVP: Transaction not active\n");
- return;
- }
- /*
* Copy the global state for completing the transaction. Note that
* only one transaction can be active at a time.
*/
- buf_len = kvp_transaction.recv_len;
- channel = kvp_transaction.recv_channel;
- req_id = kvp_transaction.recv_req_id;
-
- kvp_transaction.active = false;
+ buf_len = kvp_device.recv_len;
+ channel = kvp_device.recv_channel;
+ req_id = kvp_device.recv_req_id;
icmsghdrp = (struct icmsg_hdr *)
&recv_buffer[sizeof(struct vmbuspipe_hdr)];
@@ -528,7 +450,8 @@ kvp_respond_to_host(struct hv_kvp_msg *msg_to_host, int error)
&recv_buffer[sizeof(struct vmbuspipe_hdr) +
sizeof(struct icmsg_hdr)];
- switch (kvp_transaction.kvp_msg->kvp_hdr.operation) {
+
+ switch (kvp_device.host_msg.kvp_hdr.operation) {
case KVP_OP_GET_IP_INFO:
ret = process_ob_ipinfo(msg_to_host,
(struct hv_kvp_ip_msg *)kvp_msg,
@@ -586,6 +509,17 @@ response_done:
vmbus_sendpacket(channel, recv_buffer, buf_len, req_id,
VM_PKT_DATA_INBAND, 0);
+
+ /* We're ready to process next request, reset the device state */
+ if (kvp_device.state == KVP_USERSPACE_RECV ||
+ kvp_device.state == KVP_USERSPACE_REQ)
+ kvp_device.state = KVP_READY;
+ /*
+ * Make sure device state was set before polling the channel as
+ * processing can happen on a different CPU.
+ */
+ smp_mb();
+
poll_channel(channel);
}
@@ -612,14 +546,15 @@ void hv_kvp_onchannelcallback(void *context)
int util_fw_version;
int kvp_srv_version;
- if (kvp_transaction.active) {
+ if (kvp_device.state > KVP_READY) {
/*
* We will defer processing this callback once
* the current transaction is complete.
*/
- kvp_transaction.kvp_context = context;
+ kvp_device.kvp_context = channel;
return;
}
+ kvp_device.kvp_context = NULL;
vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 4, &recvlen,
&requestid);
@@ -661,11 +596,19 @@ void hv_kvp_onchannelcallback(void *context)
* transaction; note transactions are serialized.
*/
- kvp_transaction.recv_len = recvlen;
- kvp_transaction.recv_channel = channel;
- kvp_transaction.recv_req_id = requestid;
- kvp_transaction.active = true;
- kvp_transaction.kvp_msg = kvp_msg;
+ kvp_device.recv_len = recvlen;
+ kvp_device.recv_channel = channel;
+ kvp_device.recv_req_id = requestid;
+
+ if (kvp_device.state != KVP_READY) {
+ /* Userspace daemon is not connected, fail. */
+ kvp_respond_to_host(HV_E_FAIL);
+ return;
+ }
+
+ kvp_device.state = KVP_HOSTMSG_RECEIVED;
+ memcpy(&kvp_device.host_msg, kvp_msg,
+ sizeof(struct hv_kvp_msg));
/*
* Get the information from the
@@ -690,17 +633,166 @@ void hv_kvp_onchannelcallback(void *context)
recvlen, requestid,
VM_PKT_DATA_INBAND, 0);
}
+}
+
+static int kvp_op_open(struct inode *inode, struct file *f)
+{
+ if (kvp_device.state != KVP_DEVICE_INITIALIZING)
+ return -EBUSY;
+ kvp_device.state = KVP_DEVICE_OPENED;
+ return 0;
+}
+
+static int kvp_op_release(struct inode *inode, struct file *f)
+{
+ kvp_device.state = KVP_DEVICE_INITIALIZING;
+ return 0;
+}
+
+static ssize_t kvp_op_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int ret = 0;
+
+ if (kvp_device.state == KVP_DEVICE_DYING)
+ return -EFAULT;
+
+ if (count != sizeof(struct hv_kvp_msg)) {
+ pr_warn("kvp_op_write: invalid write len: %d (expected: %d)\n",
+ (int)count, (int)sizeof(struct hv_kvp_msg));
+ return -EINVAL;
+ }
+ mutex_lock(&kvp_device.lock);
+
+ if (kvp_device.state == KVP_USERSPACE_REQ) {
+ if (!copy_from_user(&kvp_device.user_msg, buf,
+ sizeof(struct hv_kvp_msg))) {
+ kvp_device.state = KVP_USERSPACE_RECV;
+ kvp_userwrite_callback();
+ ret = sizeof(struct hv_kvp_msg);
+ } else
+ ret = -EFAULT;
+ } else {
+ pr_warn("kvp_op_write: invalid transaction state: %d\n",
+ kvp_device.state);
+ ret = -EINVAL;
+ }
+
+ mutex_unlock(&kvp_device.lock);
+ return ret;
}
+static ssize_t kvp_op_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ ssize_t ret = 0;
+
+ if (kvp_device.state == KVP_DEVICE_DYING)
+ return -EFAULT;
+
+ if (count != sizeof(struct hv_kvp_msg)) {
+ pr_warn("kvp_op_read: invalid read len: %d (expected: %d)\n",
+ (int)count, (int)sizeof(struct hv_kvp_msg));
+ return -EINVAL;
+ }
+
+ if (wait_event_interruptible(kvp_device.proc_list,
+ kvp_device.state == KVP_USERMSG_READY ||
+ kvp_device.state == KVP_DEVICE_DYING))
+ return -EFAULT;
+
+ if (kvp_device.state != KVP_USERMSG_READY)
+ return -EFAULT;
+
+ mutex_lock(&kvp_device.lock);
+
+ if (!copy_to_user(buf, &kvp_device.user_msg,
+ sizeof(struct hv_kvp_msg))) {
+ kvp_device.state = KVP_USERSPACE_REQ;
+ ret = sizeof(struct hv_kvp_msg);
+ } else
+ ret = -EFAULT;
+
+ mutex_unlock(&kvp_device.lock);
+ return ret;
+}
+
+static unsigned int kvp_op_poll(struct file *file, poll_table *wait)
+{
+ if (kvp_device.state == KVP_DEVICE_DYING)
+ return -EFAULT;
+
+ poll_wait(file, &kvp_device.proc_list, wait);
+ if (kvp_device.state == KVP_USERMSG_READY)
+ return POLLIN | POLLRDNORM;
+ return 0;
+}
+
+static long kvp_op_ioctl(struct file *fp,
+ unsigned int cmd, unsigned long arg)
+{
+ long ret = 0;
+ void __user *argp = (void __user *)arg;
+ u32 val32;
+
+ if (kvp_device.state == KVP_DEVICE_DYING)
+ return -EFAULT;
+
+ /* The only ioctl we have is registation */
+ if (kvp_device.state != KVP_DEVICE_OPENED)
+ return -EINVAL;
+
+ mutex_lock(&kvp_device.lock);
+
+ switch (cmd) {
+ case HYPERV_KVP_REGISTER:
+ if (copy_from_user(&val32, argp, sizeof(val32))) {
+ ret = -EFAULT;
+ break;
+ }
+ if (!kvp_handle_handshake(val32)) {
+ val32 = (u32)HV_DRV_VERSION;
+ if (copy_to_user(argp, &val32, sizeof(val32))) {
+ ret = -EFAULT;
+ break;
+ }
+ kvp_device.state = KVP_READY;
+ pr_info("KVP: user-mode registering done.\n");
+ if (kvp_device.kvp_context)
+ poll_channel(kvp_device.kvp_context);
+ } else
+ ret = -EINVAL;
+ break;
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ mutex_unlock(&kvp_device.lock);
+ return ret;
+}
+
+static const struct file_operations kvp_fops = {
+ .owner = THIS_MODULE,
+ .read = kvp_op_read,
+ .write = kvp_op_write,
+ .release = kvp_op_release,
+ .open = kvp_op_open,
+ .poll = kvp_op_poll,
+ .unlocked_ioctl = kvp_op_ioctl
+};
+
+static struct miscdevice kvp_misc = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "vmbus/hv_kvp",
+ .fops = &kvp_fops,
+};
+
int
hv_kvp_init(struct hv_util_service *srv)
{
- int err;
-
- err = cn_add_callback(&kvp_id, kvp_name, kvp_cn_callback);
- if (err)
- return err;
recv_buffer = srv->recv_buffer;
/*
@@ -709,14 +801,20 @@ hv_kvp_init(struct hv_util_service *srv)
* Defer processing channel callbacks until the daemon
* has registered.
*/
- kvp_transaction.active = true;
+ kvp_device.state = KVP_DEVICE_INITIALIZING;
+ init_waitqueue_head(&kvp_device.proc_list);
+ mutex_init(&kvp_device.lock);
- return 0;
+ return misc_register(&kvp_misc);
}
void hv_kvp_deinit(void)
{
- cn_del_callback(&kvp_id);
+ kvp_device.state = KVP_DEVICE_DYING;
+ /* Make sure nobody sees the old state */
+ smp_mb();
+ wake_up_interruptible(&kvp_device.proc_list);
cancel_delayed_work_sync(&kvp_work);
cancel_work_sync(&kvp_sendkey_work);
+ misc_deregister(&kvp_misc);
}
diff --git a/include/uapi/linux/hyperv.h b/include/uapi/linux/hyperv.h
index bb1cb73..80713a3 100644
--- a/include/uapi/linux/hyperv.h
+++ b/include/uapi/linux/hyperv.h
@@ -26,6 +26,7 @@
#define _UAPI_HYPERV_H
#include <linux/uuid.h>
+#include <linux/types.h>
/*
* Framework version for util services.
@@ -389,4 +390,11 @@ struct hv_kvp_ip_msg {
struct hv_kvp_ipaddr_value kvp_ip_val;
} __attribute__((packed));
+/*
+ * Userspace registration ioctls. Userspace daemons are supposed to pass their
+ * version as a parameter and get driver version back. KVP daemon supplies
+ * either KVP_OP_REGISTER or KVP_OP_REGISTER1.
+ */
+#define HYPERV_KVP_REGISTER _IOWR('v', 0, __u32)
+
#endif /* _UAPI_HYPERV_H */
diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 408bb07..0c3cac7 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -33,7 +33,6 @@
#include <ctype.h>
#include <errno.h>
#include <arpa/inet.h>
-#include <linux/connector.h>
#include <linux/hyperv.h>
#include <linux/netlink.h>
#include <ifaddrs.h>
@@ -44,6 +43,7 @@
#include <dirent.h>
#include <net/if.h>
#include <getopt.h>
+#include <sys/ioctl.h>
/*
* KVP protocol: The user mode component first registers with the
@@ -79,9 +79,6 @@ enum {
DNS
};
-static struct sockaddr_nl addr;
-static int in_hand_shake = 1;
-
static char *os_name = "";
static char *os_major = "";
static char *os_minor = "";
@@ -1387,34 +1384,6 @@ kvp_get_domain_name(char *buffer, int length)
freeaddrinfo(info);
}
-static int
-netlink_send(int fd, struct cn_msg *msg)
-{
- struct nlmsghdr nlh = { .nlmsg_type = NLMSG_DONE };
- unsigned int size;
- struct msghdr message;
- struct iovec iov[2];
-
- size = sizeof(struct cn_msg) + msg->len;
-
- nlh.nlmsg_pid = getpid();
- nlh.nlmsg_len = NLMSG_LENGTH(size);
-
- iov[0].iov_base = &nlh;
- iov[0].iov_len = sizeof(nlh);
-
- iov[1].iov_base = msg;
- iov[1].iov_len = size;
-
- memset(&message, 0, sizeof(message));
- message.msg_name = &addr;
- message.msg_namelen = sizeof(addr);
- message.msg_iov = iov;
- message.msg_iovlen = 2;
-
- return sendmsg(fd, &message, 0);
-}
-
void print_usage(char *argv[])
{
fprintf(stderr, "Usage: %s [options]\n"
@@ -1425,23 +1394,18 @@ void print_usage(char *argv[])
int main(int argc, char *argv[])
{
- int fd, len, nl_group;
+ int kvp_fd, len;
int error;
- struct cn_msg *message;
struct pollfd pfd;
- struct nlmsghdr *incoming_msg;
- struct cn_msg *incoming_cn_msg;
- struct hv_kvp_msg *hv_msg;
- char *p;
+ struct hv_kvp_msg hv_msg[1];
char *key_value;
char *key_name;
int op;
int pool;
char *if_name;
struct hv_kvp_ipaddr_value *kvp_ip_val;
- char *kvp_recv_buffer;
- size_t kvp_recv_buffer_len;
int daemonize = 1, long_index = 0, opt;
+ __u32 daemon_ver = (__u32)KVP_OP_REGISTER1;
static struct option long_options[] = {
{"help", no_argument, 0, 'h' },
@@ -1468,12 +1432,14 @@ int main(int argc, char *argv[])
openlog("KVP", 0, LOG_USER);
syslog(LOG_INFO, "KVP starting; pid is:%d", getpid());
- kvp_recv_buffer_len = NLMSG_LENGTH(0) + sizeof(struct cn_msg) + sizeof(struct hv_kvp_msg);
- kvp_recv_buffer = calloc(1, kvp_recv_buffer_len);
- if (!kvp_recv_buffer) {
- syslog(LOG_ERR, "Failed to allocate netlink buffer");
+ kvp_fd = open("/dev/vmbus/hv_kvp", O_RDWR);
+
+ if (kvp_fd < 0) {
+ syslog(LOG_ERR, "open /dev/vmbus/hv_kvp failed; error: %d %s",
+ errno, strerror(errno));
exit(EXIT_FAILURE);
}
+
/*
* Retrieve OS release information.
*/
@@ -1489,100 +1455,44 @@ int main(int argc, char *argv[])
exit(EXIT_FAILURE);
}
- fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
- if (fd < 0) {
- syslog(LOG_ERR, "netlink socket creation failed; error: %d %s", errno,
- strerror(errno));
- exit(EXIT_FAILURE);
- }
- addr.nl_family = AF_NETLINK;
- addr.nl_pad = 0;
- addr.nl_pid = 0;
- addr.nl_groups = 0;
-
-
- error = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
- if (error < 0) {
- syslog(LOG_ERR, "bind failed; error: %d %s", errno, strerror(errno));
- close(fd);
- exit(EXIT_FAILURE);
- }
- nl_group = CN_KVP_IDX;
-
- if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &nl_group, sizeof(nl_group)) < 0) {
- syslog(LOG_ERR, "setsockopt failed; error: %d %s", errno, strerror(errno));
- close(fd);
- exit(EXIT_FAILURE);
- }
-
/*
* Register ourselves with the kernel.
*/
- message = (struct cn_msg *)kvp_recv_buffer;
- message->id.idx = CN_KVP_IDX;
- message->id.val = CN_KVP_VAL;
-
- hv_msg = (struct hv_kvp_msg *)message->data;
- hv_msg->kvp_hdr.operation = KVP_OP_REGISTER1;
- message->ack = 0;
- message->len = sizeof(struct hv_kvp_msg);
-
- len = netlink_send(fd, message);
- if (len < 0) {
- syslog(LOG_ERR, "netlink_send failed; error: %d %s", errno, strerror(errno));
- close(fd);
+ if (ioctl(kvp_fd, HYPERV_KVP_REGISTER, &daemon_ver)) {
+ syslog(LOG_ERR, "registration to kernel failed; error: %d %s",
+ errno, strerror(errno));
+ close(kvp_fd);
exit(EXIT_FAILURE);
}
- pfd.fd = fd;
+ syslog(LOG_INFO, "KVP LIC Version: %d", daemon_ver);
+
+ pfd.fd = kvp_fd;
while (1) {
- struct sockaddr *addr_p = (struct sockaddr *) &addr;
- socklen_t addr_l = sizeof(addr);
pfd.events = POLLIN;
pfd.revents = 0;
if (poll(&pfd, 1, -1) < 0) {
syslog(LOG_ERR, "poll failed; error: %d %s", errno, strerror(errno));
if (errno == EINVAL) {
- close(fd);
+ close(kvp_fd);
exit(EXIT_FAILURE);
}
else
continue;
}
- len = recvfrom(fd, kvp_recv_buffer, kvp_recv_buffer_len, 0,
- addr_p, &addr_l);
-
- if (len < 0) {
- int saved_errno = errno;
- syslog(LOG_ERR, "recvfrom failed; pid:%u error:%d %s",
- addr.nl_pid, errno, strerror(errno));
-
- if (saved_errno == ENOBUFS) {
- syslog(LOG_ERR, "receive error: ignored");
- continue;
- }
+ len = read(kvp_fd, hv_msg, sizeof(struct hv_kvp_msg));
- close(fd);
- return -1;
- }
+ if (len != sizeof(struct hv_kvp_msg)) {
+ syslog(LOG_ERR, "read failed; error:%d %s",
+ errno, strerror(errno));
- if (addr.nl_pid) {
- syslog(LOG_WARNING, "Received packet from untrusted pid:%u",
- addr.nl_pid);
- continue;
+ close(kvp_fd);
+ return EXIT_FAILURE;
}
- incoming_msg = (struct nlmsghdr *)kvp_recv_buffer;
-
- if (incoming_msg->nlmsg_type != NLMSG_DONE)
- continue;
-
- incoming_cn_msg = (struct cn_msg *)NLMSG_DATA(incoming_msg);
- hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
-
/*
* We will use the KVP header information to pass back
* the error from this daemon. So, first copy the state
@@ -1592,24 +1502,6 @@ int main(int argc, char *argv[])
pool = hv_msg->kvp_hdr.pool;
hv_msg->error = HV_S_OK;
- if ((in_hand_shake) && (op == KVP_OP_REGISTER1)) {
- /*
- * Driver is registering with us; stash away the version
- * information.
- */
- in_hand_shake = 0;
- p = (char *)hv_msg->body.kvp_register.version;
- lic_version = malloc(strlen(p) + 1);
- if (lic_version) {
- strcpy(lic_version, p);
- syslog(LOG_INFO, "KVP LIC Version: %s",
- lic_version);
- } else {
- syslog(LOG_ERR, "malloc failed");
- }
- continue;
- }
-
switch (op) {
case KVP_OP_GET_IP_INFO:
kvp_ip_val = &hv_msg->body.kvp_ip_val;
@@ -1702,7 +1594,6 @@ int main(int argc, char *argv[])
goto kvp_done;
}
- hv_msg = (struct hv_kvp_msg *)incoming_cn_msg->data;
key_name = (char *)hv_msg->body.kvp_enum_data.data.key;
key_value = (char *)hv_msg->body.kvp_enum_data.data.value;
@@ -1753,31 +1644,17 @@ int main(int argc, char *argv[])
hv_msg->error = HV_S_CONT;
break;
}
- /*
- * Send the value back to the kernel. The response is
- * already in the receive buffer. Update the cn_msg header to
- * reflect the key value that has been added to the message
- */
-kvp_done:
-
- incoming_cn_msg->id.idx = CN_KVP_IDX;
- incoming_cn_msg->id.val = CN_KVP_VAL;
- incoming_cn_msg->ack = 0;
- incoming_cn_msg->len = sizeof(struct hv_kvp_msg);
-
- len = netlink_send(fd, incoming_cn_msg);
- if (len < 0) {
- int saved_errno = errno;
- syslog(LOG_ERR, "net_link send failed; error: %d %s", errno,
- strerror(errno));
-
- if (saved_errno == ENOMEM || saved_errno == ENOBUFS) {
- syslog(LOG_ERR, "send error: ignored");
- continue;
- }
+ /* Send the value back to the kernel. */
+kvp_done:
+ len = write(kvp_fd, hv_msg, sizeof(struct hv_kvp_msg));
+ if (len != sizeof(struct hv_kvp_msg)) {
+ syslog(LOG_ERR, "write failed; error: %d %s", errno,
+ strerror(errno));
exit(EXIT_FAILURE);
}
}
+ close(kvp_fd);
+ exit(0);
}
--
1.9.3
Userspace/kernel communication via netlink has a number of issues:
- It is hard for userspace to figure out if the kernel part was loaded or not
and this fact can change as there is a way to enable/disable the service from
host side. Racy daemon startup is also a problem.
- When the userspace daemon restarts/dies kernel part doesn't receive a
notification.
- Netlink communication is not stable under heavy load.
- ...
Re-implement the communication using misc char device. Use ioctl to do
kernel/userspace version negotiation (doesn't make much sense at this moment
as we're breaking backwards compatibility but can be used in future). Read from
the device returns struct hv_vss_msg and userspace is supposed to reply with
__u32.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
drivers/hv/hv_snapshot.c | 335 ++++++++++++++++++++++++++++++++------------
include/uapi/linux/hyperv.h | 1 +
tools/hv/hv_vss_daemon.c | 141 ++++---------------
3 files changed, 273 insertions(+), 204 deletions(-)
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 9d5e0d1..b399758 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -20,9 +20,12 @@
#include <linux/net.h>
#include <linux/nls.h>
-#include <linux/connector.h>
+#include <linux/sched.h>
#include <linux/workqueue.h>
+#include <linux/mutex.h>
#include <linux/hyperv.h>
+#include <linux/miscdevice.h>
+#include <linux/poll.h>
#define VSS_MAJOR 5
#define VSS_MINOR 0
@@ -31,26 +34,36 @@
#define VSS_USERSPACE_TIMEOUT (msecs_to_jiffies(10 * 1000))
/*
- * Global state maintained for transaction that is being processed.
- * Note that only one transaction can be active at any point in time.
- *
- * This state is set when we receive a request from the host; we
- * cleanup this state when the transaction is completed - when we respond
- * to the host with the key value.
+ * Global state maintained for the device. Note that only one transaction can
+ * be active at any point in time.
*/
+enum vss_device_state {
+ VSS_DEVICE_INITIALIZING = 0, /* driver was loaded */
+ VSS_DEVICE_OPENED, /* device was opened */
+ VSS_READY, /* userspace registered */
+ VSS_HOSTMSG_RECEIVED, /* message from host was received */
+ VSS_USERMSG_READY, /* message for userspace is ready */
+ VSS_USERSPACE_REQ, /* request to userspace was sent */
+ VSS_USERSPACE_RECV, /* reply from userspace was received */
+ VSS_DEVICE_DYING, /* driver unload is in progress */
+};
+
static struct {
- bool active; /* transaction status - active or not */
+ int state; /* vss_device_state */
int recv_len; /* number of bytes received. */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /* request ID. */
- struct hv_vss_msg *msg; /* current message */
-} vss_transaction;
-
+ void *vss_context; /* for the channel callback */
+ int dm_reg_value; /* daemon version number */
+ struct mutex lock; /* syncronization */
+ struct hv_vss_msg user_msg; /* message to/from userspace */
+ struct hv_vss_msg host_msg; /* message to/from host */
+ wait_queue_head_t proc_list; /* waiting processes */
+} vss_device;
-static void vss_respond_to_host(int error);
+static void vss_respond_to_host(u32 error);
-static struct cb_id vss_id = { CN_VSS_IDX, CN_VSS_VAL };
static const char vss_name[] = "vss_kernel_module";
static __u8 *recv_buffer;
@@ -60,6 +73,23 @@ static void vss_timeout_func(struct work_struct *dummy);
static DECLARE_DELAYED_WORK(vss_timeout_work, vss_timeout_func);
static DECLARE_WORK(vss_send_op_work, vss_send_op);
+static int vss_handle_handshake(u32 op)
+{
+ vss_device.dm_reg_value = op;
+
+ return 0;
+}
+
+static void poll_channel(struct vmbus_channel *channel)
+{
+ if (channel->target_cpu != smp_processor_id())
+ smp_call_function_single(channel->target_cpu,
+ hv_vss_onchannelcallback,
+ channel, true);
+ else
+ hv_vss_onchannelcallback(channel);
+}
+
/*
* Callback when data is received from user mode.
*/
@@ -73,62 +103,27 @@ static void vss_timeout_func(struct work_struct *dummy)
vss_respond_to_host(HV_E_FAIL);
}
-static void
-vss_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp)
-{
- struct hv_vss_msg *vss_msg;
-
- vss_msg = (struct hv_vss_msg *)msg->data;
-
- if (vss_msg->vss_hdr.operation == VSS_OP_REGISTER) {
- pr_info("VSS daemon registered\n");
- vss_transaction.active = false;
- if (vss_transaction.recv_channel != NULL)
- hv_vss_onchannelcallback(vss_transaction.recv_channel);
- return;
-
- }
- if (cancel_delayed_work_sync(&vss_timeout_work))
- vss_respond_to_host(vss_msg->error);
-}
-
-
static void vss_send_op(struct work_struct *dummy)
{
- int op = vss_transaction.msg->vss_hdr.operation;
- int rc;
- struct cn_msg *msg;
- struct hv_vss_msg *vss_msg;
+ mutex_lock(&vss_device.lock);
- msg = kzalloc(sizeof(*msg) + sizeof(*vss_msg), GFP_ATOMIC);
- if (!msg)
+ if (vss_device.state != VSS_HOSTMSG_RECEIVED)
return;
- vss_msg = (struct hv_vss_msg *)msg->data;
-
- msg->id.idx = CN_VSS_IDX;
- msg->id.val = CN_VSS_VAL;
-
- vss_msg->vss_hdr.operation = op;
- msg->len = sizeof(struct hv_vss_msg);
+ memcpy(&vss_device.user_msg, &vss_device.host_msg,
+ sizeof(struct hv_vss_msg));
- rc = cn_netlink_send(msg, 0, 0, GFP_ATOMIC);
- if (rc) {
- pr_warn("VSS: failed to communicate to the daemon: %d\n", rc);
- if (cancel_delayed_work_sync(&vss_timeout_work))
- vss_respond_to_host(HV_E_FAIL);
- }
- kfree(msg);
+ vss_device.state = VSS_USERMSG_READY;
+ wake_up_interruptible(&vss_device.proc_list);
+ mutex_unlock(&vss_device.lock);
return;
}
/*
* Send a response back to the host.
*/
-
-static void
-vss_respond_to_host(int error)
+static void vss_respond_to_host(u32 error)
{
struct icmsg_hdr *icmsghdrp;
u32 buf_len;
@@ -136,25 +131,13 @@ vss_respond_to_host(int error)
u64 req_id;
/*
- * If a transaction is not active; log and return.
- */
-
- if (!vss_transaction.active) {
- /*
- * This is a spurious call!
- */
- pr_warn("VSS: Transaction not active\n");
- return;
- }
- /*
* Copy the global state for completing the transaction. Note that
* only one transaction can be active at a time.
*/
- buf_len = vss_transaction.recv_len;
- channel = vss_transaction.recv_channel;
- req_id = vss_transaction.recv_req_id;
- vss_transaction.active = false;
+ buf_len = vss_device.recv_len;
+ channel = vss_device.recv_channel;
+ req_id = vss_device.recv_req_id;
icmsghdrp = (struct icmsg_hdr *)
&recv_buffer[sizeof(struct vmbuspipe_hdr)];
@@ -173,6 +156,17 @@ vss_respond_to_host(int error)
vmbus_sendpacket(channel, recv_buffer, buf_len, req_id,
VM_PKT_DATA_INBAND, 0);
+ /* We're ready to process next request, reset the device state */
+ if (vss_device.state == VSS_USERSPACE_RECV ||
+ vss_device.state == VSS_USERSPACE_REQ)
+ vss_device.state = VSS_READY;
+ /*
+ * Make sure device state was set before polling the channel as
+ * processing can happen on a different CPU.
+ */
+ smp_mb();
+
+ poll_channel(channel);
}
/*
@@ -186,19 +180,18 @@ void hv_vss_onchannelcallback(void *context)
u32 recvlen;
u64 requestid;
struct hv_vss_msg *vss_msg;
-
-
struct icmsg_hdr *icmsghdrp;
struct icmsg_negotiate *negop = NULL;
- if (vss_transaction.active) {
+ if (vss_device.state > VSS_READY) {
/*
* We will defer processing this callback once
* the current transaction is complete.
*/
- vss_transaction.recv_channel = channel;
+ vss_device.vss_context = channel;
return;
}
+ vss_device.vss_context = NULL;
vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, &recvlen,
&requestid);
@@ -221,11 +214,9 @@ void hv_vss_onchannelcallback(void *context)
* transaction; note transactions are serialized.
*/
- vss_transaction.recv_len = recvlen;
- vss_transaction.recv_channel = channel;
- vss_transaction.recv_req_id = requestid;
- vss_transaction.active = true;
- vss_transaction.msg = (struct hv_vss_msg *)vss_msg;
+ vss_device.recv_len = recvlen;
+ vss_device.recv_channel = channel;
+ vss_device.recv_req_id = requestid;
switch (vss_msg->vss_hdr.operation) {
/*
@@ -241,6 +232,15 @@ void hv_vss_onchannelcallback(void *context)
*/
case VSS_OP_FREEZE:
case VSS_OP_THAW:
+ if (vss_device.state != VSS_READY) {
+ /* Userspace daemon is not connected */
+ vss_respond_to_host(HV_E_FAIL);
+ return;
+ }
+
+ memcpy(&vss_device.host_msg, vss_msg,
+ sizeof(struct hv_vss_msg));
+ vss_device.state = VSS_HOSTMSG_RECEIVED;
schedule_work(&vss_send_op_work);
schedule_delayed_work(&vss_timeout_work,
VSS_USERSPACE_TIMEOUT);
@@ -275,14 +275,164 @@ void hv_vss_onchannelcallback(void *context)
}
-int
-hv_vss_init(struct hv_util_service *srv)
+static int vss_op_open(struct inode *inode, struct file *f)
{
- int err;
+ if (vss_device.state != VSS_DEVICE_INITIALIZING)
+ return -EBUSY;
+ vss_device.state = VSS_DEVICE_OPENED;
+ return 0;
+}
- err = cn_add_callback(&vss_id, vss_name, vss_cn_callback);
- if (err)
- return err;
+static int vss_op_release(struct inode *inode, struct file *f)
+{
+ vss_device.state = VSS_DEVICE_INITIALIZING;
+ return 0;
+}
+
+static ssize_t vss_op_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int ret = 0;
+ u32 val32;
+
+ if (vss_device.state == VSS_DEVICE_DYING)
+ return -EFAULT;
+
+ if (count != sizeof(u32)) {
+ pr_warn("vss_op_write: invalid write len: %d (expected: %d)\n",
+ (int)count, (int)sizeof(u32));
+ return -EINVAL;
+ }
+
+ mutex_lock(&vss_device.lock);
+
+ if (vss_device.state == VSS_USERSPACE_REQ) {
+ if (!copy_from_user(&val32, buf, sizeof(u32))) {
+ vss_device.state = VSS_USERSPACE_RECV;
+ if (cancel_delayed_work_sync(&vss_timeout_work))
+ vss_respond_to_host(val32);
+ ret = sizeof(u32);
+ } else
+ ret = -EFAULT;
+ } else {
+ pr_warn("vss_op_write: invalid transaction state: %d\n",
+ vss_device.state);
+ ret = -EINVAL;
+ }
+
+ mutex_unlock(&vss_device.lock);
+ return ret;
+}
+
+static ssize_t vss_op_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ ssize_t ret = 0;
+
+ if (vss_device.state == VSS_DEVICE_DYING)
+ return -EFAULT;
+
+ if (count != sizeof(struct hv_vss_msg)) {
+ pr_warn("vss_op_read: invalid read len: %d (expected: %d)\n",
+ (int)count, (int)sizeof(struct hv_vss_msg));
+ return -EINVAL;
+ }
+
+ if (wait_event_interruptible(vss_device.proc_list,
+ vss_device.state == VSS_USERMSG_READY ||
+ vss_device.state == VSS_DEVICE_DYING))
+ return -EFAULT;
+
+ if (vss_device.state != VSS_USERMSG_READY)
+ return -EFAULT;
+
+ mutex_lock(&vss_device.lock);
+
+ if (!copy_to_user(buf, &vss_device.user_msg,
+ sizeof(struct hv_vss_msg))) {
+ vss_device.state = VSS_USERSPACE_REQ;
+ ret = sizeof(struct hv_vss_msg);
+ } else
+ ret = -EFAULT;
+
+ mutex_unlock(&vss_device.lock);
+ return ret;
+}
+
+static unsigned int vss_op_poll(struct file *file, poll_table *wait)
+{
+ if (vss_device.state == VSS_DEVICE_DYING)
+ return -EFAULT;
+
+ poll_wait(file, &vss_device.proc_list, wait);
+ if (vss_device.state == VSS_USERMSG_READY)
+ return POLLIN | POLLRDNORM;
+ return 0;
+}
+
+static long vss_op_ioctl(struct file *fp,
+ unsigned int cmd, unsigned long arg)
+{
+ long ret = 0;
+ void __user *argp = (void __user *)arg;
+ u32 val32;
+
+ if (vss_device.state == VSS_DEVICE_DYING)
+ return -EFAULT;
+
+ /* The only ioctl we have is registation */
+ if (vss_device.state != VSS_DEVICE_OPENED)
+ return -EINVAL;
+
+ mutex_lock(&vss_device.lock);
+
+ switch (cmd) {
+ case HYPERV_VSS_REGISTER:
+ if (copy_from_user(&val32, argp, sizeof(val32))) {
+ ret = -EFAULT;
+ break;
+ }
+ if (!vss_handle_handshake(val32)) {
+ val32 = (u32)VSS_VERSION;
+ if (copy_to_user(argp, &val32, sizeof(val32))) {
+ ret = -EFAULT;
+ break;
+ }
+ vss_device.state = VSS_READY;
+ pr_info("VSS: user-mode registering done.\n");
+ if (vss_device.vss_context)
+ poll_channel(vss_device.vss_context);
+ } else
+ ret = -EINVAL;
+ break;
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ mutex_unlock(&vss_device.lock);
+ return ret;
+}
+
+static const struct file_operations vss_fops = {
+ .owner = THIS_MODULE,
+ .read = vss_op_read,
+ .write = vss_op_write,
+ .release = vss_op_release,
+ .open = vss_op_open,
+ .poll = vss_op_poll,
+ .unlocked_ioctl = vss_op_ioctl
+};
+
+static struct miscdevice vss_misc = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "vmbus/hv_vss",
+ .fops = &vss_fops,
+};
+
+int hv_vss_init(struct hv_util_service *srv)
+{
recv_buffer = srv->recv_buffer;
/*
@@ -291,13 +441,20 @@ hv_vss_init(struct hv_util_service *srv)
* Defer processing channel callbacks until the daemon
* has registered.
*/
- vss_transaction.active = true;
- return 0;
+ vss_device.state = VSS_DEVICE_INITIALIZING;
+ init_waitqueue_head(&vss_device.proc_list);
+ mutex_init(&vss_device.lock);
+
+ return misc_register(&vss_misc);
}
void hv_vss_deinit(void)
{
- cn_del_callback(&vss_id);
+ vss_device.state = VSS_DEVICE_DYING;
+ /* Make sure nobody sees the old state */
+ smp_mb();
+ wake_up_interruptible(&vss_device.proc_list);
cancel_delayed_work_sync(&vss_timeout_work);
cancel_work_sync(&vss_send_op_work);
+ misc_deregister(&vss_misc);
}
diff --git a/include/uapi/linux/hyperv.h b/include/uapi/linux/hyperv.h
index 80713a3..1939826 100644
--- a/include/uapi/linux/hyperv.h
+++ b/include/uapi/linux/hyperv.h
@@ -396,5 +396,6 @@ struct hv_kvp_ip_msg {
* either KVP_OP_REGISTER or KVP_OP_REGISTER1.
*/
#define HYPERV_KVP_REGISTER _IOWR('v', 0, __u32)
+#define HYPERV_VSS_REGISTER _IOWR('v', 0, __u32)
#endif /* _UAPI_HYPERV_H */
diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
index 5e63f70..d3f1fe9 100644
--- a/tools/hv/hv_vss_daemon.c
+++ b/tools/hv/hv_vss_daemon.c
@@ -19,7 +19,6 @@
#include <sys/types.h>
-#include <sys/socket.h>
#include <sys/poll.h>
#include <sys/ioctl.h>
#include <fcntl.h>
@@ -30,21 +29,11 @@
#include <string.h>
#include <ctype.h>
#include <errno.h>
-#include <arpa/inet.h>
#include <linux/fs.h>
-#include <linux/connector.h>
#include <linux/hyperv.h>
-#include <linux/netlink.h>
#include <syslog.h>
#include <getopt.h>
-static struct sockaddr_nl addr;
-
-#ifndef SOL_NETLINK
-#define SOL_NETLINK 270
-#endif
-
-
/* Don't use syslog() in the function since that can cause write to disk */
static int vss_do_freeze(char *dir, unsigned int cmd)
{
@@ -137,33 +126,6 @@ out:
return error;
}
-static int netlink_send(int fd, struct cn_msg *msg)
-{
- struct nlmsghdr nlh = { .nlmsg_type = NLMSG_DONE };
- unsigned int size;
- struct msghdr message;
- struct iovec iov[2];
-
- size = sizeof(struct cn_msg) + msg->len;
-
- nlh.nlmsg_pid = getpid();
- nlh.nlmsg_len = NLMSG_LENGTH(size);
-
- iov[0].iov_base = &nlh;
- iov[0].iov_len = sizeof(nlh);
-
- iov[1].iov_base = msg;
- iov[1].iov_len = size;
-
- memset(&message, 0, sizeof(message));
- message.msg_name = &addr;
- message.msg_namelen = sizeof(addr);
- message.msg_iov = iov;
- message.msg_iovlen = 2;
-
- return sendmsg(fd, &message, 0);
-}
-
void print_usage(char *argv[])
{
fprintf(stderr, "Usage: %s [options]\n"
@@ -174,17 +136,13 @@ void print_usage(char *argv[])
int main(int argc, char *argv[])
{
- int fd, len, nl_group;
- int error;
- struct cn_msg *message;
+ int vss_fd, len;
+ __u32 error;
struct pollfd pfd;
- struct nlmsghdr *incoming_msg;
- struct cn_msg *incoming_cn_msg;
int op;
- struct hv_vss_msg *vss_msg;
- char *vss_recv_buffer;
- size_t vss_recv_buffer_len;
+ struct hv_vss_msg vss_msg[1];
int daemonize = 1, long_index = 0, opt;
+ __u32 daemon_ver = 1; /* No special meaning */
static struct option long_options[] = {
{"help", no_argument, 0, 'h' },
@@ -211,98 +169,50 @@ int main(int argc, char *argv[])
openlog("Hyper-V VSS", 0, LOG_USER);
syslog(LOG_INFO, "VSS starting; pid is:%d", getpid());
- vss_recv_buffer_len = NLMSG_LENGTH(0) + sizeof(struct cn_msg) + sizeof(struct hv_vss_msg);
- vss_recv_buffer = calloc(1, vss_recv_buffer_len);
- if (!vss_recv_buffer) {
- syslog(LOG_ERR, "Failed to allocate netlink buffers");
- exit(EXIT_FAILURE);
- }
+ vss_fd = open("/dev/vmbus/hv_vss", O_RDWR);
- fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
- if (fd < 0) {
- syslog(LOG_ERR, "netlink socket creation failed; error:%d %s",
- errno, strerror(errno));
+ if (vss_fd < 0) {
+ syslog(LOG_ERR, "open /dev/vmbus/hv_vss failed; error: %d %s",
+ errno, strerror(errno));
exit(EXIT_FAILURE);
}
- addr.nl_family = AF_NETLINK;
- addr.nl_pad = 0;
- addr.nl_pid = 0;
- addr.nl_groups = 0;
-
- error = bind(fd, (struct sockaddr *)&addr, sizeof(addr));
- if (error < 0) {
- syslog(LOG_ERR, "bind failed; error:%d %s", errno, strerror(errno));
- close(fd);
- exit(EXIT_FAILURE);
- }
- nl_group = CN_VSS_IDX;
- if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &nl_group, sizeof(nl_group)) < 0) {
- syslog(LOG_ERR, "setsockopt failed; error:%d %s", errno, strerror(errno));
- close(fd);
- exit(EXIT_FAILURE);
- }
/*
* Register ourselves with the kernel.
*/
- message = (struct cn_msg *)vss_recv_buffer;
- message->id.idx = CN_VSS_IDX;
- message->id.val = CN_VSS_VAL;
- message->ack = 0;
- vss_msg = (struct hv_vss_msg *)message->data;
- vss_msg->vss_hdr.operation = VSS_OP_REGISTER;
-
- message->len = sizeof(struct hv_vss_msg);
-
- len = netlink_send(fd, message);
- if (len < 0) {
- syslog(LOG_ERR, "netlink_send failed; error:%d %s", errno, strerror(errno));
- close(fd);
+ if (ioctl(vss_fd, HYPERV_VSS_REGISTER, &daemon_ver)) {
+ syslog(LOG_ERR, "registration to kernel failed; error: %d %s",
+ errno, strerror(errno));
+ close(vss_fd);
exit(EXIT_FAILURE);
}
- pfd.fd = fd;
+ pfd.fd = vss_fd;
while (1) {
- struct sockaddr *addr_p = (struct sockaddr *) &addr;
- socklen_t addr_l = sizeof(addr);
pfd.events = POLLIN;
pfd.revents = 0;
if (poll(&pfd, 1, -1) < 0) {
syslog(LOG_ERR, "poll failed; error:%d %s", errno, strerror(errno));
if (errno == EINVAL) {
- close(fd);
+ close(vss_fd);
exit(EXIT_FAILURE);
}
else
continue;
}
- len = recvfrom(fd, vss_recv_buffer, vss_recv_buffer_len, 0,
- addr_p, &addr_l);
+ len = read(vss_fd, vss_msg, sizeof(struct hv_vss_msg));
- if (len < 0) {
- syslog(LOG_ERR, "recvfrom failed; pid:%u error:%d %s",
- addr.nl_pid, errno, strerror(errno));
- close(fd);
- return -1;
- }
+ if (len != sizeof(struct hv_vss_msg)) {
+ syslog(LOG_ERR, "read failed; error:%d %s",
+ errno, strerror(errno));
- if (addr.nl_pid) {
- syslog(LOG_WARNING,
- "Received packet from untrusted pid:%u",
- addr.nl_pid);
- continue;
+ close(vss_fd);
+ return EXIT_FAILURE;
}
- incoming_msg = (struct nlmsghdr *)vss_recv_buffer;
-
- if (incoming_msg->nlmsg_type != NLMSG_DONE)
- continue;
-
- incoming_cn_msg = (struct cn_msg *)NLMSG_DATA(incoming_msg);
- vss_msg = (struct hv_vss_msg *)incoming_cn_msg->data;
op = vss_msg->vss_hdr.operation;
error = HV_S_OK;
@@ -324,13 +234,14 @@ int main(int argc, char *argv[])
default:
syslog(LOG_ERR, "Illegal op:%d\n", op);
}
- vss_msg->error = error;
- len = netlink_send(fd, incoming_cn_msg);
- if (len < 0) {
- syslog(LOG_ERR, "net_link send failed; error:%d %s",
- errno, strerror(errno));
+
+ if (write(vss_fd, &error, sizeof(__u32)) != sizeof(__u32)) {
+ syslog(LOG_ERR, "write failed; error: %d %s", errno,
+ strerror(errno));
exit(EXIT_FAILURE);
}
}
+ close(vss_fd);
+ exit(0);
}
--
1.9.3
Re-implement fcopy in a consistent with "Drivers: hv: vss/kvp: convert
userspace/kernel communication to using char device" way.
In particular:
- Implement "state machine" for the driver instead of 3 separate syncronization
variables ('fcopy_transaction.active', 'fcopy_transaction.read_sema', 'opened')
- Use ioctl for kernel/userspace version negotiation.
- Support poll() operation.
- Set .owner = THIS_MODULE to prevent kernel crash if the module if being
unloaded while there is an active file descriptior in userspace.
- Use __u32 instead of int in userspace replies as it matches icmsg_hdr.status.
- Other minor changes to make drivers code look alike.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
drivers/hv/hv_fcopy.c | 395 ++++++++++++++++++++++++++------------------
include/uapi/linux/hyperv.h | 1 +
tools/hv/hv_fcopy_daemon.c | 48 ++++--
3 files changed, 264 insertions(+), 180 deletions(-)
diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index cd453e4..05c3580 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -28,6 +28,7 @@
#include <linux/sched.h>
#include <linux/uaccess.h>
#include <linux/miscdevice.h>
+#include <linux/poll.h>
#include "hyperv_vmbus.h"
@@ -35,6 +36,8 @@
#define WIN8_SRV_MINOR 1
#define WIN8_SRV_VERSION (WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
+#define MAX_FCOPY_CHSIZE (PAGE_SIZE * 2)
+
/*
* Global state maintained for transaction that is being processed.
* For a class of integration services, including the "file copy service",
@@ -47,36 +50,37 @@
* ensure this by serializing packet processing in this driver - we do not
* read additional packets from the VMBUs until the current packet is fully
* handled.
- *
- * The transaction "active" state is set when we receive a request from the
- * host and we cleanup this state when the transaction is completed - when we
- * respond to the host with our response. When the transaction active state is
- * set, we defer handling incoming packets.
*/
+enum fcopy_device_state {
+ FCOPY_DEVICE_INITIALIZING = 0, /* driver was loaded */
+ FCOPY_DEVICE_OPENED, /* device was opened */
+ FCOPY_READY, /* userspace registered */
+ FCOPY_HOSTMSG_RECEIVED, /* message from host was received */
+ FCOPY_USERMSG_READY, /* message for userspace is ready */
+ FCOPY_USERSPACE_REQ, /* request to userspace was sent */
+ FCOPY_USERSPACE_RECV, /* reply from userspace was received */
+ FCOPY_DEVICE_DYING, /* driver unload is in progress */
+};
+
static struct {
- bool active; /* transaction status - active or not */
+ int state; /* fcopy device state */
int recv_len; /* number of bytes received. */
- struct hv_fcopy_hdr *fcopy_msg; /* current message */
- struct hv_start_fcopy message; /* sent to daemon */
struct vmbus_channel *recv_channel; /* chn we got the request */
u64 recv_req_id; /* request ID. */
void *fcopy_context; /* for the channel callback */
- struct semaphore read_sema;
-} fcopy_transaction;
-
-static bool opened; /* currently device opened */
+ int dm_reg_value; /* daemon version number */
+ struct mutex lock; /* syncronization */
+ u8 user_msg[MAX_FCOPY_CHSIZE]; /* message to userspace */
+ u8 host_msg[MAX_FCOPY_CHSIZE]; /* message to/from host */
+ wait_queue_head_t proc_list; /* waiting processes */
+} fcopy_device;
-/*
- * Before we can accept copy messages from the host, we need
- * to handshake with the user level daemon. This state tracks
- * if we are in the handshake phase.
- */
-static bool in_hand_shake = true;
-static void fcopy_send_data(void);
static void fcopy_respond_to_host(int error);
static void fcopy_work_func(struct work_struct *dummy);
+static void fcopy_send_op(struct work_struct *dummy);
static DECLARE_DELAYED_WORK(fcopy_work, fcopy_work_func);
+static DECLARE_WORK(fcopy_send_op_work, fcopy_send_op);
static u8 *recv_buffer;
static void fcopy_work_func(struct work_struct *dummy)
@@ -86,22 +90,22 @@ static void fcopy_work_func(struct work_struct *dummy)
* process the pending transaction.
*/
fcopy_respond_to_host(HV_E_FAIL);
+}
- /* In the case the user-space daemon crashes, hangs or is killed, we
- * need to down the semaphore, otherwise, after the daemon starts next
- * time, the obsolete data in fcopy_transaction.message or
- * fcopy_transaction.fcopy_msg will be used immediately.
- *
- * NOTE: fcopy_read() happens to get the semaphore (very rare)? We're
- * still OK, because we've reported the failure to the host.
- */
- if (down_trylock(&fcopy_transaction.read_sema))
- ;
-
+static void poll_channel(struct vmbus_channel *channel)
+{
+ if (channel->target_cpu != smp_processor_id())
+ smp_call_function_single(channel->target_cpu,
+ hv_fcopy_onchannelcallback,
+ channel, true);
+ else
+ hv_fcopy_onchannelcallback(channel);
}
static int fcopy_handle_handshake(u32 version)
{
+ int ret = 0;
+
switch (version) {
case FCOPY_CURRENT_VERSION:
break;
@@ -112,23 +116,19 @@ static int fcopy_handle_handshake(u32 version)
* deal with, we will be backward compatible.
* We will add this code when needed.
*/
- return -EINVAL;
+ ret = -EINVAL;
}
- pr_info("FCP: user-mode registering done. Daemon version: %d\n",
- version);
- fcopy_transaction.active = false;
- if (fcopy_transaction.fcopy_context)
- hv_fcopy_onchannelcallback(fcopy_transaction.fcopy_context);
- in_hand_shake = false;
return 0;
}
-static void fcopy_send_data(void)
+static void fcopy_send_op(struct work_struct *dummy)
{
- struct hv_start_fcopy *smsg_out = &fcopy_transaction.message;
- int operation = fcopy_transaction.fcopy_msg->operation;
+ struct hv_start_fcopy *smsg_out;
+ struct hv_do_fcopy *dmsg_out;
struct hv_start_fcopy *smsg_in;
+ mutex_lock(&fcopy_device.lock);
+
/*
* The strings sent from the host are encoded in
* in utf16; convert it to utf8 strings.
@@ -140,11 +140,14 @@ static void fcopy_send_data(void)
* that the strings can be properly terminated!
*/
- switch (operation) {
+ switch (((struct hv_fcopy_hdr *)fcopy_device.host_msg)->operation) {
case START_FILE_COPY:
- memset(smsg_out, 0, sizeof(struct hv_start_fcopy));
- smsg_out->hdr.operation = operation;
- smsg_in = (struct hv_start_fcopy *)fcopy_transaction.fcopy_msg;
+ memset(&fcopy_device.user_msg, 0,
+ sizeof(struct hv_start_fcopy));
+
+ smsg_out = (struct hv_start_fcopy *)fcopy_device.user_msg;
+ smsg_out->hdr.operation = START_FILE_COPY;
+ smsg_in = (struct hv_start_fcopy *)fcopy_device.host_msg;
utf16s_to_utf8s((wchar_t *)smsg_in->file_name, W_MAX_PATH,
UTF16_LITTLE_ENDIAN,
@@ -159,9 +162,16 @@ static void fcopy_send_data(void)
break;
default:
+ dmsg_out = (struct hv_do_fcopy *)fcopy_device.user_msg;
+ memcpy(fcopy_device.user_msg, fcopy_device.host_msg,
+ sizeof(struct hv_do_fcopy));
break;
}
- up(&fcopy_transaction.read_sema);
+
+ fcopy_device.state = FCOPY_USERMSG_READY;
+ wake_up_interruptible(&fcopy_device.proc_list);
+
+ mutex_unlock(&fcopy_device.lock);
return;
}
@@ -169,8 +179,7 @@ static void fcopy_send_data(void)
* Send a response back to the host.
*/
-static void
-fcopy_respond_to_host(int error)
+static void fcopy_respond_to_host(int error)
{
struct icmsg_hdr *icmsghdr;
u32 buf_len;
@@ -185,11 +194,9 @@ fcopy_respond_to_host(int error)
* only be one active transaction at a time.
*/
- buf_len = fcopy_transaction.recv_len;
- channel = fcopy_transaction.recv_channel;
- req_id = fcopy_transaction.recv_req_id;
-
- fcopy_transaction.active = false;
+ buf_len = fcopy_device.recv_len;
+ channel = fcopy_device.recv_channel;
+ req_id = fcopy_device.recv_req_id;
icmsghdr = (struct icmsg_hdr *)
&recv_buffer[sizeof(struct vmbuspipe_hdr)];
@@ -205,6 +212,20 @@ fcopy_respond_to_host(int error)
icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
vmbus_sendpacket(channel, recv_buffer, buf_len, req_id,
VM_PKT_DATA_INBAND, 0);
+
+
+ /* We're ready to process next request, reset the device state */
+ if (fcopy_device.state == FCOPY_USERSPACE_RECV ||
+ fcopy_device.state == FCOPY_USERSPACE_REQ)
+ fcopy_device.state = FCOPY_READY;
+
+ /*
+ * Make sure device state was set before polling the channel as
+ * processing can happen on a different CPU.
+ */
+ smp_mb();
+
+ poll_channel(channel);
}
void hv_fcopy_onchannelcallback(void *context)
@@ -218,16 +239,17 @@ void hv_fcopy_onchannelcallback(void *context)
int util_fw_version;
int fcopy_srv_version;
- if (fcopy_transaction.active) {
+ if (fcopy_device.state > FCOPY_READY) {
/*
* We will defer processing this callback once
* the current transaction is complete.
*/
- fcopy_transaction.fcopy_context = context;
+ fcopy_device.fcopy_context = channel;
return;
}
+ fcopy_device.fcopy_context = NULL;
- vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, &recvlen,
+ vmbus_recvpacket(channel, recv_buffer, MAX_FCOPY_CHSIZE, &recvlen,
&requestid);
if (recvlen <= 0)
return;
@@ -235,6 +257,10 @@ void hv_fcopy_onchannelcallback(void *context)
icmsghdr = (struct icmsg_hdr *)&recv_buffer[
sizeof(struct vmbuspipe_hdr)];
if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
+ /*
+ * Process negotiation even before usersace daemon is connected
+ * as we can timeout othervise.
+ */
util_fw_version = UTIL_FW_VERSION;
fcopy_srv_version = WIN8_SRV_VERSION;
vmbus_prep_negotiate_resp(icmsghdr, negop, recv_buffer,
@@ -249,17 +275,26 @@ void hv_fcopy_onchannelcallback(void *context)
* transaction; note transactions are serialized.
*/
- fcopy_transaction.active = true;
- fcopy_transaction.recv_len = recvlen;
- fcopy_transaction.recv_channel = channel;
- fcopy_transaction.recv_req_id = requestid;
- fcopy_transaction.fcopy_msg = fcopy_msg;
+ fcopy_device.recv_len = recvlen;
+ fcopy_device.recv_channel = channel;
+ fcopy_device.recv_req_id = requestid;
+
+ if (fcopy_device.state != FCOPY_READY) {
+ /* Userspace daemon is not connected, just fail. */
+ fcopy_respond_to_host(HV_E_FAIL);
+ return;
+ }
+
+ memcpy(fcopy_device.host_msg, fcopy_msg, recvlen -
+ (sizeof(struct vmbuspipe_hdr) +
+ sizeof(struct icmsg_hdr)));
+ fcopy_device.state = FCOPY_HOSTMSG_RECEIVED;
/*
* Send the information to the user-level daemon.
*/
+ schedule_work(&fcopy_send_op_work);
schedule_delayed_work(&fcopy_work, 5*HZ);
- fcopy_send_data();
return;
}
icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
@@ -272,121 +307,170 @@ void hv_fcopy_onchannelcallback(void *context)
* the payload.
*/
-static ssize_t fcopy_read(struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
+static ssize_t fcopy_op_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
{
- void *src;
- size_t copy_size;
- int operation;
+ ssize_t ret = 0;
+ int copy_size;
+ struct hv_fcopy_hdr *hdr;
+
+ if (wait_event_interruptible(fcopy_device.proc_list,
+ fcopy_device.state == FCOPY_USERMSG_READY
+ ||
+ fcopy_device.state == FCOPY_DEVICE_DYING))
+ return -EFAULT;
- /*
- * Wait until there is something to be read.
- */
- if (down_interruptible(&fcopy_transaction.read_sema))
- return -EINTR;
+ if (fcopy_device.state != FCOPY_USERMSG_READY)
+ return -EFAULT;
- /*
- * The channel may be rescinded and in this case, we will wakeup the
- * the thread blocked on the semaphore and we will use the opened
- * state to correctly handle this case.
- */
- if (!opened)
- return -ENODEV;
+ mutex_lock(&fcopy_device.lock);
- operation = fcopy_transaction.fcopy_msg->operation;
+ hdr = (struct hv_fcopy_hdr *)fcopy_device.user_msg;
- if (operation == START_FILE_COPY) {
- src = &fcopy_transaction.message;
+ if (hdr->operation == START_FILE_COPY)
copy_size = sizeof(struct hv_start_fcopy);
- if (count < copy_size)
- return 0;
- } else {
- src = fcopy_transaction.fcopy_msg;
+ else
copy_size = sizeof(struct hv_do_fcopy);
- if (count < copy_size)
- return 0;
+
+ if (count < copy_size) {
+ pr_warn("fcopy_op_read: invalid read len: %d (expected: %d)\n",
+ (int)count, copy_size);
+ mutex_unlock(&fcopy_device.lock);
+ return -EINVAL;
}
- if (copy_to_user(buf, src, copy_size))
- return -EFAULT;
- return copy_size;
+ if (!copy_to_user(buf, fcopy_device.user_msg, copy_size)) {
+ fcopy_device.state = FCOPY_USERSPACE_REQ;
+ ret = copy_size;
+ } else
+ ret = -EFAULT;
+
+ mutex_unlock(&fcopy_device.lock);
+ return ret;
}
-static ssize_t fcopy_write(struct file *file, const char __user *buf,
- size_t count, loff_t *ppos)
+static ssize_t fcopy_op_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
{
- int response = 0;
+ int ret = 0;
+ u32 val32;
- if (count != sizeof(int))
- return -EINVAL;
-
- if (copy_from_user(&response, buf, sizeof(int)))
+ if (fcopy_device.state == FCOPY_DEVICE_DYING)
return -EFAULT;
- if (in_hand_shake) {
- if (fcopy_handle_handshake(response))
- return -EINVAL;
- return sizeof(int);
+ if (count != sizeof(u32)) {
+ pr_warn("fcopy_op_write: invalid write len: %d (expected: %d)\n",
+ (int)count, (int)sizeof(u32));
+ return -EINVAL;
}
- /*
- * Complete the transaction by forwarding the result
- * to the host. But first, cancel the timeout.
- */
- if (cancel_delayed_work_sync(&fcopy_work))
- fcopy_respond_to_host(response);
+ mutex_lock(&fcopy_device.lock);
- return sizeof(int);
+ if (fcopy_device.state == FCOPY_USERSPACE_REQ) {
+ if (!copy_from_user(&val32, buf, sizeof(u32))) {
+ fcopy_device.state = FCOPY_USERSPACE_RECV;
+ if (cancel_delayed_work_sync(&fcopy_work))
+ fcopy_respond_to_host(val32);
+ ret = sizeof(u32);
+ } else
+ ret = -EFAULT;
+ } else {
+ pr_warn("fcopy_op_write: invalid transaction state: %d\n",
+ fcopy_device.state);
+ ret = -EINVAL;
+ }
+
+ mutex_unlock(&fcopy_device.lock);
+ return ret;
}
-static int fcopy_open(struct inode *inode, struct file *f)
+static int fcopy_op_open(struct inode *inode, struct file *f)
{
/*
* The user level daemon that will open this device is
* really an extension of this driver. We can have only
* active open at a time.
*/
- if (opened)
+ if (fcopy_device.state != FCOPY_DEVICE_INITIALIZING)
return -EBUSY;
-
- /*
- * The daemon is alive; setup the state.
- */
- opened = true;
+ fcopy_device.state = FCOPY_DEVICE_OPENED;
return 0;
}
-/* XXX: there are still some tricky corner cases, e.g.,
- * 1) In a SMP guest, when fcopy_release() runs between
- * schedule_delayed_work() and fcopy_send_data(), there is
- * still a chance an obsolete message will be queued.
- *
- * 2) When the fcopy daemon is running, if we unload the driver,
- * we'll notice a kernel oops when we kill the daemon later.
- */
-static int fcopy_release(struct inode *inode, struct file *f)
+static int fcopy_op_release(struct inode *inode, struct file *f)
{
/*
* The daemon has exited; reset the state.
*/
- in_hand_shake = true;
- opened = false;
-
- if (cancel_delayed_work_sync(&fcopy_work)) {
- /* We haven't up()-ed the semaphore(very rare)? */
- if (down_trylock(&fcopy_transaction.read_sema))
- ;
- fcopy_respond_to_host(HV_E_FAIL);
- }
+ fcopy_device.state = FCOPY_DEVICE_INITIALIZING;
+ return 0;
+}
+
+static unsigned int fcopy_op_poll(struct file *file, poll_table *wait)
+{
+ if (fcopy_device.state == FCOPY_DEVICE_DYING)
+ return -EFAULT;
+
+ poll_wait(file, &fcopy_device.proc_list, wait);
+ if (fcopy_device.state == FCOPY_USERMSG_READY)
+ return POLLIN | POLLRDNORM;
return 0;
}
+static long fcopy_op_ioctl(struct file *fp,
+ unsigned int cmd, unsigned long arg)
+{
+ long ret = 0;
+ void __user *argp = (void __user *)arg;
+ u32 val32;
+
+ if (fcopy_device.state == FCOPY_DEVICE_DYING)
+ return -EFAULT;
+
+ /* The only ioctl we have is registation */
+ if (fcopy_device.state != FCOPY_DEVICE_OPENED)
+ return -EINVAL;
+
+ mutex_lock(&fcopy_device.lock);
+
+ switch (cmd) {
+ case HYPERV_FCOPY_REGISTER:
+ if (copy_from_user(&val32, argp, sizeof(val32))) {
+ ret = -EFAULT;
+ break;
+ }
+ if (!fcopy_handle_handshake(val32)) {
+ /* No special meaning for userspace part for now. */
+ val32 = (u32)WIN8_SRV_VERSION;
+ if (copy_to_user(argp, &val32, sizeof(val32))) {
+ ret = -EFAULT;
+ break;
+ }
+ fcopy_device.state = FCOPY_READY;
+ pr_info("FCOPY: user-mode registering done.\n");
+ if (fcopy_device.fcopy_context)
+ poll_channel(fcopy_device.fcopy_context);
+ } else
+ ret = -EINVAL;
+ break;
+
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ mutex_unlock(&fcopy_device.lock);
+ return ret;
+}
static const struct file_operations fcopy_fops = {
- .read = fcopy_read,
- .write = fcopy_write,
- .release = fcopy_release,
- .open = fcopy_open,
+ .owner = THIS_MODULE,
+ .read = fcopy_op_read,
+ .write = fcopy_op_write,
+ .release = fcopy_op_release,
+ .open = fcopy_op_open,
+ .poll = fcopy_op_poll,
+ .unlocked_ioctl = fcopy_op_ioctl
};
static struct miscdevice fcopy_misc = {
@@ -395,29 +479,6 @@ static struct miscdevice fcopy_misc = {
.fops = &fcopy_fops,
};
-static int fcopy_dev_init(void)
-{
- return misc_register(&fcopy_misc);
-}
-
-static void fcopy_dev_deinit(void)
-{
-
- /*
- * The device is going away - perhaps because the
- * host has rescinded the channel. Setup state so that
- * user level daemon can gracefully exit if it is blocked
- * on the read semaphore.
- */
- opened = false;
- /*
- * Signal the semaphore as the device is
- * going away.
- */
- up(&fcopy_transaction.read_sema);
- misc_deregister(&fcopy_misc);
-}
-
int hv_fcopy_init(struct hv_util_service *srv)
{
recv_buffer = srv->recv_buffer;
@@ -428,14 +489,20 @@ int hv_fcopy_init(struct hv_util_service *srv)
* Defer processing channel callbacks until the daemon
* has registered.
*/
- fcopy_transaction.active = true;
- sema_init(&fcopy_transaction.read_sema, 0);
+ fcopy_device.state = FCOPY_DEVICE_INITIALIZING;
+ init_waitqueue_head(&fcopy_device.proc_list);
+ mutex_init(&fcopy_device.lock);
- return fcopy_dev_init();
+ return misc_register(&fcopy_misc);
}
void hv_fcopy_deinit(void)
{
+ fcopy_device.state = FCOPY_DEVICE_DYING;
+ /* Make sure nobody sees the old state */
+ smp_mb();
+ wake_up_interruptible(&fcopy_device.proc_list);
cancel_delayed_work_sync(&fcopy_work);
- fcopy_dev_deinit();
+ cancel_work_sync(&fcopy_send_op_work);
+ misc_deregister(&fcopy_misc);
}
diff --git a/include/uapi/linux/hyperv.h b/include/uapi/linux/hyperv.h
index 1939826..590a2f4 100644
--- a/include/uapi/linux/hyperv.h
+++ b/include/uapi/linux/hyperv.h
@@ -397,5 +397,6 @@ struct hv_kvp_ip_msg {
*/
#define HYPERV_KVP_REGISTER _IOWR('v', 0, __u32)
#define HYPERV_VSS_REGISTER _IOWR('v', 0, __u32)
+#define HYPERV_FCOPY_REGISTER _IOWR('v', 0, __u32)
#endif /* _UAPI_HYPERV_H */
diff --git a/tools/hv/hv_fcopy_daemon.c b/tools/hv/hv_fcopy_daemon.c
index 9445d8f..2ae8196 100644
--- a/tools/hv/hv_fcopy_daemon.c
+++ b/tools/hv/hv_fcopy_daemon.c
@@ -18,19 +18,16 @@
#include <sys/types.h>
-#include <sys/socket.h>
#include <sys/poll.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
#include <linux/types.h>
-#include <linux/kdev_t.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
-#include <string.h>
-#include <ctype.h>
#include <errno.h>
#include <linux/hyperv.h>
#include <syslog.h>
-#include <sys/stat.h>
#include <fcntl.h>
#include <dirent.h>
#include <getopt.h>
@@ -132,7 +129,8 @@ void print_usage(char *argv[])
int main(int argc, char *argv[])
{
int fcopy_fd, len;
- int error;
+ __u32 error;
+ struct pollfd pfd;
int daemonize = 1, long_index = 0, opt;
int version = FCOPY_CURRENT_VERSION;
char *buffer[4096 * 2];
@@ -176,19 +174,33 @@ int main(int argc, char *argv[])
/*
* Register with the kernel.
*/
- if ((write(fcopy_fd, &version, sizeof(int))) != sizeof(int)) {
- syslog(LOG_ERR, "Registration failed: %s", strerror(errno));
+ if (ioctl(fcopy_fd, HYPERV_FCOPY_REGISTER, &version)) {
+ syslog(LOG_ERR, "registration to kernel failed; error: %d %s",
+ errno, strerror(errno));
+ close(fcopy_fd);
exit(EXIT_FAILURE);
}
+ pfd.fd = fcopy_fd;
+
while (1) {
- /*
- * In this loop we process fcopy messages after the
- * handshake is complete.
- */
- len = pread(fcopy_fd, buffer, (4096 * 2), 0);
+ pfd.events = POLLIN;
+ pfd.revents = 0;
+
+ if (poll(&pfd, 1, -1) < 0) {
+ syslog(LOG_ERR, "poll failed; error:%d %s", errno,
+ strerror(errno));
+ if (errno == EINVAL) {
+ close(fcopy_fd);
+ exit(EXIT_FAILURE);
+ } else
+ continue;
+ }
+
+ len = read(fcopy_fd, buffer, (4096 * 2));
if (len < 0) {
- syslog(LOG_ERR, "pread failed: %s", strerror(errno));
+ syslog(LOG_ERR, "read failed: %d %s", errno,
+ strerror(errno));
exit(EXIT_FAILURE);
}
in_msg = (struct hv_fcopy_hdr *)buffer;
@@ -213,9 +225,13 @@ int main(int argc, char *argv[])
}
- if (pwrite(fcopy_fd, &error, sizeof(int), 0) != sizeof(int)) {
- syslog(LOG_ERR, "pwrite failed: %s", strerror(errno));
+ if (write(fcopy_fd, &error, sizeof(__u32)) != sizeof(__u32)) {
+ syslog(LOG_ERR, "write failed: %d %s", errno,
+ strerror(errno));
exit(EXIT_FAILURE);
}
}
+
+ close(fcopy_fd);
+ exit(0);
}
--
1.9.3
2015-02-27 17:14+0100, Vitaly Kuznetsov:
> Re-implement the communication using misc char device. Use ioctl to do
> kernel/userspace version negotiation (doesn't make much sense at this moment
> as we're breaking backwards compatibility but can be used in future).
The main question is whether we want to abolish backward compatibility;
kernel rules are usually against breakages and it's hard to prove that
the bundled daemon is a sole user and gets updated at the same time.
(Note: I'd gladly break anything.)
The ioctl is used too creatively for my liking: as an out-of-band
communication that is required after the main channel has been opened.
It would be simpler to inject the version into first x bytes of the
stream, making a read() after open() mandatory.
(I've only done a high level overview so far.)
2015-02-27 17:14+0100, Vitaly Kuznetsov:
> This series converts kvp/vss daemons to use misc char devices instead of
> netlink for userspace/kernel communication and then updates fcopy to be
> consistent with kvp/vss.
>
> Userspace/kernel communication via netlink has a number of issues:
> - It is hard for userspace to figure out if the kernel part was loaded or not
> and this fact can change as there is a way to enable/disable the service from
> host side.
(Hm, this should be just a message to the userspace daemon, but netlink
probably makes it complicated anyway.)
> Racy daemon startup is also a problem.
(Is it significantly worse than what we need to protect devices?)
> - When the userspace daemon restarts/dies kernel part doesn't receive a
> notification.
(True, we could use a other-side-closed callback.)
> - Netlink communication is not stable under heavy load.
(The message order changes?)
> RFC: I'm a bit puzzled on how to split commits 1 and 2 avoiding breakages.
Split the userspace part -- it won't break bisects.
And then, you could refactor drivers first ... the way we communicate
with userspace should have little impact on what the rest does (or how).
At first sight, there are three units, apart from glue,
1) communication with host
2) communication with userspace
3) repacking of data between first two
With an API for userspace communication, the amount of code to replace
netlink could be lower and resulting patches definitely easier to
review. (And with extra work, both ABIs could even live side-by-side ;)