* * *
This series of VMCI linux upstreaming patches include latest udpate from
VMware.
Summary of changes:
- Fix some new sparse issues.
- Remove some unneeded casts for VMCI.
- add more __user annotations for VMCI.
- Remove kernel version-specific bits from vSockets.
* * *
In an effort to improve the out-of-the-box experience with Linux
kernels for VMware users, VMware is working on readying the Virtual
Machine Communication Interface (vmw_vmci) and VMCI Sockets
(vmw_vsock) kernel modules for inclusion in the Linux kernel. The
purpose of this post is to acquire feedback on the vmw_vmci kernel
module. The vmw_vsock kernel module will be presented in a later post.
* * *
VMCI allows virtual machines to communicate with host kernel modules
and the VMware hypervisors. User level applications both in a virtual
machine and on the host can use vmw_vmci through VMCI Sockets, a socket
address family designed to be compatible with UDP and TCP at the
interface level. Today, VMCI and VMCI Sockets are used by the VMware
shared folders (HGFS) and various VMware Tools components inside the
guest for zero-config, network-less access to VMware host services. In
addition to this, VMware's users are using VMCI Sockets for various
applications, where network access of the virtual machine is
restricted or non-existent. Examples of this are VMs communicating
with device proxies for proprietary hardware running as host
applications and automated testing of applications running within
virtual machines.
In a virtual machine, VMCI is exposed as a regular PCI device. The
primary communication mechanisms supported are a point-to-point
bidirectional transport based on a pair of memory-mapped queues, and
asynchronous notifications in the form of datagrams and
doorbells. These features are available to kernel level components
such as HGFS and VMCI Sockets through the VMCI kernel API. In addition
to this, the VMCI kernel API provides support for receiving events
related to the state of the VMCI communication channels, and the
virtual machine itself.
Outside the virtual machine, the host side support of the VMCI kernel
module makes the same VMCI kernel API available to VMCI endpoints on
the host. In addition to this, the host side manages each VMCI device
in a virtual machine through a context object. This context object
serves to identify the virtual machine for communication, and to track
the resource consumption of the given VMCI device. Both operations
related to communication between the virtual machine and the host
kernel, and those related to the management of the VMCI device state
in the host kernel, are invoked by the user level component of the
hypervisor through a set of ioctls on the VMCI device node. To
provide seamless support for nested virtualization, where a virtual
machine may use both a VMCI PCI device to talk to its hypervisor, and
the VMCI host side support to run nested virtual machines, the VMCI
host and virtual machine support are combined in a single kernel
module.
For additional information about the use of VMCI and in particular
VMCI Sockets, please refer to the VMCI Socket Programming Guide
available at https://www.vmware.com/support/developer/vmci-sdk/.
---
George Zhang (12):
VMCI: context implementation.
VMCI: datagram implementation.
VMCI: doorbell implementation.
VMCI: device driver implementaton.
VMCI: event handling implementation.
VMCI: handle array implementation.
VMCI: queue pairs implementation.
VMCI: resource object implementation.
VMCI: routing implementation.
VMCI: guest side driver implementation.
VMCI: host side driver implementation.
VMCI: Some header and config files.
drivers/misc/Kconfig | 1
drivers/misc/Makefile | 2
drivers/misc/vmw_vmci/Kconfig | 16
drivers/misc/vmw_vmci/Makefile | 4
drivers/misc/vmw_vmci/vmci_common_int.h | 34
drivers/misc/vmw_vmci/vmci_context.c | 1246 ++++++++++
drivers/misc/vmw_vmci/vmci_context.h | 183 ++
drivers/misc/vmw_vmci/vmci_datagram.c | 506 ++++
drivers/misc/vmw_vmci/vmci_datagram.h | 52
drivers/misc/vmw_vmci/vmci_doorbell.c | 605 +++++
drivers/misc/vmw_vmci/vmci_doorbell.h | 51
drivers/misc/vmw_vmci/vmci_driver.c | 116 +
drivers/misc/vmw_vmci/vmci_driver.h | 50
drivers/misc/vmw_vmci/vmci_event.c | 229 ++
drivers/misc/vmw_vmci/vmci_event.h | 25
drivers/misc/vmw_vmci/vmci_guest.c | 762 ++++++
drivers/misc/vmw_vmci/vmci_handle_array.c | 142 +
drivers/misc/vmw_vmci/vmci_handle_array.h | 52
drivers/misc/vmw_vmci/vmci_host.c | 1033 +++++++++
drivers/misc/vmw_vmci/vmci_queue_pair.c | 3506 +++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_queue_pair.h | 191 ++
drivers/misc/vmw_vmci/vmci_resource.c | 232 ++
drivers/misc/vmw_vmci/vmci_resource.h | 59
drivers/misc/vmw_vmci/vmci_route.c | 229 ++
drivers/misc/vmw_vmci/vmci_route.h | 30
include/linux/vmw_vmci_api.h | 82 +
include/linux/vmw_vmci_defs.h | 973 ++++++++
27 files changed, 10411 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/Kconfig
create mode 100644 drivers/misc/vmw_vmci/Makefile
create mode 100644 drivers/misc/vmw_vmci/vmci_common_int.h
create mode 100644 drivers/misc/vmw_vmci/vmci_context.c
create mode 100644 drivers/misc/vmw_vmci/vmci_context.h
create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.c
create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.h
create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.c
create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.h
create mode 100644 drivers/misc/vmw_vmci/vmci_driver.c
create mode 100644 drivers/misc/vmw_vmci/vmci_driver.h
create mode 100644 drivers/misc/vmw_vmci/vmci_event.c
create mode 100644 drivers/misc/vmw_vmci/vmci_event.h
create mode 100644 drivers/misc/vmw_vmci/vmci_guest.c
create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.c
create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.h
create mode 100644 drivers/misc/vmw_vmci/vmci_host.c
create mode 100644 drivers/misc/vmw_vmci/vmci_queue_pair.c
create mode 100644 drivers/misc/vmw_vmci/vmci_queue_pair.h
create mode 100644 drivers/misc/vmw_vmci/vmci_resource.c
create mode 100644 drivers/misc/vmw_vmci/vmci_resource.h
create mode 100644 drivers/misc/vmw_vmci/vmci_route.c
create mode 100644 drivers/misc/vmw_vmci/vmci_route.h
create mode 100644 include/linux/vmw_vmci_api.h
create mode 100644 include/linux/vmw_vmci_defs.h
--
Signature
VMCI Context code maintains state for vmci and allows the driver to communicate with
multiple VMs.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_context.c | 1246 ++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_context.h | 183 +++++
2 files changed, 1429 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_context.c
create mode 100644 drivers/misc/vmw_vmci/vmci_context.h
diff --git a/drivers/misc/vmw_vmci/vmci_context.c b/drivers/misc/vmw_vmci/vmci_context.c
new file mode 100644
index 0000000..a46bfaa
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_context.c
@@ -0,0 +1,1246 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+/*
+ * List of current VMCI contexts. Contexts can be added by
+ * vmci_ctx_create() and removed via vmci_ctx_destroy().
+ * These, along with context lookup, are protected by the
+ * list structure's lock.
+ */
+static struct {
+ struct list_head head;
+ spinlock_t lock; /* Spinlock for context list operations */
+} ctx_list = {
+ .head = LIST_HEAD_INIT(ctx_list.head),
+ .lock = __SPIN_LOCK_UNLOCKED(ctx_list.lock),
+};
+
+/* Used by contexts that did not set up notify flag pointers */
+static bool ctx_dummy_notify;
+
+static void ctx_signal_notify(struct vmci_ctx *context)
+{
+ *context->notify = true;
+}
+
+static void ctx_clear_notify(struct vmci_ctx *context)
+{
+ *context->notify = false;
+}
+
+/*
+ * If nothing requires the attention of the guest, clears both
+ * notify flag and call.
+ */
+static void ctx_clear_notify_call(struct vmci_ctx *context)
+{
+ if (context->pending_datagrams == 0 &&
+ vmci_handle_arr_get_size(context->pending_doorbell_array) == 0)
+ ctx_clear_notify(context);
+}
+
+/*
+ * Sets the context's notify flag iff datagrams are pending for this
+ * context. Called from vmci_setup_notify().
+ */
+void vmci_ctx_check_signal_notify(struct vmci_ctx *context)
+{
+ spin_lock(&context->lock);
+ if (context->pending_datagrams)
+ ctx_signal_notify(context);
+ spin_unlock(&context->lock);
+}
+
+/*
+ * Allocates and initializes a VMCI context.
+ */
+struct vmci_ctx *vmci_ctx_create(u32 cid, u32 priv_flags,
+ uintptr_t event_hnd,
+ int user_version,
+ const struct cred *cred)
+{
+ struct vmci_ctx *context;
+ int error;
+
+ if (cid == VMCI_INVALID_ID) {
+ pr_devel("Invalid context ID for VMCI context.\n");
+ error = -EINVAL;
+ goto err_out;
+ }
+
+ if (priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS) {
+ pr_devel("Invalid flag (flags=0x%x) for VMCI context.\n",
+ priv_flags);
+ error = -EINVAL;
+ goto err_out;
+ }
+
+ if (user_version == 0) {
+ pr_devel("Invalid suer_version %d\n", user_version);
+ error = -EINVAL;
+ goto err_out;
+ }
+
+ context = kzalloc(sizeof(*context), GFP_KERNEL);
+ if (!context) {
+ pr_warn("Failed to allocate memory for VMCI context.\n");
+ error = -EINVAL;
+ goto err_out;
+ }
+
+ kref_init(&context->kref);
+ spin_lock_init(&context->lock);
+ INIT_LIST_HEAD(&context->list_item);
+ INIT_LIST_HEAD(&context->datagram_queue);
+ INIT_LIST_HEAD(&context->notifier_list);
+
+ /* Initialize host-specific VMCI context. */
+ init_waitqueue_head(&context->host_context.wait_queue);
+
+ context->queue_pair_array = vmci_handle_arr_create(0);
+ if (!context->queue_pair_array) {
+ error = -ENOMEM;
+ goto err_free_ctx;
+ }
+
+ context->doorbell_array = vmci_handle_arr_create(0);
+ if (!context->doorbell_array) {
+ error = -ENOMEM;
+ goto err_free_qp_array;
+ }
+
+ context->pending_doorbell_array = vmci_handle_arr_create(0);
+ if (!context->pending_doorbell_array) {
+ error = -ENOMEM;
+ goto err_free_db_array;
+ }
+
+ context->user_version = user_version;
+
+ context->priv_flags = priv_flags;
+
+ if (cred)
+ context->cred = get_cred(cred);
+
+ context->notify = &ctx_dummy_notify;
+ context->notify_page = NULL;
+
+ /*
+ * If we collide with an existing context we generate a new
+ * and use it instead. The VMX will determine if regeneration
+ * is okay. Since there isn't 4B - 16 VMs running on a given
+ * host, the below loop will terminate.
+ */
+ spin_lock(&ctx_list.lock);
+
+ while (vmci_ctx_exists(cid)) {
+
+ /*
+ * If the cid is below our limit and we collide we are
+ * creating duplicate contexts internally so we want
+ * to assert fail in that case.
+ */
+ ASSERT(cid >= VMCI_RESERVED_CID_LIMIT);
+
+ /* We reserve the lowest 16 ids for fixed contexts. */
+ cid = max(cid, VMCI_RESERVED_CID_LIMIT - 1) + 1;
+ if (cid == VMCI_INVALID_ID)
+ cid = VMCI_RESERVED_CID_LIMIT;
+ }
+ ASSERT(!vmci_ctx_exists(cid));
+ context->cid = cid;
+
+ list_add_tail_rcu(&context->list_item, &ctx_list.head);
+ spin_unlock(&ctx_list.lock);
+
+ return context;
+
+ err_free_db_array:
+ vmci_handle_arr_destroy(context->doorbell_array);
+ err_free_qp_array:
+ vmci_handle_arr_destroy(context->queue_pair_array);
+ err_free_ctx:
+ kfree(context);
+ err_out:
+ return ERR_PTR(error);
+}
+
+/*
+ * Destroy VMCI context.
+ */
+void vmci_ctx_destroy(struct vmci_ctx *context)
+{
+ spin_lock(&ctx_list.lock);
+ list_del_rcu(&context->list_item);
+ spin_unlock(&ctx_list.lock);
+ synchronize_rcu();
+
+ vmci_ctx_put(context);
+}
+
+/*
+ * Fire notification for all contexts interested in given cid.
+ */
+static int ctx_fire_notification(u32 context_id, u32 priv_flags)
+{
+ u32 i, array_size;
+ struct vmci_ctx *sub_ctx;
+ struct vmci_handle_arr *subscriber_array;
+ struct vmci_handle context_handle =
+ vmci_make_handle(context_id, VMCI_EVENT_HANDLER);
+
+ /*
+ * We create an array to hold the subscribers we find when
+ * scanning through all contexts.
+ */
+ subscriber_array = vmci_handle_arr_create(0);
+ if (subscriber_array == NULL)
+ return VMCI_ERROR_NO_MEM;
+
+ /*
+ * Scan all contexts to find who is interested in being
+ * notified about given contextID.
+ */
+ rcu_read_lock();
+ list_for_each_entry_rcu(sub_ctx, &ctx_list.head, list_item) {
+ struct vmci_handle_list *node;
+
+ /*
+ * We only deliver notifications of the removal of
+ * contexts, if the two contexts are allowed to
+ * interact.
+ */
+ if (vmci_deny_interaction(priv_flags, sub_ctx->priv_flags))
+ continue;
+
+ list_for_each_entry_rcu(node, &sub_ctx->notifier_list, node) {
+ if (!VMCI_HANDLE_EQUAL(node->handle, context_handle))
+ continue;
+
+ vmci_handle_arr_append_entry(&subscriber_array,
+ vmci_make_handle(sub_ctx->cid,
+ VMCI_EVENT_HANDLER));
+ }
+ }
+ rcu_read_unlock();
+
+ /* Fire event to all subscribers. */
+ array_size = vmci_handle_arr_get_size(subscriber_array);
+ for (i = 0; i < array_size; i++) {
+ int result;
+ struct vmci_event_msg *e_msg;
+ struct vmci_event_payld_ctx *ev_payload;
+ char buf[sizeof(*e_msg) + sizeof(*ev_payload)];
+
+ e_msg = (struct vmci_event_msg *)buf;
+
+ /* Clear out any garbage. */
+ memset(e_msg, 0, sizeof(*e_msg) + sizeof(*ev_payload));
+ e_msg->hdr.dst = vmci_handle_arr_get_entry(subscriber_array, i);
+ e_msg->hdr.src =
+ vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID);
+ e_msg->hdr.payload_size =
+ sizeof(*e_msg) + sizeof(*ev_payload) - sizeof(e_msg->hdr);
+ e_msg->event_data.event = VMCI_EVENT_CTX_REMOVED;
+ ev_payload = vmci_event_data_payload(&e_msg->event_data);
+ ev_payload->context_id = context_id;
+
+ result = vmci_datagram_dispatch(VMCI_HYPERVISOR_CONTEXT_ID,
+ &e_msg->hdr, false);
+ if (result < VMCI_SUCCESS) {
+ pr_devel("Failed to enqueue event datagram (type=%d) for context (ID=0x%x).\n",
+ e_msg->event_data.event,
+ e_msg->hdr.dst.context);
+ /* We continue to enqueue on next subscriber. */
+ }
+ }
+ vmci_handle_arr_destroy(subscriber_array);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Returns the current number of pending datagrams. The call may
+ * also serve as a synchronization point for the datagram queue,
+ * as no enqueue operations can occur concurrently.
+ */
+int vmci_ctx_pending_datagrams(u32 cid, u32 *pending)
+{
+ struct vmci_ctx *context;
+
+ context = vmci_ctx_get(cid);
+ if (context == NULL)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ spin_lock(&context->lock);
+ if (pending)
+ *pending = context->pending_datagrams;
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Queues a VMCI datagram for the appropriate target VM context.
+ */
+int vmci_ctx_enqueue_datagram(u32 cid, struct vmci_datagram *dg)
+{
+ struct vmci_datagram_queue_entry *dq_entry;
+ struct vmci_ctx *context;
+ struct vmci_handle dg_src;
+ size_t vmci_dg_size;
+
+ vmci_dg_size = VMCI_DG_SIZE(dg);
+ if (vmci_dg_size > VMCI_MAX_DG_SIZE) {
+ pr_devel("Datagram too large (bytes=%Zu).\n", vmci_dg_size);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /* Get the target VM's VMCI context. */
+ context = vmci_ctx_get(cid);
+ if (!context) {
+ pr_devel("Invalid context (ID=0x%x).\n", cid);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /* Allocate guest call entry and add it to the target VM's queue. */
+ dq_entry = kmalloc(sizeof(*dq_entry), GFP_KERNEL);
+ if (dq_entry == NULL) {
+ pr_warn("Failed to allocate memory for datagram.\n");
+ vmci_ctx_put(context);
+ return VMCI_ERROR_NO_MEM;
+ }
+ dq_entry->dg = dg;
+ dq_entry->dg_size = vmci_dg_size;
+ dg_src = dg->src;
+ INIT_LIST_HEAD(&dq_entry->list_item);
+
+ spin_lock(&context->lock);
+
+ /*
+ * We put a higher limit on datagrams from the hypervisor. If
+ * the pending datagram is not from hypervisor, then we check
+ * if enqueueing it would exceed the
+ * VMCI_MAX_DATAGRAM_QUEUE_SIZE limit on the destination. If
+ * the pending datagram is from hypervisor, we allow it to be
+ * queued at the destination side provided we don't reach the
+ * VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE limit.
+ */
+ if (context->datagram_queue_size + vmci_dg_size >=
+ VMCI_MAX_DATAGRAM_QUEUE_SIZE &&
+ (!VMCI_HANDLE_EQUAL(dg_src,
+ vmci_make_handle
+ (VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID)) ||
+ context->datagram_queue_size + vmci_dg_size >=
+ VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE)) {
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+ kfree(dq_entry);
+ pr_devel("Context (ID=0x%x) receive queue is full.\n", cid);
+ return VMCI_ERROR_NO_RESOURCES;
+ }
+
+ list_add(&dq_entry->list_item, &context->datagram_queue);
+ context->pending_datagrams++;
+ context->datagram_queue_size += vmci_dg_size;
+ ctx_signal_notify(context);
+ wake_up(&context->host_context.wait_queue);
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+
+ return vmci_dg_size;
+}
+
+/*
+ * Verifies whether a context with the specified context ID exists.
+ * FIXME: utility is dubious as no decisions can be reliably made
+ * using this data as context can appear and disappear at any time.
+ */
+bool vmci_ctx_exists(u32 cid)
+{
+ struct vmci_ctx *context;
+ bool exists = false;
+
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(context, &ctx_list.head, list_item) {
+ if (context->cid == cid) {
+ exists = true;
+ break;
+ }
+ }
+
+ rcu_read_unlock();
+ return exists;
+}
+
+/*
+ * Retrieves VMCI context corresponding to the given cid.
+ */
+struct vmci_ctx *vmci_ctx_get(u32 cid)
+{
+ struct vmci_ctx *c, *context = NULL;
+
+ if (cid == VMCI_INVALID_ID)
+ return NULL;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(c, &ctx_list.head, list_item) {
+ if (c->cid == cid) {
+ /*
+ * The context owner drops its own reference to the
+ * context only after removing it from the list and
+ * waiting for RCU grace period to expire. This
+ * means that we are not about to increase the
+ * reference count of something that is in the
+ * process of being destroyed.
+ */
+ context = c;
+ kref_get(&context->kref);
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return context;
+}
+
+/*
+ * Deallocates all parts of a context data structure. This
+ * function doesn't lock the context, because it assumes that
+ * the caller was holding the last reference to context.
+ */
+static void ctx_free_ctx(struct kref *kref)
+{
+ struct vmci_ctx *context = container_of(kref, struct vmci_ctx, kref);
+ struct vmci_datagram_queue_entry *dq_entry, *dq_entry_tmp;
+ struct vmci_handle temp_handle;
+ struct vmci_handle_list *notifier, *tmp;
+
+ /*
+ * Fire event to all contexts interested in knowing this
+ * context is dying.
+ */
+ ctx_fire_notification(context->cid, context->priv_flags);
+
+ /*
+ * Cleanup all queue pair resources attached to context. If
+ * the VM dies without cleaning up, this code will make sure
+ * that no resources are leaked.
+ */
+ temp_handle = vmci_handle_arr_get_entry(context->queue_pair_array, 0);
+ while (!VMCI_HANDLE_EQUAL(temp_handle, VMCI_INVALID_HANDLE)) {
+ if (vmci_qp_broker_detach(temp_handle,
+ context) < VMCI_SUCCESS) {
+ /*
+ * When vmci_qp_broker_detach() succeeds it
+ * removes the handle from the array. If
+ * detach fails, we must remove the handle
+ * ourselves.
+ */
+ vmci_handle_arr_remove_entry(context->queue_pair_array,
+ temp_handle);
+ }
+ temp_handle =
+ vmci_handle_arr_get_entry(context->queue_pair_array, 0);
+ }
+
+ /*
+ * It is fine to destroy this without locking the callQueue, as
+ * this is the only thread having a reference to the context.
+ */
+ list_for_each_entry_safe(dq_entry, dq_entry_tmp,
+ &context->datagram_queue, list_item) {
+ WARN_ON(dq_entry->dg_size != VMCI_DG_SIZE(dq_entry->dg));
+ list_del(&dq_entry->list_item);
+ kfree(dq_entry->dg);
+ kfree(dq_entry);
+ }
+
+ list_for_each_entry_safe(notifier, tmp,
+ &context->notifier_list, node) {
+ list_del(¬ifier->node);
+ kfree(notifier);
+ }
+
+ vmci_handle_arr_destroy(context->queue_pair_array);
+ vmci_handle_arr_destroy(context->doorbell_array);
+ vmci_handle_arr_destroy(context->pending_doorbell_array);
+ vmci_ctx_unset_notify(context);
+ if (context->cred)
+ put_cred(context->cred);
+ kfree(context);
+}
+
+/*
+ * Drops reference to VMCI context. If this is the last reference to
+ * the context it will be deallocated. A context is created with
+ * a reference count of one, and on destroy, it is removed from
+ * the context list before its reference count is decremented. Thus,
+ * if we reach zero, we are sure that nobody else are about to increment
+ * it (they need the entry in the context list for that), and so there
+ * is no need for locking.
+ */
+void vmci_ctx_put(struct vmci_ctx *context)
+{
+ kref_put(&context->kref, ctx_free_ctx);
+}
+
+/*
+ * Dequeues the next datagram and returns it to caller.
+ * The caller passes in a pointer to the max size datagram
+ * it can handle and the datagram is only unqueued if the
+ * size is less than max_size. If larger max_size is set to
+ * the size of the datagram to give the caller a chance to
+ * set up a larger buffer for the guestcall.
+ */
+int vmci_ctx_dequeue_datagram(struct vmci_ctx *context,
+ size_t *max_size,
+ struct vmci_datagram **dg)
+{
+ struct vmci_datagram_queue_entry *dq_entry;
+ struct list_head *list_item;
+ int rv;
+
+ /* Dequeue the next datagram entry. */
+ spin_lock(&context->lock);
+ if (context->pending_datagrams == 0) {
+ ctx_clear_notify_call(context);
+ spin_unlock(&context->lock);
+ pr_devel("No datagrams pending.\n");
+ return VMCI_ERROR_NO_MORE_DATAGRAMS;
+ }
+
+ list_item = context->datagram_queue.next;
+ ASSERT(!list_empty(&context->datagram_queue));
+
+ dq_entry =
+ list_entry(list_item, struct vmci_datagram_queue_entry, list_item);
+
+ /* Check size of caller's buffer. */
+ if (*max_size < dq_entry->dg_size) {
+ *max_size = dq_entry->dg_size;
+ spin_unlock(&context->lock);
+ pr_devel("Caller's buffer should be at least (size=%u bytes).\n",
+ (u32) *max_size);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ list_del(list_item);
+ context->pending_datagrams--;
+ context->datagram_queue_size -= dq_entry->dg_size;
+ if (context->pending_datagrams == 0) {
+ ctx_clear_notify_call(context);
+ rv = VMCI_SUCCESS;
+ } else {
+ /*
+ * Return the size of the next datagram.
+ */
+ struct vmci_datagram_queue_entry *next_entry;
+
+ list_item = context->datagram_queue.next;
+ ASSERT(!list_empty(&context->datagram_queue));
+ next_entry =
+ list_entry(list_item, struct vmci_datagram_queue_entry,
+ list_item);
+
+ /*
+ * The following size_t -> int truncation is fine as
+ * the maximum size of a (routable) datagram is 68KB.
+ */
+ rv = (int)next_entry->dg_size;
+ }
+ spin_unlock(&context->lock);
+
+ /* Caller must free datagram. */
+ ASSERT(dq_entry->dg_size == VMCI_DG_SIZE(dq_entry->dg));
+ *dg = dq_entry->dg;
+ dq_entry->dg = NULL;
+ kfree(dq_entry);
+
+ return rv;
+}
+
+/*
+ * Reverts actions set up by vmci_setup_notify(). Unmaps and unlocks the
+ * page mapped/locked by vmci_setup_notify().
+ */
+void vmci_ctx_unset_notify(struct vmci_ctx *context)
+{
+ struct page *notify_page;
+
+ spin_lock(&context->lock);
+
+ notify_page = context->notify_page;
+ context->notify = &ctx_dummy_notify;
+ context->notify_page = NULL;
+
+ spin_unlock(&context->lock);
+
+ if (notify_page) {
+ kunmap(notify_page);
+ put_page(notify_page);
+ }
+}
+
+/*
+ * Add remote_cid to list of contexts current contexts wants
+ * notifications from/about.
+ */
+int vmci_ctx_add_notification(u32 context_id, u32 remote_cid)
+{
+ struct vmci_ctx *context;
+ struct vmci_handle_list *notifier, *n;
+ int result;
+ bool exists = false;
+
+ context = vmci_ctx_get(context_id);
+ if (!context)
+ return VMCI_ERROR_NOT_FOUND;
+
+ if (VMCI_CONTEXT_IS_VM(context_id) && VMCI_CONTEXT_IS_VM(remote_cid)) {
+ pr_devel("Context removed notifications for other VMs not supported (src=0x%x, remote=0x%x).\n",
+ context_id, remote_cid);
+ result = VMCI_ERROR_DST_UNREACHABLE;
+ goto out;
+ }
+
+ if (context->priv_flags & VMCI_PRIVILEGE_FLAG_RESTRICTED) {
+ result = VMCI_ERROR_NO_ACCESS;
+ goto out;
+ }
+
+ notifier = kmalloc(sizeof(struct vmci_handle_list), GFP_KERNEL);
+ if (!notifier) {
+ result = VMCI_ERROR_NO_MEM;
+ goto out;
+ }
+
+ INIT_LIST_HEAD(¬ifier->node);
+ notifier->handle = vmci_make_handle(remote_cid, VMCI_EVENT_HANDLER);
+
+ spin_lock(&context->lock);
+
+ list_for_each_entry(n, &context->notifier_list, node) {
+ if (VMCI_HANDLE_EQUAL(n->handle, notifier->handle)) {
+ exists = true;
+ break;
+ }
+ }
+
+ if (exists) {
+ kfree(notifier);
+ result = VMCI_ERROR_ALREADY_EXISTS;
+ } else {
+ list_add_tail_rcu(¬ifier->node, &context->notifier_list);
+ context->n_notifiers++;
+ result = VMCI_SUCCESS;
+ }
+
+ spin_unlock(&context->lock);
+
+ out:
+ vmci_ctx_put(context);
+ return result;
+}
+
+/*
+ * Remove remote_cid from current context's list of contexts it is
+ * interested in getting notifications from/about.
+ */
+int vmci_ctx_remove_notification(u32 context_id, u32 remote_cid)
+{
+ struct vmci_ctx *context;
+ struct vmci_handle_list *notifier, *tmp;
+ struct vmci_handle handle;
+ bool found = false;
+
+ context = vmci_ctx_get(context_id);
+ if (!context)
+ return VMCI_ERROR_NOT_FOUND;
+
+ handle = vmci_make_handle(remote_cid, VMCI_EVENT_HANDLER);
+
+ spin_lock(&context->lock);
+ list_for_each_entry_safe(notifier, tmp,
+ &context->notifier_list, node) {
+ if (VMCI_HANDLE_EQUAL(notifier->handle, handle)) {
+ list_del_rcu(¬ifier->node);
+ context->n_notifiers--;
+ found = true;
+ break;
+ }
+ }
+ spin_unlock(&context->lock);
+
+ if (found) {
+ synchronize_rcu();
+ kfree(notifier);
+ }
+
+ vmci_ctx_put(context);
+
+ return found ? VMCI_SUCCESS : VMCI_ERROR_NOT_FOUND;
+}
+
+static int vmci_ctx_get_chkpt_notifiers(struct vmci_ctx *context,
+ u32 *buf_size, void **pbuf)
+{
+ u32 *notifiers;
+ size_t data_size;
+ struct vmci_handle_list *entry;
+ int i = 0;
+
+ if (context->n_notifiers == 0) {
+ *buf_size = 0;
+ *pbuf = NULL;
+ return VMCI_SUCCESS;
+ }
+
+ data_size = context->n_notifiers * sizeof(*notifiers);
+ if (*buf_size < data_size) {
+ *buf_size = data_size;
+ return VMCI_ERROR_MORE_DATA;
+ }
+
+ notifiers = kmalloc(data_size, GFP_ATOMIC); /* FIXME: want GFP_KERNEL */
+ if (!notifiers)
+ return VMCI_ERROR_NO_MEM;
+
+ list_for_each_entry(entry, &context->notifier_list, node)
+ notifiers[i++] = entry->handle.context;
+
+ *buf_size = data_size;
+ *pbuf = notifiers;
+ return VMCI_SUCCESS;
+}
+
+static int vmci_ctx_get_chkpt_doorbells(struct vmci_ctx *context,
+ u32 *buf_size, void **pbuf)
+{
+ struct dbell_cpt_state *dbells;
+ size_t n_doorbells;
+ int i;
+
+ n_doorbells = vmci_handle_arr_get_size(context->doorbell_array);
+ if (n_doorbells > 0) {
+ size_t data_size = n_doorbells * sizeof(*dbells);
+ if (*buf_size < data_size) {
+ *buf_size = data_size;
+ return VMCI_ERROR_MORE_DATA;
+ }
+
+ dbells = kmalloc(data_size, GFP_ATOMIC);
+ if (!dbells)
+ return VMCI_ERROR_NO_MEM;
+
+ for (i = 0; i < n_doorbells; i++)
+ dbells[i].handle = vmci_handle_arr_get_entry(
+ context->doorbell_array, i);
+
+ *buf_size = data_size;
+ *pbuf = dbells;
+ } else {
+ *buf_size = 0;
+ *pbuf = NULL;
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Get current context's checkpoint state of given type.
+ */
+int vmci_ctx_get_chkpt_state(u32 context_id,
+ u32 cpt_type,
+ u32 *buf_size,
+ void **pbuf)
+{
+ struct vmci_ctx *context;
+ int result;
+
+ context = vmci_ctx_get(context_id);
+ if (!context)
+ return VMCI_ERROR_NOT_FOUND;
+
+ spin_lock(&context->lock);
+
+ switch (cpt_type) {
+ case VMCI_NOTIFICATION_CPT_STATE:
+ result = vmci_ctx_get_chkpt_notifiers(context, buf_size, pbuf);
+ break;
+
+ case VMCI_WELLKNOWN_CPT_STATE:
+ /*
+ * For compatibility with VMX'en with VM to VM communication, we
+ * always return zero wellknown handles.
+ */
+
+ *buf_size = 0;
+ *pbuf = NULL;
+ result = VMCI_SUCCESS;
+ break;
+
+ case VMCI_DOORBELL_CPT_STATE:
+ result = vmci_ctx_get_chkpt_doorbells(context, buf_size, pbuf);
+ break;
+
+ default:
+ pr_devel("Invalid cpt state (type=%d).\n", cpt_type);
+ result = VMCI_ERROR_INVALID_ARGS;
+ break;
+ }
+
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+
+ return result;
+}
+
+/*
+ * Set current context's checkpoint state of given type.
+ */
+int vmci_ctx_set_chkpt_state(u32 context_id,
+ u32 cpt_type,
+ u32 buf_size,
+ void *cpt_buf)
+{
+ u32 i;
+ u32 current_id;
+ int result = VMCI_SUCCESS;
+ u32 num_ids = buf_size / sizeof(u32);
+
+ if (cpt_type == VMCI_WELLKNOWN_CPT_STATE && num_ids > 0) {
+ /*
+ * We would end up here if VMX with VM to VM communication
+ * attempts to restore a checkpoint with wellknown handles.
+ */
+ pr_warn("Attempt to restore checkpoint with obsolete wellknown handles.\n");
+ return VMCI_ERROR_OBSOLETE;
+ }
+
+ if (cpt_type != VMCI_NOTIFICATION_CPT_STATE) {
+ pr_devel("Invalid cpt state (type=%d).\n", cpt_type);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ for (i = 0; i < num_ids && result == VMCI_SUCCESS; i++) {
+ current_id = ((u32 *) cpt_buf)[i];
+ result = vmci_ctx_add_notification(context_id, current_id);
+ if (result != VMCI_SUCCESS)
+ break;
+ }
+ if (result != VMCI_SUCCESS)
+ pr_devel("Failed to set cpt state (type=%d) (error=%d).\n",
+ cpt_type, result);
+
+ return result;
+}
+
+/*
+ * Retrieves the specified context's pending notifications in the
+ * form of a handle array. The handle arrays returned are the
+ * actual data - not a copy and should not be modified by the
+ * caller. They must be released using
+ * vmci_ctx_rcv_notifications_release.
+ */
+int vmci_ctx_rcv_notifications_get(u32 context_id,
+ struct vmci_handle_arr **db_handle_array,
+ struct vmci_handle_arr **qp_handle_array)
+{
+ struct vmci_ctx *context;
+ int result = VMCI_SUCCESS;
+
+ context = vmci_ctx_get(context_id);
+ if (context == NULL)
+ return VMCI_ERROR_NOT_FOUND;
+
+ spin_lock(&context->lock);
+
+ *db_handle_array = context->pending_doorbell_array;
+ context->pending_doorbell_array = vmci_handle_arr_create(0);
+ if (!context->pending_doorbell_array) {
+ context->pending_doorbell_array = *db_handle_array;
+ *db_handle_array = NULL;
+ result = VMCI_ERROR_NO_MEM;
+ }
+ *qp_handle_array = NULL;
+
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+
+ return result;
+}
+
+/*
+ * Releases handle arrays with pending notifications previously
+ * retrieved using vmci_ctx_rcv_notifications_get. If the
+ * notifications were not successfully handed over to the guest,
+ * success must be false.
+ */
+void vmci_ctx_rcv_notifications_release(u32 context_id,
+ struct vmci_handle_arr *db_handle_array,
+ struct vmci_handle_arr *qp_handle_array,
+ bool success)
+{
+ struct vmci_ctx *context = vmci_ctx_get(context_id);
+
+ if (!context) {
+ /*
+ * The OS driver part is holding on to the context for the
+ * duration of the receive notification ioctl, so it should
+ * still be here.
+ */
+ ASSERT(false);
+ }
+
+ spin_lock(&context->lock);
+ if (!success) {
+ struct vmci_handle handle;
+
+ /*
+ * New notifications may have been added while we were not
+ * holding the context lock, so we transfer any new pending
+ * doorbell notifications to the old array, and reinstate the
+ * old array.
+ */
+
+ handle = vmci_handle_arr_remove_tail(
+ context->pending_doorbell_array);
+ while (!VMCI_HANDLE_INVALID(handle)) {
+ ASSERT(vmci_handle_arr_has_entry(
+ context->doorbell_array, handle));
+ if (!vmci_handle_arr_has_entry(db_handle_array,
+ handle)) {
+ vmci_handle_arr_append_entry(
+ &db_handle_array, handle);
+ }
+ handle = vmci_handle_arr_remove_tail(
+ context->pending_doorbell_array);
+ }
+ vmci_handle_arr_destroy(context->pending_doorbell_array);
+ context->pending_doorbell_array = db_handle_array;
+ db_handle_array = NULL;
+ } else {
+ ctx_clear_notify_call(context);
+ }
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+
+ if (db_handle_array)
+ vmci_handle_arr_destroy(db_handle_array);
+
+ if (qp_handle_array)
+ vmci_handle_arr_destroy(qp_handle_array);
+}
+
+/*
+ * Registers that a new doorbell handle has been allocated by the
+ * context. Only doorbell handles registered can be notified.
+ */
+int vmci_ctx_dbell_create(u32 context_id, struct vmci_handle handle)
+{
+ struct vmci_ctx *context;
+ int result;
+
+ if (context_id == VMCI_INVALID_ID || VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ context = vmci_ctx_get(context_id);
+ if (context == NULL)
+ return VMCI_ERROR_NOT_FOUND;
+
+ spin_lock(&context->lock);
+ if (!vmci_handle_arr_has_entry(context->doorbell_array, handle)) {
+ vmci_handle_arr_append_entry(&context->doorbell_array, handle);
+ result = VMCI_SUCCESS;
+ } else {
+ result = VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+
+ spin_unlock(&context->lock);
+ vmci_ctx_put(context);
+
+ return result;
+}
+
+/*
+ * Unregisters a doorbell handle that was previously registered
+ * with vmci_ctx_dbell_create.
+ */
+int vmci_ctx_dbell_destroy(u32 context_id, struct vmci_handle handle)
+{
+ struct vmci_ctx *context;
+ struct vmci_handle removed_handle;
+
+ if (context_id == VMCI_INVALID_ID || VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ context = vmci_ctx_get(context_id);
+ if (context == NULL)
+ return VMCI_ERROR_NOT_FOUND;
+
+ spin_lock(&context->lock);
+ removed_handle =
+ vmci_handle_arr_remove_entry(context->doorbell_array, handle);
+ vmci_handle_arr_remove_entry(context->pending_doorbell_array, handle);
+ spin_unlock(&context->lock);
+
+ vmci_ctx_put(context);
+
+ return VMCI_HANDLE_INVALID(removed_handle) ?
+ VMCI_ERROR_NOT_FOUND : VMCI_SUCCESS;
+}
+
+/*
+ * Unregisters all doorbell handles that were previously
+ * registered with vmci_ctx_dbell_create.
+ */
+int vmci_ctx_dbell_destroy_all(u32 context_id)
+{
+ struct vmci_ctx *context;
+ struct vmci_handle handle;
+
+ if (context_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ context = vmci_ctx_get(context_id);
+ if (context == NULL)
+ return VMCI_ERROR_NOT_FOUND;
+
+ spin_lock(&context->lock);
+ do {
+ struct vmci_handle_arr *arr = context->doorbell_array;
+ handle = vmci_handle_arr_remove_tail(arr);
+ } while (!VMCI_HANDLE_INVALID(handle));
+ do {
+ struct vmci_handle_arr *arr = context->pending_doorbell_array;
+ handle = vmci_handle_arr_remove_tail(arr);
+ } while (!VMCI_HANDLE_INVALID(handle));
+ spin_unlock(&context->lock);
+
+ vmci_ctx_put(context);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Registers a notification of a doorbell handle initiated by the
+ * specified source context. The notification of doorbells are
+ * subject to the same isolation rules as datagram delivery. To
+ * allow host side senders of notifications a finer granularity
+ * of sender rights than those assigned to the sending context
+ * itself, the host context is required to specify a different
+ * set of privilege flags that will override the privileges of
+ * the source context.
+ */
+int vmci_ctx_notify_dbell(u32 src_cid,
+ struct vmci_handle handle,
+ u32 src_priv_flags)
+{
+ struct vmci_ctx *dst_context;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /* Get the target VM's VMCI context. */
+ dst_context = vmci_ctx_get(handle.context);
+ if (!dst_context) {
+ pr_devel("Invalid context (ID=0x%x).\n", handle.context);
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ if (src_cid != handle.context) {
+ u32 dst_priv_flags;
+
+ if (VMCI_CONTEXT_IS_VM(src_cid) &&
+ VMCI_CONTEXT_IS_VM(handle.context)) {
+ pr_devel("Doorbell notification from VM to VM not supported (src=0x%x, dst=0x%x).\n",
+ src_cid, handle.context);
+ result = VMCI_ERROR_DST_UNREACHABLE;
+ goto out;
+ }
+
+ result = vmci_dbell_get_priv_flags(handle, &dst_priv_flags);
+ if (result < VMCI_SUCCESS) {
+ pr_warn("Failed to get privilege flags for destination (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource);
+ goto out;
+ }
+
+ if (src_cid != VMCI_HOST_CONTEXT_ID ||
+ src_priv_flags == VMCI_NO_PRIVILEGE_FLAGS) {
+ src_priv_flags = vmci_context_get_priv_flags(src_cid);
+ }
+
+ if (vmci_deny_interaction(src_priv_flags, dst_priv_flags)) {
+ result = VMCI_ERROR_NO_ACCESS;
+ goto out;
+ }
+ }
+
+ if (handle.context == VMCI_HOST_CONTEXT_ID) {
+ result = vmci_dbell_host_context_notify(src_cid, handle);
+ } else {
+ spin_lock(&dst_context->lock);
+
+ if (!vmci_handle_arr_has_entry(dst_context->doorbell_array,
+ handle)) {
+ result = VMCI_ERROR_NOT_FOUND;
+ } else {
+ if (!vmci_handle_arr_has_entry(
+ dst_context->pending_doorbell_array,
+ handle)) {
+ vmci_handle_arr_append_entry(
+ &dst_context->pending_doorbell_array,
+ handle);
+
+ ctx_signal_notify(dst_context);
+ wake_up(&dst_context->host_context.wait_queue);
+
+ }
+ result = VMCI_SUCCESS;
+ }
+ spin_unlock(&dst_context->lock);
+ }
+
+ out:
+ vmci_ctx_put(dst_context);
+
+ return result;
+}
+
+bool vmci_ctx_supports_host_qp(struct vmci_ctx *context)
+{
+ return context && context->user_version >= VMCI_VERSION_HOSTQP;
+}
+
+/*
+ * Registers that a new queue pair handle has been allocated by
+ * the context.
+ */
+int vmci_ctx_qp_create(struct vmci_ctx *context, struct vmci_handle handle)
+{
+ int result;
+
+ if (context == NULL || VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (!vmci_handle_arr_has_entry(context->queue_pair_array, handle)) {
+ vmci_handle_arr_append_entry(&context->queue_pair_array,
+ handle);
+ result = VMCI_SUCCESS;
+ } else {
+ result = VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+
+ return result;
+}
+
+/*
+ * Unregisters a queue pair handle that was previously registered
+ * with vmci_ctx_qp_create.
+ */
+int vmci_ctx_qp_destroy(struct vmci_ctx *context, struct vmci_handle handle)
+{
+ struct vmci_handle hndl;
+
+ if (context == NULL || VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ hndl = vmci_handle_arr_remove_entry(context->queue_pair_array, handle);
+
+ return VMCI_HANDLE_INVALID(hndl) ? VMCI_ERROR_NOT_FOUND : VMCI_SUCCESS;
+}
+
+/*
+ * Determines whether a given queue pair handle is registered
+ * with the given context.
+ */
+bool vmci_ctx_qp_exists(struct vmci_ctx *context, struct vmci_handle handle)
+{
+ if (context == NULL || VMCI_HANDLE_INVALID(handle))
+ return false;
+
+ return vmci_handle_arr_has_entry(context->queue_pair_array, handle);
+}
+
+/*
+ * vmci_context_get_priv_flags() - Retrieve privilege flags.
+ * @context_id: The context ID of the VMCI context.
+ *
+ * Retrieves privilege flags of the given VMCI context ID.
+ */
+u32 vmci_context_get_priv_flags(u32 context_id)
+{
+ if (vmci_host_code_active()) {
+ u32 flags;
+ struct vmci_ctx *context;
+
+ context = vmci_ctx_get(context_id);
+ if (!context)
+ return VMCI_LEAST_PRIVILEGE_FLAGS;
+
+ flags = context->priv_flags;
+ vmci_ctx_put(context);
+ return flags;
+ }
+ return VMCI_NO_PRIVILEGE_FLAGS;
+}
+EXPORT_SYMBOL_GPL(vmci_context_get_priv_flags);
+
+/*
+ * vmci_is_context_owner() - Determimnes if user is the context owner
+ * @context_id: The context ID of the VMCI context.
+ * @uid: The host user id (real kernel value).
+ *
+ * Determines whether a given UID is the owner of given VMCI context.
+ */
+bool vmci_is_context_owner(u32 context_id, kuid_t uid)
+{
+ bool is_owner = false;
+
+ if (vmci_host_code_active()) {
+ struct vmci_ctx *context = vmci_ctx_get(context_id);
+ if (context) {
+ if (context->cred)
+ is_owner = uid_eq(context->cred->uid, uid);
+ vmci_ctx_put(context);
+ }
+ }
+
+ return is_owner;
+}
+EXPORT_SYMBOL_GPL(vmci_is_context_owner);
diff --git a/drivers/misc/vmw_vmci/vmci_context.h b/drivers/misc/vmw_vmci/vmci_context.h
new file mode 100644
index 0000000..898a7f4
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_context.h
@@ -0,0 +1,183 @@
+/*
+ * VMware VMCI driver (vmciContext.h)
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_CONTEXT_H_
+#define _VMCI_CONTEXT_H_
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/atomic.h>
+#include <linux/kref.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include "vmci_handle_array.h"
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+
+/* Used to determine what checkpoint state to get and set. */
+enum {
+ VMCI_NOTIFICATION_CPT_STATE = 1,
+ VMCI_WELLKNOWN_CPT_STATE = 2,
+ VMCI_DG_OUT_STATE = 3,
+ VMCI_DG_IN_STATE = 4,
+ VMCI_DG_IN_SIZE_STATE = 5,
+ VMCI_DOORBELL_CPT_STATE = 6,
+};
+
+/* Host specific struct used for signalling */
+struct vmci_host {
+ wait_queue_head_t wait_queue;
+};
+
+struct vmci_handle_list {
+ struct list_head node;
+ struct vmci_handle handle;
+};
+
+struct vmci_ctx {
+ struct list_head list_item; /* For global VMCI list. */
+ u32 cid;
+ struct kref kref;
+ struct list_head datagram_queue; /* Head of per VM queue. */
+ u32 pending_datagrams;
+ size_t datagram_queue_size; /* Size of datagram queue in bytes. */
+
+ /*
+ * Version of the code that created
+ * this context; e.g., VMX.
+ */
+ int user_version;
+ spinlock_t lock; /* Locks callQueue and handle_arrays. */
+
+ /*
+ * queue_pairs attached to. The array of
+ * handles for queue pairs is accessed
+ * from the code for QP API, and there
+ * it is protected by the QP lock. It
+ * is also accessed from the context
+ * clean up path, which does not
+ * require a lock. VMCILock is not
+ * used to protect the QP array field.
+ */
+ struct vmci_handle_arr *queue_pair_array;
+
+ /* Doorbells created by context. */
+ struct vmci_handle_arr *doorbell_array;
+
+ /* Doorbells pending for context. */
+ struct vmci_handle_arr *pending_doorbell_array;
+
+ /* Contexts current context is subscribing to. */
+ struct list_head notifier_list;
+ unsigned int n_notifiers;
+
+ struct vmci_host host_context;
+ u32 priv_flags;
+
+ const struct cred *cred;
+ bool *notify; /* Notify flag pointer - hosted only. */
+ struct page *notify_page; /* Page backing the notify UVA. */
+};
+
+/* VMCINotifyAddRemoveInfo: Used to add/remove remote context notifications. */
+struct vmci_ctx_info {
+ u32 remote_cid;
+ int result;
+};
+
+/* VMCICptBufInfo: Used to set/get current context's checkpoint state. */
+struct vmci_ctx_chkpt_buf_info {
+ u64 cpt_buf;
+ u32 cpt_type;
+ u32 buf_size;
+ s32 result;
+ u32 _pad;
+};
+
+/*
+ * VMCINotificationReceiveInfo: Used to recieve pending notifications
+ * for doorbells and queue pairs.
+ */
+struct vmci_ctx_notify_recv_info {
+ u64 db_handle_buf_uva;
+ u64 db_handle_buf_size;
+ u64 qp_handle_buf_uva;
+ u64 qp_handle_buf_size;
+ s32 result;
+ u32 _pad;
+};
+
+/*
+ * Utilility function that checks whether two entities are allowed
+ * to interact. If one of them is restricted, the other one must
+ * be trusted.
+ */
+static inline bool vmci_deny_interaction(u32 part_one, u32 part_two)
+{
+ return ((part_one & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
+ !(part_two & VMCI_PRIVILEGE_FLAG_TRUSTED)) ||
+ ((part_two & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
+ !(part_one & VMCI_PRIVILEGE_FLAG_TRUSTED));
+}
+
+struct vmci_ctx *vmci_ctx_create(u32 cid, u32 flags,
+ uintptr_t event_hnd, int version,
+ const struct cred *cred);
+void vmci_ctx_destroy(struct vmci_ctx *context);
+
+bool vmci_ctx_supports_host_qp(struct vmci_ctx *context);
+int vmci_ctx_enqueue_datagram(u32 cid, struct vmci_datagram *dg);
+int vmci_ctx_dequeue_datagram(struct vmci_ctx *context,
+ size_t *max_size, struct vmci_datagram **dg);
+int vmci_ctx_pending_datagrams(u32 cid, u32 *pending);
+struct vmci_ctx *vmci_ctx_get(u32 cid);
+void vmci_ctx_put(struct vmci_ctx *context);
+bool vmci_ctx_exists(u32 cid);
+
+int vmci_ctx_add_notification(u32 context_id, u32 remote_cid);
+int vmci_ctx_remove_notification(u32 context_id, u32 remote_cid);
+int vmci_ctx_get_chkpt_state(u32 context_id, u32 cpt_type,
+ u32 *num_cids, void **cpt_buf_ptr);
+int vmci_ctx_set_chkpt_state(u32 context_id, u32 cpt_type,
+ u32 num_cids, void *cpt_buf);
+
+int vmci_ctx_qp_create(struct vmci_ctx *context, struct vmci_handle handle);
+int vmci_ctx_qp_destroy(struct vmci_ctx *context, struct vmci_handle handle);
+bool vmci_ctx_qp_exists(struct vmci_ctx *context, struct vmci_handle handle);
+
+void vmci_ctx_check_signal_notify(struct vmci_ctx *context);
+void vmci_ctx_unset_notify(struct vmci_ctx *context);
+
+int vmci_ctx_dbell_create(u32 context_id, struct vmci_handle handle);
+int vmci_ctx_dbell_destroy(u32 context_id, struct vmci_handle handle);
+int vmci_ctx_dbell_destroy_all(u32 context_id);
+int vmci_ctx_notify_dbell(u32 cid, struct vmci_handle handle,
+ u32 src_priv_flags);
+
+int vmci_ctx_rcv_notifications_get(u32 context_id, struct vmci_handle_arr
+ **db_handle_array, struct vmci_handle_arr
+ **qp_handle_array);
+void vmci_ctx_rcv_notifications_release(u32 context_id, struct vmci_handle_arr
+ *db_handle_array, struct vmci_handle_arr
+ *qp_handle_array, bool success);
+
+static inline u32 vmci_ctx_get_id(struct vmci_ctx *context)
+{
+ if (!context)
+ return VMCI_INVALID_ID;
+ return context->cid;
+}
+
+#endif /* _VMCI_CONTEXT_H_ */
VMCI datagram Implements datagrams to allow data to be sent between host and guest.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_datagram.c | 506 +++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_datagram.h | 52 +++
2 files changed, 558 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.c
create mode 100644 drivers/misc/vmw_vmci/vmci_datagram.h
diff --git a/drivers/misc/vmw_vmci/vmci_datagram.c b/drivers/misc/vmw_vmci/vmci_datagram.c
new file mode 100644
index 0000000..d755e31
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_datagram.c
@@ -0,0 +1,506 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/bug.h>
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+#include "vmci_route.h"
+
+/*
+ * struct datagram_entry describes the datagram entity. It is used for datagram
+ * entities created only on the host.
+ */
+struct datagram_entry {
+ struct vmci_resource resource;
+ u32 flags;
+ bool run_delayed;
+ vmci_datagram_recv_cb recv_cb;
+ void *client_data;
+ u32 priv_flags;
+};
+
+struct delayed_datagram_info {
+ struct datagram_entry *entry;
+ struct vmci_datagram msg;
+ struct work_struct work;
+ bool in_dg_host_queue;
+};
+
+/* Number of in-flight host->host datagrams */
+static atomic_t delayed_dg_host_queue_size = ATOMIC_INIT(0);
+
+/*
+ * Create a datagram entry given a handle pointer.
+ */
+static int dg_create_handle(u32 resource_id,
+ u32 flags,
+ u32 priv_flags,
+ vmci_datagram_recv_cb recv_cb,
+ void *client_data, struct vmci_handle *out_handle)
+{
+ int result;
+ u32 context_id;
+ struct vmci_handle handle;
+ struct datagram_entry *entry;
+
+ BUG_ON(!recv_cb);
+ BUG_ON(!out_handle);
+ BUG_ON(priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS);
+
+ if ((flags & VMCI_FLAG_WELLKNOWN_DG_HND) != 0)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if ((flags & VMCI_FLAG_ANYCID_DG_HND) != 0) {
+ context_id = VMCI_INVALID_ID;
+ } else {
+ context_id = vmci_get_context_id();
+ if (context_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_NO_RESOURCES;
+ }
+
+ handle = vmci_make_handle(context_id, resource_id);
+
+ entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ pr_warn("Failed allocating memory for datagram entry.\n");
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ entry->run_delayed = (flags & VMCI_FLAG_DG_DELAYED_CB) ? true : false;
+ entry->flags = flags;
+ entry->recv_cb = recv_cb;
+ entry->client_data = client_data;
+ entry->priv_flags = priv_flags;
+
+ /* Make datagram resource live. */
+ result = vmci_resource_add(&entry->resource,
+ VMCI_RESOURCE_TYPE_DATAGRAM,
+ handle);
+ if (result != VMCI_SUCCESS) {
+ pr_warn("Failed to add new resource (handle=0x%x:0x%x), error: %d\n",
+ handle.context, handle.resource, result);
+ kfree(entry);
+ return result;
+ }
+
+ *out_handle = vmci_resource_handle(&entry->resource);
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Internal utility function with the same purpose as
+ * vmci_datagram_get_priv_flags that also takes a context_id.
+ */
+static int vmci_datagram_get_priv_flags(u32 context_id,
+ struct vmci_handle handle,
+ u32 *priv_flags)
+{
+ if (context_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (context_id == VMCI_HOST_CONTEXT_ID) {
+ struct datagram_entry *src_entry;
+ struct vmci_resource *resource;
+
+ resource = vmci_resource_by_handle(handle,
+ VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (!resource)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ src_entry = container_of(resource, struct datagram_entry,
+ resource);
+ *priv_flags = src_entry->priv_flags;
+ vmci_resource_put(resource);
+ } else if (context_id == VMCI_HYPERVISOR_CONTEXT_ID)
+ *priv_flags = VMCI_MAX_PRIVILEGE_FLAGS;
+ else
+ *priv_flags = vmci_context_get_priv_flags(context_id);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Calls the specified callback in a delayed context.
+ */
+static void dg_delayed_dispatch(struct work_struct *work)
+{
+ struct delayed_datagram_info *dg_info =
+ container_of(work, struct delayed_datagram_info, work);
+
+ dg_info->entry->recv_cb(dg_info->entry->client_data, &dg_info->msg);
+
+ vmci_resource_put(&dg_info->entry->resource);
+
+ if (dg_info->in_dg_host_queue)
+ atomic_dec(&delayed_dg_host_queue_size);
+
+ kfree(dg_info);
+}
+
+/*
+ * Dispatch datagram as a host, to the host, or other vm context. This
+ * function cannot dispatch to hypervisor context handlers. This should
+ * have been handled before we get here by vmci_datagram_dispatch.
+ * Returns number of bytes sent on success, error code otherwise.
+ */
+static int dg_dispatch_as_host(u32 context_id, struct vmci_datagram *dg)
+{
+ int retval;
+ size_t dg_size;
+ u32 src_priv_flags;
+
+ dg_size = VMCI_DG_SIZE(dg);
+
+ /* Host cannot send to the hypervisor. */
+ if (dg->dst.context == VMCI_HYPERVISOR_CONTEXT_ID)
+ return VMCI_ERROR_DST_UNREACHABLE;
+
+ /* Check that source handle matches sending context. */
+ if (dg->src.context != context_id) {
+ pr_devel("Sender context (ID=0x%x) is not owner of src datagram entry (handle=0x%x:0x%x).\n",
+ context_id, dg->src.context, dg->src.resource);
+ return VMCI_ERROR_NO_ACCESS;
+ }
+
+ /* Get hold of privileges of sending endpoint. */
+ retval = vmci_datagram_get_priv_flags(context_id, dg->src,
+ &src_priv_flags);
+ if (retval != VMCI_SUCCESS) {
+ pr_warn("Couldn't get privileges (handle=0x%x:0x%x).\n",
+ dg->src.context, dg->src.resource);
+ return retval;
+ }
+
+ /* Determine if we should route to host or guest destination. */
+ if (dg->dst.context == VMCI_HOST_CONTEXT_ID) {
+ /* Route to host datagram entry. */
+ struct datagram_entry *dst_entry;
+ struct vmci_resource *resource;
+
+ if (dg->src.context == VMCI_HYPERVISOR_CONTEXT_ID &&
+ dg->dst.resource == VMCI_EVENT_HANDLER) {
+ return vmci_event_dispatch(dg);
+ }
+
+ resource = vmci_resource_by_handle(dg->dst,
+ VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (!resource) {
+ pr_devel("Sending to invalid destination (handle=0x%x:0x%x).\n",
+ dg->dst.context, dg->dst.resource);
+ return VMCI_ERROR_INVALID_RESOURCE;
+ }
+ dst_entry = container_of(resource, struct datagram_entry,
+ resource);
+ if (vmci_deny_interaction(src_priv_flags,
+ dst_entry->priv_flags)) {
+ vmci_resource_put(resource);
+ return VMCI_ERROR_NO_ACCESS;
+ }
+ ASSERT(dst_entry->recv_cb);
+
+ /*
+ * If a VMCI datagram destined for the host is also sent by the
+ * host, we always run it delayed. This ensures that no locks
+ * are held when the datagram callback runs.
+ */
+ if (dst_entry->run_delayed ||
+ dg->src.context == VMCI_HOST_CONTEXT_ID) {
+ struct delayed_datagram_info *dg_info;
+
+ if (atomic_add_return(1, &delayed_dg_host_queue_size)
+ == VMCI_MAX_DELAYED_DG_HOST_QUEUE_SIZE) {
+ atomic_dec(&delayed_dg_host_queue_size);
+ vmci_resource_put(resource);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ dg_info = kmalloc(sizeof(*dg_info) +
+ (size_t) dg->payload_size, GFP_ATOMIC);
+ if (!dg_info) {
+ atomic_dec(&delayed_dg_host_queue_size);
+ vmci_resource_put(resource);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ dg_info->in_dg_host_queue = true;
+ dg_info->entry = dst_entry;
+ memcpy(&dg_info->msg, dg, dg_size);
+
+ INIT_WORK(&dg_info->work, dg_delayed_dispatch);
+ schedule_work(&dg_info->work);
+ retval = VMCI_SUCCESS;
+
+ } else {
+ retval = dst_entry->recv_cb(dst_entry->client_data, dg);
+ vmci_resource_put(resource);
+ if (retval < VMCI_SUCCESS)
+ return retval;
+ }
+ } else {
+ /* Route to destination VM context. */
+ struct vmci_datagram *new_dg;
+
+ if (context_id != dg->dst.context) {
+ if (vmci_deny_interaction(src_priv_flags,
+ vmci_context_get_priv_flags
+ (dg->dst.context))) {
+ return VMCI_ERROR_NO_ACCESS;
+ } else if (VMCI_CONTEXT_IS_VM(context_id)) {
+ /*
+ * If the sending context is a VM, it
+ * cannot reach another VM.
+ */
+
+ pr_devel("Datagram communication between VMs not supported (src=0x%x, dst=0x%x).\n",
+ context_id, dg->dst.context);
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+ }
+
+ /* We make a copy to enqueue. */
+ new_dg = kmalloc(dg_size, GFP_KERNEL);
+ if (new_dg == NULL)
+ return VMCI_ERROR_NO_MEM;
+
+ memcpy(new_dg, dg, dg_size);
+ retval = vmci_ctx_enqueue_datagram(dg->dst.context, new_dg);
+ if (retval < VMCI_SUCCESS) {
+ kfree(new_dg);
+ return retval;
+ }
+ }
+
+ /*
+ * We currently truncate the size to signed 32 bits. This doesn't
+ * matter for this handler as it only support 4Kb messages.
+ */
+ return (int)dg_size;
+}
+
+/*
+ * Dispatch datagram as a guest, down through the VMX and potentially to
+ * the host.
+ * Returns number of bytes sent on success, error code otherwise.
+ */
+static int dg_dispatch_as_guest(struct vmci_datagram *dg)
+{
+ int retval;
+ struct vmci_resource *resource;
+
+ resource = vmci_resource_by_handle(dg->src,
+ VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (!resource)
+ return VMCI_ERROR_NO_HANDLE;
+
+ retval = vmci_send_datagram(dg);
+ vmci_resource_put(resource);
+ return retval;
+}
+
+/*
+ * Dispatch datagram. This will determine the routing for the datagram
+ * and dispatch it accordingly.
+ * Returns number of bytes sent on success, error code otherwise.
+ */
+int vmci_datagram_dispatch(u32 context_id,
+ struct vmci_datagram *dg, bool from_guest)
+{
+ int retval;
+ enum vmci_route route;
+
+ BUILD_BUG_ON(sizeof(struct vmci_datagram) != 24);
+
+ if (VMCI_DG_SIZE(dg) > VMCI_MAX_DG_SIZE) {
+ pr_devel("Payload (size=%llu bytes) too big to send.\n",
+ (unsigned long long)dg->payload_size);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ retval = vmci_route(&dg->src, &dg->dst, from_guest, &route);
+ if (retval < VMCI_SUCCESS) {
+ pr_devel("Failed to route datagram (src=0x%x, dst=0x%x, err=%d).\n",
+ dg->src.context, dg->dst.context, retval);
+ return retval;
+ }
+
+ if (VMCI_ROUTE_AS_HOST == route) {
+ if (VMCI_INVALID_ID == context_id)
+ context_id = VMCI_HOST_CONTEXT_ID;
+ return dg_dispatch_as_host(context_id, dg);
+ }
+
+ if (VMCI_ROUTE_AS_GUEST == route)
+ return dg_dispatch_as_guest(dg);
+
+ pr_warn("Unknown route (%d) for datagram.\n", route);
+ return VMCI_ERROR_DST_UNREACHABLE;
+}
+
+/*
+ * Invoke the handler for the given datagram. This is intended to be
+ * called only when acting as a guest and receiving a datagram from the
+ * virtual device.
+ */
+int vmci_datagram_invoke_guest_handler(struct vmci_datagram *dg)
+{
+ struct vmci_resource *resource;
+ struct datagram_entry *dst_entry;
+
+ resource = vmci_resource_by_handle(dg->dst,
+ VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (!resource) {
+ pr_devel("destination (handle=0x%x:0x%x) doesn't exist.\n",
+ dg->dst.context, dg->dst.resource);
+ return VMCI_ERROR_NO_HANDLE;
+ }
+
+ dst_entry = container_of(resource, struct datagram_entry, resource);
+ if (dst_entry->run_delayed) {
+ struct delayed_datagram_info *dg_info;
+
+ dg_info = kmalloc(sizeof(*dg_info) + (size_t)dg->payload_size,
+ GFP_ATOMIC);
+ if (!dg_info) {
+ vmci_resource_put(resource);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ dg_info->in_dg_host_queue = false;
+ dg_info->entry = dst_entry;
+ memcpy(&dg_info->msg, dg, VMCI_DG_SIZE(dg));
+
+ INIT_WORK(&dg_info->work, dg_delayed_dispatch);
+ schedule_work(&dg_info->work);
+ } else {
+ dst_entry->recv_cb(dst_entry->client_data, dg);
+ vmci_resource_put(resource);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * vmci_datagram_create_handle_priv() - Create host context datagram endpoint
+ * @resource_id: The resource ID.
+ * @flags: Datagram Flags.
+ * @priv_flags: Privilege Flags.
+ * @recv_cb: Callback when receiving datagrams.
+ * @client_data: Pointer for a datagram_entry struct
+ * @out_handle: vmci_handle that is populated as a result of this function.
+ *
+ * Creates a host context datagram endpoint and returns a handle to it.
+ */
+int vmci_datagram_create_handle_priv(u32 resource_id,
+ u32 flags,
+ u32 priv_flags,
+ vmci_datagram_recv_cb recv_cb,
+ void *client_data,
+ struct vmci_handle *out_handle)
+{
+ if (out_handle == NULL)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (recv_cb == NULL) {
+ pr_devel("Client callback needed when creating datagram.\n");
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ return dg_create_handle(resource_id, flags, priv_flags, recv_cb,
+ client_data, out_handle);
+}
+EXPORT_SYMBOL_GPL(vmci_datagram_create_handle_priv);
+
+/*
+ * vmci_datagram_create_handle() - Create host context datagram endpoint
+ * @resource_id: Resource ID.
+ * @flags: Datagram Flags.
+ * @recv_cb: Callback when receiving datagrams.
+ * @client_ata: Pointer for a datagram_entry struct
+ * @out_handle: vmci_handle that is populated as a result of this function.
+ *
+ * Creates a host context datagram endpoint and returns a handle to
+ * it. Same as vmci_datagram_create_handle_priv without the priviledge
+ * flags argument.
+ */
+int vmci_datagram_create_handle(u32 resource_id,
+ u32 flags,
+ vmci_datagram_recv_cb recv_cb,
+ void *client_data,
+ struct vmci_handle *out_handle)
+{
+ return vmci_datagram_create_handle_priv(
+ resource_id, flags,
+ VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS,
+ recv_cb, client_data,
+ out_handle);
+}
+EXPORT_SYMBOL_GPL(vmci_datagram_create_handle);
+
+/*
+ * vmci_datagram_destroy_handle() - Destroys datagram handle
+ * @handle: vmci_handle to be destroyed and reaped.
+ *
+ * Use this function to destroy any datagram handles created by
+ * vmci_datagram_create_handle{,Priv} functions.
+ */
+int vmci_datagram_destroy_handle(struct vmci_handle handle)
+{
+ struct datagram_entry *entry;
+ struct vmci_resource *resource;
+
+ resource = vmci_resource_by_handle(handle, VMCI_RESOURCE_TYPE_DATAGRAM);
+ if (!resource) {
+ pr_devel("Failed to destroy datagram (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource);
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ entry = container_of(resource, struct datagram_entry, resource);
+
+ vmci_resource_put(&entry->resource);
+ vmci_resource_remove(&entry->resource);
+ kfree(entry);
+
+ return VMCI_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(vmci_datagram_destroy_handle);
+
+/*
+ * vmci_datagram_send() - Send a datagram
+ * @msg: The datagram to send.
+ *
+ * Sends the provided datagram on its merry way.
+ */
+int vmci_datagram_send(struct vmci_datagram *msg)
+{
+ if (msg == NULL)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ return vmci_datagram_dispatch(VMCI_INVALID_ID, msg, false);
+}
+EXPORT_SYMBOL_GPL(vmci_datagram_send);
diff --git a/drivers/misc/vmw_vmci/vmci_datagram.h b/drivers/misc/vmw_vmci/vmci_datagram.h
new file mode 100644
index 0000000..eb4aab7
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_datagram.h
@@ -0,0 +1,52 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_DATAGRAM_H_
+#define _VMCI_DATAGRAM_H_
+
+#include <linux/types.h>
+#include <linux/list.h>
+
+#include "vmci_context.h"
+
+#define VMCI_MAX_DELAYED_DG_HOST_QUEUE_SIZE 256
+
+/*
+ * The struct vmci_datagram_queue_entry is a queue header for the in-kernel VMCI
+ * datagram queues. It is allocated in non-paged memory, as the
+ * content is accessed while holding a spinlock. The pending datagram
+ * itself may be allocated from paged memory. We shadow the size of
+ * the datagram in the non-paged queue entry as this size is used
+ * while holding the same spinlock as above.
+ */
+struct vmci_datagram_queue_entry {
+ struct list_head list_item; /* For queuing. */
+ size_t dg_size; /* Size of datagram. */
+ struct vmci_datagram *dg; /* Pending datagram. */
+};
+
+/* VMCIDatagramSendRecvInfo */
+struct vmci_datagram_snd_rcv_info {
+ u64 addr;
+ u32 len;
+ s32 result;
+};
+
+/* Datagram API for non-public use. */
+int vmci_datagram_dispatch(u32 context_id, struct vmci_datagram *dg,
+ bool from_guest);
+int vmci_datagram_invoke_guest_handler(struct vmci_datagram *dg);
+
+#endif /* _VMCI_DATAGRAM_H_ */
VMCI doorbell code allows for notifcations between host and guest.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_doorbell.c | 605 +++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_doorbell.h | 51 +++
2 files changed, 656 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.c
create mode 100644 drivers/misc/vmw_vmci/vmci_doorbell.h
diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.c b/drivers/misc/vmw_vmci/vmci_doorbell.c
new file mode 100644
index 0000000..eb87ca6
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_doorbell.c
@@ -0,0 +1,605 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/completion.h>
+#include <linux/hash.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_resource.h"
+#include "vmci_driver.h"
+#include "vmci_route.h"
+
+
+#define VMCI_DOORBELL_INDEX_BITS 6
+#define VMCI_DOORBELL_INDEX_TABLE_SIZE (1 << VMCI_DOORBELL_INDEX_BITS)
+#define VMCI_DOORBELL_HASH(_idx) hash_32(_idx, VMCI_DOORBELL_INDEX_BITS)
+
+/*
+ * DoorbellEntry describes the a doorbell notification handle allocated by the
+ * host.
+ */
+struct dbell_entry {
+ struct vmci_resource resource;
+ struct hlist_node node;
+ struct work_struct work;
+ vmci_callback notify_cb;
+ void *client_data;
+ u32 idx;
+ u32 priv_flags;
+ bool run_delayed;
+ atomic_t active; /* Only used by guest personality */
+};
+
+/* The VMCI index table keeps track of currently registered doorbells. */
+struct dbell_index_table {
+ spinlock_t lock; /* Index table lock */
+ struct hlist_head entries[VMCI_DOORBELL_INDEX_TABLE_SIZE];
+};
+
+static struct dbell_index_table vmci_doorbell_it = {
+ .lock = __SPIN_LOCK_UNLOCKED(vmci_doorbell_it.lock),
+};
+
+/*
+ * The max_notify_idx is one larger than the currently known bitmap index in
+ * use, and is used to determine how much of the bitmap needs to be scanned.
+ */
+static u32 max_notify_idx;
+
+/*
+ * The notify_idx_count is used for determining whether there are free entries
+ * within the bitmap (if notify_idx_count + 1 < max_notify_idx).
+ */
+static u32 notify_idx_count;
+
+/*
+ * The last_notify_idx_reserved is used to track the last index handed out - in
+ * the case where multiple handles share a notification index, we hand out
+ * indexes round robin based on last_notify_idx_reserved.
+ */
+static u32 last_notify_idx_reserved;
+
+/* This is a one entry cache used to by the index allocation. */
+static u32 last_notify_idx_released = PAGE_SIZE;
+
+
+/*
+ * Utility function that retrieves the privilege flags associated
+ * with a given doorbell handle. For guest endpoints, the
+ * privileges are determined by the context ID, but for host
+ * endpoints privileges are associated with the complete
+ * handle. Hypervisor endpoints are not yet supported.
+ */
+int vmci_dbell_get_priv_flags(struct vmci_handle handle, u32 *priv_flags)
+{
+ if (priv_flags == NULL || handle.context == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (handle.context == VMCI_HOST_CONTEXT_ID) {
+ struct dbell_entry *entry;
+ struct vmci_resource *resource;
+
+ resource = vmci_resource_by_handle(handle,
+ VMCI_RESOURCE_TYPE_DOORBELL);
+ if (!resource)
+ return VMCI_ERROR_NOT_FOUND;
+
+ entry = container_of(resource, struct dbell_entry, resource);
+ *priv_flags = entry->priv_flags;
+ vmci_resource_put(resource);
+ } else if (handle.context == VMCI_HYPERVISOR_CONTEXT_ID) {
+ /*
+ * Hypervisor endpoints for notifications are not
+ * supported (yet).
+ */
+ return VMCI_ERROR_INVALID_ARGS;
+ } else {
+ *priv_flags = vmci_context_get_priv_flags(handle.context);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Find doorbell entry by bitmap index.
+ */
+static struct dbell_entry *dbell_index_table_find(u32 idx)
+{
+ u32 bucket = VMCI_DOORBELL_HASH(idx);
+ struct dbell_entry *dbell;
+ struct hlist_node *node;
+
+ hlist_for_each_entry(dbell, node, &vmci_doorbell_it.entries[bucket],
+ node) {
+ if (idx == dbell->idx)
+ return dbell;
+ }
+
+ return NULL;
+}
+
+/*
+ * Add the given entry to the index table. This willi take a reference to the
+ * entry's resource so that the entry is not deleted before it is removed from
+ * the * table.
+ */
+static void dbell_index_table_add(struct dbell_entry *entry)
+{
+ u32 bucket;
+ u32 new_notify_idx;
+
+ vmci_resource_get(&entry->resource);
+
+ spin_lock_bh(&vmci_doorbell_it.lock);
+
+ /*
+ * Below we try to allocate an index in the notification
+ * bitmap with "not too much" sharing between resources. If we
+ * use less that the full bitmap, we either add to the end if
+ * there are no unused flags within the currently used area,
+ * or we search for unused ones. If we use the full bitmap, we
+ * allocate the index round robin.
+ */
+ if (max_notify_idx < PAGE_SIZE || notify_idx_count < PAGE_SIZE) {
+ if (last_notify_idx_released < max_notify_idx &&
+ !dbell_index_table_find(last_notify_idx_released)) {
+ new_notify_idx = last_notify_idx_released;
+ last_notify_idx_released = PAGE_SIZE;
+ } else {
+ bool reused = false;
+ new_notify_idx = last_notify_idx_reserved;
+ if (notify_idx_count + 1 < max_notify_idx) {
+ do {
+ if (!dbell_index_table_find
+ (new_notify_idx)) {
+ reused = true;
+ break;
+ }
+ new_notify_idx = (new_notify_idx + 1) %
+ max_notify_idx;
+ } while (new_notify_idx !=
+ last_notify_idx_released);
+ }
+ if (!reused) {
+ new_notify_idx = max_notify_idx;
+ max_notify_idx++;
+ }
+ }
+ } else {
+ new_notify_idx = (last_notify_idx_reserved + 1) % PAGE_SIZE;
+ }
+
+ last_notify_idx_reserved = new_notify_idx;
+ notify_idx_count++;
+
+ entry->idx = new_notify_idx;
+ bucket = VMCI_DOORBELL_HASH(entry->idx);
+ hlist_add_head(&entry->node, &vmci_doorbell_it.entries[bucket]);
+
+ spin_unlock_bh(&vmci_doorbell_it.lock);
+}
+
+/*
+ * Remove the given entry from the index table. This will release() the
+ * entry's resource.
+ */
+static void dbell_index_table_remove(struct dbell_entry *entry)
+{
+ spin_lock_bh(&vmci_doorbell_it.lock);
+
+ hlist_del_init(&entry->node);
+
+ notify_idx_count--;
+ if (entry->idx == max_notify_idx - 1) {
+ /*
+ * If we delete an entry with the maximum known
+ * notification index, we take the opportunity to
+ * prune the current max. As there might be other
+ * unused indices immediately below, we lower the
+ * maximum until we hit an index in use.
+ */
+ while (max_notify_idx > 0 &&
+ !dbell_index_table_find(max_notify_idx - 1))
+ max_notify_idx--;
+ }
+
+ last_notify_idx_released = entry->idx;
+
+ spin_unlock_bh(&vmci_doorbell_it.lock);
+
+ vmci_resource_put(&entry->resource);
+}
+
+/*
+ * Creates a link between the given doorbell handle and the given
+ * index in the bitmap in the device backend. A notification state
+ * is created in hypervisor.
+ */
+static int dbell_link(struct vmci_handle handle, u32 notify_idx)
+{
+ struct vmci_doorbell_link_msg link_msg;
+
+ link_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_DOORBELL_LINK);
+ link_msg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ link_msg.hdr.payload_size = sizeof(link_msg) - VMCI_DG_HEADERSIZE;
+ link_msg.handle = handle;
+ link_msg.notify_idx = notify_idx;
+
+ return vmci_send_datagram(&link_msg.hdr);
+}
+
+/*
+ * Unlinks the given doorbell handle from an index in the bitmap in
+ * the device backend. The notification state is destroyed in hypervisor.
+ */
+static int dbell_unlink(struct vmci_handle handle)
+{
+ struct vmci_doorbell_unlink_msg unlink_msg;
+
+ unlink_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_DOORBELL_UNLINK);
+ unlink_msg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ unlink_msg.hdr.payload_size = sizeof(unlink_msg) - VMCI_DG_HEADERSIZE;
+ unlink_msg.handle = handle;
+
+ return vmci_send_datagram(&unlink_msg.hdr);
+}
+
+/*
+ * Notify another guest or the host. We send a datagram down to the
+ * host via the hypervisor with the notification info.
+ */
+static int dbell_notify_as_guest(struct vmci_handle handle, u32 priv_flags)
+{
+ struct vmci_doorbell_notify_msg notify_msg;
+
+ notify_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_DOORBELL_NOTIFY);
+ notify_msg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ notify_msg.hdr.payload_size = sizeof(notify_msg) - VMCI_DG_HEADERSIZE;
+ notify_msg.handle = handle;
+
+ return vmci_send_datagram(¬ify_msg.hdr);
+}
+
+/*
+ * Calls the specified callback in a delayed context.
+ */
+static void dbell_delayed_dispatch(struct work_struct *work)
+{
+ struct dbell_entry *entry = container_of(work,
+ struct dbell_entry, work);
+
+ entry->notify_cb(entry->client_data);
+ vmci_resource_put(&entry->resource);
+}
+
+/*
+ * Dispatches a doorbell notification to the host context.
+ */
+int vmci_dbell_host_context_notify(u32 src_cid, struct vmci_handle handle)
+{
+ struct dbell_entry *entry;
+ struct vmci_resource *resource;
+
+ if (VMCI_HANDLE_INVALID(handle)) {
+ pr_devel("Notifying an invalid doorbell (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ resource = vmci_resource_by_handle(handle,
+ VMCI_RESOURCE_TYPE_DOORBELL);
+ if (!resource) {
+ pr_devel("Notifying an unknown doorbell (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource);
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ entry = container_of(resource, struct dbell_entry, resource);
+ if (entry->run_delayed) {
+ schedule_work(&entry->work);
+ } else {
+ entry->notify_cb(entry->client_data);
+ vmci_resource_put(resource);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Register the notification bitmap with the host.
+ */
+bool vmci_dbell_register_notification_bitmap(u32 bitmap_ppn)
+{
+ int result;
+ struct vmci_notify_bm_set_msg bitmap_set_msg;
+
+ bitmap_set_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_SET_NOTIFY_BITMAP);
+ bitmap_set_msg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ bitmap_set_msg.hdr.payload_size = sizeof(bitmap_set_msg) -
+ VMCI_DG_HEADERSIZE;
+ bitmap_set_msg.bitmap_ppn = bitmap_ppn;
+
+ result = vmci_send_datagram(&bitmap_set_msg.hdr);
+ if (result != VMCI_SUCCESS) {
+ pr_devel("Failed to register (PPN=%u) as notification bitmap (error=%d).\n",
+ bitmap_ppn, result);
+ return false;
+ }
+ return true;
+}
+
+/*
+ * Executes or schedules the handlers for a given notify index.
+ */
+static void dbell_fire_entries(u32 notify_idx)
+{
+ u32 bucket = VMCI_DOORBELL_HASH(notify_idx);
+ struct dbell_entry *dbell;
+ struct hlist_node *node;
+
+ spin_lock_bh(&vmci_doorbell_it.lock);
+
+ hlist_for_each_entry(dbell, node,
+ &vmci_doorbell_it.entries[bucket], node) {
+ if (dbell->idx == notify_idx &&
+ atomic_read(&dbell->active) == 1) {
+ BUG_ON(!dbell->notify_cb);
+ if (dbell->run_delayed) {
+ vmci_resource_get(&dbell->resource);
+ schedule_work(&dbell->work);
+ } else {
+ dbell->notify_cb(dbell->client_data);
+ }
+ }
+ }
+
+ spin_unlock_bh(&vmci_doorbell_it.lock);
+}
+
+/*
+ * Scans the notification bitmap, collects pending notifications,
+ * resets the bitmap and invokes appropriate callbacks.
+ */
+void vmci_dbell_scan_notification_entries(u8 *bitmap)
+{
+ u32 idx;
+
+ for (idx = 0; idx < max_notify_idx; idx++) {
+ if (bitmap[idx] & 0x1) {
+ bitmap[idx] &= ~1;
+ dbell_fire_entries(idx);
+ }
+ }
+}
+
+/*
+ * vmci_doorbell_create() - Creates a doorbell
+ * @handle: A handle used to track the resource. Can be invalid.
+ * @flags: Flag that determines context of callback.
+ * @priv_flags: Privileges flags.
+ * @notify_cb: The callback to be ivoked when the doorbell fires.
+ * @client_data: A parameter to be passed to the callback.
+ *
+ * Creates a doorbell with the given callback. If the handle is
+ * VMCI_INVALID_HANDLE, a free handle will be assigned, if
+ * possible. The callback can be run immediately (potentially with
+ * locks held - the default) or delayed (in a kernel thread) by
+ * specifying the flag VMCI_FLAG_DELAYED_CB. If delayed execution
+ * is selected, a given callback may not be run if the kernel is
+ * unable to allocate memory for the delayed execution (highly
+ * unlikely).
+ */
+int vmci_doorbell_create(struct vmci_handle *handle,
+ u32 flags,
+ u32 priv_flags,
+ vmci_callback notify_cb, void *client_data)
+{
+ struct dbell_entry *entry;
+ struct vmci_handle new_handle;
+ int result;
+
+ if (!handle || !notify_cb || flags & ~VMCI_FLAG_DELAYED_CB ||
+ priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+ if (entry == NULL) {
+ pr_warn("Failed allocating memory for datagram entry.\n");
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ if (VMCI_HANDLE_INVALID(*handle)) {
+ u32 context_id = vmci_get_context_id();
+
+ /* Let resource code allocate a free ID for us */
+ new_handle = vmci_make_handle(context_id, VMCI_INVALID_ID);
+ } else {
+ bool valid_context = false;
+
+ /*
+ * Validate the handle. We must do both of the checks below
+ * because we can be acting as both a host and a guest at the
+ * same time. We always allow the host context ID, since the
+ * host functionality is in practice always there with the
+ * unified driver.
+ */
+ if (handle->context == VMCI_HOST_CONTEXT_ID ||
+ (vmci_guest_code_active() &&
+ vmci_get_context_id() == handle->context)) {
+ valid_context = true;
+ }
+
+ if (!valid_context || handle->resource == VMCI_INVALID_ID) {
+ pr_devel("Invalid argument (handle=0x%x:0x%x).\n",
+ handle->context, handle->resource);
+ result = VMCI_ERROR_INVALID_ARGS;
+ goto free_mem;
+ }
+
+ new_handle = *handle;
+ }
+
+ entry->idx = 0;
+ INIT_HLIST_NODE(&entry->node);
+ entry->priv_flags = priv_flags;
+ INIT_WORK(&entry->work, dbell_delayed_dispatch);
+ entry->run_delayed = flags & VMCI_FLAG_DELAYED_CB;
+ entry->notify_cb = notify_cb;
+ entry->client_data = client_data;
+ atomic_set(&entry->active, 0);
+
+ result = vmci_resource_add(&entry->resource,
+ VMCI_RESOURCE_TYPE_DOORBELL,
+ new_handle);
+ if (result != VMCI_SUCCESS) {
+ pr_warn("Failed to add new resource (handle=0x%x:0x%x), error: %d\n",
+ new_handle.context, new_handle.resource, result);
+ goto free_mem;
+ }
+
+ if (vmci_guest_code_active()) {
+ dbell_index_table_add(entry);
+ result = dbell_link(new_handle, entry->idx);
+ if (VMCI_SUCCESS != result)
+ goto destroy_resource;
+
+ atomic_set(&entry->active, 1);
+ }
+
+ *handle = vmci_resource_handle(&entry->resource);
+
+ return result;
+
+ destroy_resource:
+ dbell_index_table_remove(entry);
+ vmci_resource_remove(&entry->resource);
+ free_mem:
+ kfree(entry);
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_doorbell_create);
+
+/*
+ * vmci_doorbell_destroy() - Destroy a doorbell.
+ * @handle: The handle tracking the resource.
+ *
+ * Destroys a doorbell previously created with vmcii_doorbell_create. This
+ * operation may block waiting for a callback to finish.
+ */
+int vmci_doorbell_destroy(struct vmci_handle handle)
+{
+ struct dbell_entry *entry;
+ struct vmci_resource *resource;
+
+ if (VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ resource = vmci_resource_by_handle(handle,
+ VMCI_RESOURCE_TYPE_DOORBELL);
+ if (!resource) {
+ pr_devel("Failed to destroy doorbell (handle=0x%x:0x%x).\n",
+ handle.context, handle.resource);
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ entry = container_of(resource, struct dbell_entry, resource);
+
+ if (vmci_guest_code_active()) {
+ int result;
+
+ dbell_index_table_remove(entry);
+
+ result = dbell_unlink(handle);
+ if (VMCI_SUCCESS != result) {
+
+ /*
+ * The only reason this should fail would be
+ * an inconsistency between guest and
+ * hypervisor state, where the guest believes
+ * it has an active registration whereas the
+ * hypervisor doesn't. One case where this may
+ * happen is if a doorbell is unregistered
+ * following a hibernation at a time where the
+ * doorbell state hasn't been restored on the
+ * hypervisor side yet. Since the handle has
+ * now been removed in the guest, we just
+ * print a warning and return success.
+ */
+ pr_devel("Unlink of doorbell (handle=0x%x:0x%x) unknown by hypervisor (error=%d).\n",
+ handle.context, handle.resource, result);
+ }
+ }
+
+ /*
+ * Now remove the resource from the table. It might still be in use
+ * after this, in a callback or still on the delayed work queue.
+ */
+ vmci_resource_put(&entry->resource);
+ vmci_resource_remove(&entry->resource);
+
+ kfree(entry);
+
+ return VMCI_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(vmci_doorbell_destroy);
+
+/*
+ * vmci_doorbell_notify() - Ring the doorbell (and hide in the bushes).
+ * @dst: The handlle identifying the doorbell resource
+ * @priv_flags: Priviledge flags.
+ *
+ * Generates a notification on the doorbell identified by the
+ * handle. For host side generation of notifications, the caller
+ * can specify what the privilege of the calling side is.
+ */
+int vmci_doorbell_notify(struct vmci_handle dst, u32 priv_flags)
+{
+ int retval;
+ enum vmci_route route;
+ struct vmci_handle src;
+
+ if (VMCI_HANDLE_INVALID(dst) ||
+ (priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ src = VMCI_INVALID_HANDLE;
+ retval = vmci_route(&src, &dst, false, &route);
+ if (retval < VMCI_SUCCESS)
+ return retval;
+
+ if (VMCI_ROUTE_AS_HOST == route)
+ return vmci_ctx_notify_dbell(VMCI_HOST_CONTEXT_ID,
+ dst, priv_flags);
+
+ if (VMCI_ROUTE_AS_GUEST == route)
+ return dbell_notify_as_guest(dst, priv_flags);
+
+ pr_warn("Unknown route (%d) for doorbell.\n", route);
+ return VMCI_ERROR_DST_UNREACHABLE;
+}
+EXPORT_SYMBOL_GPL(vmci_doorbell_notify);
diff --git a/drivers/misc/vmw_vmci/vmci_doorbell.h b/drivers/misc/vmw_vmci/vmci_doorbell.h
new file mode 100644
index 0000000..e4c0b17
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_doorbell.h
@@ -0,0 +1,51 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef VMCI_DOORBELL_H
+#define VMCI_DOORBELL_H
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/types.h>
+
+#include "vmci_driver.h"
+
+/*
+ * VMCINotifyResourceInfo: Used to create and destroy doorbells, and
+ * generate a notification for a doorbell or queue pair.
+ */
+struct vmci_dbell_notify_resource_info {
+ struct vmci_handle handle;
+ u16 resource;
+ u16 action;
+ s32 result;
+};
+
+/*
+ * Structure used for checkpointing the doorbell mappings. It is
+ * written to the checkpoint as is, so changing this structure will
+ * break checkpoint compatibility.
+ */
+struct dbell_cpt_state {
+ struct vmci_handle handle;
+ u64 bitmap_idx;
+};
+
+int vmci_dbell_host_context_notify(u32 src_cid, struct vmci_handle handle);
+int vmci_dbell_get_priv_flags(struct vmci_handle handle, u32 *priv_flags);
+
+bool vmci_dbell_register_notification_bitmap(u32 bitmap_ppn);
+void vmci_dbell_scan_notification_entries(u8 *bitmap);
+
+#endif /* VMCI_DOORBELL_H */
VMCI driver code implementes both the host and guest personalities of the VMCI driver.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_driver.c | 116 +++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_driver.h | 50 +++++++++++++++
2 files changed, 166 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_driver.c
create mode 100644 drivers/misc/vmw_vmci/vmci_driver.h
diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
new file mode 100644
index 0000000..8c5db38
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_driver.c
@@ -0,0 +1,116 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/atomic.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+static bool vmci_disable_host;
+module_param_named(disable_host, vmci_disable_host, bool, 0);
+MODULE_PARM_DESC(disable_host,
+ "Disable driver host personality (default=enabled)");
+
+static bool vmci_disable_guest;
+module_param_named(disable_guest, vmci_disable_guest, bool, 0);
+MODULE_PARM_DESC(disable_guest,
+ "Disable driver guest personality (default=enabled)");
+
+static bool vmci_guest_personality_initialized;
+static bool vmci_host_personality_initialized;
+
+/*
+ * vmci_get_context_id() - Gets the current context ID.
+ *
+ * Returns the current context ID. Note that since this is accessed only
+ * from code running in the host, this always returns the host context ID.
+ */
+u32 vmci_get_context_id(void)
+{
+ if (vmci_guest_code_active())
+ return vmci_get_vm_context_id();
+ else if (vmci_host_code_active())
+ return VMCI_HOST_CONTEXT_ID;
+
+ return VMCI_INVALID_ID;
+}
+EXPORT_SYMBOL_GPL(vmci_get_context_id);
+
+static int __init vmci_drv_init(void)
+{
+ int vmci_err;
+ int error;
+
+ vmci_err = vmci_event_init();
+ if (vmci_err < VMCI_SUCCESS) {
+ pr_err("Failed to initialize VMCIEvent (result=%d).\n", vmci_err);
+ return -EINVAL;
+ }
+
+ if (!vmci_disable_guest) {
+ error = vmci_guest_init();
+ if (error) {
+ pr_warn("Failed to initialize guest personality (err=%d).\n",
+ error);
+ } else {
+ vmci_guest_personality_initialized = true;
+ pr_info("Guest personality initialized and is %s\n",
+ vmci_guest_code_active() ?
+ "active" : "inactive");
+ }
+ }
+
+ if (!vmci_disable_host) {
+ error = vmci_host_init();
+ if (error) {
+ pr_warn("Unable to initialize host personality (err=%d).\n",
+ error);
+ } else {
+ vmci_host_personality_initialized = true;
+ pr_info("Initialized host personality\n");
+ }
+ }
+
+ if (!vmci_guest_personality_initialized &&
+ !vmci_host_personality_initialized) {
+ vmci_event_exit();
+ return -ENODEV;
+ }
+
+ return 0;
+}
+module_init(vmci_drv_init);
+
+static void __exit vmci_drv_exit(void)
+{
+ if (vmci_guest_personality_initialized)
+ vmci_guest_exit();
+
+ if (vmci_host_personality_initialized)
+ vmci_host_exit();
+
+ vmci_event_exit();
+}
+module_exit(vmci_drv_exit);
+
+MODULE_AUTHOR("VMware, Inc.");
+MODULE_DESCRIPTION("VMware Virtual Machine Communication Interface.");
+MODULE_VERSION(VMCI_DRIVER_VERSION_STRING);
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/misc/vmw_vmci/vmci_driver.h b/drivers/misc/vmw_vmci/vmci_driver.h
new file mode 100644
index 0000000..f69156a
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_driver.h
@@ -0,0 +1,50 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_DRIVER_H_
+#define _VMCI_DRIVER_H_
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/wait.h>
+
+#include "vmci_queue_pair.h"
+#include "vmci_context.h"
+
+enum vmci_obj_type {
+ VMCIOBJ_VMX_VM = 10,
+ VMCIOBJ_CONTEXT,
+ VMCIOBJ_SOCKET,
+ VMCIOBJ_NOT_SET,
+};
+
+/* For storing VMCI structures in file handles. */
+struct vmci_obj {
+ void *ptr;
+ enum vmci_obj_type type;
+};
+
+u32 vmci_get_context_id(void);
+int vmci_send_datagram(struct vmci_datagram *dg);
+
+int vmci_host_init(void);
+void vmci_host_exit(void);
+bool vmci_host_code_active(void);
+
+int vmci_guest_init(void);
+void vmci_guest_exit(void);
+bool vmci_guest_code_active(void);
+u32 vmci_get_vm_context_id(void);
+
+#endif /* _VMCI_DRIVER_H_ */
VMCI event code that manages event handlers and handles callbacks when specific
events fire.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_event.c | 229 ++++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_event.h | 25 ++++
2 files changed, 254 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_event.c
create mode 100644 drivers/misc/vmw_vmci/vmci_event.h
diff --git a/drivers/misc/vmw_vmci/vmci_event.c b/drivers/misc/vmw_vmci/vmci_event.c
new file mode 100644
index 0000000..e956c18
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_event.c
@@ -0,0 +1,229 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define EVENT_MAGIC 0xEABE0000
+#define VMCI_EVENT_MAX_ATTEMPTS 10
+
+struct vmci_subscription {
+ u32 id;
+ u32 event;
+ vmci_event_cb callback;
+ void *callback_data;
+ struct list_head node; /* on one of subscriber lists */
+};
+
+static struct list_head subscriber_array[VMCI_EVENT_MAX];
+static DEFINE_MUTEX(subscriber_mutex);
+
+int __init vmci_event_init(void)
+{
+ int i;
+
+ for (i = 0; i < VMCI_EVENT_MAX; i++)
+ INIT_LIST_HEAD(&subscriber_array[i]);
+
+ return VMCI_SUCCESS;
+}
+
+void vmci_event_exit(void)
+{
+ int e;
+
+ /* We free all memory at exit. */
+ for (e = 0; e < VMCI_EVENT_MAX; e++) {
+ struct vmci_subscription *cur, *p2;
+ list_for_each_entry_safe(cur, p2, &subscriber_array[e], node) {
+
+ /*
+ * We should never get here because all events
+ * should have been unregistered before we try
+ * to unload the driver module.
+ */
+ pr_warn("Unexpected free events occurring.\n");
+ list_del(&cur->node);
+ kfree(cur);
+ }
+ }
+}
+
+/*
+ * Find entry. Assumes subscriber_mutex is held.
+ */
+static struct vmci_subscription *event_find(u32 sub_id)
+{
+ int e;
+
+ for (e = 0; e < VMCI_EVENT_MAX; e++) {
+ struct vmci_subscription *cur;
+ list_for_each_entry(cur, &subscriber_array[e], node) {
+ if (cur->id == sub_id)
+ return cur;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * Actually delivers the events to the subscribers.
+ * The callback function for each subscriber is invoked.
+ */
+static void event_deliver(struct vmci_event_msg *event_msg)
+{
+ struct vmci_subscription *cur;
+ struct list_head *subscriber_list;
+
+ rcu_read_lock();
+ subscriber_list = &subscriber_array[event_msg->event_data.event];
+ list_for_each_entry_rcu(cur, subscriber_list, node) {
+ BUG_ON(cur->event != event_msg->event_data.event);
+
+ cur->callback(cur->id, &event_msg->event_data,
+ cur->callback_data);
+ }
+ rcu_read_unlock();
+}
+
+/*
+ * Dispatcher for the VMCI_EVENT_RECEIVE datagrams. Calls all
+ * subscribers for given event.
+ */
+int vmci_event_dispatch(struct vmci_datagram *msg)
+{
+ struct vmci_event_msg *event_msg = (struct vmci_event_msg *)msg;
+
+ BUG_ON(msg->src.context != VMCI_HYPERVISOR_CONTEXT_ID ||
+ msg->dst.resource != VMCI_EVENT_HANDLER);
+
+ if (msg->payload_size < sizeof(u32) ||
+ msg->payload_size > sizeof(struct vmci_event_data_max))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (!VMCI_EVENT_VALID(event_msg->event_data.event))
+ return VMCI_ERROR_EVENT_UNKNOWN;
+
+ event_deliver(event_msg);
+ return VMCI_SUCCESS;
+}
+
+/*
+ * vmci_event_subscribe() - Subscribe to a given event.
+ * @event: The event to subscribe to.
+ * @callback: The callback to invoke upon the event.
+ * @callback_data: Data to pass to the callback.
+ * @subscription_id: ID used to track subscription. Used with
+ * vmci_event_unsubscribe()
+ *
+ * Subscribes to the provided event. The callback specified will be
+ * fired from RCU critical section and therefore must not sleep.
+ */
+int vmci_event_subscribe(u32 event,
+ vmci_event_cb callback,
+ void *callback_data,
+ u32 *new_subscription_id)
+{
+ struct vmci_subscription *sub;
+ int attempts;
+ int retval;
+ bool have_new_id = false;
+
+ if (!new_subscription_id) {
+ pr_devel("%s: Invalid subscription (NULL).\n", __func__);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (!VMCI_EVENT_VALID(event) || !callback) {
+ pr_devel("%s: Failed to subscribe to event (type=%d) (callback=%p) (data=%p).\n",
+ __func__, event, callback, callback_data);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ sub = kzalloc(sizeof(*sub), GFP_KERNEL);
+ if (!sub)
+ return VMCI_ERROR_NO_MEM;
+
+ sub->id = VMCI_EVENT_MAX;
+ sub->event = event;
+ sub->callback = callback;
+ sub->callback_data = callback_data;
+ INIT_LIST_HEAD(&sub->node);
+
+ mutex_lock(&subscriber_mutex);
+
+ /* Creation of a new event is always allowed. */
+ for (attempts = 0; attempts < VMCI_EVENT_MAX_ATTEMPTS; attempts++) {
+ static u32 subscription_id;
+ /*
+ * We try to get an id a couple of time before
+ * claiming we are out of resources.
+ */
+
+ /* Test for duplicate id. */
+ if (!event_find(++subscription_id)) {
+ sub->id = subscription_id;
+ have_new_id = true;
+ break;
+ }
+ }
+
+ if (have_new_id) {
+ list_add_rcu(&sub->node, &subscriber_array[event]);
+ retval = VMCI_SUCCESS;
+ } else {
+ retval = VMCI_ERROR_NO_RESOURCES;
+ }
+
+ mutex_unlock(&subscriber_mutex);
+
+ *new_subscription_id = sub->id;
+ return retval;
+}
+EXPORT_SYMBOL_GPL(vmci_event_subscribe);
+
+/*
+ * vmci_event_unsubscribe() - unsubscribe from an event.
+ * @sub_id: A subscription ID as provided by vmci_event_subscribe()
+ *
+ * Unsubscribe from given event. Removes it from list and frees it.
+ * Will return callback_data if requested by caller.
+ */
+int vmci_event_unsubscribe(u32 sub_id)
+{
+ struct vmci_subscription *s;
+
+ mutex_lock(&subscriber_mutex);
+ s = event_find(sub_id);
+ if (s)
+ list_del_rcu(&s->node);
+ mutex_unlock(&subscriber_mutex);
+
+ if (!s)
+ return VMCI_ERROR_NOT_FOUND;
+
+ synchronize_rcu();
+ kfree(s);
+
+ return VMCI_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(vmci_event_unsubscribe);
diff --git a/drivers/misc/vmw_vmci/vmci_event.h b/drivers/misc/vmw_vmci/vmci_event.h
new file mode 100644
index 0000000..7df9b1c
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_event.h
@@ -0,0 +1,25 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef __VMCI_EVENT_H__
+#define __VMCI_EVENT_H__
+
+#include <linux/vmw_vmci_api.h>
+
+int vmci_event_init(void);
+void vmci_event_exit(void);
+int vmci_event_dispatch(struct vmci_datagram *msg);
+
+#endif /*__VMCI_EVENT_H__ */
VMCI handle code adds support for dynamic arrays that will grow if they need to.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_handle_array.c | 142 +++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_handle_array.h | 52 +++++++++++
2 files changed, 194 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.c
create mode 100644 drivers/misc/vmw_vmci/vmci_handle_array.h
diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.c b/drivers/misc/vmw_vmci/vmci_handle_array.c
new file mode 100644
index 0000000..9122373
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_handle_array.c
@@ -0,0 +1,142 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/slab.h>
+#include "vmci_handle_array.h"
+
+static size_t handle_arr_calc_size(size_t capacity)
+{
+ return sizeof(struct vmci_handle_arr) +
+ capacity * sizeof(struct vmci_handle);
+}
+
+struct vmci_handle_arr *vmci_handle_arr_create(size_t capacity)
+{
+ struct vmci_handle_arr *array;
+
+ if (capacity == 0)
+ capacity = VMCI_HANDLE_ARRAY_DEFAULT_SIZE;
+
+ array = kmalloc(handle_arr_calc_size(capacity), GFP_ATOMIC);
+ if (!array)
+ return NULL;
+
+ array->capacity = capacity;
+ array->size = 0;
+
+ return array;
+}
+
+void vmci_handle_arr_destroy(struct vmci_handle_arr *array)
+{
+ kfree(array);
+}
+
+void vmci_handle_arr_append_entry(struct vmci_handle_arr **array_ptr,
+ struct vmci_handle handle)
+{
+ struct vmci_handle_arr *array = *array_ptr;
+
+ if (unlikely(array->size >= array->capacity)) {
+ /* reallocate. */
+ struct vmci_handle_arr *new_array;
+ size_t new_capacity = array->capacity * VMCI_ARR_CAP_MULT;
+ size_t new_size = handle_arr_calc_size(new_capacity);
+
+ new_array = krealloc(array, new_size, GFP_ATOMIC);
+ if (!new_array)
+ return;
+
+ new_array->capacity = new_capacity;
+ *array_ptr = array = new_array;
+ }
+
+ array->entries[array->size] = handle;
+ array->size++;
+}
+
+/*
+ * Handle that was removed, VMCI_INVALID_HANDLE if entry not found.
+ */
+struct vmci_handle vmci_handle_arr_remove_entry(struct vmci_handle_arr *array,
+ struct vmci_handle entry_handle)
+{
+ struct vmci_handle handle = VMCI_INVALID_HANDLE;
+ size_t i;
+
+ for (i = 0; i < array->size; i++) {
+ if (VMCI_HANDLE_EQUAL(array->entries[i], entry_handle)) {
+ handle = array->entries[i];
+ array->size--;
+ array->entries[i] = array->entries[array->size];
+ array->entries[array->size] = VMCI_INVALID_HANDLE;
+ break;
+ }
+ }
+
+ return handle;
+}
+
+/*
+ * Handle that was removed, VMCI_INVALID_HANDLE if array was empty.
+ */
+struct vmci_handle vmci_handle_arr_remove_tail(struct vmci_handle_arr *array)
+{
+ struct vmci_handle handle = VMCI_INVALID_HANDLE;
+
+ if (array->size) {
+ array->size--;
+ handle = array->entries[array->size];
+ array->entries[array->size] = VMCI_INVALID_HANDLE;
+ }
+
+ return handle;
+}
+
+/*
+ * Handle at given index, VMCI_INVALID_HANDLE if invalid index.
+ */
+struct vmci_handle
+vmci_handle_arr_get_entry(const struct vmci_handle_arr *array, size_t index)
+{
+ if (unlikely(index >= array->size))
+ return VMCI_INVALID_HANDLE;
+
+ return array->entries[index];
+}
+
+bool vmci_handle_arr_has_entry(const struct vmci_handle_arr *array,
+ struct vmci_handle entry_handle)
+{
+ size_t i;
+
+ for (i = 0; i < array->size; i++)
+ if (VMCI_HANDLE_EQUAL(array->entries[i], entry_handle))
+ return true;
+
+ return false;
+}
+
+/*
+ * NULL if the array is empty. Otherwise, a pointer to the array
+ * of VMCI handles in the handle array.
+ */
+struct vmci_handle *vmci_handle_arr_get_handles(struct vmci_handle_arr *array)
+{
+ if (array->size)
+ return array->entries;
+
+ return NULL;
+}
diff --git a/drivers/misc/vmw_vmci/vmci_handle_array.h b/drivers/misc/vmw_vmci/vmci_handle_array.h
new file mode 100644
index 0000000..b5f3a7f
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_handle_array.h
@@ -0,0 +1,52 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_HANDLE_ARRAY_H_
+#define _VMCI_HANDLE_ARRAY_H_
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/types.h>
+
+#define VMCI_HANDLE_ARRAY_DEFAULT_SIZE 4
+#define VMCI_ARR_CAP_MULT 2 /* Array capacity multiplier */
+
+struct vmci_handle_arr {
+ size_t capacity;
+ size_t size;
+ struct vmci_handle entries[];
+};
+
+struct vmci_handle_arr *vmci_handle_arr_create(size_t capacity);
+void vmci_handle_arr_destroy(struct vmci_handle_arr *array);
+void vmci_handle_arr_append_entry(struct vmci_handle_arr **array_ptr,
+ struct vmci_handle handle);
+struct vmci_handle vmci_handle_arr_remove_entry(struct vmci_handle_arr *array,
+ struct vmci_handle
+ entry_handle);
+struct vmci_handle vmci_handle_arr_remove_tail(struct vmci_handle_arr *array);
+struct vmci_handle
+vmci_handle_arr_get_entry(const struct vmci_handle_arr *array, size_t index);
+bool vmci_handle_arr_has_entry(const struct vmci_handle_arr *array,
+ struct vmci_handle entry_handle);
+struct vmci_handle *vmci_handle_arr_get_handles(struct vmci_handle_arr *array);
+
+static inline size_t vmci_handle_arr_get_size(
+ const struct vmci_handle_arr *array)
+{
+ return array->size;
+}
+
+
+#endif /* _VMCI_HANDLE_ARRAY_H_ */
VMCI queue pairs allow for bi-directional ordered communication between host and guests.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_queue_pair.c | 3506 +++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_queue_pair.h | 191 ++
2 files changed, 3697 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_queue_pair.c
create mode 100644 drivers/misc/vmw_vmci/vmci_queue_pair.h
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c
new file mode 100644
index 0000000..126f85e
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c
@@ -0,0 +1,3506 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/device-mapper.h>
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/socket.h>
+#include <linux/wait.h>
+
+#include "vmci_handle_array.h"
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+#include "vmci_route.h"
+
+/*
+ * In the following, we will distinguish between two kinds of VMX processes -
+ * the ones with versions lower than VMCI_VERSION_NOVMVM that use specialized
+ * VMCI page files in the VMX and supporting VM to VM communication and the
+ * newer ones that use the guest memory directly. We will in the following
+ * refer to the older VMX versions as old-style VMX'en, and the newer ones as
+ * new-style VMX'en.
+ *
+ * The state transition datagram is as follows (the VMCIQPB_ prefix has been
+ * removed for readability) - see below for more details on the transtions:
+ *
+ * -------------- NEW -------------
+ * | |
+ * \_/ \_/
+ * CREATED_NO_MEM <-----------------> CREATED_MEM
+ * | | |
+ * | o-----------------------o |
+ * | | |
+ * \_/ \_/ \_/
+ * ATTACHED_NO_MEM <----------------> ATTACHED_MEM
+ * | | |
+ * | o----------------------o |
+ * | | |
+ * \_/ \_/ \_/
+ * SHUTDOWN_NO_MEM <----------------> SHUTDOWN_MEM
+ * | |
+ * | |
+ * -------------> gone <-------------
+ *
+ * In more detail. When a VMCI queue pair is first created, it will be in the
+ * VMCIQPB_NEW state. It will then move into one of the following states:
+ *
+ * - VMCIQPB_CREATED_NO_MEM: this state indicates that either:
+ *
+ * - the created was performed by a host endpoint, in which case there is
+ * no backing memory yet.
+ *
+ * - the create was initiated by an old-style VMX, that uses
+ * vmci_qp_broker_set_page_store to specify the UVAs of the queue pair at
+ * a later point in time. This state can be distinguished from the one
+ * above by the context ID of the creator. A host side is not allowed to
+ * attach until the page store has been set.
+ *
+ * - VMCIQPB_CREATED_MEM: this state is the result when the queue pair
+ * is created by a VMX using the queue pair device backend that
+ * sets the UVAs of the queue pair immediately and stores the
+ * information for later attachers. At this point, it is ready for
+ * the host side to attach to it.
+ *
+ * Once the queue pair is in one of the created states (with the exception of
+ * the case mentioned for older VMX'en above), it is possible to attach to the
+ * queue pair. Again we have two new states possible:
+ *
+ * - VMCIQPB_ATTACHED_MEM: this state can be reached through the following
+ * paths:
+ *
+ * - from VMCIQPB_CREATED_NO_MEM when a new-style VMX allocates a queue
+ * pair, and attaches to a queue pair previously created by the host side.
+ *
+ * - from VMCIQPB_CREATED_MEM when the host side attaches to a queue pair
+ * already created by a guest.
+ *
+ * - from VMCIQPB_ATTACHED_NO_MEM, when an old-style VMX calls
+ * vmci_qp_broker_set_page_store (see below).
+ *
+ * - VMCIQPB_ATTACHED_NO_MEM: If the queue pair already was in the
+ * VMCIQPB_CREATED_NO_MEM due to a host side create, an old-style VMX will
+ * bring the queue pair into this state. Once vmci_qp_broker_set_page_store
+ * is called to register the user memory, the VMCIQPB_ATTACH_MEM state
+ * will be entered.
+ *
+ * From the attached queue pair, the queue pair can enter the shutdown states
+ * when either side of the queue pair detaches. If the guest side detaches
+ * first, the queue pair will enter the VMCIQPB_SHUTDOWN_NO_MEM state, where
+ * the content of the queue pair will no longer be available. If the host
+ * side detaches first, the queue pair will either enter the
+ * VMCIQPB_SHUTDOWN_MEM, if the guest memory is currently mapped, or
+ * VMCIQPB_SHUTDOWN_NO_MEM, if the guest memory is not mapped
+ * (e.g., the host detaches while a guest is stunned).
+ *
+ * New-style VMX'en will also unmap guest memory, if the guest is
+ * quiesced, e.g., during a snapshot operation. In that case, the guest
+ * memory will no longer be available, and the queue pair will transition from
+ * *_MEM state to a *_NO_MEM state. The VMX may later map the memory once more,
+ * in which case the queue pair will transition from the *_NO_MEM state at that
+ * point back to the *_MEM state. Note that the *_NO_MEM state may have changed,
+ * since the peer may have either attached or detached in the meantime. The
+ * values are laid out such that ++ on a state will move from a *_NO_MEM to a
+ * *_MEM state, and vice versa.
+ */
+
+/*
+ * VMCIMemcpy{To,From}QueueFunc() prototypes. Functions of these
+ * types are passed around to enqueue and dequeue routines. Note that
+ * often the functions passed are simply wrappers around memcpy
+ * itself.
+ *
+ * Note: In order for the memcpy typedefs to be compatible with the VMKernel,
+ * there's an unused last parameter for the hosted side. In
+ * ESX, that parameter holds a buffer type.
+ */
+typedef int vmci_memcpy_to_queue_func(struct vmci_queue *queue,
+ u64 queue_offset, const void *src,
+ size_t src_offset, size_t size);
+typedef int vmci_memcpy_from_queue_func(void *dest, size_t dest_offset,
+ const struct vmci_queue *queue,
+ u64 queue_offset, size_t size);
+
+/* The Kernel specific component of the struct vmci_queue structure. */
+struct vmci_queue_kern_if {
+ struct page **page;
+ struct page **header_page;
+ void *va;
+ struct mutex __mutex;
+ struct mutex *mutex;
+ bool host;
+ size_t num_pages;
+ bool mapped;
+};
+
+/*
+ * This structure is opaque to the clients.
+ */
+struct vmci_qp {
+ struct vmci_handle handle;
+ struct vmci_queue *produce_q;
+ struct vmci_queue *consume_q;
+ u64 produce_q_size;
+ u64 consume_q_size;
+ u32 peer;
+ u32 flags;
+ u32 priv_flags;
+ bool guest_endpoint;
+ unsigned int blocked;
+ unsigned int generation;
+ wait_queue_head_t event;
+};
+
+enum qp_broker_state {
+ VMCIQPB_NEW,
+ VMCIQPB_CREATED_NO_MEM,
+ VMCIQPB_CREATED_MEM,
+ VMCIQPB_ATTACHED_NO_MEM,
+ VMCIQPB_ATTACHED_MEM,
+ VMCIQPB_SHUTDOWN_NO_MEM,
+ VMCIQPB_SHUTDOWN_MEM,
+ VMCIQPB_GONE
+};
+
+#define QPBROKERSTATE_HAS_MEM(_qpb) (_qpb->state == VMCIQPB_CREATED_MEM || \
+ _qpb->state == VMCIQPB_ATTACHED_MEM || \
+ _qpb->state == VMCIQPB_SHUTDOWN_MEM)
+
+/*
+ * In the queue pair broker, we always use the guest point of view for
+ * the produce and consume queue values and references, e.g., the
+ * produce queue size stored is the guests produce queue size. The
+ * host endpoint will need to swap these around. The only exception is
+ * the local queue pairs on the host, in which case the host endpoint
+ * that creates the queue pair will have the right orientation, and
+ * the attaching host endpoint will need to swap.
+ */
+struct qp_entry {
+ struct list_head list_item;
+ struct vmci_handle handle;
+ u32 peer;
+ u32 flags;
+ u64 produce_size;
+ u64 consume_size;
+ u32 ref_count;
+};
+
+struct qp_broker_entry {
+ struct vmci_resource resource;
+ struct qp_entry qp;
+ u32 create_id;
+ u32 attach_id;
+ enum qp_broker_state state;
+ bool require_trusted_attach;
+ bool created_by_trusted;
+ bool vmci_page_files; /* Created by VMX using VMCI page files */
+ struct vmci_queue *produce_q;
+ struct vmci_queue *consume_q;
+ struct vmci_queue_header saved_produce_q;
+ struct vmci_queue_header saved_consume_q;
+ vmci_event_release_cb wakeup_cb;
+ void *client_data;
+ void *local_mem; /* Kernel memory for local queue pair */
+};
+
+struct qp_guest_endpoint {
+ struct vmci_resource resource;
+ struct qp_entry qp;
+ u64 num_ppns;
+ void *produce_q;
+ void *consume_q;
+ struct PPNSet ppn_set;
+};
+
+struct qp_list {
+ struct list_head head;
+ struct mutex mutex;
+};
+
+static struct qp_list qp_broker_list = {
+ .head = LIST_HEAD_INIT(qp_broker_list.head),
+ .mutex = __MUTEX_INITIALIZER(qp_broker_list.mutex),
+};
+
+static struct qp_list qp_guest_endpoints = {
+ .head = LIST_HEAD_INIT(qp_guest_endpoints.head),
+ .mutex = __MUTEX_INITIALIZER(qp_guest_endpoints.mutex),
+};
+
+#define INVALID_VMCI_GUEST_MEM_ID 0
+#define QPE_NUM_PAGES(_QPE) ((u32) \
+ (dm_div_up(_QPE.produce_size, PAGE_SIZE) + \
+ dm_div_up(_QPE.consume_size, PAGE_SIZE) + 2))
+
+
+/*
+ * Frees kernel VA space for a given queue and its queue header, and
+ * frees physical data pages.
+ */
+static void qp_free_queue(void *q, u64 size)
+{
+ struct vmci_queue *queue = q;
+
+ if (queue) {
+ u64 i = dm_div_up(size, PAGE_SIZE);
+
+ if (queue->kernel_if->mapped) {
+ vunmap(queue->kernel_if->va);
+ queue->kernel_if->va = NULL;
+ }
+
+ while (i)
+ __free_page(queue->kernel_if->page[--i]);
+
+ vfree(queue->q_header);
+ }
+}
+
+/*
+ * Allocates kernel VA space of specified size, plus space for the
+ * queue structure/kernel interface and the queue header. Allocates
+ * physical pages for the queue data pages.
+ *
+ * PAGE m: struct vmci_queue_header (struct vmci_queue->q_header)
+ * PAGE m+1: struct vmci_queue
+ * PAGE m+1+q: struct vmci_queue_kern_if (struct vmci_queue->kernel_if)
+ * PAGE n-size: Data pages (struct vmci_queue->kernel_if->page[])
+ */
+static void *qp_alloc_queue(u64 size, u32 flags)
+{
+ u64 i;
+ struct vmci_queue *queue;
+ struct vmci_queue_header *q_header;
+ const u64 num_data_pages = dm_div_up(size, PAGE_SIZE);
+ const uint queue_size =
+ PAGE_SIZE +
+ sizeof(*queue) + sizeof(*(queue->kernel_if)) +
+ num_data_pages * sizeof(*(queue->kernel_if->page));
+
+ BUG_ON(size > VMCI_MAX_GUEST_QP_MEMORY);
+ BUG_ON(vmci_qp_pinned(flags) && size > VMCI_MAX_PINNED_QP_MEMORY);
+
+ q_header = vmalloc(queue_size);
+ if (!q_header)
+ return NULL;
+
+ queue = (void *)q_header + PAGE_SIZE;
+ queue->q_header = q_header;
+ queue->saved_header = NULL;
+ queue->kernel_if = (struct vmci_queue_kern_if *)(queue + 1);
+ queue->kernel_if->header_page = NULL; /* Unused in guest. */
+ queue->kernel_if->page = (struct page **)(queue->kernel_if + 1);
+ queue->kernel_if->host = false;
+ queue->kernel_if->va = NULL;
+ queue->kernel_if->mapped = false;
+
+ for (i = 0; i < num_data_pages; i++) {
+ queue->kernel_if->page[i] = alloc_pages(GFP_KERNEL, 0);
+ if (!queue->kernel_if->page[i])
+ goto fail;
+ }
+
+ if (vmci_qp_pinned(flags)) {
+ queue->kernel_if->va =
+ vmap(queue->kernel_if->page, num_data_pages, VM_MAP,
+ PAGE_KERNEL);
+ if (!queue->kernel_if->va)
+ goto fail;
+
+ queue->kernel_if->mapped = true;
+ }
+
+ return (void *)queue;
+
+ fail:
+ qp_free_queue(queue, i * PAGE_SIZE);
+ return NULL;
+}
+
+/*
+ * Copies from a given buffer or iovector to a VMCI Queue. Uses
+ * kmap()/kunmap() to dynamically map/unmap required portions of the queue
+ * by traversing the offset -> page translation structure for the queue.
+ * Assumes that offset + size does not wrap around in the queue.
+ */
+static int __qp_memcpy_to_queue(struct vmci_queue *queue,
+ u64 queue_offset,
+ const void *src,
+ size_t size,
+ bool is_iovec)
+{
+ struct vmci_queue_kern_if *kernel_if = queue->kernel_if;
+ size_t bytes_copied = 0;
+
+ while (bytes_copied < size) {
+ u64 page_index = (queue_offset + bytes_copied) / PAGE_SIZE;
+ size_t page_offset =
+ (queue_offset + bytes_copied) & (PAGE_SIZE - 1);
+ void *va;
+ size_t to_copy;
+
+ if (!kernel_if->mapped)
+ va = kmap(kernel_if->page[page_index]);
+ else
+ va = (void *)((u8 *) kernel_if->va +
+ (page_index * PAGE_SIZE));
+
+ if (size - bytes_copied > PAGE_SIZE - page_offset)
+ /* Enough payload to fill up from this page. */
+ to_copy = PAGE_SIZE - page_offset;
+ else
+ to_copy = size - bytes_copied;
+
+ if (is_iovec) {
+ struct iovec *iov = (struct iovec *)src;
+ int err;
+
+ /* The iovec will track bytes_copied internally. */
+ err = memcpy_fromiovec((u8 *) va + page_offset,
+ iov, to_copy);
+ if (err != 0) {
+ kunmap(kernel_if->page[page_index]);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ } else {
+ memcpy((u8 *) va + page_offset,
+ (u8 *) src + bytes_copied, to_copy);
+ }
+
+ bytes_copied += to_copy;
+ if (!kernel_if->mapped)
+ kunmap(kernel_if->page[page_index]);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Copies to a given buffer or iovector from a VMCI Queue. Uses
+ * kmap()/kunmap() to dynamically map/unmap required portions of the queue
+ * by traversing the offset -> page translation structure for the queue.
+ * Assumes that offset + size does not wrap around in the queue.
+ */
+static int __qp_memcpy_from_queue(void *dest,
+ const struct vmci_queue *queue,
+ u64 queue_offset,
+ size_t size,
+ bool is_iovec)
+{
+ struct vmci_queue_kern_if *kernel_if = queue->kernel_if;
+ size_t bytes_copied = 0;
+
+ while (bytes_copied < size) {
+ u64 page_index = (queue_offset + bytes_copied) / PAGE_SIZE;
+ size_t page_offset =
+ (queue_offset + bytes_copied) & (PAGE_SIZE - 1);
+ void *va;
+ size_t to_copy;
+
+ if (!kernel_if->mapped)
+ va = kmap(kernel_if->page[page_index]);
+ else
+ va = (void *)((u8 *) kernel_if->va +
+ (page_index * PAGE_SIZE));
+
+ if (size - bytes_copied > PAGE_SIZE - page_offset)
+ /* Enough payload to fill up this page. */
+ to_copy = PAGE_SIZE - page_offset;
+ else
+ to_copy = size - bytes_copied;
+
+ if (is_iovec) {
+ struct iovec *iov = (struct iovec *)dest;
+ int err;
+
+ /* The iovec will track bytes_copied internally. */
+ err = memcpy_toiovec(iov, (u8 *) va + page_offset,
+ to_copy);
+ if (err != 0) {
+ kunmap(kernel_if->page[page_index]);
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ } else {
+ memcpy((u8 *) dest + bytes_copied,
+ (u8 *) va + page_offset, to_copy);
+ }
+
+ bytes_copied += to_copy;
+ if (!kernel_if->mapped)
+ kunmap(kernel_if->page[page_index]);
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Allocates two list of PPNs --- one for the pages in the produce queue,
+ * and the other for the pages in the consume queue. Intializes the list
+ * of PPNs with the page frame numbers of the KVA for the two queues (and
+ * the queue headers).
+ */
+static int qp_alloc_ppn_set(void *prod_q,
+ u64 num_produce_pages,
+ void *cons_q,
+ u64 num_consume_pages, struct PPNSet *ppn_set)
+{
+ u32 *produce_ppns;
+ u32 *consume_ppns;
+ struct vmci_queue *produce_q = prod_q;
+ struct vmci_queue *consume_q = cons_q;
+ u64 i;
+
+ if (!produce_q || !num_produce_pages || !consume_q ||
+ !num_consume_pages || !ppn_set)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (ppn_set->initialized)
+ return VMCI_ERROR_ALREADY_EXISTS;
+
+ produce_ppns =
+ kmalloc(num_produce_pages * sizeof(*produce_ppns), GFP_KERNEL);
+ if (!produce_ppns)
+ return VMCI_ERROR_NO_MEM;
+
+ consume_ppns =
+ kmalloc(num_consume_pages * sizeof(*consume_ppns), GFP_KERNEL);
+ if (!consume_ppns) {
+ kfree(produce_ppns);
+ return VMCI_ERROR_NO_MEM;
+ }
+
+ produce_ppns[0] = page_to_pfn(vmalloc_to_page(produce_q->q_header));
+ for (i = 1; i < num_produce_pages; i++) {
+ unsigned long pfn;
+
+ produce_ppns[i] =
+ page_to_pfn(produce_q->kernel_if->page[i - 1]);
+ pfn = produce_ppns[i];
+
+ /* Fail allocation if PFN isn't supported by hypervisor. */
+ if (sizeof(pfn) > sizeof(*produce_ppns)
+ && pfn != produce_ppns[i])
+ goto ppn_error;
+ }
+
+ consume_ppns[0] = page_to_pfn(vmalloc_to_page(consume_q->q_header));
+ for (i = 1; i < num_consume_pages; i++) {
+ unsigned long pfn;
+
+ consume_ppns[i] =
+ page_to_pfn(consume_q->kernel_if->page[i - 1]);
+ pfn = consume_ppns[i];
+
+ /* Fail allocation if PFN isn't supported by hypervisor. */
+ if (sizeof(pfn) > sizeof(*consume_ppns)
+ && pfn != consume_ppns[i])
+ goto ppn_error;
+ }
+
+ ppn_set->num_produce_pages = num_produce_pages;
+ ppn_set->num_consume_pages = num_consume_pages;
+ ppn_set->produce_ppns = produce_ppns;
+ ppn_set->consume_ppns = consume_ppns;
+ ppn_set->initialized = true;
+ return VMCI_SUCCESS;
+
+ ppn_error:
+ kfree(produce_ppns);
+ kfree(consume_ppns);
+ return VMCI_ERROR_INVALID_ARGS;
+}
+
+/*
+ * Frees the two list of PPNs for a queue pair.
+ */
+static void qp_free_ppn_set(struct PPNSet *ppn_set)
+{
+ if (ppn_set->initialized) {
+ /* Do not call these functions on NULL inputs. */
+ kfree(ppn_set->produce_ppns);
+ kfree(ppn_set->consume_ppns);
+ }
+ memset(ppn_set, 0, sizeof(*ppn_set));
+}
+
+/*
+ * Populates the list of PPNs in the hypercall structure with the PPNS
+ * of the produce queue and the consume queue.
+ */
+static int qp_populate_ppn_set(u8 *call_buf, const struct PPNSet *ppn_set)
+{
+ memcpy(call_buf, ppn_set->produce_ppns,
+ ppn_set->num_produce_pages * sizeof(*ppn_set->produce_ppns));
+ memcpy(call_buf +
+ ppn_set->num_produce_pages * sizeof(*ppn_set->produce_ppns),
+ ppn_set->consume_ppns,
+ ppn_set->num_consume_pages * sizeof(*ppn_set->consume_ppns));
+
+ return VMCI_SUCCESS;
+}
+
+static int qp_memcpy_to_queue(struct vmci_queue *queue,
+ u64 queue_offset,
+ const void *src, size_t src_offset, size_t size)
+{
+ return __qp_memcpy_to_queue(queue, queue_offset,
+ (u8 *) src + src_offset, size, false);
+}
+
+static int qp_memcpy_from_queue(void *dest,
+ size_t dest_offset,
+ const struct vmci_queue *queue,
+ u64 queue_offset, size_t size)
+{
+ return __qp_memcpy_from_queue((u8 *) dest + dest_offset,
+ queue, queue_offset, size, false);
+}
+
+/*
+ * Copies from a given iovec from a VMCI Queue.
+ */
+static int qp_memcpy_to_queue_iov(struct vmci_queue *queue,
+ u64 queue_offset,
+ const void *src,
+ size_t src_offset, size_t size)
+{
+
+ /*
+ * We ignore src_offset because src is really a struct iovec * and will
+ * maintain offset internally.
+ */
+ return __qp_memcpy_to_queue(queue, queue_offset, src, size, true);
+}
+
+/*
+ * Copies to a given iovec from a VMCI Queue.
+ */
+static int qp_memcpy_from_queue_iov(void *dest,
+ size_t dest_offset,
+ const struct vmci_queue *queue,
+ u64 queue_offset, size_t size)
+{
+ /*
+ * We ignore dest_offset because dest is really a struct iovec * and
+ * will maintain offset internally.
+ */
+ return __qp_memcpy_from_queue(dest, queue, queue_offset, size, true);
+}
+
+/*
+ * Allocates kernel VA space of specified size plus space for the queue
+ * and kernel interface. This is different from the guest queue allocator,
+ * because we do not allocate our own queue header/data pages here but
+ * share those of the guest.
+ */
+static struct vmci_queue *qp_host_alloc_queue(u64 size)
+{
+ struct vmci_queue *queue;
+ const size_t num_pages = dm_div_up(size, PAGE_SIZE) + 1;
+ const size_t queue_size = sizeof(*queue) + sizeof(*(queue->kernel_if));
+ const size_t queue_page_size =
+ num_pages * sizeof(*queue->kernel_if->page);
+
+ queue = kzalloc(queue_size + queue_page_size, GFP_KERNEL);
+ if (queue) {
+ queue->q_header = NULL;
+ queue->saved_header = NULL;
+ queue->kernel_if =
+ (struct vmci_queue_kern_if *)((u8 *) queue +
+ sizeof(*queue));
+ queue->kernel_if->host = true;
+ queue->kernel_if->mutex = NULL;
+ queue->kernel_if->num_pages = num_pages;
+ queue->kernel_if->header_page =
+ (struct page **)((u8 *) queue + queue_size);
+ queue->kernel_if->page = &queue->kernel_if->header_page[1];
+ queue->kernel_if->va = NULL;
+ queue->kernel_if->mapped = false;
+ }
+
+ return queue;
+}
+
+/*
+ * Frees kernel memory for a given queue (header plus translation
+ * structure).
+ */
+static void qp_host_free_queue(struct vmci_queue *queue, u64 queue_size)
+{
+ kfree(queue);
+}
+
+/*
+ * Initialize the mutex for the pair of queues. This mutex is used to
+ * protect the q_header and the buffer from changing out from under any
+ * users of either queue. Of course, it's only any good if the mutexes
+ * are actually acquired. Queue structure must lie on non-paged memory
+ * or we cannot guarantee access to the mutex.
+ */
+static void qp_init_queue_mutex(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ /*
+ * Only the host queue has shared state - the guest queues do not
+ * need to synchronize access using a queue mutex.
+ */
+
+ if (produce_q->kernel_if->host) {
+ produce_q->kernel_if->mutex = &produce_q->kernel_if->__mutex;
+ consume_q->kernel_if->mutex = &produce_q->kernel_if->__mutex;
+ mutex_init(produce_q->kernel_if->mutex);
+ }
+}
+
+/*
+ * Cleans up the mutex for the pair of queues.
+ */
+static void qp_cleanup_queue_mutex(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ if (produce_q->kernel_if->host) {
+ produce_q->kernel_if->mutex = NULL;
+ consume_q->kernel_if->mutex = NULL;
+ }
+}
+
+/*
+ * Acquire the mutex for the queue. Note that the produce_q and
+ * the consume_q share a mutex. So, only one of the two need to
+ * be passed in to this routine. Either will work just fine.
+ */
+static void qp_acquire_queue_mutex(struct vmci_queue *queue)
+{
+ if (queue->kernel_if->host) {
+ mutex_lock(queue->kernel_if->mutex);
+ }
+}
+
+/*
+ * Release the mutex for the queue. Note that the produce_q and
+ * the consume_q share a mutex. So, only one of the two need to
+ * be passed in to this routine. Either will work just fine.
+ */
+static void qp_release_queue_mutex(struct vmci_queue *queue)
+{
+ if (queue->kernel_if->host) {
+ mutex_unlock(queue->kernel_if->mutex);
+ }
+}
+
+/*
+ * Helper function to release pages in the PageStoreAttachInfo
+ * previously obtained using get_user_pages.
+ */
+static void qp_release_pages(struct page **pages,
+ u64 num_pages, bool dirty)
+{
+ int i;
+
+ for (i = 0; i < num_pages; i++) {
+ if (dirty)
+ set_page_dirty(pages[i]);
+
+ page_cache_release(pages[i]);
+ pages[i] = NULL;
+ }
+}
+
+/*
+ * Lock the user pages referenced by the {produce,consume}Buffer
+ * struct into memory and populate the {produce,consume}Pages
+ * arrays in the attach structure with them.
+ */
+static int qp_host_get_user_memory(u64 produce_uva,
+ u64 consume_uva,
+ struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ int retval;
+ int err = VMCI_SUCCESS;
+
+ down_write(¤t->mm->mmap_sem);
+ retval = get_user_pages(current,
+ current->mm,
+ (uintptr_t) produce_uva,
+ produce_q->kernel_if->num_pages,
+ 1, 0, produce_q->kernel_if->header_page, NULL);
+ if (retval < produce_q->kernel_if->num_pages) {
+ pr_warn("get_user_pages(produce) failed (retval=%d)", retval);
+ qp_release_pages(produce_q->kernel_if->header_page, retval,
+ false);
+ err = VMCI_ERROR_NO_MEM;
+ goto out;
+ }
+
+ retval = get_user_pages(current,
+ current->mm,
+ (uintptr_t) consume_uva,
+ consume_q->kernel_if->num_pages,
+ 1, 0, consume_q->kernel_if->header_page, NULL);
+ if (retval < consume_q->kernel_if->num_pages) {
+ pr_warn("get_user_pages(consume) failed (retval=%d)", retval);
+ qp_release_pages(consume_q->kernel_if->header_page, retval,
+ false);
+ qp_release_pages(produce_q->kernel_if->header_page,
+ produce_q->kernel_if->num_pages, false);
+ err = VMCI_ERROR_NO_MEM;
+ }
+
+ out:
+ up_write(¤t->mm->mmap_sem);
+
+ return err;
+}
+
+/*
+ * Registers the specification of the user pages used for backing a queue
+ * pair. Enough information to map in pages is stored in the OS specific
+ * part of the struct vmci_queue structure.
+ */
+static int qp_host_register_user_memory(struct vmci_qp_page_store *page_store,
+ struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ u64 produce_uva;
+ u64 consume_uva;
+
+ /*
+ * The new style and the old style mapping only differs in
+ * that we either get a single or two UVAs, so we split the
+ * single UVA range at the appropriate spot.
+ */
+ produce_uva = page_store->pages;
+ consume_uva = page_store->pages +
+ produce_q->kernel_if->num_pages * PAGE_SIZE;
+ return qp_host_get_user_memory(produce_uva, consume_uva, produce_q,
+ consume_q);
+}
+
+/*
+ * Releases and removes the references to user pages stored in the attach
+ * struct. Pages are released from the page cache and may become
+ * swappable again.
+ */
+static void qp_host_unregister_user_memory(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ qp_release_pages(produce_q->kernel_if->header_page,
+ produce_q->kernel_if->num_pages, true);
+ memset(produce_q->kernel_if->header_page, 0,
+ sizeof(*produce_q->kernel_if->header_page) *
+ produce_q->kernel_if->num_pages);
+ qp_release_pages(consume_q->kernel_if->header_page,
+ consume_q->kernel_if->num_pages, true);
+ memset(consume_q->kernel_if->header_page, 0,
+ sizeof(*consume_q->kernel_if->header_page) *
+ consume_q->kernel_if->num_pages);
+}
+
+/*
+ * Once qp_host_register_user_memory has been performed on a
+ * queue, the queue pair headers can be mapped into the
+ * kernel. Once mapped, they must be unmapped with
+ * qp_host_unmap_queues prior to calling
+ * qp_host_unregister_user_memory.
+ * Pages are pinned.
+ */
+static int qp_host_map_queues(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ int result;
+
+ if (!produce_q->q_header || !consume_q->q_header) {
+ struct page *headers[2];
+
+ if (produce_q->q_header != consume_q->q_header)
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+
+ if (produce_q->kernel_if->header_page == NULL ||
+ *produce_q->kernel_if->header_page == NULL)
+ return VMCI_ERROR_UNAVAILABLE;
+
+ headers[0] = *produce_q->kernel_if->header_page;
+ headers[1] = *consume_q->kernel_if->header_page;
+
+ produce_q->q_header = vmap(headers, 2, VM_MAP, PAGE_KERNEL);
+ if (produce_q->q_header != NULL) {
+ consume_q->q_header =
+ (struct vmci_queue_header *)((u8 *)
+ produce_q->q_header +
+ PAGE_SIZE);
+ result = VMCI_SUCCESS;
+ } else {
+ pr_warn("vmap failed.");
+ result = VMCI_ERROR_NO_MEM;
+ }
+ } else {
+ result = VMCI_SUCCESS;
+ }
+
+ return result;
+}
+
+/*
+ * Unmaps previously mapped queue pair headers from the kernel.
+ * Pages are unpinned.
+ */
+static int qp_host_unmap_queues(u32 gid,
+ struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q)
+{
+ if (produce_q->q_header) {
+ if (produce_q->q_header < consume_q->q_header)
+ vunmap(produce_q->q_header);
+ else
+ vunmap(consume_q->q_header);
+
+ produce_q->q_header = NULL;
+ consume_q->q_header = NULL;
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Finds the entry in the list corresponding to a given handle. Assumes
+ * that the list is locked.
+ */
+static struct qp_entry *qp_list_find(struct qp_list *qp_list,
+ struct vmci_handle handle)
+{
+ struct qp_entry *entry;
+
+ if (VMCI_HANDLE_INVALID(handle))
+ return NULL;
+
+ list_for_each_entry(entry, &qp_list->head, list_item) {
+ if (VMCI_HANDLE_EQUAL(entry->handle, handle))
+ return entry;
+ }
+
+ return NULL;
+}
+
+/*
+ * Finds the entry in the list corresponding to a given handle.
+ */
+static struct qp_guest_endpoint *
+qp_guest_handle_to_entry(struct vmci_handle handle)
+{
+ struct qp_guest_endpoint *entry;
+ struct qp_entry *qp = qp_list_find(&qp_guest_endpoints, handle);
+
+ entry = qp ? container_of(
+ qp, struct qp_guest_endpoint, qp) : NULL;
+ return entry;
+}
+
+/*
+ * Finds the entry in the list corresponding to a given handle.
+ */
+static struct qp_broker_entry *
+qp_broker_handle_to_entry(struct vmci_handle handle)
+{
+ struct qp_broker_entry *entry;
+ struct qp_entry *qp = qp_list_find(&qp_broker_list, handle);
+
+ entry = qp ? container_of(
+ qp, struct qp_broker_entry, qp) : NULL;
+ return entry;
+}
+
+/*
+ * Dispatches a queue pair event message directly into the local event
+ * queue.
+ */
+static int qp_notify_peer_local(bool attach, struct vmci_handle handle)
+{
+ struct vmci_event_msg *e_msg;
+ struct vmci_event_payld_qp *e_payload;
+ /* buf is only 48 bytes. */
+ char buf[sizeof(*e_msg) + sizeof(*e_payload)];
+ u32 context_id;
+
+ context_id = vmci_get_context_id();
+
+ e_msg = (struct vmci_event_msg *)buf;
+ e_payload = vmci_event_data_payload(&e_msg->event_data);
+
+ e_msg->hdr.dst = vmci_make_handle(context_id, VMCI_EVENT_HANDLER);
+ e_msg->hdr.src = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID);
+ e_msg->hdr.payload_size =
+ sizeof(*e_msg) + sizeof(*e_payload) - sizeof(e_msg->hdr);
+ e_msg->event_data.event =
+ attach ? VMCI_EVENT_QP_PEER_ATTACH : VMCI_EVENT_QP_PEER_DETACH;
+ e_payload->peer_id = context_id;
+ e_payload->handle = handle;
+
+ return vmci_event_dispatch(&e_msg->hdr);
+}
+
+/*
+ * Allocates and initializes a qp_guest_endpoint structure.
+ * Allocates a queue_pair rid (and handle) iff the given entry has
+ * an invalid handle. 0 through VMCI_RESERVED_RESOURCE_ID_MAX
+ * are reserved handles. Assumes that the QP list mutex is held
+ * by the caller.
+ */
+static struct qp_guest_endpoint *
+qp_guest_endpoint_create(struct vmci_handle handle,
+ u32 peer,
+ u32 flags,
+ u64 produce_size,
+ u64 consume_size,
+ void *produce_q,
+ void *consume_q)
+{
+ int result;
+ struct qp_guest_endpoint *entry;
+ /* One page each for the queue headers. */
+ const u64 num_ppns = dm_div_up(produce_size, PAGE_SIZE) +
+ dm_div_up(consume_size, PAGE_SIZE) + 2;
+
+ if (VMCI_HANDLE_INVALID(handle)) {
+ u32 context_id = vmci_get_context_id();
+
+ handle = vmci_make_handle(context_id, VMCI_INVALID_ID);
+ }
+
+ ASSERT(!VMCI_HANDLE_INVALID(handle) &&
+ qp_list_find(&qp_guest_endpoints, handle) == NULL);
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (entry) {
+ entry->qp.peer = peer;
+ entry->qp.flags = flags;
+ entry->qp.produce_size = produce_size;
+ entry->qp.consume_size = consume_size;
+ entry->qp.ref_count = 0;
+ entry->num_ppns = num_ppns;
+ entry->produce_q = produce_q;
+ entry->consume_q = consume_q;
+ INIT_LIST_HEAD(&entry->qp.list_item);
+
+ /* Add resource obj */
+ result = vmci_resource_add(&entry->resource,
+ VMCI_RESOURCE_TYPE_QPAIR_GUEST,
+ handle);
+ entry->qp.handle = vmci_resource_handle(&entry->resource);
+ if ((result != VMCI_SUCCESS) ||
+ qp_list_find(&qp_guest_endpoints, entry->qp.handle)) {
+ pr_warn("Failed to add new resource (handle=0x%x:0x%x), error: %d",
+ handle.context, handle.resource, result);
+ kfree(entry);
+ entry = NULL;
+ }
+ }
+ return entry;
+}
+
+/*
+ * Frees a qp_guest_endpoint structure.
+ */
+static void qp_guest_endpoint_destroy(struct qp_guest_endpoint *entry)
+{
+ ASSERT(entry->qp.ref_count == 0);
+
+ qp_free_ppn_set(&entry->ppn_set);
+ qp_cleanup_queue_mutex(entry->produce_q, entry->consume_q);
+ qp_free_queue(entry->produce_q, entry->qp.produce_size);
+ qp_free_queue(entry->consume_q, entry->qp.consume_size);
+ /* Unlink from resource hash table and free callback */
+ vmci_resource_remove(&entry->resource);
+
+ kfree(entry);
+}
+
+/*
+ * Helper to make a queue_pairAlloc hypercall when the driver is
+ * supporting a guest device.
+ */
+static int qp_alloc_hypercall(const struct qp_guest_endpoint *entry)
+{
+ struct vmci_qp_alloc_msg *alloc_msg;
+ size_t msg_size;
+ int result;
+
+ if (!entry || entry->num_ppns <= 2)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ ASSERT(!(entry->qp.flags & VMCI_QPFLAG_LOCAL));
+
+ msg_size = sizeof(*alloc_msg) +
+ (size_t) entry->num_ppns * sizeof(u32);
+ alloc_msg = kmalloc(msg_size, GFP_KERNEL);
+ if (!alloc_msg)
+ return VMCI_ERROR_NO_MEM;
+
+ alloc_msg->hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_QUEUEPAIR_ALLOC);
+ alloc_msg->hdr.src = VMCI_ANON_SRC_HANDLE;
+ alloc_msg->hdr.payload_size = msg_size - VMCI_DG_HEADERSIZE;
+ alloc_msg->handle = entry->qp.handle;
+ alloc_msg->peer = entry->qp.peer;
+ alloc_msg->flags = entry->qp.flags;
+ alloc_msg->produce_size = entry->qp.produce_size;
+ alloc_msg->consume_size = entry->qp.consume_size;
+ alloc_msg->num_ppns = entry->num_ppns;
+
+ result = qp_populate_ppn_set((u8 *) alloc_msg + sizeof(*alloc_msg),
+ &entry->ppn_set);
+ if (result == VMCI_SUCCESS)
+ result = vmci_send_datagram(&alloc_msg->hdr);
+
+ kfree(alloc_msg);
+
+ return result;
+}
+
+/*
+ * Helper to make a queue_pairDetach hypercall when the driver is
+ * supporting a guest device.
+ */
+static int qp_detatch_hypercall(struct vmci_handle handle)
+{
+ struct vmci_qp_detach_msg detach_msg;
+
+ detach_msg.hdr.dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_QUEUEPAIR_DETACH);
+ detach_msg.hdr.src = VMCI_ANON_SRC_HANDLE;
+ detach_msg.hdr.payload_size = sizeof(handle);
+ detach_msg.handle = handle;
+
+ return vmci_send_datagram(&detach_msg.hdr);
+}
+
+/*
+ * Adds the given entry to the list. Assumes that the list is locked.
+ */
+static void qp_list_add_entry(struct qp_list *qp_list, struct qp_entry *entry)
+{
+ if (entry)
+ list_add(&entry->list_item, &qp_list->head);
+}
+
+/*
+ * Removes the given entry from the list. Assumes that the list is locked.
+ */
+static void qp_list_remove_entry(struct qp_list *qp_list,
+ struct qp_entry *entry)
+{
+ if (entry)
+ list_del(&entry->list_item);
+}
+
+/*
+ * Helper for VMCI queue_pair detach interface. Frees the physical
+ * pages for the queue pair.
+ */
+static int qp_detatch_guest_work(struct vmci_handle handle)
+{
+ int result;
+ struct qp_guest_endpoint *entry;
+ u32 ref_count = ~0; /* To avoid compiler warning below */
+
+ mutex_lock(&qp_guest_endpoints.mutex);
+
+ entry = qp_guest_handle_to_entry(handle);
+ if (!entry) {
+ mutex_unlock(&qp_guest_endpoints.mutex);
+ return VMCI_ERROR_NOT_FOUND;
+ }
+
+ ASSERT(entry->qp.ref_count >= 1);
+
+ if (entry->qp.flags & VMCI_QPFLAG_LOCAL) {
+ result = VMCI_SUCCESS;
+
+ if (entry->qp.ref_count > 1) {
+ result = qp_notify_peer_local(false, handle);
+ /*
+ * We can fail to notify a local queuepair
+ * because we can't allocate. We still want
+ * to release the entry if that happens, so
+ * don't bail out yet.
+ */
+ }
+ } else {
+ result = qp_detatch_hypercall(handle);
+ if (result < VMCI_SUCCESS) {
+ /*
+ * We failed to notify a non-local queuepair.
+ * That other queuepair might still be
+ * accessing the shared memory, so don't
+ * release the entry yet. It will get cleaned
+ * up by VMCIqueue_pair_Exit() if necessary
+ * (assuming we are going away, otherwise why
+ * did this fail?).
+ */
+
+ mutex_unlock(&qp_guest_endpoints.mutex);
+ return result;
+ }
+ }
+
+ /*
+ * If we get here then we either failed to notify a local queuepair, or
+ * we succeeded in all cases. Release the entry if required.
+ */
+
+ entry->qp.ref_count--;
+ if (entry->qp.ref_count == 0)
+ qp_list_remove_entry(&qp_guest_endpoints, &entry->qp);
+
+ /* If we didn't remove the entry, this could change once we unlock. */
+ if (entry)
+ ref_count = entry->qp.ref_count;
+
+ mutex_unlock(&qp_guest_endpoints.mutex);
+
+ if (ref_count == 0)
+ qp_guest_endpoint_destroy(entry);
+
+ return result;
+}
+
+/*
+ * This functions handles the actual allocation of a VMCI queue
+ * pair guest endpoint. Allocates physical pages for the queue
+ * pair. It makes OS dependent calls through generic wrappers.
+ */
+static int qp_alloc_guest_work(struct vmci_handle *handle,
+ struct vmci_queue **produce_q,
+ u64 produce_size,
+ struct vmci_queue **consume_q,
+ u64 consume_size,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags)
+{
+ const u64 num_produce_pages =
+ dm_div_up(produce_size, PAGE_SIZE) + 1;
+ const u64 num_consume_pages =
+ dm_div_up(consume_size, PAGE_SIZE) + 1;
+ void *my_produce_q = NULL;
+ void *my_consume_q = NULL;
+ int result;
+ struct qp_guest_endpoint *queue_pair_entry = NULL;
+
+ if (priv_flags != VMCI_NO_PRIVILEGE_FLAGS)
+ return VMCI_ERROR_NO_ACCESS;
+
+ mutex_lock(&qp_guest_endpoints.mutex);
+
+ queue_pair_entry = qp_guest_handle_to_entry(*handle);
+ if (queue_pair_entry) {
+ if (queue_pair_entry->qp.flags & VMCI_QPFLAG_LOCAL) {
+ /* Local attach case. */
+ if (queue_pair_entry->qp.ref_count > 1) {
+ pr_devel("Error attempting to attach more "
+ "than once.");
+ result = VMCI_ERROR_UNAVAILABLE;
+ goto error_keep_entry;
+ }
+
+ if (queue_pair_entry->qp.produce_size != consume_size ||
+ queue_pair_entry->qp.consume_size !=
+ produce_size ||
+ queue_pair_entry->qp.flags !=
+ (flags & ~VMCI_QPFLAG_ATTACH_ONLY)) {
+ pr_devel("Error mismatched queue pair in "
+ "local attach.");
+ result = VMCI_ERROR_QUEUEPAIR_MISMATCH;
+ goto error_keep_entry;
+ }
+
+ /*
+ * Do a local attach. We swap the consume and
+ * produce queues for the attacher and deliver
+ * an attach event.
+ */
+ result = qp_notify_peer_local(true, *handle);
+ if (result < VMCI_SUCCESS)
+ goto error_keep_entry;
+
+ my_produce_q = queue_pair_entry->consume_q;
+ my_consume_q = queue_pair_entry->produce_q;
+ goto out;
+ }
+
+ result = VMCI_ERROR_ALREADY_EXISTS;
+ goto error_keep_entry;
+ }
+
+ my_produce_q = qp_alloc_queue(produce_size, flags);
+ if (!my_produce_q) {
+ pr_warn("Error allocating pages for produce queue.");
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ my_consume_q = qp_alloc_queue(consume_size, flags);
+ if (!my_consume_q) {
+ pr_warn("Error allocating pages for consume queue.");
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ queue_pair_entry = qp_guest_endpoint_create(*handle, peer, flags,
+ produce_size, consume_size,
+ my_produce_q, my_consume_q);
+ if (!queue_pair_entry) {
+ pr_warn("Error allocating memory in %s.", __func__);
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ result = qp_alloc_ppn_set(my_produce_q, num_produce_pages, my_consume_q,
+ num_consume_pages,
+ &queue_pair_entry->ppn_set);
+ if (result < VMCI_SUCCESS) {
+ pr_warn("qp_alloc_ppn_set failed.");
+ goto error;
+ }
+
+ /*
+ * It's only necessary to notify the host if this queue pair will be
+ * attached to from another context.
+ */
+ if (queue_pair_entry->qp.flags & VMCI_QPFLAG_LOCAL) {
+ /* Local create case. */
+ u32 context_id = vmci_get_context_id();
+
+ /*
+ * Enforce similar checks on local queue pairs as we
+ * do for regular ones. The handle's context must
+ * match the creator or attacher context id (here they
+ * are both the current context id) and the
+ * attach-only flag cannot exist during create. We
+ * also ensure specified peer is this context or an
+ * invalid one.
+ */
+ if (queue_pair_entry->qp.handle.context != context_id ||
+ (queue_pair_entry->qp.peer != VMCI_INVALID_ID &&
+ queue_pair_entry->qp.peer != context_id)) {
+ result = VMCI_ERROR_NO_ACCESS;
+ goto error;
+ }
+
+ if (queue_pair_entry->qp.flags & VMCI_QPFLAG_ATTACH_ONLY) {
+ result = VMCI_ERROR_NOT_FOUND;
+ goto error;
+ }
+ } else {
+ result = qp_alloc_hypercall(queue_pair_entry);
+ if (result < VMCI_SUCCESS) {
+ pr_warn("qp_alloc_hypercall result = %d.", result);
+ goto error;
+ }
+ }
+
+ qp_init_queue_mutex((struct vmci_queue *)my_produce_q,
+ (struct vmci_queue *)my_consume_q);
+
+ qp_list_add_entry(&qp_guest_endpoints, &queue_pair_entry->qp);
+
+ out:
+ queue_pair_entry->qp.ref_count++;
+ *handle = queue_pair_entry->qp.handle;
+ *produce_q = (struct vmci_queue *)my_produce_q;
+ *consume_q = (struct vmci_queue *)my_consume_q;
+
+ /*
+ * We should initialize the queue pair header pages on a local
+ * queue pair create. For non-local queue pairs, the
+ * hypervisor initializes the header pages in the create step.
+ */
+ if ((queue_pair_entry->qp.flags & VMCI_QPFLAG_LOCAL) &&
+ queue_pair_entry->qp.ref_count == 1) {
+ vmci_q_header_init((*produce_q)->q_header, *handle);
+ vmci_q_header_init((*consume_q)->q_header, *handle);
+ }
+
+ mutex_unlock(&qp_guest_endpoints.mutex);
+
+ return VMCI_SUCCESS;
+
+ error:
+ mutex_unlock(&qp_guest_endpoints.mutex);
+ if (queue_pair_entry) {
+ /* The queues will be freed inside the destroy routine. */
+ qp_guest_endpoint_destroy(queue_pair_entry);
+ } else {
+ qp_free_queue(my_produce_q, produce_size);
+ qp_free_queue(my_consume_q, consume_size);
+ }
+ return result;
+
+ error_keep_entry:
+ /* This path should only be used when an existing entry was found. */
+ ASSERT(queue_pair_entry->qp.ref_count > 0);
+ mutex_unlock(&qp_guest_endpoints.mutex);
+ return result;
+}
+
+/*
+ * The first endpoint issuing a queue pair allocation will create the state
+ * of the queue pair in the queue pair broker.
+ *
+ * If the creator is a guest, it will associate a VMX virtual address range
+ * with the queue pair as specified by the page_store. For compatibility with
+ * older VMX'en, that would use a separate step to set the VMX virtual
+ * address range, the virtual address range can be registered later using
+ * vmci_qp_broker_set_page_store. In that case, a page_store of NULL should be
+ * used.
+ *
+ * If the creator is the host, a page_store of NULL should be used as well,
+ * since the host is not able to supply a page store for the queue pair.
+ *
+ * For older VMX and host callers, the queue pair will be created in the
+ * VMCIQPB_CREATED_NO_MEM state, and for current VMX callers, it will be
+ * created in VMCOQPB_CREATED_MEM state.
+ */
+static int qp_broker_create(struct vmci_handle handle,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags,
+ u64 produce_size,
+ u64 consume_size,
+ struct vmci_qp_page_store *page_store,
+ struct vmci_ctx *context,
+ vmci_event_release_cb wakeup_cb,
+ void *client_data, struct qp_broker_entry **ent)
+{
+ struct qp_broker_entry *entry = NULL;
+ const u32 context_id = vmci_ctx_get_id(context);
+ bool is_local = flags & VMCI_QPFLAG_LOCAL;
+ int result;
+ u64 guest_produce_size;
+ u64 guest_consume_size;
+
+ /* Do not create if the caller asked not to. */
+ if (flags & VMCI_QPFLAG_ATTACH_ONLY)
+ return VMCI_ERROR_NOT_FOUND;
+
+ /*
+ * Creator's context ID should match handle's context ID or the creator
+ * must allow the context in handle's context ID as the "peer".
+ */
+ if (handle.context != context_id && handle.context != peer)
+ return VMCI_ERROR_NO_ACCESS;
+
+ if (VMCI_CONTEXT_IS_VM(context_id) && VMCI_CONTEXT_IS_VM(peer))
+ return VMCI_ERROR_DST_UNREACHABLE;
+
+ /*
+ * Creator's context ID for local queue pairs should match the
+ * peer, if a peer is specified.
+ */
+ if (is_local && peer != VMCI_INVALID_ID && context_id != peer)
+ return VMCI_ERROR_NO_ACCESS;
+
+ entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
+ if (!entry)
+ return VMCI_ERROR_NO_MEM;
+
+ if (vmci_ctx_get_id(context) == VMCI_HOST_CONTEXT_ID && !is_local) {
+ /*
+ * The queue pair broker entry stores values from the guest
+ * point of view, so a creating host side endpoint should swap
+ * produce and consume values -- unless it is a local queue
+ * pair, in which case no swapping is necessary, since the local
+ * attacher will swap queues.
+ */
+
+ guest_produce_size = consume_size;
+ guest_consume_size = produce_size;
+ } else {
+ guest_produce_size = produce_size;
+ guest_consume_size = consume_size;
+ }
+
+ entry->qp.handle = handle;
+ entry->qp.peer = peer;
+ entry->qp.flags = flags;
+ entry->qp.produce_size = guest_produce_size;
+ entry->qp.consume_size = guest_consume_size;
+ entry->qp.ref_count = 1;
+ entry->create_id = context_id;
+ entry->attach_id = VMCI_INVALID_ID;
+ entry->state = VMCIQPB_NEW;
+ entry->require_trusted_attach =
+ !!(context->priv_flags & VMCI_PRIVILEGE_FLAG_RESTRICTED);
+ entry->created_by_trusted =
+ !!(priv_flags & VMCI_PRIVILEGE_FLAG_TRUSTED);
+ entry->vmci_page_files = false;
+ entry->wakeup_cb = wakeup_cb;
+ entry->client_data = client_data;
+ entry->produce_q = qp_host_alloc_queue(guest_produce_size);
+ if (entry->produce_q == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+ entry->consume_q = qp_host_alloc_queue(guest_consume_size);
+ if (entry->consume_q == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+
+ qp_init_queue_mutex(entry->produce_q, entry->consume_q);
+
+ INIT_LIST_HEAD(&entry->qp.list_item);
+
+ if (is_local) {
+ u8 *tmp;
+
+ entry->local_mem = kcalloc(QPE_NUM_PAGES(entry->qp),
+ PAGE_SIZE, GFP_KERNEL);
+ if (entry->local_mem == NULL) {
+ result = VMCI_ERROR_NO_MEM;
+ goto error;
+ }
+ entry->state = VMCIQPB_CREATED_MEM;
+ entry->produce_q->q_header = entry->local_mem;
+ tmp = (u8 *) entry->local_mem + PAGE_SIZE *
+ (dm_div_up(entry->qp.produce_size, PAGE_SIZE) + 1);
+ entry->consume_q->q_header = (struct vmci_queue_header *)tmp;
+ } else if (page_store) {
+ ASSERT(entry->create_id != VMCI_HOST_CONTEXT_ID || is_local);
+
+ /*
+ * The VMX already initialized the queue pair headers, so no
+ * need for the kernel side to do that.
+ */
+ result = qp_host_register_user_memory(page_store,
+ entry->produce_q,
+ entry->consume_q);
+ if (result < VMCI_SUCCESS)
+ goto error;
+
+ entry->state = VMCIQPB_CREATED_MEM;
+ } else {
+ /*
+ * A create without a page_store may be either a host
+ * side create (in which case we are waiting for the
+ * guest side to supply the memory) or an old style
+ * queue pair create (in which case we will expect a
+ * set page store call as the next step).
+ */
+ entry->state = VMCIQPB_CREATED_NO_MEM;
+ }
+
+ qp_list_add_entry(&qp_broker_list, &entry->qp);
+ if (ent != NULL)
+ *ent = entry;
+
+ /* Add to resource obj */
+ result = vmci_resource_add(&entry->resource,
+ VMCI_RESOURCE_TYPE_QPAIR_HOST,
+ handle);
+ if (result != VMCI_SUCCESS) {
+ pr_warn("Failed to add new resource (handle=0x%x:0x%x), error: %d",
+ handle.context, handle.resource, result);
+ goto error;
+ }
+
+ entry->qp.handle = vmci_resource_handle(&entry->resource);
+ if (is_local) {
+ vmci_q_header_init(entry->produce_q->q_header,
+ entry->qp.handle);
+ vmci_q_header_init(entry->consume_q->q_header,
+ entry->qp.handle);
+ }
+
+ vmci_ctx_qp_create(context, entry->qp.handle);
+
+ return VMCI_SUCCESS;
+
+ error:
+ if (entry != NULL) {
+ qp_host_free_queue(entry->produce_q, guest_produce_size);
+ qp_host_free_queue(entry->consume_q, guest_consume_size);
+ kfree(entry);
+ }
+
+ return result;
+}
+
+/*
+ * Enqueues an event datagram to notify the peer VM attached to
+ * the given queue pair handle about attach/detach event by the
+ * given VM. Returns Payload size of datagram enqueued on
+ * success, error code otherwise.
+ */
+static int qp_notify_peer(bool attach,
+ struct vmci_handle handle,
+ u32 my_id,
+ u32 peer_id)
+{
+ int rv;
+ struct vmci_event_msg *e_msg;
+ struct vmci_event_payld_qp *ev_payload;
+ char buf[sizeof(*e_msg) + sizeof(*ev_payload)];
+
+ if (VMCI_HANDLE_INVALID(handle) || my_id == VMCI_INVALID_ID ||
+ peer_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /*
+ * Notification message contains: queue pair handle and
+ * attaching/detaching VM's context id.
+ */
+ e_msg = (struct vmci_event_msg *)buf;
+
+ /*
+ * In vmci_ctx_enqueue_datagram() we enforce the upper limit on
+ * number of pending events from the hypervisor to a given VM
+ * otherwise a rogue VM could do an arbitrary number of attach
+ * and detach operations causing memory pressure in the host
+ * kernel.
+ */
+
+ /* Clear out any garbage. */
+ memset(e_msg, 0, sizeof(buf));
+
+ e_msg->hdr.dst = vmci_make_handle(peer_id, VMCI_EVENT_HANDLER);
+ e_msg->hdr.src = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_CONTEXT_RESOURCE_ID);
+ e_msg->hdr.payload_size = sizeof(*e_msg) + sizeof(*ev_payload) -
+ sizeof(e_msg->hdr);
+ e_msg->event_data.event = attach ?
+ VMCI_EVENT_QP_PEER_ATTACH : VMCI_EVENT_QP_PEER_DETACH;
+ ev_payload = vmci_event_data_payload(&e_msg->event_data);
+ ev_payload->handle = handle;
+ ev_payload->peer_id = my_id;
+
+ rv = vmci_datagram_dispatch(VMCI_HYPERVISOR_CONTEXT_ID,
+ &e_msg->hdr, false);
+ if (rv < VMCI_SUCCESS)
+ pr_warn("Failed to enqueue queue_pair %s event datagram "
+ "for context (ID=0x%x).", attach ? "ATTACH" : "DETACH",
+ peer_id);
+
+ return rv;
+}
+
+/*
+ * The second endpoint issuing a queue pair allocation will attach to
+ * the queue pair registered with the queue pair broker.
+ *
+ * If the attacher is a guest, it will associate a VMX virtual address
+ * range with the queue pair as specified by the page_store. At this
+ * point, the already attach host endpoint may start using the queue
+ * pair, and an attach event is sent to it. For compatibility with
+ * older VMX'en, that used a separate step to set the VMX virtual
+ * address range, the virtual address range can be registered later
+ * using vmci_qp_broker_set_page_store. In that case, a page_store of
+ * NULL should be used, and the attach event will be generated once
+ * the actual page store has been set.
+ *
+ * If the attacher is the host, a page_store of NULL should be used as
+ * well, since the page store information is already set by the guest.
+ *
+ * For new VMX and host callers, the queue pair will be moved to the
+ * VMCIQPB_ATTACHED_MEM state, and for older VMX callers, it will be
+ * moved to the VMCOQPB_ATTACHED_NO_MEM state.
+ */
+static int qp_broker_attach(struct qp_broker_entry *entry,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags,
+ u64 produce_size,
+ u64 consume_size,
+ struct vmci_qp_page_store *page_store,
+ struct vmci_ctx *context,
+ vmci_event_release_cb wakeup_cb,
+ void *client_data,
+ struct qp_broker_entry **ent)
+{
+ const u32 context_id = vmci_ctx_get_id(context);
+ bool is_local = flags & VMCI_QPFLAG_LOCAL;
+ int result;
+
+ if (entry->state != VMCIQPB_CREATED_NO_MEM &&
+ entry->state != VMCIQPB_CREATED_MEM)
+ return VMCI_ERROR_UNAVAILABLE;
+
+ if (is_local) {
+ if (!(entry->qp.flags & VMCI_QPFLAG_LOCAL) ||
+ context_id != entry->create_id) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+ } else if (context_id == entry->create_id ||
+ context_id == entry->attach_id) {
+ return VMCI_ERROR_ALREADY_EXISTS;
+ }
+
+ ASSERT(entry->qp.ref_count < 2);
+ ASSERT(entry->attach_id == VMCI_INVALID_ID);
+
+ if (VMCI_CONTEXT_IS_VM(context_id) &&
+ VMCI_CONTEXT_IS_VM(entry->create_id))
+ return VMCI_ERROR_DST_UNREACHABLE;
+
+ /*
+ * If we are attaching from a restricted context then the queuepair
+ * must have been created by a trusted endpoint.
+ */
+ if ((context->priv_flags & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
+ !entry->created_by_trusted)
+ return VMCI_ERROR_NO_ACCESS;
+
+ /*
+ * If we are attaching to a queuepair that was created by a restricted
+ * context then we must be trusted.
+ */
+ if (entry->require_trusted_attach &&
+ (!(priv_flags & VMCI_PRIVILEGE_FLAG_TRUSTED)))
+ return VMCI_ERROR_NO_ACCESS;
+
+ /*
+ * If the creator specifies VMCI_INVALID_ID in "peer" field, access
+ * control check is not performed.
+ */
+ if (entry->qp.peer != VMCI_INVALID_ID && entry->qp.peer != context_id)
+ return VMCI_ERROR_NO_ACCESS;
+
+ if (entry->create_id == VMCI_HOST_CONTEXT_ID) {
+ /*
+ * Do not attach if the caller doesn't support Host Queue Pairs
+ * and a host created this queue pair.
+ */
+
+ if (!vmci_ctx_supports_host_qp(context))
+ return VMCI_ERROR_INVALID_RESOURCE;
+
+ } else if (context_id == VMCI_HOST_CONTEXT_ID) {
+ struct vmci_ctx *create_context;
+ bool supports_host_qp;
+
+ /*
+ * Do not attach a host to a user created queue pair if that
+ * user doesn't support host queue pair end points.
+ */
+
+ create_context = vmci_ctx_get(entry->create_id);
+ supports_host_qp = vmci_ctx_supports_host_qp(create_context);
+ vmci_ctx_put(create_context);
+
+ if (!supports_host_qp)
+ return VMCI_ERROR_INVALID_RESOURCE;
+ }
+
+ if ((entry->qp.flags & ~VMCI_QP_ASYMM) != (flags & ~VMCI_QP_ASYMM_PEER))
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+
+ if (context_id != VMCI_HOST_CONTEXT_ID) {
+ /*
+ * The queue pair broker entry stores values from the guest
+ * point of view, so an attaching guest should match the values
+ * stored in the entry.
+ */
+
+ if (entry->qp.produce_size != produce_size ||
+ entry->qp.consume_size != consume_size) {
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+ }
+ } else if (entry->qp.produce_size != consume_size ||
+ entry->qp.consume_size != produce_size) {
+ return VMCI_ERROR_QUEUEPAIR_MISMATCH;
+ }
+
+ if (context_id != VMCI_HOST_CONTEXT_ID) {
+ /*
+ * If a guest attached to a queue pair, it will supply
+ * the backing memory. If this is a pre NOVMVM vmx,
+ * the backing memory will be supplied by calling
+ * vmci_qp_broker_set_page_store() following the
+ * return of the vmci_qp_broker_alloc() call. If it is
+ * a vmx of version NOVMVM or later, the page store
+ * must be supplied as part of the
+ * vmci_qp_broker_alloc call. Under all circumstances
+ * must the initially created queue pair not have any
+ * memory associated with it already.
+ */
+
+ if (entry->state != VMCIQPB_CREATED_NO_MEM)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (page_store != NULL) {
+ /*
+ * Patch up host state to point to guest
+ * supplied memory. The VMX already
+ * initialized the queue pair headers, so no
+ * need for the kernel side to do that.
+ */
+
+ result = qp_host_register_user_memory(page_store,
+ entry->produce_q,
+ entry->consume_q);
+ if (result < VMCI_SUCCESS)
+ return result;
+
+ /*
+ * Preemptively load in the headers if non-blocking to
+ * prevent blocking later.
+ */
+ if (entry->qp.flags & VMCI_QPFLAG_NONBLOCK) {
+ result = qp_host_map_queues(entry->produce_q,
+ entry->consume_q);
+ if (result < VMCI_SUCCESS) {
+ qp_host_unregister_user_memory(
+ entry->produce_q,
+ entry->consume_q);
+ return result;
+ }
+ }
+
+ entry->state = VMCIQPB_ATTACHED_MEM;
+ } else {
+ entry->state = VMCIQPB_ATTACHED_NO_MEM;
+ }
+ } else if (entry->state == VMCIQPB_CREATED_NO_MEM) {
+ /*
+ * The host side is attempting to attach to a queue
+ * pair that doesn't have any memory associated with
+ * it. This must be a pre NOVMVM vmx that hasn't set
+ * the page store information yet, or a quiesced VM.
+ */
+
+ return VMCI_ERROR_UNAVAILABLE;
+ } else {
+ /*
+ * For non-blocking queue pairs, we cannot rely on
+ * enqueue/dequeue to map in the pages on the
+ * host-side, since it may block, so we make an
+ * attempt here.
+ */
+
+ if (flags & VMCI_QPFLAG_NONBLOCK) {
+ result =
+ qp_host_map_queues(entry->produce_q,
+ entry->consume_q);
+ if (result < VMCI_SUCCESS)
+ return result;
+
+ entry->qp.flags |= flags &
+ (VMCI_QPFLAG_NONBLOCK | VMCI_QPFLAG_PINNED);
+ }
+
+ /* The host side has successfully attached to a queue pair. */
+ entry->state = VMCIQPB_ATTACHED_MEM;
+ }
+
+ if (entry->state == VMCIQPB_ATTACHED_MEM) {
+ result =
+ qp_notify_peer(true, entry->qp.handle, context_id,
+ entry->create_id);
+ if (result < VMCI_SUCCESS)
+ pr_warn("Failed to notify peer (ID=0x%x) of "
+ "attach to queue pair (handle=0x%x:0x%x).",
+ entry->create_id, entry->qp.handle.context,
+ entry->qp.handle.resource);
+ }
+
+ entry->attach_id = context_id;
+ entry->qp.ref_count++;
+ if (wakeup_cb) {
+ entry->wakeup_cb = wakeup_cb;
+ entry->client_data = client_data;
+ }
+
+ /*
+ * When attaching to local queue pairs, the context already has
+ * an entry tracking the queue pair, so don't add another one.
+ */
+ if (!is_local)
+ vmci_ctx_qp_create(context, entry->qp.handle);
+ else
+ ASSERT(vmci_ctx_qp_exists(context, entry->qp.handle));
+
+ if (ent != NULL)
+ *ent = entry;
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * queue_pair_Alloc for use when setting up queue pair endpoints
+ * on the host.
+ */
+static int qp_broker_alloc(struct vmci_handle handle,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags,
+ u64 produce_size,
+ u64 consume_size,
+ struct vmci_qp_page_store *page_store,
+ struct vmci_ctx *context,
+ vmci_event_release_cb wakeup_cb,
+ void *client_data,
+ struct qp_broker_entry **ent,
+ bool *swap)
+{
+ const u32 context_id = vmci_ctx_get_id(context);
+ bool create;
+ struct qp_broker_entry *entry = NULL;
+ bool is_local = flags & VMCI_QPFLAG_LOCAL;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) ||
+ (flags & ~VMCI_QP_ALL_FLAGS) || is_local ||
+ !(produce_size || consume_size) ||
+ !context || context_id == VMCI_INVALID_ID ||
+ handle.context == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (page_store && !VMCI_QP_PAGESTORE_IS_WELLFORMED(page_store))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /*
+ * In the initial argument check, we ensure that non-vmkernel hosts
+ * are not allowed to create local queue pairs.
+ */
+
+ ASSERT(!is_local);
+
+ mutex_lock(&qp_broker_list.mutex);
+
+ if (!is_local && vmci_ctx_qp_exists(context, handle)) {
+ pr_devel("Context (ID=0x%x) already attached to queue "
+ "pair (handle=0x%x:0x%x).", context_id,
+ handle.context, handle.resource);
+ mutex_unlock(&qp_broker_list.mutex);
+ return VMCI_ERROR_ALREADY_EXISTS;
+ }
+
+ if (handle.resource != VMCI_INVALID_ID)
+ entry = qp_broker_handle_to_entry(handle);
+
+ if (!entry) {
+ create = true;
+ result =
+ qp_broker_create(handle, peer, flags, priv_flags,
+ produce_size, consume_size, page_store,
+ context, wakeup_cb, client_data, ent);
+ } else {
+ create = false;
+ result =
+ qp_broker_attach(entry, peer, flags, priv_flags,
+ produce_size, consume_size, page_store,
+ context, wakeup_cb, client_data, ent);
+ }
+
+ mutex_unlock(&qp_broker_list.mutex);
+
+ if (swap)
+ *swap = (context_id == VMCI_HOST_CONTEXT_ID) &&
+ !(create && is_local);
+
+ return result;
+}
+
+/*
+ * This function implements the kernel API for allocating a queue
+ * pair.
+ */
+static int qp_alloc_host_work(struct vmci_handle *handle,
+ struct vmci_queue **produce_q,
+ u64 produce_size,
+ struct vmci_queue **consume_q,
+ u64 consume_size,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags,
+ vmci_event_release_cb wakeup_cb,
+ void *client_data)
+{
+ struct vmci_ctx *context;
+ struct qp_broker_entry *entry;
+ int result;
+ bool swap;
+
+ if (VMCI_HANDLE_INVALID(*handle)) {
+ handle->context = VMCI_HOST_CONTEXT_ID;
+ *handle = vmci_make_handle(handle->context, VMCI_INVALID_ID);
+ }
+
+ context = vmci_ctx_get(VMCI_HOST_CONTEXT_ID);
+ entry = NULL;
+ result =
+ qp_broker_alloc(*handle, peer, flags, priv_flags,
+ produce_size, consume_size, NULL, context,
+ wakeup_cb, client_data, &entry, &swap);
+ if (result == VMCI_SUCCESS) {
+ if (swap) {
+ /*
+ * If this is a local queue pair, the attacher
+ * will swap around produce and consume
+ * queues.
+ */
+
+ *produce_q = entry->consume_q;
+ *consume_q = entry->produce_q;
+ } else {
+ *produce_q = entry->produce_q;
+ *consume_q = entry->consume_q;
+ }
+ } else {
+ *handle = VMCI_INVALID_HANDLE;
+ pr_devel("queue pair broker failed to alloc (result=%d).",
+ result);
+ }
+ vmci_ctx_put(context);
+ return result;
+}
+
+/*
+ * Allocates a VMCI queue_pair. Only checks validity of input
+ * arguments. The real work is done in the host or guest
+ * specific function.
+ */
+int vmci_qp_alloc(struct vmci_handle *handle,
+ struct vmci_queue **produce_q,
+ u64 produce_size,
+ struct vmci_queue **consume_q,
+ u64 consume_size,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags,
+ bool guest_endpoint,
+ vmci_event_release_cb wakeup_cb,
+ void *client_data)
+{
+ if (!handle || !produce_q || !consume_q ||
+ (!produce_size && !consume_size) || (flags & ~VMCI_QP_ALL_FLAGS))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (guest_endpoint) {
+ return qp_alloc_guest_work(handle, produce_q,
+ produce_size, consume_q,
+ consume_size, peer,
+ flags, priv_flags);
+ } else {
+ return qp_alloc_host_work(handle, produce_q,
+ produce_size, consume_q,
+ consume_size, peer, flags,
+ priv_flags, wakeup_cb, client_data);
+ }
+}
+
+/*
+ * This function implements the host kernel API for detaching from
+ * a queue pair.
+ */
+static int qp_detatch_host_work(struct vmci_handle handle)
+{
+ int result;
+ struct vmci_ctx *context;
+
+ context = vmci_ctx_get(VMCI_HOST_CONTEXT_ID);
+
+ result = vmci_qp_broker_detach(handle, context);
+
+ vmci_ctx_put(context);
+ return result;
+}
+
+/*
+ * Detaches from a VMCI queue_pair. Only checks validity of input argument.
+ * Real work is done in the host or guest specific function.
+ */
+static int qp_detatch(struct vmci_handle handle, bool guest_endpoint)
+{
+ if (VMCI_HANDLE_INVALID(handle))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ if (guest_endpoint)
+ return qp_detatch_guest_work(handle);
+ else
+ return qp_detatch_host_work(handle);
+}
+
+/*
+ * Returns the entry from the head of the list. Assumes that the list is
+ * locked.
+ */
+static struct qp_entry *qp_list_get_head(struct qp_list *qp_list)
+{
+ if (!list_empty(&qp_list->head)) {
+ struct qp_entry *entry =
+ list_first_entry(&qp_list->head, struct qp_entry,
+ list_item);
+ return entry;
+ }
+
+ return NULL;
+}
+
+void vmci_qp_broker_exit(void)
+{
+ struct qp_entry *entry;
+ struct qp_broker_entry *be;
+
+ mutex_lock(&qp_broker_list.mutex);
+
+ while ((entry = qp_list_get_head(&qp_broker_list))) {
+ be = (struct qp_broker_entry *)entry;
+
+ qp_list_remove_entry(&qp_broker_list, entry);
+ kfree(be);
+ }
+
+ mutex_unlock(&qp_broker_list.mutex);
+}
+
+/*
+ * Requests that a queue pair be allocated with the VMCI queue
+ * pair broker. Allocates a queue pair entry if one does not
+ * exist. Attaches to one if it exists, and retrieves the page
+ * files backing that queue_pair. Assumes that the queue pair
+ * broker lock is held.
+ */
+int vmci_qp_broker_alloc(struct vmci_handle handle,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags,
+ u64 produce_size,
+ u64 consume_size,
+ struct vmci_qp_page_store *page_store,
+ struct vmci_ctx *context)
+{
+ return qp_broker_alloc(handle, peer, flags, priv_flags,
+ produce_size, consume_size,
+ page_store, context, NULL, NULL, NULL, NULL);
+}
+
+/*
+ * VMX'en with versions lower than VMCI_VERSION_NOVMVM use a separate
+ * step to add the UVAs of the VMX mapping of the queue pair. This function
+ * provides backwards compatibility with such VMX'en, and takes care of
+ * registering the page store for a queue pair previously allocated by the
+ * VMX during create or attach. This function will move the queue pair state
+ * to either from VMCIQBP_CREATED_NO_MEM to VMCIQBP_CREATED_MEM or
+ * VMCIQBP_ATTACHED_NO_MEM to VMCIQBP_ATTACHED_MEM. If moving to the
+ * attached state with memory, the queue pair is ready to be used by the
+ * host peer, and an attached event will be generated.
+ *
+ * Assumes that the queue pair broker lock is held.
+ *
+ * This function is only used by the hosted platform, since there is no
+ * issue with backwards compatibility for vmkernel.
+ */
+int vmci_qp_broker_set_page_store(struct vmci_handle handle,
+ u64 produce_uva,
+ u64 consume_uva,
+ struct vmci_ctx *context)
+{
+ struct qp_broker_entry *entry;
+ int result;
+ const u32 context_id = vmci_ctx_get_id(context);
+
+ if (VMCI_HANDLE_INVALID(handle) || !context ||
+ context_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /*
+ * We only support guest to host queue pairs, so the VMX must
+ * supply UVAs for the mapped page files.
+ */
+
+ if (produce_uva == 0 || consume_uva == 0)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ mutex_lock(&qp_broker_list.mutex);
+
+ if (!vmci_ctx_qp_exists(context, handle)) {
+ pr_warn("Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).", context_id, handle.context,
+ handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = qp_broker_handle_to_entry(handle);
+ if (!entry) {
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ /*
+ * If I'm the owner then I can set the page store.
+ *
+ * Or, if a host created the queue_pair and I'm the attached peer
+ * then I can set the page store.
+ */
+ if (entry->create_id != context_id &&
+ (entry->create_id != VMCI_HOST_CONTEXT_ID ||
+ entry->attach_id != context_id)) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTOWNER;
+ goto out;
+ }
+
+ if (entry->state != VMCIQPB_CREATED_NO_MEM &&
+ entry->state != VMCIQPB_ATTACHED_NO_MEM) {
+ result = VMCI_ERROR_UNAVAILABLE;
+ goto out;
+ }
+
+ result = qp_host_get_user_memory(produce_uva, consume_uva,
+ entry->produce_q, entry->consume_q);
+ if (result < VMCI_SUCCESS)
+ goto out;
+
+ result = qp_host_map_queues(entry->produce_q, entry->consume_q);
+ if (result < VMCI_SUCCESS) {
+ qp_host_unregister_user_memory(entry->produce_q,
+ entry->consume_q);
+ goto out;
+ }
+
+ if (entry->state == VMCIQPB_CREATED_NO_MEM) {
+ entry->state = VMCIQPB_CREATED_MEM;
+ } else {
+ ASSERT(entry->state == VMCIQPB_ATTACHED_NO_MEM);
+ entry->state = VMCIQPB_ATTACHED_MEM;
+ }
+ entry->vmci_page_files = true;
+
+ if (entry->state == VMCIQPB_ATTACHED_MEM) {
+ result =
+ qp_notify_peer(true, handle, context_id, entry->create_id);
+ if (result < VMCI_SUCCESS) {
+ pr_warn("Failed to notify peer (ID=0x%x) of "
+ "attach to queue pair (handle=0x%x:0x%x).",
+ entry->create_id, entry->qp.handle.context,
+ entry->qp.handle.resource);
+ }
+ }
+
+ result = VMCI_SUCCESS;
+ out:
+ mutex_unlock(&qp_broker_list.mutex);
+ return result;
+}
+
+/*
+ * Resets saved queue headers for the given QP broker
+ * entry. Should be used when guest memory becomes available
+ * again, or the guest detaches.
+ */
+static void qp_reset_saved_headers(struct qp_broker_entry *entry)
+{
+ entry->produce_q->saved_header = NULL;
+ entry->consume_q->saved_header = NULL;
+}
+
+/*
+ * The main entry point for detaching from a queue pair registered with the
+ * queue pair broker. If more than one endpoint is attached to the queue
+ * pair, the first endpoint will mainly decrement a reference count and
+ * generate a notification to its peer. The last endpoint will clean up
+ * the queue pair state registered with the broker.
+ *
+ * When a guest endpoint detaches, it will unmap and unregister the guest
+ * memory backing the queue pair. If the host is still attached, it will
+ * no longer be able to access the queue pair content.
+ *
+ * If the queue pair is already in a state where there is no memory
+ * registered for the queue pair (any *_NO_MEM state), it will transition to
+ * the VMCIQPB_SHUTDOWN_NO_MEM state. This will also happen, if a guest
+ * endpoint is the first of two endpoints to detach. If the host endpoint is
+ * the first out of two to detach, the queue pair will move to the
+ * VMCIQPB_SHUTDOWN_MEM state.
+ */
+int vmci_qp_broker_detach(struct vmci_handle handle, struct vmci_ctx *context)
+{
+ struct qp_broker_entry *entry;
+ const u32 context_id = vmci_ctx_get_id(context);
+ u32 peer_id;
+ bool is_local = false;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) || !context ||
+ context_id == VMCI_INVALID_ID) {
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ mutex_lock(&qp_broker_list.mutex);
+
+ if (!vmci_ctx_qp_exists(context, handle)) {
+ pr_devel("Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).", context_id, handle.context,
+ handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = qp_broker_handle_to_entry(handle);
+ if (!entry) {
+ pr_devel("Context (ID=0x%x) reports being attached to "
+ "queue pair(handle=0x%x:0x%x) that isn't present "
+ "in broker.", context_id, handle.context,
+ handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ if (context_id != entry->create_id && context_id != entry->attach_id) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ goto out;
+ }
+
+ if (context_id == entry->create_id) {
+ peer_id = entry->attach_id;
+ entry->create_id = VMCI_INVALID_ID;
+ } else {
+ peer_id = entry->create_id;
+ entry->attach_id = VMCI_INVALID_ID;
+ }
+ entry->qp.ref_count--;
+
+ is_local = entry->qp.flags & VMCI_QPFLAG_LOCAL;
+
+ if (context_id != VMCI_HOST_CONTEXT_ID) {
+ bool headers_mapped;
+
+ ASSERT(!is_local);
+
+ /*
+ * Pre NOVMVM vmx'en may detach from a queue pair
+ * before setting the page store, and in that case
+ * there is no user memory to detach from. Also, more
+ * recent VMX'en may detach from a queue pair in the
+ * quiesced state.
+ */
+
+ qp_acquire_queue_mutex(entry->produce_q);
+ headers_mapped = entry->produce_q->q_header ||
+ entry->consume_q->q_header;
+ if (QPBROKERSTATE_HAS_MEM(entry)) {
+ result =
+ qp_host_unmap_queues(INVALID_VMCI_GUEST_MEM_ID,
+ entry->produce_q,
+ entry->consume_q);
+ if (result < VMCI_SUCCESS)
+ pr_warn("Failed to unmap queue headers "
+ "for queue pair "
+ "(handle=0x%x:0x%x,result=%d).",
+ handle.context, handle.resource,
+ result);
+
+ if (entry->vmci_page_files)
+ qp_host_unregister_user_memory(entry->produce_q,
+ entry->
+ consume_q);
+ else
+ qp_host_unregister_user_memory(entry->produce_q,
+ entry->
+ consume_q);
+
+ }
+
+ if (!headers_mapped)
+ qp_reset_saved_headers(entry);
+
+ qp_release_queue_mutex(entry->produce_q);
+
+ if (!headers_mapped && entry->wakeup_cb)
+ entry->wakeup_cb(entry->client_data);
+
+ } else {
+ if (entry->wakeup_cb) {
+ entry->wakeup_cb = NULL;
+ entry->client_data = NULL;
+ }
+ }
+
+ if (entry->qp.ref_count == 0) {
+ qp_list_remove_entry(&qp_broker_list, &entry->qp);
+
+ if (is_local)
+ kfree(entry->local_mem);
+
+ qp_cleanup_queue_mutex(entry->produce_q, entry->consume_q);
+ qp_host_free_queue(entry->produce_q, entry->qp.produce_size);
+ qp_host_free_queue(entry->consume_q, entry->qp.consume_size);
+ /* Unlink from resource hash table and free callback */
+ vmci_resource_remove(&entry->resource);
+
+ kfree(entry);
+
+ vmci_ctx_qp_destroy(context, handle);
+ } else {
+ ASSERT(peer_id != VMCI_INVALID_ID);
+ qp_notify_peer(false, handle, context_id, peer_id);
+ if (context_id == VMCI_HOST_CONTEXT_ID &&
+ QPBROKERSTATE_HAS_MEM(entry)) {
+ entry->state = VMCIQPB_SHUTDOWN_MEM;
+ } else {
+ entry->state = VMCIQPB_SHUTDOWN_NO_MEM;
+ }
+
+ if (!is_local)
+ vmci_ctx_qp_destroy(context, handle);
+
+ }
+ result = VMCI_SUCCESS;
+ out:
+ mutex_unlock(&qp_broker_list.mutex);
+ return result;
+}
+
+/*
+ * Establishes the necessary mappings for a queue pair given a
+ * reference to the queue pair guest memory. This is usually
+ * called when a guest is unquiesced and the VMX is allowed to
+ * map guest memory once again.
+ */
+int vmci_qp_broker_map(struct vmci_handle handle,
+ struct vmci_ctx *context,
+ u64 guest_mem)
+{
+ struct qp_broker_entry *entry;
+ const u32 context_id = vmci_ctx_get_id(context);
+ bool is_local = false;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) || !context ||
+ context_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ mutex_lock(&qp_broker_list.mutex);
+
+ if (!vmci_ctx_qp_exists(context, handle)) {
+ pr_devel("Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).", context_id, handle.context,
+ handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = qp_broker_handle_to_entry(handle);
+ if (!entry) {
+ pr_devel("Context (ID=0x%x) reports being attached to "
+ "queue pair (handle=0x%x:0x%x) that isn't present "
+ "in broker.", context_id, handle.context,
+ handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ if (context_id != entry->create_id && context_id != entry->attach_id) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ goto out;
+ }
+
+ is_local = entry->qp.flags & VMCI_QPFLAG_LOCAL;
+ result = VMCI_SUCCESS;
+
+ if (context_id != VMCI_HOST_CONTEXT_ID) {
+ struct vmci_qp_page_store page_store;
+
+ ASSERT(entry->state == VMCIQPB_CREATED_NO_MEM ||
+ entry->state == VMCIQPB_SHUTDOWN_NO_MEM ||
+ entry->state == VMCIQPB_ATTACHED_NO_MEM);
+ ASSERT(!is_local);
+
+ page_store.pages = guest_mem;
+ page_store.len = QPE_NUM_PAGES(entry->qp);
+
+ qp_acquire_queue_mutex(entry->produce_q);
+ qp_reset_saved_headers(entry);
+ result =
+ qp_host_register_user_memory(&page_store,
+ entry->produce_q,
+ entry->consume_q);
+ qp_release_queue_mutex(entry->produce_q);
+ if (result == VMCI_SUCCESS) {
+ /* Move state from *_NO_MEM to *_MEM */
+
+ entry->state++;
+
+ ASSERT(entry->state == VMCIQPB_CREATED_MEM ||
+ entry->state == VMCIQPB_SHUTDOWN_MEM ||
+ entry->state == VMCIQPB_ATTACHED_MEM);
+
+ if (entry->wakeup_cb)
+ entry->wakeup_cb(entry->client_data);
+ }
+ }
+
+ out:
+ mutex_unlock(&qp_broker_list.mutex);
+ return result;
+}
+
+/*
+ * Saves a snapshot of the queue headers for the given QP broker
+ * entry. Should be used when guest memory is unmapped.
+ * Results:
+ * VMCI_SUCCESS on success, appropriate error code if guest memory
+ * can't be accessed..
+ */
+static int qp_save_headers(struct qp_broker_entry *entry)
+{
+ int result;
+
+ if (entry->produce_q->saved_header != NULL &&
+ entry->consume_q->saved_header != NULL) {
+ /*
+ * If the headers have already been saved, we don't need to do
+ * it again, and we don't want to map in the headers
+ * unnecessarily.
+ */
+
+ return VMCI_SUCCESS;
+ }
+
+ if (NULL == entry->produce_q->q_header ||
+ NULL == entry->consume_q->q_header) {
+ result = qp_host_map_queues(entry->produce_q, entry->consume_q);
+ if (result < VMCI_SUCCESS)
+ return result;
+ }
+
+ memcpy(&entry->saved_produce_q, entry->produce_q->q_header,
+ sizeof(entry->saved_produce_q));
+ entry->produce_q->saved_header = &entry->saved_produce_q;
+ memcpy(&entry->saved_consume_q, entry->consume_q->q_header,
+ sizeof(entry->saved_consume_q));
+ entry->consume_q->saved_header = &entry->saved_consume_q;
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Removes all references to the guest memory of a given queue pair, and
+ * will move the queue pair from state *_MEM to *_NO_MEM. It is usually
+ * called when a VM is being quiesced where access to guest memory should
+ * avoided.
+ */
+int vmci_qp_broker_unmap(struct vmci_handle handle,
+ struct vmci_ctx *context,
+ u32 gid)
+{
+ struct qp_broker_entry *entry;
+ const u32 context_id = vmci_ctx_get_id(context);
+ bool is_local = false;
+ int result;
+
+ if (VMCI_HANDLE_INVALID(handle) || !context ||
+ context_id == VMCI_INVALID_ID)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ mutex_lock(&qp_broker_list.mutex);
+
+ if (!vmci_ctx_qp_exists(context, handle)) {
+ pr_devel("Context (ID=0x%x) not attached to queue pair "
+ "(handle=0x%x:0x%x).", context_id,
+ handle.context, handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ entry = qp_broker_handle_to_entry(handle);
+ if (!entry) {
+ pr_devel("Context (ID=0x%x) reports being attached to "
+ "queue pair (handle=0x%x:0x%x) that isn't present "
+ "in broker.", context_id, handle.context,
+ handle.resource);
+ result = VMCI_ERROR_NOT_FOUND;
+ goto out;
+ }
+
+ if (context_id != entry->create_id && context_id != entry->attach_id) {
+ result = VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ goto out;
+ }
+
+ is_local = entry->qp.flags & VMCI_QPFLAG_LOCAL;
+
+ if (context_id != VMCI_HOST_CONTEXT_ID) {
+ ASSERT(entry->state != VMCIQPB_CREATED_NO_MEM &&
+ entry->state != VMCIQPB_SHUTDOWN_NO_MEM &&
+ entry->state != VMCIQPB_ATTACHED_NO_MEM);
+ ASSERT(!is_local);
+
+ qp_acquire_queue_mutex(entry->produce_q);
+ result = qp_save_headers(entry);
+ if (result < VMCI_SUCCESS)
+ pr_warn("Failed to save queue headers for "
+ "queue pair (handle=0x%x:0x%x,result=%d).",
+ handle.context, handle.resource, result);
+
+ qp_host_unmap_queues(gid, entry->produce_q, entry->consume_q);
+
+ /*
+ * On hosted, when we unmap queue pairs, the VMX will also
+ * unmap the guest memory, so we invalidate the previously
+ * registered memory. If the queue pair is mapped again at a
+ * later point in time, we will need to reregister the user
+ * memory with a possibly new user VA.
+ */
+ qp_host_unregister_user_memory(entry->produce_q,
+ entry->consume_q);
+
+ /*
+ * Move state from *_MEM to *_NO_MEM.
+ */
+ entry->state--;
+
+ qp_release_queue_mutex(entry->produce_q);
+ }
+
+ result = VMCI_SUCCESS;
+
+ out:
+ mutex_unlock(&qp_broker_list.mutex);
+ return result;
+}
+
+/*
+ * Destroys all guest queue pair endpoints. If active guest queue
+ * pairs still exist, hypercalls to attempt detach from these
+ * queue pairs will be made. Any failure to detach is silently
+ * ignored.
+ */
+void vmci_qp_guest_endpoints_exit(void)
+{
+ struct qp_entry *entry;
+ struct qp_guest_endpoint *ep;
+
+ mutex_lock(&qp_guest_endpoints.mutex);
+
+ while ((entry = qp_list_get_head(&qp_guest_endpoints))) {
+ ep = (struct qp_guest_endpoint *)entry;
+
+ /* Don't make a hypercall for local queue_pairs. */
+ if (!(entry->flags & VMCI_QPFLAG_LOCAL))
+ qp_detatch_hypercall(entry->handle);
+
+ /* We cannot fail the exit, so let's reset ref_count. */
+ entry->ref_count = 0;
+ qp_list_remove_entry(&qp_guest_endpoints, entry);
+
+ qp_guest_endpoint_destroy(ep);
+ }
+
+ mutex_unlock(&qp_guest_endpoints.mutex);
+}
+
+/*
+ * Helper routine that will lock the queue pair before subsequent
+ * operations.
+ * Note: Non-blocking on the host side is currently only implemented in ESX.
+ * Since non-blocking isn't yet implemented on the host personality we
+ * have no reason to acquire a spin lock. So to avoid the use of an
+ * unnecessary lock only acquire the mutex if we can block.
+ * Note: It is assumed that QPFLAG_PINNED implies QPFLAG_NONBLOCK. Therefore
+ * we can use the same locking function for access to both the queue
+ * and the queue headers as it is the same logic. Assert this behvior.
+ */
+static void qp_lock(const struct vmci_qp *qpair)
+{
+ ASSERT(!vmci_qp_pinned(qpair->flags) ||
+ (vmci_qp_pinned(qpair->flags) && !vmci_can_block(qpair->flags)));
+
+ if (vmci_can_block(qpair->flags))
+ qp_acquire_queue_mutex(qpair->produce_q);
+}
+
+/*
+ * Helper routine that unlocks the queue pair after calling
+ * qp_lock. Respects non-blocking and pinning flags.
+ */
+static void qp_unlock(const struct vmci_qp *qpair)
+{
+ if (vmci_can_block(qpair->flags))
+ qp_release_queue_mutex(qpair->produce_q);
+}
+
+/*
+ * The queue headers may not be mapped at all times. If a queue is
+ * currently not mapped, it will be attempted to do so.
+ */
+static int qp_map_queue_headers(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q,
+ bool can_block)
+{
+ int result;
+
+ if (NULL == produce_q->q_header || NULL == consume_q->q_header) {
+ if (can_block)
+ result = qp_host_map_queues(produce_q, consume_q);
+ else
+ result = VMCI_ERROR_QUEUEPAIR_NOT_READY;
+
+ if (result < VMCI_SUCCESS)
+ return (produce_q->saved_header &&
+ consume_q->saved_header) ?
+ VMCI_ERROR_QUEUEPAIR_NOT_READY :
+ VMCI_ERROR_QUEUEPAIR_NOTATTACHED;
+ }
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Helper routine that will retrieve the produce and consume
+ * headers of a given queue pair. If the guest memory of the
+ * queue pair is currently not available, the saved queue headers
+ * will be returned, if these are available.
+ */
+static int qp_get_queue_headers(const struct vmci_qp *qpair,
+ struct vmci_queue_header **produce_q_header,
+ struct vmci_queue_header **consume_q_header)
+{
+ int result;
+
+ result = qp_map_queue_headers(qpair->produce_q, qpair->consume_q,
+ vmci_can_block(qpair->flags));
+ if (result == VMCI_SUCCESS) {
+ *produce_q_header = qpair->produce_q->q_header;
+ *consume_q_header = qpair->consume_q->q_header;
+ } else if (qpair->produce_q->saved_header &&
+ qpair->consume_q->saved_header) {
+ ASSERT(!qpair->guest_endpoint);
+ *produce_q_header = qpair->produce_q->saved_header;
+ *consume_q_header = qpair->consume_q->saved_header;
+ result = VMCI_SUCCESS;
+ }
+
+ return result;
+}
+
+/*
+ * Callback from VMCI queue pair broker indicating that a queue
+ * pair that was previously not ready, now either is ready or
+ * gone forever.
+ */
+static int qp_wakeup_cb(void *client_data)
+{
+ struct vmci_qp *qpair = (struct vmci_qp *)client_data;
+
+ qp_lock(qpair);
+ while (qpair->blocked > 0) {
+ qpair->blocked--;
+ qpair->generation++;
+ wake_up(&qpair->event);
+ }
+ qp_unlock(qpair);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Makes the calling thread wait for the queue pair to become
+ * ready for host side access. Returns true when thread is
+ * woken up after queue pair state change, false otherwise.
+ */
+static bool qp_wait_for_ready_queue(struct vmci_qp *qpair)
+{
+ unsigned int generation;
+
+ if (unlikely(qpair->guest_endpoint))
+ ASSERT(false);
+
+ if (qpair->flags & VMCI_QPFLAG_NONBLOCK)
+ return false;
+
+ qpair->blocked++;
+ generation = qpair->generation;
+ qp_unlock(qpair);
+ wait_event(qpair->event, generation != qpair->generation);
+ qp_lock(qpair);
+
+ return true;
+}
+
+/*
+ * Enqueues a given buffer to the produce queue using the provided
+ * function. As many bytes as possible (space available in the queue)
+ * are enqueued. Assumes the queue->mutex has been acquired. Returns
+ * VMCI_ERROR_QUEUEPAIR_NOSPACE if no space was available to enqueue
+ * data, VMCI_ERROR_INVALID_SIZE, if any queue pointer is outside the
+ * queue (as defined by the queue size), VMCI_ERROR_INVALID_ARGS, if
+ * an error occured when accessing the buffer,
+ * VMCI_ERROR_QUEUEPAIR_NOTATTACHED, if the queue pair pages aren't
+ * available. Otherwise, the number of bytes written to the queue is
+ * returned. Updates the tail pointer of the produce queue.
+ */
+static ssize_t qp_enqueue_locked(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q,
+ const u64 produce_q_size,
+ const void *buf,
+ size_t buf_size,
+ vmci_memcpy_to_queue_func memcpy_to_queue,
+ bool can_block)
+{
+ s64 free_space;
+ u64 tail;
+ size_t written;
+ ssize_t result;
+
+ result = qp_map_queue_headers(produce_q, consume_q, can_block);
+ if (unlikely(result != VMCI_SUCCESS))
+ return result;
+
+ free_space = vmci_q_header_free_space(produce_q->q_header,
+ consume_q->q_header,
+ produce_q_size);
+ if (free_space == 0)
+ return VMCI_ERROR_QUEUEPAIR_NOSPACE;
+
+ if (free_space < VMCI_SUCCESS)
+ return (ssize_t) free_space;
+
+ written = (size_t) (free_space > buf_size ? buf_size : free_space);
+ tail = vmci_q_header_producer_tail(produce_q->q_header);
+ if (likely(tail + written < produce_q_size)) {
+ result = memcpy_to_queue(produce_q, tail, buf, 0, written);
+ } else {
+ /* Tail pointer wraps around. */
+
+ const size_t tmp = (size_t) (produce_q_size - tail);
+
+ result = memcpy_to_queue(produce_q, tail, buf, 0, tmp);
+ if (result >= VMCI_SUCCESS)
+ result = memcpy_to_queue(produce_q, 0, buf, tmp,
+ written - tmp);
+ }
+
+ if (result < VMCI_SUCCESS)
+ return result;
+
+ vmci_q_header_add_producer_tail(produce_q->q_header, written,
+ produce_q_size);
+ return written;
+}
+
+/*
+ * Dequeues data (if available) from the given consume queue. Writes data
+ * to the user provided buffer using the provided function.
+ * Assumes the queue->mutex has been acquired.
+ * Results:
+ * VMCI_ERROR_QUEUEPAIR_NODATA if no data was available to dequeue.
+ * VMCI_ERROR_INVALID_SIZE, if any queue pointer is outside the queue
+ * (as defined by the queue size).
+ * VMCI_ERROR_INVALID_ARGS, if an error occured when accessing the buffer.
+ * Otherwise the number of bytes dequeued is returned.
+ * Side effects:
+ * Updates the head pointer of the consume queue.
+ */
+static ssize_t qp_dequeue_locked(struct vmci_queue *produce_q,
+ struct vmci_queue *consume_q,
+ const u64 consume_q_size,
+ void *buf,
+ size_t buf_size,
+ vmci_memcpy_from_queue_func memcpy_from_queue,
+ bool update_consumer,
+ bool can_block)
+{
+ s64 buf_ready;
+ u64 head;
+ size_t read;
+ ssize_t result;
+
+ result = qp_map_queue_headers(produce_q, consume_q, can_block);
+ if (unlikely(result != VMCI_SUCCESS))
+ return result;
+
+ buf_ready = vmci_q_header_buf_ready(consume_q->q_header,
+ produce_q->q_header,
+ consume_q_size);
+ if (buf_ready == 0)
+ return VMCI_ERROR_QUEUEPAIR_NODATA;
+
+ if (buf_ready < VMCI_SUCCESS)
+ return (ssize_t) buf_ready;
+
+ read = (size_t) (buf_ready > buf_size ? buf_size : buf_ready);
+ head = vmci_q_header_consumer_head(produce_q->q_header);
+ if (likely(head + read < consume_q_size)) {
+ result = memcpy_from_queue(buf, 0, consume_q, head, read);
+ } else {
+ /* Head pointer wraps around. */
+
+ const size_t tmp = (size_t) (consume_q_size - head);
+
+ result = memcpy_from_queue(buf, 0, consume_q, head, tmp);
+ if (result >= VMCI_SUCCESS)
+ result = memcpy_from_queue(buf, tmp, consume_q, 0,
+ read - tmp);
+
+ }
+
+ if (result < VMCI_SUCCESS)
+ return result;
+
+ if (update_consumer)
+ vmci_q_header_add_consumer_head(produce_q->q_header,
+ read, consume_q_size);
+
+ return read;
+}
+
+/*
+ * vmci_qpair_alloc() - Allocates a queue pair.
+ * @qpair: Pointer for the new vmci_qp struct.
+ * @handle: Handle to track the resource.
+ * @produce_qsize: Desired size of the producer queue.
+ * @consume_qsize: Desired size of the consumer queue.
+ * @peer: ContextID of the peer.
+ * @flags: VMCI flags.
+ * @priv_flags: VMCI priviledge flags.
+ *
+ * This is the client interface for allocating the memory for a
+ * vmci_qp structure and then attaching to the underlying
+ * queue. If an error occurs allocating the memory for the
+ * vmci_qp structure no attempt is made to attach. If an
+ * error occurs attaching, then the structure is freed.
+ */
+int vmci_qpair_alloc(struct vmci_qp **qpair,
+ struct vmci_handle *handle,
+ u64 produce_qsize,
+ u64 consume_qsize,
+ u32 peer,
+ u32 flags,
+ u32 priv_flags)
+{
+ struct vmci_qp *my_qpair;
+ int retval;
+ struct vmci_handle src = VMCI_INVALID_HANDLE;
+ struct vmci_handle dst = vmci_make_handle(peer, VMCI_INVALID_ID);
+ enum vmci_route route;
+ vmci_event_release_cb wakeup_cb;
+ void *client_data;
+
+ /*
+ * Restrict the size of a queuepair. The device already
+ * enforces a limit on the total amount of memory that can be
+ * allocated to queuepairs for a guest. However, we try to
+ * allocate this memory before we make the queuepair
+ * allocation hypercall. On Linux, we allocate each page
+ * separately, which means rather than fail, the guest will
+ * thrash while it tries to allocate, and will become
+ * increasingly unresponsive to the point where it appears to
+ * be hung. So we place a limit on the size of an individual
+ * queuepair here, and leave the device to enforce the
+ * restriction on total queuepair memory. (Note that this
+ * doesn't prevent all cases; a user with only this much
+ * physical memory could still get into trouble.) The error
+ * used by the device is NO_RESOURCES, so use that here too.
+ */
+
+ if (produce_qsize + consume_qsize < max(produce_qsize, consume_qsize) ||
+ produce_qsize + consume_qsize > VMCI_MAX_GUEST_QP_MEMORY)
+ return VMCI_ERROR_NO_RESOURCES;
+
+ retval = vmci_route(&src, &dst, false, &route);
+ if (retval < VMCI_SUCCESS)
+ route = vmci_guest_code_active() ?
+ VMCI_ROUTE_AS_GUEST : VMCI_ROUTE_AS_HOST;
+
+ /* If NONBLOCK or PINNED is set, we better be the guest personality. */
+ if ((!vmci_can_block(flags) || vmci_qp_pinned(flags)) &&
+ VMCI_ROUTE_AS_GUEST != route) {
+ pr_devel("Not guest personality w/ NONBLOCK OR PINNED set");
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ /*
+ * Limit the size of pinned QPs and check sanity.
+ *
+ * Pinned pages implies non-blocking mode. Mutexes aren't acquired
+ * when the NONBLOCK flag is set in qpair code; and also should not be
+ * acquired when the PINNED flagged is set. Since pinning pages
+ * implies we want speed, it makes no sense not to have NONBLOCK
+ * set if PINNED is set. Hence enforce this implication.
+ */
+ if (vmci_qp_pinned(flags)) {
+ if (vmci_can_block(flags)) {
+ pr_err("Attempted to enable pinning w/o non-blocking");
+ return VMCI_ERROR_INVALID_ARGS;
+ }
+
+ if (produce_qsize + consume_qsize > VMCI_MAX_PINNED_QP_MEMORY)
+ return VMCI_ERROR_NO_RESOURCES;
+ }
+
+ my_qpair = kzalloc(sizeof(*my_qpair), GFP_KERNEL);
+ if (!my_qpair)
+ return VMCI_ERROR_NO_MEM;
+
+ my_qpair->produce_q_size = produce_qsize;
+ my_qpair->consume_q_size = consume_qsize;
+ my_qpair->peer = peer;
+ my_qpair->flags = flags;
+ my_qpair->priv_flags = priv_flags;
+
+ wakeup_cb = NULL;
+ client_data = NULL;
+
+ if (VMCI_ROUTE_AS_HOST == route) {
+ my_qpair->guest_endpoint = false;
+ if (!(flags & VMCI_QPFLAG_LOCAL)) {
+ my_qpair->blocked = 0;
+ my_qpair->generation = 0;
+ init_waitqueue_head(&my_qpair->event);
+ wakeup_cb = qp_wakeup_cb;
+ client_data = (void *)my_qpair;
+ }
+ } else {
+ my_qpair->guest_endpoint = true;
+ }
+
+ retval = vmci_qp_alloc(handle,
+ &my_qpair->produce_q,
+ my_qpair->produce_q_size,
+ &my_qpair->consume_q,
+ my_qpair->consume_q_size,
+ my_qpair->peer,
+ my_qpair->flags,
+ my_qpair->priv_flags,
+ my_qpair->guest_endpoint,
+ wakeup_cb, client_data);
+
+ if (retval < VMCI_SUCCESS) {
+ kfree(my_qpair);
+ return retval;
+ }
+
+ *qpair = my_qpair;
+ my_qpair->handle = *handle;
+
+ return retval;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_alloc);
+
+/*
+ * vmci_qpair_detach() - Detatches the client from a queue pair.
+ * @qpair: Reference of a pointer to the qpair struct.
+ *
+ * This is the client interface for detaching from a VMCIQPair.
+ * Note that this routine will free the memory allocated for the
+ * vmci_qp structure too.
+ */
+int vmci_qpair_detach(struct vmci_qp **qpair)
+{
+ int result;
+ struct vmci_qp *old_qpair;
+
+ if (!qpair || !(*qpair))
+ return VMCI_ERROR_INVALID_ARGS;
+
+ old_qpair = *qpair;
+ result = qp_detatch(old_qpair->handle, old_qpair->guest_endpoint);
+
+ /*
+ * The guest can fail to detach for a number of reasons, and
+ * if it does so, it will cleanup the entry (if there is one).
+ * The host can fail too, but it won't cleanup the entry
+ * immediately, it will do that later when the context is
+ * freed. Either way, we need to release the qpair struct
+ * here; there isn't much the caller can do, and we don't want
+ * to leak.
+ */
+
+ memset(old_qpair, 0, sizeof(*old_qpair));
+ old_qpair->handle = VMCI_INVALID_HANDLE;
+ old_qpair->peer = VMCI_INVALID_ID;
+ kfree(old_qpair);
+ *qpair = NULL;
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_detach);
+
+/*
+ * vmci_qpair_get_produce_indexes() - Retrieves the indexes of the producer.
+ * @qpair: Pointer to the queue pair struct.
+ * @producer_tail: Reference used for storing producer tail index.
+ * @consumer_head: Reference used for storing the consumer head index.
+ *
+ * This is the client interface for getting the current indexes of the
+ * QPair from the point of the view of the caller as the producer.
+ */
+int vmci_qpair_get_produce_indexes(const struct vmci_qp *qpair,
+ u64 *producer_tail,
+ u64 *consumer_head)
+{
+ struct vmci_queue_header *produce_q_header;
+ struct vmci_queue_header *consume_q_header;
+ int result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+ result =
+ qp_get_queue_headers(qpair, &produce_q_header, &consume_q_header);
+ if (result == VMCI_SUCCESS)
+ vmci_q_header_get_pointers(produce_q_header, consume_q_header,
+ producer_tail, consumer_head);
+ qp_unlock(qpair);
+
+ if (result == VMCI_SUCCESS &&
+ ((producer_tail && *producer_tail >= qpair->produce_q_size) ||
+ (consumer_head && *consumer_head >= qpair->produce_q_size)))
+ return VMCI_ERROR_INVALID_SIZE;
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_get_produce_indexes);
+
+/*
+ * vmci_qpair_get_consume_indexes() - Retrieves the indexes of the comsumer.
+ * @qpair: Pointer to the queue pair struct.
+ * @consumer_tail: Reference used for storing consumer tail index.
+ * @producer_head: Reference used for storing the producer head index.
+ *
+ * This is the client interface for getting the current indexes of the
+ * QPair from the point of the view of the caller as the consumer.
+ */
+int vmci_qpair_get_consume_indexes(const struct vmci_qp *qpair,
+ u64 *consumer_tail,
+ u64 *producer_head)
+{
+ struct vmci_queue_header *produce_q_header;
+ struct vmci_queue_header *consume_q_header;
+ int result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+ result =
+ qp_get_queue_headers(qpair, &produce_q_header, &consume_q_header);
+ if (result == VMCI_SUCCESS)
+ vmci_q_header_get_pointers(consume_q_header, produce_q_header,
+ consumer_tail, producer_head);
+ qp_unlock(qpair);
+
+ if (result == VMCI_SUCCESS &&
+ ((consumer_tail && *consumer_tail >= qpair->consume_q_size) ||
+ (producer_head && *producer_head >= qpair->consume_q_size)))
+ return VMCI_ERROR_INVALID_SIZE;
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_get_consume_indexes);
+
+/*
+ * vmci_qpair_produce_free_space() - Retrieves free space in producer queue.
+ * @qpair: Pointer to the queue pair struct.
+ *
+ * This is the client interface for getting the amount of free
+ * space in the QPair from the point of the view of the caller as
+ * the producer which is the common case. Returns < 0 if err, else
+ * available bytes into which data can be enqueued if > 0.
+ */
+s64 vmci_qpair_produce_free_space(const struct vmci_qp *qpair)
+{
+ struct vmci_queue_header *produce_q_header;
+ struct vmci_queue_header *consume_q_header;
+ s64 result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+ result =
+ qp_get_queue_headers(qpair, &produce_q_header, &consume_q_header);
+ if (result == VMCI_SUCCESS)
+ result = vmci_q_header_free_space(produce_q_header,
+ consume_q_header,
+ qpair->produce_q_size);
+ else
+ result = 0;
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_produce_free_space);
+
+/*
+ * vmci_qpair_consume_free_space() - Retrieves free space in consumer queue.
+ * @qpair: Pointer to the queue pair struct.
+ *
+ * This is the client interface for getting the amount of free
+ * space in the QPair from the point of the view of the caller as
+ * the consumer which is not the common case. Returns < 0 if err, else
+ * available bytes into which data can be enqueued if > 0.
+ */
+s64 vmci_qpair_consume_free_space(const struct vmci_qp *qpair)
+{
+ struct vmci_queue_header *produce_q_header;
+ struct vmci_queue_header *consume_q_header;
+ s64 result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+ result =
+ qp_get_queue_headers(qpair, &produce_q_header, &consume_q_header);
+ if (result == VMCI_SUCCESS)
+ result = vmci_q_header_free_space(consume_q_header,
+ produce_q_header,
+ qpair->consume_q_size);
+ else
+ result = 0;
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_consume_free_space);
+
+/*
+ * vmci_qpair_produce_buf_ready() - Gets bytes ready to read from
+ * producer queue.
+ * @qpair: Pointer to the queue pair struct.
+ *
+ * This is the client interface for getting the amount of
+ * enqueued data in the QPair from the point of the view of the
+ * caller as the producer which is not the common case. Returns < 0 if err,
+ * else available bytes that may be read.
+ */
+s64 vmci_qpair_produce_buf_ready(const struct vmci_qp *qpair)
+{
+ struct vmci_queue_header *produce_q_header;
+ struct vmci_queue_header *consume_q_header;
+ s64 result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+ result =
+ qp_get_queue_headers(qpair, &produce_q_header, &consume_q_header);
+ if (result == VMCI_SUCCESS)
+ result = vmci_q_header_buf_ready(produce_q_header,
+ consume_q_header,
+ qpair->produce_q_size);
+ else
+ result = 0;
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_produce_buf_ready);
+
+/*
+ * vmci_qpair_consume_buf_ready() - Gets bytes ready to read from
+ * consumer queue.
+ * @qpair: Pointer to the queue pair struct.
+ *
+ * This is the client interface for getting the amount of
+ * enqueued data in the QPair from the point of the view of the
+ * caller as the consumer which is the normal case. Returns < 0 if err,
+ * else available bytes that may be read.
+ */
+s64 vmci_qpair_consume_buf_ready(const struct vmci_qp *qpair)
+{
+ struct vmci_queue_header *produce_q_header;
+ struct vmci_queue_header *consume_q_header;
+ s64 result;
+
+ if (!qpair)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+ result =
+ qp_get_queue_headers(qpair, &produce_q_header, &consume_q_header);
+ if (result == VMCI_SUCCESS)
+ result = vmci_q_header_buf_ready(consume_q_header,
+ produce_q_header,
+ qpair->consume_q_size);
+ else
+ result = 0;
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_consume_buf_ready);
+
+/*
+ * vmci_qpair_enqueue() - Throw data on the queue.
+ * @qpair: Pointer to the queue pair struct.
+ * @buf: Pointer to buffer containing data
+ * @buf_size: Length of buffer.
+ * @buf_type: Buffer type (Unused).
+ *
+ * This is the client interface for enqueueing data into the queue.
+ * Returns number of bytes enqueued or < 0 on error.
+ */
+ssize_t vmci_qpair_enqueue(struct vmci_qp *qpair,
+ const void *buf,
+ size_t buf_size,
+ int buf_type)
+{
+ ssize_t result;
+
+ if (!qpair || !buf)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+
+ do {
+ result = qp_enqueue_locked(qpair->produce_q,
+ qpair->consume_q,
+ qpair->produce_q_size,
+ buf, buf_size,
+ qp_memcpy_to_queue,
+ vmci_can_block(qpair->flags));
+
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY &&
+ !qp_wait_for_ready_queue(qpair))
+ result = VMCI_ERROR_WOULD_BLOCK;
+
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_enqueue);
+
+/*
+ * vmci_qpair_dequeue() - Get data from the queue.
+ * @qpair: Pointer to the queue pair struct.
+ * @buf: Pointer to buffer for the data
+ * @buf_size: Length of buffer.
+ * @buf_type: Buffer type (Unused).
+ *
+ * This is the client interface for dequeueing data from the queue.
+ * Returns number of bytes dequeued or < 0 on error.
+ */
+ssize_t vmci_qpair_dequeue(struct vmci_qp *qpair,
+ void *buf,
+ size_t buf_size,
+ int buf_type)
+{
+ ssize_t result;
+
+ if (!qpair || !buf)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+
+ do {
+ result = qp_dequeue_locked(qpair->produce_q,
+ qpair->consume_q,
+ qpair->consume_q_size,
+ buf, buf_size,
+ qp_memcpy_from_queue, true,
+ vmci_can_block(qpair->flags));
+
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY &&
+ !qp_wait_for_ready_queue(qpair))
+ result = VMCI_ERROR_WOULD_BLOCK;
+
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_dequeue);
+
+/*
+ * vmci_qpair_peek() - Peek at the data in the queue.
+ * @qpair: Pointer to the queue pair struct.
+ * @buf: Pointer to buffer for the data
+ * @buf_size: Length of buffer.
+ * @buf_type: Buffer type (Unused on Linux).
+ *
+ * This is the client interface for peeking into a queue. (I.e.,
+ * copy data from the queue without updating the head pointer.)
+ * Returns number of bytes dequeued or < 0 on error.
+ */
+ssize_t vmci_qpair_peek(struct vmci_qp *qpair,
+ void *buf,
+ size_t buf_size,
+ int buf_type)
+{
+ ssize_t result;
+
+ if (!qpair || !buf)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+
+ do {
+ result = qp_dequeue_locked(qpair->produce_q,
+ qpair->consume_q,
+ qpair->consume_q_size,
+ buf, buf_size,
+ qp_memcpy_from_queue, false,
+ vmci_can_block(qpair->flags));
+
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY &&
+ !qp_wait_for_ready_queue(qpair))
+ result = VMCI_ERROR_WOULD_BLOCK;
+
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_peek);
+
+/*
+ * vmci_qpair_enquev() - Throw data on the queue using iov.
+ * @qpair: Pointer to the queue pair struct.
+ * @iov: Pointer to buffer containing data
+ * @iov_size: Length of buffer.
+ * @buf_type: Buffer type (Unused).
+ *
+ * This is the client interface for enqueueing data into the queue.
+ * This function uses IO vectors to handle the work. Returns number
+ * of bytes enqueued or < 0 on error.
+ */
+ssize_t vmci_qpair_enquev(struct vmci_qp *qpair,
+ void *iov,
+ size_t iov_size,
+ int buf_type)
+{
+ ssize_t result;
+
+ if (!qpair || !iov)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+
+ do {
+ result = qp_enqueue_locked(qpair->produce_q,
+ qpair->consume_q,
+ qpair->produce_q_size,
+ iov, iov_size,
+ qp_memcpy_to_queue_iov,
+ vmci_can_block(qpair->flags));
+
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY &&
+ !qp_wait_for_ready_queue(qpair))
+ result = VMCI_ERROR_WOULD_BLOCK;
+
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_enquev);
+
+/*
+ * vmci_qpair_dequev() - Get data from the queue using iov.
+ * @qpair: Pointer to the queue pair struct.
+ * @iov: Pointer to buffer for the data
+ * @iov_size: Length of buffer.
+ * @buf_type: Buffer type (Unused).
+ *
+ * This is the client interface for dequeueing data from the queue.
+ * This function uses IO vectors to handle the work. Returns number
+ * of bytes dequeued or < 0 on error.
+ */
+ssize_t vmci_qpair_dequev(struct vmci_qp *qpair,
+ void *iov,
+ size_t iov_size,
+ int buf_type)
+{
+ ssize_t result;
+
+ qp_lock(qpair);
+
+ if (!qpair || !iov)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ do {
+ result = qp_dequeue_locked(qpair->produce_q,
+ qpair->consume_q,
+ qpair->consume_q_size,
+ iov, iov_size,
+ qp_memcpy_from_queue_iov,
+ true, vmci_can_block(qpair->flags));
+
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY &&
+ !qp_wait_for_ready_queue(qpair))
+ result = VMCI_ERROR_WOULD_BLOCK;
+
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ qp_unlock(qpair);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_dequev);
+
+/*
+ * vmci_qpair_peekv() - Peek at the data in the queue using iov.
+ * @qpair: Pointer to the queue pair struct.
+ * @iov: Pointer to buffer for the data
+ * @iov_size: Length of buffer.
+ * @buf_type: Buffer type (Unused on Linux).
+ *
+ * This is the client interface for peeking into a queue. (I.e.,
+ * copy data from the queue without updating the head pointer.)
+ * This function uses IO vectors to handle the work. Returns number
+ * of bytes peeked or < 0 on error.
+ */
+ssize_t vmci_qpair_peekv(struct vmci_qp *qpair,
+ void *iov,
+ size_t iov_size,
+ int buf_type)
+{
+ ssize_t result;
+
+ if (!qpair || !iov)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ qp_lock(qpair);
+
+ do {
+ result = qp_dequeue_locked(qpair->produce_q,
+ qpair->consume_q,
+ qpair->consume_q_size,
+ iov, iov_size,
+ qp_memcpy_from_queue_iov,
+ false, vmci_can_block(qpair->flags));
+
+ if (result == VMCI_ERROR_QUEUEPAIR_NOT_READY &&
+ !qp_wait_for_ready_queue(qpair))
+ result = VMCI_ERROR_WOULD_BLOCK;
+
+ } while (result == VMCI_ERROR_QUEUEPAIR_NOT_READY);
+
+ qp_unlock(qpair);
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_qpair_peekv);
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.h b/drivers/misc/vmw_vmci/vmci_queue_pair.h
new file mode 100644
index 0000000..8d8d6a1
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_queue_pair.h
@@ -0,0 +1,191 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_QUEUE_PAIR_H_
+#define _VMCI_QUEUE_PAIR_H_
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/types.h>
+
+#include "vmci_context.h"
+
+/* Callback needed for correctly waiting on events. */
+typedef int (*vmci_event_release_cb) (void *client_data);
+
+/* Guest device port I/O. */
+struct PPNSet {
+ u64 num_produce_pages;
+ u64 num_consume_pages;
+ u32 *produce_ppns;
+ u32 *consume_ppns;
+ bool initialized;
+};
+
+/* VMCIqueue_pairAllocInfo */
+struct vmci_qp_alloc_info {
+ struct vmci_handle handle;
+ u32 peer;
+ u32 flags;
+ u64 produce_size;
+ u64 consume_size;
+ u64 ppn_va; /* Start VA of queue pair PPNs. */
+ u64 num_ppns;
+ s32 result;
+ u32 version;
+};
+
+/* VMCIqueue_pairSetVAInfo */
+struct vmci_qp_set_va_info {
+ struct vmci_handle handle;
+ u64 va; /* Start VA of queue pair PPNs. */
+ u64 num_ppns;
+ u32 version;
+ s32 result;
+};
+
+/*
+ * For backwards compatibility, here is a version of the
+ * VMCIqueue_pairPageFileInfo before host support end-points was added.
+ * Note that the current version of that structure requires VMX to
+ * pass down the VA of the mapped file. Before host support was added
+ * there was nothing of the sort. So, when the driver sees the ioctl
+ * with a parameter that is the sizeof
+ * VMCIqueue_pairPageFileInfo_NoHostQP then it can infer that the version
+ * of VMX running can't attach to host end points because it doesn't
+ * provide the VA of the mapped files.
+ *
+ * The Linux driver doesn't get an indication of the size of the
+ * structure passed down from user space. So, to fix a long standing
+ * but unfiled bug, the _pad field has been renamed to version.
+ * Existing versions of VMX always initialize the PageFileInfo
+ * structure so that _pad, er, version is set to 0.
+ *
+ * A version value of 1 indicates that the size of the structure has
+ * been increased to include two UVA's: produce_uva and consume_uva.
+ * These UVA's are of the mmap()'d queue contents backing files.
+ *
+ * In addition, if when VMX is sending down the
+ * VMCIqueue_pairPageFileInfo structure it gets an error then it will
+ * try again with the _NoHostQP version of the file to see if an older
+ * VMCI kernel module is running.
+ */
+
+/* VMCIqueue_pairPageFileInfo */
+struct vmci_qp_page_file_info {
+ struct vmci_handle handle;
+ u64 produce_page_file; /* User VA. */
+ u64 consume_page_file; /* User VA. */
+ u64 produce_page_file_size; /* Size of the file name array. */
+ u64 consume_page_file_size; /* Size of the file name array. */
+ s32 result;
+ u32 version; /* Was _pad. */
+ u64 produce_va; /* User VA of the mapped file. */
+ u64 consume_va; /* User VA of the mapped file. */
+};
+
+/* vmci queuepair detach info */
+struct vmci_qp_dtch_info {
+ struct vmci_handle handle;
+ s32 result;
+ u32 _pad;
+};
+
+/*
+ * struct vmci_qp_page_store describes how the memory of a given queue pair
+ * is backed. When the queue pair is between the host and a guest, the
+ * page store consists of references to the guest pages. On vmkernel,
+ * this is a list of PPNs, and on hosted, it is a user VA where the
+ * queue pair is mapped into the VMX address space.
+ */
+struct vmci_qp_page_store {
+ /* Reference to pages backing the queue pair. */
+ u64 pages;
+ /* Length of pageList/virtual addres range (in pages). */
+ u32 len;
+};
+
+/*
+ * This data type contains the information about a queue.
+ * There are two queues (hence, queue pairs) per transaction model between a
+ * pair of end points, A & B. One queue is used by end point A to transmit
+ * commands and responses to B. The other queue is used by B to transmit
+ * commands and responses.
+ *
+ * struct vmci_queue_kern_if is a per-OS defined Queue structure. It contains
+ * either a direct pointer to the linear address of the buffer contents or a
+ * pointer to structures which help the OS locate those data pages. See
+ * vmciKernelIf.c for each platform for its definition.
+ */
+struct vmci_queue {
+ struct vmci_queue_header *q_header;
+ struct vmci_queue_header *saved_header;
+ struct vmci_queue_kern_if *kernel_if;
+};
+
+/*
+ * Utility function that checks whether the fields of the page
+ * store contain valid values.
+ * Result:
+ * true if the page store is wellformed. false otherwise.
+ */
+static inline bool
+VMCI_QP_PAGESTORE_IS_WELLFORMED(struct vmci_qp_page_store *page_store)
+{
+ return page_store->len >= 2;
+}
+
+/*
+ * Helper function to check if the non-blocking flag
+ * is set for a given queue pair.
+ */
+static inline bool vmci_can_block(u32 flags)
+{
+ return !(flags & VMCI_QPFLAG_NONBLOCK);
+}
+
+/*
+ * Helper function to check if the queue pair is pinned
+ * into memory.
+ */
+static inline bool vmci_qp_pinned(u32 flags)
+{
+ return flags & VMCI_QPFLAG_PINNED;
+}
+
+void vmci_qp_broker_exit(void);
+int vmci_qp_broker_alloc(struct vmci_handle handle, u32 peer,
+ u32 flags, u32 priv_flags,
+ u64 produce_size, u64 consume_size,
+ struct vmci_qp_page_store *page_store,
+ struct vmci_ctx *context);
+int vmci_qp_broker_set_page_store(struct vmci_handle handle,
+ u64 produce_uva, u64 consume_uva,
+ struct vmci_ctx *context);
+int vmci_qp_broker_detach(struct vmci_handle handle, struct vmci_ctx *context);
+
+void vmci_qp_guest_endpoints_exit(void);
+
+int vmci_qp_alloc(struct vmci_handle *handle,
+ struct vmci_queue **produce_q, u64 produce_size,
+ struct vmci_queue **consume_q, u64 consume_size,
+ u32 peer, u32 flags, u32 priv_flags,
+ bool guest_endpoint, vmci_event_release_cb wakeup_cb,
+ void *client_data);
+int vmci_qp_broker_map(struct vmci_handle handle,
+ struct vmci_ctx *context, u64 guest_mem);
+int vmci_qp_broker_unmap(struct vmci_handle handle,
+ struct vmci_ctx *context, u32 gid);
+
+#endif /* _VMCI_QUEUE_PAIR_H_ */
VMCI resource tracks all used resources within the vmci code.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_resource.c | 232 +++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_resource.h | 59 ++++++++
2 files changed, 291 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_resource.c
create mode 100644 drivers/misc/vmw_vmci/vmci_resource.h
diff --git a/drivers/misc/vmw_vmci/vmci_resource.c b/drivers/misc/vmw_vmci/vmci_resource.c
new file mode 100644
index 0000000..b241dde
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_resource.c
@@ -0,0 +1,232 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/rculist.h>
+
+#include "vmci_common_int.h"
+#include "vmci_resource.h"
+#include "vmci_driver.h"
+
+
+#define VMCI_RESOURCE_HASH_BITS 7
+#define VMCI_RESOURCE_HASH_BUCKETS (1 << VMCI_RESOURCE_HASH_BITS)
+
+struct vmci_hash_table {
+ spinlock_t lock;
+ struct hlist_head entries[VMCI_RESOURCE_HASH_BUCKETS];
+};
+
+static struct vmci_hash_table vmci_resource_table = {
+ .lock = __SPIN_LOCK_UNLOCKED(vmci_resource_table.lock),
+};
+
+static unsigned int vmci_resource_hash(struct vmci_handle handle)
+{
+ return hash_32(VMCI_HANDLE_TO_RESOURCE_ID(handle),
+ VMCI_RESOURCE_HASH_BITS);
+}
+
+/*
+ * Gets a resource (if one exists) matching given handle from the hash table.
+ */
+static struct vmci_resource *vmci_resource_lookup(struct vmci_handle handle)
+{
+ struct vmci_resource *r, *resource = NULL;
+ struct hlist_node *node;
+ unsigned int idx = vmci_resource_hash(handle);
+
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(r, node,
+ &vmci_resource_table.entries[idx], node) {
+ u32 rid = VMCI_HANDLE_TO_RESOURCE_ID(r->handle);
+ u32 cid = VMCI_HANDLE_TO_CONTEXT_ID(r->handle);
+
+ if (rid == VMCI_HANDLE_TO_RESOURCE_ID(handle) &&
+ (cid == VMCI_HANDLE_TO_CONTEXT_ID(handle) ||
+ cid == VMCI_INVALID_ID)) {
+ resource = r;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return resource;
+}
+
+/*
+ * Find an unused resource ID and return it. The first
+ * VMCI_RESERVED_RESOURCE_ID_MAX are reserved so we start from
+ * its value + 1.
+ * Returns VMCI resource id on success, VMCI_INVALID_ID on failure.
+ */
+static u32 vmci_resource_find_id(u32 context_id)
+{
+ static u32 resource_id = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+ u32 old_rid = resource_id;
+ u32 current_rid;
+
+ /*
+ * Generate a unique resource ID. Keep on trying until we wrap around
+ * in the RID space.
+ */
+ BUG_ON(old_rid <= VMCI_RESERVED_RESOURCE_ID_MAX);
+
+ do {
+ struct vmci_handle handle;
+
+ current_rid = resource_id;
+ resource_id++;
+ if (unlikely(resource_id == VMCI_INVALID_ID)) {
+ /* Skip the reserved rids. */
+ resource_id = VMCI_RESERVED_RESOURCE_ID_MAX + 1;
+ }
+
+ handle = vmci_make_handle(context_id, current_rid);
+ if (!vmci_resource_lookup(handle))
+ return current_rid;
+ } while (resource_id != old_rid);
+
+ return VMCI_INVALID_ID;
+}
+
+
+int vmci_resource_add(struct vmci_resource *resource,
+ enum vmci_resource_type resource_type,
+ struct vmci_handle handle)
+
+{
+ unsigned int idx;
+ int result;
+
+ spin_lock(&vmci_resource_table.lock);
+
+ if (handle.resource == VMCI_INVALID_ID) {
+ handle.resource = vmci_resource_find_id(handle.context);
+ if (handle.resource == VMCI_INVALID_ID) {
+ result = VMCI_ERROR_NO_HANDLE;
+ goto out;
+ }
+ } else if (vmci_resource_lookup(handle)) {
+ result = VMCI_ERROR_ALREADY_EXISTS;
+ goto out;
+ }
+
+ resource->handle = handle;
+ resource->type = resource_type;
+ INIT_HLIST_NODE(&resource->node);
+ kref_init(&resource->kref);
+ init_completion(&resource->done);
+
+ idx = vmci_resource_hash(resource->handle);
+ BUG_ON(idx >= VMCI_RESOURCE_HASH_BUCKETS);
+ hlist_add_head_rcu(&resource->node, &vmci_resource_table.entries[idx]);
+
+ result = VMCI_SUCCESS;
+
+out:
+ spin_unlock(&vmci_resource_table.lock);
+ return result;
+}
+
+void vmci_resource_remove(struct vmci_resource *resource)
+{
+ struct vmci_handle handle = resource->handle;
+ unsigned int idx = vmci_resource_hash(handle);
+ struct vmci_resource *r;
+ struct hlist_node *node;
+
+ /* Remove resource from hash table. */
+ spin_lock(&vmci_resource_table.lock);
+
+ hlist_for_each_entry(r, node, &vmci_resource_table.entries[idx], node) {
+ if (VMCI_HANDLE_EQUAL(r->handle, resource->handle)) {
+ BUG_ON(r != resource);
+ hlist_del_init_rcu(&r->node);
+ break;
+ }
+ }
+
+ spin_unlock(&vmci_resource_table.lock);
+ synchronize_rcu();
+
+ vmci_resource_put(resource);
+ wait_for_completion(&resource->done);
+}
+
+struct vmci_resource *
+vmci_resource_by_handle(struct vmci_handle resource_handle,
+ enum vmci_resource_type resource_type)
+{
+ struct vmci_resource *r, *resource = NULL;
+
+ rcu_read_lock();
+
+ r = vmci_resource_lookup(resource_handle);
+ if (r &&
+ (resource_type == r->type ||
+ resource_type == VMCI_RESOURCE_TYPE_ANY)) {
+ resource = vmci_resource_get(r);
+ }
+
+ rcu_read_unlock();
+
+ return resource;
+}
+
+/*
+ * Get a reference to given resource.
+ */
+struct vmci_resource *vmci_resource_get(struct vmci_resource *resource)
+{
+ kref_get(&resource->kref);
+
+ return resource;
+}
+
+static void vmci_release_resource(struct kref *kref)
+{
+ struct vmci_resource *resource =
+ container_of(kref, struct vmci_resource, kref);
+
+ /* Verify the resource has been unlinked from hash table */
+ WARN_ON(!hlist_unhashed(&resource->node));
+
+ /* Signal that container of this resource can now be destroyed */
+ complete(&resource->done);
+}
+
+/*
+ * Resource's release function will get called if last reference.
+ * If it is the last reference, then we are sure that nobody else
+ * can increment the count again (it's gone from the resource hash
+ * table), so there's no need for locking here.
+ */
+int vmci_resource_put(struct vmci_resource *resource)
+{
+ /*
+ * We propagate the information back to caller in case it wants to know
+ * whether entry was freed.
+ */
+ return kref_put(&resource->kref, vmci_release_resource) ?
+ VMCI_SUCCESS_ENTRY_DEAD : VMCI_SUCCESS;
+}
+
+struct vmci_handle vmci_resource_handle(struct vmci_resource *resource)
+{
+ return resource->handle;
+}
diff --git a/drivers/misc/vmw_vmci/vmci_resource.h b/drivers/misc/vmw_vmci/vmci_resource.h
new file mode 100644
index 0000000..9190cd2
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_resource.h
@@ -0,0 +1,59 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_RESOURCE_H_
+#define _VMCI_RESOURCE_H_
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/types.h>
+
+#include "vmci_context.h"
+
+
+enum vmci_resource_type {
+ VMCI_RESOURCE_TYPE_ANY,
+ VMCI_RESOURCE_TYPE_API,
+ VMCI_RESOURCE_TYPE_GROUP,
+ VMCI_RESOURCE_TYPE_DATAGRAM,
+ VMCI_RESOURCE_TYPE_DOORBELL,
+ VMCI_RESOURCE_TYPE_QPAIR_GUEST,
+ VMCI_RESOURCE_TYPE_QPAIR_HOST
+};
+
+struct vmci_resource {
+ struct vmci_handle handle;
+ enum vmci_resource_type type;
+ struct hlist_node node;
+ struct kref kref;
+ struct completion done;
+};
+
+
+int vmci_resource_add(struct vmci_resource *resource,
+ enum vmci_resource_type resource_type,
+ struct vmci_handle handle);
+
+void vmci_resource_remove(struct vmci_resource *resource);
+
+struct vmci_resource *
+vmci_resource_by_handle(struct vmci_handle resource_handle,
+ enum vmci_resource_type resource_type);
+
+struct vmci_resource *vmci_resource_get(struct vmci_resource *resource);
+int vmci_resource_put(struct vmci_resource *resource);
+
+struct vmci_handle vmci_resource_handle(struct vmci_resource *resource);
+
+#endif /* _VMCI_RESOURCE_H_ */
VMCI routing code is responsible for routing between various hosts/guests as well as
routing in nested scenarios.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_route.c | 229 ++++++++++++++++++++++++++++++++++++
drivers/misc/vmw_vmci/vmci_route.h | 30 +++++
2 files changed, 259 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_route.c
create mode 100644 drivers/misc/vmw_vmci/vmci_route.h
diff --git a/drivers/misc/vmw_vmci/vmci_route.c b/drivers/misc/vmw_vmci/vmci_route.c
new file mode 100644
index 0000000..adf2f9f
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_route.c
@@ -0,0 +1,229 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+
+#include "vmci_common_int.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_route.h"
+
+/*
+ * Make a routing decision for the given source and destination handles.
+ * This will try to determine the route using the handles and the available
+ * devices. Will set the source context if it is invalid.
+ */
+int vmci_route(struct vmci_handle *src,
+ const struct vmci_handle *dst,
+ bool from_guest,
+ enum vmci_route *route)
+{
+ bool has_host_device = vmci_host_code_active();
+ bool has_guest_device = vmci_guest_code_active();
+
+ *route = VMCI_ROUTE_NONE;
+
+ /*
+ * "from_guest" is only ever set to true by
+ * IOCTL_VMCI_DATAGRAM_SEND (or by the vmkernel equivalent),
+ * which comes from the VMX, so we know it is coming from a
+ * guest.
+ *
+ * To avoid inconsistencies, test these once. We will test
+ * them again when we do the actual send to ensure that we do
+ * not touch a non-existent device.
+ */
+
+ /* Must have a valid destination context. */
+ if (VMCI_INVALID_ID == dst->context)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /* Anywhere to hypervisor. */
+ if (VMCI_HYPERVISOR_CONTEXT_ID == dst->context) {
+
+ /*
+ * If this message already came from a guest then we
+ * cannot send it to the hypervisor. It must come
+ * from a local client.
+ */
+ if (from_guest)
+ return VMCI_ERROR_DST_UNREACHABLE;
+
+ /*
+ * We must be acting as a guest in order to send to
+ * the hypervisor.
+ */
+ if (!has_guest_device)
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+ /* And we cannot send if the source is the host context. */
+ if (VMCI_HOST_CONTEXT_ID == src->context)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /*
+ * If the client passed the ANON source handle then
+ * respect it (both context and resource are invalid).
+ * However, if they passed only an invalid context,
+ * then they probably mean ANY, in which case we
+ * should set the real context here before passing it
+ * down.
+ */
+ if (VMCI_INVALID_ID == src->context &&
+ VMCI_INVALID_ID != src->resource)
+ src->context = vmci_get_context_id();
+
+ /* Send from local client down to the hypervisor. */
+ *route = VMCI_ROUTE_AS_GUEST;
+ return VMCI_SUCCESS;
+ }
+
+ /* Anywhere to local client on host. */
+ if (VMCI_HOST_CONTEXT_ID == dst->context) {
+ /*
+ * If it is not from a guest but we are acting as a
+ * guest, then we need to send it down to the host.
+ * Note that if we are also acting as a host then this
+ * will prevent us from sending from local client to
+ * local client, but we accept that restriction as a
+ * way to remove any ambiguity from the host context.
+ */
+ if (src->context == VMCI_HYPERVISOR_CONTEXT_ID) {
+ /*
+ * If the hypervisor is the source, this is
+ * host local communication. The hypervisor
+ * may send vmci event datagrams to the host
+ * itself, but it will never send datagrams to
+ * an "outer host" through the guest device.
+ */
+
+ if (has_host_device) {
+ *route = VMCI_ROUTE_AS_HOST;
+ return VMCI_SUCCESS;
+ } else {
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+ }
+ }
+
+ if (!from_guest && has_guest_device) {
+ /* If no source context then use the current. */
+ if (VMCI_INVALID_ID == src->context)
+ src->context = vmci_get_context_id();
+
+ /* Send it from local client down to the host. */
+ *route = VMCI_ROUTE_AS_GUEST;
+ return VMCI_SUCCESS;
+ }
+
+ /*
+ * Otherwise we already received it from a guest and
+ * it is destined for a local client on this host, or
+ * it is from another local client on this host. We
+ * must be acting as a host to service it.
+ */
+ if (!has_host_device)
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+
+ if (VMCI_INVALID_ID == src->context) {
+ /*
+ * If it came from a guest then it must have a
+ * valid context. Otherwise we can use the
+ * host context.
+ */
+ if (from_guest)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ src->context = VMCI_HOST_CONTEXT_ID;
+ }
+
+ /* Route to local client. */
+ *route = VMCI_ROUTE_AS_HOST;
+ return VMCI_SUCCESS;
+ }
+
+ /*
+ * If we are acting as a host then this might be destined for
+ * a guest.
+ */
+ if (has_host_device) {
+ /* It will have a context if it is meant for a guest. */
+ if (vmci_ctx_exists(dst->context)) {
+ if (VMCI_INVALID_ID == src->context) {
+ /*
+ * If it came from a guest then it
+ * must have a valid context.
+ * Otherwise we can use the host
+ * context.
+ */
+
+ if (from_guest)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ src->context = VMCI_HOST_CONTEXT_ID;
+ } else if (VMCI_CONTEXT_IS_VM(src->context) &&
+ src->context != dst->context) {
+ /*
+ * VM to VM communication is not
+ * allowed. Since we catch all
+ * communication destined for the host
+ * above, this must be destined for a
+ * VM since there is a valid context.
+ */
+
+ ASSERT(VMCI_CONTEXT_IS_VM(dst->context));
+
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+
+ /* Pass it up to the guest. */
+ *route = VMCI_ROUTE_AS_HOST;
+ return VMCI_SUCCESS;
+ } else if (!has_guest_device) {
+ /*
+ * The host is attempting to reach a CID
+ * without an active context, and we can't
+ * send it down, since we have no guest
+ * device.
+ */
+
+ return VMCI_ERROR_DST_UNREACHABLE;
+ }
+ }
+
+ /*
+ * We must be a guest trying to send to another guest, which means
+ * we need to send it down to the host. We do not filter out VM to
+ * VM communication here, since we want to be able to use the guest
+ * driver on older versions that do support VM to VM communication.
+ */
+ if (!has_guest_device) {
+ /*
+ * Ending up here means we have neither guest nor host
+ * device.
+ */
+ return VMCI_ERROR_DEVICE_NOT_FOUND;
+ }
+
+ /* If no source context then use the current context. */
+ if (VMCI_INVALID_ID == src->context)
+ src->context = vmci_get_context_id();
+
+ /*
+ * Send it from local client down to the host, which will
+ * route it to the other guest for us.
+ */
+ *route = VMCI_ROUTE_AS_GUEST;
+ return VMCI_SUCCESS;
+}
diff --git a/drivers/misc/vmw_vmci/vmci_route.h b/drivers/misc/vmw_vmci/vmci_route.h
new file mode 100644
index 0000000..3b30e82
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_route.h
@@ -0,0 +1,30 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_ROUTE_H_
+#define _VMCI_ROUTE_H_
+
+#include <linux/vmw_vmci_defs.h>
+
+enum vmci_route {
+ VMCI_ROUTE_NONE,
+ VMCI_ROUTE_AS_HOST,
+ VMCI_ROUTE_AS_GUEST,
+};
+
+int vmci_route(struct vmci_handle *src, const struct vmci_handle *dst,
+ bool from_guest, enum vmci_route *route);
+
+#endif /* _VMCI_ROUTE_H_ */
VMCI guest side driver code implementation.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_guest.c | 762 ++++++++++++++++++++++++++++++++++++
1 files changed, 762 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_guest.c
diff --git a/drivers/misc/vmw_vmci/vmci_guest.c b/drivers/misc/vmw_vmci/vmci_guest.c
new file mode 100644
index 0000000..eedbe4d
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -0,0 +1,762 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/moduleparam.h>
+#include <linux/interrupt.h>
+#include <linux/highmem.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/init.h>
+#include <linux/pci.h>
+#include <linux/smp.h>
+#include <linux/io.h>
+
+#include "vmci_common_int.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+static bool vmci_disable_msi;
+module_param_named(disable_msi, vmci_disable_msi, bool, 0);
+MODULE_PARM_DESC(disable_msi, "Disable MSI use in driver - (default=0)");
+
+static bool vmci_disable_msix;
+module_param_named(disable_msix, vmci_disable_msix, bool, 0);
+MODULE_PARM_DESC(disable_msix, "Disable MSI-X use in driver - (default=0)");
+
+static u32 ctx_update_sub_id = VMCI_INVALID_ID;
+static u32 vm_context_id = VMCI_INVALID_ID;
+
+struct vmci_guest_device {
+ struct device *dev; /* PCI device we are attached to */
+ void __iomem *iobase;
+
+ unsigned int irq;
+ unsigned int intr_type;
+ bool exclusive_vectors;
+ struct msix_entry msix_entries[VMCI_MAX_INTRS];
+
+ struct tasklet_struct datagram_tasklet;
+ struct tasklet_struct bm_tasklet;
+
+ void *data_buffer;
+ void *notification_bitmap;
+};
+
+/* vmci_dev singleton device and supporting data*/
+static struct vmci_guest_device *vmci_dev_g;
+static DEFINE_SPINLOCK(vmci_dev_spinlock);
+
+static atomic_t vmci_num_guest_devices = ATOMIC_INIT(0);
+
+bool vmci_guest_code_active(void)
+{
+ return atomic_read(&vmci_num_guest_devices) != 0;
+}
+
+u32 vmci_get_vm_context_id(void)
+{
+ if (vm_context_id == VMCI_INVALID_ID) {
+ u32 result;
+ struct vmci_datagram get_cid_msg;
+ get_cid_msg.dst =
+ vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_GET_CONTEXT_ID);
+ get_cid_msg.src = VMCI_ANON_SRC_HANDLE;
+ get_cid_msg.payload_size = 0;
+ result = vmci_send_datagram(&get_cid_msg);
+ if (result >= 0)
+ vm_context_id = result;
+ }
+ return vm_context_id;
+}
+
+/*
+ * VM to hypervisor call mechanism. We use the standard VMware naming
+ * convention since shared code is calling this function as well.
+ */
+int vmci_send_datagram(struct vmci_datagram *dg)
+{
+ unsigned long flags;
+ int result;
+
+ /* Check args. */
+ if (dg == NULL)
+ return VMCI_ERROR_INVALID_ARGS;
+
+ /*
+ * Need to acquire spinlock on the device because the datagram
+ * data may be spread over multiple pages and the monitor may
+ * interleave device user rpc calls from multiple
+ * VCPUs. Acquiring the spinlock precludes that
+ * possibility. Disabling interrupts to avoid incoming
+ * datagrams during a "rep out" and possibly landing up in
+ * this function.
+ */
+ spin_lock_irqsave(&vmci_dev_spinlock, flags);
+
+ if (vmci_dev_g) {
+ iowrite8_rep(vmci_dev_g->iobase + VMCI_DATA_OUT_ADDR,
+ dg, VMCI_DG_SIZE(dg));
+ result = ioread32(vmci_dev_g->iobase + VMCI_RESULT_LOW_ADDR);
+ } else {
+ result = VMCI_ERROR_UNAVAILABLE;
+ }
+
+ spin_unlock_irqrestore(&vmci_dev_spinlock, flags);
+
+ return result;
+}
+EXPORT_SYMBOL_GPL(vmci_send_datagram);
+
+/*
+ * Gets called with the new context id if updated or resumed.
+ * Context id.
+ */
+static void vmci_guest_cid_update(u32 sub_id,
+ const struct vmci_event_data *event_data,
+ void *client_data)
+{
+ const struct vmci_event_payld_ctx *ev_payload =
+ vmci_event_data_const_payload(event_data);
+
+ if (sub_id != ctx_update_sub_id) {
+ pr_devel("Invalid subscriber (ID=0x%x).\n", sub_id);
+ return;
+ }
+
+ if (!event_data || ev_payload->context_id == VMCI_INVALID_ID) {
+ pr_devel("Invalid event data.\n");
+ return;
+ }
+
+ pr_devel("Updating context from (ID=0x%x) to (ID=0x%x) on event (type=%d).\n",
+ vm_context_id, ev_payload->context_id, event_data->event);
+
+ vm_context_id = ev_payload->context_id;
+}
+
+/*
+ * Verify that the host supports the hypercalls we need. If it does not,
+ * try to find fallback hypercalls and use those instead. Returns
+ * true if required hypercalls (or fallback hypercalls) are
+ * supported by the host, false otherwise.
+ */
+static bool vmci_check_host_caps(struct pci_dev *pdev)
+{
+ bool result;
+ struct vmci_resource_query_msg *msg;
+ u32 msg_size = sizeof(struct vmci_resource_query_hdr) +
+ VMCI_UTIL_NUM_RESOURCES * sizeof(u32);
+ struct vmci_datagram *check_msg;
+
+ check_msg = kmalloc(msg_size, GFP_KERNEL);
+ if (!check_msg) {
+ dev_err(&pdev->dev, "%s: Insufficient memory.\n", __func__);
+ return false;
+ }
+
+ check_msg->dst = vmci_make_handle(VMCI_HYPERVISOR_CONTEXT_ID,
+ VMCI_RESOURCES_QUERY);
+ check_msg->src = VMCI_ANON_SRC_HANDLE;
+ check_msg->payload_size = msg_size - VMCI_DG_HEADERSIZE;
+ msg = (struct vmci_resource_query_msg *)VMCI_DG_PAYLOAD(check_msg);
+
+ msg->num_resources = VMCI_UTIL_NUM_RESOURCES;
+ msg->resources[0] = VMCI_GET_CONTEXT_ID;
+
+ /* Checks that hyper calls are supported */
+ result = vmci_send_datagram(check_msg) == 0x01;
+ kfree(check_msg);
+
+ dev_dbg(&pdev->dev, "%s: Host capability check: %s.\n",
+ __func__, result ? "PASSED" : "FAILED");
+
+ /* We need the vector. There are no fallbacks. */
+ return result;
+}
+
+/*
+ * Reads datagrams from the data in port and dispatches them. We
+ * always start reading datagrams into only the first page of the
+ * datagram buffer. If the datagrams don't fit into one page, we
+ * use the maximum datagram buffer size for the remainder of the
+ * invocation. This is a simple heuristic for not penalizing
+ * small datagrams.
+ *
+ * This function assumes that it has exclusive access to the data
+ * in port for the duration of the call.
+ */
+static void vmci_dispatch_dgs(unsigned long data)
+{
+ struct vmci_guest_device *vmci_dev = (struct vmci_guest_device *)data;
+ u8 *dg_in_buffer = vmci_dev->data_buffer;
+ struct vmci_datagram *dg;
+ size_t dg_in_buffer_size = VMCI_MAX_DG_SIZE;
+ size_t current_dg_in_buffer_size = PAGE_SIZE;
+ size_t remaining_bytes;
+
+ BUILD_BUG_ON(VMCI_MAX_DG_SIZE < PAGE_SIZE);
+
+ ioread8_rep(vmci_dev->iobase + VMCI_DATA_IN_ADDR,
+ vmci_dev->data_buffer, current_dg_in_buffer_size);
+ dg = (struct vmci_datagram *)dg_in_buffer;
+ remaining_bytes = current_dg_in_buffer_size;
+
+ while (dg->dst.resource != VMCI_INVALID_ID ||
+ remaining_bytes > PAGE_SIZE) {
+ unsigned dg_in_size;
+
+ /*
+ * When the input buffer spans multiple pages, a datagram can
+ * start on any page boundary in the buffer.
+ */
+ if (dg->dst.resource == VMCI_INVALID_ID) {
+ ASSERT(remaining_bytes > PAGE_SIZE);
+ dg = (struct vmci_datagram *)roundup((uintptr_t)
+ dg + 1, PAGE_SIZE);
+ ASSERT((u8 *) dg <
+ dg_in_buffer + current_dg_in_buffer_size);
+ remaining_bytes =
+ (size_t) (dg_in_buffer + current_dg_in_buffer_size -
+ (u8 *) dg);
+ continue;
+ }
+
+ dg_in_size = VMCI_DG_SIZE_ALIGNED(dg);
+
+ if (dg_in_size <= dg_in_buffer_size) {
+ int result;
+
+ /*
+ * If the remaining bytes in the datagram
+ * buffer doesn't contain the complete
+ * datagram, we first make sure we have enough
+ * room for it and then we read the reminder
+ * of the datagram and possibly any following
+ * datagrams.
+ */
+ if (dg_in_size > remaining_bytes) {
+ if (remaining_bytes !=
+ current_dg_in_buffer_size) {
+
+ /*
+ * We move the partial
+ * datagram to the front and
+ * read the reminder of the
+ * datagram and possibly
+ * following calls into the
+ * following bytes.
+ */
+ memmove(dg_in_buffer, dg_in_buffer +
+ current_dg_in_buffer_size -
+ remaining_bytes,
+ remaining_bytes);
+ dg = (struct vmci_datagram *)
+ dg_in_buffer;
+ }
+
+ if (current_dg_in_buffer_size !=
+ dg_in_buffer_size)
+ current_dg_in_buffer_size =
+ dg_in_buffer_size;
+
+ ioread8_rep(vmci_dev->iobase +
+ VMCI_DATA_IN_ADDR,
+ vmci_dev->data_buffer +
+ remaining_bytes,
+ current_dg_in_buffer_size -
+ remaining_bytes);
+ }
+
+ /*
+ * We special case event datagrams from the
+ * hypervisor.
+ */
+ if (dg->src.context == VMCI_HYPERVISOR_CONTEXT_ID &&
+ dg->dst.resource == VMCI_EVENT_HANDLER) {
+ result = vmci_event_dispatch(dg);
+ } else {
+ result = vmci_datagram_invoke_guest_handler(dg);
+ }
+ if (result < VMCI_SUCCESS)
+ dev_dbg(vmci_dev->dev,
+ "Datagram with resource (ID=0x%x) failed (err=%d).\n",
+ dg->dst.resource, result);
+
+ /* On to the next datagram. */
+ dg = (struct vmci_datagram *)((u8 *) dg +
+ dg_in_size);
+ } else {
+ size_t bytes_to_skip;
+
+ /*
+ * Datagram doesn't fit in datagram buffer of maximal
+ * size. We drop it.
+ */
+ dev_dbg(vmci_dev->dev,
+ "Failed to receive datagram (size=%u bytes).\n",
+ dg_in_size);
+
+ bytes_to_skip = dg_in_size - remaining_bytes;
+ if (current_dg_in_buffer_size != dg_in_buffer_size)
+ current_dg_in_buffer_size = dg_in_buffer_size;
+
+ for (;;) {
+ ioread8_rep(vmci_dev->iobase +
+ VMCI_DATA_IN_ADDR,
+ vmci_dev->data_buffer,
+ current_dg_in_buffer_size);
+ if (bytes_to_skip <= current_dg_in_buffer_size)
+ break;
+
+ bytes_to_skip -= current_dg_in_buffer_size;
+ }
+ dg = (struct vmci_datagram *)(dg_in_buffer +
+ bytes_to_skip);
+ }
+
+ remaining_bytes =
+ (size_t) (dg_in_buffer + current_dg_in_buffer_size -
+ (u8 *) dg);
+
+ if (remaining_bytes < VMCI_DG_HEADERSIZE) {
+ /* Get the next batch of datagrams. */
+
+ ioread8_rep(vmci_dev->iobase + VMCI_DATA_IN_ADDR,
+ vmci_dev->data_buffer,
+ current_dg_in_buffer_size);
+ dg = (struct vmci_datagram *)dg_in_buffer;
+ remaining_bytes = current_dg_in_buffer_size;
+ }
+ }
+}
+
+/*
+ * Scans the notification bitmap for raised flags, clears them
+ * and handles the notifications.
+ */
+static void vmci_process_bitmap(unsigned long data)
+{
+ struct vmci_guest_device *dev = (struct vmci_guest_device *)data;
+
+ if (!dev->notification_bitmap) {
+ dev_dbg(dev->dev, "No bitmap present in %s.\n", __func__);
+ return;
+ }
+
+ vmci_dbell_scan_notification_entries(dev->notification_bitmap);
+}
+
+/*
+ * Enable MSI-X. Try exclusive vectors first, then shared vectors.
+ */
+static int vmci_enable_msix(struct pci_dev *pdev,
+ struct vmci_guest_device *vmci_dev)
+{
+ int i;
+ int result;
+
+ for (i = 0; i < VMCI_MAX_INTRS; ++i) {
+ vmci_dev->msix_entries[i].entry = i;
+ vmci_dev->msix_entries[i].vector = i;
+ }
+
+ result = pci_enable_msix(pdev, vmci_dev->msix_entries, VMCI_MAX_INTRS);
+ if (result == 0)
+ vmci_dev->exclusive_vectors = true;
+ else if (result > 0)
+ result = pci_enable_msix(pdev, vmci_dev->msix_entries, 1);
+
+ return result;
+}
+
+/*
+ * Interrupt handler for legacy or MSI interrupt, or for first MSI-X
+ * interrupt (vector VMCI_INTR_DATAGRAM).
+ */
+static irqreturn_t vmci_interrupt(int irq, void *_dev)
+{
+ struct vmci_guest_device *dev = _dev;
+
+ /*
+ * If we are using MSI-X with exclusive vectors then we simply schedule
+ * the datagram tasklet, since we know the interrupt was meant for us.
+ * Otherwise we must read the ICR to determine what to do.
+ */
+
+ if (dev->intr_type == VMCI_INTR_TYPE_MSIX && dev->exclusive_vectors) {
+ tasklet_schedule(&dev->datagram_tasklet);
+ } else {
+ unsigned int icr;
+
+ ASSERT(dev->intr_type == VMCI_INTR_TYPE_INTX ||
+ dev->intr_type == VMCI_INTR_TYPE_MSI);
+
+ /* Acknowledge interrupt and determine what needs doing. */
+ icr = ioread32(dev->iobase + VMCI_ICR_ADDR);
+ if (icr == 0 || icr == ~0)
+ return IRQ_NONE;
+
+ if (icr & VMCI_ICR_DATAGRAM) {
+ tasklet_schedule(&dev->datagram_tasklet);
+ icr &= ~VMCI_ICR_DATAGRAM;
+ }
+
+ if (icr & VMCI_ICR_NOTIFICATION) {
+ tasklet_schedule(&dev->bm_tasklet);
+ icr &= ~VMCI_ICR_NOTIFICATION;
+ }
+
+ if (icr != 0)
+ dev_warn(dev->dev,
+ "Ignoring unknown interrupt cause (%d).\n", icr);
+ }
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * Interrupt handler for MSI-X interrupt vector VMCI_INTR_NOTIFICATION,
+ * which is for the notification bitmap. Will only get called if we are
+ * using MSI-X with exclusive vectors.
+ */
+static irqreturn_t vmci_interrupt_bm(int irq, void *_dev)
+{
+ struct vmci_guest_device *dev = _dev;
+
+ /* For MSI-X we can just assume it was meant for us. */
+ ASSERT(dev->intr_type == VMCI_INTR_TYPE_MSIX && dev->exclusive_vectors);
+ tasklet_schedule(&dev->bm_tasklet);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * Most of the initialization at module load time is done here.
+ */
+static int __devinit vmci_guest_probe_device(struct pci_dev *pdev,
+ const struct pci_device_id *id)
+{
+ struct vmci_guest_device *vmci_dev;
+ void __iomem *iobase;
+ unsigned int capabilities;
+ unsigned long cmd;
+ int vmci_err;
+ int error;
+
+ dev_dbg(&pdev->dev, "Probing for vmci/PCI guest device.\n");
+
+ error = pcim_enable_device(pdev);
+ if (error) {
+ dev_err(&pdev->dev,
+ "Failed to enable VMCI device: %d\n", error);
+ return error;
+ }
+
+ error = pcim_iomap_regions(pdev, 1 << 0, MODULE_NAME);
+ if (error) {
+ dev_err(&pdev->dev, "Failed to reserve/map IO regions.\n");
+ return error;
+ }
+
+ iobase = pcim_iomap_table(pdev)[0];
+
+ dev_info(&pdev->dev, "Found VMCI PCI device at %#lx, irq %u.\n",
+ (unsigned long)iobase, pdev->irq);
+
+ vmci_dev = devm_kzalloc(&pdev->dev, sizeof(*vmci_dev), GFP_KERNEL);
+ if (!vmci_dev) {
+ dev_err(&pdev->dev,
+ "Can't allocate memory for VMCI device.\n");
+ return -ENOMEM;
+ }
+
+ vmci_dev->dev = &pdev->dev;
+ vmci_dev->intr_type = VMCI_INTR_TYPE_INTX;
+ vmci_dev->exclusive_vectors = false;
+ vmci_dev->iobase = iobase;
+
+ tasklet_init(&vmci_dev->datagram_tasklet,
+ vmci_dispatch_dgs, (unsigned long)vmci_dev);
+ tasklet_init(&vmci_dev->bm_tasklet,
+ vmci_process_bitmap, (unsigned long)vmci_dev);
+
+ vmci_dev->data_buffer = vmalloc(VMCI_MAX_DG_SIZE);
+ if (!vmci_dev->data_buffer) {
+ dev_err(&pdev->dev,
+ "Can't allocate memory for datagram buffer.\n");
+ return -ENOMEM;
+ }
+
+ pci_set_master(pdev); /* To enable queue_pair functionality. */
+
+ /*
+ * Verify that the VMCI Device supports the capabilities that
+ * we need. If the device is missing capabilities that we would
+ * like to use, check for fallback capabilities and use those
+ * instead (so we can run a new VM on old hosts). Fail the load if
+ * a required capability is missing and there is no fallback.
+ *
+ * Right now, we need datagrams. There are no fallbacks.
+ */
+ capabilities = ioread32(vmci_dev->iobase + VMCI_CAPS_ADDR);
+ if (!(capabilities & VMCI_CAPS_DATAGRAM)) {
+ dev_err(&pdev->dev, "Device does not support datagrams.\n");
+ error = -ENXIO;
+ goto err_free_data_buffer;
+ }
+
+ /*
+ * If the hardware supports notifications, we will use that as
+ * well.
+ */
+ if (capabilities & VMCI_CAPS_NOTIFICATIONS) {
+ vmci_dev->notification_bitmap = vmalloc(PAGE_SIZE);
+ if (!vmci_dev->notification_bitmap) {
+ dev_warn(&pdev->dev,
+ "Unable to allocate notification bitmap.\n");
+ } else {
+ memset(vmci_dev->notification_bitmap, 0, PAGE_SIZE);
+ capabilities |= VMCI_CAPS_NOTIFICATIONS;
+ }
+ }
+
+ dev_info(&pdev->dev, "Using capabilities 0x%x.\n", capabilities);
+
+ /* Let the host know which capabilities we intend to use. */
+ iowrite32(capabilities, vmci_dev->iobase + VMCI_CAPS_ADDR);
+
+ /* Set up global device so that we can start sending datagrams */
+ spin_lock_irq(&vmci_dev_spinlock);
+ vmci_dev_g = vmci_dev;
+ spin_unlock_irq(&vmci_dev_spinlock);
+
+ /*
+ * Register notification bitmap with device if that capability is
+ * used.
+ */
+ if (capabilities & VMCI_CAPS_NOTIFICATIONS) {
+ struct page *page =
+ vmalloc_to_page(vmci_dev->notification_bitmap);
+ unsigned long bitmap_ppn = page_to_pfn(page);
+ if (!vmci_dbell_register_notification_bitmap(bitmap_ppn)) {
+ dev_warn(&pdev->dev,
+ "VMCI device unable to register notification bitmap with PPN 0x%x.\n",
+ (u32) bitmap_ppn);
+ goto err_remove_vmci_dev_g;
+ }
+ }
+
+ /* Check host capabilities. */
+ if (!vmci_check_host_caps(pdev))
+ goto err_remove_bitmap;
+
+ /* Enable device. */
+
+ /*
+ * We subscribe to the VMCI_EVENT_CTX_ID_UPDATE here so we can
+ * update the internal context id when needed.
+ */
+ vmci_err = vmci_event_subscribe(VMCI_EVENT_CTX_ID_UPDATE,
+ vmci_guest_cid_update, NULL,
+ &ctx_update_sub_id);
+ if (vmci_err < VMCI_SUCCESS)
+ dev_warn(&pdev->dev,
+ "Failed to subscribe to event (type=%d): %d\n",
+ VMCI_EVENT_CTX_ID_UPDATE, vmci_err);
+
+ /*
+ * Enable interrupts. Try MSI-X first, then MSI, and then fallback on
+ * legacy interrupts.
+ */
+ if (!vmci_disable_msix && !vmci_enable_msix(pdev, vmci_dev)) {
+ vmci_dev->intr_type = VMCI_INTR_TYPE_MSIX;
+ vmci_dev->irq = vmci_dev->msix_entries[0].vector;
+ } else if (!vmci_disable_msi && !pci_enable_msi(pdev)) {
+ vmci_dev->intr_type = VMCI_INTR_TYPE_MSI;
+ vmci_dev->irq = pdev->irq;
+ } else {
+ vmci_dev->intr_type = VMCI_INTR_TYPE_INTX;
+ vmci_dev->irq = pdev->irq;
+ }
+
+ /*
+ * Request IRQ for legacy or MSI interrupts, or for first
+ * MSI-X vector.
+ */
+ error = request_irq(vmci_dev->irq, vmci_interrupt, IRQF_SHARED,
+ MODULE_NAME, vmci_dev);
+ if (error) {
+ dev_err(&pdev->dev, "Irq %u in use: %d\n", vmci_dev->irq, error);
+ goto err_disable_msi;
+ }
+
+ /*
+ * For MSI-X with exclusive vectors we need to request an
+ * interrupt for each vector so that we get a separate
+ * interrupt handler routine. This allows us to distinguish
+ * between the vectors.
+ */
+ if (vmci_dev->exclusive_vectors) {
+ ASSERT(vmci_dev->intr_type == VMCI_INTR_TYPE_MSIX);
+ error = request_irq(vmci_dev->msix_entries[1].vector,
+ vmci_interrupt_bm, 0, MODULE_NAME,
+ vmci_dev);
+ if (error) {
+ dev_err(&pdev->dev,
+ "Failed to allocate irq %u: %d\n",
+ vmci_dev->msix_entries[1].vector, error);
+ goto err_free_irq;
+ }
+ }
+
+ dev_dbg(&pdev->dev, "Registered device.\n");
+
+ atomic_inc(&vmci_num_guest_devices);
+
+ /* Enable specific interrupt bits. */
+ cmd = VMCI_IMR_DATAGRAM;
+ if (capabilities & VMCI_CAPS_NOTIFICATIONS)
+ cmd |= VMCI_IMR_NOTIFICATION;
+ iowrite32(cmd, vmci_dev->iobase + VMCI_IMR_ADDR);
+
+ /* Enable interrupts. */
+ iowrite32(VMCI_CONTROL_INT_ENABLE,
+ vmci_dev->iobase + VMCI_CONTROL_ADDR);
+
+ pci_set_drvdata(pdev, vmci_dev);
+ return 0;
+
+err_free_irq:
+ free_irq(vmci_dev->irq, &vmci_dev);
+ tasklet_kill(&vmci_dev->datagram_tasklet);
+ tasklet_kill(&vmci_dev->bm_tasklet);
+
+err_disable_msi:
+ if (vmci_dev->intr_type == VMCI_INTR_TYPE_MSIX)
+ pci_disable_msix(pdev);
+ else if (vmci_dev->intr_type == VMCI_INTR_TYPE_MSI)
+ pci_disable_msi(pdev);
+
+ vmci_err = vmci_event_unsubscribe(ctx_update_sub_id);
+ if (vmci_err < VMCI_SUCCESS)
+ dev_warn(&pdev->dev,
+ "Failed to unsubscribe from event (type=%d) with subscriber (ID=0x%x): %d\n",
+ VMCI_EVENT_CTX_ID_UPDATE, ctx_update_sub_id, vmci_err);
+
+err_remove_bitmap:
+ if (vmci_dev->notification_bitmap) {
+ iowrite32(VMCI_CONTROL_RESET,
+ vmci_dev->iobase + VMCI_CONTROL_ADDR);
+ vfree(vmci_dev->notification_bitmap);
+ }
+
+err_remove_vmci_dev_g:
+ spin_lock_irq(&vmci_dev_spinlock);
+ vmci_dev_g = NULL;
+ spin_unlock_irq(&vmci_dev_spinlock);
+
+err_free_data_buffer:
+ vfree(vmci_dev->data_buffer);
+
+ /* The rest are managed resources and will be freed by PCI core */
+ return error;
+}
+
+static void __devexit vmci_guest_remove_device(struct pci_dev *pdev)
+{
+ struct vmci_guest_device *vmci_dev = pci_get_drvdata(pdev);
+ int vmci_err;
+
+ dev_dbg(&pdev->dev, "Removing device\n");
+
+ atomic_dec(&vmci_num_guest_devices);
+
+ vmci_qp_guest_endpoints_exit();
+
+ vmci_err = vmci_event_unsubscribe(ctx_update_sub_id);
+ if (vmci_err < VMCI_SUCCESS)
+ dev_warn(&pdev->dev,
+ "Failed to unsubscribe from event (type=%d) with subscriber (ID=0x%x): %d\n",
+ VMCI_EVENT_CTX_ID_UPDATE, ctx_update_sub_id, vmci_err);
+
+ spin_lock_irq(&vmci_dev_spinlock);
+ vmci_dev_g = NULL;
+ spin_unlock_irq(&vmci_dev_spinlock);
+
+ dev_dbg(&pdev->dev, "Resetting vmci device\n");
+ iowrite32(VMCI_CONTROL_RESET, vmci_dev->iobase + VMCI_CONTROL_ADDR);
+
+ /*
+ * Free IRQ and then disable MSI/MSI-X as appropriate. For
+ * MSI-X, we might have multiple vectors, each with their own
+ * IRQ, which we must free too.
+ */
+ free_irq(vmci_dev->irq, vmci_dev);
+ if (vmci_dev->intr_type == VMCI_INTR_TYPE_MSIX) {
+ if (vmci_dev->exclusive_vectors)
+ free_irq(vmci_dev->msix_entries[1].vector, vmci_dev);
+ pci_disable_msix(pdev);
+ } else if (vmci_dev->intr_type == VMCI_INTR_TYPE_MSI) {
+ pci_disable_msi(pdev);
+ }
+
+ tasklet_kill(&vmci_dev->datagram_tasklet);
+ tasklet_kill(&vmci_dev->bm_tasklet);
+
+ if (vmci_dev->notification_bitmap) {
+ /*
+ * The device reset above cleared the bitmap state of the
+ * device, so we can safely free it here.
+ */
+
+ vfree(vmci_dev->notification_bitmap);
+ }
+
+ vfree(vmci_dev->data_buffer);
+
+ /* The rest are managed resources and will be freed by PCI core */
+}
+
+static DEFINE_PCI_DEVICE_TABLE(vmci_ids) = {
+ { PCI_DEVICE(PCI_VENDOR_ID_VMWARE, PCI_DEVICE_ID_VMWARE_VMCI), },
+ { 0 },
+};
+MODULE_DEVICE_TABLE(pci, vmci_ids);
+
+static struct pci_driver vmci_guest_driver = {
+ .name = MODULE_NAME,
+ .id_table = vmci_ids,
+ .probe = vmci_guest_probe_device,
+ .remove = __devexit_p(vmci_guest_remove_device),
+};
+
+int __init vmci_guest_init(void)
+{
+ return pci_register_driver(&vmci_guest_driver);
+}
+
+void __exit vmci_guest_exit(void)
+{
+ pci_unregister_driver(&vmci_guest_driver);
+}
VMCI host side driver code implementation.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/vmw_vmci/vmci_host.c | 1033 +++++++++++++++++++++++++++++++++++++
1 files changed, 1033 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/vmci_host.c
diff --git a/drivers/misc/vmw_vmci/vmci_host.c b/drivers/misc/vmw_vmci/vmci_host.c
new file mode 100644
index 0000000..7580979
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_host.c
@@ -0,0 +1,1033 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include <linux/vmw_vmci_defs.h>
+#include <linux/vmw_vmci_api.h>
+#include <linux/moduleparam.h>
+#include <linux/miscdevice.h>
+#include <linux/interrupt.h>
+#include <linux/highmem.h>
+#include <linux/atomic.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/poll.h>
+#include <linux/pci.h>
+#include <linux/smp.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+
+#include "vmci_handle_array.h"
+#include "vmci_common_int.h"
+#include "vmci_queue_pair.h"
+#include "vmci_datagram.h"
+#include "vmci_doorbell.h"
+#include "vmci_resource.h"
+#include "vmci_context.h"
+#include "vmci_driver.h"
+#include "vmci_event.h"
+
+#define VMCI_UTIL_NUM_RESOURCES 1
+
+enum {
+ VMCI_NOTIFY_RESOURCE_QUEUE_PAIR = 0,
+ VMCI_NOTIFY_RESOURCE_DOOR_BELL = 1,
+};
+
+enum {
+ VMCI_NOTIFY_RESOURCE_ACTION_NOTIFY = 0,
+ VMCI_NOTIFY_RESOURCE_ACTION_CREATE = 1,
+ VMCI_NOTIFY_RESOURCE_ACTION_DESTROY = 2,
+};
+
+/*
+ * VMCI driver initialization. This block can also be used to
+ * pass initial group membership etc.
+ */
+struct vmci_init_blk {
+ u32 cid;
+ u32 flags;
+};
+
+/* VMCIqueue_pairAllocInfo_VMToVM */
+struct vmci_qp_alloc_info_vmvm {
+ struct vmci_handle handle;
+ u32 peer;
+ u32 flags;
+ u64 produce_size;
+ u64 consume_size;
+ u64 produce_page_file; /* User VA. */
+ u64 consume_page_file; /* User VA. */
+ u64 produce_page_file_size; /* Size of the file name array. */
+ u64 consume_page_file_size; /* Size of the file name array. */
+ s32 result;
+ u32 _pad;
+};
+
+/* VMCISetNotifyInfo: Used to pass notify flag's address to the host driver. */
+struct vmci_set_notify_info {
+ u64 notify_uva;
+ s32 result;
+ u32 _pad;
+};
+
+/*
+ * Per-instance host state
+ */
+struct vmci_host_dev {
+ struct vmci_ctx *context;
+ int user_version;
+ enum vmci_obj_type ct_type;
+ struct mutex lock; /* Mutex lock for vmci context access */
+};
+
+static struct vmci_ctx *host_context;
+static bool vmci_host_device_initialized;
+static atomic_t vmci_host_active_users = ATOMIC_INIT(0);
+
+/*
+ * Determines whether the VMCI host personality is
+ * available. Since the core functionality of the host driver is
+ * always present, all guests could possibly use the host
+ * personality. However, to minimize the deviation from the
+ * pre-unified driver state of affairs, we only consider the host
+ * device active if there is no active guest device or if there
+ * are VMX'en with active VMCI contexts using the host device.
+ */
+bool vmci_host_code_active(void)
+{
+ return vmci_host_device_initialized &&
+ (!vmci_guest_code_active() ||
+ atomic_read(&vmci_host_active_users) > 0);
+}
+
+/*
+ * Called on open of /dev/vmci.
+ */
+static int vmci_host_open(struct inode *inode, struct file *filp)
+{
+ struct vmci_host_dev *vmci_host_dev;
+
+ vmci_host_dev = kzalloc(sizeof(struct vmci_host_dev), GFP_KERNEL);
+ if (vmci_host_dev == NULL)
+ return -ENOMEM;
+
+ vmci_host_dev->ct_type = VMCIOBJ_NOT_SET;
+ mutex_init(&vmci_host_dev->lock);
+ filp->private_data = vmci_host_dev;
+
+ return 0;
+}
+
+/*
+ * Called on close of /dev/vmci, most often when the process
+ * exits.
+ */
+static int vmci_host_close(struct inode *inode, struct file *filp)
+{
+ struct vmci_host_dev *vmci_host_dev = filp->private_data;
+
+ if (vmci_host_dev->ct_type == VMCIOBJ_CONTEXT) {
+ vmci_ctx_destroy(vmci_host_dev->context);
+ vmci_host_dev->context = NULL;
+
+ /*
+ * The number of active contexts is used to track whether any
+ * VMX'en are using the host personality. It is incremented when
+ * a context is created through the IOCTL_VMCI_INIT_CONTEXT
+ * ioctl.
+ */
+ atomic_dec(&vmci_host_active_users);
+ }
+ vmci_host_dev->ct_type = VMCIOBJ_NOT_SET;
+
+ kfree(vmci_host_dev);
+ filp->private_data = NULL;
+ return 0;
+}
+
+/*
+ * This is used to wake up the VMX when a VMCI call arrives, or
+ * to wake up select() or poll() at the next clock tick.
+ */
+static unsigned int vmci_host_poll(struct file *filp, poll_table *wait)
+{
+ struct vmci_host_dev *vmci_host_dev = filp->private_data;
+ struct vmci_ctx *context = vmci_host_dev->context;
+ unsigned int mask = 0;
+
+ if (vmci_host_dev->ct_type == VMCIOBJ_CONTEXT) {
+ /* Check for VMCI calls to this VM context. */
+ if (wait)
+ poll_wait(filp, &context->host_context.wait_queue,
+ wait);
+
+ spin_lock(&context->lock);
+ if (context->pending_datagrams > 0 ||
+ vmci_handle_arr_get_size(
+ context->pending_doorbell_array) > 0) {
+ mask = POLLIN;
+ }
+ spin_unlock(&context->lock);
+ }
+ return mask;
+}
+
+/*
+ * Copies the handles of a handle array into a user buffer, and
+ * returns the new length in userBufferSize. If the copy to the
+ * user buffer fails, the functions still returns VMCI_SUCCESS,
+ * but retval != 0.
+ */
+static int drv_cp_harray_to_user(void __user *user_buf_uva,
+ u64 *user_buf_size,
+ struct vmci_handle_arr *handle_array,
+ int *retval)
+{
+ u32 array_size = 0;
+ struct vmci_handle *handles;
+
+ if (handle_array)
+ array_size = vmci_handle_arr_get_size(handle_array);
+
+ if (array_size * sizeof(*handles) > *user_buf_size)
+ return VMCI_ERROR_MORE_DATA;
+
+ *user_buf_size = array_size * sizeof(*handles);
+ if (*user_buf_size)
+ *retval = copy_to_user(user_buf_uva,
+ vmci_handle_arr_get_handles
+ (handle_array), *user_buf_size);
+
+ return VMCI_SUCCESS;
+}
+
+/*
+ * Sets up a given context for notify to work. Calls drv_map_bool_ptr()
+ * which maps the notify boolean in user VA in kernel space.
+ */
+static int vmci_host_setup_notify(struct vmci_ctx *context,
+ unsigned long uva)
+{
+ struct page *page;
+ int retval;
+
+ if (context->notify_page) {
+ pr_devel("%s: Notify mechanism is already set up.\n", __func__);
+ return VMCI_ERROR_DUPLICATE_ENTRY;
+ }
+
+ if (!access_ok(VERIFY_WRITE, (void __user *)uva, sizeof(bool)))
+ return VMCI_ERROR_GENERIC;
+
+ /*
+ * Lock physical page backing a given user VA.
+ */
+ down_read(¤t->mm->mmap_sem);
+ retval = get_user_pages(current, current->mm,
+ PAGE_ALIGN(uva),
+ 1, 1, 0, &page, NULL);
+ up_read(¤t->mm->mmap_sem);
+ if (retval != 1)
+ return VMCI_ERROR_GENERIC;
+
+ /*
+ * Map the locked page and set up notify pointer.
+ */
+ context->notify = kmap(page) + (uva & (PAGE_SIZE - 1));
+ vmci_ctx_check_signal_notify(context);
+
+ return VMCI_SUCCESS;
+}
+
+static int vmci_host_get_version(struct vmci_host_dev *vmci_host_dev,
+ unsigned int cmd, void __user *uptr)
+{
+ if (cmd == IOCTL_VMCI_VERSION2) {
+ int __user *vptr = uptr;
+ if (get_user(vmci_host_dev->user_version, vptr))
+ return -EFAULT;
+ }
+
+ /*
+ * The basic logic here is:
+ *
+ * If the user sends in a version of 0 tell it our version.
+ * If the user didn't send in a version, tell it our version.
+ * If the user sent in an old version, tell it -its- version.
+ * If the user sent in an newer version, tell it our version.
+ *
+ * The rationale behind telling the caller its version is that
+ * Workstation 6.5 required that VMX and VMCI kernel module were
+ * version sync'd. All new VMX users will be programmed to
+ * handle the VMCI kernel module version.
+ */
+
+ if (vmci_host_dev->user_version > 0 &&
+ vmci_host_dev->user_version < VMCI_VERSION_HOSTQP) {
+ return vmci_host_dev->user_version;
+ }
+
+ return VMCI_VERSION;
+}
+
+#define vmci_ioctl_err(fmt, ...) \
+ pr_devel("%s: " fmt, ioctl_name, ##__VA_ARGS__)
+
+static int vmci_host_do_init_context(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_init_blk init_block;
+ const struct cred *cred;
+ int retval;
+
+ if (copy_from_user(&init_block, uptr, sizeof(init_block))) {
+ vmci_ioctl_err("error reading init block.\n");
+ return -EFAULT;
+ }
+
+ mutex_lock(&vmci_host_dev->lock);
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_NOT_SET) {
+ vmci_ioctl_err("received VMCI init on initialized handle.\n");
+ retval = -EINVAL;
+ goto out;
+ }
+
+ if (init_block.flags & ~VMCI_PRIVILEGE_FLAG_RESTRICTED) {
+ vmci_ioctl_err("unsupported VMCI restriction flag.\n");
+ retval = -EINVAL;
+ goto out;
+ }
+
+ cred = get_current_cred();
+ vmci_host_dev->context = vmci_ctx_create(init_block.cid,
+ init_block.flags, 0,
+ vmci_host_dev->user_version,
+ cred);
+ put_cred(cred);
+ if (IS_ERR(vmci_host_dev->context)) {
+ retval = PTR_ERR(vmci_host_dev->context);
+ vmci_ioctl_err("error initializing context.\n");
+ goto out;
+ }
+
+ /*
+ * Copy cid to userlevel, we do this to allow the VMX
+ * to enforce its policy on cid generation.
+ */
+ init_block.cid = vmci_ctx_get_id(vmci_host_dev->context);
+ if (copy_to_user(uptr, &init_block, sizeof(init_block))) {
+ vmci_ctx_destroy(vmci_host_dev->context);
+ vmci_host_dev->context = NULL;
+ vmci_ioctl_err("error writing init block.\n");
+ retval = -EFAULT;
+ goto out;
+ }
+
+ ASSERT(init_block.cid != VMCI_INVALID_ID);
+ vmci_host_dev->ct_type = VMCIOBJ_CONTEXT;
+ atomic_inc(&vmci_host_active_users);
+
+ retval = 0;
+
+out:
+ mutex_unlock(&vmci_host_dev->lock);
+ return retval;
+}
+
+static int vmci_host_do_send_datagram(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_datagram_snd_rcv_info send_info;
+ struct vmci_datagram *dg = NULL;
+ u32 cid;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&send_info, uptr, sizeof(send_info)))
+ return -EFAULT;
+
+ if (send_info.len > VMCI_MAX_DG_SIZE) {
+ vmci_ioctl_err("datagram is too big (size=%d).\n",
+ send_info.len);
+ return -EINVAL;
+ }
+
+ if (send_info.len < sizeof(*dg)) {
+ vmci_ioctl_err("datagram is too small (size=%d).\n",
+ send_info.len);
+ return -EINVAL;
+ }
+
+ dg = kmalloc(send_info.len, GFP_KERNEL);
+ if (!dg) {
+ vmci_ioctl_err("cannot allocate memory to dispatch datagram.\n");
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(dg, (void __user *)(uintptr_t)send_info.addr,
+ send_info.len)) {
+ vmci_ioctl_err("error getting datagram.\n");
+ kfree(dg);
+ return -EFAULT;
+ }
+
+ pr_devel("Datagram dst (handle=0x%x:0x%x) src (handle=0x%x:0x%x), payload (size=%llu bytes).\n",
+ dg->dst.context, dg->dst.resource,
+ dg->src.context, dg->src.resource,
+ (unsigned long long)dg->payload_size);
+
+ /* Get source context id. */
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+ ASSERT(cid != VMCI_INVALID_ID);
+ send_info.result = vmci_datagram_dispatch(cid, dg, true);
+ kfree(dg);
+
+ return copy_to_user(uptr, &send_info, sizeof(send_info)) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_receive_datagram(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_datagram_snd_rcv_info recv_info;
+ struct vmci_datagram *dg = NULL;
+ int retval;
+ size_t size;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&recv_info, uptr, sizeof(recv_info)))
+ return -EFAULT;
+
+ ASSERT(vmci_host_dev->ct_type == VMCIOBJ_CONTEXT);
+ size = recv_info.len;
+ recv_info.result = vmci_ctx_dequeue_datagram(vmci_host_dev->context,
+ &size, &dg);
+
+ if (recv_info.result >= VMCI_SUCCESS) {
+ void __user *ubuf = (void __user *)(uintptr_t)recv_info.addr;
+ retval = copy_to_user(ubuf, dg, VMCI_DG_SIZE(dg));
+ kfree(dg);
+ if (retval != 0)
+ return -EFAULT;
+ }
+
+ return copy_to_user(uptr, &recv_info, sizeof(recv_info)) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_alloc_queuepair(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_handle handle;
+ int vmci_status;
+ int __user *retptr;
+ u32 cid;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+
+ if (vmci_host_dev->user_version < VMCI_VERSION_NOVMVM) {
+ struct vmci_qp_alloc_info_vmvm alloc_info;
+ struct vmci_qp_alloc_info_vmvm __user *info = uptr;
+
+ if (copy_from_user(&alloc_info, uptr, sizeof(alloc_info)))
+ return -EFAULT;
+
+ handle = alloc_info.handle;
+ retptr = &info->result;
+
+ vmci_status = vmci_qp_broker_alloc(alloc_info.handle,
+ alloc_info.peer,
+ alloc_info.flags,
+ VMCI_NO_PRIVILEGE_FLAGS,
+ alloc_info.produce_size,
+ alloc_info.consume_size,
+ NULL,
+ vmci_host_dev->context);
+
+ if (vmci_status == VMCI_SUCCESS)
+ vmci_status = VMCI_SUCCESS_QUEUEPAIR_CREATE;
+ } else {
+ struct vmci_qp_alloc_info alloc_info;
+ struct vmci_qp_alloc_info __user *info = uptr;
+ struct vmci_qp_page_store page_store;
+
+ if (copy_from_user(&alloc_info, uptr, sizeof(alloc_info)))
+ return -EFAULT;
+
+ handle = alloc_info.handle;
+ retptr = &info->result;
+
+ page_store.pages = alloc_info.ppn_va;
+ page_store.len = alloc_info.num_ppns;
+
+ vmci_status = vmci_qp_broker_alloc(alloc_info.handle,
+ alloc_info.peer,
+ alloc_info.flags,
+ VMCI_NO_PRIVILEGE_FLAGS,
+ alloc_info.produce_size,
+ alloc_info.consume_size,
+ &page_store,
+ vmci_host_dev->context);
+ }
+
+ if (put_user(vmci_status, retptr)) {
+ if (vmci_status >= VMCI_SUCCESS) {
+ vmci_status = vmci_qp_broker_detach(handle,
+ vmci_host_dev->context);
+ BUG_ON(vmci_status < VMCI_SUCCESS);
+ }
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+static int vmci_host_do_queuepair_setva(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_qp_set_va_info set_va_info;
+ struct vmci_qp_set_va_info __user *info = uptr;
+ s32 result;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (vmci_host_dev->user_version < VMCI_VERSION_NOVMVM) {
+ vmci_ioctl_err("is not allowed.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&set_va_info, uptr, sizeof(set_va_info)))
+ return -EFAULT;
+
+ if (set_va_info.va) {
+ /*
+ * VMX is passing down a new VA for the queue
+ * pair mapping.
+ */
+ result = vmci_qp_broker_map(set_va_info.handle,
+ vmci_host_dev->context,
+ set_va_info.va);
+ } else {
+ /*
+ * The queue pair is about to be unmapped by
+ * the VMX.
+ */
+ result = vmci_qp_broker_unmap(set_va_info.handle,
+ vmci_host_dev->context, 0);
+ }
+
+ return put_user(result, &info->result) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_queuepair_setpf(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_qp_page_file_info page_file_info;
+ struct vmci_qp_page_file_info __user *info = uptr;
+ s32 result;
+
+ if (vmci_host_dev->user_version < VMCI_VERSION_HOSTQP ||
+ vmci_host_dev->user_version >= VMCI_VERSION_NOVMVM) {
+ vmci_ioctl_err("not supported on this VMX (version=%d).\n",
+ vmci_host_dev->user_version);
+ return -EINVAL;
+ }
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&page_file_info, uptr, sizeof(*info)))
+ return -EFAULT;
+
+ /*
+ * Communicate success pre-emptively to the caller. Note that the
+ * basic premise is that it is incumbent upon the caller not to look at
+ * the info.result field until after the ioctl() returns. And then,
+ * only if the ioctl() result indicates no error. We send up the
+ * SUCCESS status before calling SetPageStore() store because failing
+ * to copy up the result code means unwinding the SetPageStore().
+ *
+ * It turns out the logic to unwind a SetPageStore() opens a can of
+ * worms. For example, if a host had created the queue_pair and a
+ * guest attaches and SetPageStore() is successful but writing success
+ * fails, then ... the host has to be stopped from writing (anymore)
+ * data into the queue_pair. That means an additional test in the
+ * VMCI_Enqueue() code path. Ugh.
+ */
+
+ if (put_user(VMCI_SUCCESS, &info->result)) {
+ /*
+ * In this case, we can't write a result field of the
+ * caller's info block. So, we don't even try to
+ * SetPageStore().
+ */
+ return -EFAULT;
+ }
+
+ result = vmci_qp_broker_set_page_store(page_file_info.handle,
+ page_file_info.produce_va,
+ page_file_info.consume_va,
+ vmci_host_dev->context);
+ if (result < VMCI_SUCCESS) {
+ if (put_user(result, &info->result)) {
+ /*
+ * Note that in this case the SetPageStore()
+ * call failed but we were unable to
+ * communicate that to the caller (because the
+ * copy_to_user() call failed). So, if we
+ * simply return an error (in this case
+ * -EFAULT) then the caller will know that the
+ * SetPageStore failed even though we couldn't
+ * put the result code in the result field and
+ * indicate exactly why it failed.
+ *
+ * That says nothing about the issue where we
+ * were once able to write to the caller's info
+ * memory and now can't. Something more
+ * serious is probably going on than the fact
+ * that SetPageStore() didn't work.
+ */
+ return -EFAULT;
+ }
+ }
+
+ return 0;
+}
+
+static int vmci_host_do_qp_detach(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_qp_dtch_info detach_info;
+ struct vmci_qp_dtch_info __user *info = uptr;
+ s32 result;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&detach_info, uptr, sizeof(detach_info)))
+ return -EFAULT;
+
+ result = vmci_qp_broker_detach(detach_info.handle,
+ vmci_host_dev->context);
+ if (result == VMCI_SUCCESS &&
+ vmci_host_dev->user_version < VMCI_VERSION_NOVMVM) {
+ result = VMCI_SUCCESS_LAST_DETACH;
+ }
+
+ return put_user(result, &info->result) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_ctx_add_notify(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_ctx_info ar_info;
+ struct vmci_ctx_info __user *info = uptr;
+ s32 result;
+ u32 cid;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&ar_info, uptr, sizeof(ar_info)))
+ return -EFAULT;
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+ result = vmci_ctx_add_notification(cid, ar_info.remote_cid);
+
+ return put_user(result, &info->result) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_ctx_remove_notify(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_ctx_info ar_info;
+ struct vmci_ctx_info __user *info = uptr;
+ u32 cid;
+ int result;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&ar_info, uptr, sizeof(ar_info)))
+ return -EFAULT;
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+ result = vmci_ctx_remove_notification(cid,
+ ar_info.remote_cid);
+
+ return put_user(result, &info->result) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_ctx_get_cpt_state(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_ctx_chkpt_buf_info get_info;
+ u32 cid;
+ void *cpt_buf;
+ int retval;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&get_info, uptr, sizeof(get_info)))
+ return -EFAULT;
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+ get_info.result = vmci_ctx_get_chkpt_state(cid, get_info.cpt_type,
+ &get_info.buf_size, &cpt_buf);
+ if (get_info.result == VMCI_SUCCESS && get_info.buf_size) {
+ void __user *ubuf = (void __user *)(uintptr_t)get_info.cpt_buf;
+ retval = copy_to_user(ubuf, cpt_buf, get_info.buf_size);
+ kfree(cpt_buf);
+
+ if (retval)
+ return -EFAULT;
+ }
+
+ return copy_to_user(uptr, &get_info, sizeof(get_info)) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_ctx_set_cpt_state(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_ctx_chkpt_buf_info set_info;
+ u32 cid;
+ void *cpt_buf;
+ int retval;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&set_info, uptr, sizeof(set_info)))
+ return -EFAULT;
+
+ cpt_buf = kmalloc(set_info.buf_size, GFP_KERNEL);
+ if (!cpt_buf) {
+ vmci_ioctl_err("cannot allocate memory to set cpt state (type=%d).\n",
+ set_info.cpt_type);
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(cpt_buf, (void __user *)(uintptr_t)set_info.cpt_buf,
+ set_info.buf_size)) {
+ retval = -EFAULT;
+ goto out;
+ }
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+ set_info.result = vmci_ctx_set_chkpt_state(cid, set_info.cpt_type,
+ set_info.buf_size, cpt_buf);
+
+ retval = copy_to_user(uptr, &set_info, sizeof(set_info)) ? -EFAULT : 0;
+
+out:
+ kfree(cpt_buf);
+ return retval;
+}
+
+static int vmci_host_do_get_context_id(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ u32 __user *u32ptr = uptr;
+
+ return put_user(VMCI_HOST_CONTEXT_ID, u32ptr) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_set_notify(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_set_notify_info notify_info;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(¬ify_info, uptr, sizeof(notify_info)))
+ return -EFAULT;
+
+ if (notify_info.notify_uva) {
+ notify_info.result =
+ vmci_host_setup_notify(vmci_host_dev->context,
+ notify_info.notify_uva);
+ } else {
+ vmci_ctx_unset_notify(vmci_host_dev->context);
+ notify_info.result = VMCI_SUCCESS;
+ }
+
+ return copy_to_user(uptr, ¬ify_info, sizeof(notify_info)) ?
+ -EFAULT : 0;
+}
+
+static int vmci_host_do_notify_resource(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_dbell_notify_resource_info info;
+ u32 cid;
+
+ if (vmci_host_dev->user_version < VMCI_VERSION_NOTIFY) {
+ vmci_ioctl_err("invalid for current VMX versions.\n");
+ return -EINVAL;
+ }
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&info, uptr, sizeof(info)))
+ return -EFAULT;
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+
+ switch (info.action) {
+ case VMCI_NOTIFY_RESOURCE_ACTION_NOTIFY:
+ if (info.resource == VMCI_NOTIFY_RESOURCE_DOOR_BELL) {
+ u32 flags = VMCI_NO_PRIVILEGE_FLAGS;
+ info.result = vmci_ctx_notify_dbell(cid, info.handle,
+ flags);
+ } else {
+ info.result = VMCI_ERROR_UNAVAILABLE;
+ }
+ break;
+
+ case VMCI_NOTIFY_RESOURCE_ACTION_CREATE:
+ info.result = vmci_ctx_dbell_create(cid, info.handle);
+ break;
+
+ case VMCI_NOTIFY_RESOURCE_ACTION_DESTROY:
+ info.result = vmci_ctx_dbell_destroy(cid, info.handle);
+ break;
+
+ default:
+ vmci_ioctl_err("got unknown action (action=%d).\n",
+ info.action);
+ info.result = VMCI_ERROR_INVALID_ARGS;
+ }
+
+ return copy_to_user(uptr, &info, sizeof(info)) ? -EFAULT : 0;
+}
+
+static int vmci_host_do_recv_notifications(struct vmci_host_dev *vmci_host_dev,
+ const char *ioctl_name,
+ void __user *uptr)
+{
+ struct vmci_ctx_notify_recv_info info;
+ struct vmci_handle_arr *db_handle_array;
+ struct vmci_handle_arr *qp_handle_array;
+ void __user *ubuf;
+ u32 cid;
+ int retval = 0;
+
+ if (vmci_host_dev->ct_type != VMCIOBJ_CONTEXT) {
+ vmci_ioctl_err("only valid for contexts.\n");
+ return -EINVAL;
+ }
+
+ if (vmci_host_dev->user_version < VMCI_VERSION_NOTIFY) {
+ vmci_ioctl_err("not supported for the current vmx version.\n");
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&info, uptr, sizeof(info)))
+ return -EFAULT;
+
+ if ((info.db_handle_buf_size && !info.db_handle_buf_uva) ||
+ (info.qp_handle_buf_size && !info.qp_handle_buf_uva)) {
+ return -EINVAL;
+ }
+
+ cid = vmci_ctx_get_id(vmci_host_dev->context);
+
+ info.result = vmci_ctx_rcv_notifications_get(cid,
+ &db_handle_array, &qp_handle_array);
+ if (info.result != VMCI_SUCCESS)
+ return copy_to_user(uptr, &info, sizeof(info)) ? -EFAULT : 0;
+
+ ubuf = (void __user *)(uintptr_t)info.db_handle_buf_uva;
+ info.result = drv_cp_harray_to_user(ubuf, &info.db_handle_buf_size,
+ db_handle_array, &retval);
+ if (info.result == VMCI_SUCCESS && !retval) {
+ ubuf = (void __user *)(uintptr_t)info.qp_handle_buf_uva;
+ info.result = drv_cp_harray_to_user(ubuf,
+ &info.qp_handle_buf_size,
+ qp_handle_array, &retval);
+ }
+
+ if (!retval && copy_to_user(uptr, &info, sizeof(info)))
+ retval = -EFAULT;
+
+ vmci_ctx_rcv_notifications_release(cid,
+ db_handle_array, qp_handle_array,
+ info.result == VMCI_SUCCESS && !retval);
+
+ return retval;
+}
+
+
+static long vmci_host_unlocked_ioctl(struct file *filp,
+ unsigned int iocmd, unsigned long ioarg)
+{
+#define VMCI_DO_IOCTL(ioctl_name, ioctl_fn) \
+ case ioctl_name: \
+ return ioctl_fn(vmci_host_dev, \
+ __stringify(ioctl_name), \
+ uptr)
+
+ struct vmci_host_dev *vmci_host_dev = filp->private_data;
+ void __user *uptr = (void __user *)ioarg;
+
+ switch (iocmd) {
+ VMCI_DO_IOCTL(IOCTL_VMCI_INIT_CONTEXT, vmci_host_do_init_context);
+ VMCI_DO_IOCTL(IOCTL_VMCI_DATAGRAM_SEND, vmci_host_do_send_datagram);
+ VMCI_DO_IOCTL(IOCTL_VMCI_DATAGRAM_RECEIVE,
+ vmci_host_do_receive_datagram);
+ VMCI_DO_IOCTL(IOCTL_VMCI_QUEUEPAIR_ALLOC, vmci_host_do_alloc_queuepair);
+ VMCI_DO_IOCTL(IOCTL_VMCI_QUEUEPAIR_SETVA, vmci_host_do_queuepair_setva);
+ VMCI_DO_IOCTL(IOCTL_VMCI_QUEUEPAIR_SETPAGEFILE,
+ vmci_host_do_queuepair_setpf);
+ VMCI_DO_IOCTL(IOCTL_VMCI_QUEUEPAIR_DETACH, vmci_host_do_qp_detach);
+ VMCI_DO_IOCTL(IOCTL_VMCI_CTX_ADD_NOTIFICATION,
+ vmci_host_do_ctx_add_notify);
+ VMCI_DO_IOCTL(IOCTL_VMCI_CTX_REMOVE_NOTIFICATION,
+ vmci_host_do_ctx_remove_notify);
+ VMCI_DO_IOCTL(IOCTL_VMCI_CTX_GET_CPT_STATE,
+ vmci_host_do_ctx_get_cpt_state);
+ VMCI_DO_IOCTL(IOCTL_VMCI_CTX_SET_CPT_STATE,
+ vmci_host_do_ctx_set_cpt_state);
+ VMCI_DO_IOCTL(IOCTL_VMCI_GET_CONTEXT_ID, vmci_host_do_get_context_id);
+ VMCI_DO_IOCTL(IOCTL_VMCI_SET_NOTIFY, vmci_host_do_set_notify);
+ VMCI_DO_IOCTL(IOCTL_VMCI_NOTIFY_RESOURCE, vmci_host_do_notify_resource);
+ VMCI_DO_IOCTL(IOCTL_VMCI_NOTIFICATIONS_RECEIVE,
+ vmci_host_do_recv_notifications);
+
+ case IOCTL_VMCI_VERSION:
+ case IOCTL_VMCI_VERSION2:
+ return vmci_host_get_version(vmci_host_dev, iocmd, uptr);
+
+ default:
+ pr_devel("%s: Unknown ioctl (iocmd=%d).\n", __func__, iocmd);
+ return -EINVAL;
+ }
+
+#undef VMCI_DO_IOCTL
+}
+
+static const struct file_operations vmuser_fops = {
+ .owner = THIS_MODULE,
+ .open = vmci_host_open,
+ .release = vmci_host_close,
+ .poll = vmci_host_poll,
+ .unlocked_ioctl = vmci_host_unlocked_ioctl,
+ .compat_ioctl = vmci_host_unlocked_ioctl,
+};
+
+static struct miscdevice vmci_host_miscdev = {
+ .name = MODULE_NAME,
+ .minor = MISC_DYNAMIC_MINOR,
+ .fops = &vmuser_fops,
+};
+
+int __init vmci_host_init(void)
+{
+ int error;
+
+ host_context = vmci_ctx_create(VMCI_HOST_CONTEXT_ID,
+ VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS,
+ -1, VMCI_VERSION, NULL);
+ if (IS_ERR(host_context)) {
+ error = PTR_ERR(host_context);
+ pr_warn("Failed to initialize VMCIContext (error%d).\n",
+ error);
+ return error;
+ }
+
+ error = misc_register(&vmci_host_miscdev);
+ if (error) {
+ pr_warn("Module registration error (name=%s, major=%d, minor=%d, err=%d).\n",
+ vmci_host_miscdev.name,
+ MISC_MAJOR, vmci_host_miscdev.minor,
+ error);
+ pr_warn("Unable to initialize host personality\n");
+ vmci_ctx_destroy(host_context);
+ return error;
+ }
+
+ pr_info("VMCI host device registered (name=%s, major=%d, minor=%d).\n",
+ vmci_host_miscdev.name, MISC_MAJOR, vmci_host_miscdev.minor);
+
+ vmci_host_device_initialized = true;
+ return 0;
+}
+
+void __exit vmci_host_exit(void)
+{
+ int error;
+
+ vmci_host_device_initialized = false;
+
+ error = misc_deregister(&vmci_host_miscdev);
+ if (error)
+ pr_warn("Error unregistering character device: %d\n", error);
+
+ vmci_ctx_destroy(host_context);
+ vmci_qp_broker_exit();
+
+ pr_debug("VMCI host driver module unloaded\n");
+}
VMCI head config patch Adds all the necessary files to enable building of the VMCI
module with the Linux Makefiles and Kconfig systems. Also adds the header files used
for building modules against the driver.
Signed-off-by: George Zhang <[email protected]>
---
drivers/misc/Kconfig | 1
drivers/misc/Makefile | 2
drivers/misc/vmw_vmci/Kconfig | 16 +
drivers/misc/vmw_vmci/Makefile | 4
drivers/misc/vmw_vmci/vmci_common_int.h | 34 +
include/linux/vmw_vmci_api.h | 82 +++
include/linux/vmw_vmci_defs.h | 973 +++++++++++++++++++++++++++++++
7 files changed, 1112 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/vmw_vmci/Kconfig
create mode 100644 drivers/misc/vmw_vmci/Makefile
create mode 100644 drivers/misc/vmw_vmci/vmci_common_int.h
create mode 100644 include/linux/vmw_vmci_api.h
create mode 100644 include/linux/vmw_vmci_defs.h
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 2661f6e..fe38c7a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -517,4 +517,5 @@ source "drivers/misc/lis3lv02d/Kconfig"
source "drivers/misc/carma/Kconfig"
source "drivers/misc/altera-stapl/Kconfig"
source "drivers/misc/mei/Kconfig"
+source "drivers/misc/vmw_vmci/Kconfig"
endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 456972f..21ed953 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -51,3 +51,5 @@ obj-y += carma/
obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o
obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/
obj-$(CONFIG_INTEL_MEI) += mei/
+obj-$(CONFIG_MAX8997_MUIC) += max8997-muic.o
+obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/
diff --git a/drivers/misc/vmw_vmci/Kconfig b/drivers/misc/vmw_vmci/Kconfig
new file mode 100644
index 0000000..55015e7
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Kconfig
@@ -0,0 +1,16 @@
+#
+# VMware VMCI device
+#
+
+config VMWARE_VMCI
+ tristate "VMware VMCI Driver"
+ depends on X86
+ help
+ This is VMware's Virtual Machine Communication Interface. It enables
+ high-speed communication between host and guest in a virtual
+ environment via the VMCI virtual device.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called vmw_vmci.
diff --git a/drivers/misc/vmw_vmci/Makefile b/drivers/misc/vmw_vmci/Makefile
new file mode 100644
index 0000000..4da9893
--- /dev/null
+++ b/drivers/misc/vmw_vmci/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o
+vmw_vmci-y += vmci_context.o vmci_datagram.o vmci_doorbell.o \
+ vmci_driver.o vmci_event.o vmci_guest.o vmci_handle_array.o \
+ vmci_host.o vmci_queue_pair.o vmci_resource.o vmci_route.o
diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h b/drivers/misc/vmw_vmci/vmci_common_int.h
new file mode 100644
index 0000000..77667ec
--- /dev/null
+++ b/drivers/misc/vmw_vmci/vmci_common_int.h
@@ -0,0 +1,34 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMCI_COMMONINT_H_
+#define _VMCI_COMMONINT_H_
+
+#include <linux/printk.h>
+
+#define ASSERT(cond) BUG_ON(!(cond))
+
+#define PCI_VENDOR_ID_VMWARE 0x15AD
+#define PCI_DEVICE_ID_VMWARE_VMCI 0x0740
+#define VMCI_DRIVER_VERSION_STRING "1.0.0.0-k"
+#define MODULE_NAME "vmw_vmci"
+
+/* Print magic... whee! */
+#ifdef pr_fmt
+#undef pr_fmt
+#define pr_fmt(fmt) MODULE_NAME ": " fmt
+#endif
+
+#endif /* _VMCI_COMMONINT_H_ */
diff --git a/include/linux/vmw_vmci_api.h b/include/linux/vmw_vmci_api.h
new file mode 100644
index 0000000..193129d
--- /dev/null
+++ b/include/linux/vmw_vmci_api.h
@@ -0,0 +1,82 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef __VMW_VMCI_API_H__
+#define __VMW_VMCI_API_H__
+
+#include <linux/uidgid.h>
+#include <linux/vmw_vmci_defs.h>
+
+#undef VMCI_KERNEL_API_VERSION
+#define VMCI_KERNEL_API_VERSION_1 1
+#define VMCI_KERNEL_API_VERSION_2 2
+#define VMCI_KERNEL_API_VERSION VMCI_KERNEL_API_VERSION_2
+
+typedef void (vmci_device_shutdown_fn) (void *device_registration,
+ void *user_data);
+
+int vmci_datagram_create_handle(u32 resource_id, u32 flags,
+ vmci_datagram_recv_cb recv_cb,
+ void *client_data,
+ struct vmci_handle *out_handle);
+int vmci_datagram_create_handle_priv(u32 resource_id, u32 flags, u32 priv_flags,
+ vmci_datagram_recv_cb recv_cb,
+ void *client_data,
+ struct vmci_handle *out_handle);
+int vmci_datagram_destroy_handle(struct vmci_handle handle);
+int vmci_datagram_send(struct vmci_datagram *msg);
+int vmci_doorbell_create(struct vmci_handle *handle, u32 flags,
+ u32 priv_flags,
+ vmci_callback notify_cb, void *client_data);
+int vmci_doorbell_destroy(struct vmci_handle handle);
+int vmci_doorbell_notify(struct vmci_handle handle, u32 priv_flags);
+u32 vmci_get_contextid(void);
+bool vmci_is_context_owner(u32 context_id, kuid_t uid);
+
+int vmci_event_subscribe(u32 event,
+ vmci_event_cb callback, void *callback_data,
+ u32 *subid);
+int vmci_event_unsubscribe(u32 subid);
+u32 vmci_context_get_priv_flags(u32 context_id);
+int vmci_qpair_alloc(struct vmci_qp **qpair,
+ struct vmci_handle *handle,
+ u64 produce_qsize,
+ u64 consume_qsize,
+ u32 peer, u32 flags, u32 priv_flags);
+int vmci_qpair_detach(struct vmci_qp **qpair);
+int vmci_qpair_get_produce_indexes(const struct vmci_qp *qpair,
+ u64 *producer_tail,
+ u64 *consumer_head);
+int vmci_qpair_get_consume_indexes(const struct vmci_qp *qpair,
+ u64 *consumer_tail,
+ u64 *producer_head);
+s64 vmci_qpair_produce_free_space(const struct vmci_qp *qpair);
+s64 vmci_qpair_produce_buf_ready(const struct vmci_qp *qpair);
+s64 vmci_qpair_consume_free_space(const struct vmci_qp *qpair);
+s64 vmci_qpair_consume_buf_ready(const struct vmci_qp *qpair);
+ssize_t vmci_qpair_enqueue(struct vmci_qp *qpair,
+ const void *buf, size_t buf_size, int mode);
+ssize_t vmci_qpair_dequeue(struct vmci_qp *qpair,
+ void *buf, size_t buf_size, int mode);
+ssize_t vmci_qpair_peek(struct vmci_qp *qpair, void *buf, size_t buf_size,
+ int mode);
+ssize_t vmci_qpair_enquev(struct vmci_qp *qpair,
+ void *iov, size_t iov_size, int mode);
+ssize_t vmci_qpair_dequev(struct vmci_qp *qpair,
+ void *iov, size_t iov_size, int mode);
+ssize_t vmci_qpair_peekv(struct vmci_qp *qpair, void *iov, size_t iov_size,
+ int mode);
+
+#endif /* !__VMW_VMCI_API_H__ */
diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h
new file mode 100644
index 0000000..40526e8
--- /dev/null
+++ b/include/linux/vmw_vmci_defs.h
@@ -0,0 +1,973 @@
+/*
+ * VMware VMCI Driver
+ *
+ * Copyright (C) 2012 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#ifndef _VMW_VMCI_DEF_H_
+#define _VMW_VMCI_DEF_H_
+
+#include <linux/atomic.h>
+
+/* Register offsets. */
+#define VMCI_STATUS_ADDR 0x00
+#define VMCI_CONTROL_ADDR 0x04
+#define VMCI_ICR_ADDR 0x08
+#define VMCI_IMR_ADDR 0x0c
+#define VMCI_DATA_OUT_ADDR 0x10
+#define VMCI_DATA_IN_ADDR 0x14
+#define VMCI_CAPS_ADDR 0x18
+#define VMCI_RESULT_LOW_ADDR 0x1c
+#define VMCI_RESULT_HIGH_ADDR 0x20
+
+/* Max number of devices. */
+#define VMCI_MAX_DEVICES 1
+
+/* Status register bits. */
+#define VMCI_STATUS_INT_ON 0x1
+
+/* Control register bits. */
+#define VMCI_CONTROL_RESET 0x1
+#define VMCI_CONTROL_INT_ENABLE 0x2
+#define VMCI_CONTROL_INT_DISABLE 0x4
+
+/* Capabilities register bits. */
+#define VMCI_CAPS_HYPERCALL 0x1
+#define VMCI_CAPS_GUESTCALL 0x2
+#define VMCI_CAPS_DATAGRAM 0x4
+#define VMCI_CAPS_NOTIFICATIONS 0x8
+
+/* Interrupt Cause register bits. */
+#define VMCI_ICR_DATAGRAM 0x1
+#define VMCI_ICR_NOTIFICATION 0x2
+
+/* Interrupt Mask register bits. */
+#define VMCI_IMR_DATAGRAM 0x1
+#define VMCI_IMR_NOTIFICATION 0x2
+
+/* Interrupt type. */
+enum {
+ VMCI_INTR_TYPE_INTX = 0,
+ VMCI_INTR_TYPE_MSI = 1,
+ VMCI_INTR_TYPE_MSIX = 2,
+};
+
+/* Maximum MSI/MSI-X interrupt vectors in the device. */
+#define VMCI_MAX_INTRS 2
+
+/*
+ * Supported interrupt vectors. There is one for each ICR value above,
+ * but here they indicate the position in the vector array/message ID.
+ */
+enum {
+ VMCI_INTR_DATAGRAM = 0,
+ VMCI_INTR_NOTIFICATION = 1,
+};
+
+/*
+ * A single VMCI device has an upper limit of 128MB on the amount of
+ * memory that can be used for queue pairs.
+ */
+#define VMCI_MAX_GUEST_QP_MEMORY (128 * 1024 * 1024)
+
+/*
+ * Queues with pre-mapped data pages must be small, so that we don't pin
+ * too much kernel memory (especially on vmkernel). We limit a queuepair to
+ * 32 KB, or 16 KB per queue for symmetrical pairs.
+ */
+#define VMCI_MAX_PINNED_QP_MEMORY (32 * 1024)
+
+/*
+ * We have a fixed set of resource IDs available in the VMX.
+ * This allows us to have a very simple implementation since we statically
+ * know how many will create datagram handles. If a new caller arrives and
+ * we have run out of slots we can manually increment the maximum size of
+ * available resource IDs.
+ *
+ * VMCI reserved hypervisor datagram resource IDs.
+ */
+enum {
+ VMCI_RESOURCES_QUERY = 0,
+ VMCI_GET_CONTEXT_ID = 1,
+ VMCI_SET_NOTIFY_BITMAP = 2,
+ VMCI_DOORBELL_LINK = 3,
+ VMCI_DOORBELL_UNLINK = 4,
+ VMCI_DOORBELL_NOTIFY = 5,
+ /*
+ * VMCI_DATAGRAM_REQUEST_MAP and VMCI_DATAGRAM_REMOVE_MAP are
+ * obsoleted by the removal of VM to VM communication.
+ */
+ VMCI_DATAGRAM_REQUEST_MAP = 6,
+ VMCI_DATAGRAM_REMOVE_MAP = 7,
+ VMCI_EVENT_SUBSCRIBE = 8,
+ VMCI_EVENT_UNSUBSCRIBE = 9,
+ VMCI_QUEUEPAIR_ALLOC = 10,
+ VMCI_QUEUEPAIR_DETACH = 11,
+
+ /*
+ * VMCI_VSOCK_VMX_LOOKUP was assigned to 12 for Fusion 3.0/3.1,
+ * WS 7.0/7.1 and ESX 4.1
+ */
+ VMCI_HGFS_TRANSPORT = 13,
+ VMCI_UNITY_PBRPC_REGISTER = 14,
+ VMCI_RPC_PRIVILEGED = 15,
+ VMCI_RPC_UNPRIVILEGED = 16,
+ VMCI_RESOURCE_MAX = 17,
+};
+
+/*
+ * struct vmci_handle - Ownership information structure
+ * @context: The VMX context ID.
+ * @resource: The resource ID (used for locating in resource hash).
+ *
+ * The vmci_handle structure is used to track resources used within
+ * vmw_vmci.
+ */
+struct vmci_handle {
+ u32 context;
+ u32 resource;
+};
+
+typedef u32 vmci_id;
+typedef u32 vmci_privilege_flags;
+typedef u32 vmci_event;
+
+static inline struct vmci_handle VMCI_MAKE_HANDLE(vmci_id cid, vmci_id rid)
+{
+ struct vmci_handle h;
+ h.context = cid;
+ h.resource = rid;
+ return h;
+}
+
+#define VMCI_HANDLE_TO_CONTEXT_ID(_handle) ((_handle).context)
+#define VMCI_HANDLE_TO_RESOURCE_ID(_handle) ((_handle).resource)
+#define VMCI_HANDLE_EQUAL(_h1, _h2) ((_h1).context == (_h2).context && \
+ (_h1).resource == (_h2).resource)
+
+#define VMCI_INVALID_ID ~0
+static const struct vmci_handle VMCI_INVALID_HANDLE = { VMCI_INVALID_ID,
+ VMCI_INVALID_ID
+};
+
+#define VMCI_HANDLE_INVALID(_handle) \
+ VMCI_HANDLE_EQUAL((_handle), VMCI_INVALID_HANDLE)
+
+/*
+ * The below defines can be used to send anonymous requests.
+ * This also indicates that no response is expected.
+ */
+#define VMCI_ANON_SRC_CONTEXT_ID VMCI_INVALID_ID
+#define VMCI_ANON_SRC_RESOURCE_ID VMCI_INVALID_ID
+#define VMCI_ANON_SRC_HANDLE vmci_make_handle(VMCI_ANON_SRC_CONTEXT_ID, \
+ VMCI_ANON_SRC_RESOURCE_ID)
+
+/* The lowest 16 context ids are reserved for internal use. */
+#define VMCI_RESERVED_CID_LIMIT ((u32) 16)
+
+/*
+ * Hypervisor context id, used for calling into hypervisor
+ * supplied services from the VM.
+ */
+#define VMCI_HYPERVISOR_CONTEXT_ID 0
+
+/*
+ * Well-known context id, a logical context that contains a set of
+ * well-known services. This context ID is now obsolete.
+ */
+#define VMCI_WELL_KNOWN_CONTEXT_ID 1
+
+/*
+ * Context ID used by host endpoints.
+ */
+#define VMCI_HOST_CONTEXT_ID 2
+
+#define VMCI_CONTEXT_IS_VM(_cid) (VMCI_INVALID_ID != (_cid) && \
+ (_cid) > VMCI_HOST_CONTEXT_ID)
+
+/*
+ * The VMCI_CONTEXT_RESOURCE_ID is used together with vmci_make_handle to make
+ * handles that refer to a specific context.
+ */
+#define VMCI_CONTEXT_RESOURCE_ID 0
+
+/*
+ * VMCI error codes.
+ */
+enum {
+ VMCI_SUCCESS_QUEUEPAIR_ATTACH = 5,
+ VMCI_SUCCESS_QUEUEPAIR_CREATE = 4,
+ VMCI_SUCCESS_LAST_DETACH = 3,
+ VMCI_SUCCESS_ACCESS_GRANTED = 2,
+ VMCI_SUCCESS_ENTRY_DEAD = 1,
+ VMCI_SUCCESS = 0,
+ VMCI_ERROR_INVALID_RESOURCE = (-1),
+ VMCI_ERROR_INVALID_ARGS = (-2),
+ VMCI_ERROR_NO_MEM = (-3),
+ VMCI_ERROR_DATAGRAM_FAILED = (-4),
+ VMCI_ERROR_MORE_DATA = (-5),
+ VMCI_ERROR_NO_MORE_DATAGRAMS = (-6),
+ VMCI_ERROR_NO_ACCESS = (-7),
+ VMCI_ERROR_NO_HANDLE = (-8),
+ VMCI_ERROR_DUPLICATE_ENTRY = (-9),
+ VMCI_ERROR_DST_UNREACHABLE = (-10),
+ VMCI_ERROR_PAYLOAD_TOO_LARGE = (-11),
+ VMCI_ERROR_INVALID_PRIV = (-12),
+ VMCI_ERROR_GENERIC = (-13),
+ VMCI_ERROR_PAGE_ALREADY_SHARED = (-14),
+ VMCI_ERROR_CANNOT_SHARE_PAGE = (-15),
+ VMCI_ERROR_CANNOT_UNSHARE_PAGE = (-16),
+ VMCI_ERROR_NO_PROCESS = (-17),
+ VMCI_ERROR_NO_DATAGRAM = (-18),
+ VMCI_ERROR_NO_RESOURCES = (-19),
+ VMCI_ERROR_UNAVAILABLE = (-20),
+ VMCI_ERROR_NOT_FOUND = (-21),
+ VMCI_ERROR_ALREADY_EXISTS = (-22),
+ VMCI_ERROR_NOT_PAGE_ALIGNED = (-23),
+ VMCI_ERROR_INVALID_SIZE = (-24),
+ VMCI_ERROR_REGION_ALREADY_SHARED = (-25),
+ VMCI_ERROR_TIMEOUT = (-26),
+ VMCI_ERROR_DATAGRAM_INCOMPLETE = (-27),
+ VMCI_ERROR_INCORRECT_IRQL = (-28),
+ VMCI_ERROR_EVENT_UNKNOWN = (-29),
+ VMCI_ERROR_OBSOLETE = (-30),
+ VMCI_ERROR_QUEUEPAIR_MISMATCH = (-31),
+ VMCI_ERROR_QUEUEPAIR_NOTSET = (-32),
+ VMCI_ERROR_QUEUEPAIR_NOTOWNER = (-33),
+ VMCI_ERROR_QUEUEPAIR_NOTATTACHED = (-34),
+ VMCI_ERROR_QUEUEPAIR_NOSPACE = (-35),
+ VMCI_ERROR_QUEUEPAIR_NODATA = (-36),
+ VMCI_ERROR_BUSMEM_INVALIDATION = (-37),
+ VMCI_ERROR_MODULE_NOT_LOADED = (-38),
+ VMCI_ERROR_DEVICE_NOT_FOUND = (-39),
+ VMCI_ERROR_QUEUEPAIR_NOT_READY = (-40),
+ VMCI_ERROR_WOULD_BLOCK = (-41),
+
+ /* VMCI clients should return error code within this range */
+ VMCI_ERROR_CLIENT_MIN = (-500),
+ VMCI_ERROR_CLIENT_MAX = (-550),
+
+ /* Internal error codes. */
+ VMCI_SHAREDMEM_ERROR_BAD_CONTEXT = (-1000),
+};
+
+/* VMCI reserved events. */
+enum {
+ /* Only applicable to guest endpoints */
+ VMCI_EVENT_CTX_ID_UPDATE = 0,
+
+ /* Applicable to guest and host */
+ VMCI_EVENT_CTX_REMOVED = 1,
+
+ /* Only applicable to guest endpoints */
+ VMCI_EVENT_QP_RESUMED = 2,
+
+ /* Applicable to guest and host */
+ VMCI_EVENT_QP_PEER_ATTACH = 3,
+
+ /* Applicable to guest and host */
+ VMCI_EVENT_QP_PEER_DETACH = 4,
+
+ /*
+ * Applicable to VMX and vmk. On vmk,
+ * this event has the Context payload type.
+ */
+ VMCI_EVENT_MEM_ACCESS_ON = 5,
+
+ /*
+ * Applicable to VMX and vmk. Same as
+ * above for the payload type.
+ */
+ VMCI_EVENT_MEM_ACCESS_OFF = 6,
+ VMCI_EVENT_MAX = 7,
+};
+
+/*
+ * Of the above events, a few are reserved for use in the VMX, and
+ * other endpoints (guest and host kernel) should not use them. For
+ * the rest of the events, we allow both host and guest endpoints to
+ * subscribe to them, to maintain the same API for host and guest
+ * endpoints.
+ */
+#define VMCI_EVENT_VALID_VMX(_event) ((_event) == VMCI_EVENT_MEM_ACCESS_ON || \
+ (_event) == VMCI_EVENT_MEM_ACCESS_OFF)
+
+#define VMCI_EVENT_VALID(_event) ((_event) < VMCI_EVENT_MAX && \
+ !VMCI_EVENT_VALID_VMX(_event))
+
+/* Reserved guest datagram resource ids. */
+#define VMCI_EVENT_HANDLER 0
+
+/*
+ * VMCI coarse-grained privileges (per context or host
+ * process/endpoint. An entity with the restricted flag is only
+ * allowed to interact with the hypervisor and trusted entities.
+ */
+enum {
+ VMCI_NO_PRIVILEGE_FLAGS = 0,
+ VMCI_PRIVILEGE_FLAG_RESTRICTED = 1,
+ VMCI_PRIVILEGE_FLAG_TRUSTED = 2,
+ VMCI_PRIVILEGE_ALL_FLAGS = (VMCI_PRIVILEGE_FLAG_RESTRICTED |
+ VMCI_PRIVILEGE_FLAG_TRUSTED),
+ VMCI_DEFAULT_PROC_PRIVILEGE_FLAGS = VMCI_NO_PRIVILEGE_FLAGS,
+ VMCI_LEAST_PRIVILEGE_FLAGS = VMCI_PRIVILEGE_FLAG_RESTRICTED,
+ VMCI_MAX_PRIVILEGE_FLAGS = VMCI_PRIVILEGE_FLAG_TRUSTED,
+};
+
+/* 0 through VMCI_RESERVED_RESOURCE_ID_MAX are reserved. */
+#define VMCI_RESERVED_RESOURCE_ID_MAX 1023
+
+/*
+ * Driver version.
+ *
+ * Increment major version when you make an incompatible change.
+ * Compatibility goes both ways (old driver with new executable
+ * as well as new driver with old executable).
+ */
+
+/* Never change VMCI_VERSION_SHIFT_WIDTH */
+#define VMCI_VERSION_SHIFT_WIDTH 16
+#define VMCI_MAKE_VERSION(_major, _minor) \
+ ((_major) << VMCI_VERSION_SHIFT_WIDTH | (u16) (_minor))
+
+#define VMCI_VERSION_MAJOR(v) ((u32) (v) >> VMCI_VERSION_SHIFT_WIDTH)
+#define VMCI_VERSION_MINOR(v) ((u16) (v))
+
+/*
+ * VMCI_VERSION is always the current version. Subsequently listed
+ * versions are ways of detecting previous versions of the connecting
+ * application (i.e., VMX).
+ *
+ * VMCI_VERSION_NOVMVM: This version removed support for VM to VM
+ * communication.
+ *
+ * VMCI_VERSION_NOTIFY: This version introduced doorbell notification
+ * support.
+ *
+ * VMCI_VERSION_HOSTQP: This version introduced host end point support
+ * for hosted products.
+ *
+ * VMCI_VERSION_PREHOSTQP: This is the version prior to the adoption of
+ * support for host end-points.
+ *
+ * VMCI_VERSION_PREVERS2: This fictional version number is intended to
+ * represent the version of a VMX which doesn't call into the driver
+ * with ioctl VERSION2 and thus doesn't establish its version with the
+ * driver.
+ */
+
+#define VMCI_VERSION VMCI_VERSION_NOVMVM
+#define VMCI_VERSION_NOVMVM VMCI_MAKE_VERSION(11, 0)
+#define VMCI_VERSION_NOTIFY VMCI_MAKE_VERSION(10, 0)
+#define VMCI_VERSION_HOSTQP VMCI_MAKE_VERSION(9, 0)
+#define VMCI_VERSION_PREHOSTQP VMCI_MAKE_VERSION(8, 0)
+#define VMCI_VERSION_PREVERS2 VMCI_MAKE_VERSION(1, 0)
+
+#define VMCI_SOCKETS_MAKE_VERSION(_p) \
+ ((((_p)[0] & 0xFF) << 24) | (((_p)[1] & 0xFF) << 16) | ((_p)[2]))
+
+/*
+ * Linux defines _IO* macros, but the core kernel code ignore the encoded
+ * ioctl value. It is up to individual drivers to decode the value (for
+ * example to look at the size of a structure to determine which version
+ * of a specific command should be used) or not (which is what we
+ * currently do, so right now the ioctl value for a given command is the
+ * command itself).
+ *
+ * Hence, we just define the IOCTL_VMCI_foo values directly, with no
+ * intermediate IOCTLCMD_ representation.
+ */
+#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
+
+enum {
+ /*
+ * We need to bracket the range of values used for ioctls,
+ * because x86_64 Linux forces us to explicitly register ioctl
+ * handlers by value for handling 32 bit ioctl syscalls.
+ * Hence FIRST and LAST. Pick something for FIRST that
+ * doesn't collide with vmmon (2001+).
+ */
+ IOCTLCMD(FIRST) = 1951,
+ IOCTLCMD(VERSION) = IOCTLCMD(FIRST),
+
+ /* BEGIN VMCI */
+ IOCTLCMD(INIT_CONTEXT),
+
+ /*
+ * The following two were used for process and datagram
+ * process creation. They are not used anymore and reserved
+ * for future use. They will fail if issued.
+ */
+ IOCTLCMD(RESERVED1),
+ IOCTLCMD(RESERVED2),
+
+ /*
+ * The following used to be for shared memory. It is now
+ * unused and and is reserved for future use. It will fail if
+ * issued.
+ */
+ IOCTLCMD(RESERVED3),
+
+ /*
+ * The follwoing three were also used to be for shared
+ * memory. An old WS6 user-mode client might try to use them
+ * with the new driver, but since we ensure that only contexts
+ * created by VMX'en of the appropriate version
+ * (VMCI_VERSION_NOTIFY or VMCI_VERSION_NEWQP) or higher use
+ * these ioctl, everything is fine.
+ */
+ IOCTLCMD(QUEUEPAIR_SETVA),
+ IOCTLCMD(NOTIFY_RESOURCE),
+ IOCTLCMD(NOTIFICATIONS_RECEIVE),
+ IOCTLCMD(VERSION2),
+ IOCTLCMD(QUEUEPAIR_ALLOC),
+ IOCTLCMD(QUEUEPAIR_SETPAGEFILE),
+ IOCTLCMD(QUEUEPAIR_DETACH),
+ IOCTLCMD(DATAGRAM_SEND),
+ IOCTLCMD(DATAGRAM_RECEIVE),
+ IOCTLCMD(DATAGRAM_REQUEST_MAP),
+ IOCTLCMD(DATAGRAM_REMOVE_MAP),
+ IOCTLCMD(CTX_ADD_NOTIFICATION),
+ IOCTLCMD(CTX_REMOVE_NOTIFICATION),
+ IOCTLCMD(CTX_GET_CPT_STATE),
+ IOCTLCMD(CTX_SET_CPT_STATE),
+ IOCTLCMD(GET_CONTEXT_ID),
+ IOCTLCMD(LAST),
+ /* END VMCI */
+
+ /*
+ * VMCI Socket IOCTLS are defined next and go from
+ * IOCTLCMD(LAST) (1972) to 1990. VMware reserves a range of
+ * 4 ioctls for VMCI Sockets to grow. We cannot reserve many
+ * ioctls here since we are close to overlapping with vmmon
+ * ioctls (2001+). Define a meta-ioctl if running out of this
+ * binary space.
+ */
+ IOCTLCMD(SOCKETS_FIRST) = IOCTLCMD(LAST),
+ IOCTLCMD(SOCKETS_VERSION) = IOCTLCMD(SOCKETS_FIRST),
+ IOCTLCMD(SOCKETS_BIND),
+ IOCTLCMD(SOCKETS_SET_SYMBOLS),
+ IOCTLCMD(SOCKETS_CONNECT),
+ /*
+ * The next two values are public (vmci_sockets.h) and cannot be
+ * changed. That means the number of values above these cannot be
+ * changed either unless the base index (specified below) is updated
+ * accordingly.
+ */
+ IOCTLCMD(SOCKETS_GET_AF_VALUE),
+ IOCTLCMD(SOCKETS_GET_LOCAL_CID),
+ IOCTLCMD(SOCKETS_GET_SOCK_NAME),
+ IOCTLCMD(SOCKETS_GET_SOCK_OPT),
+ IOCTLCMD(SOCKETS_GET_VM_BY_NAME),
+ IOCTLCMD(SOCKETS_IOCTL),
+ IOCTLCMD(SOCKETS_LISTEN),
+ IOCTLCMD(SOCKETS_RECV),
+ IOCTLCMD(SOCKETS_RECV_FROM),
+ IOCTLCMD(SOCKETS_SELECT),
+ IOCTLCMD(SOCKETS_SEND),
+ IOCTLCMD(SOCKETS_SEND_TO),
+ IOCTLCMD(SOCKETS_SET_SOCK_OPT),
+ IOCTLCMD(SOCKETS_SHUTDOWN),
+ IOCTLCMD(SOCKETS_SOCKET), /* 1990 on Linux. */
+
+ IOCTLCMD(SOCKETS_LAST) = 1994, /* 1994 on Linux. */
+
+ /*
+ * The VSockets ioctls occupy the block above. We define a
+ * new range of VMCI ioctls to maintain binary compatibility
+ * between the user land and the kernel driver. Careful,
+ * vmmon ioctls start from 2001, so this means we can add only
+ * 4 new VMCI ioctls. Define a meta-ioctl if running out of
+ * this binary space.
+ */
+ IOCTLCMD(FIRST2),
+ IOCTLCMD(SET_NOTIFY) = IOCTLCMD(FIRST2), /* 1995 on Linux. */
+ IOCTLCMD(LAST2),
+};
+
+/* Clean up helper macros */
+#undef IOCTLCMD
+
+/*
+ * struct vmci_queue_header - VMCI Queue Header information.
+ *
+ * A Queue cannot stand by itself as designed. Each Queue's header
+ * contains a pointer into itself (the producer_tail) and into its peer
+ * (consumer_head). The reason for the separation is one of
+ * accessibility: Each end-point can modify two things: where the next
+ * location to enqueue is within its produce_q (producer_tail); and
+ * where the next dequeue location is in its consume_q (consumer_head).
+ *
+ * An end-point cannot modify the pointers of its peer (guest to
+ * guest; NOTE that in the host both queue headers are mapped r/w).
+ * But, each end-point needs read access to both Queue header
+ * structures in order to determine how much space is used (or left)
+ * in the Queue. This is because for an end-point to know how full
+ * its produce_q is, it needs to use the consumer_head that points into
+ * the produce_q but -that- consumer_head is in the Queue header for
+ * that end-points consume_q.
+ *
+ * Thoroughly confused? Sorry.
+ *
+ * producer_tail: the point to enqueue new entrants. When you approach
+ * a line in a store, for example, you walk up to the tail.
+ *
+ * consumer_head: the point in the queue from which the next element is
+ * dequeued. In other words, who is next in line is he who is at the
+ * head of the line.
+ *
+ * Also, producer_tail points to an empty byte in the Queue, whereas
+ * consumer_head points to a valid byte of data (unless producer_tail ==
+ * consumer_head in which case consumer_head does not point to a valid
+ * byte of data).
+ *
+ * For a queue of buffer 'size' bytes, the tail and head pointers will be in
+ * the range [0, size-1].
+ *
+ * If produce_q_header->producer_tail == consume_q_header->consumer_head
+ * then the produce_q is empty.
+ */
+struct vmci_queue_header {
+ /* All fields are 64bit and aligned. */
+ struct vmci_handle handle; /* Identifier. */
+ atomic64_t producer_tail; /* Offset in this queue. */
+ atomic64_t consumer_head; /* Offset in peer queue. */
+};
+
+/*
+ * struct vmci_datagram - Base struct for vmci datagrams.
+ * @dst: A vmci_handle that tracks the destination of the datagram.
+ * @src: A vmci_handle that tracks the source of the datagram.
+ * @payload_size: The size of the payload.
+ *
+ * vmci_datagram structs are used when sending vmci datagrams. They include
+ * the necessary source and destination information to properly route
+ * the information along with the size of the package.
+ */
+struct vmci_datagram {
+ struct vmci_handle dst;
+ struct vmci_handle src;
+ u64 payload_size;
+};
+
+/*
+ * Second flag is for creating a well-known handle instead of a per context
+ * handle. Next flag is for deferring datagram delivery, so that the
+ * datagram callback is invoked in a delayed context (not interrupt context).
+ */
+#define VMCI_FLAG_DG_NONE 0
+#define VMCI_FLAG_WELLKNOWN_DG_HND 0x1
+#define VMCI_FLAG_ANYCID_DG_HND 0x2
+#define VMCI_FLAG_DG_DELAYED_CB 0x4
+
+/*
+ * Maximum supported size of a VMCI datagram for routable datagrams.
+ * Datagrams going to the hypervisor are allowed to be larger.
+ */
+#define VMCI_MAX_DG_SIZE (17 * 4096)
+#define VMCI_MAX_DG_PAYLOAD_SIZE (VMCI_MAX_DG_SIZE - \
+ sizeof(struct vmci_datagram))
+#define VMCI_DG_PAYLOAD(_dg) (void *)((char *)(_dg) + \
+ sizeof(struct vmci_datagram))
+#define VMCI_DG_HEADERSIZE sizeof(struct vmci_datagram)
+#define VMCI_DG_SIZE(_dg) (VMCI_DG_HEADERSIZE + (size_t)(_dg)->payload_size)
+#define VMCI_DG_SIZE_ALIGNED(_dg) ((VMCI_DG_SIZE(_dg) + 7) & (~((size_t) 0x7)))
+#define VMCI_MAX_DATAGRAM_QUEUE_SIZE (VMCI_MAX_DG_SIZE * 2)
+
+struct vmci_event_payload_qp {
+ struct vmci_handle handle; /* queue_pair handle. */
+ u32 peer_id; /* Context id of attaching/detaching VM. */
+ u32 _pad;
+};
+
+/* Flags for VMCI queue_pair API. */
+enum {
+ /* Fail alloc if QP not created by peer. */
+ VMCI_QPFLAG_ATTACH_ONLY = 1 << 0,
+
+ /* Only allow attaches from local context. */
+ VMCI_QPFLAG_LOCAL = 1 << 1,
+
+ /* Host won't block when guest is quiesced. */
+ VMCI_QPFLAG_NONBLOCK = 1 << 2,
+
+ /* Pin data pages in ESX. Used with NONBLOCK */
+ VMCI_QPFLAG_PINNED = 1 << 3,
+
+ /* Update the following flag when adding new flags. */
+ VMCI_QP_ALL_FLAGS = (VMCI_QPFLAG_ATTACH_ONLY | VMCI_QPFLAG_LOCAL |
+ VMCI_QPFLAG_NONBLOCK | VMCI_QPFLAG_PINNED),
+
+ /* Convenience flags */
+ VMCI_QP_ASYMM = (VMCI_QPFLAG_NONBLOCK | VMCI_QPFLAG_PINNED),
+ VMCI_QP_ASYMM_PEER = (VMCI_QPFLAG_ATTACH_ONLY | VMCI_QP_ASYMM),
+};
+
+/*
+ * We allow at least 1024 more event datagrams from the hypervisor past the
+ * normally allowed datagrams pending for a given context. We define this
+ * limit on event datagrams from the hypervisor to guard against DoS attack
+ * from a malicious VM which could repeatedly attach to and detach from a queue
+ * pair, causing events to be queued at the destination VM. However, the rate
+ * at which such events can be generated is small since it requires a VM exit
+ * and handling of queue pair attach/detach call at the hypervisor. Event
+ * datagrams may be queued up at the destination VM if it has interrupts
+ * disabled or if it is not draining events for some other reason. 1024
+ * datagrams is a grossly conservative estimate of the time for which
+ * interrupts may be disabled in the destination VM, but at the same time does
+ * not exacerbate the memory pressure problem on the host by much (size of each
+ * event datagram is small).
+ */
+#define VMCI_MAX_DATAGRAM_AND_EVENT_QUEUE_SIZE \
+ (VMCI_MAX_DATAGRAM_QUEUE_SIZE + \
+ 1024 * (sizeof(struct vmci_datagram) + \
+ sizeof(struct vmci_event_data_max)))
+
+/*
+ * Struct used for querying, via VMCI_RESOURCES_QUERY, the availability of
+ * hypervisor resources. Struct size is 16 bytes. All fields in struct are
+ * aligned to their natural alignment.
+ */
+struct vmci_resource_query_hdr {
+ struct vmci_datagram hdr;
+ u32 num_resources;
+ u32 _padding;
+};
+
+/*
+ * Convenience struct for negotiating vectors. Must match layout of
+ * VMCIResourceQueryHdr minus the struct vmci_datagram header.
+ */
+struct vmci_resource_query_msg {
+ u32 num_resources;
+ u32 _padding;
+ u32 resources[1];
+};
+
+/*
+ * The maximum number of resources that can be queried using
+ * VMCI_RESOURCE_QUERY is 31, as the result is encoded in the lower 31
+ * bits of a positive return value. Negative values are reserved for
+ * errors.
+ */
+#define VMCI_RESOURCE_QUERY_MAX_NUM 31
+
+/* Maximum size for the VMCI_RESOURCE_QUERY request. */
+#define VMCI_RESOURCE_QUERY_MAX_SIZE \
+ (sizeof(struct vmci_resource_query_hdr) + \
+ sizeof(u32) * VMCI_RESOURCE_QUERY_MAX_NUM)
+
+/*
+ * Struct used for setting the notification bitmap. All fields in
+ * struct are aligned to their natural alignment.
+ */
+struct vmci_notify_bm_set_msg {
+ struct vmci_datagram hdr;
+ u32 bitmap_ppn;
+ u32 _pad;
+};
+
+/*
+ * Struct used for linking a doorbell handle with an index in the
+ * notify bitmap. All fields in struct are aligned to their natural
+ * alignment.
+ */
+struct vmci_doorbell_link_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+ u64 notify_idx;
+};
+
+/*
+ * Struct used for unlinking a doorbell handle from an index in the
+ * notify bitmap. All fields in struct are aligned to their natural
+ * alignment.
+ */
+struct vmci_doorbell_unlink_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+};
+
+/*
+ * Struct used for generating a notification on a doorbell handle. All
+ * fields in struct are aligned to their natural alignment.
+ */
+struct vmci_doorbell_notify_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+};
+
+/*
+ * This struct is used to contain data for events. Size of this struct is a
+ * multiple of 8 bytes, and all fields are aligned to their natural alignment.
+ */
+struct vmci_event_data {
+ u32 event; /* 4 bytes. */
+ u32 _pad;
+ /* Event payload is put here. */
+};
+
+/*
+ * Define the different VMCI_EVENT payload data types here. All structs must
+ * be a multiple of 8 bytes, and fields must be aligned to their natural
+ * alignment.
+ */
+struct vmci_event_payld_ctx {
+ u32 context_id; /* 4 bytes. */
+ u32 _pad;
+};
+
+struct vmci_event_payld_qp {
+ struct vmci_handle handle; /* queue_pair handle. */
+ u32 peer_id; /* Context id of attaching/detaching VM. */
+ u32 _pad;
+};
+
+/*
+ * We define the following struct to get the size of the maximum event
+ * data the hypervisor may send to the guest. If adding a new event
+ * payload type above, add it to the following struct too (inside the
+ * union).
+ */
+struct vmci_event_data_max {
+ struct vmci_event_data event_data;
+ union {
+ struct vmci_event_payld_ctx context_payload;
+ struct vmci_event_payld_qp qp_payload;
+ } ev_data_payload;
+};
+
+/*
+ * Struct used for VMCI_EVENT_SUBSCRIBE/UNSUBSCRIBE and
+ * VMCI_EVENT_HANDLER messages. Struct size is 32 bytes. All fields
+ * in struct are aligned to their natural alignment.
+ */
+struct vmci_event_msg {
+ struct vmci_datagram hdr;
+
+ /* Has event type and payload. */
+ struct vmci_event_data event_data;
+
+ /* Payload gets put here. */
+};
+
+/*
+ * Structs used for queue_pair alloc and detach messages. We align fields of
+ * these structs to 64bit boundaries.
+ */
+struct vmci_qp_alloc_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+ u32 peer;
+ u32 flags;
+ u64 produce_size;
+ u64 consume_size;
+ u64 num_ppns;
+
+ /* List of PPNs placed here. */
+};
+
+struct vmci_qp_detach_msg {
+ struct vmci_datagram hdr;
+ struct vmci_handle handle;
+};
+
+/* VMCI Doorbell API. */
+#define VMCI_FLAG_DELAYED_CB 0x01
+
+typedef void (*vmci_callback) (void *client_data);
+
+/*
+ * struct vmci_qp - A vmw_vmci queue pair handle.
+ *
+ * This structure is used as a handle to a queue pair created by
+ * VMCI. It is intentionally left opaque to clients.
+ */
+struct vmci_qp;
+
+/* Callback needed for correctly waiting on events. */
+typedef int (*vmci_datagram_recv_cb) (void *client_data,
+ struct vmci_datagram *msg);
+
+/* VMCI Event API. */
+typedef void (*vmci_event_cb) (u32 sub_id, const struct vmci_event_data *ed,
+ void *client_data);
+
+/*
+ * We use the following inline function to access the payload data
+ * associated with an event data.
+ */
+static inline const void *
+vmci_event_data_const_payload(const struct vmci_event_data *ev_data)
+{
+ return (const char *)ev_data + sizeof(*ev_data);
+}
+
+static inline void *vmci_event_data_payload(struct vmci_event_data *ev_data)
+{
+ return (void *)vmci_event_data_const_payload(ev_data);
+}
+
+/*
+ * Helper to add a given offset to a head or tail pointer. Wraps the
+ * value of the pointer around the max size of the queue.
+ */
+static inline void vmci_qp_add_pointer(atomic64_t *var,
+ size_t add,
+ u64 size)
+{
+ u64 new_val = atomic64_read(var);
+
+ if (new_val >= size - add)
+ new_val -= size;
+
+ new_val += add;
+
+ atomic64_set(var, new_val);
+}
+
+/*
+ * Helper routine to get the Producer Tail from the supplied queue.
+ */
+static inline u64
+vmci_q_header_producer_tail(const struct vmci_queue_header *q_header)
+{
+ struct vmci_queue_header *qh = (struct vmci_queue_header *)q_header;
+ return atomic64_read(&qh->producer_tail);
+}
+
+/*
+ * Helper routine to get the Consumer Head from the supplied queue.
+ */
+static inline u64
+vmci_q_header_consumer_head(const struct vmci_queue_header *q_header)
+{
+ struct vmci_queue_header *qh = (struct vmci_queue_header *)q_header;
+ return atomic64_read(&qh->consumer_head);
+}
+
+/*
+ * Helper routine to increment the Producer Tail. Fundamentally,
+ * vmci_qp_add_pointer() is used to manipulate the tail itself.
+ */
+static inline void
+vmci_q_header_add_producer_tail(struct vmci_queue_header *q_header,
+ size_t add,
+ u64 queue_size)
+{
+ vmci_qp_add_pointer(&q_header->producer_tail, add, queue_size);
+}
+
+/*
+ * Helper routine to increment the Consumer Head. Fundamentally,
+ * vmci_qp_add_pointer() is used to manipulate the head itself.
+ */
+static inline void
+vmci_q_header_add_consumer_head(struct vmci_queue_header *q_header,
+ size_t add,
+ u64 queue_size)
+{
+ vmci_qp_add_pointer(&q_header->consumer_head, add, queue_size);
+}
+
+/*
+ * Helper routine for getting the head and the tail pointer for a queue.
+ * Both the VMCIQueues are needed to get both the pointers for one queue.
+ */
+static inline void
+vmci_q_header_get_pointers(const struct vmci_queue_header *produce_q_header,
+ const struct vmci_queue_header *consume_q_header,
+ u64 *producer_tail,
+ u64 *consumer_head)
+{
+ if (producer_tail)
+ *producer_tail = vmci_q_header_producer_tail(produce_q_header);
+
+ if (consumer_head)
+ *consumer_head = vmci_q_header_consumer_head(consume_q_header);
+}
+
+static inline void vmci_q_header_init(struct vmci_queue_header *q_header,
+ const struct vmci_handle handle)
+{
+ q_header->handle = handle;
+ atomic64_set(&q_header->producer_tail, 0);
+ atomic64_set(&q_header->consumer_head, 0);
+}
+
+/*
+ * Finds available free space in a produce queue to enqueue more
+ * data or reports an error if queue pair corruption is detected.
+ */
+static s64
+vmci_q_header_free_space(const struct vmci_queue_header *produce_q_header,
+ const struct vmci_queue_header *consume_q_header,
+ const u64 produce_q_size)
+{
+ u64 tail;
+ u64 head;
+ u64 free_space;
+
+ tail = vmci_q_header_producer_tail(produce_q_header);
+ head = vmci_q_header_consumer_head(consume_q_header);
+
+ if (tail >= produce_q_size || head >= produce_q_size)
+ return VMCI_ERROR_INVALID_SIZE;
+
+ /*
+ * Deduct 1 to avoid tail becoming equal to head which causes
+ * ambiguity. If head and tail are equal it means that the
+ * queue is empty.
+ */
+ if (tail >= head)
+ free_space = produce_q_size - (tail - head) - 1;
+ else
+ free_space = head - tail - 1;
+
+ return free_space;
+}
+
+/*
+ * vmci_q_header_free_space() does all the heavy lifting of
+ * determing the number of free bytes in a Queue. This routine,
+ * then subtracts that size from the full size of the Queue so
+ * the caller knows how many bytes are ready to be dequeued.
+ * Results:
+ * On success, available data size in bytes (up to MAX_INT64).
+ * On failure, appropriate error code.
+ */
+static inline s64
+vmci_q_header_buf_ready(const struct vmci_queue_header *consume_q_header,
+ const struct vmci_queue_header *produce_q_header,
+ const u64 consume_q_size)
+{
+ s64 free_space;
+
+ free_space = vmci_q_header_free_space(consume_q_header,
+ produce_q_header, consume_q_size);
+ if (free_space < VMCI_SUCCESS)
+ return free_space;
+
+ return consume_q_size - free_space - 1;
+}
+
+static inline struct vmci_handle vmci_make_handle(u32 cid, u32 rid)
+{
+ struct vmci_handle h;
+
+ h.context = cid;
+ h.resource = rid;
+
+ return h;
+}
+
+#endif /* _VMW_VMCI_DEF_H_ */
On Wed, Nov 07, 2012 at 10:41:26AM -0800, George Zhang wrote:
> +/*
> + * Create a datagram entry given a handle pointer.
> + */
> +static int dg_create_handle(u32 resource_id,
> + u32 flags,
> + u32 priv_flags,
> + vmci_datagram_recv_cb recv_cb,
> + void *client_data, struct vmci_handle *out_handle)
> +{
> + int result;
> + u32 context_id;
> + struct vmci_handle handle;
> + struct datagram_entry *entry;
> +
> + BUG_ON(!recv_cb);
> + BUG_ON(!out_handle);
> + BUG_ON(priv_flags & ~VMCI_PRIVILEGE_ALL_FLAGS);
Given that this is a static function, there's no need for these
"asserts", right? Please send a follow-on patch removing all BUG_ON()
calls from these files, it's not acceptable to crash a user's box from a
driver that is handling parameters _you_ are feeding it.
thanks,
greg k-h
On Wed, Nov 07, 2012 at 10:42:15AM -0800, George Zhang wrote:
> VMCI queue pairs allow for bi-directional ordered communication between host and guests.
You obviously didn't run checkpatch on this file :(
Please send me a follow-on patch to resolve the issues it found, before
someone else does...
thanks,
greg k-h
On Wed, Nov 07, 2012 at 10:43:03AM -0800, George Zhang wrote:
> +#ifndef _VMCI_COMMONINT_H_
> +#define _VMCI_COMMONINT_H_
> +
> +#include <linux/printk.h>
Why printk from a .h file?
> +
> +#define ASSERT(cond) BUG_ON(!(cond))
No, please no.
> +
> +#define PCI_VENDOR_ID_VMWARE 0x15AD
> +#define PCI_DEVICE_ID_VMWARE_VMCI 0x0740
Align these properly? Why do they need to be in a .h file?
> +#define VMCI_DRIVER_VERSION_STRING "1.0.0.0-k"
Is this needed at all anymore?
> +#define MODULE_NAME "vmw_vmci"
This isn't needed, use the built-in macro instead.
> +/* Print magic... whee! */
> +#ifdef pr_fmt
> +#undef pr_fmt
> +#define pr_fmt(fmt) MODULE_NAME ": " fmt
> +#endif
That's not how you do this, sorry. Please do it like the rest of the
kernel.
> --- /dev/null
> +++ b/include/linux/vmw_vmci_api.h
With the new uapi rework, should this file still be here? What about
the other .h files?
> +typedef u32 vmci_id;
> +typedef u32 vmci_privilege_flags;
> +typedef u32 vmci_event;
Ick, no typdefs please.
> +static inline struct vmci_handle VMCI_MAKE_HANDLE(vmci_id cid, vmci_id rid)
> +{
> + struct vmci_handle h;
> + h.context = cid;
> + h.resource = rid;
> + return h;
> +}
You return a structure on the stack that just went away? Yeah, I know
it's an inline, but come on, that's not ok.
> +#define VMCI_HANDLE_TO_CONTEXT_ID(_handle) ((_handle).context)
> +#define VMCI_HANDLE_TO_RESOURCE_ID(_handle) ((_handle).resource)
It's longer to write the macro out than to just do .context or
.resource, just use them instead.
> +#define VMCI_HANDLE_EQUAL(_h1, _h2) ((_h1).context == (_h2).context && \
> + (_h1).resource == (_h2).resource)
Inline function instead? How often do you really need this?
> +#define VMCI_INVALID_ID ~0
> +static const struct vmci_handle VMCI_INVALID_HANDLE = { VMCI_INVALID_ID,
> + VMCI_INVALID_ID
> +};
C99 initializers please.
> +
> +#define VMCI_HANDLE_INVALID(_handle) \
> + VMCI_HANDLE_EQUAL((_handle), VMCI_INVALID_HANDLE)
Gotta love magic values :(
> +/*
> + * The below defines can be used to send anonymous requests.
> + * This also indicates that no response is expected.
> + */
> +#define VMCI_ANON_SRC_CONTEXT_ID VMCI_INVALID_ID
> +#define VMCI_ANON_SRC_RESOURCE_ID VMCI_INVALID_ID
> +#define VMCI_ANON_SRC_HANDLE vmci_make_handle(VMCI_ANON_SRC_CONTEXT_ID, \
> + VMCI_ANON_SRC_RESOURCE_ID)
So you just created an invalid message?
> +/* The lowest 16 context ids are reserved for internal use. */
> +#define VMCI_RESERVED_CID_LIMIT ((u32) 16)
> +
> +/*
> + * Hypervisor context id, used for calling into hypervisor
> + * supplied services from the VM.
> + */
> +#define VMCI_HYPERVISOR_CONTEXT_ID 0
> +
> +/*
> + * Well-known context id, a logical context that contains a set of
> + * well-known services. This context ID is now obsolete.
> + */
> +#define VMCI_WELL_KNOWN_CONTEXT_ID 1
> +
> +/*
> + * Context ID used by host endpoints.
> + */
> +#define VMCI_HOST_CONTEXT_ID 2
> +
> +#define VMCI_CONTEXT_IS_VM(_cid) (VMCI_INVALID_ID != (_cid) && \
> + (_cid) > VMCI_HOST_CONTEXT_ID)
> +
Are you sure all of this stuff needs to be in a .h file that lives in
include/linux/?
> +/*
> + * Linux defines _IO* macros, but the core kernel code ignore the encoded
> + * ioctl value. It is up to individual drivers to decode the value (for
> + * example to look at the size of a structure to determine which version
> + * of a specific command should be used) or not (which is what we
> + * currently do, so right now the ioctl value for a given command is the
> + * command itself).
> + *
> + * Hence, we just define the IOCTL_VMCI_foo values directly, with no
> + * intermediate IOCTLCMD_ representation.
> + */
> +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
I don't recall ever getting a valid answer for this (if you did, my
appologies, can you repeat it). What in the world are you talking about
here? Why is your driver somehow special from the thousands of other
ones that use the in-kernel IO macros properly for an ioctl?
> +enum {
> + /*
> + * We need to bracket the range of values used for ioctls,
> + * because x86_64 Linux forces us to explicitly register ioctl
> + * handlers by value for handling 32 bit ioctl syscalls.
> + * Hence FIRST and LAST. Pick something for FIRST that
> + * doesn't collide with vmmon (2001+).
> + */
What do you mean here? What are you trying to work around? And why?
> + IOCTLCMD(FIRST) = 1951,
> + IOCTLCMD(VERSION) = IOCTLCMD(FIRST),
Huh?
> +
> + /* BEGIN VMCI */
> + IOCTLCMD(INIT_CONTEXT),
> +
> + /*
> + * The following two were used for process and datagram
> + * process creation. They are not used anymore and reserved
> + * for future use. They will fail if issued.
> + */
> + IOCTLCMD(RESERVED1),
> + IOCTLCMD(RESERVED2),
Who's reserving them? If they aren't in the driver, no need to list
them here at all, who would ever call them? I think this is due to your
"weird" way of defining ioctls, please don't.
> + /*
> + * The following used to be for shared memory. It is now
> + * unused and and is reserved for future use. It will fail if
> + * issued.
> + */
> + IOCTLCMD(RESERVED3),
Same as above.
> + /*
> + * The follwoing three were also used to be for shared
> + * memory. An old WS6 user-mode client might try to use them
> + * with the new driver, but since we ensure that only contexts
> + * created by VMX'en of the appropriate version
> + * (VMCI_VERSION_NOTIFY or VMCI_VERSION_NEWQP) or higher use
> + * these ioctl, everything is fine.
> + */
> + IOCTLCMD(QUEUEPAIR_SETVA),
> + IOCTLCMD(NOTIFY_RESOURCE),
> + IOCTLCMD(NOTIFICATIONS_RECEIVE),
> + IOCTLCMD(VERSION2),
> + IOCTLCMD(QUEUEPAIR_ALLOC),
> + IOCTLCMD(QUEUEPAIR_SETPAGEFILE),
> + IOCTLCMD(QUEUEPAIR_DETACH),
> + IOCTLCMD(DATAGRAM_SEND),
> + IOCTLCMD(DATAGRAM_RECEIVE),
> + IOCTLCMD(DATAGRAM_REQUEST_MAP),
> + IOCTLCMD(DATAGRAM_REMOVE_MAP),
> + IOCTLCMD(CTX_ADD_NOTIFICATION),
> + IOCTLCMD(CTX_REMOVE_NOTIFICATION),
> + IOCTLCMD(CTX_GET_CPT_STATE),
> + IOCTLCMD(CTX_SET_CPT_STATE),
> + IOCTLCMD(GET_CONTEXT_ID),
> + IOCTLCMD(LAST),
> + /* END VMCI */
> +
> + /*
> + * VMCI Socket IOCTLS are defined next and go from
> + * IOCTLCMD(LAST) (1972) to 1990. VMware reserves a range of
> + * 4 ioctls for VMCI Sockets to grow. We cannot reserve many
> + * ioctls here since we are close to overlapping with vmmon
> + * ioctls (2001+). Define a meta-ioctl if running out of this
> + * binary space.
> + */
> + IOCTLCMD(SOCKETS_FIRST) = IOCTLCMD(LAST),
> + IOCTLCMD(SOCKETS_VERSION) = IOCTLCMD(SOCKETS_FIRST),
> + IOCTLCMD(SOCKETS_BIND),
> + IOCTLCMD(SOCKETS_SET_SYMBOLS),
> + IOCTLCMD(SOCKETS_CONNECT),
As you don't have any in-kernel code using these ioctls, why define them
yet?
And again, what's with the "reserving" ioctls? That's not ok.
> + /*
> + * The next two values are public (vmci_sockets.h) and cannot be
> + * changed. That means the number of values above these cannot be
> + * changed either unless the base index (specified below) is updated
> + * accordingly.
> + */
> + IOCTLCMD(SOCKETS_GET_AF_VALUE),
> + IOCTLCMD(SOCKETS_GET_LOCAL_CID),
> + IOCTLCMD(SOCKETS_GET_SOCK_NAME),
> + IOCTLCMD(SOCKETS_GET_SOCK_OPT),
> + IOCTLCMD(SOCKETS_GET_VM_BY_NAME),
> + IOCTLCMD(SOCKETS_IOCTL),
> + IOCTLCMD(SOCKETS_LISTEN),
> + IOCTLCMD(SOCKETS_RECV),
> + IOCTLCMD(SOCKETS_RECV_FROM),
> + IOCTLCMD(SOCKETS_SELECT),
> + IOCTLCMD(SOCKETS_SEND),
> + IOCTLCMD(SOCKETS_SEND_TO),
> + IOCTLCMD(SOCKETS_SET_SOCK_OPT),
> + IOCTLCMD(SOCKETS_SHUTDOWN),
> + IOCTLCMD(SOCKETS_SOCKET), /* 1990 on Linux. */
> +
> + IOCTLCMD(SOCKETS_LAST) = 1994, /* 1994 on Linux. */
> +
> + /*
> + * The VSockets ioctls occupy the block above. We define a
> + * new range of VMCI ioctls to maintain binary compatibility
> + * between the user land and the kernel driver. Careful,
> + * vmmon ioctls start from 2001, so this means we can add only
> + * 4 new VMCI ioctls. Define a meta-ioctl if running out of
> + * this binary space.
> + */
> + IOCTLCMD(FIRST2),
> + IOCTLCMD(SET_NOTIFY) = IOCTLCMD(FIRST2), /* 1995 on Linux. */
> + IOCTLCMD(LAST2),
> +};
I still don't understand, sorry.
Please redo this in the "normal" Linux kernel way of defining ioctls.
No need to be "special" here.
And shouldn't something like this be in a separate .h file, that gets
included from userspace? Not all of the stuff in this .h file should
get exported to userspace, right?
> +static inline struct vmci_handle vmci_make_handle(u32 cid, u32 rid)
> +{
> + struct vmci_handle h;
> +
> + h.context = cid;
> + h.resource = rid;
> +
> + return h;
> +}
Where have I see this before... :)
thanks,
greg k-h
On Wed, Nov 07, 2012 at 10:42:53AM -0800, George Zhang wrote:
> +/*
> + * Sets up a given context for notify to work. Calls drv_map_bool_ptr()
> + * which maps the notify boolean in user VA in kernel space.
> + */
> +static int vmci_host_setup_notify(struct vmci_ctx *context,
> + unsigned long uva)
> +{
> + struct page *page;
> + int retval;
> +
> + if (context->notify_page) {
> + pr_devel("%s: Notify mechanism is already set up.\n", __func__);
> + return VMCI_ERROR_DUPLICATE_ENTRY;
> + }
> +
> + if (!access_ok(VERIFY_WRITE, (void __user *)uva, sizeof(bool)))
> + return VMCI_ERROR_GENERIC;
This line causes sparse to complain. The odds that userspace knows what
gcc is using for "bool" is pretty low.
Please fix.
thanks,
greg k-h
On Wed, Nov 07, 2012 at 10:43:03AM -0800, George Zhang wrote:
> +#ifndef _VMCI_COMMONINT_H_
> +#define _VMCI_COMMONINT_H_
> +
> +#include <linux/printk.h>
Why printk from a .h file?
> +
> +#define ASSERT(cond) BUG_ON(!(cond))
No, please no.
> +
> +#define PCI_VENDOR_ID_VMWARE 0x15AD
> +#define PCI_DEVICE_ID_VMWARE_VMCI 0x0740
Align these properly? Why do they need to be in a .h file?
> +#define VMCI_DRIVER_VERSION_STRING "1.0.0.0-k"
Is this needed at all anymore?
> +#define MODULE_NAME "vmw_vmci"
This isn't needed, use the built-in macro instead.
> +/* Print magic... whee! */
> +#ifdef pr_fmt
> +#undef pr_fmt
> +#define pr_fmt(fmt) MODULE_NAME ": " fmt
> +#endif
That's not how you do this, sorry. Please do it like the rest of the
kernel.
> --- /dev/null
> +++ b/include/linux/vmw_vmci_api.h
With the new uapi rework, should this file still be here? What about
the other .h files?
> +typedef u32 vmci_id;
> +typedef u32 vmci_privilege_flags;
> +typedef u32 vmci_event;
Ick, no typdefs please.
> +static inline struct vmci_handle VMCI_MAKE_HANDLE(vmci_id cid, vmci_id rid)
> +{
> + struct vmci_handle h;
> + h.context = cid;
> + h.resource = rid;
> + return h;
> +}
You return a structure on the stack that just went away? Yeah, I know
it's an inline, but come on, that's not ok.
> +#define VMCI_HANDLE_TO_CONTEXT_ID(_handle) ((_handle).context)
> +#define VMCI_HANDLE_TO_RESOURCE_ID(_handle) ((_handle).resource)
It's longer to write the macro out than to just do .context or
.resource, just use them instead.
> +#define VMCI_HANDLE_EQUAL(_h1, _h2) ((_h1).context == (_h2).context && \
> + (_h1).resource == (_h2).resource)
Inline function instead? How often do you really need this?
> +#define VMCI_INVALID_ID ~0
> +static const struct vmci_handle VMCI_INVALID_HANDLE = { VMCI_INVALID_ID,
> + VMCI_INVALID_ID
> +};
C99 initializers please.
> +
> +#define VMCI_HANDLE_INVALID(_handle) \
> + VMCI_HANDLE_EQUAL((_handle), VMCI_INVALID_HANDLE)
Gotta love magic values :(
> +/*
> + * The below defines can be used to send anonymous requests.
> + * This also indicates that no response is expected.
> + */
> +#define VMCI_ANON_SRC_CONTEXT_ID VMCI_INVALID_ID
> +#define VMCI_ANON_SRC_RESOURCE_ID VMCI_INVALID_ID
> +#define VMCI_ANON_SRC_HANDLE vmci_make_handle(VMCI_ANON_SRC_CONTEXT_ID, \
> + VMCI_ANON_SRC_RESOURCE_ID)
So you just created an invalid message?
> +/* The lowest 16 context ids are reserved for internal use. */
> +#define VMCI_RESERVED_CID_LIMIT ((u32) 16)
> +
> +/*
> + * Hypervisor context id, used for calling into hypervisor
> + * supplied services from the VM.
> + */
> +#define VMCI_HYPERVISOR_CONTEXT_ID 0
> +
> +/*
> + * Well-known context id, a logical context that contains a set of
> + * well-known services. This context ID is now obsolete.
> + */
> +#define VMCI_WELL_KNOWN_CONTEXT_ID 1
> +
> +/*
> + * Context ID used by host endpoints.
> + */
> +#define VMCI_HOST_CONTEXT_ID 2
> +
> +#define VMCI_CONTEXT_IS_VM(_cid) (VMCI_INVALID_ID != (_cid) && \
> + (_cid) > VMCI_HOST_CONTEXT_ID)
> +
Are you sure all of this stuff needs to be in a .h file that lives in
include/linux/?
> +/*
> + * Linux defines _IO* macros, but the core kernel code ignore the encoded
> + * ioctl value. It is up to individual drivers to decode the value (for
> + * example to look at the size of a structure to determine which version
> + * of a specific command should be used) or not (which is what we
> + * currently do, so right now the ioctl value for a given command is the
> + * command itself).
> + *
> + * Hence, we just define the IOCTL_VMCI_foo values directly, with no
> + * intermediate IOCTLCMD_ representation.
> + */
> +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
I don't recall ever getting a valid answer for this (if you did, my
appologies, can you repeat it). What in the world are you talking about
here? Why is your driver somehow special from the thousands of other
ones that use the in-kernel IO macros properly for an ioctl?
> +enum {
> + /*
> + * We need to bracket the range of values used for ioctls,
> + * because x86_64 Linux forces us to explicitly register ioctl
> + * handlers by value for handling 32 bit ioctl syscalls.
> + * Hence FIRST and LAST. Pick something for FIRST that
> + * doesn't collide with vmmon (2001+).
> + */
What do you mean here? What are you trying to work around? And why?
> + IOCTLCMD(FIRST) = 1951,
> + IOCTLCMD(VERSION) = IOCTLCMD(FIRST),
Huh?
> +
> + /* BEGIN VMCI */
> + IOCTLCMD(INIT_CONTEXT),
> +
> + /*
> + * The following two were used for process and datagram
> + * process creation. They are not used anymore and reserved
> + * for future use. They will fail if issued.
> + */
> + IOCTLCMD(RESERVED1),
> + IOCTLCMD(RESERVED2),
Who's reserving them? If they aren't in the driver, no need to list
them here at all, who would ever call them? I think this is due to your
"weird" way of defining ioctls, please don't.
> + /*
> + * The following used to be for shared memory. It is now
> + * unused and and is reserved for future use. It will fail if
> + * issued.
> + */
> + IOCTLCMD(RESERVED3),
Same as above.
> + /*
> + * The follwoing three were also used to be for shared
> + * memory. An old WS6 user-mode client might try to use them
> + * with the new driver, but since we ensure that only contexts
> + * created by VMX'en of the appropriate version
> + * (VMCI_VERSION_NOTIFY or VMCI_VERSION_NEWQP) or higher use
> + * these ioctl, everything is fine.
> + */
> + IOCTLCMD(QUEUEPAIR_SETVA),
> + IOCTLCMD(NOTIFY_RESOURCE),
> + IOCTLCMD(NOTIFICATIONS_RECEIVE),
> + IOCTLCMD(VERSION2),
> + IOCTLCMD(QUEUEPAIR_ALLOC),
> + IOCTLCMD(QUEUEPAIR_SETPAGEFILE),
> + IOCTLCMD(QUEUEPAIR_DETACH),
> + IOCTLCMD(DATAGRAM_SEND),
> + IOCTLCMD(DATAGRAM_RECEIVE),
> + IOCTLCMD(DATAGRAM_REQUEST_MAP),
> + IOCTLCMD(DATAGRAM_REMOVE_MAP),
> + IOCTLCMD(CTX_ADD_NOTIFICATION),
> + IOCTLCMD(CTX_REMOVE_NOTIFICATION),
> + IOCTLCMD(CTX_GET_CPT_STATE),
> + IOCTLCMD(CTX_SET_CPT_STATE),
> + IOCTLCMD(GET_CONTEXT_ID),
> + IOCTLCMD(LAST),
> + /* END VMCI */
> +
> + /*
> + * VMCI Socket IOCTLS are defined next and go from
> + * IOCTLCMD(LAST) (1972) to 1990. VMware reserves a range of
> + * 4 ioctls for VMCI Sockets to grow. We cannot reserve many
> + * ioctls here since we are close to overlapping with vmmon
> + * ioctls (2001+). Define a meta-ioctl if running out of this
> + * binary space.
> + */
> + IOCTLCMD(SOCKETS_FIRST) = IOCTLCMD(LAST),
> + IOCTLCMD(SOCKETS_VERSION) = IOCTLCMD(SOCKETS_FIRST),
> + IOCTLCMD(SOCKETS_BIND),
> + IOCTLCMD(SOCKETS_SET_SYMBOLS),
> + IOCTLCMD(SOCKETS_CONNECT),
As you don't have any in-kernel code using these ioctls, why define them
yet?
And again, what's with the "reserving" ioctls? That's not ok.
> + /*
> + * The next two values are public (vmci_sockets.h) and cannot be
> + * changed. That means the number of values above these cannot be
> + * changed either unless the base index (specified below) is updated
> + * accordingly.
> + */
> + IOCTLCMD(SOCKETS_GET_AF_VALUE),
> + IOCTLCMD(SOCKETS_GET_LOCAL_CID),
> + IOCTLCMD(SOCKETS_GET_SOCK_NAME),
> + IOCTLCMD(SOCKETS_GET_SOCK_OPT),
> + IOCTLCMD(SOCKETS_GET_VM_BY_NAME),
> + IOCTLCMD(SOCKETS_IOCTL),
> + IOCTLCMD(SOCKETS_LISTEN),
> + IOCTLCMD(SOCKETS_RECV),
> + IOCTLCMD(SOCKETS_RECV_FROM),
> + IOCTLCMD(SOCKETS_SELECT),
> + IOCTLCMD(SOCKETS_SEND),
> + IOCTLCMD(SOCKETS_SEND_TO),
> + IOCTLCMD(SOCKETS_SET_SOCK_OPT),
> + IOCTLCMD(SOCKETS_SHUTDOWN),
> + IOCTLCMD(SOCKETS_SOCKET), /* 1990 on Linux. */
> +
> + IOCTLCMD(SOCKETS_LAST) = 1994, /* 1994 on Linux. */
> +
> + /*
> + * The VSockets ioctls occupy the block above. We define a
> + * new range of VMCI ioctls to maintain binary compatibility
> + * between the user land and the kernel driver. Careful,
> + * vmmon ioctls start from 2001, so this means we can add only
> + * 4 new VMCI ioctls. Define a meta-ioctl if running out of
> + * this binary space.
> + */
> + IOCTLCMD(FIRST2),
> + IOCTLCMD(SET_NOTIFY) = IOCTLCMD(FIRST2), /* 1995 on Linux. */
> + IOCTLCMD(LAST2),
> +};
I still don't understand, sorry.
Please redo this in the "normal" Linux kernel way of defining ioctls.
No need to be "special" here.
And shouldn't something like this be in a separate .h file, that gets
included from userspace? Not all of the stuff in this .h file should
get exported to userspace, right?
> +static inline struct vmci_handle vmci_make_handle(u32 cid, u32 rid)
> +{
> + struct vmci_handle h;
> +
> + h.context = cid;
> + h.resource = rid;
> +
> + return h;
> +}
Where have I see this before... :)
thanks,
greg k-h
Hi Greg,
For some reason it still didn't go through to our corporate mail server
but I see it on LKML.
On Mon, Nov 26, 2012 at 04:03:04PM -0800, Greg KH wrote:
> On Wed, Nov 07, 2012 at 10:43:03AM -0800, George Zhang wrote:
>
> > +static inline struct vmci_handle VMCI_MAKE_HANDLE(vmci_id cid, vmci_id rid)
> > +{
> > + struct vmci_handle h;
> > + h.context = cid;
> > + h.resource = rid;
> > + return h;
> > +}
>
> You return a structure on the stack that just went away? Yeah, I know
> it's an inline, but come on, that's not ok.
This is certainly OK even if it is not inline, we return the _value_,
not the pointer to the stacki memory. And yes, the structure is 64 bit
value so it is returned in registers.
>
> > +#define VMCI_HANDLE_TO_CONTEXT_ID(_handle) ((_handle).context)
> > +#define VMCI_HANDLE_TO_RESOURCE_ID(_handle) ((_handle).resource)
>
> It's longer to write the macro out than to just do .context or
> .resource, just use them instead.
We really want vmci_handle to be opaque where possible and use proper
accessors.
>
> > +#define VMCI_HANDLE_EQUAL(_h1, _h2) ((_h1).context == (_h2).context && \
> > + (_h1).resource == (_h2).resource)
>
> Inline function instead? How often do you really need this?
OK, makes sense.
>
> > +#define VMCI_INVALID_ID ~0
> > +static const struct vmci_handle VMCI_INVALID_HANDLE = { VMCI_INVALID_ID,
> > + VMCI_INVALID_ID
> > +};
>
> C99 initializers please.
OK.
>
> > +
> > +#define VMCI_HANDLE_INVALID(_handle) \
> > + VMCI_HANDLE_EQUAL((_handle), VMCI_INVALID_HANDLE)
>
> Gotta love magic values :(
?
>
> > +/*
> > + * The below defines can be used to send anonymous requests.
> > + * This also indicates that no response is expected.
> > + */
> > +#define VMCI_ANON_SRC_CONTEXT_ID VMCI_INVALID_ID
> > +#define VMCI_ANON_SRC_RESOURCE_ID VMCI_INVALID_ID
> > +#define VMCI_ANON_SRC_HANDLE vmci_make_handle(VMCI_ANON_SRC_CONTEXT_ID, \
> > + VMCI_ANON_SRC_RESOURCE_ID)
>
> So you just created an invalid message?
Anonymous one, yes.
>
> > +/* The lowest 16 context ids are reserved for internal use. */
> > +#define VMCI_RESERVED_CID_LIMIT ((u32) 16)
> > +
> > +/*
> > + * Hypervisor context id, used for calling into hypervisor
> > + * supplied services from the VM.
> > + */
> > +#define VMCI_HYPERVISOR_CONTEXT_ID 0
> > +
> > +/*
> > + * Well-known context id, a logical context that contains a set of
> > + * well-known services. This context ID is now obsolete.
> > + */
> > +#define VMCI_WELL_KNOWN_CONTEXT_ID 1
> > +
> > +/*
> > + * Context ID used by host endpoints.
> > + */
> > +#define VMCI_HOST_CONTEXT_ID 2
> > +
> > +#define VMCI_CONTEXT_IS_VM(_cid) (VMCI_INVALID_ID != (_cid) && \
> > + (_cid) > VMCI_HOST_CONTEXT_ID)
> > +
>
> Are you sure all of this stuff needs to be in a .h file that lives in
> include/linux/?
Probably not. We'll rebase to 3.7 and split as needed.
>
> > +/*
> > + * Linux defines _IO* macros, but the core kernel code ignore the encoded
> > + * ioctl value. It is up to individual drivers to decode the value (for
> > + * example to look at the size of a structure to determine which version
> > + * of a specific command should be used) or not (which is what we
> > + * currently do, so right now the ioctl value for a given command is the
> > + * command itself).
> > + *
> > + * Hence, we just define the IOCTL_VMCI_foo values directly, with no
> > + * intermediate IOCTLCMD_ representation.
> > + */
> > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
>
> I don't recall ever getting a valid answer for this (if you did, my
> appologies, can you repeat it). What in the world are you talking about
> here? Why is your driver somehow special from the thousands of other
> ones that use the in-kernel IO macros properly for an ioctl?
Hmm, not sure, we'll review ioctl definitions and fix as needed.
Thanks!
--
Dmitry
On Mon, Nov 26, 2012 at 04:23:57PM -0800, Dmitry Torokhov wrote:
> Hi Greg,
>
> For some reason it still didn't go through to our corporate mail server
> but I see it on LKML.
Good.
> On Mon, Nov 26, 2012 at 04:03:04PM -0800, Greg KH wrote:
> > On Wed, Nov 07, 2012 at 10:43:03AM -0800, George Zhang wrote:
> >
> > > +static inline struct vmci_handle VMCI_MAKE_HANDLE(vmci_id cid, vmci_id rid)
> > > +{
> > > + struct vmci_handle h;
> > > + h.context = cid;
> > > + h.resource = rid;
> > > + return h;
> > > +}
> >
> > You return a structure on the stack that just went away? Yeah, I know
> > it's an inline, but come on, that's not ok.
>
> This is certainly OK even if it is not inline, we return the _value_,
> not the pointer to the stacki memory. And yes, the structure is 64 bit
> value so it is returned in registers.
Even on a 32bit processor? Also, you already have another function that
does this same thing, so having 2 functions in the same patch seems odd,
right?
greg k-h
On Monday, November 26, 2012 04:32:39 PM Greg KH wrote:
> On Mon, Nov 26, 2012 at 04:23:57PM -0800, Dmitry Torokhov wrote:
> > Hi Greg,
> >
> > For some reason it still didn't go through to our corporate mail server
> > but I see it on LKML.
>
> Good.
>
> > On Mon, Nov 26, 2012 at 04:03:04PM -0800, Greg KH wrote:
> > > On Wed, Nov 07, 2012 at 10:43:03AM -0800, George Zhang wrote:
> > > > +static inline struct vmci_handle VMCI_MAKE_HANDLE(vmci_id cid,
> > > > vmci_id rid) +{
> > > > + struct vmci_handle h;
> > > > + h.context = cid;
> > > > + h.resource = rid;
> > > > + return h;
> > > > +}
> > >
> > > You return a structure on the stack that just went away? Yeah, I know
> > > it's an inline, but come on, that's not ok.
> >
> > This is certainly OK even if it is not inline, we return the _value_,
> > not the pointer to the stacki memory. And yes, the structure is 64 bit
> > value so it is returned in registers.
>
> Even on a 32bit processor?
I thought it would, but it looks like it won't. Maybe we'll just switch it
to a macro with C99 style initializators to keep the same semantic but
avoid the question.
> Also, you already have another function that
> does this same thing, so having 2 functions in the same patch seems odd,
> right?
Yes, you can say that it is probably a bit excessive.
OK, now that we are on the same page we'll go and fix the issues.
Thanks,
Dmitry
I didn't get the resend either, so it seems our corporate mail really is
eating messages. Lovely.
> > > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
> >
> > I don't recall ever getting a valid answer for this (if you did, my
> > appologies, can you repeat it). What in the world are you talking
> > about here? Why is your driver somehow special from the thousands
> > of other ones that use the in-kernel IO macros properly for an
> > ioctl?
Because we're morons. And unfortunately, we've shipped our product
using those broken definitions: our VMX uses them to talk to the driver.
So here's what we'd like to do. We will send out a patch soon that
fixes the other issues you mention and also adds IOCTL definitions the
proper way using _IOBLAH(). But we'd also like to retain these broken
definitions for a short period, commented as such, at least until we
can get out a patch release to Workstation 9, at which point we can
remove them. Does that sound reasonable?
Thanks!
- Andy
On Fri, Nov 30, 2012 at 08:47:46AM -0800, Andy King wrote:
> I didn't get the resend either, so it seems our corporate mail really is
> eating messages. Lovely.
>
> > > > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
> > >
> > > I don't recall ever getting a valid answer for this (if you did, my
> > > appologies, can you repeat it). What in the world are you talking
> > > about here? Why is your driver somehow special from the thousands
> > > of other ones that use the in-kernel IO macros properly for an
> > > ioctl?
>
> Because we're morons. And unfortunately, we've shipped our product
> using those broken definitions: our VMX uses them to talk to the driver.
> So here's what we'd like to do. We will send out a patch soon that
> fixes the other issues you mention and also adds IOCTL definitions the
> proper way using _IOBLAH(). But we'd also like to retain these broken
> definitions for a short period, commented as such, at least until we
> can get out a patch release to Workstation 9, at which point we can
> remove them. Does that sound reasonable?
It has been my experience, that when people say "We will remove that api
sometime in the future", it never happens. So why not just do it now?
Especially given that this code will be coming out in 3.9 at the
earliest, and that is 6 months away, so that should be plenty of time to
get this fixed up.
thanks,
greg k-h
On Friday, November 30, 2012 09:09:21 AM Greg KH wrote:
> On Fri, Nov 30, 2012 at 08:47:46AM -0800, Andy King wrote:
> > I didn't get the resend either, so it seems our corporate mail really is
> > eating messages. Lovely.
> >
> > > > > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
> > > >
> > > > I don't recall ever getting a valid answer for this (if you did, my
> > > > appologies, can you repeat it). What in the world are you talking
> > > > about here? Why is your driver somehow special from the thousands
> > > > of other ones that use the in-kernel IO macros properly for an
> > > > ioctl?
> >
> > Because we're morons. And unfortunately, we've shipped our product
> > using those broken definitions: our VMX uses them to talk to the driver.
> > So here's what we'd like to do. We will send out a patch soon that
> > fixes the other issues you mention and also adds IOCTL definitions the
> > proper way using _IOBLAH(). But we'd also like to retain these broken
> > definitions for a short period, commented as such, at least until we
> > can get out a patch release to Workstation 9, at which point we can
> > remove them. Does that sound reasonable?
>
> It has been my experience, that when people say "We will remove that api
> sometime in the future", it never happens. So why not just do it now?
>
> Especially given that this code will be coming out in 3.9 at the
> earliest, and that is 6 months away, so that should be plenty of time to
> get this fixed up.
Our schedule for releasing hosted products is not necessarily aligned
with mainline kernel releases.
Also, thinking about it some more, maybe we should simply keep the old
ioctls as is? Yes, they would not use the standard kernel style encoding
for direction, etc, but the encoding only important if ioctl handler
wants to parse and use it. Since we do not flat ioctl number is fine
with us.
That said we 'll clean up comments and numbers to remove stuff not
applicable to mainline.
Thanks,
Dmitry
On Fri, Nov 30, 2012 at 09:20:41AM -0800, Dmitry Torokhov wrote:
> On Friday, November 30, 2012 09:09:21 AM Greg KH wrote:
> > On Fri, Nov 30, 2012 at 08:47:46AM -0800, Andy King wrote:
> > > I didn't get the resend either, so it seems our corporate mail really is
> > > eating messages. Lovely.
> > >
> > > > > > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
> > > > >
> > > > > I don't recall ever getting a valid answer for this (if you did, my
> > > > > appologies, can you repeat it). What in the world are you talking
> > > > > about here? Why is your driver somehow special from the thousands
> > > > > of other ones that use the in-kernel IO macros properly for an
> > > > > ioctl?
> > >
> > > Because we're morons. And unfortunately, we've shipped our product
> > > using those broken definitions: our VMX uses them to talk to the driver.
> > > So here's what we'd like to do. We will send out a patch soon that
> > > fixes the other issues you mention and also adds IOCTL definitions the
> > > proper way using _IOBLAH(). But we'd also like to retain these broken
> > > definitions for a short period, commented as such, at least until we
> > > can get out a patch release to Workstation 9, at which point we can
> > > remove them. Does that sound reasonable?
> >
> > It has been my experience, that when people say "We will remove that api
> > sometime in the future", it never happens. So why not just do it now?
> >
> > Especially given that this code will be coming out in 3.9 at the
> > earliest, and that is 6 months away, so that should be plenty of time to
> > get this fixed up.
>
> Our schedule for releasing hosted products is not necessarily aligned
> with mainline kernel releases.
And kernel developers don't really care about company schedules, nor
should they, you know this :)
thanks,
greg k-h
On Friday, November 30, 2012 10:39:18 AM Greg KH wrote:
> On Fri, Nov 30, 2012 at 09:20:41AM -0800, Dmitry Torokhov wrote:
> > On Friday, November 30, 2012 09:09:21 AM Greg KH wrote:
> > > On Fri, Nov 30, 2012 at 08:47:46AM -0800, Andy King wrote:
> > > > I didn't get the resend either, so it seems our corporate mail really
> > > > is
> > > > eating messages. Lovely.
> > > >
> > > > > > > +#define IOCTLCMD(_cmd) IOCTL_VMCI_ ## _cmd
> > > > > >
> > > > > > I don't recall ever getting a valid answer for this (if you did,
> > > > > > my
> > > > > > appologies, can you repeat it). What in the world are you talking
> > > > > > about here? Why is your driver somehow special from the thousands
> > > > > > of other ones that use the in-kernel IO macros properly for an
> > > > > > ioctl?
> > > >
> > > > Because we're morons. And unfortunately, we've shipped our product
> > > > using those broken definitions: our VMX uses them to talk to the
> > > > driver.
> > > > So here's what we'd like to do. We will send out a patch soon that
> > > > fixes the other issues you mention and also adds IOCTL definitions the
> > > > proper way using _IOBLAH(). But we'd also like to retain these broken
> > > > definitions for a short period, commented as such, at least until we
> > > > can get out a patch release to Workstation 9, at which point we can
> > > > remove them. Does that sound reasonable?
> > >
> > > It has been my experience, that when people say "We will remove that api
> > > sometime in the future", it never happens. So why not just do it now?
> > >
> > > Especially given that this code will be coming out in 3.9 at the
> > > earliest, and that is 6 months away, so that should be plenty of time to
> > > get this fixed up.
> >
> > Our schedule for releasing hosted products is not necessarily aligned
> > with mainline kernel releases.
>
> And kernel developers don't really care about company schedules, nor
> should they, you know this :)
>
That is why we are offering a compromise so that older installations
have a chance to work and nobody has to care about schedules too much.
However you snipped the rest of my reply: do we really need to renumber
ioctls? There is no benefit for the driver as its ioctl handler does
not parse the numbers into components.
Thanks,
Dmitry
On Fri, Nov 30, 2012 at 10:45:44AM -0800, Dmitry Torokhov wrote:
> However you snipped the rest of my reply: do we really need to renumber
> ioctls? There is no benefit for the driver as its ioctl handler does
> not parse the numbers into components.
I don't know if you need to renumber, I really don't understand what you
were trying to do with this code, and as it was acting differently from
all other kernel ioctl declarations, I asked for some clarity.
If you can rewrite it to look sane, and keep the same numbers, that's
fine with me.
thanks,
greg k-h
On Friday, November 30, 2012 10:57:55 AM Greg KH wrote:
> On Fri, Nov 30, 2012 at 10:45:44AM -0800, Dmitry Torokhov wrote:
> > However you snipped the rest of my reply: do we really need to renumber
> > ioctls? There is no benefit for the driver as its ioctl handler does
> > not parse the numbers into components.
>
> I don't know if you need to renumber, I really don't understand what you
> were trying to do with this code, and as it was acting differently from
> all other kernel ioctl declarations, I asked for some clarity.
>
> If you can rewrite it to look sane, and keep the same numbers, that's
> fine with me.
OK, it looks like we can redo them as:
#define IOCTL_VMCI_VERSION _IO(7, 0x9f) /* 1951 */
#define IOCTL_VMCI_INIT_CONTEXT _IO(7, 0xa0) /* 1952 */
Is this acceptable?
Thanks,
Dmitry
On Fri, Nov 30, 2012 at 12:09:40PM -0800, Dmitry Torokhov wrote:
> On Friday, November 30, 2012 10:57:55 AM Greg KH wrote:
> > On Fri, Nov 30, 2012 at 10:45:44AM -0800, Dmitry Torokhov wrote:
> > > However you snipped the rest of my reply: do we really need to renumber
> > > ioctls? There is no benefit for the driver as its ioctl handler does
> > > not parse the numbers into components.
> >
> > I don't know if you need to renumber, I really don't understand what you
> > were trying to do with this code, and as it was acting differently from
> > all other kernel ioctl declarations, I asked for some clarity.
> >
> > If you can rewrite it to look sane, and keep the same numbers, that's
> > fine with me.
>
> OK, it looks like we can redo them as:
>
> #define IOCTL_VMCI_VERSION _IO(7, 0x9f) /* 1951 */
> #define IOCTL_VMCI_INIT_CONTEXT _IO(7, 0xa0) /* 1952 */
>
> Is this acceptable?
Sure, that's better. You also got lucky, '7' happens to be unused right
now.
greg k-h
On Friday, November 30, 2012 12:44:06 PM Greg KH wrote:
> On Fri, Nov 30, 2012 at 12:09:40PM -0800, Dmitry Torokhov wrote:
> > On Friday, November 30, 2012 10:57:55 AM Greg KH wrote:
> > > On Fri, Nov 30, 2012 at 10:45:44AM -0800, Dmitry Torokhov wrote:
> > > > However you snipped the rest of my reply: do we really need to
> > > > renumber
> > > > ioctls? There is no benefit for the driver as its ioctl handler does
> > > > not parse the numbers into components.
> > >
> > > I don't know if you need to renumber, I really don't understand what you
> > > were trying to do with this code, and as it was acting differently from
> > > all other kernel ioctl declarations, I asked for some clarity.
> > >
> > > If you can rewrite it to look sane, and keep the same numbers, that's
> > > fine with me.
> >
> > OK, it looks like we can redo them as:
> >
> > #define IOCTL_VMCI_VERSION _IO(7, 0x9f) /* 1951 */
> > #define IOCTL_VMCI_INIT_CONTEXT _IO(7, 0xa0) /* 1952 */
> >
> > Is this acceptable?
>
> Sure, that's better. You also got lucky, '7' happens to be unused right
> now.
Excellent. You said you want the next drop after -rc1, right?
--
Dmitry
On Fri, Nov 30, 2012 at 12:58:25PM -0800, Dmitry Torokhov wrote:
> On Friday, November 30, 2012 12:44:06 PM Greg KH wrote:
> > On Fri, Nov 30, 2012 at 12:09:40PM -0800, Dmitry Torokhov wrote:
> > > On Friday, November 30, 2012 10:57:55 AM Greg KH wrote:
> > > > On Fri, Nov 30, 2012 at 10:45:44AM -0800, Dmitry Torokhov wrote:
> > > > > However you snipped the rest of my reply: do we really need to
> > > > > renumber
> > > > > ioctls? There is no benefit for the driver as its ioctl handler does
> > > > > not parse the numbers into components.
> > > >
> > > > I don't know if you need to renumber, I really don't understand what you
> > > > were trying to do with this code, and as it was acting differently from
> > > > all other kernel ioctl declarations, I asked for some clarity.
> > > >
> > > > If you can rewrite it to look sane, and keep the same numbers, that's
> > > > fine with me.
> > >
> > > OK, it looks like we can redo them as:
> > >
> > > #define IOCTL_VMCI_VERSION _IO(7, 0x9f) /* 1951 */
> > > #define IOCTL_VMCI_INIT_CONTEXT _IO(7, 0xa0) /* 1952 */
> > >
> > > Is this acceptable?
> >
> > Sure, that's better. You also got lucky, '7' happens to be unused right
> > now.
>
> Excellent. You said you want the next drop after -rc1, right?
Yes please, I will be ignoring patches until then.
greg k-h