2021-08-05 08:25:22

by Long Li

Subject: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

From: Long Li <[email protected]>

Azure Blob storage [1] is Microsoft's object storage solution for the
cloud. Users or client applications can access objects in Blob storage via
HTTP, from anywhere in the world. Objects in Blob storage are accessible
via the Azure Storage REST API, Azure PowerShell, Azure CLI, or an Azure
Storage client library. The Blob storage interface is not designed to be a
POSIX compliant interface.

Problem: When a client accesses Blob storage via HTTP, the request must
cross the Blob storage boundary of Azure and pass through multiple servers
before it reaches the storage server. This is also true for an Azure VM.

Solution: For an Azure VM, Blob storage access can be accelerated by
having the Azure host execute the Blob storage requests against the backend
storage server directly.

This driver implements a VSC (Virtual Service Client) for accelerating Blob
storage access for an Azure VM by communicating with a VSP (Virtual Service
Provider) on the Azure host. Instead of using HTTP to access the Blob
storage, an Azure VM passes the Blob storage request to the VSP on the
Azure host. The Azure host uses its native network to perform Blob storage
requests to the backend server directly.

This driver doesn't implement Blob storage APIs. It acts as a fast channel
to pass user-mode Blob storage requests to the Azure host. The user-mode
program using this driver implements the Blob storage APIs and packages each
Blob storage request as structured data for the VSC. The request data is
modeled as three user-provided buffers (request, response and data buffers)
that are patterned on the HTTP model used by existing Azure Blob clients.
The VSC passes those buffers to the VSP for Blob storage requests.

The driver optimizes Blob storage access for an Azure VM in two ways:

1. The Blob storage requests are performed by the Azure host to the Azure
Blob backend storage server directly.

2. It allows the Azure host to use transport technologies (e.g. RDMA) that
are available to the host but not to the VM to reach the Azure Blob backend
servers.

Test results using this driver for an Azure VM:
100 Blob clients running on an Azure VM, each reading 100GB Block Blobs.
(10 TB total read data)
With REST API over HTTP: 94.4 mins
Using this driver: 72.5 mins
Performance (measured in throughput) gain: 30% (94.4 / 72.5 ≈ 1.30).

[1] https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

Cc: Jonathan Corbet <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Maximilian Luz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Ben Widawsky <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Andra Paraschiv <[email protected]>
Cc: Siddharth Gupta <[email protected]>
Cc: Hannes Reinecke <[email protected]>

Changes:

v2:
Refactored the code in vmbus to scan devices
Reworked Azure Blob driver and moved user-mode interfaces to uapi

v3:
Changed licensing language
Patch format passed "checkpatch --strict"
debugfs and logging, module parameter cleanup
General code clean up
Fix device removal race conditions

v4:
Addressed licensing issues
Changed to dynamic device model

v5:
Added problem statement and test numbers to patch 0
Changed uapi header file to explicitly include all header files needed for user-mode
Made the driver handle vmbus rescind without waiting on user mode for device removal

Long Li (3):
Drivers: hv: vmbus: add support to ignore certain PCIE devices
Drivers: hv: add Azure Blob driver
Drivers: hv: Add to maintainer for Hyper-V/Azure drivers

Documentation/userspace-api/ioctl/ioctl-number.rst | 2 +
MAINTAINERS | 2 +
drivers/hv/Kconfig | 11 +
drivers/hv/Makefile | 1 +
drivers/hv/channel_mgmt.c | 55 +-
drivers/hv/hv_azure_blob.c | 574 +++++++++++++++++++++
include/linux/hyperv.h | 9 +
include/uapi/misc/hv_azure_blob.h | 35 ++
8 files changed, 683 insertions(+), 6 deletions(-)
create mode 100644 drivers/hv/hv_azure_blob.c
create mode 100644 include/uapi/misc/hv_azure_blob.h

--
1.8.3.1


2021-08-05 08:30:35

by Long Li

Subject: [Patch v5 2/3] Drivers: hv: add Azure Blob driver

From: Long Li <[email protected]>

Azure Blob provides scalable, secure and shared storage services for the
internet.

This driver adds support for accelerated access to Azure Blob storage from
Azure VMs. As an alternative to using REST APIs over HTTP for Blob access,
an Azure VM can use this driver to send Blob requests to the Azure host.
The Azure host uses its native network to perform Blob requests directly
to the Blob servers.

Cc: Jonathan Corbet <[email protected]>
Cc: "K. Y. Srinivasan" <[email protected]>
Cc: Haiyang Zhang <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Dexuan Cui <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Maximilian Luz <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Ben Widawsky <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: Andra Paraschiv <[email protected]>
Cc: Siddharth Gupta <[email protected]>
Cc: Hannes Reinecke <[email protected]>
Cc: [email protected]
Signed-off-by: Long Li <[email protected]>
---
Documentation/userspace-api/ioctl/ioctl-number.rst | 2 +
drivers/hv/Kconfig | 11 +
drivers/hv/Makefile | 1 +
drivers/hv/channel_mgmt.c | 7 +
drivers/hv/hv_azure_blob.c | 574 +++++++++++++++++++++
include/linux/hyperv.h | 9 +
include/uapi/misc/hv_azure_blob.h | 35 ++
7 files changed, 639 insertions(+)
create mode 100644 drivers/hv/hv_azure_blob.c
create mode 100644 include/uapi/misc/hv_azure_blob.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9bfc2b5..1ee8c0c7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -180,6 +180,8 @@ Code Seq# Include File Comments
'R' 01 linux/rfkill.h conflict!
'R' C0-DF net/bluetooth/rfcomm.h
'R' E0 uapi/linux/fsl_mc.h
+'R' F0-FF uapi/misc/hv_azure_blob.h Microsoft Azure Blob driver
+ <mailto:[email protected]>
'S' all linux/cdrom.h conflict!
'S' 80-81 scsi/scsi_ioctl.h conflict!
'S' 82-FF scsi/scsi.h conflict!
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 66c794d..53bebd0 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -27,4 +27,15 @@ config HYPERV_BALLOON
help
Select this option to enable Hyper-V Balloon driver.

+config HYPERV_AZURE_BLOB
+ tristate "Microsoft Azure Blob driver"
+ depends on HYPERV && X86_64
+ help
+ Select this option to enable Microsoft Azure Blob driver.
+
+ This driver implements a fast datapath over Hyper-V to support
+ accelerated access to Microsoft Azure Blob services.
+ To compile this driver as a module, choose M here. The module will be
+ called azure_blob.
+
endmenu
diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index 94daf82..2726445 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_HYPERV) += hv_vmbus.o
obj-$(CONFIG_HYPERV_UTILS) += hv_utils.o
obj-$(CONFIG_HYPERV_BALLOON) += hv_balloon.o
+obj-$(CONFIG_HYPERV_AZURE_BLOB) += hv_azure_blob.o

CFLAGS_hv_trace.o = -I$(src)
CFLAGS_hv_balloon.o = -I$(src)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 705e95d..3095611 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -154,6 +154,13 @@
.allowed_in_isolated = false,
},

+ /* Azure Blob */
+ { .dev_type = HV_AZURE_BLOB,
+ HV_AZURE_BLOB_GUID,
+ .perf_device = false,
+ .allowed_in_isolated = false,
+ },
+
/* Unknown GUID */
{ .dev_type = HV_UNKNOWN,
.perf_device = false,
diff --git a/drivers/hv/hv_azure_blob.c b/drivers/hv/hv_azure_blob.c
new file mode 100644
index 0000000..3a1063f
--- /dev/null
+++ b/drivers/hv/hv_azure_blob.c
@@ -0,0 +1,574 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2021 Microsoft Corporation. */
+
+#include <uapi/misc/hv_azure_blob.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/cred.h>
+#include <linux/debugfs.h>
+#include <linux/pagemap.h>
+#include <linux/hyperv.h>
+#include <linux/miscdevice.h>
+#include <linux/uio.h>
+
+struct az_blob_device {
+ struct kref kref;
+
+ struct hv_device *device;
+ struct miscdevice misc;
+
+ /* Lock for protecting pending_requests */
+ spinlock_t request_lock;
+ struct list_head pending_requests;
+ wait_queue_head_t waiting_to_drain;
+};
+
+/* VSP messages */
+enum az_blob_vsp_request_type {
+ AZ_BLOB_DRIVER_REQUEST_FIRST = 0x100,
+ AZ_BLOB_DRIVER_USER_REQUEST = 0x100,
+ AZ_BLOB_DRIVER_REGISTER_BUFFER = 0x101,
+ AZ_BLOB_DRIVER_DEREGISTER_BUFFER = 0x102,
+};
+
+/* VSC->VSP request */
+struct az_blob_vsp_request {
+ u32 version;
+ u32 timeout_ms;
+ u32 data_buffer_offset;
+ u32 data_buffer_length;
+ u32 data_buffer_valid;
+ u32 operation_type;
+ u32 request_buffer_offset;
+ u32 request_buffer_length;
+ u32 response_buffer_offset;
+ u32 response_buffer_length;
+ guid_t transaction_id;
+} __packed;
+
+/* VSP->VSC response */
+struct az_blob_vsp_response {
+ u32 length;
+ u32 error;
+ u32 response_len;
+} __packed;
+
+struct az_blob_vsp_request_ctx {
+ struct list_head list_device;
+ struct completion wait_vsp;
+ struct az_blob_request_sync *request;
+};
+
+/* The maximum number of pages we can pass to VSP in a single packet */
+#define AZ_BLOB_MAX_PAGES 8192
+
+/* Ring buffer size in bytes */
+#define AZ_BLOB_RING_SIZE (128 * 1024)
+
+/* System wide device queue depth */
+#define AZ_BLOB_QUEUE_DEPTH 1024
+
+/* The VSP protocol version this driver understands */
+#define VSP_PROTOCOL_VERSION_V1 0
+
+static const struct hv_vmbus_device_id id_table[] = {
+ { HV_AZURE_BLOB_GUID,
+ .driver_data = 0
+ },
+ { },
+};
+
+static void az_blob_device_get(struct az_blob_device *dev)
+{
+ kref_get(&dev->kref);
+}
+
+static void az_blob_release(struct kref *kref)
+{
+ struct az_blob_device *dev =
+ container_of(kref, struct az_blob_device, kref);
+
+ kfree(dev);
+}
+
+static void az_blob_device_put(struct az_blob_device *dev)
+{
+ kref_put(&dev->kref, az_blob_release);
+}
+
+static void az_blob_on_channel_callback(void *context)
+{
+ struct vmbus_channel *channel = (struct vmbus_channel *)context;
+ const struct vmpacket_descriptor *desc;
+
+ foreach_vmbus_pkt(desc, channel) {
+ struct az_blob_vsp_request_ctx *request_ctx;
+ struct az_blob_vsp_response *response;
+ u64 cmd_rqst = vmbus_request_addr(&channel->requestor,
+ desc->trans_id);
+ if (cmd_rqst == VMBUS_RQST_ERROR) {
+ dev_err(&channel->device_obj->device,
+ "incorrect transaction id %llu\n",
+ desc->trans_id);
+ continue;
+ }
+ request_ctx = (struct az_blob_vsp_request_ctx *)cmd_rqst;
+ response = hv_pkt_data(desc);
+
+ dev_dbg(&channel->device_obj->device,
+ "response for request %pUb status %u "
+ "response_len %u\n",
+ &request_ctx->request->guid, response->error,
+ response->response_len);
+ request_ctx->request->response.status = response->error;
+ request_ctx->request->response.response_len =
+ response->response_len;
+ complete(&request_ctx->wait_vsp);
+ }
+}
+
+static int az_blob_fop_open(struct inode *inode, struct file *file)
+{
+ struct az_blob_device *dev =
+ container_of(file->private_data, struct az_blob_device, misc);
+
+ az_blob_device_get(dev);
+
+ return 0;
+}
+
+static int az_blob_fop_release(struct inode *inode, struct file *file)
+{
+ struct az_blob_device *dev =
+ container_of(file->private_data, struct az_blob_device, misc);
+
+ az_blob_device_put(dev);
+
+ return 0;
+}
+
+static inline bool az_blob_safe_file_access(struct file *file)
+{
+ return file->f_cred == current_cred() && !uaccess_kernel();
+}
+
+/* Pin the user buffer pages into memory for passing to VSP */
+static int get_buffer_pages(int rw, void __user *buffer, u32 buffer_len,
+ struct page ***ppages, size_t *start,
+ size_t *num_pages)
+{
+ struct iovec iov;
+ struct iov_iter iter;
+ int ret;
+ ssize_t result;
+ struct page **pages;
+ int i;
+
+ ret = import_single_range(rw, buffer, buffer_len, &iov, &iter);
+ if (ret)
+ return ret;
+
+ result = iov_iter_get_pages_alloc(&iter, &pages, buffer_len, start);
+ if (result < 0)
+ return result;
+
+ *num_pages = (result + *start + PAGE_SIZE - 1) / PAGE_SIZE;
+ if (result != buffer_len) {
+ for (i = 0; i < *num_pages; i++)
+ put_page(pages[i]);
+ kvfree(pages);
+ return -EFAULT;
+ }
+
+ *ppages = pages;
+ return 0;
+}
+
+static void fill_in_page_buffer(u64 *pfn_array, int *index,
+ struct page **pages, unsigned long num_pages)
+{
+ int i, page_idx = *index;
+
+ for (i = 0; i < num_pages; i++)
+ pfn_array[page_idx++] = page_to_pfn(pages[i]);
+ *index = page_idx;
+}
+
+static void free_buffer_pages(size_t num_pages, struct page **pages)
+{
+ unsigned long i;
+
+ for (i = 0; i < num_pages; i++)
+ if (pages && pages[i])
+ put_page(pages[i]);
+ kvfree(pages);
+}
+
+static long az_blob_ioctl_user_request(struct file *filp, unsigned long arg)
+{
+ struct az_blob_device *dev =
+ container_of(filp->private_data, struct az_blob_device, misc);
+ struct az_blob_request_sync __user *request_user =
+ (struct az_blob_request_sync __user *)arg;
+ struct az_blob_request_sync request;
+ struct az_blob_vsp_request_ctx request_ctx;
+ unsigned long flags;
+ int ret;
+ size_t request_start, request_num_pages = 0;
+ size_t response_start, response_num_pages = 0;
+ size_t data_start, data_num_pages = 0, total_num_pages;
+ struct page **request_pages = NULL, **response_pages = NULL;
+ struct page **data_pages = NULL;
+ struct vmbus_packet_mpb_array *desc;
+ u64 *pfn_array;
+ int desc_size;
+ int page_idx;
+ struct az_blob_vsp_request *vsp_request;
+
+ if (!az_blob_safe_file_access(filp)) {
+ dev_dbg(&dev->device->device,
+ "process %d(%s) changed security contexts after"
+ " opening file descriptor\n",
+ task_tgid_vnr(current), current->comm);
+ return -EACCES;
+ }
+
+ if (copy_from_user(&request, request_user, sizeof(request))) {
+ dev_dbg(&dev->device->device,
+ "don't have permission to user provided buffer\n");
+ return -EFAULT;
+ }
+
+ dev_dbg(&dev->device->device,
+ "az_blob ioctl request guid %pUb timeout %u request_len %u"
+ " response_len %u data_len %u request_buffer %llx "
+ "response_buffer %llx data_buffer %llx\n",
+ &request.guid, request.timeout, request.request_len,
+ request.response_len, request.data_len, request.request_buffer,
+ request.response_buffer, request.data_buffer);
+
+ if (!request.request_len || !request.response_len)
+ return -EINVAL;
+
+ if (request.data_len && request.data_len < request.data_valid)
+ return -EINVAL;
+
+ if (request.data_len > PAGE_SIZE * AZ_BLOB_MAX_PAGES ||
+ request.request_len > PAGE_SIZE * AZ_BLOB_MAX_PAGES ||
+ request.response_len > PAGE_SIZE * AZ_BLOB_MAX_PAGES)
+ return -EINVAL;
+
+ init_completion(&request_ctx.wait_vsp);
+ request_ctx.request = &request;
+
+ ret = get_buffer_pages(READ, (void __user *)request.request_buffer,
+ request.request_len, &request_pages,
+ &request_start, &request_num_pages);
+ if (ret)
+ goto get_user_page_failed;
+
+ ret = get_buffer_pages(READ | WRITE,
+ (void __user *)request.response_buffer,
+ request.response_len, &response_pages,
+ &response_start, &response_num_pages);
+ if (ret)
+ goto get_user_page_failed;
+
+ if (request.data_len) {
+ ret = get_buffer_pages(READ | WRITE,
+ (void __user *)request.data_buffer,
+ request.data_len, &data_pages,
+ &data_start, &data_num_pages);
+ if (ret)
+ goto get_user_page_failed;
+ }
+
+ total_num_pages = request_num_pages + response_num_pages +
+ data_num_pages;
+ if (total_num_pages > AZ_BLOB_MAX_PAGES) {
+ dev_dbg(&dev->device->device,
+ "number of DMA pages %lu buffer exceeding %u\n",
+ total_num_pages, AZ_BLOB_MAX_PAGES);
+ ret = -EINVAL;
+ goto get_user_page_failed;
+ }
+
+ /* Construct a VMBUS packet and send it over to VSP */
+ desc_size = struct_size(desc, range.pfn_array, total_num_pages);
+ desc = kzalloc(desc_size, GFP_KERNEL);
+ vsp_request = kzalloc(sizeof(*vsp_request), GFP_KERNEL);
+ if (!desc || !vsp_request) {
+ kfree(desc);
+ kfree(vsp_request);
+ ret = -ENOMEM;
+ goto get_user_page_failed;
+ }
+
+ desc->range.offset = 0;
+ desc->range.len = total_num_pages * PAGE_SIZE;
+ pfn_array = desc->range.pfn_array;
+ page_idx = 0;
+
+ if (request.data_len) {
+ fill_in_page_buffer(pfn_array, &page_idx, data_pages,
+ data_num_pages);
+ vsp_request->data_buffer_offset = data_start;
+ vsp_request->data_buffer_length = request.data_len;
+ vsp_request->data_buffer_valid = request.data_valid;
+ }
+
+ fill_in_page_buffer(pfn_array, &page_idx, request_pages,
+ request_num_pages);
+ vsp_request->request_buffer_offset = request_start +
+ data_num_pages * PAGE_SIZE;
+ vsp_request->request_buffer_length = request.request_len;
+
+ fill_in_page_buffer(pfn_array, &page_idx, response_pages,
+ response_num_pages);
+ vsp_request->response_buffer_offset = response_start +
+ (data_num_pages + request_num_pages) * PAGE_SIZE;
+ vsp_request->response_buffer_length = request.response_len;
+
+ vsp_request->version = VSP_PROTOCOL_VERSION_V1;
+ vsp_request->timeout_ms = request.timeout;
+ vsp_request->operation_type = AZ_BLOB_DRIVER_USER_REQUEST;
+ guid_copy(&vsp_request->transaction_id, &request.guid);
+
+ spin_lock_irqsave(&dev->request_lock, flags);
+ list_add_tail(&request_ctx.list_device, &dev->pending_requests);
+ spin_unlock_irqrestore(&dev->request_lock, flags);
+
+ ret = vmbus_sendpacket_mpb_desc(dev->device->channel, desc, desc_size,
+ vsp_request, sizeof(*vsp_request),
+ (u64)&request_ctx);
+
+ kfree(desc);
+ kfree(vsp_request);
+ if (ret)
+ goto vmbus_send_failed;
+
+ wait_for_completion(&request_ctx.wait_vsp);
+
+ /*
+ * At this point, the response is already written to request
+ * by VMBUS completion handler, copy them to user-mode buffers
+ * and return to user-mode
+ */
+ if (copy_to_user(&request_user->response, &request.response,
+ sizeof(request.response)))
+ ret = -EFAULT;
+
+vmbus_send_failed:
+
+ spin_lock_irqsave(&dev->request_lock, flags);
+ list_del(&request_ctx.list_device);
+ if (list_empty(&dev->pending_requests))
+ wake_up(&dev->waiting_to_drain);
+ spin_unlock_irqrestore(&dev->request_lock, flags);
+
+get_user_page_failed:
+ free_buffer_pages(request_num_pages, request_pages);
+ free_buffer_pages(response_num_pages, response_pages);
+ free_buffer_pages(data_num_pages, data_pages);
+
+ return ret;
+}
+
+static long az_blob_fop_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ struct az_blob_device *dev =
+ container_of(filp->private_data, struct az_blob_device, misc);
+
+ switch (cmd) {
+ case IOCTL_AZ_BLOB_DRIVER_USER_REQUEST:
+ return az_blob_ioctl_user_request(filp, arg);
+
+ default:
+ dev_dbg(&dev->device->device,
+ "unrecognized IOCTL code %u\n", cmd);
+ }
+
+ return -EINVAL;
+}
+
+static const struct file_operations az_blob_client_fops = {
+ .owner = THIS_MODULE,
+ .open = az_blob_fop_open,
+ .unlocked_ioctl = az_blob_fop_ioctl,
+ .release = az_blob_fop_release,
+};
+
+#if defined(CONFIG_DEBUG_FS)
+static int az_blob_show_pending_requests(struct seq_file *m, void *v)
+{
+ unsigned long flags;
+ struct az_blob_vsp_request_ctx *request_ctx;
+ struct az_blob_device *dev = m->private;
+
+ seq_puts(m, "List of pending requests\n");
+ seq_puts(m, "UUID request_len response_len data_len data_valid "
+ "request_buffer response_buffer data_buffer\n");
+ spin_lock_irqsave(&dev->request_lock, flags);
+ list_for_each_entry(request_ctx, &dev->pending_requests, list_device) {
+ seq_printf(m, "%pUb ", &request_ctx->request->guid);
+ seq_printf(m, "%u ", request_ctx->request->request_len);
+ seq_printf(m, "%u ", request_ctx->request->response_len);
+ seq_printf(m, "%u ", request_ctx->request->data_len);
+ seq_printf(m, "%u ", request_ctx->request->data_valid);
+ seq_printf(m, "%llx ", request_ctx->request->request_buffer);
+ seq_printf(m, "%llx ", request_ctx->request->response_buffer);
+ seq_printf(m, "%llx\n", request_ctx->request->data_buffer);
+ }
+ spin_unlock_irqrestore(&dev->request_lock, flags);
+
+ return 0;
+}
+
+static int az_blob_debugfs_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, az_blob_show_pending_requests,
+ inode->i_private);
+}
+
+static const struct file_operations az_blob_debugfs_fops = {
+ .open = az_blob_debugfs_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release
+};
+#endif
+
+static void az_blob_remove_device(struct az_blob_device *dev)
+{
+ struct dentry *debugfs_root = debugfs_lookup("az_blob", NULL);
+
+ debugfs_remove_recursive(debugfs_root);
+ misc_deregister(&dev->misc);
+}
+
+static int az_blob_create_device(struct az_blob_device *dev)
+{
+ int ret;
+ struct dentry *debugfs_root;
+
+ dev->misc.minor = MISC_DYNAMIC_MINOR,
+ dev->misc.name = "azure_blob",
+ dev->misc.fops = &az_blob_client_fops,
+
+ ret = misc_register(&dev->misc);
+ if (ret)
+ return ret;
+
+ debugfs_root = debugfs_create_dir("az_blob", NULL);
+ debugfs_create_file("pending_requests", 0400, debugfs_root, dev,
+ &az_blob_debugfs_fops);
+
+ return 0;
+}
+
+static int az_blob_connect_to_vsp(struct hv_device *device,
+ struct az_blob_device *dev, u32 ring_size)
+{
+ int ret;
+
+ dev->device = device;
+ device->channel->rqstor_size = AZ_BLOB_QUEUE_DEPTH;
+
+ ret = vmbus_open(device->channel, ring_size, ring_size, NULL, 0,
+ az_blob_on_channel_callback, device->channel);
+
+ if (ret)
+ return ret;
+
+ hv_set_drvdata(device, dev);
+
+ return ret;
+}
+
+static void az_blob_remove_vmbus(struct hv_device *device)
+{
+ hv_set_drvdata(device, NULL);
+ vmbus_close(device->channel);
+}
+
+static int az_blob_probe(struct hv_device *device,
+ const struct hv_vmbus_device_id *dev_id)
+{
+ int ret;
+ struct az_blob_device *dev;
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+ return -ENOMEM;
+
+ spin_lock_init(&dev->request_lock);
+ INIT_LIST_HEAD(&dev->pending_requests);
+ init_waitqueue_head(&dev->waiting_to_drain);
+
+ kref_init(&dev->kref);
+
+ ret = az_blob_connect_to_vsp(device, dev, AZ_BLOB_RING_SIZE);
+ if (ret)
+ goto fail;
+
+ /* create user-mode client library facing device */
+ ret = az_blob_create_device(dev);
+ if (ret) {
+ dev_err(&dev->device->device,
+ "failed to create device ret=%d\n", ret);
+ az_blob_remove_vmbus(device);
+ goto fail;
+ }
+
+ return 0;
+
+fail:
+ az_blob_device_put(dev);
+ return ret;
+}
+
+static int az_blob_remove(struct hv_device *device)
+{
+ struct az_blob_device *dev = hv_get_drvdata(device);
+
+ az_blob_remove_device(dev);
+
+ /*
+ * The Hyper-V VSP still owns the user buffers of those pending
+ * requests. Wait until all the user buffers are released to
+ * the original owner before proceeding to remove the bus device.
+ */
+ wait_event(dev->waiting_to_drain, list_empty(&dev->pending_requests));
+
+ az_blob_remove_vmbus(device);
+ az_blob_device_put(dev);
+
+ return 0;
+}
+
+static struct hv_driver az_blob_drv = {
+ .name = KBUILD_MODNAME,
+ .id_table = id_table,
+ .probe = az_blob_probe,
+ .remove = az_blob_remove,
+ .driver = {
+ .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ },
+};
+
+static int __init az_blob_drv_init(void)
+{
+ return vmbus_driver_register(&az_blob_drv);
+}
+
+static void __exit az_blob_drv_exit(void)
+{
+ vmbus_driver_unregister(&az_blob_drv);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Microsoft Azure Blob driver");
+module_init(az_blob_drv_init);
+module_exit(az_blob_drv_exit);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index d1e59db..ac31362 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -772,6 +772,7 @@ enum vmbus_device_type {
HV_FCOPY,
HV_BACKUP,
HV_DM,
+ HV_AZURE_BLOB,
HV_UNKNOWN,
};

@@ -1350,6 +1351,14 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f)

/*
+ * Azure Blob GUID
+ * {0590b792-db64-45cc-81db-b8d70c577c9e}
+ */
+#define HV_AZURE_BLOB_GUID \
+ .guid = GUID_INIT(0x0590b792, 0xdb64, 0x45cc, 0x81, 0xdb, \
+ 0xb8, 0xd7, 0x0c, 0x57, 0x7c, 0x9e)
+
+/*
* Shutdown GUID
* {0e0b6031-5213-4934-818b-38d90ced39db}
*/
diff --git a/include/uapi/misc/hv_azure_blob.h b/include/uapi/misc/hv_azure_blob.h
new file mode 100644
index 0000000..87a3f77
--- /dev/null
+++ b/include/uapi/misc/hv_azure_blob.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/* Copyright (c) 2021 Microsoft Corporation. */
+
+#ifndef _AZ_BLOB_H
+#define _AZ_BLOB_H
+
+#include <linux/ioctl.h>
+#include <linux/uuid.h>
+#include <linux/types.h>
+
+/* user-mode sync request sent through ioctl */
+struct az_blob_request_sync_response {
+ __u32 status;
+ __u32 response_len;
+};
+
+struct az_blob_request_sync {
+ guid_t guid;
+ __u32 timeout;
+ __u32 request_len;
+ __u32 response_len;
+ __u32 data_len;
+ __u32 data_valid;
+ __aligned_u64 request_buffer;
+ __aligned_u64 response_buffer;
+ __aligned_u64 data_buffer;
+ struct az_blob_request_sync_response response;
+};
+
+#define AZ_BLOB_MAGIC_NUMBER 'R'
+#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
+ _IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
+ struct az_blob_request_sync)
+
+#endif /* define _AZ_BLOB_H */
--
1.8.3.1

2021-08-05 19:39:33

by Bart Van Assche

Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On 8/5/21 12:00 AM, [email protected] wrote:
> From: Long Li <[email protected]>
>
> Azure Blob storage [1] is Microsoft's object storage solution for the
> cloud. Users or client applications can access objects in Blob storage via
> HTTP, from anywhere in the world. Objects in Blob storage are accessible
> via the Azure Storage REST API, Azure PowerShell, Azure CLI, or an Azure
> Storage client library. The Blob storage interface is not designed to be a
> POSIX compliant interface.
>
> Problem: When a client accesses Blob storage via HTTP, it must go through
> the Blob storage boundary of Azure and get to the storage server through
> multiple servers. This is also true for an Azure VM.
>
> Solution: For an Azure VM, the Blob storage access can be accelerated by
> having Azure host execute the Blob storage requests to the backend storage
> server directly.
>
> This driver implements a VSC (Virtual Service Client) for accelerating Blob
> storage access for an Azure VM by communicating with a VSP (Virtual Service
> Provider) on the Azure host. Instead of using HTTP to access the Blob
> storage, an Azure VM passes the Blob storage request to the VSP on the
> Azure host. The Azure host uses its native network to perform Blob storage
> requests to the backend server directly.
>
> This driver doesn’t implement Blob storage APIs. It acts as a fast channel
> to pass user-mode Blob storage requests to the Azure host. The user-mode
> program using this driver implements Blob storage APIs and packages the
> Blob storage request as structured data to VSC. The request data is modeled
> as three user provided buffers (request, response and data buffers), that
> are patterned on the HTTP model used by existing Azure Blob clients. The
> VSC passes those buffers to VSP for Blob storage requests.
>
> The driver optimizes Blob storage access for an Azure VM in two ways:
>
> 1. The Blob storage requests are performed by the Azure host to the Azure
> Blob backend storage server directly.
>
> 2. It allows the Azure host to use transport technologies (e.g. RDMA)
> available to the Azure host but not available to the VM, to reach to Azure
> Blob backend servers.
>
> Test results using this driver for an Azure VM:
> 100 Blob clients running on an Azure VM, each reading 100GB Block Blobs.
> (10 TB total read data)
> With REST API over HTTP: 94.4 mins
> Using this driver: 72.5 mins
> Performance (measured in throughput) gain: 30%.
>
> [1] https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

Is the ioctl interface the only user space interface provided by this
kernel driver? If so, why has this code been implemented as a kernel
driver instead of e.g. a user space library that uses vfio to interact
with a PCIe device? As an example, Qemu supports many different virtio
device types.

Thanks,

Bart.

2021-08-05 20:19:05

by Long Li

Subject: RE: [Patch v5 2/3] Drivers: hv: add Azure Blob driver

> Subject: Re: [Patch v5 2/3] Drivers: hv: add Azure Blob driver
>
> On 8/5/21 12:00 AM, [email protected] wrote:
> > diff --git a/include/uapi/misc/hv_azure_blob.h
> > b/include/uapi/misc/hv_azure_blob.h
> > new file mode 100644
> > index 0000000..87a3f77
> > --- /dev/null
> > +++ b/include/uapi/misc/hv_azure_blob.h
> > @@ -0,0 +1,35 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> > +/* Copyright (c) 2021 Microsoft Corporation. */
> > +
> > +#ifndef _AZ_BLOB_H
> > +#define _AZ_BLOB_H
> > +
> > +#include <linux/ioctl.h>
> > +#include <linux/uuid.h>
> > +#include <linux/types.h>
> > +
> > +/* user-mode sync request sent through ioctl */ struct
> > +az_blob_request_sync_response {
> > + __u32 status;
> > + __u32 response_len;
> > +};
> > +
> > +struct az_blob_request_sync {
> > + guid_t guid;
> > + __u32 timeout;
> > + __u32 request_len;
> > + __u32 response_len;
> > + __u32 data_len;
> > + __u32 data_valid;
> > + __aligned_u64 request_buffer;
> > + __aligned_u64 response_buffer;
> > + __aligned_u64 data_buffer;
> > + struct az_blob_request_sync_response response; };
> > +
> > +#define AZ_BLOB_MAGIC_NUMBER 'R'
> > +#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
> > + _IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
> > + struct az_blob_request_sync)
> > +
> > +#endif /* define _AZ_BLOB_H */
>
> So this driver only supports synchronous requests? Is it likely that users will
> ask for support of an API that supports having multiple requests outstanding
> at the same time without having to create multiple user space threads?

Yes, the current implementation of the driver only supports synchronous requests. Most Blob applications use synchronous mode for Block Blobs.

We will add asynchronous support later.

>
> Thanks,
>
> Bart.
>

2021-08-05 20:20:09

by Greg Kroah-Hartman

Subject: Re: [Patch v5 2/3] Drivers: hv: add Azure Blob driver

On Thu, Aug 05, 2021 at 10:06:03AM -0700, Bart Van Assche wrote:
> On 8/5/21 12:00 AM, [email protected] wrote:
> > diff --git a/include/uapi/misc/hv_azure_blob.h b/include/uapi/misc/hv_azure_blob.h
> > new file mode 100644
> > index 0000000..87a3f77
> > --- /dev/null
> > +++ b/include/uapi/misc/hv_azure_blob.h
> > @@ -0,0 +1,35 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> > +/* Copyright (c) 2021 Microsoft Corporation. */
> > +
> > +#ifndef _AZ_BLOB_H
> > +#define _AZ_BLOB_H
> > +
> > +#include <linux/ioctl.h>
> > +#include <linux/uuid.h>
> > +#include <linux/types.h>
> > +
> > +/* user-mode sync request sent through ioctl */
> > +struct az_blob_request_sync_response {
> > + __u32 status;
> > + __u32 response_len;
> > +};
> > +
> > +struct az_blob_request_sync {
> > + guid_t guid;
> > + __u32 timeout;
> > + __u32 request_len;
> > + __u32 response_len;
> > + __u32 data_len;
> > + __u32 data_valid;
> > + __aligned_u64 request_buffer;
> > + __aligned_u64 response_buffer;
> > + __aligned_u64 data_buffer;
> > + struct az_blob_request_sync_response response;
> > +};
> > +
> > +#define AZ_BLOB_MAGIC_NUMBER 'R'
> > +#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
> > + _IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
> > + struct az_blob_request_sync)
> > +
> > +#endif /* define _AZ_BLOB_H */
>
> So this driver only supports synchronous requests? Is it likely that users
> will ask for support of an API that supports having multiple requests
> outstanding at the same time without having to create multiple user space
> threads?

They seem to want to go around the block layer and reinvent it all from
scratch, which is crazy, so let them have their slow i/o :(


2021-08-05 20:25:44

by Greg Kroah-Hartman

Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Thu, Aug 05, 2021 at 06:24:57PM +0000, Long Li wrote:
> > Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> > access to Microsoft Azure Blob for Azure VM
> >
> > On 8/5/21 12:00 AM, [email protected] wrote:
> > > From: Long Li <[email protected]>
> > >
> > > Azure Blob storage [1] is Microsoft's object storage solution for the
> > > cloud. Users or client applications can access objects in Blob storage
> > > via HTTP, from anywhere in the world. Objects in Blob storage are
> > > accessible via the Azure Storage REST API, Azure PowerShell, Azure
> > > CLI, or an Azure Storage client library. The Blob storage interface is
> > > not designed to be a POSIX compliant interface.
> > >
> > > Problem: When a client accesses Blob storage via HTTP, it must go
> > > through the Blob storage boundary of Azure and get to the storage
> > > server through multiple servers. This is also true for an Azure VM.
> > >
> > > Solution: For an Azure VM, the Blob storage access can be accelerated
> > > by having Azure host execute the Blob storage requests to the backend
> > > storage server directly.
> > >
> > > This driver implements a VSC (Virtual Service Client) for accelerating
> > > Blob storage access for an Azure VM by communicating with a VSP
> > > (Virtual Service
> > > Provider) on the Azure host. Instead of using HTTP to access the Blob
> > > storage, an Azure VM passes the Blob storage request to the VSP on the
> > > Azure host. The Azure host uses its native network to perform Blob
> > > storage requests to the backend server directly.
> > >
> > > This driver doesn't implement Blob storage APIs. It acts as a fast
> > > channel to pass user-mode Blob storage requests to the Azure host. The
> > > user-mode program using this driver implements Blob storage APIs and
> > > packages the Blob storage request as structured data to VSC. The
> > > request data is modeled as three user provided buffers (request,
> > > response and data buffers), that are patterned on the HTTP model used
> > > by existing Azure Blob clients. The VSC passes those buffers to VSP for Blob
> > storage requests.
> > >
> > > The driver optimizes Blob storage access for an Azure VM in two ways:
> > >
> > > 1. The Blob storage requests are performed by the Azure host to the
> > > Azure Blob backend storage server directly.
> > >
> > > 2. It allows the Azure host to use transport technologies (e.g. RDMA)
> > > available to the Azure host but not available to the VM, to reach to
> > > Azure Blob backend servers.
> > >
> > > Test results using this driver for an Azure VM:
> > > 100 Blob clients running on an Azure VM, each reading 100GB Block Blobs.
> > > (10 TB total read data)
> > > With REST API over HTTP: 94.4 mins
> > > Using this driver: 72.5 mins
> > > Performance (measured in throughput) gain: 30%.
> > >
> > > [1]
> > >
> > > https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
> >
> > Is the ioctl interface the only user space interface provided by this kernel
> > driver? If so, why has this code been implemented as a kernel driver instead
> > of e.g. a user space library that uses vfio to interact with a PCIe device? As an
> > example, Qemu supports many different virtio device types.
>
> Hyper-V presents one such device for the whole VM. This device is shared by all processes on the VM. (The test benchmark used 100 processes.)
>
> Hyper-V doesn't support creating one device for each process. We cannot use VFIO in this model.

I still think this "model" is totally broken and wrong overall. Again,
you are creating a custom "block" layer with a character device, forcing
all userspace programs to use a custom library (where is it at?) just to
get their data.

There's a reason the POSIX model is there, why are you all ignoring it?

greg k-h

2021-08-06 00:05:15

by Bart Van Assche

Subject: Re: [Patch v5 2/3] Drivers: hv: add Azure Blob driver

On 8/5/21 12:00 AM, [email protected] wrote:
> diff --git a/include/uapi/misc/hv_azure_blob.h b/include/uapi/misc/hv_azure_blob.h
> new file mode 100644
> index 0000000..87a3f77
> --- /dev/null
> +++ b/include/uapi/misc/hv_azure_blob.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/* Copyright (c) 2021 Microsoft Corporation. */
> +
> +#ifndef _AZ_BLOB_H
> +#define _AZ_BLOB_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/uuid.h>
> +#include <linux/types.h>
> +
> +/* user-mode sync request sent through ioctl */
> +struct az_blob_request_sync_response {
> + __u32 status;
> + __u32 response_len;
> +};
> +
> +struct az_blob_request_sync {
> + guid_t guid;
> + __u32 timeout;
> + __u32 request_len;
> + __u32 response_len;
> + __u32 data_len;
> + __u32 data_valid;
> + __aligned_u64 request_buffer;
> + __aligned_u64 response_buffer;
> + __aligned_u64 data_buffer;
> + struct az_blob_request_sync_response response;
> +};
> +
> +#define AZ_BLOB_MAGIC_NUMBER 'R'
> +#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
> + _IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
> + struct az_blob_request_sync)
> +
> +#endif /* define _AZ_BLOB_H */

So this driver only supports synchronous requests? Is it likely that
users will ask for support of an API that supports having multiple
requests outstanding at the same time without having to create multiple
user space threads?

Thanks,

Bart.


2021-08-06 00:18:31

by Long Li

Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> access to Microsoft Azure Blob for Azure VM
>
> On 8/5/21 12:00 AM, [email protected] wrote:
> > From: Long Li <[email protected]>
> >
> > Azure Blob storage [1] is Microsoft's object storage solution for the
> > cloud. Users or client applications can access objects in Blob storage
> > via HTTP, from anywhere in the world. Objects in Blob storage are
> > accessible via the Azure Storage REST API, Azure PowerShell, Azure
> > CLI, or an Azure Storage client library. The Blob storage interface is
> > not designed to be a POSIX compliant interface.
> >
> > Problem: When a client accesses Blob storage via HTTP, it must go
> > through the Blob storage boundary of Azure and get to the storage
> > server through multiple servers. This is also true for an Azure VM.
> >
> > Solution: For an Azure VM, the Blob storage access can be accelerated
> > by having Azure host execute the Blob storage requests to the backend
> > storage server directly.
> >
> > This driver implements a VSC (Virtual Service Client) for accelerating
> > Blob storage access for an Azure VM by communicating with a VSP
> > (Virtual Service
> > Provider) on the Azure host. Instead of using HTTP to access the Blob
> > storage, an Azure VM passes the Blob storage request to the VSP on the
> > Azure host. The Azure host uses its native network to perform Blob
> > storage requests to the backend server directly.
> >
> > This driver doesn't implement Blob storage APIs. It acts as a fast
> > channel to pass user-mode Blob storage requests to the Azure host. The
> > user-mode program using this driver implements Blob storage APIs and
> > packages the Blob storage request as structured data to VSC. The
> > request data is modeled as three user provided buffers (request,
> > response and data buffers), that are patterned on the HTTP model used
> > by existing Azure Blob clients. The VSC passes those buffers to VSP for Blob
> storage requests.
> >
> > The driver optimizes Blob storage access for an Azure VM in two ways:
> >
> > 1. The Blob storage requests are performed by the Azure host to the
> > Azure Blob backend storage server directly.
> >
> > 2. It allows the Azure host to use transport technologies (e.g. RDMA)
> > available to the Azure host but not available to the VM, to reach to
> > Azure Blob backend servers.
> >
> > Test results using this driver for an Azure VM:
> > 100 Blob clients running on an Azure VM, each reading 100GB Block Blobs.
> > (10 TB total read data)
> > With REST API over HTTP: 94.4 mins
> > Using this driver: 72.5 mins
> > Performance (measured in throughput) gain: 30%.
> >
> > [1]
> >
> > https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
>
> Is the ioctl interface the only user space interface provided by this kernel
> driver? If so, why has this code been implemented as a kernel driver instead
> of e.g. a user space library that uses vfio to interact with a PCIe device? As an
> example, Qemu supports many different virtio device types.

Hyper-V presents one such device for the whole VM. This device is shared by all processes on the VM. (The test benchmark used 100 processes.)

Hyper-V doesn't support creating one device for each process. We cannot use VFIO in this model.

>
> Thanks,
>
> Bart.

2021-08-07 18:32:49

by Long Li

Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> access to Microsoft Azure Blob for Azure VM
>
> On Thu, Aug 05, 2021 at 06:24:57PM +0000, Long Li wrote:
> > > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > > accelerated access to Microsoft Azure Blob for Azure VM
> > >
> > > On 8/5/21 12:00 AM, [email protected] wrote:
> > > > From: Long Li <[email protected]>
> > > >
> > > > Azure Blob storage [1] is Microsoft's object storage solution for
> > > > the cloud. Users or client applications can access objects in Blob
> > > > storage via HTTP, from anywhere in the world. Objects in Blob
> > > > storage are accessible via the Azure Storage REST API, Azure
> > > > PowerShell, Azure CLI, or an Azure Storage client library. The
> > > > Blob storage interface is not designed to be a POSIX compliant
> interface.
> > > >
> > > > Problem: When a client accesses Blob storage via HTTP, it must go
> > > > through the Blob storage boundary of Azure and get to the storage
> > > > server through multiple servers. This is also true for an Azure VM.
> > > >
> > > > Solution: For an Azure VM, the Blob storage access can be
> > > > accelerated by having Azure host execute the Blob storage requests
> > > > to the backend storage server directly.
> > > >
> > > > This driver implements a VSC (Virtual Service Client) for
> > > > accelerating Blob storage access for an Azure VM by communicating
> > > > with a VSP (Virtual Service
> > > > Provider) on the Azure host. Instead of using HTTP to access the
> > > > Blob storage, an Azure VM passes the Blob storage request to the
> > > > VSP on the Azure host. The Azure host uses its native network to
> > > > perform Blob storage requests to the backend server directly.
> > > >
> > > > This driver doesn't implement Blob storage APIs. It acts as a fast
> > > > channel to pass user-mode Blob storage requests to the Azure host.
> > > > The user-mode program using this driver implements Blob storage
> > > > APIs and packages the Blob storage request as structured data to
> > > > VSC. The request data is modeled as three user provided buffers
> > > > (request, response and data buffers), that are patterned on the
> > > > HTTP model used by existing Azure Blob clients. The VSC passes
> > > > those buffers to VSP for Blob
> > > storage requests.
> > > >
> > > > The driver optimizes Blob storage access for an Azure VM in two ways:
> > > >
> > > > 1. The Blob storage requests are performed by the Azure host to
> > > > the Azure Blob backend storage server directly.
> > > >
> > > > 2. It allows the Azure host to use transport technologies (e.g.
> > > > RDMA) available to the Azure host but not available to the VM, to
> > > > reach to Azure Blob backend servers.
> > > >
> > > > Test results using this driver for an Azure VM:
> > > > 100 Blob clients running on an Azure VM, each reading 100GB Block
> Blobs.
> > > > (10 TB total read data)
> > > > With REST API over HTTP: 94.4 mins Using this driver: 72.5 mins
> > > > Performance (measured in throughput) gain: 30%.
> > > >
> > > > [1]
> > > >
> > > > https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
> > >
> > > Is the ioctl interface the only user space interface provided by
> > > this kernel driver? If so, why has this code been implemented as a
> > > kernel driver instead of e.g. a user space library that uses vfio to
> > > interact with a PCIe device? As an example, Qemu supports many
> different virtio device types.
> >
> > The Hyper-V presents one such device for the whole VM. This device is
> > used by all processes on the VM. (The test benchmark used 100
> > processes)
> >
> > Hyper-V doesn't support creating one device for each process. We cannot
> use VFIO in this model.
>
> I still think this "model" is totally broken and wrong overall. Again, you are
> creating a custom "block" layer with a character device, forcing all userspace
> programs to use a custom library (where is it at?) just to get their data.

The Azure Blob library (with source code) is available in the following languages:
Java: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/storage/azure-storage-blob
JavaScript: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/storage/storage-blob
Python: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
Go: https://github.com/Azure/azure-storage-blob-go
.NET: https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/storage/Azure.Storage.Blobs
PHP: https://github.com/Azure/azure-storage-php/tree/master/azure-storage-blob
Ruby: https://github.com/azure/azure-storage-ruby/tree/master/blob
C++: https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/storage#azure-storage-client-library-for-c

>
> There's a reason the POSIX model is there, why are you all ignoring it?

The Azure Blob APIs are not designed to be POSIX compatible. This driver accelerates
Blob access for a Blob client running in an Azure VM; it doesn't attempt to modify
the Blob APIs. Changing the Blob APIs would break existing Blob clients.

Thanks,
Long

2021-08-08 05:15:57

by Greg Kroah-Hartman

Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Sat, Aug 07, 2021 at 06:29:06PM +0000, Long Li wrote:
> > I still think this "model" is totally broken and wrong overall. Again, you are
> > creating a custom "block" layer with a character device, forcing all userspace
> > programs to use a custom library (where is it at?) just to get their data.
>
> The Azure Blob library (with source code) is available in the following languages:
> Java: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/storage/azure-storage-blob
> JavaScript: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/storage/storage-blob
> Python: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
> Go: https://github.com/Azure/azure-storage-blob-go
> .NET: https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/storage/Azure.Storage.Blobs
> PHP: https://github.com/Azure/azure-storage-php/tree/master/azure-storage-blob
> Ruby: https://github.com/azure/azure-storage-ruby/tree/master/blob
> C++: https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/storage#azure-storage-client-library-for-c

And why wasn't this linked to in the changelog here?

In looking at the C code above, where is the interaction with this Linux
driver? I can't seem to find it...

> > There's a reason the POSIX model is there, why are you all ignoring it?
>
> The Azure Blob APIs are not designed to be POSIX compatible. This driver is used
> to accelerate Blob access for a Blob client running in an Azure VM. It doesn't attempt
> to modify the Blob APIs. Changing the Blob APIs will break the existing Blob clients.

There are no Blob clients today on Linux given that this code is not
merged into the kernel yet, so there is nothing to "break".

I still don't see a good reason why this can't just be a block device,
or a filesystem interface and why you need to make this a custom
character device ioctl.

thanks,

greg k-h

2021-08-10 07:37:18

by Long Li

Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> access to Microsoft Azure Blob for Azure VM
>
> On Sat, Aug 07, 2021 at 06:29:06PM +0000, Long Li wrote:
> > > I still think this "model" is totally broken and wrong overall.
> > > Again, you are creating a custom "block" layer with a character
> > > device, forcing all userspace programs to use a custom library (where is it
> at?) just to get their data.
> >
> > The Azure Blob library (with source code) is available in the following languages:
> > Java: https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/storage/azure-storage-blob
> > JavaScript: https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/storage/storage-blob
> > Python: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
> > Go: https://github.com/Azure/azure-storage-blob-go
> > .NET: https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/storage/Azure.Storage.Blobs
> > PHP: https://github.com/Azure/azure-storage-php/tree/master/azure-storage-blob
> > Ruby: https://github.com/azure/azure-storage-ruby/tree/master/blob
> > C++: https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/storage#azure-storage-client-library-for-c
>
> And why wasn't this linked to in the changelog here?
>
> In looking at the C code above, where is the interaction with this Linux driver?
> I can't seem to find it...

Those are existing Blob client libraries. The new code using this driver is still being
tested and has not been released to GitHub.

I'm sorry I misunderstood your request. I'm asking the team to share the new
code for review. I will send the code location for review soon.

Thanks,
Long

2021-09-07 22:26:04

by Michael Kelley (LINUX)

Subject: RE: [Patch v5 2/3] Drivers: hv: add Azure Blob driver

From: [email protected] <[email protected]> Sent: Thursday, August 5, 2021 12:00 AM
>
> Azure Blob provides scalable, secure and shared storage services for the
> internet.
>
> This driver adds support for accelerated access to Azure Blob storage for
> Azure VMs. As an alternative to using REST APIs over HTTP for Blob access,
> an Azure VM can use this driver to send Blob requests to Azure host. Azure
> host uses its native network to perform Blob requests directly to Blob
> servers.
>
> Cc: Jonathan Corbet <[email protected]>
> Cc: "K. Y. Srinivasan" <[email protected]>
> Cc: Haiyang Zhang <[email protected]>
> Cc: Stephen Hemminger <[email protected]>
> Cc: Wei Liu <[email protected]>
> Cc: Dexuan Cui <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Bjorn Andersson <[email protected]>
> Cc: Hans de Goede <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Maximilian Luz <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Ben Widawsky <[email protected]>
> Cc: Jiri Slaby <[email protected]>
> Cc: Andra Paraschiv <[email protected]>
> Cc: Siddharth Gupta <[email protected]>
> Cc: Hannes Reinecke <[email protected]>
> Cc: [email protected]
> Signed-off-by: Long Li <[email protected]>
> ---
> Documentation/userspace-api/ioctl/ioctl-number.rst | 2 +
> drivers/hv/Kconfig | 11 +
> drivers/hv/Makefile | 1 +
> drivers/hv/channel_mgmt.c | 7 +
> drivers/hv/hv_azure_blob.c | 574 +++++++++++++++++++++
> include/linux/hyperv.h | 9 +
> include/uapi/misc/hv_azure_blob.h | 35 ++
> 7 files changed, 639 insertions(+)
> create mode 100644 drivers/hv/hv_azure_blob.c
> create mode 100644 include/uapi/misc/hv_azure_blob.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 9bfc2b5..1ee8c0c7 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -180,6 +180,8 @@ Code Seq# Include File Comments
> 'R' 01 linux/rfkill.h conflict!
> 'R' C0-DF net/bluetooth/rfcomm.h
> 'R' E0 uapi/linux/fsl_mc.h
> +'R' F0-FF uapi/misc/hv_azure_blob.h Microsoft Azure Blob driver
> + <mailto:[email protected]>
> 'S' all linux/cdrom.h conflict!
> 'S' 80-81 scsi/scsi_ioctl.h conflict!
> 'S' 82-FF scsi/scsi.h conflict!
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 66c794d..53bebd0 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -27,4 +27,15 @@ config HYPERV_BALLOON
> help
> Select this option to enable Hyper-V Balloon driver.
>
> +config HYPERV_AZURE_BLOB
> + tristate "Microsoft Azure Blob driver"
> + depends on HYPERV && X86_64
> + help
> + Select this option to enable Microsoft Azure Blob driver.
> +
> + This driver implements a fast datapath over Hyper-V to support
> + accelerated access to Microsoft Azure Blob services.
> + To compile this driver as a module, choose M here. The module will be
> + called azure_blob.
> +
> endmenu
> diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> index 94daf82..2726445 100644
> --- a/drivers/hv/Makefile
> +++ b/drivers/hv/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_HYPERV) += hv_vmbus.o
> obj-$(CONFIG_HYPERV_UTILS) += hv_utils.o
> obj-$(CONFIG_HYPERV_BALLOON) += hv_balloon.o
> +obj-$(CONFIG_HYPERV_AZURE_BLOB) += hv_azure_blob.o
>
> CFLAGS_hv_trace.o = -I$(src)
> CFLAGS_hv_balloon.o = -I$(src)
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 705e95d..3095611 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -154,6 +154,13 @@
> .allowed_in_isolated = false,
> },
>
> + /* Azure Blob */
> + { .dev_type = HV_AZURE_BLOB,
> + HV_AZURE_BLOB_GUID,
> + .perf_device = false,
> + .allowed_in_isolated = false,
> + },
> +
> /* Unknown GUID */
> { .dev_type = HV_UNKNOWN,
> .perf_device = false,
> diff --git a/drivers/hv/hv_azure_blob.c b/drivers/hv/hv_azure_blob.c
> new file mode 100644
> index 0000000..3a1063f
> --- /dev/null
> +++ b/drivers/hv/hv_azure_blob.c
> @@ -0,0 +1,574 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2021 Microsoft Corporation. */
> +
> +#include <uapi/misc/hv_azure_blob.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/cred.h>
> +#include <linux/debugfs.h>
> +#include <linux/pagemap.h>
> +#include <linux/hyperv.h>
> +#include <linux/miscdevice.h>
> +#include <linux/uio.h>
> +
> +struct az_blob_device {
> + struct kref kref;
> +
> + struct hv_device *device;
> + struct miscdevice misc;
> +
> + /* Lock for protecting pending_requests */
> + spinlock_t request_lock;
> + struct list_head pending_requests;
> + wait_queue_head_t waiting_to_drain;
> +};
> +
> +/* VSP messages */
> +enum az_blob_vsp_request_type {
> + AZ_BLOB_DRIVER_REQUEST_FIRST = 0x100,
> + AZ_BLOB_DRIVER_USER_REQUEST = 0x100,
> + AZ_BLOB_DRIVER_REGISTER_BUFFER = 0x101,
> + AZ_BLOB_DRIVER_DEREGISTER_BUFFER = 0x102,
> +};
> +
> +/* VSC->VSP request */
> +struct az_blob_vsp_request {
> + u32 version;
> + u32 timeout_ms;
> + u32 data_buffer_offset;
> + u32 data_buffer_length;
> + u32 data_buffer_valid;
> + u32 operation_type;
> + u32 request_buffer_offset;
> + u32 request_buffer_length;
> + u32 response_buffer_offset;
> + u32 response_buffer_length;
> + guid_t transaction_id;
> +} __packed;
> +
> +/* VSP->VSC response */
> +struct az_blob_vsp_response {
> + u32 length;
> + u32 error;
> + u32 response_len;
> +} __packed;
> +
> +struct az_blob_vsp_request_ctx {
> + struct list_head list_device;
> + struct completion wait_vsp;
> + struct az_blob_request_sync *request;
> +};
> +
> +/* The maximum number of pages we can pass to VSP in a single packet */
> +#define AZ_BLOB_MAX_PAGES 8192
> +
> +/* Ring buffer size in bytes */
> +#define AZ_BLOB_RING_SIZE (128 * 1024)
> +
> +/* System wide device queue depth */
> +#define AZ_BLOB_QUEUE_DEPTH 1024
> +
> +/* The VSP protocol version this driver understands */
> +#define VSP_PROTOCOL_VERSION_V1 0
> +
> +static const struct hv_vmbus_device_id id_table[] = {
> + { HV_AZURE_BLOB_GUID,
> + .driver_data = 0
> + },
> + { },
> +};
> +
> +static void az_blob_device_get(struct az_blob_device *dev)
> +{
> + kref_get(&dev->kref);
> +}
> +
> +static void az_blob_release(struct kref *kref)
> +{
> + struct az_blob_device *dev =
> + container_of(kref, struct az_blob_device, kref);
> +
> + kfree(dev);
> +}
> +
> +static void az_blob_device_put(struct az_blob_device *dev)
> +{
> + kref_put(&dev->kref, az_blob_release);
> +}
> +
> +static void az_blob_on_channel_callback(void *context)
> +{
> + struct vmbus_channel *channel = (struct vmbus_channel *)context;
> + const struct vmpacket_descriptor *desc;
> +
> + foreach_vmbus_pkt(desc, channel) {
> + struct az_blob_vsp_request_ctx *request_ctx;
> + struct az_blob_vsp_response *response;
> + u64 cmd_rqst = vmbus_request_addr(&channel->requestor,
> + desc->trans_id);
> + if (cmd_rqst == VMBUS_RQST_ERROR) {
> + dev_err(&channel->device_obj->device,
> + "incorrect transaction id %llu\n",
> + desc->trans_id);
> + continue;
> + }
> + request_ctx = (struct az_blob_vsp_request_ctx *)cmd_rqst;
> + response = hv_pkt_data(desc);
> +
> + dev_dbg(&channel->device_obj->device,
> + "response for request %pUb status %u "
> + "response_len %u\n",
> + &request_ctx->request->guid, response->error,
> + response->response_len);
> + request_ctx->request->response.status = response->error;
> + request_ctx->request->response.response_len =
> + response->response_len;
> + complete(&request_ctx->wait_vsp);
> + }
> +}
> +
> +static int az_blob_fop_open(struct inode *inode, struct file *file)
> +{
> + struct az_blob_device *dev =
> + container_of(file->private_data, struct az_blob_device, misc);
> +
> + az_blob_device_get(dev);
> +
> + return 0;
> +}
> +
> +static int az_blob_fop_release(struct inode *inode, struct file *file)
> +{
> + struct az_blob_device *dev =
> + container_of(file->private_data, struct az_blob_device, misc);
> +
> + az_blob_device_put(dev);
> +
> + return 0;
> +}
> +
> +static inline bool az_blob_safe_file_access(struct file *file)
> +{
> + return file->f_cred == current_cred() && !uaccess_kernel();
> +}
> +
> +/* Pin the user buffer pages into memory for passing to VSP */
> +static int get_buffer_pages(int rw, void __user *buffer, u32 buffer_len,
> + struct page ***ppages, size_t *start,
> + size_t *num_pages)
> +{
> + struct iovec iov;
> + struct iov_iter iter;
> + int ret;
> + ssize_t result;
> + struct page **pages;
> + int i;
> +
> + ret = import_single_range(rw, buffer, buffer_len, &iov, &iter);
> + if (ret)
> + return ret;
> +
> + result = iov_iter_get_pages_alloc(&iter, &pages, buffer_len, start);
> + if (result < 0)
> + return result;
> +
> + *num_pages = (result + *start + PAGE_SIZE - 1) / PAGE_SIZE;
> + if (result != buffer_len) {
> + for (i = 0; i < *num_pages; i++)
> + put_page(pages[i]);
> + kvfree(pages);
> + return -EFAULT;
> + }
> +
> + *ppages = pages;
> + return 0;
> +}
> +
> +static void fill_in_page_buffer(u64 *pfn_array, int *index,
> + struct page **pages, unsigned long num_pages)
> +{
> + int i, page_idx = *index;
> +
> + for (i = 0; i < num_pages; i++)
> + pfn_array[page_idx++] = page_to_pfn(pages[i]);
> + *index = page_idx;
> +}
> +
> +static void free_buffer_pages(size_t num_pages, struct page **pages)
> +{
> + unsigned long i;
> +
> + for (i = 0; i < num_pages; i++)
> + if (pages && pages[i])
> + put_page(pages[i]);
> + kvfree(pages);
> +}
> +
> +static long az_blob_ioctl_user_request(struct file *filp, unsigned long arg)
> +{
> + struct az_blob_device *dev =
> + container_of(filp->private_data, struct az_blob_device, misc);
> + struct az_blob_request_sync __user *request_user =
> + (struct az_blob_request_sync __user *)arg;
> + struct az_blob_request_sync request;
> + struct az_blob_vsp_request_ctx request_ctx;
> + unsigned long flags;
> + int ret;
> + size_t request_start, request_num_pages = 0;
> + size_t response_start, response_num_pages = 0;
> + size_t data_start, data_num_pages = 0, total_num_pages;
> + struct page **request_pages = NULL, **response_pages = NULL;
> + struct page **data_pages = NULL;
> + struct vmbus_packet_mpb_array *desc;
> + u64 *pfn_array;
> + int desc_size;
> + int page_idx;
> + struct az_blob_vsp_request *vsp_request;
> +
> + if (!az_blob_safe_file_access(filp)) {
> + dev_dbg(&dev->device->device,
> + "process %d(%s) changed security contexts after"
> + " opening file descriptor\n",
> + task_tgid_vnr(current), current->comm);
> + return -EACCES;
> + }
> +
> + if (copy_from_user(&request, request_user, sizeof(request))) {
> + dev_dbg(&dev->device->device,
> + "don't have permission to user provided buffer\n");
> + return -EFAULT;
> + }
> +
> + dev_dbg(&dev->device->device,
> + "az_blob ioctl request guid %pUb timeout %u request_len %u"
> + " response_len %u data_len %u request_buffer %llx "
> + "response_buffer %llx data_buffer %llx\n",
> + &request.guid, request.timeout, request.request_len,
> + request.response_len, request.data_len, request.request_buffer,
> + request.response_buffer, request.data_buffer);
> +
> + if (!request.request_len || !request.response_len)
> + return -EINVAL;
> +
> + if (request.data_len && request.data_len < request.data_valid)
> + return -EINVAL;
> +
> + if (request.data_len > PAGE_SIZE * AZ_BLOB_MAX_PAGES ||
> + request.request_len > PAGE_SIZE * AZ_BLOB_MAX_PAGES ||
> + request.response_len > PAGE_SIZE * AZ_BLOB_MAX_PAGES)
> + return -EINVAL;
> +
> + init_completion(&request_ctx.wait_vsp);
> + request_ctx.request = &request;
> +
> + ret = get_buffer_pages(READ, (void __user *)request.request_buffer,
> + request.request_len, &request_pages,
> + &request_start, &request_num_pages);
> + if (ret)
> + goto get_user_page_failed;
> +
> + ret = get_buffer_pages(READ | WRITE,
> + (void __user *)request.response_buffer,
> + request.response_len, &response_pages,
> + &response_start, &response_num_pages);
> + if (ret)
> + goto get_user_page_failed;
> +
> + if (request.data_len) {
> + ret = get_buffer_pages(READ | WRITE,
> + (void __user *)request.data_buffer,
> + request.data_len, &data_pages,
> + &data_start, &data_num_pages);
> + if (ret)
> + goto get_user_page_failed;
> + }
> +
> + total_num_pages = request_num_pages + response_num_pages +
> + data_num_pages;
> + if (total_num_pages > AZ_BLOB_MAX_PAGES) {
> + dev_dbg(&dev->device->device,
> + "number of DMA pages %lu buffer exceeding %u\n",
> + total_num_pages, AZ_BLOB_MAX_PAGES);
> + ret = -EINVAL;
> + goto get_user_page_failed;
> + }
> +
> + /* Construct a VMBUS packet and send it over to VSP */
> + desc_size = struct_size(desc, range.pfn_array, total_num_pages);
> + desc = kzalloc(desc_size, GFP_KERNEL);
> + vsp_request = kzalloc(sizeof(*vsp_request), GFP_KERNEL);
> + if (!desc || !vsp_request) {
> + kfree(desc);
> + kfree(vsp_request);
> + ret = -ENOMEM;
> + goto get_user_page_failed;
> + }
> +
> + desc->range.offset = 0;
> + desc->range.len = total_num_pages * PAGE_SIZE;
> + pfn_array = desc->range.pfn_array;
> + page_idx = 0;
> +
> + if (request.data_len) {
> + fill_in_page_buffer(pfn_array, &page_idx, data_pages,
> + data_num_pages);
> + vsp_request->data_buffer_offset = data_start;
> + vsp_request->data_buffer_length = request.data_len;
> + vsp_request->data_buffer_valid = request.data_valid;
> + }
> +
> + fill_in_page_buffer(pfn_array, &page_idx, request_pages,
> + request_num_pages);
> + vsp_request->request_buffer_offset = request_start +
> + data_num_pages * PAGE_SIZE;
> + vsp_request->request_buffer_length = request.request_len;
> +
> + fill_in_page_buffer(pfn_array, &page_idx, response_pages,
> + response_num_pages);
> + vsp_request->response_buffer_offset = response_start +
> + (data_num_pages + request_num_pages) * PAGE_SIZE;
> + vsp_request->response_buffer_length = request.response_len;
> +
> + vsp_request->version = VSP_PROTOCOL_VERSION_V1;
> + vsp_request->timeout_ms = request.timeout;
> + vsp_request->operation_type = AZ_BLOB_DRIVER_USER_REQUEST;
> + guid_copy(&vsp_request->transaction_id, &request.guid);
> +
> + spin_lock_irqsave(&dev->request_lock, flags);
> + list_add_tail(&request_ctx.list_device, &dev->pending_requests);
> + spin_unlock_irqrestore(&dev->request_lock, flags);
> +
> + ret = vmbus_sendpacket_mpb_desc(dev->device->channel, desc, desc_size,
> + vsp_request, sizeof(*vsp_request),
> + (u64)&request_ctx);
> +
> + kfree(desc);
> + kfree(vsp_request);
> + if (ret)
> + goto vmbus_send_failed;
> +
> + wait_for_completion(&request_ctx.wait_vsp);
> +
> + /*
> + * At this point, the response is already written to request
> + * by VMBUS completion handler, copy them to user-mode buffers
> + * and return to user-mode
> + */
> + if (copy_to_user(&request_user->response, &request.response,
> + sizeof(request.response)))
> + ret = -EFAULT;
> +
> +vmbus_send_failed:
> +
> + spin_lock_irqsave(&dev->request_lock, flags);
> + list_del(&request_ctx.list_device);
> + if (list_empty(&dev->pending_requests))
> + wake_up(&dev->waiting_to_drain);
> + spin_unlock_irqrestore(&dev->request_lock, flags);
> +
> +get_user_page_failed:
> + free_buffer_pages(request_num_pages, request_pages);
> + free_buffer_pages(response_num_pages, response_pages);
> + free_buffer_pages(data_num_pages, data_pages);
> +
> + return ret;
> +}
> +
> +static long az_blob_fop_ioctl(struct file *filp, unsigned int cmd,
> + unsigned long arg)
> +{
> + struct az_blob_device *dev =
> + container_of(filp->private_data, struct az_blob_device, misc);
> +
> + switch (cmd) {
> + case IOCTL_AZ_BLOB_DRIVER_USER_REQUEST:
> + return az_blob_ioctl_user_request(filp, arg);
> +
> + default:
> + dev_dbg(&dev->device->device,
> + "unrecognized IOCTL code %u\n", cmd);
> + }
> +
> + return -EINVAL;
> +}
> +
> +static const struct file_operations az_blob_client_fops = {
> + .owner = THIS_MODULE,
> + .open = az_blob_fop_open,
> + .unlocked_ioctl = az_blob_fop_ioctl,
> + .release = az_blob_fop_release,
> +};
> +
> +#if defined(CONFIG_DEBUG_FS)
> +static int az_blob_show_pending_requests(struct seq_file *m, void *v)
> +{
> + unsigned long flags;
> + struct az_blob_vsp_request_ctx *request_ctx;
> + struct az_blob_device *dev = m->private;
> +
> + seq_puts(m, "List of pending requests\n");
> + seq_puts(m, "UUID request_len response_len data_len data_valid "
> + "request_buffer response_buffer data_buffer\n");
> + spin_lock_irqsave(&dev->request_lock, flags);
> + list_for_each_entry(request_ctx, &dev->pending_requests, list_device) {
> + seq_printf(m, "%pUb ", &request_ctx->request->guid);
> + seq_printf(m, "%u ", request_ctx->request->request_len);
> + seq_printf(m, "%u ", request_ctx->request->response_len);
> + seq_printf(m, "%u ", request_ctx->request->data_len);
> + seq_printf(m, "%u ", request_ctx->request->data_valid);
> + seq_printf(m, "%llx ", request_ctx->request->request_buffer);
> + seq_printf(m, "%llx ", request_ctx->request->response_buffer);
> + seq_printf(m, "%llx\n", request_ctx->request->data_buffer);
> + }
> + spin_unlock_irqrestore(&dev->request_lock, flags);
> +
> + return 0;
> +}
> +
> +static int az_blob_debugfs_open(struct inode *inode, struct file *file)
> +{
> + return single_open(file, az_blob_show_pending_requests,
> + inode->i_private);
> +}
> +
> +static const struct file_operations az_blob_debugfs_fops = {
> + .open = az_blob_debugfs_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = seq_release
> +};
> +#endif
> +
> +static void az_blob_remove_device(struct az_blob_device *dev)
> +{
> + struct dentry *debugfs_root = debugfs_lookup("az_blob", NULL);
> +
> + debugfs_remove_recursive(debugfs_root);
> + misc_deregister(&dev->misc);
> +}
> +
> +static int az_blob_create_device(struct az_blob_device *dev)
> +{
> + int ret;
> + struct dentry *debugfs_root;
> +
> + dev->misc.minor = MISC_DYNAMIC_MINOR,
> + dev->misc.name = "azure_blob",
> + dev->misc.fops = &az_blob_client_fops,
> +
> + ret = misc_register(&dev->misc);
> + if (ret)
> + return ret;
> +
> + debugfs_root = debugfs_create_dir("az_blob", NULL);
> + debugfs_create_file("pending_requests", 0400, debugfs_root, dev,
> + &az_blob_debugfs_fops);
> +
> + return 0;
> +}
> +
> +static int az_blob_connect_to_vsp(struct hv_device *device,
> + struct az_blob_device *dev, u32 ring_size)
> +{
> + int ret;
> +
> + dev->device = device;
> + device->channel->rqstor_size = AZ_BLOB_QUEUE_DEPTH;
> +
> + ret = vmbus_open(device->channel, ring_size, ring_size, NULL, 0,
> + az_blob_on_channel_callback, device->channel);
> +
> + if (ret)
> + return ret;
> +
> + hv_set_drvdata(device, dev);
> +
> + return ret;
> +}
> +
> +static void az_blob_remove_vmbus(struct hv_device *device)
> +{
> + hv_set_drvdata(device, NULL);
> + vmbus_close(device->channel);
> +}
> +
> +static int az_blob_probe(struct hv_device *device,
> + const struct hv_vmbus_device_id *dev_id)
> +{
> + int ret;
> + struct az_blob_device *dev;
> +
> + dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> + if (!dev)
> + return -ENOMEM;
> +
> + spin_lock_init(&dev->request_lock);
> + INIT_LIST_HEAD(&dev->pending_requests);
> + init_waitqueue_head(&dev->waiting_to_drain);
> +
> + kref_init(&dev->kref);
> +
> + ret = az_blob_connect_to_vsp(device, dev, AZ_BLOB_RING_SIZE);
> + if (ret)
> + goto fail;
> +
> + /* create user-mode client library facing device */
> + ret = az_blob_create_device(dev);
> + if (ret) {
> + dev_err(&dev->device->device,
> + "failed to create device ret=%d\n", ret);
> + az_blob_remove_vmbus(device);
> + goto fail;
> + }
> +
> + return 0;
> +
> +fail:
> + az_blob_device_put(dev);
> + return ret;
> +}
> +
> +static int az_blob_remove(struct hv_device *device)
> +{
> + struct az_blob_device *dev = hv_get_drvdata(device);
> +
> + az_blob_remove_device(dev);
> +
> + /*
> + * The Hyper-V VSP still owns the user buffers of those pending
> + * requests. Wait until all the user buffers are released to
> + * the original owner before proceeding to remove the bus device.
> + */
> + wait_event(dev->waiting_to_drain, list_empty(&dev->pending_requests));

The only scenario in which there could still be pending requests is a
host-initiated rescind operation. The host should also send completions for
any pending requests, but the rescind message and the completions come
through different paths, so the rescind message could be processed before
the completions. So waiting here makes sense.

However, I'm concerned about user processes that have the device open at
the time of the rescind. As stated in my comments on previous versions, I'm
not an expert in this area, but I'm assuming that after az_blob_remove_device()
completes, a user process cannot open the device. But it's not 100% clear to
me if az_blob_remove_device() waits for any existing opens of the device to be
closed. I'm assuming not. (If it does then we're back to the problem of the rescind
processing waiting for user space.) But assuming that existing opens remain in
place, there's a race condition between such a user process and this path.
The user process could try to submit a new request to the VMbus channel
*after* this path has already found the pending_request list to be empty,
and closed and deleted the VMbus device and channel.

I think the easiest solution is to reintroduce the "removing" flag in the
az_blob_device structure. The flag should be set in this path before calling
"wait_event()". And the flag should be checked in az_blob_ioctl_user_request()
*after* the new request has been put on the pending queue but before
referencing the VMbus channel. If az_blob_ioctl_user_request() finds the
flag set at that point, it should clean up and exit without touching the
VMbus channel. Note that the az_blob_device can't go away because the
user process will have bumped the reference count in az_blob_fop_open(),
so manipulating the pending_request queue and doing the wakeup on
waiting_to_drain is safe. And the "removing" flag should be written and
read with WRITE_ONCE() and READ_ONCE() to ensure the compiler and
processor don't reorder the operations.

> +
> + az_blob_remove_vmbus(device);
> + az_blob_device_put(dev);
> +
> + return 0;
> +}
> +
> +static struct hv_driver az_blob_drv = {
> + .name = KBUILD_MODNAME,
> + .id_table = id_table,
> + .probe = az_blob_probe,
> + .remove = az_blob_remove,
> + .driver = {
> + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> + },
> +};
> +
> +static int __init az_blob_drv_init(void)
> +{
> + return vmbus_driver_register(&az_blob_drv);
> +}
> +
> +static void __exit az_blob_drv_exit(void)
> +{
> + vmbus_driver_unregister(&az_blob_drv);
> +}
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Microsoft Azure Blob driver");
> +module_init(az_blob_drv_init);
> +module_exit(az_blob_drv_exit);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index d1e59db..ac31362 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -772,6 +772,7 @@ enum vmbus_device_type {
> HV_FCOPY,
> HV_BACKUP,
> HV_DM,
> + HV_AZURE_BLOB,
> HV_UNKNOWN,
> };
>
> @@ -1350,6 +1351,14 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
> 0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f)
>
> /*
> + * Azure Blob GUID
> + * {0590b792-db64-45cc-81db-b8d70c577c9e}
> + */
> +#define HV_AZURE_BLOB_GUID \
> + .guid = GUID_INIT(0x0590b792, 0xdb64, 0x45cc, 0x81, 0xdb, \
> + 0xb8, 0xd7, 0x0c, 0x57, 0x7c, 0x9e)
> +
> +/*
> * Shutdown GUID
> * {0e0b6031-5213-4934-818b-38d90ced39db}
> */
> diff --git a/include/uapi/misc/hv_azure_blob.h b/include/uapi/misc/hv_azure_blob.h
> new file mode 100644
> index 0000000..87a3f77
> --- /dev/null
> +++ b/include/uapi/misc/hv_azure_blob.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/* Copyright (c) 2021 Microsoft Corporation. */
> +
> +#ifndef _AZ_BLOB_H
> +#define _AZ_BLOB_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/uuid.h>
> +#include <linux/types.h>
> +
> +/* user-mode sync request sent through ioctl */
> +struct az_blob_request_sync_response {
> + __u32 status;
> + __u32 response_len;
> +};
> +
> +struct az_blob_request_sync {
> + guid_t guid;
> + __u32 timeout;
> + __u32 request_len;
> + __u32 response_len;
> + __u32 data_len;
> + __u32 data_valid;
> + __aligned_u64 request_buffer;
> + __aligned_u64 response_buffer;
> + __aligned_u64 data_buffer;
> + struct az_blob_request_sync_response response;
> +};
> +
> +#define AZ_BLOB_MAGIC_NUMBER 'R'
> +#define IOCTL_AZ_BLOB_DRIVER_USER_REQUEST \
> + _IOWR(AZ_BLOB_MAGIC_NUMBER, 0xf0, \
> + struct az_blob_request_sync)
> +
> +#endif /* define _AZ_BLOB_H */
> --
> 1.8.3.1

2021-09-22 23:58:14

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
>
> > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > accelerated access to Microsoft Azure Blob for Azure VM
> >
> > On Sat, Aug 07, 2021 at 06:29:06PM +0000, Long Li wrote:
> > > > I still think this "model" is totally broken and wrong overall.
> > > > Again, you are creating a custom "block" layer with a character
> > > > device, forcing all userspace programs to use a custom library
> > > > (where is it at?) just to get their data.
> > >
> > > The Azure Blob library (with source code) is available in the
> > > following languages:
> > > Java:
> > > https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/storage/azure-storage-blob
> > > JavaScript:
> > > https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/storage/storage-blob
> > > Python:
> > > https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob
> > > Go:
> > > https://github.com/Azure/azure-storage-blob-go
> > > .NET:
> > > https://github.com/Azure/azure-sdk-for-net/tree/main/sdk/storage/Azure.Storage.Blobs
> > > PHP:
> > > https://github.com/Azure/azure-storage-php/tree/master/azure-storage-blob
> > > Ruby:
> > > https://github.com/azure/azure-storage-ruby/tree/master/blob
> > > C++:
> > > https://github.com/Azure/azure-sdk-for-cpp/tree/main/sdk/storage#azure-storage-client-library-for-c
> >
> > And why wasn't this linked to in the changelog here?
> >
> > In looking at the C code above, where is the interaction with this Linux driver?
> > I can't seem to find it...

Greg,

I apologize for the delay. I have attached the Java transport library (a tgz file) in the email. The file is released for review under "The MIT License (MIT)".

The transport library implements the functions needed for reading from a Block Blob using this driver. The function for transporting I/O is Java_com_azure_storage_fastpath_driver_FastpathDriver_read(), defined in "./src/fastpath/jni/fpjar_endpoint.cpp".

In particular, requestParams is a JSON-formatted (REST) payload passed from a Blob application using the Blob API to read from a Block Blob.

For an example of how a Blob application uses the transport library, please see Blob support for Hadoop ABFS:
https://github.com/apache/hadoop/pull/3309/commits/be7d12662e23a13e6cf10cf1fa5e7eb109738e7d

In ABFS, the entry point for Blob I/O is AbfsRestOperation executeRead() in hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java, lines 553 to 564; this function eventually calls into executeFastpathRead() in hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java.

ReadRequestParameters is the data that is passed to requestParams (described above) in the transport library. In this Blob application use-case, ReadRequestParameters has eTag and sessionInfo (sessionToken). Both are defined in this commit and are passed as strings, in JSON format, to the I/O issuing function Java_com_azure_storage_fastpath_driver_FastpathDriver_read() in the transport library.

Thanks,
Long

>
> Those are existing Blob client libraries. The new code using this driver is being
> tested and has not been released to github.
>
> I'm sorry I misunderstood your request. I'm asking the team to share the new
> code for review. I will send the code location for review soon.
>
> Thanks,
> Long


Attachments:
shared_library_for_azure_storage_fastpath_connector_for_java.tgz (40.14 kB)
shared_library_for_azure_storage_fastpath_connector_for_java.tgz

2021-09-30 22:28:10

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated
> access to Microsoft Azure Blob for Azure VM
>
> > Subject: RE: [Patch v5 0/3] Introduce a driver to support host
> > accelerated access to Microsoft Azure Blob for Azure VM
> >
> > > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > > accelerated access to Microsoft Azure Blob for Azure VM
> > >
> > > On Sat, Aug 07, 2021 at 06:29:06PM +0000, Long Li wrote:
> > > > > I still think this "model" is totally broken and wrong overall.
> > > > > Again, you are creating a custom "block" layer with a character
> > > > > device, forcing all userspace programs to use a custom library
> > > > > (where is it at?) just to get their data.
> > > >
> > > > The Azure Blob library (with source code) is available in the
> > > > following languages:
> > > > [Azure SDK repository links for Java, JavaScript, Python, Go, .NET,
> > > > PHP, Ruby and C++ snipped]
> > >
> > > And why wasn't this linked to in the changelog here?
> > >
> > > In looking at the C code above, where is the interaction with this Linux
> driver?
> > > I can't seem to find it...
>
> Greg,
>
> I apologize for the delay. I have attached the Java transport library (a tgz
> file) in the email. The file is released for review under "The MIT License
> (MIT)".
>
> The transport library implements the functions needed for reading from a
> Block Blob using this driver. The function for transporting I/O is
> Java_com_azure_storage_fastpath_driver_FastpathDriver_read(), defined in
> "./src/fastpath/jni/fpjar_endpoint.cpp".
>
> In particular, requestParams is a JSON-formatted (REST) payload passed from
> a Blob application using the Blob API to read from a Block Blob.
>
> For an example of how a Blob application uses the transport library, please
> see Blob support for Hadoop ABFS:
> https://github.com/apache/hadoop/pull/3309/commits/be7d12662e23a13e6cf10cf1fa5e7eb109738e7d
>
> In ABFS, the entry point for Blob I/O is AbfsRestOperation executeRead() in
> hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java,
> lines 553 to 564; this function eventually calls into executeFastpathRead()
> in hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java.
>
> ReadRequestParameters is the data that is passed to requestParams
> (described above) in the transport library. In this Blob application
> use-case, ReadRequestParameters has eTag and sessionInfo (sessionToken).
> Both are defined in this commit and are passed as strings, in JSON format,
> to the I/O issuing function
> Java_com_azure_storage_fastpath_driver_FastpathDriver_read() in the
> transport library.
>
> Thanks,
> Long

Hello Greg,

I have shared the source code of the Blob client using this driver,
and the reason why the Azure Blob driver is not implemented through
POSIX with a file system and the block layer.

Blob APIs are specified in this doc:
https://docs.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api

The semantics of reading data from a Blob are specified in this doc:
https://docs.microsoft.com/en-us/rest/api/storageservices/get-blob

The source code I shared demonstrates how a Blob is read into Hadoop
through ABFS. In general, a Blob client can use any of the optional
request headers specified in the API that are suitable for its specific
application. The Azure Blob service is not designed to be POSIX
compliant. I hope this answers your question about why this driver is
not implemented at the file system or block layer.

Do you have more comments on this driver?

Thanks,
Long


2021-10-01 07:38:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Thu, Sep 30, 2021 at 10:25:12PM +0000, Long Li wrote:
> > Greg,
> >
> > I apologize for the delay. I have attached the Java transport library (a tgz file)
> > in the email. The file is released for review under "The MIT License (MIT)".
> >
> > The transport library implemented functions needed for reading from a Block
> > Blob using this driver. The function for transporting I/O is
> > Java_com_azure_storage_fastpath_driver_FastpathDriver_read(), defined
> > in "./src/fastpath/jni/fpjar_endpoint.cpp".
> >
> > In particular, requestParams is in JSON format (REST) that is passed from a
> > Blob application using Blob API for reading from a Block Blob.
> >
> > For an example of how a Blob application using the transport library, please
> > see Blob support for Hadoop ABFS:
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > ub.com%2Fapache%2Fhadoop%2Fpull%2F3309%2Fcommits%2Fbe7d12662e2
> > 3a13e6cf10cf1fa5e7eb109738e7d&amp;data=04%7C01%7Clongli%40microsof
> > t.com%7C3acb68c5fd6144a1857908d97e247376%7C72f988bf86f141af91ab2d7
> > cd011db47%7C1%7C0%7C637679518802561720%7CUnknown%7CTWFpbGZsb
> > 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> > %3D%7C1000&amp;sdata=6z3ZXPtMC5OvF%2FgrtbcRdFlqzzR1xJNRxE2v2Qrx
> > FL8%3D&amp;reserved=0

Odd url :(

> > In ABFS, the entry point for using Blob I/O is at AbfsRestOperation
> > executeRead() in hadoop-tools/hadoop-
> > azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStr
> > eam.java, from line 553 to 564, this function eventually calls into
> > executeFastpathRead() in hadoop-tools/hadoop-
> > azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.ja
> > va.
> >
> > ReadRequestParameters is the data that is passed to requestParams
> > (described above) in the transport library. In this Blob application use-case,
> > ReadRequestParameters has eTag and sessionInfo (sessionToken). They are
> > both defined in this commit, and are treated as strings passed in JSON format
> > to I/O issuing function
> > Java_com_azure_storage_fastpath_driver_FastpathDriver_read() in the
> > transport library using this driver.
> >
> > Thanks,
> > Long
>
> Hello Greg,
>
> I have shared the source code of the Blob client using this driver, and the reason why the Azure Blob driver is not implemented through POSIX with file system and Block layer.

Please wrap your text lines...

Anyway, no, you showed a client for this interface, but you did not
explain why this could not be implemented using a filesystem and block
layer. Only that it is not what you did.

> Blob APIs are specified in this doc:
> https://docs.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api
>
> The semantic of reading data from Blob is specified in this doc:
> https://docs.microsoft.com/en-us/rest/api/storageservices/get-blob
>
> The source code I shared demonstrated how a Blob is read to Hadoop through ABFS. In general, A Blob client can use any optional request headers specified in the API suitable for its specific application. The Azure Blob service is not designed to be POSIX compliant. I hope this answers your question on why this driver is not implemented at file system or block layer.


Again, you are saying "it is this way because we created it this way",
which does not answer the question of "why were you required to do it
this way", right?

> Do you have more comments on this driver?

Again, please answer _why_ you are going around the block layer and
creating a new api that circumvents all of the interfaces and
protections that the normal file system layer provides. What is lacking
in the existing apis that has required you to create a new one that is
incompatible with everything that has ever existed so far?

thanks,

greg k-h

2021-10-07 19:47:08

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> access to Microsoft Azure Blob for Azure VM
>
> On Thu, Sep 30, 2021 at 10:25:12PM +0000, Long Li wrote:
> > > Greg,
> > >
> > > I apologize for the delay. I have attached the Java transport
> > > library (a tgz file) in the email. The file is released for review under "The
> MIT License (MIT)".
> > >
> > > The transport library implemented functions needed for reading from
> > > a Block Blob using this driver. The function for transporting I/O is
> > > Java_com_azure_storage_fastpath_driver_FastpathDriver_read(),
> > > defined in "./src/fastpath/jni/fpjar_endpoint.cpp".
> > >
> > > In particular, requestParams is in JSON format (REST) that is passed
> > > from a Blob application using Blob API for reading from a Block Blob.
> > >
> > > For an example of how a Blob application using the transport
> > > library, please see Blob support for Hadoop ABFS:
> > >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > th
> > >
> ub.com%2Fapache%2Fhadoop%2Fpull%2F3309%2Fcommits%2Fbe7d12662e2
> > >
> 3a13e6cf10cf1fa5e7eb109738e7d&amp;data=04%7C01%7Clongli%40microsof
> > >
> t.com%7C3acb68c5fd6144a1857908d97e247376%7C72f988bf86f141af91ab2d7
> > >
> cd011db47%7C1%7C0%7C637679518802561720%7CUnknown%7CTWFpbGZsb
> > >
> 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> > > %3D%7C1000&amp;sdata=6z3ZXPtMC5OvF%2FgrtbcRdFlqzzR1xJNRxE2v2
> Qrx
> > > FL8%3D&amp;reserved=0
>
> Odd url :(
>
> > > In ABFS, the entry point for using Blob I/O is at AbfsRestOperation
> > > executeRead() in hadoop-tools/hadoop-
> > >
> azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInput
> > > Str eam.java, from line 553 to 564, this function eventually calls
> > > into
> > > executeFastpathRead() in hadoop-tools/hadoop-
> > > azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClien
> > > t.ja
> > > va.
> > >
> > > ReadRequestParameters is the data that is passed to requestParams
> > > (described above) in the transport library. In this Blob application
> > > use-case, ReadRequestParameters has eTag and sessionInfo
> > > (sessionToken). They are both defined in this commit, and are
> > > treated as strings passed in JSON format to I/O issuing function
> > > Java_com_azure_storage_fastpath_driver_FastpathDriver_read() in the
> > > transport library using this driver.
> > >
> > > Thanks,
> > > Long
> >
> > Hello Greg,
> >
> > I have shared the source code of the Blob client using this driver, and the
> reason why the Azure Blob driver is not implemented through POSIX with file
> system and Block layer.
>
> Please wrap your text lines...
>
> Anyway, no, you showed a client for this interface, but you did not explain
> why this could not be implemented using a filesystem and block layer. Only
> that it is not what you did.
>
> > Blob APIs are specified in this doc:
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
> > .microsoft.com%2Fen-us%2Frest%2Fapi%2Fstorageservices%2Fblob-
> service-r
> > est-
> api&amp;data=04%7C01%7Clongli%40microsoft.com%7C6a51f21c78a3413e63
> >
> 9d08d984ae2c58%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6376
> 867059
> >
> 24012728%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi
> V2luMzIiL
> >
> CJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=ZiWmZ%2FpuQHNn
> dHNmnIWHO
> > yrXPSscNBbR6RvSr%2FCBuEY%3D&amp;reserved=0
> >
> > The semantic of reading data from Blob is specified in this doc:
> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
> > .microsoft.com%2Fen-us%2Frest%2Fapi%2Fstorageservices%2Fget-
> blob&amp;d
> >
> ata=04%7C01%7Clongli%40microsoft.com%7C6a51f21c78a3413e639d08d984a
> e2c5
> >
> 8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63768670592401272
> 8%7CUn
> >
> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
> Ik1haW
> >
> wiLCJXVCI6Mn0%3D%7C1000&amp;sdata=xqUObAdYkFf8efSRuK%2FOXm%2
> BRd%2FCiBI
> > 0BjNfx9YpkGN0%3D&amp;reserved=0
> >
> > The source code I shared demonstrated how a Blob is read to Hadoop
> through ABFS. In general, A Blob client can use any optional request headers
> specified in the API suitable for its specific application. The Azure Blob service
> is not designed to be POSIX compliant. I hope this answers your question on
> why this driver is not implemented at file system or block layer.
>
>
> Again, you are saying "it is this way because we created it this way", which
> does not answer the question of "why were you required to do it this way",
> right?
>
> > Do you have more comments on this driver?
>
> Again, please answer _why_ you are going around the block layer and
> creating a new api that circumvents all of the interfaces and protections that
> the normal file system layer provides. What is lacking in the existing apis that
> has required you to create a new one that is incompatible with everything
> that has ever existed so far?
>
> thanks,
>
> greg k-h

Hello Greg,

Azure Blob is massively scalable and secure object storage designed for
cloud native workloads. Many of its features cannot be implemented
through a POSIX file system. Some of them are described below:

For read and write API calls (for both data and metadata), Azure Blob
supports conditional headers
(https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations).
Every change updates the Last Modified Time (and the ETag) of the
changed blob, and customers can use the If-Modified-Since,
If-Unmodified-Since, If-Match, and If-None-Match conditions. Almost all
APIs support this, since customers require fine-grained and complete
control via these conditions. It is not possible or practical to
implement conditional support in a POSIX file system.
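As a rough sketch of what this looks like from a client's point of view
(the helper name and the ETag value below are made up for illustration,
and no request is actually sent), an optimistic-concurrency read built
on these conditional headers could be:

```python
def build_conditional_get(etag, api_version="2020-10-02"):
    """Build headers for a Get Blob call that only succeeds if the
    blob still carries the given ETag (the service answers 412
    Precondition Failed if the blob changed in the meantime)."""
    return {
        "x-ms-version": api_version,
        "If-Match": etag,  # tie the read to one specific blob version
    }

headers = build_conditional_get('"0x8D8F2EXAMPLE"')
```

A write can use the same pattern (If-Match on Put Blob) to reject lost
updates; POSIX read()/write() has no slot to carry such a condition.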

The Blob API supports multiple write modes with three different blob
types: Block Blobs
(https://docs.microsoft.com/en-us/rest/api/storageservices/operations-on-block-blobs),
Append Blobs, and Page Blobs. Block blobs support very large file sizes
(hundreds of TBs in a single blob), are more optimal for larger blocks,
and provide a two-phase commit protocol, block sharing, and application
control over block identifiers. Block blobs hold both uncommitted and
committed data: the user can stage a series of modifications, then
atomically update the block list to incorporate multiple disjoint
updates in one operation. This is not possible in a POSIX file system.
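The two-phase commit described above (Put Block to stage, Put Block
List to commit) can be modeled with a small toy class; this is only an
in-memory illustration of the semantics, not the service's wire
protocol:

```python
class BlockBlobSketch:
    """Toy model of Put Block / Put Block List: staged (uncommitted)
    blocks become readable only when a block list is committed, and
    the commit atomically replaces the visible block sequence."""

    def __init__(self):
        self.uncommitted = {}     # block id -> staged data
        self.committed = {}       # block id -> committed data
        self.committed_list = []  # ordered ids of the visible blob

    def put_block(self, block_id, data):
        # Stage data under an application-chosen block identifier;
        # readers of the blob do not see it yet.
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        # Atomic commit: the new list may mix freshly staged blocks
        # with previously committed ones (disjoint updates in one op).
        new = {}
        for bid in block_ids:
            new[bid] = self.uncommitted.get(bid, self.committed.get(bid))
        self.committed = new
        self.committed_list = list(block_ids)
        self.uncommitted.clear()

    def read(self):
        return b"".join(self.committed[b] for b in self.committed_list)
```

Until put_block_list() runs, a reader sees none of the staged blocks;
there is no POSIX write mode with that visibility rule.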

Azure Blob supports Blob Tiers
(https://docs.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview).
The "Archive" tier cannot be implemented in a POSIX file system. To
access data in the "Archive" tier, a blob must go through rehydration
(https://docs.microsoft.com/en-us/azure/storage/blobs/archive-rehydrate-overview)
to the "Cool" or "Hot" tier. Note that the customer requirement for
tiers is that they do not change the URI, endpoint, or file/folder they
access at all; the same endpoint and same file path is a hard
requirement. There are no POSIX semantics to describe Archive and
Rehydration while maintaining the same path for the data.
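To make the "same path, different tier" point concrete, a rehydration
is just a Set Blob Tier call against the unchanged blob URL; only
headers carry the tier change (the helper below is an illustrative
sketch, not SDK code):

```python
def build_set_tier(tier, rehydrate_priority=None,
                   api_version="2020-10-02"):
    """Headers for a Set Blob Tier call. Moving an Archive blob back
    to Hot/Cool targets the very same blob URL; the tier transition
    is expressed entirely in request headers."""
    headers = {
        "x-ms-version": api_version,
        "x-ms-access-tier": tier,          # e.g. "Hot", "Cool", "Archive"
    }
    if rehydrate_priority:
        # Only meaningful when rehydrating out of Archive
        headers["x-ms-rehydrate-priority"] = rehydrate_priority
    return headers

headers = build_set_tier("Hot", rehydrate_priority="High")
```

A POSIX path-based model has no way to express "this file is currently
Archive and must be rehydrated" without moving or renaming the file.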

The Azure Blob feature Customer Provided Keys
(https://docs.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys)
provides a different encryption key for data at a per-request level. It
is not possible to inject this into a POSIX file system, and it is a
critical security feature for customers requiring a higher level of
security, such as customers in the finance industry. There is also
file-level metadata that records information about the encryption. Note
that encryption at the file/folder level or higher granularity does not
meet such customers' needs, both because of individual customer
requirements and because of related financial regulations.

The Immutable Storage
(https://docs.microsoft.com/en-us/azure/storage/blobs/immutable-storage-overview)
feature is not possible with a POSIX file system. It provides WORM
(Write Once, Read Many) guarantees on data, where it is impossible
(regardless of access control, i.e. even for the highest-level
administrator/root) to modify or delete data until a certain interval
has passed; it also includes features such as Legal Hold. Note that per
industry and security requirements, the store itself must enforce these
WORM and Legal Hold aspects directly; this cannot be done with access
control mechanisms or by enforcing it at the various endpoints that
access the data.

Blob Index
(https://docs.microsoft.com/en-us/azure/storage/blobs/storage-manage-find-blobs),
which provides multi-dimensional secondary indexing on user-settable
blob tags (metadata), is not possible to accomplish in a POSIX file
system. The indexing engine must integrate with storage access control,
lifecycle retention, and runtime API call conditions, so it cannot be
supported in the file system itself; in other words, it cannot be done
as a side-car or higher-level service without compromising the customer
requirements for Blob Index. The related Blob APIs are Set Blob Tags
(https://docs.microsoft.com/en-us/rest/api/storageservices/set-blob-tags)
and Find Blobs by Tags
(https://docs.microsoft.com/en-us/rest/api/storageservices/find-blobs-by-tags).
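The Set Blob Tags / Find Blobs by Tags pair behaves like a
service-side secondary index; the toy model below shows only the
semantics (equality matches on per-blob tags, resolved by the store
rather than the client), not the real query language or index
implementation:

```python
class TagIndexSketch:
    """Toy model of the index behind Set Blob Tags / Find Blobs by
    Tags: tags are key/value metadata set per blob, and the service
    answers tag queries across all blobs in one call."""

    def __init__(self):
        self.tags = {}  # blob name -> {tag key: tag value}

    def set_blob_tags(self, blob, tags):
        # Replaces the full tag set of one blob, as Set Blob Tags does
        self.tags[blob] = dict(tags)

    def find_blobs_by_tags(self, **wanted):
        # Equality-only match, mirroring a "key = 'value'" where-clause
        return sorted(
            blob for blob, t in self.tags.items()
            if all(t.get(k) == v for k, v in wanted.items())
        )
```

A POSIX directory tree gives one primary hierarchy; it has no native
way to answer "all files tagged project=alpha" without a full scan or
an external side-car index.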

Thanks,

Long

2021-10-08 05:59:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Thu, Oct 07, 2021 at 06:15:25PM +0000, Long Li wrote:
> > Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated
> > access to Microsoft Azure Blob for Azure VM
> >
> > On Thu, Sep 30, 2021 at 10:25:12PM +0000, Long Li wrote:
> > > > Greg,
> > > >
> > > > I apologize for the delay. I have attached the Java transport
> > > > library (a tgz file) in the email. The file is released for review under "The
> > MIT License (MIT)".
> > > >
> > > > The transport library implemented functions needed for reading from
> > > > a Block Blob using this driver. The function for transporting I/O is
> > > > Java_com_azure_storage_fastpath_driver_FastpathDriver_read(),
> > > > defined in "./src/fastpath/jni/fpjar_endpoint.cpp".
> > > >
> > > > In particular, requestParams is in JSON format (REST) that is passed
> > > > from a Blob application using Blob API for reading from a Block Blob.
> > > >
> > > > For an example of how a Blob application using the transport
> > > > library, please see Blob support for Hadoop ABFS:
> > > >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > > th
> > > >
> > ub.com%2Fapache%2Fhadoop%2Fpull%2F3309%2Fcommits%2Fbe7d12662e2
> > > >
> > 3a13e6cf10cf1fa5e7eb109738e7d&amp;data=04%7C01%7Clongli%40microsof
> > > >
> > t.com%7C3acb68c5fd6144a1857908d97e247376%7C72f988bf86f141af91ab2d7
> > > >
> > cd011db47%7C1%7C0%7C637679518802561720%7CUnknown%7CTWFpbGZsb
> > > >
> > 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0
> > > > %3D%7C1000&amp;sdata=6z3ZXPtMC5OvF%2FgrtbcRdFlqzzR1xJNRxE2v2
> > Qrx
> > > > FL8%3D&amp;reserved=0
> >
> > Odd url :(
> >
> > > > In ABFS, the entry point for using Blob I/O is at AbfsRestOperation
> > > > executeRead() in hadoop-tools/hadoop-
> > > >
> > azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInput
> > > > Str eam.java, from line 553 to 564, this function eventually calls
> > > > into
> > > > executeFastpathRead() in hadoop-tools/hadoop-
> > > > azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClien
> > > > t.ja
> > > > va.
> > > >
> > > > ReadRequestParameters is the data that is passed to requestParams
> > > > (described above) in the transport library. In this Blob application
> > > > use-case, ReadRequestParameters has eTag and sessionInfo
> > > > (sessionToken). They are both defined in this commit, and are
> > > > treated as strings passed in JSON format to I/O issuing function
> > > > Java_com_azure_storage_fastpath_driver_FastpathDriver_read() in the
> > > > transport library using this driver.
> > > >
> > > > Thanks,
> > > > Long
> > >
> > > Hello Greg,
> > >
> > > I have shared the source code of the Blob client using this driver, and the
> > reason why the Azure Blob driver is not implemented through POSIX with file
> > system and Block layer.
> >
> > Please wrap your text lines...
> >
> > Anyway, no, you showed a client for this interface, but you did not explain
> > why this could not be implemented using a filesystem and block layer. Only
> > that it is not what you did.
> >
> > > Blob APIs are specified in this doc:
> > >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
> > > .microsoft.com%2Fen-us%2Frest%2Fapi%2Fstorageservices%2Fblob-
> > service-r
> > > est-
> > api&amp;data=04%7C01%7Clongli%40microsoft.com%7C6a51f21c78a3413e63
> > >
> > 9d08d984ae2c58%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6376
> > 867059
> > >
> > 24012728%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi
> > V2luMzIiL
> > >
> > CJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=ZiWmZ%2FpuQHNn
> > dHNmnIWHO
> > > yrXPSscNBbR6RvSr%2FCBuEY%3D&amp;reserved=0
> > >
> > > The semantic of reading data from Blob is specified in this doc:
> > >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs
> > > .microsoft.com%2Fen-us%2Frest%2Fapi%2Fstorageservices%2Fget-
> > blob&amp;d
> > >
> > ata=04%7C01%7Clongli%40microsoft.com%7C6a51f21c78a3413e639d08d984a
> > e2c5
> > >
> > 8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63768670592401272
> > 8%7CUn
> > >
> > known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
> > Ik1haW
> > >
> > wiLCJXVCI6Mn0%3D%7C1000&amp;sdata=xqUObAdYkFf8efSRuK%2FOXm%2
> > BRd%2FCiBI
> > > 0BjNfx9YpkGN0%3D&amp;reserved=0
> > >
> > > The source code I shared demonstrated how a Blob is read to Hadoop
> > through ABFS. In general, A Blob client can use any optional request headers
> > specified in the API suitable for its specific application. The Azure Blob service
> > is not designed to be POSIX compliant. I hope this answers your question on
> > why this driver is not implemented at file system or block layer.
> >
> >
> > Again, you are saying "it is this way because we created it this way", which
> > does not answer the question of "why were you required to do it this way",
> > right?
> >
> > > Do you have more comments on this driver?
> >
> > Again, please answer _why_ you are going around the block layer and
> > creating a new api that circumvents all of the interfaces and protections that
> > the normal file system layer provides. What is lacking in the existing apis that
> > has required you to create a new one that is incompatible with everything
> > that has ever existed so far?
> >
> > thanks,
> >
> > greg k-h
>
> Hello Greg,
>
> Azure Blob is massively scalable and secure object storage designed for cloud native
> workloads. Many of its features are not possible to implement through POSIX file
> system. Please find some of them below:
>
> For read and write API calls (for both data and metadata) Conditional Support
> (https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations)
> is supported by Azure Blob. Every change will result in an update to the Last Modified
> Time (== ETag) of the changed file and customers can use If-Modified-Since, If-Unmodified-Since,
> If-Match, and If-None-Match conditions. Furthermore, almost all APIs support this
> since customers require fine-grained and complete control via these conditions. It
> is not possible/practical to implement Conditional Support in POSIX filesystem.
>
> The Blob API supports multiple write-modes of files with three different blob types:
> Block Blobs (https://docs.microsoft.com/en-us/rest/api/storageservices/operations-on-block-blobs),
> Append Blobs, and Page Blobs. Block Blobs support very large file sizes (hundreds
> of TBs in a single file) and are more optimal for larger blocks, have two-phased
> commit protocol, block sharing, and application control over block identifiers. Block
> blobs support both uncommitted and committed data. Block blobs allow the user to
> stage a series of modifications, then atomically update the block list to incorporate
> multiple disjoint updates in one operation. This is not possible in POSIX filesystem.
>
> Azure Blob supports Blob Tiers (https://docs.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview).
> The "Archive" tier is not possible to implement in POSIX file system. To access data
> from an "Archive" tier, it needs to go through rehydration (https://docs.microsoft.com/en-us/azure/storage/blobs/archive-rehydrate-overview)
> to become "Cool" or "Hot" tier. Note that the customer requirement for tiers is that
> they do not change what URI, endpoint, or file/folder they access at all - same endpoint,
> same file path is a must requirement. There is no POSIX semantics to describe Archive
> and Rehydration, while maintaining the same path for the data.
>
> The Azure Blob feature Customer Provided Keys (https://docs.microsoft.com/en-us/azure/storage/blobs/encryption-customer-provided-keys)
> provides different encryption key for data at a per-request level. It's not possible
> to inject this into POSIX filesystem and it is a critical security feature for customers
> requiring higher level of security such as the Finance industry customers. There
> exists file-level metadata implementation that indicates info about the encryption
> as well. Note that encryption at file/folder level or higher granularity does not
> meet such customers' needs - not just on individual customer requirements but also
> related financial regulations.
>
> The Immutable Storage (https://docs.microsoft.com/en-us/azure/storage/blobs/immutable-storage-overview)
> feature is not possible with POSIX filesystem. This provides WORM (Write-Once Read-Many)
> guarantees on data where it is impossible (regardless of access control, i.e. even
> the highest level administrator/root) to modify/delete data until a certain interval
> has passed; it also includes features such as Legal Hold. Note that per the industry
> and security requirements, the store must enforce these WORM and Legal Hold aspects
> directly, it cannot be done with access control mechanisms or enforcing this at the
> various endpoints that access the data.
>
> Blob Index (https://docs.microsoft.com/en-us/azure/storage/blobs/storage-manage-find-blobs)
> which provides multi-dimensions secondary indexing on user-settable blob tags (metadata)
> is not possible to accomplish in POSIX filesystem. The indexing engine needs to incorporate
> with Storage access control integration, Lifecycle retention integration, runtime
> API call conditions, it's not possible to support in the filesystem itself; in other
> words, it cannot be done as a side-car or higher level service without compromising
> on the customer requirements for Blob Index. Related Blob APIs for this are Set Blob
> Tags (https://docs.microsoft.com/en-us/rest/api/storageservices/set-blob-tags) and
> Find Blob by Tags (https://docs.microsoft.com/en-us/rest/api/storageservices/find-blobs-by-tags).

You are mixing up a lot of different things here all at once.

You are confusing the actions that a file server must implement with how
a userspace interface should look like for an in-kernel functionality.

Not to mention the whole crazy idea of "let's implement our REST api
that used to go over a network connection over an ioctl instead!"
That's the main problem that you need to push back on here.

What is forcing you to put all of this into the kernel in the first
place? What's wrong with the userspace network connection/protocol that
you have today?

Does this mean that we now have to implement all REST apis that people
dream up as ioctl interfaces over a hyperv transport? That would be
insane.

thanks,

greg k-h

2021-10-08 11:12:12

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

Greg Kroah-Hartman <[email protected]> writes:

...
>
> Not to mention the whole crazy idea of "let's implement our REST api
> that used to go over a network connection over an ioctl instead!"
> That's the main problem that you need to push back on here.
>
> What is forcing you to put all of this into the kernel in the first
> place? What's wrong with the userspace network connection/protocol that
> you have today?
>
> Does this mean that we now have to implement all REST apis that people
> dream up as ioctl interfaces over a hyperv transport? That would be
> insane.

As far as I understand, the purpose of the driver is to replace a "slow"
network connection to API endpoint with a "fast" transport over
Vmbus. So what if instead of implementing this new driver we just use
Hyper-V Vsock and move API endpoint to the host?

--
Vitaly

2021-10-08 11:22:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
> Greg Kroah-Hartman <[email protected]> writes:
>
> ...
> >
> > Not to mention the whole crazy idea of "let's implement our REST api
> > that used to go over a network connection over an ioctl instead!"
> > That's the main problem that you need to push back on here.
> >
> > What is forcing you to put all of this into the kernel in the first
> > place? What's wrong with the userspace network connection/protocol that
> > you have today?
> >
> > Does this mean that we now have to implement all REST apis that people
> > dream up as ioctl interfaces over a hyperv transport? That would be
> > insane.
>
> As far as I understand, the purpose of the driver is to replace a "slow"
> network connection to API endpoint with a "fast" transport over
> Vmbus.

Given that the network connection is already over vmbus, how is this
"slow" today? I have yet to see any benchmark numbers anywhere :(

> So what if instead of implementing this new driver we just use
> Hyper-V Vsock and move API endpoint to the host?

What is running on the host in the hypervisor that is supposed to be
handling these requests? Isn't that really on some other guest?

confused,

greg k-h

2021-10-08 13:31:01

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

Greg Kroah-Hartman <[email protected]> writes:

> On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
>> Greg Kroah-Hartman <[email protected]> writes:
>>
>> ...
>> >
>> > Not to mention the whole crazy idea of "let's implement our REST api
>> > that used to go over a network connection over an ioctl instead!"
>> > That's the main problem that you need to push back on here.
>> >
>> > What is forcing you to put all of this into the kernel in the first
>> > place? What's wrong with the userspace network connection/protocol that
>> > you have today?
>> >
>> > Does this mean that we now have to implement all REST apis that people
>> > dream up as ioctl interfaces over a hyperv transport? That would be
>> > insane.
>>
>> As far as I understand, the purpose of the driver is to replace a "slow"
>> network connection to API endpoint with a "fast" transport over
>> Vmbus.
>
> Given that the network connection is already over vmbus, how is this
> "slow" today? I have yet to see any benchmark numbers anywhere :(
>
>> So what if instead of implementing this new driver we just use
>> Hyper-V Vsock and move API endpoint to the host?
>
> What is running on the host in the hypervisor that is supposed to be
> handling these requests? Isn't that really on some other guest?
>

Long,

would it be possible to draw a simple picture for us describing the
backend flow of the feature, both with network connection and with this
new driver? We're struggling to understand which particular bottleneck
the driver is trying to eliminate.

--
Vitaly

2021-10-11 17:52:01

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
>
> Greg Kroah-Hartman <[email protected]> writes:
>
> ...
> >
> > Not to mention the whole crazy idea of "let's implement our REST api
> > that used to go over a network connection over an ioctl instead!"
> > That's the main problem that you need to push back on here.
> >
> > What is forcing you to put all of this into the kernel in the first
> > place? What's wrong with the userspace network connection/protocol
> > that you have today?
> >
> > Does this mean that we now have to implement all REST apis that people
> > dream up as ioctl interfaces over a hyperv transport? That would be
> > insane.
>
> As far as I understand, the purpose of the driver is to replace a "slow"
> network connection to API endpoint with a "fast" transport over Vmbus. So
> what if instead of implementing this new driver we just use Hyper-V Vsock and
> move API endpoint to the host?

Hi Vitaly,

We looked at Hyper-V Vsock when designing this driver. The problem is
that the Hyper-V device model for Vsock can't support the data
throughput and scale needed for Blobs; Vsock is mostly used for
management tasks.

The usage of Blob in Azure justifies a dedicated VMBUS channel (and
sub-channels) for a new VSP/VSC driver.

Thanks,

Long

>
> --
> Vitaly

2021-10-11 17:58:30

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
>
> On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
> > Greg Kroah-Hartman <[email protected]> writes:
> >
> > ...
> > >
> > > Not to mention the whole crazy idea of "let's implement our REST api
> > > that used to go over a network connection over an ioctl instead!"
> > > That's the main problem that you need to push back on here.
> > >
> > > What is forcing you to put all of this into the kernel in the first
> > > place? What's wrong with the userspace network connection/protocol
> > > that you have today?
> > >
> > > Does this mean that we now have to implement all REST apis that
> > > people dream up as ioctl interfaces over a hyperv transport? That
> > > would be insane.
> >
> > As far as I understand, the purpose of the driver is to replace a "slow"
> > network connection to API endpoint with a "fast" transport over Vmbus.
>
> Given that the network connection is already over vmbus, how is this "slow"
> today? I have yet to see any benchmark numbers anywhere :(

Hi Greg,

The problem statement and benchmark numbers are in this patch. Maybe
they got lost in the long discussion, so I'm pasting them again here:

Azure Blob storage [1] is Microsoft's object storage solution for the cloud. Users or client applications can access objects in Blob storage via HTTP, from anywhere in the world. Objects in Blob storage are accessible via the Azure Storage REST API, Azure PowerShell, Azure CLI, or an Azure Storage client library. The Blob storage interface is not designed to be a POSIX compliant interface.

Problem: When a client accesses Blob storage via HTTP, the request must cross Azure's Blob storage boundary and traverse multiple servers before reaching the storage server. This is also true for an Azure VM.

Solution: For an Azure VM, Blob storage access can be accelerated by having the Azure host execute the Blob storage requests directly against the backend storage server.

This driver implements a VSC (Virtual Service Client) for accelerating Blob storage access for an Azure VM by communicating with a VSP (Virtual Service
Provider) on the Azure host. Instead of using HTTP to access the Blob storage, an Azure VM passes the Blob storage request to the VSP on the Azure host. The Azure host uses its native network to perform Blob storage requests to the backend server directly.

This driver doesn't implement Blob storage APIs. It acts as a fast channel to pass user-mode Blob storage requests to the Azure host. The user-mode program using this driver implements the Blob storage APIs and packages each Blob storage request as structured data for the VSC. The request data is modeled as three user-provided buffers (request, response, and data buffers) that are patterned on the HTTP model used by existing Azure Blob clients. The VSC passes those buffers to the VSP for Blob storage requests.

The driver optimizes Blob storage access for an Azure VM in two ways:

1. The Blob storage requests are performed by the Azure host to the Azure Blob backend storage server directly.

2. It allows the Azure host to use transport technologies (e.g. RDMA) that are available to the Azure host but not to the VM, to reach the Azure Blob backend servers.

Test results using this driver for an Azure VM:
100 Blob clients running on an Azure VM, each reading 100GB Block Blobs.
(10 TB total read data)
With REST API over HTTP: 94.4 mins
Using this driver: 72.5 mins
Performance (measured in throughput) gain: 30%.

[1] https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

>
> > So what if instead of implementing this new driver we just use Hyper-V
> > Vsock and move API endpoint to the host?
>
> What is running on the host in the hypervisor that is supposed to be handling
> these requests? Isn't that really on some other guest?

The requests are handled by Hyper-V via a dedicated Blob service on behalf of the VM. The Blob service runs in the Hyper-V root partition and serves all the VMs on that Hyper-V server. The request to the "Blob server" is sent by this service over the native TCP or RDMA transports used by the Azure backend.

Thanks,

Long

>
> confused,
>
> greg k-h

2021-10-11 18:02:15

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
>
> Greg Kroah-Hartman <[email protected]> writes:
>
> > On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
> >> Greg Kroah-Hartman <[email protected]> writes:
> >>
> >> ...
> >> >
> >> > Not to mention the whole crazy idea of "let's implement our REST
> >> > api that used to go over a network connection over an ioctl instead!"
> >> > That's the main problem that you need to push back on here.
> >> >
> >> > What is forcing you to put all of this into the kernel in the first
> >> > place? What's wrong with the userspace network connection/protocol
> >> > that you have today?
> >> >
> >> > Does this mean that we now have to implement all REST apis that
> >> > people dream up as ioctl interfaces over a hyperv transport? That
> >> > would be insane.
> >>
> >> As far as I understand, the purpose of the driver is to replace a "slow"
> >> network connection to API endpoint with a "fast" transport over
> >> Vmbus.
> >
> > Given that the network connection is already over vmbus, how is this
> > "slow" today? I have yet to see any benchmark numbers anywhere :(
> >
> >> So what if instead of implementing this new driver we just use
> >> Hyper-V Vsock and move API endpoint to the host?
> >
> > What is running on the host in the hypervisor that is supposed to be
> > handling these requests? Isn't that really on some other guest?
> >
>
> Long,
>
> would it be possible to draw a simple picture for us describing the backend flow
> of the feature, both with network connection and with this new driver? We're
> struggling to understand which particular bottleneck the driver is trying to
> eliminate.

Thank you for this great suggestion. I'm preparing some diagrams describing the problem and will send them soon.

Thanks,

Long

>
> --
> Vitaly

2021-10-11 18:03:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Mon, Oct 11, 2021 at 05:46:48PM +0000, Long Li wrote:
> > Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access
> > to Microsoft Azure Blob for Azure VM
> >
> > Greg Kroah-Hartman <[email protected]> writes:
> >
> > ...
> > >
> > > Not to mention the whole crazy idea of "let's implement our REST api
> > > that used to go over a network connection over an ioctl instead!"
> > > That's the main problem that you need to push back on here.
> > >
> > > What is forcing you to put all of this into the kernel in the first
> > > place? What's wrong with the userspace network connection/protocol
> > > that you have today?
> > >
> > > Does this mean that we now have to implement all REST apis that people
> > > dream up as ioctl interfaces over a hyperv transport? That would be
> > > insane.
> >
> > As far as I understand, the purpose of the driver is to replace a "slow"
> > network connection to API endpoint with a "fast" transport over Vmbus. So
> > what if instead of implementing this new driver we just use Hyper-V Vsock and
> > move API endpoint to the host?
>
> Hi Vitaly,
>
> We looked at Hyper-V Vsock when designing this driver. The problem is that the Hyper-V device model of Vsock can't support the data throughput and scale needed for Blobs. Vsock is mostly used for management tasks.
>
> The usage of Blob in Azure justifies a dedicated VMBUS channel (and sub-channels) for a new VSP/VSC driver.

Why not just fix the vsock code to handle data better? That way all
users of it would benefit.

thanks,

greg k-h

2021-10-11 19:41:10

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
>
> On Mon, Oct 11, 2021 at 05:46:48PM +0000, Long Li wrote:
> > > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > > accelerated access to Microsoft Azure Blob for Azure VM
> > >
> > > Greg Kroah-Hartman <[email protected]> writes:
> > >
> > > ...
> > > >
> > > > Not to mention the whole crazy idea of "let's implement our REST
> > > > api that used to go over a network connection over an ioctl instead!"
> > > > That's the main problem that you need to push back on here.
> > > >
> > > > What is forcing you to put all of this into the kernel in the
> > > > first place? What's wrong with the userspace network
> > > > connection/protocol that you have today?
> > > >
> > > > Does this mean that we now have to implement all REST apis that
> > > > people dream up as ioctl interfaces over a hyperv transport? That
> > > > would be insane.
> > >
> > > As far as I understand, the purpose of the driver is to replace a "slow"
> > > network connection to API endpoint with a "fast" transport over
> > > Vmbus. So what if instead of implementing this new driver we just
> > > use Hyper-V Vsock and move API endpoint to the host?
> >
> > Hi Vitaly,
> >
> > We looked at Hyper-V Vsock when designing this driver. The problem is that
> the Hyper-V device model of Vsock can't support the data throughput and scale
> needed for Blobs. Vsock is mostly used for management tasks.
> >
> > The usage of Blob in Azure justifies a dedicated VMBUS channel (and sub-
> channels) for a new VSP/VSC driver.
>
> Why not just fix the vsock code to handle data better? That way all users of it
> would benefit.

Hi Greg,

Vsock is connection based. On both the guest and the host, the model is built around a connection over a socket. Internally, Hyper-V creates a device for each connection, but this doesn't scale to a large number of CPUs sharing a socket connection. Hyper-V vsock is designed for configuration management of a VM running on Hyper-V.

The Azure Blob service is not connection based. It doesn't fit the connection-oriented device model that Hyper-V presents through Vsock.

Thanks,

Long

>
> thanks,
>
> greg k-h

2021-10-13 01:01:30

by Long Li

[permalink] [raw]
Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

> Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access
> to Microsoft Azure Blob for Azure VM
>
> > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > accelerated access to Microsoft Azure Blob for Azure VM
> >
> > Greg Kroah-Hartman <[email protected]> writes:
> >
> > > On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
> > >> Greg Kroah-Hartman <[email protected]> writes:
> > >>
> > >> ...
> > >> >
> > >> > Not to mention the whole crazy idea of "let's implement our REST
> > >> > api that used to go over a network connection over an ioctl instead!"
> > >> > That's the main problem that you need to push back on here.
> > >> >
> > >> > What is forcing you to put all of this into the kernel in the
> > >> > first place? What's wrong with the userspace network
> > >> > connection/protocol that you have today?
> > >> >
> > >> > Does this mean that we now have to implement all REST apis that
> > >> > people dream up as ioctl interfaces over a hyperv transport?
> > >> > That would be insane.
> > >>
> > >> As far as I understand, the purpose of the driver is to replace a "slow"
> > >> network connection to API endpoint with a "fast" transport over
> > >> Vmbus.
> > >
> > > Given that the network connection is already over vmbus, how is this
> > > "slow" today? I have yet to see any benchmark numbers anywhere :(
> > >
> > >> So what if instead of implementing this new driver we just use
> > >> Hyper-V Vsock and move API endpoint to the host?
> > >
> > > What is running on the host in the hypervisor that is supposed to be
> > > handling these requests? Isn't that really on some other guest?
> > >
> >
> > Long,
> >
> > would it be possible to draw a simple picture for us describing the
> > backend flow of the feature, both with network connection and with
> > this new driver? We're struggling to understand which particular
> > bottleneck the driver is trying to eliminate.
>
> Thank you for this great suggestion. I'm preparing some diagrams for describing
> the problem. I will be sending them soon.
>

Please find attached the pictures describing the problem and the data flow before and after this driver.

existing_blob_access.jpg shows the current method of accessing Blob through HTTP.
fastpath_blob_access.jpg shows the access to Blob through this driver.

This driver enables the Blob application to use the host's native network to access the Data Block server directly. The host networks are the backbone of Azure. They are RDMA capable, but are not available to VMs due to security requirements.

Thanks,

Long


Attachments:
existing_blob_access.jpg (66.13 kB)
fastpath_blob_access.jpg (75.75 kB)

2021-10-13 07:05:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [Patch v5 0/3] Introduce a driver to support host accelerated access to Microsoft Azure Blob for Azure VM

On Wed, Oct 13, 2021 at 12:58:55AM +0000, Long Li wrote:
> > Subject: RE: [Patch v5 0/3] Introduce a driver to support host accelerated access
> > to Microsoft Azure Blob for Azure VM
> >
> > > Subject: Re: [Patch v5 0/3] Introduce a driver to support host
> > > accelerated access to Microsoft Azure Blob for Azure VM
> > >
> > > Greg Kroah-Hartman <[email protected]> writes:
> > >
> > > > On Fri, Oct 08, 2021 at 01:11:02PM +0200, Vitaly Kuznetsov wrote:
> > > >> Greg Kroah-Hartman <[email protected]> writes:
> > > >>
> > > >> ...
> > > >> >
> > > >> > Not to mention the whole crazy idea of "let's implement our REST
> > > >> > api that used to go over a network connection over an ioctl instead!"
> > > >> > That's the main problem that you need to push back on here.
> > > >> >
> > > >> > What is forcing you to put all of this into the kernel in the
> > > >> > first place? What's wrong with the userspace network
> > > >> > connection/protocol that you have today?
> > > >> >
> > > >> > Does this mean that we now have to implement all REST apis that
> > > >> > people dream up as ioctl interfaces over a hyperv transport?
> > > >> > That would be insane.
> > > >>
> > > >> As far as I understand, the purpose of the driver is to replace a "slow"
> > > >> network connection to API endpoint with a "fast" transport over
> > > >> Vmbus.
> > > >
> > > > Given that the network connection is already over vmbus, how is this
> > > > "slow" today? I have yet to see any benchmark numbers anywhere :(
> > > >
> > > >> So what if instead of implementing this new driver we just use
> > > >> Hyper-V Vsock and move API endpoint to the host?
> > > >
> > > > What is running on the host in the hypervisor that is supposed to be
> > > > handling these requests? Isn't that really on some other guest?
> > > >
> > >
> > > Long,
> > >
> > > would it be possible to draw a simple picture for us describing the
> > > backend flow of the feature, both with network connection and with
> > > this new driver? We're struggling to understand which particular
> > > bottleneck the driver is trying to eliminate.
> >
> > Thank you for this great suggestion. I'm preparing some diagrams for describing
> > the problem. I will be sending them soon.
> >
>
> Please find the pictures describing the problem and data flow before and after this driver.
>
> existing_blob_access.jpg shows the current method of accessing Blob through HTTP.
> fastpath_blob_access.jpg shows the access to Blob through this driver.
>
> This driver enables the Blob application to use the host native network to get access directly to the Data Block server. The host networks are the backbones of Azure. The networks are RDMA capable, but they are not available for use by VMs due to security requirements.

Please wrap your lines when responding...

Anyway, this shows that you are trying to work around a crazy network
design by adding lots of kernel code and a custom user/kernel api.

Please just fix your network design instead, put the network that you
want this "blob api" on the RDMA network so that you will get the same
throughput as this odd one-off ioctl will provide.

That way you also get the proper flow control, error handling,
encryption, and all the other goodness that a real network connection
provides you. Instead of this custom, one-off, fragile, ioctl command
that requires custom userspace code to handle and needs to be maintained
for the next 40+ years by yourself.

Do it right please, do not force the kernel and userspace to do foolish
things because your network designers do not want to do the real work
here.

thanks,

greg k-h