2022-04-07 15:48:52

by Andrea Parri

[permalink] [raw]
Subject: [PATCH 0/6] PCI: hv: VMbus requestor and related fixes

Changes since RFC [1]:

- Rebase on hyperv-fixes (although more likely -next material)
- Fix handling of messages with transaction ID of 0
- Avoid reading trans_id from the ring buffer
- Move hv_pci_request_addr_match()&co. to VMbus
- Introduce primitives to lock and unlock the requestor
- Improve comments and log messages

The series got bigger (mainly due to a certain re-factoring in VMbus): the
hv_pci changes are in patches #2 and #6. Let me stress that the "PCI: hv"
patches and the "Drivers: hv: vmbus" patches below are inter-dependent.

[1] https://lkml.kernel.org/r/[email protected]

Thanks,
Andrea

Andrea Parri (Microsoft) (6):
Drivers: hv: vmbus: Fix handling of messages with transaction ID of
zero
PCI: hv: Use vmbus_requestor to generate transaction IDs for VMbus
hardening
Drivers: hv: vmbus: Introduce vmbus_sendpacket_getid()
Drivers: hv: vmbus: Introduce vmbus_request_addr_match()
Drivers: hv: vmbus: Introduce {lock,unlock}_requestor()
PCI: hv: Fix synchronization between channel callback and
hv_compose_msi_msg()

drivers/hv/channel.c | 116 +++++++++++++++++++++-------
drivers/hv/hyperv_vmbus.h | 2 +-
drivers/hv/ring_buffer.c | 14 +++-
drivers/pci/controller/pci-hyperv.c | 68 ++++++++++++----
include/linux/hyperv.h | 27 +++++++
5 files changed, 179 insertions(+), 48 deletions(-)

--
2.25.1


2022-04-07 17:05:04

by Andrea Parri

[permalink] [raw]
Subject: [PATCH 2/6] PCI: hv: Use vmbus_requestor to generate transaction IDs for VMbus hardening

Currently, pointers to guest memory are passed to Hyper-V as transaction
IDs in hv_pci. In the face of errors or malicious behavior in Hyper-V,
hv_pci should not expose or trust the transaction IDs returned by
Hyper-V to be valid guest memory addresses. Instead, use small integers
generated by vmbus_requestor as request (transaction) IDs.

Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
---
drivers/pci/controller/pci-hyperv.c | 39 +++++++++++++++++++++--------
1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 88b3b56d05228..c1322ac37cda9 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -91,6 +91,13 @@ static enum pci_protocol_version_t pci_protocol_versions[] = {
/* space for 32bit serial number as string */
#define SLOT_NAME_SIZE 11

+/*
+ * Size of requestor for VMbus; the value is based on the observation
+ * that having more than one request outstanding is 'rare', and so 64
+ * should be generous in ensuring that we don't ever run out.
+ */
+#define HV_PCI_RQSTOR_SIZE 64
+
/*
* Message Types
*/
@@ -1407,7 +1414,7 @@ static void hv_int_desc_free(struct hv_pci_dev *hpdev,
int_pkt->wslot.slot = hpdev->desc.win_slot.slot;
int_pkt->int_desc = *int_desc;
vmbus_sendpacket(hpdev->hbus->hdev->channel, int_pkt, sizeof(*int_pkt),
- (unsigned long)&ctxt.pkt, VM_PKT_DATA_INBAND, 0);
+ 0, VM_PKT_DATA_INBAND, 0);
kfree(int_desc);
}

@@ -2649,7 +2656,7 @@ static void hv_eject_device_work(struct work_struct *work)
ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
vmbus_sendpacket(hbus->hdev->channel, ejct_pkt,
- sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
+ sizeof(*ejct_pkt), 0,
VM_PKT_DATA_INBAND, 0);

/* For the get_pcichild() in hv_pci_eject_device() */
@@ -2696,8 +2703,9 @@ static void hv_pci_onchannelcallback(void *context)
const int packet_size = 0x100;
int ret;
struct hv_pcibus_device *hbus = context;
+ struct vmbus_channel *chan = hbus->hdev->channel;
u32 bytes_recvd;
- u64 req_id;
+ u64 req_id, req_addr;
struct vmpacket_descriptor *desc;
unsigned char *buffer;
int bufferlen = packet_size;
@@ -2715,8 +2723,8 @@ static void hv_pci_onchannelcallback(void *context)
return;

while (1) {
- ret = vmbus_recvpacket_raw(hbus->hdev->channel, buffer,
- bufferlen, &bytes_recvd, &req_id);
+ ret = vmbus_recvpacket_raw(chan, buffer, bufferlen,
+ &bytes_recvd, &req_id);

if (ret == -ENOBUFS) {
kfree(buffer);
@@ -2743,11 +2751,14 @@ static void hv_pci_onchannelcallback(void *context)
switch (desc->type) {
case VM_PKT_COMP:

- /*
- * The host is trusted, and thus it's safe to interpret
- * this transaction ID as a pointer.
- */
- comp_packet = (struct pci_packet *)req_id;
+ req_addr = chan->request_addr_callback(chan, req_id);
+ if (req_addr == VMBUS_RQST_ERROR) {
+ dev_warn_ratelimited(&hbus->hdev->device,
+ "Invalid transaction ID %llx\n",
+ req_id);
+ break;
+ }
+ comp_packet = (struct pci_packet *)req_addr;
response = (struct pci_response *)buffer;
comp_packet->completion_func(comp_packet->compl_ctxt,
response,
@@ -3428,6 +3439,10 @@ static int hv_pci_probe(struct hv_device *hdev,
goto free_dom;
}

+ hdev->channel->next_request_id_callback = vmbus_next_request_id;
+ hdev->channel->request_addr_callback = vmbus_request_addr;
+ hdev->channel->rqstor_size = HV_PCI_RQSTOR_SIZE;
+
ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
hv_pci_onchannelcallback, hbus);
if (ret)
@@ -3758,6 +3773,10 @@ static int hv_pci_resume(struct hv_device *hdev)

hbus->state = hv_pcibus_init;

+ hdev->channel->next_request_id_callback = vmbus_next_request_id;
+ hdev->channel->request_addr_callback = vmbus_request_addr;
+ hdev->channel->rqstor_size = HV_PCI_RQSTOR_SIZE;
+
ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
hv_pci_onchannelcallback, hbus);
if (ret)
--
2.25.1

2022-04-07 20:11:10

by Andrea Parri

[permalink] [raw]
Subject: [PATCH 6/6] PCI: hv: Fix synchronization between channel callback and hv_compose_msi_msg()

Dexuan wrote:

"[...] when we disable AccelNet, the host PCI VSP driver sends a
PCI_EJECT message first, and the channel callback may set
hpdev->state to hv_pcichild_ejecting on a different CPU. This can
cause hv_compose_msi_msg() to exit from the loop and 'return', and
the on-stack variable 'ctxt' is invalid. Now, if the response
message from the host arrives, the channel callback will try to
access the invalid 'ctxt' variable, and this may cause a crash."

Schematically:

Hyper-V sends PCI_EJECT msg
hv_pci_onchannelcallback()
state = hv_pcichild_ejecting
hv_compose_msi_msg()
alloc and init comp_pkt
state == hv_pcichild_ejecting
Hyper-V sends VM_PKT_COMP msg
hv_pci_onchannelcallback()
retrieve address of comp_pkt
'free' comp_pkt and return
comp_pkt->completion_func()

Dexuan also showed how the crash can be triggered after introducing
suitable delays in the driver code, thus validating the 'assumption'
that the host can still normally respond to the guest's compose_msi
request after the host has started to eject the PCI device.

Fix the synchronization by leveraging the requestor lock as follows:

- Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while
holding the requestor lock) associated to the completion packet.

- Retrieve the address *and call ->completion_func() within a same
(requestor) critical section in hv_pci_onchannelcallback().

Fixes: de0aa7b2f97d3 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Reported-by: Wei Hu <[email protected]>
Reported-by: Dexuan Cui <[email protected]>
Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
---
The "Fixes:" tag is mainly a reference: a back-port would depend
on the entire series (which, in turn, shouldn't be backported to
commits preceding bf5fd8cae3c8f).

drivers/pci/controller/pci-hyperv.c | 33 +++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index c1322ac37cda9..f1d794f8a5ef1 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1695,7 +1695,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
struct pci_create_interrupt3 v3;
} int_pkts;
} __packed ctxt;
-
+ u64 trans_id;
u32 size;
int ret;

@@ -1757,10 +1757,10 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
goto free_int_desc;
}

- ret = vmbus_sendpacket(hpdev->hbus->hdev->channel, &ctxt.int_pkts,
- size, (unsigned long)&ctxt.pci_pkt,
- VM_PKT_DATA_INBAND,
- VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
+ ret = vmbus_sendpacket_getid(hpdev->hbus->hdev->channel, &ctxt.int_pkts,
+ size, (unsigned long)&ctxt.pci_pkt,
+ &trans_id, VM_PKT_DATA_INBAND,
+ VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
if (ret) {
dev_err(&hbus->hdev->device,
"Sending request for interrupt failed: 0x%x",
@@ -1839,6 +1839,15 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)

enable_tasklet:
tasklet_enable(&channel->callback_event);
+ /*
+ * The completion packet on the stack becomes invalid after 'return';
+ * remove the ID from the VMbus requestor if the identifier is still
+ * mapped to/associated with the packet. (The identifier could have
+ * been 're-used', i.e., already removed and (re-)mapped.)
+ *
+ * Cf. hv_pci_onchannelcallback().
+ */
+ vmbus_request_addr_match(channel, trans_id, (unsigned long)&ctxt.pci_pkt);
free_int_desc:
kfree(int_desc);
drop_reference:
@@ -2717,6 +2726,7 @@ static void hv_pci_onchannelcallback(void *context)
struct pci_dev_inval_block *inval;
struct pci_dev_incoming *dev_message;
struct hv_pci_dev *hpdev;
+ unsigned long flags;

buffer = kmalloc(bufferlen, GFP_ATOMIC);
if (!buffer)
@@ -2751,8 +2761,11 @@ static void hv_pci_onchannelcallback(void *context)
switch (desc->type) {
case VM_PKT_COMP:

- req_addr = chan->request_addr_callback(chan, req_id);
+ lock_requestor(chan, flags);
+ req_addr = __vmbus_request_addr_match(chan, req_id,
+ VMBUS_RQST_ADDR_ANY);
if (req_addr == VMBUS_RQST_ERROR) {
+ unlock_requestor(chan, flags);
dev_warn_ratelimited(&hbus->hdev->device,
"Invalid transaction ID %llx\n",
req_id);
@@ -2760,9 +2773,17 @@ static void hv_pci_onchannelcallback(void *context)
}
comp_packet = (struct pci_packet *)req_addr;
response = (struct pci_response *)buffer;
+ /*
+ * Call ->completion_func() within the critical section to make
+ * sure that the packet pointer is still valid during the call:
+ * here 'valid' means that there's a task still waiting for the
+ * completion, and that the packet data is still on the waiting
+ * task's stack. Cf. hv_compose_msi_msg().
+ */
comp_packet->completion_func(comp_packet->compl_ctxt,
response,
bytes_recvd);
+ unlock_requestor(chan, flags);
break;

case VM_PKT_DATA_INBAND:
--
2.25.1

2022-04-07 20:15:26

by Andrea Parri

[permalink] [raw]
Subject: [PATCH 5/6] Drivers: hv: vmbus: Introduce {lock,unlock}_requestor()

To abtract the lock and unlock operations on the requestor spin lock.
The helpers will come in handy in hv_pci.

No functional change.

Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
---
drivers/hv/channel.c | 11 +++++------
include/linux/hyperv.h | 15 +++++++++++++++
2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 49f10a603a091..56f7e06c673e4 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1252,12 +1252,12 @@ u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr)
if (!channel->rqstor_size)
return VMBUS_NO_RQSTOR;

- spin_lock_irqsave(&rqstor->req_lock, flags);
+ lock_requestor(channel, flags);
current_id = rqstor->next_request_id;

/* Requestor array is full */
if (current_id >= rqstor->size) {
- spin_unlock_irqrestore(&rqstor->req_lock, flags);
+ unlock_requestor(channel, flags);
return VMBUS_RQST_ERROR;
}

@@ -1267,7 +1267,7 @@ u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr)
/* The already held spin lock provides atomicity */
bitmap_set(rqstor->req_bitmap, current_id, 1);

- spin_unlock_irqrestore(&rqstor->req_lock, flags);
+ unlock_requestor(channel, flags);

/*
* Cannot return an ID of 0, which is reserved for an unsolicited
@@ -1327,13 +1327,12 @@ EXPORT_SYMBOL_GPL(__vmbus_request_addr_match);
u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
u64 rqst_addr)
{
- struct vmbus_requestor *rqstor = &channel->requestor;
unsigned long flags;
u64 req_addr;

- spin_lock_irqsave(&rqstor->req_lock, flags);
+ lock_requestor(channel, flags);
req_addr = __vmbus_request_addr_match(channel, trans_id, rqst_addr);
- spin_unlock_irqrestore(&rqstor->req_lock, flags);
+ unlock_requestor(channel, flags);

return req_addr;
}
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index c77d78f34b96a..015e4ceb43029 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1042,6 +1042,21 @@ struct vmbus_channel {
u32 max_pkt_size;
};

+#define lock_requestor(channel, flags) \
+do { \
+ struct vmbus_requestor *rqstor = &(channel)->requestor; \
+ \
+ spin_lock_irqsave(&rqstor->req_lock, flags); \
+} while (0)
+
+static __always_inline void unlock_requestor(struct vmbus_channel *channel,
+ unsigned long flags)
+{
+ struct vmbus_requestor *rqstor = &channel->requestor;
+
+ spin_unlock_irqrestore(&rqstor->req_lock, flags);
+}
+
u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr);
u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
u64 rqst_addr);
--
2.25.1

2022-04-07 20:23:00

by Andrea Parri

[permalink] [raw]
Subject: [PATCH 3/6] Drivers: hv: vmbus: Introduce vmbus_sendpacket_getid()

The function can be used to send a VMbus packet and retrieve the
corresponding transaction ID. It will be used by hv_pci.

No functional change.

Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
---
drivers/hv/channel.c | 38 ++++++++++++++++++++++++++++++++------
drivers/hv/hyperv_vmbus.h | 2 +-
drivers/hv/ring_buffer.c | 14 +++++++++++---
include/linux/hyperv.h | 7 +++++++
4 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 20fc8d50a0398..585a8084848bf 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1022,11 +1022,13 @@ void vmbus_close(struct vmbus_channel *channel)
EXPORT_SYMBOL_GPL(vmbus_close);

/**
- * vmbus_sendpacket() - Send the specified buffer on the given channel
+ * vmbus_sendpacket_getid() - Send the specified buffer on the given channel
* @channel: Pointer to vmbus_channel structure
* @buffer: Pointer to the buffer you want to send the data from.
* @bufferlen: Maximum size of what the buffer holds.
* @requestid: Identifier of the request
+ * @trans_id: Identifier of the transaction associated to this request, if
+ * the send is successful; undefined, otherwise.
* @type: Type of packet that is being sent e.g. negotiate, time
* packet etc.
* @flags: 0 or VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED
@@ -1036,8 +1038,8 @@ EXPORT_SYMBOL_GPL(vmbus_close);
*
* Mainly used by Hyper-V drivers.
*/
-int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
- u32 bufferlen, u64 requestid,
+int vmbus_sendpacket_getid(struct vmbus_channel *channel, void *buffer,
+ u32 bufferlen, u64 requestid, u64 *trans_id,
enum vmbus_packet_type type, u32 flags)
{
struct vmpacket_descriptor desc;
@@ -1063,7 +1065,31 @@ int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);

- return hv_ringbuffer_write(channel, bufferlist, num_vecs, requestid);
+ return hv_ringbuffer_write(channel, bufferlist, num_vecs, requestid, trans_id);
+}
+EXPORT_SYMBOL(vmbus_sendpacket_getid);
+
+/**
+ * vmbus_sendpacket() - Send the specified buffer on the given channel
+ * @channel: Pointer to vmbus_channel structure
+ * @buffer: Pointer to the buffer you want to send the data from.
+ * @bufferlen: Maximum size of what the buffer holds.
+ * @requestid: Identifier of the request
+ * @type: Type of packet that is being sent e.g. negotiate, time
+ * packet etc.
+ * @flags: 0 or VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED
+ *
+ * Sends data in @buffer directly to Hyper-V via the vmbus.
+ * This will send the data unparsed to Hyper-V.
+ *
+ * Mainly used by Hyper-V drivers.
+ */
+int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
+ u32 bufferlen, u64 requestid,
+ enum vmbus_packet_type type, u32 flags)
+{
+ return vmbus_sendpacket_getid(channel, buffer, bufferlen,
+ requestid, NULL, type, flags);
}
EXPORT_SYMBOL(vmbus_sendpacket);

@@ -1122,7 +1148,7 @@ int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);

- return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+ return hv_ringbuffer_write(channel, bufferlist, 3, requestid, NULL);
}
EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);

@@ -1160,7 +1186,7 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);

- return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
+ return hv_ringbuffer_write(channel, bufferlist, 3, requestid, NULL);
}
EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 3a1f007b678a0..64c0b9cbe183b 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -181,7 +181,7 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);

int hv_ringbuffer_write(struct vmbus_channel *channel,
const struct kvec *kv_list, u32 kv_count,
- u64 requestid);
+ u64 requestid, u64 *trans_id);

int hv_ringbuffer_read(struct vmbus_channel *channel,
void *buffer, u32 buflen, u32 *buffer_actual_len,
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 3d215d9dec433..e101b11f95e5d 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -283,7 +283,7 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info)
/* Write to the ring buffer. */
int hv_ringbuffer_write(struct vmbus_channel *channel,
const struct kvec *kv_list, u32 kv_count,
- u64 requestid)
+ u64 requestid, u64 *trans_id)
{
int i;
u32 bytes_avail_towrite;
@@ -294,7 +294,7 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
unsigned long flags;
struct hv_ring_buffer_info *outring_info = &channel->outbound;
struct vmpacket_descriptor *desc = kv_list[0].iov_base;
- u64 rqst_id = VMBUS_NO_RQSTOR;
+ u64 __trans_id, rqst_id = VMBUS_NO_RQSTOR;

if (channel->rescind)
return -ENODEV;
@@ -353,7 +353,15 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
}
}
desc = hv_get_ring_buffer(outring_info) + old_write;
- desc->trans_id = (rqst_id == VMBUS_NO_RQSTOR) ? requestid : rqst_id;
+ __trans_id = (rqst_id == VMBUS_NO_RQSTOR) ? requestid : rqst_id;
+ /*
+ * Ensure the compiler doesn't generate code that reads the value of
+ * the transaction ID from the ring buffer, which is shared with the
+ * Hyper-V host and subject to being changed at any time.
+ */
+ WRITE_ONCE(desc->trans_id, __trans_id);
+ if (trans_id)
+ *trans_id = __trans_id;

/* Set previous packet start */
prev_indices = hv_get_ring_bufferindices(outring_info);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index fe2e0179ed51e..a7cb596d893b1 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1161,6 +1161,13 @@ extern int vmbus_open(struct vmbus_channel *channel,

extern void vmbus_close(struct vmbus_channel *channel);

+extern int vmbus_sendpacket_getid(struct vmbus_channel *channel,
+ void *buffer,
+ u32 bufferLen,
+ u64 requestid,
+ u64 *trans_id,
+ enum vmbus_packet_type type,
+ u32 flags);
extern int vmbus_sendpacket(struct vmbus_channel *channel,
void *buffer,
u32 bufferLen,
--
2.25.1

2022-04-07 20:26:44

by Andrea Parri

[permalink] [raw]
Subject: [PATCH 4/6] Drivers: hv: vmbus: Introduce vmbus_request_addr_match()

The function can be used to retrieve and clear/remove a transation ID
from a channel requestor, provided the memory address corresponding to
the ID equals a specified address. The function, and its 'lockless'
variant __vmbus_request_addr_match(), will be used by hv_pci.

Refactor vmbus_request_addr() to reuse the 'newly' introduced code.

No functional change.

Suggested-by: Michael Kelley <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
---
drivers/hv/channel.c | 65 ++++++++++++++++++++++++++++++------------
include/linux/hyperv.h | 5 ++++
2 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 585a8084848bf..49f10a603a091 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -1279,17 +1279,11 @@ u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr)
}
EXPORT_SYMBOL_GPL(vmbus_next_request_id);

-/*
- * vmbus_request_addr - Returns the memory address stored at @trans_id
- * in @rqstor. Uses a spin lock to avoid race conditions.
- * @channel: Pointer to the VMbus channel struct
- * @trans_id: Request id sent back from Hyper-V. Becomes the requestor's
- * next request id.
- */
-u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id)
+/* As in vmbus_request_addr_match() but without the requestor lock */
+u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
+ u64 rqst_addr)
{
struct vmbus_requestor *rqstor = &channel->requestor;
- unsigned long flags;
u64 req_addr;

/* Check rqstor has been initialized */
@@ -1300,25 +1294,60 @@ u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id)
if (!trans_id)
return VMBUS_RQST_ERROR;

- spin_lock_irqsave(&rqstor->req_lock, flags);
-
/* Data corresponding to trans_id is stored at trans_id - 1 */
trans_id--;

/* Invalid trans_id */
- if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap)) {
- spin_unlock_irqrestore(&rqstor->req_lock, flags);
+ if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap))
return VMBUS_RQST_ERROR;
- }

req_addr = rqstor->req_arr[trans_id];
- rqstor->req_arr[trans_id] = rqstor->next_request_id;
- rqstor->next_request_id = trans_id;
+ if (rqst_addr == VMBUS_RQST_ADDR_ANY || req_addr == rqst_addr) {
+ rqstor->req_arr[trans_id] = rqstor->next_request_id;
+ rqstor->next_request_id = trans_id;

- /* The already held spin lock provides atomicity */
- bitmap_clear(rqstor->req_bitmap, trans_id, 1);
+ /* The already held spin lock provides atomicity */
+ bitmap_clear(rqstor->req_bitmap, trans_id, 1);
+ }
+
+ return req_addr;
+}
+EXPORT_SYMBOL_GPL(__vmbus_request_addr_match);
+
+/*
+ * vmbus_request_addr_match - Clears/removes @trans_id from the @channel's
+ * requestor, provided the memory address stored at @trans_id equals @rqst_addr
+ * (or provided @rqst_addr matches the sentinel value VMBUS_RQST_ADDR_ANY).
+ *
+ * Returns the memory address stored at @trans_id, or VMBUS_RQST_ERROR if
+ * @trans_id is not contained in the requestor.
+ *
+ * Acquires and releases the requestor spin lock.
+ */
+u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
+ u64 rqst_addr)
+{
+ struct vmbus_requestor *rqstor = &channel->requestor;
+ unsigned long flags;
+ u64 req_addr;

+ spin_lock_irqsave(&rqstor->req_lock, flags);
+ req_addr = __vmbus_request_addr_match(channel, trans_id, rqst_addr);
spin_unlock_irqrestore(&rqstor->req_lock, flags);
+
return req_addr;
}
+EXPORT_SYMBOL_GPL(vmbus_request_addr_match);
+
+/*
+ * vmbus_request_addr - Returns the memory address stored at @trans_id
+ * in @rqstor. Uses a spin lock to avoid race conditions.
+ * @channel: Pointer to the VMbus channel struct
+ * @trans_id: Request id sent back from Hyper-V. Becomes the requestor's
+ * next request id.
+ */
+u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id)
+{
+ return vmbus_request_addr_match(channel, trans_id, VMBUS_RQST_ADDR_ANY);
+}
EXPORT_SYMBOL_GPL(vmbus_request_addr);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index a7cb596d893b1..c77d78f34b96a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -788,6 +788,7 @@ struct vmbus_requestor {

#define VMBUS_NO_RQSTOR U64_MAX
#define VMBUS_RQST_ERROR (U64_MAX - 1)
+#define VMBUS_RQST_ADDR_ANY U64_MAX
/* NetVSC-specific */
#define VMBUS_RQST_ID_NO_RESPONSE (U64_MAX - 2)
/* StorVSC-specific */
@@ -1042,6 +1043,10 @@ struct vmbus_channel {
};

u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr);
+u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
+ u64 rqst_addr);
+u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
+ u64 rqst_addr);
u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id);

static inline bool is_hvsock_channel(const struct vmbus_channel *c)
--
2.25.1

2022-04-08 17:41:44

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 5/6] Drivers: hv: vmbus: Introduce {lock,unlock}_requestor()

From: Andrea Parri (Microsoft) <[email protected]> Sent: Wednesday, April 6, 2022 9:30 PM
>
> To abtract the lock and unlock operations on the requestor spin lock.
> The helpers will come in handy in hv_pci.
>
> No functional change.
>
> Suggested-by: Michael Kelley <[email protected]>
> Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
> ---
> drivers/hv/channel.c | 11 +++++------
> include/linux/hyperv.h | 15 +++++++++++++++
> 2 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 49f10a603a091..56f7e06c673e4 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -1252,12 +1252,12 @@ u64 vmbus_next_request_id(struct vmbus_channel
> *channel, u64 rqst_addr)
> if (!channel->rqstor_size)
> return VMBUS_NO_RQSTOR;
>
> - spin_lock_irqsave(&rqstor->req_lock, flags);
> + lock_requestor(channel, flags);
> current_id = rqstor->next_request_id;
>
> /* Requestor array is full */
> if (current_id >= rqstor->size) {
> - spin_unlock_irqrestore(&rqstor->req_lock, flags);
> + unlock_requestor(channel, flags);
> return VMBUS_RQST_ERROR;
> }
>
> @@ -1267,7 +1267,7 @@ u64 vmbus_next_request_id(struct vmbus_channel
> *channel, u64 rqst_addr)
> /* The already held spin lock provides atomicity */
> bitmap_set(rqstor->req_bitmap, current_id, 1);
>
> - spin_unlock_irqrestore(&rqstor->req_lock, flags);
> + unlock_requestor(channel, flags);
>
> /*
> * Cannot return an ID of 0, which is reserved for an unsolicited
> @@ -1327,13 +1327,12 @@ EXPORT_SYMBOL_GPL(__vmbus_request_addr_match);
> u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
> u64 rqst_addr)
> {
> - struct vmbus_requestor *rqstor = &channel->requestor;
> unsigned long flags;
> u64 req_addr;
>
> - spin_lock_irqsave(&rqstor->req_lock, flags);
> + lock_requestor(channel, flags);
> req_addr = __vmbus_request_addr_match(channel, trans_id, rqst_addr);
> - spin_unlock_irqrestore(&rqstor->req_lock, flags);
> + unlock_requestor(channel, flags);
>
> return req_addr;
> }
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index c77d78f34b96a..015e4ceb43029 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1042,6 +1042,21 @@ struct vmbus_channel {
> u32 max_pkt_size;
> };
>
> +#define lock_requestor(channel, flags) \
> +do { \
> + struct vmbus_requestor *rqstor = &(channel)->requestor; \
> + \
> + spin_lock_irqsave(&rqstor->req_lock, flags); \
> +} while (0)
> +
> +static __always_inline void unlock_requestor(struct vmbus_channel *channel,
> + unsigned long flags)
> +{
> + struct vmbus_requestor *rqstor = &channel->requestor;
> +
> + spin_unlock_irqrestore(&rqstor->req_lock, flags);
> +}
> +
> u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr);
> u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
> u64 rqst_addr);
> --
> 2.25.1

Reviewed-by: Michael Kelley <[email protected]>

2022-04-08 21:10:29

by Andrea Parri

[permalink] [raw]
Subject: Re: [PATCH 2/6] PCI: hv: Use vmbus_requestor to generate transaction IDs for VMbus hardening

> > @@ -2743,11 +2751,14 @@ static void hv_pci_onchannelcallback(void *context)
> > switch (desc->type) {
> > case VM_PKT_COMP:
> >
> > - /*
> > - * The host is trusted, and thus it's safe to interpret
> > - * this transaction ID as a pointer.
> > - */
> > - comp_packet = (struct pci_packet *)req_id;
> > + req_addr = chan->request_addr_callback(chan, req_id);
> > + if (req_addr == VMBUS_RQST_ERROR) {
> > + dev_warn_ratelimited(&hbus->hdev->device,
> > + "Invalid transaction ID %llx\n",
> > + req_id);
>
> This handling of a bad requestID error is a bit different from storvsc
> and netvsc. They both use dev_err(). Earlier in the storvsc and netvsc
> cases, I remember some discussion about whether to rate limit these errors,
> and evidently we decided not to. I think we should be consistent unless
> there's a specific reason not to.

Well, this 'error' is hardcoded in hv_compose_msi_msg() (as described in
patch #6). But no strong opinion really: let me replace with dev_err().

Thanks,
Andrea

2022-04-09 08:56:07

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 4/6] Drivers: hv: vmbus: Introduce vmbus_request_addr_match()

From: Andrea Parri (Microsoft) <[email protected]> Sent: Wednesday, April 6, 2022 9:30 PM
>
> The function can be used to retrieve and clear/remove a transation ID
> from a channel requestor, provided the memory address corresponding to
> the ID equals a specified address. The function, and its 'lockless'
> variant __vmbus_request_addr_match(), will be used by hv_pci.
>
> Refactor vmbus_request_addr() to reuse the 'newly' introduced code.
>
> No functional change.
>
> Suggested-by: Michael Kelley <[email protected]>
> Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
> ---
> drivers/hv/channel.c | 65 ++++++++++++++++++++++++++++++------------
> include/linux/hyperv.h | 5 ++++
> 2 files changed, 52 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 585a8084848bf..49f10a603a091 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -1279,17 +1279,11 @@ u64 vmbus_next_request_id(struct vmbus_channel
> *channel, u64 rqst_addr)
> }
> EXPORT_SYMBOL_GPL(vmbus_next_request_id);
>
> -/*
> - * vmbus_request_addr - Returns the memory address stored at @trans_id
> - * in @rqstor. Uses a spin lock to avoid race conditions.
> - * @channel: Pointer to the VMbus channel struct
> - * @trans_id: Request id sent back from Hyper-V. Becomes the requestor's
> - * next request id.
> - */
> -u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id)
> +/* As in vmbus_request_addr_match() but without the requestor lock */
> +u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
> + u64 rqst_addr)
> {
> struct vmbus_requestor *rqstor = &channel->requestor;
> - unsigned long flags;
> u64 req_addr;
>
> /* Check rqstor has been initialized */
> @@ -1300,25 +1294,60 @@ u64 vmbus_request_addr(struct vmbus_channel
> *channel, u64 trans_id)
> if (!trans_id)
> return VMBUS_RQST_ERROR;
>
> - spin_lock_irqsave(&rqstor->req_lock, flags);
> -
> /* Data corresponding to trans_id is stored at trans_id - 1 */
> trans_id--;
>
> /* Invalid trans_id */
> - if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap)) {
> - spin_unlock_irqrestore(&rqstor->req_lock, flags);
> + if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap))
> return VMBUS_RQST_ERROR;
> - }
>
> req_addr = rqstor->req_arr[trans_id];
> - rqstor->req_arr[trans_id] = rqstor->next_request_id;
> - rqstor->next_request_id = trans_id;
> + if (rqst_addr == VMBUS_RQST_ADDR_ANY || req_addr == rqst_addr) {
> + rqstor->req_arr[trans_id] = rqstor->next_request_id;
> + rqstor->next_request_id = trans_id;
>
> - /* The already held spin lock provides atomicity */
> - bitmap_clear(rqstor->req_bitmap, trans_id, 1);
> + /* The already held spin lock provides atomicity */
> + bitmap_clear(rqstor->req_bitmap, trans_id, 1);
> + }

In the case where a specific match is required, and trans_id is
valid but the addr's do not match, it looks like this function will
return the addr that didn't match, without removing the entry.
Shouldn't it return VMBUS_RQST_ERROR in that case?

> +
> + return req_addr;
> +}
> +EXPORT_SYMBOL_GPL(__vmbus_request_addr_match);
> +
> +/*
> + * vmbus_request_addr_match - Clears/removes @trans_id from the @channel's
> + * requestor, provided the memory address stored at @trans_id equals @rqst_addr
> + * (or provided @rqst_addr matches the sentinel value VMBUS_RQST_ADDR_ANY).
> + *
> + * Returns the memory address stored at @trans_id, or VMBUS_RQST_ERROR if
> + * @trans_id is not contained in the requestor.
> + *
> + * Acquires and releases the requestor spin lock.
> + */
> +u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
> + u64 rqst_addr)
> +{
> + struct vmbus_requestor *rqstor = &channel->requestor;
> + unsigned long flags;
> + u64 req_addr;
>
> + spin_lock_irqsave(&rqstor->req_lock, flags);
> + req_addr = __vmbus_request_addr_match(channel, trans_id, rqst_addr);
> spin_unlock_irqrestore(&rqstor->req_lock, flags);
> +
> return req_addr;
> }
> +EXPORT_SYMBOL_GPL(vmbus_request_addr_match);
> +
> +/*
> + * vmbus_request_addr - Returns the memory address stored at @trans_id
> + * in @rqstor. Uses a spin lock to avoid race conditions.
> + * @channel: Pointer to the VMbus channel struct
> + * @trans_id: Request id sent back from Hyper-V. Becomes the requestor's
> + * next request id.
> + */
> +u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id)
> +{
> + return vmbus_request_addr_match(channel, trans_id,
> VMBUS_RQST_ADDR_ANY);
> +}
> EXPORT_SYMBOL_GPL(vmbus_request_addr);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index a7cb596d893b1..c77d78f34b96a 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -788,6 +788,7 @@ struct vmbus_requestor {
>
> #define VMBUS_NO_RQSTOR U64_MAX
> #define VMBUS_RQST_ERROR (U64_MAX - 1)
> +#define VMBUS_RQST_ADDR_ANY U64_MAX
> /* NetVSC-specific */
> #define VMBUS_RQST_ID_NO_RESPONSE (U64_MAX - 2)
> /* StorVSC-specific */
> @@ -1042,6 +1043,10 @@ struct vmbus_channel {
> };
>
> u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr);
> +u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
> + u64 rqst_addr);
> +u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id,
> + u64 rqst_addr);
> u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id);
>
> static inline bool is_hvsock_channel(const struct vmbus_channel *c)
> --
> 2.25.1

2022-04-09 09:16:07

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 3/6] Drivers: hv: vmbus: Introduce vmbus_sendpacket_getid()

From: Andrea Parri (Microsoft) <[email protected]> Sent: Wednesday, April 6, 2022 9:30 PM
>
> The function can be used to send a VMbus packet and retrieve the
> corresponding transaction ID. It will be used by hv_pci.
>
> No functional change.
>
> Suggested-by: Michael Kelley <[email protected]>
> Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
> ---
> drivers/hv/channel.c | 38 ++++++++++++++++++++++++++++++++------
> drivers/hv/hyperv_vmbus.h | 2 +-
> drivers/hv/ring_buffer.c | 14 +++++++++++---
> include/linux/hyperv.h | 7 +++++++
> 4 files changed, 51 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 20fc8d50a0398..585a8084848bf 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -1022,11 +1022,13 @@ void vmbus_close(struct vmbus_channel *channel)
> EXPORT_SYMBOL_GPL(vmbus_close);
>
> /**
> - * vmbus_sendpacket() - Send the specified buffer on the given channel
> + * vmbus_sendpacket_getid() - Send the specified buffer on the given channel
> * @channel: Pointer to vmbus_channel structure
> * @buffer: Pointer to the buffer you want to send the data from.
> * @bufferlen: Maximum size of what the buffer holds.
> * @requestid: Identifier of the request
> + * @trans_id: Identifier of the transaction associated to this request, if
> + * the send is successful; undefined, otherwise.
> * @type: Type of packet that is being sent e.g. negotiate, time
> * packet etc.
> * @flags: 0 or VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED
> @@ -1036,8 +1038,8 @@ EXPORT_SYMBOL_GPL(vmbus_close);
> *
> * Mainly used by Hyper-V drivers.
> */
> -int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
> - u32 bufferlen, u64 requestid,
> +int vmbus_sendpacket_getid(struct vmbus_channel *channel, void *buffer,
> + u32 bufferlen, u64 requestid, u64 *trans_id,
> enum vmbus_packet_type type, u32 flags)
> {
> struct vmpacket_descriptor desc;
> @@ -1063,7 +1065,31 @@ int vmbus_sendpacket(struct vmbus_channel *channel,
> void *buffer,
> bufferlist[2].iov_base = &aligned_data;
> bufferlist[2].iov_len = (packetlen_aligned - packetlen);
>
> - return hv_ringbuffer_write(channel, bufferlist, num_vecs, requestid);
> + return hv_ringbuffer_write(channel, bufferlist, num_vecs, requestid, trans_id);
> +}
> +EXPORT_SYMBOL(vmbus_sendpacket_getid);
> +
> +/**
> + * vmbus_sendpacket() - Send the specified buffer on the given channel
> + * @channel: Pointer to vmbus_channel structure
> + * @buffer: Pointer to the buffer you want to send the data from.
> + * @bufferlen: Maximum size of what the buffer holds.
> + * @requestid: Identifier of the request
> + * @type: Type of packet that is being sent e.g. negotiate, time
> + * packet etc.
> + * @flags: 0 or VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED
> + *
> + * Sends data in @buffer directly to Hyper-V via the vmbus.
> + * This will send the data unparsed to Hyper-V.
> + *
> + * Mainly used by Hyper-V drivers.
> + */
> +int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
> + u32 bufferlen, u64 requestid,
> + enum vmbus_packet_type type, u32 flags)
> +{
> + return vmbus_sendpacket_getid(channel, buffer, bufferlen,
> + requestid, NULL, type, flags);
> }
> EXPORT_SYMBOL(vmbus_sendpacket);
>
> @@ -1122,7 +1148,7 @@ int vmbus_sendpacket_pagebuffer(struct vmbus_channel
> *channel,
> bufferlist[2].iov_base = &aligned_data;
> bufferlist[2].iov_len = (packetlen_aligned - packetlen);
>
> - return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
> + return hv_ringbuffer_write(channel, bufferlist, 3, requestid, NULL);
> }
> EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
>
> @@ -1160,7 +1186,7 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel
> *channel,
> bufferlist[2].iov_base = &aligned_data;
> bufferlist[2].iov_len = (packetlen_aligned - packetlen);
>
> - return hv_ringbuffer_write(channel, bufferlist, 3, requestid);
> + return hv_ringbuffer_write(channel, bufferlist, 3, requestid, NULL);
> }
> EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);
>
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 3a1f007b678a0..64c0b9cbe183b 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -181,7 +181,7 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info
> *ring_info);
>
> int hv_ringbuffer_write(struct vmbus_channel *channel,
> const struct kvec *kv_list, u32 kv_count,
> - u64 requestid);
> + u64 requestid, u64 *trans_id);
>
> int hv_ringbuffer_read(struct vmbus_channel *channel,
> void *buffer, u32 buflen, u32 *buffer_actual_len,
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 3d215d9dec433..e101b11f95e5d 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -283,7 +283,7 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info
> *ring_info)
> /* Write to the ring buffer. */
> int hv_ringbuffer_write(struct vmbus_channel *channel,
> const struct kvec *kv_list, u32 kv_count,
> - u64 requestid)
> + u64 requestid, u64 *trans_id)
> {
> int i;
> u32 bytes_avail_towrite;
> @@ -294,7 +294,7 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
> unsigned long flags;
> struct hv_ring_buffer_info *outring_info = &channel->outbound;
> struct vmpacket_descriptor *desc = kv_list[0].iov_base;
> - u64 rqst_id = VMBUS_NO_RQSTOR;
> + u64 __trans_id, rqst_id = VMBUS_NO_RQSTOR;
>
> if (channel->rescind)
> return -ENODEV;
> @@ -353,7 +353,15 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
> }
> }
> desc = hv_get_ring_buffer(outring_info) + old_write;
> - desc->trans_id = (rqst_id == VMBUS_NO_RQSTOR) ? requestid : rqst_id;
> + __trans_id = (rqst_id == VMBUS_NO_RQSTOR) ? requestid : rqst_id;
> + /*
> + * Ensure the compiler doesn't generate code that reads the value of
> + * the transaction ID from the ring buffer, which is shared with the
> + * Hyper-V host and subject to being changed at any time.
> + */
> + WRITE_ONCE(desc->trans_id, __trans_id);
> + if (trans_id)
> + *trans_id = __trans_id;
>
> /* Set previous packet start */
> prev_indices = hv_get_ring_bufferindices(outring_info);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index fe2e0179ed51e..a7cb596d893b1 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1161,6 +1161,13 @@ extern int vmbus_open(struct vmbus_channel *channel,
>
> extern void vmbus_close(struct vmbus_channel *channel);
>
> +extern int vmbus_sendpacket_getid(struct vmbus_channel *channel,
> + void *buffer,
> + u32 bufferLen,
> + u64 requestid,
> + u64 *trans_id,
> + enum vmbus_packet_type type,
> + u32 flags);
> extern int vmbus_sendpacket(struct vmbus_channel *channel,
> void *buffer,
> u32 bufferLen,
> --
> 2.25.1

Reviewed-by: Michael Kelley <[email protected]>

2022-04-09 15:28:27

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 6/6] PCI: hv: Fix synchronization between channel callback and hv_compose_msi_msg()

From: Andrea Parri (Microsoft) <[email protected]> Sent: Wednesday, April 6, 2022 9:30 PM
>
> Dexuan wrote:
>
> "[...] when we disable AccelNet, the host PCI VSP driver sends a
> PCI_EJECT message first, and the channel callback may set
> hpdev->state to hv_pcichild_ejecting on a different CPU. This can
> cause hv_compose_msi_msg() to exit from the loop and 'return', and
> the on-stack variable 'ctxt' is invalid. Now, if the response
> message from the host arrives, the channel callback will try to
> access the invalid 'ctxt' variable, and this may cause a crash."
>
> Schematically:
>
> Hyper-V sends PCI_EJECT msg
> hv_pci_onchannelcallback()
> state = hv_pcichild_ejecting
> hv_compose_msi_msg()
> alloc and init comp_pkt
> state == hv_pcichild_ejecting
> Hyper-V sends VM_PKT_COMP msg
> hv_pci_onchannelcallback()
> retrieve address of comp_pkt
> 'free' comp_pkt and return
> comp_pkt->completion_func()
>
> Dexuan also showed how the crash can be triggered after introducing
> suitable delays in the driver code, thus validating the 'assumption'
> that the host can still normally respond to the guest's compose_msi
> request after the host has started to eject the PCI device.
>
> Fix the synchronization by leveraging the requestor lock as follows:
>
> - Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while
> holding the requestor lock) associated to the completion packet.
>
> - Retrieve the address *and call ->completion_func() within a same
> (requestor) critical section in hv_pci_onchannelcallback().
>
> Fixes: de0aa7b2f97d3 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
> Reported-by: Wei Hu <[email protected]>
> Reported-by: Dexuan Cui <[email protected]>
> Suggested-by: Michael Kelley <[email protected]>
> Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
> ---
> The "Fixes:" tag is mainly a reference: a back-port would depend
> on the entire series (which, in turn, shouldn't be backported to
> commits preceding bf5fd8cae3c8f).
>
> drivers/pci/controller/pci-hyperv.c | 33 +++++++++++++++++++++++------
> 1 file changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index c1322ac37cda9..f1d794f8a5ef1 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1695,7 +1695,7 @@ static void hv_compose_msi_msg(struct irq_data *data,
> struct msi_msg *msg)
> struct pci_create_interrupt3 v3;
> } int_pkts;
> } __packed ctxt;
> -
> + u64 trans_id;
> u32 size;
> int ret;
>
> @@ -1757,10 +1757,10 @@ static void hv_compose_msi_msg(struct irq_data *data,
> struct msi_msg *msg)
> goto free_int_desc;
> }
>
> - ret = vmbus_sendpacket(hpdev->hbus->hdev->channel, &ctxt.int_pkts,
> - size, (unsigned long)&ctxt.pci_pkt,
> - VM_PKT_DATA_INBAND,
> - VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
> + ret = vmbus_sendpacket_getid(hpdev->hbus->hdev->channel, &ctxt.int_pkts,
> + size, (unsigned long)&ctxt.pci_pkt,
> + &trans_id, VM_PKT_DATA_INBAND,
> +
> VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
> if (ret) {
> dev_err(&hbus->hdev->device,
> "Sending request for interrupt failed: 0x%x",
> @@ -1839,6 +1839,15 @@ static void hv_compose_msi_msg(struct irq_data *data,
> struct msi_msg *msg)
>
> enable_tasklet:
> tasklet_enable(&channel->callback_event);
> + /*
> + * The completion packet on the stack becomes invalid after 'return';
> + * remove the ID from the VMbus requestor if the identifier is still
> + * mapped to/associated with the packet. (The identifier could have
> + * been 're-used', i.e., already removed and (re-)mapped.)
> + *
> + * Cf. hv_pci_onchannelcallback().
> + */
> + vmbus_request_addr_match(channel, trans_id, (unsigned long)&ctxt.pci_pkt);
> free_int_desc:
> kfree(int_desc);
> drop_reference:
> @@ -2717,6 +2726,7 @@ static void hv_pci_onchannelcallback(void *context)
> struct pci_dev_inval_block *inval;
> struct pci_dev_incoming *dev_message;
> struct hv_pci_dev *hpdev;
> + unsigned long flags;
>
> buffer = kmalloc(bufferlen, GFP_ATOMIC);
> if (!buffer)
> @@ -2751,8 +2761,11 @@ static void hv_pci_onchannelcallback(void *context)
> switch (desc->type) {
> case VM_PKT_COMP:
>
> - req_addr = chan->request_addr_callback(chan, req_id);
> + lock_requestor(chan, flags);
> + req_addr = __vmbus_request_addr_match(chan, req_id,
> + VMBUS_RQST_ADDR_ANY);
> if (req_addr == VMBUS_RQST_ERROR) {
> + unlock_requestor(chan, flags);
> dev_warn_ratelimited(&hbus->hdev->device,
> "Invalid transaction ID %llx\n",
> req_id);
> @@ -2760,9 +2773,17 @@ static void hv_pci_onchannelcallback(void *context)
> }
> comp_packet = (struct pci_packet *)req_addr;
> response = (struct pci_response *)buffer;
> + /*
> + * Call ->completion_func() within the critical section to make
> + * sure that the packet pointer is still valid during the call:
> + * here 'valid' means that there's a task still waiting for the
> + * completion, and that the packet data is still on the waiting
> + * task's stack. Cf. hv_compose_msi_msg().
> + */
> comp_packet->completion_func(comp_packet->compl_ctxt,
> response,
> bytes_recvd);
> + unlock_requestor(chan, flags);
> break;
>
> case VM_PKT_DATA_INBAND:
> --
> 2.25.1

Reviewed-by: Michael Kelley <[email protected]>

2022-04-09 15:36:51

by Andrea Parri

[permalink] [raw]
Subject: Re: [PATCH 4/6] Drivers: hv: vmbus: Introduce vmbus_request_addr_match()

> > @@ -1300,25 +1294,60 @@ u64 vmbus_request_addr(struct vmbus_channel
> > *channel, u64 trans_id)
> > if (!trans_id)
> > return VMBUS_RQST_ERROR;
> >
> > - spin_lock_irqsave(&rqstor->req_lock, flags);
> > -
> > /* Data corresponding to trans_id is stored at trans_id - 1 */
> > trans_id--;
> >
> > /* Invalid trans_id */
> > - if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap)) {
> > - spin_unlock_irqrestore(&rqstor->req_lock, flags);
> > + if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap))
> > return VMBUS_RQST_ERROR;
> > - }
> >
> > req_addr = rqstor->req_arr[trans_id];
> > - rqstor->req_arr[trans_id] = rqstor->next_request_id;
> > - rqstor->next_request_id = trans_id;
> > + if (rqst_addr == VMBUS_RQST_ADDR_ANY || req_addr == rqst_addr) {
> > + rqstor->req_arr[trans_id] = rqstor->next_request_id;
> > + rqstor->next_request_id = trans_id;
> >
> > - /* The already held spin lock provides atomicity */
> > - bitmap_clear(rqstor->req_bitmap, trans_id, 1);
> > + /* The already held spin lock provides atomicity */
> > + bitmap_clear(rqstor->req_bitmap, trans_id, 1);
> > + }
>
> In the case where a specific match is required, and trans_id is
> valid but the addr's do not match, it looks like this function will
> return the addr that didn't match, without removing the entry.

Yes, that is consistent with the description on vmbus_request_addr_match():

Returns the memory address stored at @trans_id, or VMBUS_RQST_ERROR if
@trans_id is not contained in the requestor.


> Shouldn't it return VMBUS_RQST_ERROR in that case?

Can certainly be done, although I'm not sure to follow your concerns. Can
you elaborate?

Thanks,
Andrea

2022-04-10 02:38:36

by Andrea Parri

[permalink] [raw]
Subject: Re: [PATCH 4/6] Drivers: hv: vmbus: Introduce vmbus_request_addr_match()

> > > In the case where a specific match is required, and trans_id is
> > > valid but the addr's do not match, it looks like this function will
> > > return the addr that didn't match, without removing the entry.
> >
> > Yes, that is consistent with the description on vmbus_request_addr_match():
> >
> > Returns the memory address stored at @trans_id, or VMBUS_RQST_ERROR if
> > @trans_id is not contained in the requestor.
> >
> >
> > > Shouldn't it return VMBUS_RQST_ERROR in that case?
> >
> > Can certainly be done, although I'm not sure to follow your concerns. Can
> > you elaborate?
> >
>
> Having the function return "success" when it failed to match is unexpected
> for me. There's only one invocation where we care about matching
> (in hv_compose_msi_msg). In that invocation the purpose for matching is to
> not remove the wrong entry, and the return value is ignored. So I think
> it all works correctly.

You're reading it wrongly: the point is that there's nothing wrong in *not
removing the "wrong entry" (or in failing to match). In the mentioned use,
that means the channel callback has already processed "our" request, and
that we don't have to worry about the ID. (Otherwise, i.e. if we do match,
the callback will eventually scream "Invalid transaction".)


> Just thinking out loud, maybe vmbus_request_addr_match() should be
> renamed to vmbus_request_addr_remove(), and not have a return value?

Mmh. We have vmbus_request_addr() that (always) removes the ID; it seems
a _remove() would just add to the confusion. And removing the return value
would mean duplicating most of vmbus_request_addr() in the "new" function.
So, I'm not convinced that's the right thing to do. I'm inclined to leave
this patch as is (and, as usual, happy to be proven wrong).

Thanks,
Andrea

2022-04-10 16:57:33

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 4/6] Drivers: hv: vmbus: Introduce vmbus_request_addr_match()

From: Andrea Parri <[email protected]> Sent: Friday, April 8, 2022 9:47 AM
>
> > > @@ -1300,25 +1294,60 @@ u64 vmbus_request_addr(struct vmbus_channel
> > > *channel, u64 trans_id)
> > > if (!trans_id)
> > > return VMBUS_RQST_ERROR;
> > >
> > > - spin_lock_irqsave(&rqstor->req_lock, flags);
> > > -
> > > /* Data corresponding to trans_id is stored at trans_id - 1 */
> > > trans_id--;
> > >
> > > /* Invalid trans_id */
> > > - if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap)) {
> > > - spin_unlock_irqrestore(&rqstor->req_lock, flags);
> > > + if (trans_id >= rqstor->size || !test_bit(trans_id, rqstor->req_bitmap))
> > > return VMBUS_RQST_ERROR;
> > > - }
> > >
> > > req_addr = rqstor->req_arr[trans_id];
> > > - rqstor->req_arr[trans_id] = rqstor->next_request_id;
> > > - rqstor->next_request_id = trans_id;
> > > + if (rqst_addr == VMBUS_RQST_ADDR_ANY || req_addr == rqst_addr) {
> > > + rqstor->req_arr[trans_id] = rqstor->next_request_id;
> > > + rqstor->next_request_id = trans_id;
> > >
> > > - /* The already held spin lock provides atomicity */
> > > - bitmap_clear(rqstor->req_bitmap, trans_id, 1);
> > > + /* The already held spin lock provides atomicity */
> > > + bitmap_clear(rqstor->req_bitmap, trans_id, 1);
> > > + }
> >
> > In the case where a specific match is required, and trans_id is
> > valid but the addr's do not match, it looks like this function will
> > return the addr that didn't match, without removing the entry.
>
> Yes, that is consistent with the description on vmbus_request_addr_match():
>
> Returns the memory address stored at @trans_id, or VMBUS_RQST_ERROR if
> @trans_id is not contained in the requestor.
>
>
> > Shouldn't it return VMBUS_RQST_ERROR in that case?
>
> Can certainly be done, although I'm not sure to follow your concerns. Can
> you elaborate?
>

Having the function return "success" when it failed to match is unexpected
for me. There's only one invocation where we care about matching
(in hv_compose_msi_msg). In that invocation the purpose for matching is to
not remove the wrong entry, and the return value is ignored. So I think
it all works correctly.

Just thinking out loud, maybe vmbus_request_addr_match() should be
renamed to vmbus_request_addr_remove(), and not have a return value?
That would be a bit more consistent with the actual purpose.

Michael


2022-04-11 10:22:13

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 2/6] PCI: hv: Use vmbus_requestor to generate transaction IDs for VMbus hardening

From: Andrea Parri (Microsoft) <[email protected]> Sent: Wednesday, April 6, 2022 9:30 PM
>
> Currently, pointers to guest memory are passed to Hyper-V as transaction
> IDs in hv_pci. In the face of errors or malicious behavior in Hyper-V,
> hv_pci should not expose or trust the transaction IDs returned by
> Hyper-V to be valid guest memory addresses. Instead, use small integers
> generated by vmbus_requestor as request (transaction) IDs.
>
> Suggested-by: Michael Kelley <[email protected]>
> Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
> ---
> drivers/pci/controller/pci-hyperv.c | 39 +++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 88b3b56d05228..c1322ac37cda9 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -91,6 +91,13 @@ static enum pci_protocol_version_t pci_protocol_versions[] = {
> /* space for 32bit serial number as string */
> #define SLOT_NAME_SIZE 11
>
> +/*
> + * Size of requestor for VMbus; the value is based on the observation
> + * that having more than one request outstanding is 'rare', and so 64
> + * should be generous in ensuring that we don't ever run out.
> + */
> +#define HV_PCI_RQSTOR_SIZE 64
> +
> /*
> * Message Types
> */
> @@ -1407,7 +1414,7 @@ static void hv_int_desc_free(struct hv_pci_dev *hpdev,
> int_pkt->wslot.slot = hpdev->desc.win_slot.slot;
> int_pkt->int_desc = *int_desc;
> vmbus_sendpacket(hpdev->hbus->hdev->channel, int_pkt, sizeof(*int_pkt),
> - (unsigned long)&ctxt.pkt, VM_PKT_DATA_INBAND, 0);
> + 0, VM_PKT_DATA_INBAND, 0);
> kfree(int_desc);
> }
>
> @@ -2649,7 +2656,7 @@ static void hv_eject_device_work(struct work_struct *work)
> ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
> ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
> vmbus_sendpacket(hbus->hdev->channel, ejct_pkt,
> - sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
> + sizeof(*ejct_pkt), 0,
> VM_PKT_DATA_INBAND, 0);
>
> /* For the get_pcichild() in hv_pci_eject_device() */
> @@ -2696,8 +2703,9 @@ static void hv_pci_onchannelcallback(void *context)
> const int packet_size = 0x100;
> int ret;
> struct hv_pcibus_device *hbus = context;
> + struct vmbus_channel *chan = hbus->hdev->channel;
> u32 bytes_recvd;
> - u64 req_id;
> + u64 req_id, req_addr;
> struct vmpacket_descriptor *desc;
> unsigned char *buffer;
> int bufferlen = packet_size;
> @@ -2715,8 +2723,8 @@ static void hv_pci_onchannelcallback(void *context)
> return;
>
> while (1) {
> - ret = vmbus_recvpacket_raw(hbus->hdev->channel, buffer,
> - bufferlen, &bytes_recvd, &req_id);
> + ret = vmbus_recvpacket_raw(chan, buffer, bufferlen,
> + &bytes_recvd, &req_id);
>
> if (ret == -ENOBUFS) {
> kfree(buffer);
> @@ -2743,11 +2751,14 @@ static void hv_pci_onchannelcallback(void *context)
> switch (desc->type) {
> case VM_PKT_COMP:
>
> - /*
> - * The host is trusted, and thus it's safe to interpret
> - * this transaction ID as a pointer.
> - */
> - comp_packet = (struct pci_packet *)req_id;
> + req_addr = chan->request_addr_callback(chan, req_id);
> + if (req_addr == VMBUS_RQST_ERROR) {
> + dev_warn_ratelimited(&hbus->hdev->device,
> + "Invalid transaction ID %llx\n",
> + req_id);

This handling of a bad requestID error is a bit different from storvsc
and netvsc. They both use dev_err(). Earlier in the storvsc and netvsc
cases, I remember some discussion about whether to rate limit these errors,
and evidently we decided not to. I think we should be consistent unless
there's a specific reason not to.


> + break;
> + }
> + comp_packet = (struct pci_packet *)req_addr;
> response = (struct pci_response *)buffer;
> comp_packet->completion_func(comp_packet->compl_ctxt,
> response,
> @@ -3428,6 +3439,10 @@ static int hv_pci_probe(struct hv_device *hdev,
> goto free_dom;
> }
>
> + hdev->channel->next_request_id_callback = vmbus_next_request_id;
> + hdev->channel->request_addr_callback = vmbus_request_addr;
> + hdev->channel->rqstor_size = HV_PCI_RQSTOR_SIZE;
> +
> ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> hv_pci_onchannelcallback, hbus);
> if (ret)
> @@ -3758,6 +3773,10 @@ static int hv_pci_resume(struct hv_device *hdev)
>
> hbus->state = hv_pcibus_init;
>
> + hdev->channel->next_request_id_callback = vmbus_next_request_id;
> + hdev->channel->request_addr_callback = vmbus_request_addr;
> + hdev->channel->rqstor_size = HV_PCI_RQSTOR_SIZE;
> +
> ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> hv_pci_onchannelcallback, hbus);
> if (ret)
> --
> 2.25.1

2022-04-12 00:52:17

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH 4/6] Drivers: hv: vmbus: Introduce vmbus_request_addr_match()

From: Andrea Parri <[email protected]> Sent: Friday, April 8, 2022 1:38 PM
>
> > > > In the case where a specific match is required, and trans_id is
> > > > valid but the addr's do not match, it looks like this function will
> > > > return the addr that didn't match, without removing the entry.
> > >
> > > Yes, that is consistent with the description on vmbus_request_addr_match():
> > >
> > > Returns the memory address stored at @trans_id, or VMBUS_RQST_ERROR if
> > > @trans_id is not contained in the requestor.
> > >
> > >
> > > > Shouldn't it return VMBUS_RQST_ERROR in that case?
> > >
> > > Can certainly be done, although I'm not sure to follow your concerns. Can
> > > you elaborate?
> > >
> >
> > Having the function return "success" when it failed to match is unexpected
> > for me. There's only one invocation where we care about matching
> > (in hv_compose_msi_msg). In that invocation the purpose for matching is to
> > not remove the wrong entry, and the return value is ignored. So I think
> > it all works correctly.
>
> You're reading it wrongly: the point is that there's nothing wrong in *not
> removing the "wrong entry" (or in failing to match). In the mentioned use,
> that means the channel callback has already processed "our" request, and
> that we don't have to worry about the ID. (Otherwise, i.e. if we do match,
> the callback will eventually scream "Invalid transaction".)
>
>
> > Just thinking out loud, maybe vmbus_request_addr_match() should be
> > renamed to vmbus_request_addr_remove(), and not have a return value?
>
> Mmh. We have vmbus_request_addr() that (always) removes the ID; it seems
> a _remove() would just add to the confusion. And removing the return value
> would mean duplicating most of vmbus_request_addr() in the "new" function.
> So, I'm not convinced that's the right thing to do. I'm inclined to leave
> this patch as is (and, as usual, happy to be proven wrong).
>

I'll defer to your judgment. I don't see anything that broken with the
patch as written, so I can live with it as is.

Michael