2016-03-18 12:33:31

by Vitaly Kuznetsov

Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
from. vmbus_wait_for_unload() doesn't account for the fact that when
we're crashing on some other CPU and CPU0 is still alive and operational,
CHANNELMSG_UNLOAD_RESPONSE will be delivered there, completing
vmbus_connection.unload_event, and our wait on the current CPU will never
end.

Do the following:
1) Check for completion_done() in the loop. In case the interrupt handler is
still alive, we'll get the confirmation we need.

2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
delivered there. We can race with the still-alive interrupt handler doing
the same, but we don't care as we're checking completion_done() now.

3) Cleanup message pages on all CPUs. This is required (at least for the
current CPU as we're clearing CPU0 messages now but we may want to bring
up additional CPUs on crash) as new messages won't be delivered till we
consume what's pending. On boot we'll place message pages somewhere else
and we won't be able to read stale messages.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index b10e8f74..5f37057 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui

static void vmbus_wait_for_unload(void)
{
- int cpu = smp_processor_id();
- void *page_addr = hv_context.synic_message_page[cpu];
+ int cpu;
+ void *page_addr = hv_context.synic_message_page[0];
struct hv_message *msg = (struct hv_message *)page_addr +
VMBUS_MESSAGE_SINT;
struct vmbus_channel_message_header *hdr;
bool unloaded = false;

- while (1) {
+ /*
+ * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
+ * crashing on a different CPU let's hope that IRQ handler on CPU0 is
+ * still functional and vmbus_unload_response() will complete
+ * vmbus_connection.unload_event. If not, the last thing we can do is
+ * read message page for CPU0 regardless of what CPU we're on.
+ */
+ while (!unloaded) {
+ if (completion_done(&vmbus_connection.unload_event)) {
+ unloaded = true;
+ break;
+ }
+
if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
mdelay(10);
continue;
@@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
unloaded = true;

vmbus_signal_eom(msg);
+ }

- if (unloaded)
- break;
+ /*
+ * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
+ * maybe-pending messages on all CPUs to be able to receive new
+ * messages after we reconnect.
+ */
+ for_each_online_cpu(cpu) {
+ page_addr = hv_context.synic_message_page[cpu];
+ msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
+ msg->header.message_type = HVMSG_NONE;
}
}

--
2.5.0


2016-03-18 15:20:46

by Radim Krčmář

Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-18 13:33+0100, Vitaly Kuznetsov:
> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
> from. vmbus_wait_for_unload() doesn't account for the fact that in case
> we're crashing on some other CPU and CPU0 is still alive and operational
> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
> vmbus_connection.unload_event, our wait on the current CPU will never
> end.

(Any chance of learning about this behavior from the spec?)

> Do the following:
> 1) Check for completion_done() in the loop. In case interrupt handler is
> still alive we'll get the confirmation we need.
>
> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
> delivered there. We can race with still-alive interrupt handler doing
> the same but we don't care as we're checking completion_done() now.

(Yeah, seems better than hv_setup_vmbus_irq(NULL) or other hacks.)

> 3) Cleanup message pages on all CPUs. This is required (at least for the
> current CPU as we're clearing CPU0 messages now but we may want to bring
> up additional CPUs on crash) as new messages won't be delivered till we
> consume what's pending. On boot we'll place message pages somewhere else
> and we won't be able to read stale messages.

What if HV already set the pending message bit on current message,
do we get any guarantees that clearing once after UNLOAD_RESPONSE is
enough?

> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---

I had a question about NULL below. (Parenthesised rants aren't related
to r-b tag. ;)

> drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index b10e8f74..5f37057 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>
> static void vmbus_wait_for_unload(void)
> {
> - int cpu = smp_processor_id();
> - void *page_addr = hv_context.synic_message_page[cpu];
> + int cpu;
> + void *page_addr = hv_context.synic_message_page[0];
> struct hv_message *msg = (struct hv_message *)page_addr +
> VMBUS_MESSAGE_SINT;
> struct vmbus_channel_message_header *hdr;
> bool unloaded = false;
>
> - while (1) {
> + /*
> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
> + * still functional and vmbus_unload_response() will complete
> + * vmbus_connection.unload_event. If not, the last thing we can do is
> + * read message page for CPU0 regardless of what CPU we're on.
> + */
> + while (!unloaded) {

(I'd feel a bit safer if this was bounded by some timeout, but all
scenarios where this would make a difference are implausible ...
queue_work() not working while the rest is fine is the best one.)
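
A minimal sketch of the bounded variant being hinted at (the 10-second cap
and the loop constant are made up here for illustration; the posted patch
keeps the wait unbounded):

#define UNLOAD_WAIT_MS		10000	/* assumed cap, not from the patch */
#define UNLOAD_WAIT_LOOPS	(UNLOAD_WAIT_MS / 10)	/* 10 ms per iteration */

	int i;

	for (i = 0; i < UNLOAD_WAIT_LOOPS; i++) {
		if (completion_done(&vmbus_connection.unload_event))
			break;

		/* ... poll CPU0's message page exactly as in the patch ... */

		mdelay(10);
	}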

> + if (completion_done(&vmbus_connection.unload_event)) {
> + unloaded = true;

(No need to set unloaded when you break.)

> + break;
> + }
> +
> if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
> mdelay(10);
> continue;
> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)

(I'm not a huge fan of the unloaded variable; what about remembering the
header/msgtype here ...

> unloaded = true;
>
> vmbus_signal_eom(msg);

and checking its value here?)

> + }
>
> - if (unloaded)
> - break;
> + /*
> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
> + * maybe-pending messages on all CPUs to be able to receive new
> + * messages after we reconnect.
> + */
> + for_each_online_cpu(cpu) {
> + page_addr = hv_context.synic_message_page[cpu];

Can't this be NULL?

> + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> + msg->header.message_type = HVMSG_NONE;
> }

(And, this block belongs to a separate function. ;])

2016-03-18 15:53:43

by Vitaly Kuznetsov

Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

Radim Krcmar <[email protected]> writes:

> 2016-03-18 13:33+0100, Vitaly Kuznetsov:
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
>> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> we're crashing on some other CPU and CPU0 is still alive and operational
>> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> vmbus_connection.unload_event, our wait on the current CPU will never
>> end.
>
> (Any chance of learning about this behavior from the spec?)
>
>> Do the following:
>> 1) Check for completion_done() in the loop. In case interrupt handler is
>> still alive we'll get the confirmation we need.
>>
>> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
>> delivered there. We can race with still-alive interrupt handler doing
>> the same but we don't care as we're checking completion_done() now.
>
> (Yeah, seems better than hv_setup_vmbus_irq(NULL) or other hacks.)
>
>> 3) Cleanup message pages on all CPUs. This is required (at least for the
>> current CPU as we're clearing CPU0 messages now but we may want to bring
>> up additional CPUs on crash) as new messages won't be delivered till we
>> consume what's pending. On boot we'll place message pages somewhere else
>> and we won't be able to read stale messages.
>
> What if HV already set the pending message bit on current message,
> do we get any guarantees that clearing once after UNLOAD_RESPONSE is
> enough?

I think so but I'd like to get a confirmation from K.Y./Alex/Haiyang.

>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>
> I had a question about NULL below. (Parenthesised rants aren't related
> to r-b tag. ;)
>
>> drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>> 1 file changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>>
>> static void vmbus_wait_for_unload(void)
>> {
>> - int cpu = smp_processor_id();
>> - void *page_addr = hv_context.synic_message_page[cpu];
>> + int cpu;
>> + void *page_addr = hv_context.synic_message_page[0];
>> struct hv_message *msg = (struct hv_message *)page_addr +
>> VMBUS_MESSAGE_SINT;
>> struct vmbus_channel_message_header *hdr;
>> bool unloaded = false;
>>
>> - while (1) {
>> + /*
>> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
>> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> + * still functional and vmbus_unload_response() will complete
>> + * vmbus_connection.unload_event. If not, the last thing we can do is
>> + * read message page for CPU0 regardless of what CPU we're on.
>> + */
>> + while (!unloaded) {
>
> (I'd feel a bit safer if this was bounded by some timeout, but all
> scenarios where this would make a difference are unplausible ...
> queue_work() not working while the rest is fine is the best one.)
>
>> + if (completion_done(&vmbus_connection.unload_event)) {
>> + unloaded = true;
>
> (No need to set unloaded when you break.)
>
>> + break;
>> + }
>> +
>> if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
>> mdelay(10);
>> continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>
> (I'm not a huge fan of the unloaded variable; what about remembering the
> header/msgtype here ...
>
>> unloaded = true;
>>
>> vmbus_signal_eom(msg);
>
> and checking its value here?)
>

Sure, but we'll have to use a variable for that ... why would it be
better than 'unloaded'?

>> + }
>>
>> - if (unloaded)
>> - break;
>> + /*
>> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
>> + * maybe-pending messages on all CPUs to be able to receive new
>> + * messages after we reconnect.
>> + */
>> + for_each_online_cpu(cpu) {
>> + page_addr = hv_context.synic_message_page[cpu];
>
> Can't this be NULL?

It can't, we allocate it from hv_synic_alloc() (and we don't support cpu
onlining/offlining on WS2012R2+).

>
>> + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>> + msg->header.message_type = HVMSG_NONE;
>> }
>
> (And, this block belongs to a separate function. ;])

I thought about moving it to hv_crash_handler() but then I decided to
leave it here as the need for this fixup is rather an artifact of how we
receive the message. But I'm flexible here)

--
Vitaly

2016-03-18 16:11:57

by Radim Krčmář

Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-18 16:53+0100, Vitaly Kuznetsov:
> Radim Krcmar <[email protected]> writes:
>> 2016-03-18 13:33+0100, Vitaly Kuznetsov:
>>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>>
>> (I'm not a huge fan of the unloaded variable; what about remembering the
>> header/msgtype here ...
>>
>>> unloaded = true;
>>>
>>> vmbus_signal_eom(msg);
>>
>> and checking its value here?)
>>
>
> Sure, but we'll have to use a variable for that ... why would it be
> better than 'unloaded'?

It's easier to understand IMO,

x = mem        |  x = mem
if *x == sth   |  z = *x
    u = true   |
eoi()          |  eoi()
if u           |  if z == sth
    break      |      break

And you can replace msg with the new variable,

z = *mem
eoi()
if z == sth
    break
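
Spelled out against the code in the patch, that suggestion could look roughly
like the sketch below (the reviewer's idea rendered as code, not the version
that was actually applied):

	while (!completion_done(&vmbus_connection.unload_event)) {
		u32 msgtype;

		if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
			mdelay(10);
			continue;
		}

		/* remember the type before the slot is handed back to the host */
		hdr = (struct vmbus_channel_message_header *)msg->u.payload;
		msgtype = hdr->msgtype;

		vmbus_signal_eom(msg);

		if (msgtype == CHANNELMSG_UNLOAD_RESPONSE)
			break;
	}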

>> Can't this be NULL?
>
> It can't, we allocate it from hv_synic_alloc() (and we don't support cpu
> onlining/offlining on WS2012R2+).

Reviewed-by: Radim Krčmář <[email protected]>

Thanks.

>>> + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>>> + msg->header.message_type = HVMSG_NONE;
>>> }
>>
>> (And, this block belongs to a separate function. ;])
>
> I thought about moving it to hv_crash_handler() but then I decided to
> leave it here as the need for this fixup is rather an artifact of how we
> recieve the message. But I'm flexible here)

Ok, clearing all VCPUs made me think that it would be generally useful.

2016-03-18 18:02:58

by KY Srinivasan

Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Friday, March 18, 2016 5:33 AM
> To: [email protected]
> Cc: [email protected]; KY Srinivasan <[email protected]>;
> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
> <[email protected]>; Radim Krcmar <[email protected]>; Cathy
> Avery <[email protected]>
> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> delivered to CPU0 regardless of what CPU we're sending
> CHANNELMSG_UNLOAD
> from. vmbus_wait_for_unload() doesn't account for the fact that in case
> we're crashing on some other CPU and CPU0 is still alive and operational
> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
> vmbus_connection.unload_event, our wait on the current CPU will never
> end.

What was the host you were testing on?

K. Y
>
> Do the following:
> 1) Check for completion_done() in the loop. In case interrupt handler is
> still alive we'll get the confirmation we need.
>
> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE
> will be
> delivered there. We can race with still-alive interrupt handler doing
> the same but we don't care as we're checking completion_done() now.
>
> 3) Cleanup message pages on all CPUs. This is required (at least for the
> current CPU as we're clearing CPU0 messages now but we may want to
> bring
> up additional CPUs on crash) as new messages won't be delivered till we
> consume what's pending. On boot we'll place message pages somewhere
> else
> and we won't be able to read stale messages.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
> 1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index b10e8f74..5f37057 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>
> static void vmbus_wait_for_unload(void)
> {
> - int cpu = smp_processor_id();
> - void *page_addr = hv_context.synic_message_page[cpu];
> + int cpu;
> + void *page_addr = hv_context.synic_message_page[0];
> struct hv_message *msg = (struct hv_message *)page_addr +
> VMBUS_MESSAGE_SINT;
> struct vmbus_channel_message_header *hdr;
> bool unloaded = false;
>
> - while (1) {
> + /*
> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
> + * still functional and vmbus_unload_response() will complete
> + * vmbus_connection.unload_event. If not, the last thing we can do is
> + * read message page for CPU0 regardless of what CPU we're on.
> + */
> + while (!unloaded) {
> + if (completion_done(&vmbus_connection.unload_event)) {
> + unloaded = true;
> + break;
> + }
> +
> if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
> mdelay(10);
> continue;
> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
> unloaded = true;
>
> vmbus_signal_eom(msg);
> + }
>
> - if (unloaded)
> - break;
> + /*
> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
> + * maybe-pending messages on all CPUs to be able to receive new
> + * messages after we reconnect.
> + */
> + for_each_online_cpu(cpu) {
> + page_addr = hv_context.synic_message_page[cpu];
> + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> + msg->header.message_type = HVMSG_NONE;
> }
> }
>
> --
> 2.5.0

2016-03-21 07:52:06

by Vitaly Kuznetsov

Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

KY Srinivasan <[email protected]> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:[email protected]]
>> Sent: Friday, March 18, 2016 5:33 AM
>> To: [email protected]
>> Cc: [email protected]; KY Srinivasan <[email protected]>;
>> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
>> <[email protected]>; Radim Krcmar <[email protected]>; Cathy
>> Avery <[email protected]>
>> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>>
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending
>> CHANNELMSG_UNLOAD
>> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> we're crashing on some other CPU and CPU0 is still alive and operational
>> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> vmbus_connection.unload_event, our wait on the current CPU will never
>> end.
>
> What was the host you were testing on?
>

I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
by forcing a crash on a secondary CPU, e.g.:

# cat crash.sh
#! /bin/sh
echo c > /proc/sysrq-trigger

# taskset -c 1 ./crash.sh

>>
>> Do the following:
>> 1) Check for completion_done() in the loop. In case interrupt handler is
>> still alive we'll get the confirmation we need.
>>
>> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE
>> will be
>> delivered there. We can race with still-alive interrupt handler doing
>> the same but we don't care as we're checking completion_done() now.
>>
>> 3) Cleanup message pages on all CPUs. This is required (at least for the
>> current CPU as we're clearing CPU0 messages now but we may want to
>> bring
>> up additional CPUs on crash) as new messages won't be delivered till we
>> consume what's pending. On boot we'll place message pages somewhere
>> else
>> and we won't be able to read stale messages.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>> 1 file changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>>
>> static void vmbus_wait_for_unload(void)
>> {
>> - int cpu = smp_processor_id();
>> - void *page_addr = hv_context.synic_message_page[cpu];
>> + int cpu;
>> + void *page_addr = hv_context.synic_message_page[0];
>> struct hv_message *msg = (struct hv_message *)page_addr +
>> VMBUS_MESSAGE_SINT;
>> struct vmbus_channel_message_header *hdr;
>> bool unloaded = false;
>>
>> - while (1) {
>> + /*
>> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
>> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> + * still functional and vmbus_unload_response() will complete
>> + * vmbus_connection.unload_event. If not, the last thing we can do is
>> + * read message page for CPU0 regardless of what CPU we're on.
>> + */
>> + while (!unloaded) {
>> + if (completion_done(&vmbus_connection.unload_event)) {
>> + unloaded = true;
>> + break;
>> + }
>> +
>> if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
>> mdelay(10);
>> continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>> unloaded = true;
>>
>> vmbus_signal_eom(msg);
>> + }
>>
>> - if (unloaded)
>> - break;
>> + /*
>> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
>> + * maybe-pending messages on all CPUs to be able to receive new
>> + * messages after we reconnect.
>> + */
>> + for_each_online_cpu(cpu) {
>> + page_addr = hv_context.synic_message_page[cpu];
>> + msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>> + msg->header.message_type = HVMSG_NONE;
>> }
>> }
>>
>> --
>> 2.5.0

--
Vitaly

2016-03-21 22:44:46

by KY Srinivasan

Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Monday, March 21, 2016 12:52 AM
> To: KY Srinivasan <[email protected]>
> Cc: [email protected]; [email protected]; Haiyang
> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan <[email protected]> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> Sent: Friday, March 18, 2016 5:33 AM
> >> To: [email protected]
> >> Cc: [email protected]; KY Srinivasan <[email protected]>;
> >> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
> >> <[email protected]>; Radim Krcmar <[email protected]>; Cathy
> >> Avery <[email protected]>
> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
> always
> >> delivered to CPU0 regardless of what CPU we're sending
> >> CHANNELMSG_UNLOAD
> >> from. vmbus_wait_for_unload() doesn't account for the fact that in case
> >> we're crashing on some other CPU and CPU0 is still alive and operational
> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
> >> vmbus_connection.unload_event, our wait on the current CPU will never
> >> end.
> >
> > What was the host you were testing on?
> >
>
> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> by forcing crash on a secondary CPU, e.g.:

Prior to 2012 R2, all messages would be delivered on CPU0, and this includes CHANNELMSG_UNLOAD_RESPONSE.
For this reason we don't support kexec on pre-2012 R2 hosts. From 2012 R2 on, all vmbus
messages (responses) will be delivered on the CPU that we initially set up - look at the code in
vmbus_negotiate_version(). So on 2012 R2 and later hosts, CHANNELMSG_UNLOAD_RESPONSE
will be delivered on the CPU where we initiated contact with the host - the CHANNELMSG_INITIATE_CONTACT message.
So, maybe we can stash away the CPU on which we made the initial contact and poll the state on that CPU
to make forward progress in the case of crash.
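
A rough sketch of that stash-and-poll idea; the connect_cpu field is a name
made up here purely for illustration, not something the driver defines at
this point:

	/* in vmbus_negotiate_version(), when CHANNELMSG_INITIATE_CONTACT is sent */
	vmbus_connection.connect_cpu = smp_processor_id();

	/* in vmbus_wait_for_unload(), poll the page of the stashed CPU */
	page_addr = hv_context.synic_message_page[vmbus_connection.connect_cpu];
	msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;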

Regards,

K. Y



2016-03-22 09:47:47

by Vitaly Kuznetsov

Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

KY Srinivasan <[email protected]> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:[email protected]]
>> Sent: Monday, March 21, 2016 12:52 AM
>> To: KY Srinivasan <[email protected]>
>> Cc: [email protected]; [email protected]; Haiyang
>> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
>> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
>> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>>
>> KY Srinivasan <[email protected]> writes:
>>
>> >> -----Original Message-----
>> >> From: Vitaly Kuznetsov [mailto:[email protected]]
>> >> Sent: Friday, March 18, 2016 5:33 AM
>> >> To: [email protected]
>> >> Cc: [email protected]; KY Srinivasan <[email protected]>;
>> >> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
>> >> <[email protected]>; Radim Krcmar <[email protected]>; Cathy
>> >> Avery <[email protected]>
>> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> >>
>> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
>> always
>> >> delivered to CPU0 regardless of what CPU we're sending
>> >> CHANNELMSG_UNLOAD
>> >> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> >> we're crashing on some other CPU and CPU0 is still alive and operational
>> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> >> vmbus_connection.unload_event, our wait on the current CPU will never
>> >> end.
>> >
>> > What was the host you were testing on?
>> >
>>
>> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
>> by forcing crash on a secondary CPU, e.g.:
>
> Prior to 2012R2, all messages would be delivered on CPU0 and this includes CHANNELMSG_UNLOAD_RESPONSE.
> For this reason we don't support kexec on pre-2012 R2 hosts. On 2012. From 2012 R2 on, all vmbus
> messages (responses) will be delivered on the CPU that we initially set up - look at the code in
> vmbus_negotiate_version().

Ok, missed that. In that case we need to remember which CPU it was --
I'll add this in v2.

> So on post 2012 R2 hosts, the response to CHANNELMSG_UNLOAD_RESPONSE
> will be delivered on the CPU where we initiate the contact with the host - CHANNELMSG_INITIATE_CONTACT message.
> So, maybe we can stash away the CPU on which we made the initial contact and poll the state on that CPU
> to make forward progress in the case of crash.

Yes, we can't have any expectations about other CPUs on crash as they can
be in any state (also crashing, hanging on some mutex/spinlock, ...), so
we need to use the current CPU only. I'll fix and resend.

Thanks!

--
Vitaly

2016-03-22 14:01:13

by Vitaly Kuznetsov

Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

KY Srinivasan <[email protected]> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:[email protected]]
>> Sent: Monday, March 21, 2016 12:52 AM
>> To: KY Srinivasan <[email protected]>
>> Cc: [email protected]; [email protected]; Haiyang
>> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
>> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
>> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>>
>> KY Srinivasan <[email protected]> writes:
>>
>> >> -----Original Message-----
>> >> From: Vitaly Kuznetsov [mailto:[email protected]]
>> >> Sent: Friday, March 18, 2016 5:33 AM
>> >> To: [email protected]
>> >> Cc: [email protected]; KY Srinivasan <[email protected]>;
>> >> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
>> >> <[email protected]>; Radim Krcmar <[email protected]>; Cathy
>> >> Avery <[email protected]>
>> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> >>
>> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
>> always
>> >> delivered to CPU0 regardless of what CPU we're sending
>> >> CHANNELMSG_UNLOAD
>> >> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> >> we're crashing on some other CPU and CPU0 is still alive and operational
>> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> >> vmbus_connection.unload_event, our wait on the current CPU will never
>> >> end.
>> >
>> > What was the host you were testing on?
>> >
>>
>> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
>> by forcing crash on a secondary CPU, e.g.:
>
> Prior to 2012R2, all messages would be delivered on CPU0 and this includes CHANNELMSG_UNLOAD_RESPONSE.
> For this reason we don't support kexec on pre-2012 R2 hosts. On 2012. From 2012 R2 on, all vmbus
> messages (responses) will be delivered on the CPU that we initially set up - look at the code in
> vmbus_negotiate_version(). So on post 2012 R2 hosts, the response to CHANNELMSG_UNLOAD_RESPONSE
> will be delivered on the CPU where we initiate the contact with the
> host - CHANNELMSG_INITIATE_CONTACT message.

Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4. On
WS2012R2 what you're saying is true and all messages including
CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for initial
contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
case and it is always delivered to CPU0, no matter which CPU we used for
initial contact. This can be a host bug. You can use the attached patch
to see the issue.

For now I can suggest we check message pages for all CPUs from
vmbus_wait_for_unload(). We can race with other CPUs again but we don't
care as we're checking for completion_done() in the loop as well. I'll
try this approach.
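
In code, one iteration of the wait loop would then scan every online CPU's
message page instead of just CPU0's, something along these lines (a sketch of
the v2 direction only, not a posted patch):

	while (!completion_done(&vmbus_connection.unload_event)) {
		bool unloaded = false;

		for_each_online_cpu(cpu) {
			page_addr = hv_context.synic_message_page[cpu];
			msg = (struct hv_message *)page_addr +
				VMBUS_MESSAGE_SINT;

			if (READ_ONCE(msg->header.message_type) == HVMSG_NONE)
				continue;

			hdr = (struct vmbus_channel_message_header *)
				msg->u.payload;
			if (hdr->msgtype == CHANNELMSG_UNLOAD_RESPONSE)
				unloaded = true;

			vmbus_signal_eom(msg);
		}

		if (unloaded)
			break;

		mdelay(10);
	}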

--
Vitaly


Attachments:
0001-Drivers-hv-vmbus-handle-various-crash-scenarios.patch (6.03 kB)

2016-03-22 14:18:18

by KY Srinivasan

Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:[email protected]]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan <[email protected]>
> Cc: [email protected]; [email protected]; Haiyang
> Zhang <[email protected]>; Alex Ng (LIS) <[email protected]>;
> Radim Krcmar <[email protected]>; Cathy Avery <[email protected]>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan <[email protected]> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan <[email protected]>
> >> Cc: [email protected]; [email protected]; Haiyang
> >> Zhang <[email protected]>; Alex Ng (LIS)
> <[email protected]>;
> >> Radim Krcmar <[email protected]>; Cathy Avery
> <[email protected]>
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan <[email protected]> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:[email protected]]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: [email protected]
> >> >> Cc: [email protected]; KY Srinivasan <[email protected]>;
> >> >> Haiyang Zhang <[email protected]>; Alex Ng (LIS)
> >> >> <[email protected]>; Radim Krcmar <[email protected]>;
> Cathy
> >> >> Avery <[email protected]>
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
> >> always
> >> >> delivered to CPU0 regardless of what CPU we're sending
> >> >> CHANNELMSG_UNLOAD
> >> >> from. vmbus_wait_for_unload() doesn't account for the fact that in
> case
> >> >> we're crashing on some other CPU and CPU0 is still alive and
> operational
> >> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there
> completing
> >> >> vmbus_connection.unload_event, our wait on the current CPU will
> never
> >> >> end.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> >> by forcing crash on a secondary CPU, e.g.:
> >
> > Prior to 2012R2, all messages would be delivered on CPU0 and this includes
> CHANNELMSG_UNLOAD_RESPONSE.
> > For this reason we don't support kexec on pre-2012 R2 hosts. On 2012.
> From 2012 R2 on, all vmbus
> > messages (responses) will be delivered on the CPU that we initially set up -
> look at the code in
> > vmbus_negotiate_version(). So on post 2012 R2 hosts, the response to
> CHANNELMSG_UNLOAD_RESPONSE
> > will be delivered on the CPU where we initiate the contact with the
> > host - CHANNELMSG_INITIATE_CONTACT message.
>
> Unfortunatelly there is a descrepancy between WS2012R2 and WS2016TP4.
> On
> WS2012R2 what you're saying is true and all messages including
> CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for
> initial
> contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a
> special
> case and it is always delivered to CPU0, no matter which CPU we used for
> initial contact. This can be a host bug. You can use the attached patch
> to see the issue.

This looks like a host bug and I will try to get it addressed before WS2016
ships.
>
> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.
Thank you.

K. Y

>
> --
> Vitaly