From: Vitaly Kuznetsov
To: Radim Krcmar
Cc: devel@linuxdriverproject.org, linux-kernel@vger.kernel.org,
    "K. Y. Srinivasan", Haiyang Zhang, Alex Ng, Cathy Avery
Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
References: <1458304404-8347-1-git-send-email-vkuznets@redhat.com>
    <20160318152037.GK20310@potion.brq.redhat.com>
Date: Fri, 18 Mar 2016 16:53:37 +0100
In-Reply-To: <20160318152037.GK20310@potion.brq.redhat.com>
    (Radim Krcmar's message of "Fri, 18 Mar 2016 16:20:37 +0100")
Message-ID: <87d1qr3hq6.fsf@vitty.brq.redhat.com>

Radim Krcmar writes:

> 2016-03-18 13:33+0100, Vitaly Kuznetsov:
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending
>> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
>> the fact that when we're crashing on some other CPU and CPU0 is still
>> alive and operational, CHANNELMSG_UNLOAD_RESPONSE will be delivered
>> there, completing vmbus_connection.unload_event, so our wait on the
>> current CPU will never end.
>
> (Any chance of learning about this behavior from the spec?)
>
>> Do the following:
>> 1) Check for completion_done() in the loop. In case the interrupt
>>    handler is still alive we'll get the confirmation we need.
>>
>> 2) Always read CPU0's message page, as CHANNELMSG_UNLOAD_RESPONSE will
>>    be delivered there. We can race with the still-alive interrupt
>>    handler doing the same, but we don't care as we're checking
>>    completion_done() now.
>
> (Yeah, seems better than hv_setup_vmbus_irq(NULL) or other hacks.)
>
>> 3) Clean up message pages on all CPUs. This is required (at least for
>>    the current CPU, as we're clearing CPU0's messages now, but we may
>>    want to bring up additional CPUs on crash) as new messages won't be
>>    delivered until we consume what's pending. On boot we'll place
>>    message pages somewhere else and we won't be able to read stale
>>    messages.
>
> What if HV already set the pending message bit on the current message?
> Do we get any guarantees that clearing once after UNLOAD_RESPONSE is
> enough?

I think so, but I'd like to get a confirmation from K.Y./Alex/Haiyang.

>> Signed-off-by: Vitaly Kuznetsov
>> ---
>
> I had a question about NULL below. (Parenthesised rants aren't related
> to the r-b tag. ;)
>
>>  drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>>
>>  static void vmbus_wait_for_unload(void)
>>  {
>> -        int cpu = smp_processor_id();
>> -        void *page_addr = hv_context.synic_message_page[cpu];
>> +        int cpu;
>> +        void *page_addr = hv_context.synic_message_page[0];
>>          struct hv_message *msg = (struct hv_message *)page_addr +
>>                                    VMBUS_MESSAGE_SINT;
>>          struct vmbus_channel_message_header *hdr;
>>          bool unloaded = false;
>>
>> -        while (1) {
>> +        /*
>> +         * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
>> +         * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> +         * still functional and vmbus_unload_response() will complete
>> +         * vmbus_connection.unload_event. If not, the last thing we can do is
>> +         * read message page for CPU0 regardless of what CPU we're on.
>> +         */
>> +        while (!unloaded) {
>
> (I'd feel a bit safer if this was bounded by some timeout, but all
> scenarios where this would make a difference are implausible ...
> queue_work() not working while the rest is fine is the best one.)
>
>> +                if (completion_done(&vmbus_connection.unload_event)) {
>> +                        unloaded = true;
>
> (No need to set unloaded when you break.)
>
>> +                        break;
>> +                }
>> +
>>          if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
>>                  mdelay(10);
>>                  continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>
> (I'm not a huge fan of the unloaded variable; what about remembering the
> header/msgtype here ...
>
>>                          unloaded = true;
>>
>>                  vmbus_signal_eom(msg);
>
> and checking its value here?)

Sure, but we'll have to use a variable for that ... why would it be
better than 'unloaded'?
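For illustration, Radim's suggestion might look roughly like the sketch
below. It is only a sketch: it reuses the declarations already present
in vmbus_wait_for_unload(), the 'msgtype' local is an assumed name, and
this is not code that was posted or merged in this thread.

        enum vmbus_channel_message_type msgtype;

        while (!completion_done(&vmbus_connection.unload_event)) {
                if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
                        mdelay(10);
                        continue;
                }

                hdr = (struct vmbus_channel_message_header *)msg->u.payload;
                /* remember the message type here ... */
                msgtype = hdr->msgtype;

                vmbus_signal_eom(msg);

                /* ... and check it after signaling end-of-message */
                if (msgtype == CHANNELMSG_UNLOAD_RESPONSE)
                        break;
        }

Either way a local is needed, which is the point of Vitaly's question
above; the difference is only whether it records a bool or the message
type itself.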
>> +        }
>>
>> -                if (unloaded)
>> -                        break;
>> +        /*
>> +         * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
>> +         * maybe-pending messages on all CPUs to be able to receive new
>> +         * messages after we reconnect.
>> +         */
>> +        for_each_online_cpu(cpu) {
>> +                page_addr = hv_context.synic_message_page[cpu];
>
> Can't this be NULL?

It can't; we allocate it from hv_synic_alloc() (and we don't support
cpu onlining/offlining on WS2012R2+).

>
>> +                msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
>> +                msg->header.message_type = HVMSG_NONE;
>>          }
>
> (And, this block belongs to a separate function. ;])

I thought about moving it to hv_crash_handler(), but then I decided to
leave it here, as the need for this fixup is rather an artifact of how
we receive the message. But I'm flexible here.

-- 
Vitaly
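For reference, the "separate function" Radim asks for could be as small
as the following sketch. The helper name vmbus_cleanup_message_pages()
is made up for illustration; only the loop body comes from the patch
quoted above.

        /* Hypothetical helper factoring out the cleanup loop above. */
        static void vmbus_cleanup_message_pages(void)
        {
                int cpu;
                void *page_addr;
                struct hv_message *msg;

                /*
                 * Wipe maybe-pending messages on all online CPUs so
                 * new messages can be delivered after we reconnect.
                 */
                for_each_online_cpu(cpu) {
                        page_addr = hv_context.synic_message_page[cpu];
                        msg = (struct hv_message *)page_addr +
                              VMBUS_MESSAGE_SINT;
                        msg->header.message_type = HVMSG_NONE;
                }
        }

vmbus_wait_for_unload() would then end with a single call to this
helper, and hv_crash_handler() could call it as well if the cleanup
ever needs to move there.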
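Similarly, the bounded wait Radim mentions earlier in the thread could
be sketched by capping the polling loop. The 1000 * 10ms (roughly 10s)
budget below is an arbitrary assumption, not a value discussed in this
thread, and the rest is the v1 loop body unchanged.

        int i;

        /* Give up after a bounded number of polls instead of spinning
         * forever if neither the IRQ handler nor the message page ever
         * delivers UNLOAD_RESPONSE. */
        for (i = 0; i < 1000; i++) {
                if (completion_done(&vmbus_connection.unload_event))
                        break;

                if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
                        mdelay(10);
                        continue;
                }

                hdr = (struct vmbus_channel_message_header *)msg->u.payload;
                if (hdr->msgtype == CHANNELMSG_UNLOAD_RESPONSE)
                        unloaded = true;

                vmbus_signal_eom(msg);

                if (unloaded)
                        break;
        }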