From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: KY Srinivasan <kys@microsoft.com>
Cc: "devel\@linuxdriverproject.org" <devel@linuxdriverproject.org>,
        "Van De Ven\, Arjan" <arjan.van.de.ven@intel.com>,
        "linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        "Haiyang Zhang" <haiyangz@microsoft.com>
Subject: Re: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
References: <1477480307-5546-1-git-send-email-vkuznets@redhat.com>
        <DM5PR03MB24903B634D9D64C7DD32E302A0AB0@DM5PR03MB2490.namprd03.prod.outlook.com>
Date: Mon, 31 Oct 2016 11:04:52 +0100
In-Reply-To: <DM5PR03MB24903B634D9D64C7DD32E302A0AB0@DM5PR03MB2490.namprd03.prod.outlook.com>
        (KY Srinivasan's message of "Wed, 26 Oct 2016 19:52:16 +0000")
Message-ID: <87shrcbzqz.fsf@vitty.brq.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2571
Lines: 56

KY Srinivasan <kys@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Wednesday, October 26, 2016 4:12 AM
>> To: devel@linuxdriverproject.org
>> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
>> Haiyang Zhang <haiyangz@microsoft.com>
>> Subject: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in
>> vmbus_post_msg()
>> 
>> DoS protection conditions were altered in WS2016 and now it's easy to get
>> -EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a
>> netvsc device in a loop). None of the vmbus_post_msg() callers retry the
>> operation, and we usually end up with a non-functional device or a crash.
>> 
>> While the host's DoS protection conditions are unknown to me, my tests
>> show that it can take up to 46 attempts to send a message after changing
>> udelay() to mdelay() and capping msec at '256'. This means we can wait up
>> to 10 seconds before the message is sent, so we need to use msleep()
>> instead. Almost all
>> vmbus_post_msg() callers are ready to sleep but there is one special case:
>> vmbus_initiate_unload() which can be called from interrupt/NMI context and
>> we can't sleep there. I'm also not sure about the lonely
>> vmbus_send_tl_connect_request() which has no in-tree users but its external
>> users are most likely waiting for the host to reply so sleeping there is
>> also appropriate.
>
> Vitaly,
>
> One of the reasons why the delay was in microseconds was to make sure that the boot time
> was not adversely affected by the delay we had in setting up the channel. The change to microsecond
> delay and other changes in this code reduced the time it took to initialize netvsc from 
> 200 milliseconds to about 12 milliseconds. This is important for us as we look at achieving sub-second
> boot times.
> The situations you are trying to address are test cases where you are hitting the host with
> requests that trigger the host's DoS prevention code. Perhaps we could have a hybrid approach: we
> retain the microsecond wait until we hit a threshold and then we use millisecond delays. This way, the normal boot
> path is still fast while we can handle some of the other cases where the host's DoS prevention code kicks in.
>

Ok,

I actually tested boot time with my patch and didn't see a difference
(so I guess our first attempt to send messages usually succeeds), but if
we're concerned about less-than-a-second boot time we'd better keep the
microsecond delay for the first several attempts. I'll do v2.
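
To illustrate the hybrid idea, here is a minimal userspace sketch. It is
not the v2 patch: FAST_RETRIES, MAX_RETRIES, MAX_DELAY_MS, fake_post() and
post_with_backoff() are hypothetical names and values chosen for the
example, and the delay is only accumulated, where kernel code would
presumably call udelay() for the short waits and msleep() for the long
ones (in contexts that are allowed to sleep).

```c
#include <errno.h>

/* Hypothetical thresholds for illustration, not taken from any patch. */
#define FAST_RETRIES  8U    /* attempts that only wait microseconds */
#define MAX_RETRIES   100U  /* give up after this many attempts */
#define MAX_DELAY_MS  256UL /* cap for the millisecond phase */

/*
 * Delay (in microseconds) to wait after failed attempt number `attempt`
 * (0-based). The first FAST_RETRIES failures wait 1, 2, 4, ... 128 us,
 * so the normal boot path stays fast. Later failures wait 1, 2, 4, ...
 * ms, capped at MAX_DELAY_MS.
 */
static unsigned long retry_delay_us(unsigned int attempt)
{
	unsigned int shift;

	if (attempt < FAST_RETRIES)
		return 1UL << attempt;

	shift = attempt - FAST_RETRIES;
	if (shift >= 8) /* 1 << 8 == 256 ms, the cap */
		return MAX_DELAY_MS * 1000UL;
	return (1UL << shift) * 1000UL;
}

/* Stand-in for the host: rejects the first `busy_for` posts. */
struct fake_host {
	unsigned int busy_for;
};

static int fake_post(struct fake_host *h)
{
	if (h->busy_for > 0) {
		h->busy_for--;
		return -EAGAIN;
	}
	return 0;
}

/*
 * Retry loop: repost on -EAGAIN, adding up the time we would have
 * waited into *waited_us instead of actually delaying.
 */
static int post_with_backoff(struct fake_host *h, unsigned long *waited_us)
{
	unsigned int attempt;

	*waited_us = 0;
	for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
		int ret = fake_post(h);

		if (ret != -EAGAIN)
			return ret;
		*waited_us += retry_delay_us(attempt);
	}
	return -EAGAIN;
}
```

With these example values, a post that succeeds within the first eight
attempts waits at most 255 microseconds in total, and only a persistently
busy host pushes the caller into the millisecond range.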

Thanks,


-- 
  Vitaly