Subject: Re: [PATCH v1] svcrdma: Optimize the logic that selects the R_key to
 invalidate
To:     Chuck Lever <chuck.lever@oracle.com>
Cc:     linux-rdma@vger.kernel.org,
        Linux NFS Mailing List <linux-nfs@vger.kernel.org>
References: <20181127161016.6997.69002.stgit@klimt.1015granger.net>
 <14c5a1e8-115b-a58b-7c65-4e207caf3d33@talpey.com>
 <0E1C5F18-C0E8-43D2-AF21-B6DCC84E302C@oracle.com>
 <45f70f31-997b-ab1e-9430-57f2e0d78318@talpey.com>
 <79BDA67D-4B6E-423F-BAF3-ADE5E703B5BC@oracle.com>
From:   Tom Talpey <tom@talpey.com>
Message-ID: <85f9cad6-9833-a193-44d0-b0397cafbcd7@talpey.com>
Date:   Tue, 27 Nov 2018 20:13:14 -0500
In-Reply-To: <79BDA67D-4B6E-423F-BAF3-ADE5E703B5BC@oracle.com>

On 11/27/2018 5:23 PM, Chuck Lever wrote:
> 
> 
>> On Nov 27, 2018, at 4:30 PM, Tom Talpey <tom@talpey.com> wrote:
>>
>> On 11/27/2018 4:21 PM, Chuck Lever wrote:
>>>> On Nov 27, 2018, at 4:16 PM, Tom Talpey <tom@talpey.com> wrote:
>>>>
>>>> On 11/27/2018 11:11 AM, Chuck Lever wrote:
>>>>> o Select the R_key to invalidate while the CPU cache still contains
>>>>>    the received RPC Call transport header, rather than waiting until
>>>>>    we're about to send the RPC Reply.
>>>>> o Choose Send With Invalidate if there is exactly one distinct R_key
>>>>>    in the received transport header. If there's more than one, the
>>>>>    client will have to perform local invalidation after it has
>>>>>    already waited for remote invalidation.
>>>>
>>>> What's the reason for remote-invalidating only if exactly one
>>>> region is targeted? It seems valuable to save the client the work,
>>>> no matter how many regions are used.
>>> Because remote invalidation delays the Receive completion.
>>
>> Well yes, but the invalidations have to happen before the reply is
>> processed, and remote invalidation saves a local work request plus
>> its completion.
> 
> That is true only if remote invalidation can knock down all the
> R_keys for that RPC. If there's more than one R_key for that RPC,
> a local invalidation is needed anyway, and there's no savings but
> rather there is a cost of the extra latency of waiting twice.
> 
> A couple of details to note:
> - remote invalidation is only available with FRWR, which
>    invalidates asynchronously
> - a smart FRWR client implementation will post a chain of LOCAL
>    INV WRs, then wait for the last one to signal completion. That's
>    just one doorbell, one interrupt, and one context switch no
>    matter how many LOCAL INV WRs are needed.
> 
> So if the client still has to do even one local invalidation, it's
> not worth the trouble to remotely invalidate.
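
A minimal sketch of the selection rule described above, in C. It assumes a hypothetical, flattened array of segments parsed from the Call's transport header. The real RPC-over-RDMA header carries a separate Read list, Write list, and Reply chunk, and the struct, field names, and use of 0 as a "no invalidation" sentinel are illustrative assumptions, not the svcrdma code. The second function is a deliberately crude model of the client's completion waits, showing why remote invalidation only reduces the wait count when it covers every R_key:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment parsed from the RPC Call
 * transport header. Not the kernel's actual structure. */
struct seg {
	uint32_t handle;	/* R_key advertised by the client */
	uint32_t length;
	uint64_t offset;
};

/* Returns the R_key to use with Send With Invalidate, or 0 if the
 * Reply should be sent as a plain Send. Remote invalidation is
 * chosen only when every segment carries the same R_key, i.e. there
 * is exactly one distinct R_key. Assumption: 0 means "none". */
static uint32_t choose_inv_rkey(const struct seg *segs, size_t nsegs)
{
	uint32_t rkey;
	size_t i;

	assert(nsegs == 0 || segs != NULL);
	if (nsegs == 0)
		return 0;	/* nothing registered: plain Send */

	rkey = segs[0].handle;
	for (i = 1; i < nsegs; i++)
		if (segs[i].handle != rkey)
			return 0;	/* more than one distinct R_key */
	return rkey;
}

/* Simplified model: the client always waits once for the Receive
 * completion; any R_keys left valid are knocked down by one chain
 * of LOCAL INV WRs, which costs exactly one more wait however long
 * the chain is. The added Receive latency caused by remote
 * invalidation is not modeled. */
static int client_waits(size_t distinct_rkeys, int used_remote_inv)
{
	size_t left = distinct_rkeys;

	if (used_remote_inv && left > 0)
		left--;	/* server invalidated one R_key remotely */
	return 1 + (left > 0 ? 1 : 0);
}
```

With three distinct R_keys, `client_waits(3, 1)` and `client_waits(3, 0)` both give 2, so remote invalidation saves no wait in that case and only delays the Receive; with one R_key it drops the count from 2 to 1.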

I still don't agree that it's "not worth" it, but it's a judgment call.

Just a couple of other notes:

>> Have you measured the difference?
> 
> Yes, as reported in the patch description. Perhaps I can include
> some interesting iozone results.

I didn't see any measurements in the patch description. I wasn't
arguing for including that kind of detail, just asking whether you had
actually measured it. I'm interested in the results, btw.

> This behavior seems to be a typical feature of most recent hardware.
> I suspect there's some locking of the hardware Send queue to handle
> RI that contends with actual posted WRs from the host.

That would be really bad. Have you reported this to the vendors?

> With cards that have a shallow FR depth, multiple MRs/R_keys are
> required to register a single 1MB NFS READ or WRITE. Here's where
> squelching remote invalidation really pays off.

Sure, but such a bandwidth-dominated workload isn't very interesting
performance-wise. With 1MB ops I would expect you to be wire-limited,
right?
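
A rough sketch of the arithmetic behind the shallow-FR-depth case, in C. It assumes "FR depth" means the maximum number of pages a single FRWR MR can map (roughly what the verbs device attribute max_fast_reg_page_list_len reports), a page-aligned payload, and 4 KiB pages; the function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of MRs (and thus distinct R_keys)
 * needed to register a payload when each MR maps at most `depth`
 * pages. Assumes a page-aligned payload. */
static size_t mrs_needed(size_t payload_bytes, size_t page_size,
			 size_t depth)
{
	size_t pages;

	assert(page_size > 0 && depth > 0);
	pages = (payload_bytes + page_size - 1) / page_size;	/* round up */
	return (pages + depth - 1) / depth;			/* round up */
}
```

A 1MB READ or WRITE is 256 pages of 4 KiB, so a depth of 256 needs one MR, a depth of 64 needs four, and a depth of 30 needs nine. Under the one-distinct-R_key rule in the patch, any count above 1 means the Reply goes out as a plain Send.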

>> I think it would be best to capture some or all of this
>> explanation in the commit message, in any case.
> 
> You mean you want my patch description to explain _why_ ? ;-)

Sort of. I believe this decision is a micro-optimization whose premise
is unlikely to hold forever. More importantly, it's not a bug fix or a
correctness issue, so recording the reasoning behind it will help in
the future, in case someone considers unwinding it.

Tom.