Subject: Re: [PATCH 4/4] crypto: hisilicon/sec2 - Add pbuffer mode for SEC driver
To: Jonathan Cameron
References: <1582189495-38051-1-git-send-email-xuzaibo@huawei.com>
 <1582189495-38051-5-git-send-email-xuzaibo@huawei.com>
 <20200224140154.00005967@Huawei.com>
 <80ab5da7-eceb-920e-dc36-1d411ad57a09@huawei.com>
 <20200225151426.000009f5@Huawei.com>
 <1fa85493-0e56-745e-2f24-5a12c2fec496@huawei.com>
 <20200226143037.00007ab0@Huawei.com>
From: Xu Zaibo
Date: Thu, 27 Feb 2020 09:13:06 +0800
In-Reply-To: <20200226143037.00007ab0@Huawei.com>
X-Mailing-List: linux-crypto@vger.kernel.org

Hi,

On 2020/2/26 22:30, Jonathan Cameron wrote:
> On Wed, 26 Feb 2020 19:18:51 +0800
> Xu Zaibo wrote:
>
>> Hi,
>> On 2020/2/25 23:14, Jonathan Cameron wrote:
>>> On Tue, 25 Feb 2020 11:16:52 +0800
>>> Xu Zaibo wrote:
>>>
>>>> Hi,
>>>>
>>>> On 2020/2/24 22:01, Jonathan Cameron wrote:
>>>>> On Thu, 20 Feb 2020 17:04:55 +0800
>>>>> Zaibo Xu wrote:
>>>>>
>> [...]
>>>>>>
>>>>>> +static void sec_free_pbuf_resource(struct device *dev, struct sec_alg_res *res)
>>>>>> +{
>>>>>> +	if (res->pbuf)
>>>>>> +		dma_free_coherent(dev, SEC_TOTAL_PBUF_SZ,
>>>>>> +				  res->pbuf, res->pbuf_dma);
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * To improve performance, pbuffer is used for
>>>>>> + * small packets (< 576Bytes) as IOMMU translation using.
>>>>>> + */
>>>>>> +static int sec_alloc_pbuf_resource(struct device *dev, struct sec_alg_res *res)
>>>>>> +{
>>>>>> +	int pbuf_page_offset;
>>>>>> +	int i, j, k;
>>>>>> +
>>>>>> +	res->pbuf = dma_alloc_coherent(dev, SEC_TOTAL_PBUF_SZ,
>>>>>> +				       &res->pbuf_dma, GFP_KERNEL);
>>>>> Would it make more sense perhaps to do this as a DMA pool and have
>>>>> it expand on demand?
>>>> Since there exist all kinds of buffer length, I think dma_alloc_coherent
>>>> may be better?
>>> As it currently stands we allocate a large buffer in one go but ensure
>>> we only have a single dma map that occurs at startup.
>>>
>>> If we allocate every time (don't use pbuf) performance is hit by
>>> the need to set up the page table entries and flush for every request.
>>>
>>> A dma pool with a fixed size element would at worst (for small messages)
>>> mean you had to do a dma map / unmap every time 6 ish buffers.
>>> This would only happen if you filled the whole queue.
>>> Under normal operation
>>> you will have a fairly steady number of buffers in use at a time, so mostly
>>> it would be reusing buffers that were already mapped from a previous request.
>> Agree, a dma pool may give a smaller range of mapped memory, which may
>> increase hits of the IOMMU TLB.
>>> You could implement your own allocator on top of dma_alloc_coherent but it'll
>>> probably be messy and cost you more than using fixed size small elements.
>>>
>>> So a dmapool here would give you a mid point between using lots of memory
>>> and never needing to map/unmap vs map/unmap every time.
>>>
>> My concern is the spinlock of the DMA pool, which adds an exclusion between
>> sending requests and receiving responses, since DMA blocks are allocated
>> at sending and freed at receiving.
> Agreed. That may be a bottleneck. Not clear to me whether that would be a
> significant issue or not.
>
Anyway, we will test the performance of the DMA pool to find a better solution.

Thanks,
Zaibo

>> Thanks,
>> Zaibo
>>
>>>>>
>>>>>> +	if (!res->pbuf)
>>>>>> +		return -ENOMEM;
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * SEC_PBUF_PKG contains data pbuf, iv and
>>>>>> +	 * out_mac :
>>>>>> +	 * Every PAGE contains six SEC_PBUF_PKG
>>>>>> +	 * The sec_qp_ctx contains QM_Q_DEPTH numbers of SEC_PBUF_PKG
>>>>>> +	 * So we need SEC_PBUF_PAGE_NUM numbers of PAGE
>>>>>> +	 * for the SEC_TOTAL_PBUF_SZ
>>>>>> +	 */
>>>>>> +	for (i = 0; i <= SEC_PBUF_PAGE_NUM; i++) {
>>>>>> +		pbuf_page_offset = PAGE_SIZE * i;
>>>>>> +		for (j = 0; j < SEC_PBUF_NUM; j++) {
>>>>>> +			k = i * SEC_PBUF_NUM + j;
>>>>>> +			if (k == QM_Q_DEPTH)
>>>>>> +				break;
>>>>>> +			res[k].pbuf = res->pbuf +
>>>>>> +				j * SEC_PBUF_PKG + pbuf_page_offset;
>>>>>> +			res[k].pbuf_dma = res->pbuf_dma +
>>>>>> +				j * SEC_PBUF_PKG + pbuf_page_offset;
>>>>>> +		}
>>>>>> +	}
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>> [...]