Subject: Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
To: Joerg Roedel
References: <1498484330-10840-1-git-send-email-thunder.leizhen@huawei.com> <1498484330-10840-2-git-send-email-thunder.leizhen@huawei.com> <20170822154142.GA19533@8bytes.org>
CC: Will Deacon, linux-arm-kernel, iommu, Robin Murphy, linux-kernel, Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, John Garry
From: "Leizhen (ThunderTown)"
Message-ID: <599CD89D.7080600@huawei.com>
Date: Wed, 23 Aug 2017 09:21:33 +0800
In-Reply-To: <20170822154142.GA19533@8bytes.org>

On 2017/8/22 23:41, Joerg Roedel wrote:
> On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote:
>> -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
>> +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize)
>>  {
>>  	if (queue_full(q))
>>  		return -ENOSPC;
>>
>>  	queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
>> -	queue_inc_prod(q);
>> +
>> +	/*
>> +	 * We don't want too many commands to be delayed, this may lead the
>> +	 * followed sync command to wait for a long time.
>> +	 */
>> +	if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) {
>> +		queue_inc_swprod(q);
>> +	} else {
>> +		queue_inc_prod(q);
>> +		q->nr_delay = 0;
>> +	}
>> +
>>  	return 0;
>>  }
>>
>> @@ -909,6 +928,7 @@ static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu)
>>  static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  				    struct arm_smmu_cmdq_ent *ent)
>>  {
>> +	int optimize = 0;
>>  	u64 cmd[CMDQ_ENT_DWORDS];
>>  	unsigned long flags;
>>  	bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
>> @@ -920,8 +940,17 @@ static void arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu,
>>  		return;
>>  	}
>>
>> +	/*
>> +	 * All TLBI commands should be followed by a sync command later.
>> +	 * The CFGI commands is the same, but they are rarely executed.
>> +	 * So just optimize TLBI commands now, to reduce the "if" judgement.
>> +	 */
>> +	if ((ent->opcode >= CMDQ_OP_TLBI_NH_ALL) &&
>> +	    (ent->opcode <= CMDQ_OP_TLBI_NSNH_ALL))
>> +		optimize = 1;
>> +
>>  	spin_lock_irqsave(&smmu->cmdq.lock, flags);
>> -	while (queue_insert_raw(q, cmd) == -ENOSPC) {
>> +	while (queue_insert_raw(q, cmd, optimize) == -ENOSPC) {
>>  		if (queue_poll_cons(q, false, wfe))
>>  			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
>>  	}
>
> This doesn't look correct. How do you make sure that a given IOVA range
> is flushed before the addresses are reused?

Hi, Joerg:
  It's actually guaranteed by the upper layer functions, for example:

	static int arm_lpae_unmap(
	...
	unmapped = __arm_lpae_unmap(data, iova, size, lvl, ptep);	// __arm_lpae_unmap will indirectly call arm_smmu_cmdq_issue_cmd to invalidate TLBs
	if (unmapped)
		io_pgtable_tlb_sync(&data->iop);			// a tlb_sync waits until all TLBI operations have finished

  I also described this in the next patch (2/5), quoted below:

Some people might ask: Is it safe to do so? The answer is yes. The standard
processing flow is:
	alloc iova
	map
	process data
	unmap
	tlb invalidation and sync
	free iova

What must be guaranteed is that the "free iova" step happens after the
"unmap" and "tlb invalidation and sync" steps, and that is exactly what we
do today. This ensures that all TLB entries for an IOVA range have been
invalidated before that IOVA can be reallocated.

Best regards,
  LeiZhen

>
>
> Regards,
>
> 	Joerg
>
>
> .
>

--
Thanks!
BestRegards
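
(For readers of the archived thread: the ordering Leizhen describes above can
be condensed into a minimal, self-contained sketch. Every name below, such as
iova_alloc, dom_map, dom_unmap, dom_tlb_sync and iova_free, is an illustrative
stand-in rather than the real kernel API; only the order of the calls matters.)

/* Illustrative sketch only: all functions here are hypothetical stand-ins. */
#include <stdio.h>
#include <stddef.h>

typedef unsigned long long phys_addr_t;		/* stand-in for the kernel type */

static unsigned long iova_alloc(size_t size)
{
	(void)size;
	return 0x1000;				/* pretend a fresh IOVA was handed out */
}

static void dom_map(unsigned long iova, phys_addr_t pa, size_t size)
{
	printf("map    iova=%#lx pa=%#llx size=%zu\n", iova, pa, size);
}

static void dom_unmap(unsigned long iova, size_t size)
{
	/* In the real driver this path queues the TLBI commands; with
	 * patch 1/5 some of them may stay unpublished until the next sync. */
	printf("unmap  iova=%#lx size=%zu (TLBIs queued)\n", iova, size);
}

static void dom_tlb_sync(void)
{
	/* The sync publishes any delayed commands and waits for completion. */
	printf("sync   all TLBIs completed\n");
}

static void iova_free(unsigned long iova, size_t size)
{
	printf("free   iova=%#lx size=%zu (safe to hand out again)\n", iova, size);
}

int main(void)
{
	size_t size = 0x1000;
	unsigned long iova = iova_alloc(size);	/* alloc iova   */

	dom_map(iova, 0x80000000ULL, size);	/* map          */
	/* ... device performs DMA ... */	/* process data */
	dom_unmap(iova, size);			/* unmap        */
	dom_tlb_sync();				/* tlbi + sync  */
	iova_free(iova, size);			/* free iova    */
	return 0;
}

Because iova_free() only runs after dom_tlb_sync() has returned, no stale TLB
entry can still translate an IOVA that the allocator might hand out again,
which is the guarantee Joerg asked about.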