Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Subject: Re: [PATCH v2] irqchip/gic-v3-its: fix ITS queue timeout
To:     Marc Zyngier <marc.zyngier@arm.com>, <julien.thierry@arm.com>
References: <1528252824-15144-1-git-send-email-yangyingliang@huawei.com>
 <0ebd8eef-1a86-3c7e-cd3b-f9580c497b5c@arm.com>
CC:     <linux-kernel@vger.kernel.org>,
        <linux-arm-kernel@lists.infradead.org>,
        Hanjun Guo <guohanjun@huawei.com>
From:   Yang Yingliang <yangyingliang@huawei.com>
Message-ID: <5B1E5BBE.9010907@huawei.com>
Date:   Mon, 11 Jun 2018 19:23:42 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <0ebd8eef-1a86-3c7e-cd3b-f9580c497b5c@arm.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

Hi, Marc

On 2018/6/11 17:31, Marc Zyngier wrote:
> On 06/06/18 03:40, Yang Yingliang wrote:
>> When the kernel booted with maxcpus=x, 'x' is smaller
>> than actual cpu numbers, the TAs of offline cpus won't
>> be set to its->collection.
>>
>> If LPI is bind to offline cpu, sync cmd will use zero TA,
>> it leads to ITS queue timeout.  Fix this by choosing a
>> online cpu, if there is no online cpu in cpu_mask.
>>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   drivers/irqchip/irq-gic-v3-its.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 5416f2b..d8b9539 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -2309,7 +2309,9 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>>   
>>   	/* Bind the LPI to the first possible CPU */
>> -	cpu = cpumask_first(cpu_mask);
>> +	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
>> +	if (cpu >= nr_cpu_ids)
>> +		cpu = cpumask_first(cpu_online_mask);
> I've thought about this one a bit more, and apart from breaking TX1
> in a very bad way, I think it is actually correct. It is just that
> the commit message doesn't make much sense.
>
> The way I understand it is:
> - this is a NUMA system, with at least one node not online
> - the SRAT table indicates that this ITS is local to an offline node
Yes, your comment is more proper and correct. Mine describes how the BUG 
happens.
I will send a v3 later with proper comment.

Thanks,
Yang
>
> In that case, we need to pick an online CPU, and any will do (again,
> ignoring the silly Cavium erratum). Explained like this, the above
> hunk is sensible, and just needs to handle the TX1 quirk. Something like:
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 5416f2b2ac21..21b7b5151177 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -2309,7 +2309,13 @@ static int its_irq_domain_activate(struct irq_domain *domain,
>   		cpu_mask = cpumask_of_node(its_dev->its->numa_node);
>   
>   	/* Bind the LPI to the first possible CPU */
> -	cpu = cpumask_first(cpu_mask);
> +	cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> +	if (cpu >= nr_cpu_idx) {
> +		if (its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144)
> +			return -EINVAL;
> +
> +		cpu = cpumask_first(cpu_online_mask);
> +	}
>   	its_dev->event_map.col_map[event] = cpu;
>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>   
>
>>   	its_dev->event_map.col_map[event] = cpu;
>>   	irq_data_update_effective_affinity(d, cpumask_of(cpu));
>>   
>> @@ -2466,7 +2468,10 @@ static int its_vpe_set_affinity(struct irq_data *d,
>>   				bool force)
>>   {
>>   	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
>> -	int cpu = cpumask_first(mask_val);
>> +	int cpu = cpumask_first_and(mask_val, cpu_online_mask);
>> +
>> +	if (cpu >= nr_cpu_ids)
>> +		cpu = cpumask_first(cpu_online_mask);
>>   
>>   	/*
>>   	 * Changing affinity is mega expensive, so let's be as lazy as
>>
> This hunk, on the other hand, is completely useless. Look how this is
> called from vgic_v4_flush_hwstate():
>
> 	err = irq_set_affinity(irq, cpumask_of(smp_processor_id()));
>
> The mask is always that of the CPU we run on, and we're in a non-premptible
> section. So no way we can be targeting an offline CPU.
>
> If you quickly respin this patch with a decent commit log, I'll take it.
>
> Thanks,
>
> 	M.