Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp4654734yba; Sun, 12 May 2019 18:57:42 -0700 (PDT) X-Google-Smtp-Source: APXvYqyAiJqvON2142olksZHD9InPvM5FcvFn33lKnLR2lXAkThxIxGMSyZRRPT4OyGmbm97SewB X-Received: by 2002:a63:7054:: with SMTP id a20mr15042591pgn.354.1557712662180; Sun, 12 May 2019 18:57:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1557712662; cv=none; d=google.com; s=arc-20160816; b=BTBJXirnyoLdQBHO2pmBUhSsQv9NYp/y62/6z5XjesFvEjcLnIC3A1U+9BmSnJHVvF oB3dmKdndOj3qFhTFWTJlDiMPGOsRauKw3aTLrIniC5yQOjThUSxi8ibyk0OIEz+Dlrp l4/hKgniU4gaGdTRVkf4976Hz0Y7GHssPciLho8YF76C2y2v+ydlgDdJQB+LrXd1aS1a sPwtsF4hHSIJmVflQlZHkQy2+Be3ZhHtXpzRJhpR0TerwFJg/fcyxanoBwInwc5zDcm2 7pNYzTvH2Lj7L0aldupMkId4CQAZD19jhit7BRj8ZTtv6jycHxX8CCUarlO+1cvt/Ajz rhCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=jkmp9l20cwjrBVZJbbVyFJ551/bdSDfBVg5CN7JGNiY=; b=PtJkzR0aY0+3fU4npUaX1rMJVlWNPbD+E1mjEUM1jcaU5fhkAaumZuUrGS1wVUP4ws UzoXkG34rmzXW0eAKRSjaaciRal8AcPm43RirJirlSY1W3ZqR4dZrwKyU1IzxYAHhiTp o+soNWXhxYzAfcg/4v8pNTJSJHwE8AylKqDAb7EDZR858WpC9YlluyDyLxhxA7uOYexd 0BVvj21B3BfwclpqyYLi9AOdBTgiHzprFsj+OhOmkOvq5enXjp5sMvWrZTefdH6TByU8 c5yNmTn1OKIWJOTctoL4bd4N2w9UV3+1jB7M3kk7DUSwj9CktXYHS7fixew2chpRquoU 0gLw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 24si15993001pgt.474.2019.05.12.18.56.56; Sun, 12 May 2019 18:57:42 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727302AbfEMBc3 (ORCPT + 99 others); Sun, 12 May 2019 21:32:29 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:36300 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727054AbfEMBc2 (ORCPT ); Sun, 12 May 2019 21:32:28 -0400 Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 5EFD7A7C056586BE2EBB; Mon, 13 May 2019 09:32:26 +0800 (CST) Received: from [127.0.0.1] (10.177.31.55) by DGGEMS409-HUB.china.huawei.com (10.3.19.209) with Microsoft SMTP Server id 14.3.439.0; Mon, 13 May 2019 09:32:15 +0800 Subject: Re: [RFC] irqchip/gic-its: fix command queue pointer comparison bug To: References: <1557581684-71297-1-git-send-email-guoheyi@huawei.com> CC: , Thomas Gleixner , "Jason Cooper" , Marc Zyngier From: Heyi Guo Message-ID: <07768321-46c3-170f-e92b-58ad29a655c5@huawei.com> Date: Mon, 13 May 2019 09:32:14 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1 MIME-Version: 1.0 In-Reply-To: <1557581684-71297-1-git-send-email-guoheyi@huawei.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.31.55] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Sorry, this patch still has issue when the queue is almost full, for the sample window becomes very small. How about we calculating the delta of each time read and then accumulating it to compare with the to_idr? This can guarantee to get real rd_idx with more than 1 wraps. Thanks, Heyi On 2019/5/11 21:34, Heyi Guo wrote: > When we run several VMs with PCI passthrough and GICv4 enabled, not > pinning vCPUs, we will occasionally see below warnings in dmesg: > > ITS queue timeout (65440 65504 480) > ITS cmd its_build_vmovp_cmd failed > > The reason for the above issue is that in BUILD_SINGLE_CMD_FUNC: > 1. Post the write command. > 2. Release the lock. > 3. Start to read GITS_CREADR to get the reader pointer. > 4. Compare the reader pointer to the target pointer. > 5. If reader pointer does not reach the target, sleep 1us and continue > to try. > > If we have several processors running the above concurrently, other > CPUs will post write commands while the 1st CPU is waiting the > completion. So we may have below issue: > > phase 1: > ---rd_idx-----from_idx-----to_idx--0--------- > > wait 1us: > > phase 2: > --------------from_idx-----to_idx--0-rd_idx-- > > That is the rd_idx may fly ahead of to_idx, and if in case to_idx is > near the wrap point, rd_idx will wrap around. So the below condition > will not be met even after 1s: > > if (from_idx < to_idx && rd_idx >= to_idx) > > There is another theoretical issue. For a slow and busy ITS, the > initial rd_idx may fall behind from_idx a lot, just as below: > > ---rd_idx---0--from_idx-----to_idx----------- > > This will cause the wait function exit too early. > > Actually, it does not make much sense to use from_idx to judge if > to_idx is wrapped, but we need a initial rd_idx when lock is still > acquired, and it can be used to judge whether to_idx is wrapped and > the current rd_idx is wrapped. > > Cc: Thomas Gleixner > Cc: Jason Cooper > Cc: Marc Zyngier > > Signed-off-by: Heyi Guo > --- > > This patch has only been tested on 4.19.36, for my NIC device driver has > something wrong with mainline kernel, so I mark it as a RFC until test has been > done upon mainline kernel. > > drivers/irqchip/irq-gic-v3-its.c | 22 ++++++++++++---------- > 1 file changed, 12 insertions(+), 10 deletions(-) > > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c > index 7577755..d14f3fbc 100644 > --- a/drivers/irqchip/irq-gic-v3-its.c > +++ b/drivers/irqchip/irq-gic-v3-its.c > @@ -745,30 +745,30 @@ static void its_flush_cmd(struct its_node *its, struct its_cmd_block *cmd) > } > > static int its_wait_for_range_completion(struct its_node *its, > - struct its_cmd_block *from, > + u64 origin_rd_idx, > struct its_cmd_block *to) > { > - u64 rd_idx, from_idx, to_idx; > + u64 rd_idx, to_idx; > u32 count = 1000000; /* 1s! */ > > - from_idx = its_cmd_ptr_to_offset(its, from); > to_idx = its_cmd_ptr_to_offset(its, to); > + if (to_idx < origin_rd_idx) > + to_idx += ITS_CMD_QUEUE_SZ; > > while (1) { > rd_idx = readl_relaxed(its->base + GITS_CREADR); > > - /* Direct case */ > - if (from_idx < to_idx && rd_idx >= to_idx) > - break; > + /* Wrap around for CREADR */ > + if (rd_idx < origin_rd_idx) > + rd_idx += ITS_CMD_QUEUE_SZ; > > - /* Wrapped case */ > - if (from_idx >= to_idx && rd_idx >= to_idx && rd_idx < from_idx) > + if (rd_idx >= to_idx) > break; > > count--; > if (!count) { > pr_err_ratelimited("ITS queue timeout (%llu %llu %llu)\n", > - from_idx, to_idx, rd_idx); > + origin_rd_idx, to_idx, rd_idx); > return -1; > } > cpu_relax(); > @@ -787,6 +787,7 @@ void name(struct its_node *its, \ > struct its_cmd_block *cmd, *sync_cmd, *next_cmd; \ > synctype *sync_obj; \ > unsigned long flags; \ > + u64 rd_idx; \ > \ > raw_spin_lock_irqsave(&its->lock, flags); \ > \ > @@ -808,10 +809,11 @@ void name(struct its_node *its, \ > } \ > \ > post: \ > + rd_idx = readl_relaxed(its->base + GITS_CREADR); \ > next_cmd = its_post_commands(its); \ > raw_spin_unlock_irqrestore(&its->lock, flags); \ > \ > - if (its_wait_for_range_completion(its, cmd, next_cmd)) \ > + if (its_wait_for_range_completion(its, rd_idx, next_cmd)) \ > pr_err_ratelimited("ITS cmd %ps failed\n", builder); \ > } >