Received: by 2002:a05:6a10:c604:0:0:0:0 with SMTP id y4csp615020pxt; Thu, 5 Aug 2021 07:37:51 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxe8f5PxLSqacO8VHVWqhH+CkWwkNqZDsEzjDcAz3/q15S59kYp6vHYdxGDB8xXYhPVnC9H X-Received: by 2002:aa7:c50a:: with SMTP id o10mr6924596edq.118.1628174271715; Thu, 05 Aug 2021 07:37:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1628174271; cv=none; d=google.com; s=arc-20160816; b=MyLcK5IDRPGi3LyAU+bGLQJfGD4lyrZyEj1Ytv3/xcdydeeKckCxjc+XAP/qMTFE60 OULUB4xXxg3i9lU7GbAEdcYU3U0jcAtEK7seeH581kGytri+Hyh9ruXdIF8EGPehR5im v1/T6JA4jz9EvII3Fp3pBXTqSEhRsdJeQC0Sv5rJTlPQacKLnPuVCRe0ZOJePrs9wwrt sKcEn9qe4Grn1Yo0+TaVuZwFWmwel3UuR9Etvb9co92trSStQA8+DYCmf+u+UPHntWST gRv6KqJPmorpX4h8fxL64a3dnRGv1PjGNEq4y3AdzLL0vk7uyl16toXPdl/iIRKTux6w bhXg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=GyXRkY8FU8a0zDCKMgfKloMiePLs1vPHo/+9Mx3SlQw=; b=mNIMnQtZZh175Ca/lce7hgDzFp2YXccx5c3ORVmtqwgN79nPKXW2pVMxrjLwLXaeFp 4rQf+0M5v9zQxgLl2K+4U+pGz9B2hDQ4zZ3uxQ90fZez6JKVkSIo5Wn7fqH5ZlKhagQ7 LcA7jEASQJlaG/1CaNEt1GEX7bQzAV1QO5ePJFjYH5KKFPw2jH9s4557V68GLShhFoF1 g02E7/Vo7bWnhzOVyFiEUXqKuW14oIglDGUIwLBiPK5eSJxAa2h9db7H7NIuxFl2UNVx Wx1UZ7CdNOw/ajuY+2r2x4Q95pTmJAJ7S48YpEVgf6Ww2N9wiB27p6cGNoFr6urTNUt2 9SEA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id l3si6402158ejo.634.2021.08.05.07.37.27; Thu, 05 Aug 2021 07:37:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239457AbhHENky (ORCPT + 99 others); Thu, 5 Aug 2021 09:40:54 -0400 Received: from frasgout.his.huawei.com ([185.176.79.56]:3596 "EHLO frasgout.his.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235812AbhHENkx (ORCPT ); Thu, 5 Aug 2021 09:40:53 -0400 Received: from fraeml710-chm.china.huawei.com (unknown [172.18.147.200]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4GgV9H5QV2z6F8WF; Thu, 5 Aug 2021 21:40:19 +0800 (CST) Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by fraeml710-chm.china.huawei.com (10.206.15.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2176.2; Thu, 5 Aug 2021 15:40:37 +0200 Received: from [10.47.24.8] (10.47.24.8) by lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.2176.2; Thu, 5 Aug 2021 14:40:36 +0100 Subject: Re: [PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist() To: Robin Murphy , CC: , , , , References: <1624293394-202509-1-git-send-email-john.garry@huawei.com> From: John Garry Message-ID: <45a8af4f-4202-ecd8-0882-507acf9b2eb2@huawei.com> Date: Thu, 5 Aug 2021 14:40:05 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.12.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.47.24.8] X-ClientProxiedBy: lhreml706-chm.china.huawei.com (10.201.108.55) To lhreml724-chm.china.huawei.com (10.201.108.75) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/08/2021 12:24, Robin Murphy wrote: > On 2021-06-21 17:36, John Garry wrote: >> Members of struct "llq" will be zero-inited, apart from member >> max_n_shift. >> But we write llq.val straight after the init, so it was pointless to zero >> init those other members. As such, separately init member max_n_shift >> only. >> >> In addition, struct "head" is initialised to "llq" only so that member >> max_n_shift is set. But that member is never referenced for "head", so >> remove any init there. >> >> Removing these initializations is seen as a small performance >> optimisation, >> as this code is (very) hot path. > > I looked at this and immediately thought "surely the compiler can see > that all the prod/cons/val fields are written anyway and elide the > initialisation?", so I dumped the before and after disassembly, and... oh. > > You should probably clarify that it's zero-initialising all the > cacheline padding which is both pointless and painful. With that, > > Reviewed-by: Robin Murphy > > However, having looked this closely I'm now tangentially wondering why > max_n_shift isn't inside the padded union? It's read at the same time as > both prod and cons by queue_has_space(), and never updated, so there > doesn't appear to be any benefit to it being in a separate cacheline all > by itself, and llq is already twice as big as it needs to be. I think that the problem is if the prod+cons 64b value and the shift are on the same cacheline, then we have a chance of accessing a stale cacheline twice: static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, u64 *cmds, int n, bool sync) { u64 cmd_sync[CMDQ_ENT_DWORDS]; u32 prod; unsigned long flags; bool owner; struct arm_smmu_cmdq *cmdq = &smmu->cmdq; struct arm_smmu_ll_queue llq = { .max_n_shift = cmdq->q.llq.max_n_shift, // here }, head = llq; int ret = 0; /* 1. Allocate some space in the queue */ local_irq_save(flags); llq.val = READ_ONCE(cmdq->q.llq.val); // and again here since cmdq->q.llq is per-SMMU. If max_n_shift is on a separate cacheline, then it should never be stale. I suppose they could be combined into a smaller sub-struct and loaded in a single operation, but it looks messy, and prob without much gain. Thanks, John > Sorting > that would also be a good opportunity to store the value of interest in > its appropriate form so we're not needlessly recalculating 1 << shift > every flippin' time... > > Robin. > >> Signed-off-by: John Garry >> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> index 54b2f27b81d4..8a8ad49bb7fd 100644 >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c >> @@ -727,11 +727,11 @@ static int arm_smmu_cmdq_issue_cmdlist(struct >> arm_smmu_device *smmu, >>       unsigned long flags; >>       bool owner; >>       struct arm_smmu_cmdq *cmdq = &smmu->cmdq; >> -    struct arm_smmu_ll_queue llq = { >> -        .max_n_shift = cmdq->q.llq.max_n_shift, >> -    }, head = llq; >> +    struct arm_smmu_ll_queue llq, head; >>       int ret = 0; >> +    llq.max_n_shift = cmdq->q.llq.max_n_shift; >> + >>       /* 1. Allocate some space in the queue */ >>       local_irq_save(flags); >>       llq.val = READ_ONCE(cmdq->q.llq.val); >> > .