Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751734AbdINTH4 (ORCPT ); Thu, 14 Sep 2017 15:07:56 -0400 Received: from mail-he1eur01on0087.outbound.protection.outlook.com ([104.47.0.87]:45874 "EHLO EUR01-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751641AbdINTHz (ORCPT ); Thu, 14 Sep 2017 15:07:55 -0400 From: Roy Pledge To: Catalin Marinas CC: Leo Li , "linuxppc-dev@lists.ozlabs.org" , "linux-arm-kernel@lists.infradead.org" , "linux-kernel@vger.kernel.org" , "mark.rutland@arm.com" , "arnd@arndb.de" , "Madalin-cristian Bucur" , "linux@armlinux.org.uk" , "oss@buserror.net" Subject: Re: [v4 07/11] soc/fsl/qbman: Rework portal mapping calls for ARM/PPC Thread-Topic: [v4 07/11] soc/fsl/qbman: Rework portal mapping calls for ARM/PPC Thread-Index: AQHTHRjndaeFq7ULeU2tESSqfBsMJg== Date: Thu, 14 Sep 2017 19:07:50 +0000 Message-ID: References: <1503607075-28970-1-git-send-email-roy.pledge@nxp.com> <1503607075-28970-8-git-send-email-roy.pledge@nxp.com> <20170914140014.taz7qwphfqm66kw7@localhost> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=roy.pledge@nxp.com; x-originating-ip: [192.88.168.1] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;DB6PR04MB3253;6:cYlhkj66lLNJNK3OYicsxM2cmZatlDc7T9ddQozb4ptFOdZCemctODyg9Ocese8DbREqA99wnUjF7TJ54pTv/KiABfbPqxnxZ8e4VWBywu0NLxDOPi/5C3kM/9kx0wv3EEnDcNBJFQnq5bzaaiFv9SkNn3tQhfZ/ht1nrP8doP1LAI5Zt13yrSu2Fr+OQkv0XyfAQt63S45gkicuPd4spzoLqvEConOkeOza4Mt6inU639MCsG9LBh0M0H29jY9d0WuzNPqS8eA1Uzt94T1Q9xsQYwOdHFq9utIC7MQq26QLP+KrRwsduYV8BOt/lNGRgS0gEqWaGS9Ic+ZUOeoEBw==;5:hL8ETI7MVWEhnT26gv3LO3wKD6Uf3fW7bhtHukd8BRPMOLcfFovzjaJRPRRgXML6/itqXQ+6NS1M/g2oy3TjXcmBGTfU1yVfYU08jS3Wu5bijUI8+SdIVZpkWcqI7nn2UVfxhG5GkXYN2p7wgHefYQ==;24:B+7ngxj0o2+27gwHSU/kkqxXaXOglPnwxQZ8FQgfv3xcWwQhC14vzX5MIZBL5benXt+WuhSs6yk4/JPCCcbnh+HAx1cwsqbneEEL3zCW2JA=;7:ibkLhBaUvARk9yzLby14TI1qxJ8ukrxnb1b/S3yo3abRHV54qpOwFLVW1wwv2d/3y+iei86EzW/TeNQzE+Eastx3GtQV/BcA71WCDLZzicOtLJHO7DhKeJkB4TYqVw2dkWU8TK/RXlmhPwecoybr+GcDRX3Nj8WM6CmoiJV9UCzxUjze2GpV7q9MDGyWi9NkSqEG7Cvhkv5tJDM4t7S4qw377iU27ftW6GaO18t3MxA= x-ms-exchange-antispam-srfa-diagnostics: SSOS;SSOR; x-forefront-antispam-report: SFV:SKI;SCL:-1;SFV:NSPM;SFS:(10009020)(6009001)(376002)(346002)(39860400002)(199003)(189002)(377454003)(24454002)(7736002)(6916009)(66066001)(53936002)(305945005)(54906002)(76176999)(99286003)(81166006)(81156014)(5660300001)(189998001)(74316002)(106356001)(3846002)(7696004)(102836003)(8676002)(68736007)(97736004)(55016002)(2906002)(6116002)(8936002)(3280700002)(25786009)(110136004)(53546010)(4326008)(105586002)(101416001)(6246003)(3660700001)(86362001)(50986999)(33656002)(2900100001)(54356999)(14454004)(229853002)(478600001)(5250100002)(316002)(9686003)(6506006)(6436002);DIR:OUT;SFP:1101;SCL:1;SRVR:DB6PR04MB3253;H:DB6PR04MB2999.eurprd04.prod.outlook.com;FPR:;SPF:None;PTR:InfoNoRecords;A:1;MX:1;LANG:en; x-ms-office365-filtering-correlation-id: bd9e09d3-1280-4483-742c-08d4fba3e4e0 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(300000500095)(300135000095)(300000501095)(300135300095)(300000502095)(300135100095)(22001)(2017030254152)(300000503095)(300135400095)(48565401081)(2017052603199)(201703131423075)(201703031133081)(201702281549075)(300000504095)(300135200095)(300000505095)(300135600095)(300000506095)(300135500095);SRVR:DB6PR04MB3253; x-ms-traffictypediagnostic: DB6PR04MB3253: x-exchange-antispam-report-test: UriScan:(275809806118684); x-microsoft-antispam-prvs: x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(100000700101)(100105000095)(100000701101)(100105300095)(100000702101)(100105100095)(6040450)(2401047)(8121501046)(5005006)(100000703101)(100105400095)(93006095)(93001095)(10201501046)(3002001)(6055026)(6041248)(201703131423075)(201702281528075)(201703061421075)(201703061406153)(20161123562025)(20161123558100)(20161123564025)(20161123560025)(20161123555025)(6072148)(201708071742011)(100000704101)(100105200095)(100000705101)(100105500095);SRVR:DB6PR04MB3253;BCL:0;PCL:0;RULEID:(100000800101)(100110000095)(100000801101)(100110300095)(100000802101)(100110100095)(100000803101)(100110400095)(100000804101)(100110200095)(100000805101)(100110500095);SRVR:DB6PR04MB3253; x-forefront-prvs: 0430FA5CB7 spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 X-OriginatorOrg: nxp.com X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Sep 2017 19:07:50.3144 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 686ea1d3-bc2b-4c6f-a92c-d99c5c301635 X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB6PR04MB3253 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by nfs id v8EJ81N1027055 Content-Length: 6981 Lines: 168 On 9/14/2017 10:00 AM, Catalin Marinas wrote: > On Thu, Aug 24, 2017 at 04:37:51PM -0400, Roy Pledge wrote: >> diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c >> index ff8998f..e31c843 100644 >> --- a/drivers/soc/fsl/qbman/bman.c >> +++ b/drivers/soc/fsl/qbman/bman.c >> @@ -154,7 +154,7 @@ struct bm_mc { >> }; >> >> struct bm_addr { >> - void __iomem *ce; /* cache-enabled */ >> + void *ce; /* cache-enabled */ >> void __iomem *ci; /* cache-inhibited */ >> }; > > You dropped __iomem from ce, which is fine since it is now set via > memremap. However, I haven't seen (at least not in this patch), a change > to bm_ce_in() which still uses __raw_readl(). > > (it may be worth checking this code with sparse, it may warn about this) Thanks, you're correct I missed this. I will fix this (and the qman version) and run sparse. > >> diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c >> index 39b39c8..bb03503 100644 >> --- a/drivers/soc/fsl/qbman/bman_portal.c >> +++ b/drivers/soc/fsl/qbman/bman_portal.c >> @@ -91,7 +91,6 @@ static int bman_portal_probe(struct platform_device *pdev) >> struct device_node *node = dev->of_node; >> struct bm_portal_config *pcfg; >> struct resource *addr_phys[2]; >> - void __iomem *va; >> int irq, cpu; >> >> pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL); >> @@ -123,23 +122,34 @@ static int bman_portal_probe(struct platform_device *pdev) >> } >> pcfg->irq = irq; >> >> - va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0); >> - if (!va) { >> - dev_err(dev, "ioremap::CE failed\n"); >> + /* >> + * TODO: Ultimately we would like to use a cacheable/non-shareable >> + * (coherent) mapping for the portal on both architectures but that >> + * isn't currently available in the kernel. Because of HW differences >> + * PPC needs to be mapped cacheable while ARM SoCs will work with non >> + * cacheable mappings >> + */ > > This comment mentions "cacheable/non-shareable (coherent)". Was this > meant for ARM platforms? Because non-shareable is not coherent, nor is > this combination guaranteed to work with different CPUs and > interconnects. My wording is poor I should have been clearer that non-shareable == non-coherent. I will fix this. We do understand that cacheable/non shareable isn't supported on all CPU/interconnect combinations but we have verified with ARM that for the CPU/interconnects we have integrated QBMan on our use is OK. The note is here to try to explain why the mapping is different right now. Once we get the basic QBMan support integrated for ARM we do plan to try to have patches integrated that enable the cacheable mapping as it gives a significant performance boost. This is a step 2 as we understand the topic is complex and a little controversial so I think treating it as an independent change will be easier than mixing it with the less interesting changes in this patchset. > >> +#ifdef CONFIG_PPC >> + /* PPC requires a cacheable/non-coherent mapping of the portal */ >> + pcfg->addr_virt_ce = memremap(addr_phys[0]->start, >> + resource_size(addr_phys[0]), MEMREMAP_WB); >> +#else >> + /* ARM can use a write combine mapping. */ >> + pcfg->addr_virt_ce = memremap(addr_phys[0]->start, >> + resource_size(addr_phys[0]), MEMREMAP_WC); >> +#endif > > Nitpick: you could define something like QBMAN_MAP_ATTR to be different > between PPC and the rest and just keep a single memremap() call. I will change this - it will be a little more compact. > > One may complain that "ce" is no longer "cache enabled" but I'm > personally fine to keep the same name for historical reasons. Cache Enabled is also how the 'data sheet' for the processor describes the region and I think it is useful to keep it aligned so that anyone looking at the manual and the code can easily correlate the ter > >> diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h >> index 81a9a5e..0a1d573 100644 >> --- a/drivers/soc/fsl/qbman/dpaa_sys.h >> +++ b/drivers/soc/fsl/qbman/dpaa_sys.h >> @@ -51,12 +51,12 @@ >> >> static inline void dpaa_flush(void *p) >> { >> + /* >> + * Only PPC needs to flush the cache currently - on ARM the mapping >> + * is non cacheable >> + */ >> #ifdef CONFIG_PPC >> flush_dcache_range((unsigned long)p, (unsigned long)p+64); >> -#elif defined(CONFIG_ARM) >> - __cpuc_flush_dcache_area(p, 64); >> -#elif defined(CONFIG_ARM64) >> - __flush_dcache_area(p, 64); >> #endif >> } > > Dropping the private API cache maintenance is fine and the memory is WC > now for ARM (mapping to Normal NonCacheable). However, do you require > any barriers here? Normal NC doesn't guarantee any ordering. The barrier is done in the code where the command is formed. We follow this pattern a) Zero the command cache line (the device never reacts to a 0 command verb so a cast out of this will have no effect) b) Fill in everything in the command except the command verb (byte 0) c) Execute a memory barrier d) Set the command verb (byte 0) e) Flush the command If a castout happens between d) and e) doesn't matter since it was about to be flushed anyway . Any castout before d) will not cause HW to process the command because verb is still 0. The barrier at c) prevents reordering so the HW cannot see the verb set before the command is formed. > >> diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c >> index cbacdf4..41fe33a 100644 >> --- a/drivers/soc/fsl/qbman/qman_portal.c >> +++ b/drivers/soc/fsl/qbman/qman_portal.c >> @@ -224,7 +224,6 @@ static int qman_portal_probe(struct platform_device *pdev) >> struct device_node *node = dev->of_node; >> struct qm_portal_config *pcfg; >> struct resource *addr_phys[2]; >> - void __iomem *va; >> int irq, cpu, err; >> u32 val; >> >> @@ -262,23 +261,34 @@ static int qman_portal_probe(struct platform_device *pdev) >> } >> pcfg->irq = irq; >> >> - va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0); >> - if (!va) { >> - dev_err(dev, "ioremap::CE failed\n"); >> + /* >> + * TODO: Ultimately we would like to use a cacheable/non-shareable >> + * (coherent) mapping for the portal on both architectures but that >> + * isn't currently available in the kernel. Because of HW differences >> + * PPC needs to be mapped cacheable while ARM SoCs will work with non >> + * cacheable mappings >> + */ > > Same comment as above non non-shareable. > >> +#ifdef CONFIG_PPC >> + /* PPC requires a cacheable mapping of the portal */ >> + pcfg->addr_virt_ce = memremap(addr_phys[0]->start, >> + resource_size(addr_phys[0]), MEMREMAP_WB); >> +#else >> + /* ARM can use write combine mapping for the cacheable area */ >> + pcfg->addr_virt_ce = memremap(addr_phys[0]->start, >> + resource_size(addr_phys[0]), MEMREMAP_WT); >> +#endif > > Same nitpick: a single memremap() call. >