Subject: Re: [PATCH v4 5/6] iommu/iova: Extend rbtree node caching
From: Tomasz Nowicki
To: Robin Murphy, joro@8bytes.org
Cc: iommu@lists.linux-foundation.org, thunder.leizhen@huawei.com, nwatters@codeaurora.org, tomasz.nowicki@caviumnetworks.com, linux-kernel@vger.kernel.org
Date: Wed, 20 Sep 2017 14:59:09 +0200
References: <223f0fddac0dcefc43ad81594979875967745c58.1505829018.git.robin.murphy@arm.com>
In-Reply-To: <223f0fddac0dcefc43ad81594979875967745c58.1505829018.git.robin.murphy@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Robin,

On 19.09.2017 18:31, Robin Murphy wrote:
> The cached node mechanism provides a significant performance benefit for
> allocations using a 32-bit DMA mask, but in the case of non-PCI devices
> or where the 32-bit space is full, the loss of this benefit can be
> significant - on large systems there can be many thousands of entries in
> the tree, such that walking all the way down to find free space every
> time becomes increasingly awful.
>
> Maintain a similar cached node for the whole IOVA space as a superset of
> the 32-bit space so that performance can remain much more consistent.
>
> Inspired by work by Zhen Lei.
>
> Tested-by: Ard Biesheuvel
> Tested-by: Zhen Lei
> Tested-by: Nate Watterson
> Signed-off-by: Robin Murphy
> ---
>
> v4:
>   - Adjust to simplified __get_cached_rbnode() behaviour
>   - Cosmetic tweaks
>
>  drivers/iommu/iova.c | 43 +++++++++++++++++++++----------------------
>  include/linux/iova.h |  3 ++-
>  2 files changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index c93a6c46bcb1..a125a5786dbf 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -51,6 +51,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>
>  	spin_lock_init(&iovad->iova_rbtree_lock);
>  	iovad->rbroot = RB_ROOT;
> +	iovad->cached_node = NULL;
>  	iovad->cached32_node = NULL;
>  	iovad->granule = granule;
>  	iovad->start_pfn = start_pfn;
> @@ -119,39 +120,38 @@ __get_cached_rbnode(struct iova_domain *iovad, unsigned long limit_pfn)
>  	if (limit_pfn <= iovad->dma_32bit_pfn && iovad->cached32_node)
>  		return iovad->cached32_node;
>
> +	if (iovad->cached_node)
> +		return iovad->cached_node;
> +
>  	return &iovad->anchor.node;
>  }
>
>  static void
> -__cached_rbnode_insert_update(struct iova_domain *iovad,
> -	unsigned long limit_pfn, struct iova *new)
> +__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
>  {
> -	if (limit_pfn != iovad->dma_32bit_pfn)
> -		return;
> -	iovad->cached32_node = &new->node;
> +	if (new->pfn_hi < iovad->dma_32bit_pfn)
> +		iovad->cached32_node = &new->node;
> +	else
> +		iovad->cached_node = &new->node;
>  }
>
>  static void
>  __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>  {
>  	struct iova *cached_iova;
> -	struct rb_node *curr;
> +	struct rb_node **curr;
>
> -	if (!iovad->cached32_node)
> +	if (free->pfn_hi < iovad->dma_32bit_pfn)
> +		curr = &iovad->cached32_node;
> +	else
> +		curr = &iovad->cached_node;
> +
> +	if (!*curr)
>  		return;
> -	curr = iovad->cached32_node;
> -	cached_iova = rb_entry(curr, struct iova, node);
>
> -	if (free->pfn_lo >= cached_iova->pfn_lo) {
> -		struct rb_node *node = rb_next(&free->node);
> -		struct iova *iova = rb_entry(node, struct iova, node);
> -
> -		/* only cache if it's below 32bit pfn */
> -		if (node && iova->pfn_lo < iovad->dma_32bit_pfn)
> -			iovad->cached32_node = node;
> -		else
> -			iovad->cached32_node = NULL;
> -	}
> +	cached_iova = rb_entry(*curr, struct iova, node);
> +	if (free->pfn_lo >= cached_iova->pfn_lo)
> +		*curr = rb_next(&free->node);

IMO, we may incorrectly update iovad->cached32_node here.

                          32-bit boundary
    --------     --------        |        --------     ---------
    |      |     |      |        |        |      |     |       |
----  IOVA0 -----  IOVA1 -----------------  IOVA3 ----- anchor |
    |      |     |      |        |        |      |     |       |
    --------     --------        |        --------     ---------

If we free the last IOVA in the 32-bit space (IOVA1), we will update
iovad->cached32_node to IOVA3, which lies above the 32-bit boundary.

Thanks,
Tomasz
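
The scenario in the diagram can be reproduced with a small user-space sketch. It is only an illustration under stated assumptions, not kernel code: a sorted singly linked list stands in for the rbtree (so ->next plays the role of rb_next()), the pfn values are made up, the 32-bit cache is assumed to point at IOVA0 before the free, and the helper names cached_delete_update() and cached32_crosses_boundary() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct iova. A sorted singly linked list
 * replaces the rbtree, so ->next plays the role of rb_next(). */
struct iova {
	unsigned long pfn_lo;
	unsigned long pfn_hi;
	struct iova *next;
};

/* Only the fields needed for the cache update (assumed layout). */
struct iova_domain {
	unsigned long dma_32bit_pfn;
	struct iova *cached_node;
	struct iova *cached32_node;
};

/* Mirrors the delete-side update from the quoted hunk: pick the cache
 * by the freed node's pfn_hi, then advance it to the next node. */
static void cached_delete_update(struct iova_domain *iovad, struct iova *free)
{
	struct iova **curr;

	if (free->pfn_hi < iovad->dma_32bit_pfn)
		curr = &iovad->cached32_node;
	else
		curr = &iovad->cached_node;

	if (!*curr)
		return;

	/* No check that the successor is still below the 32-bit boundary. */
	if (free->pfn_lo >= (*curr)->pfn_lo)
		*curr = free->next;
}

/* Builds IOVA0 -> IOVA1 -> IOVA3 -> anchor, frees IOVA1, and returns 1
 * if cached32_node then refers to a node at or above the boundary. */
static int cached32_crosses_boundary(void)
{
	struct iova anchor = { .pfn_lo = ~0UL, .pfn_hi = ~0UL, .next = NULL };
	struct iova iova3 = { .pfn_lo = 0x100010, .pfn_hi = 0x10001f, .next = &anchor };
	struct iova iova1 = { .pfn_lo = 0x20, .pfn_hi = 0x2f, .next = &iova3 };
	struct iova iova0 = { .pfn_lo = 0x10, .pfn_hi = 0x1f, .next = &iova1 };
	struct iova_domain iovad = {
		.dma_32bit_pfn = 0x100000,	/* made-up boundary pfn */
		.cached_node = &iova3,
		.cached32_node = &iova0,	/* assumed state before the free */
	};

	assert(iova1.pfn_hi < iovad.dma_32bit_pfn);
	cached_delete_update(&iovad, &iova1);

	return iovad.cached32_node == &iova3 &&
	       iovad.cached32_node->pfn_lo >= iovad.dma_32bit_pfn;
}
```

With this layout cached32_crosses_boundary() returns 1: after IOVA1 is freed, the 32-bit cache refers to IOVA3. Whether that is actually harmful depends on how the allocator consumes the cached node, which the sketch does not model.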