From: Alexey Budankov
Subject: Re: [PATCH v6 1/3] perf/core: use rb trees for pinned/flexible groups
To: Peter Zijlstra
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Alexander Shishkin, Andi Kleen,
 Kan Liang, Dmitri Prokhorov, Valery Cherepennikov, Mark Rutland,
 Stephane Eranian, David Carrillo-Cisneros, linux-kernel
References: <96c7776f-1f17-a39e-23e9-658596216d6b@linux.intel.com>
 <20170803130002.oatczvnaalplrsep@hirez.programming.kicks-ass.net>
 <86cbe0b0-a1ec-4d5f-addc-87bccf2e97d7@linux.intel.com>
 <20170804143628.34c2xqxl2e6k2arj@hirez.programming.kicks-ass.net>
Organization: Intel Corp.
Message-ID: <9d2e25c9-209c-f28a-d601-d3f1a71f032f@linux.intel.com>
Date: Mon, 7 Aug 2017 10:17:46 +0300
In-Reply-To: <20170804143628.34c2xqxl2e6k2arj@hirez.programming.kicks-ass.net>

On 04.08.2017 17:36, Peter Zijlstra wrote:
> On Thu, Aug 03, 2017 at 11:30:09PM +0300, Alexey Budankov wrote:
>> On 03.08.2017 16:00, Peter Zijlstra wrote:
>>> On Wed, Aug 02, 2017 at 11:13:54AM +0300, Alexey Budankov wrote:
>
>>>> +/*
>>>> + * Find group list by a cpu key and rotate it.
>>>> + */
>>>> +static void
>>>> +perf_event_groups_rotate(struct rb_root *groups, int cpu)
>>>> +{
>>>> +	struct rb_node *node;
>>>> +	struct perf_event *node_event;
>>>> +
>>>> +	node = groups->rb_node;
>>>> +
>>>> +	while (node) {
>>>> +		node_event = container_of(node,
>>>> +				struct perf_event, group_node);
>>>> +
>>>> +		if (cpu < node_event->cpu) {
>>>> +			node = node->rb_left;
>>>> +		} else if (cpu > node_event->cpu) {
>>>> +			node = node->rb_right;
>>>> +		} else {
>>>> +			list_rotate_left(&node_event->group_list);
>>>> +			break;
>>>> +		}
>>>> +	}
>>>> +}
>>>
>>> Ah, you worry about how to rotate inside a tree?
>>
>> Exactly.
>>
>>> You can do that by adding (run)time based ordering, and you'll end up
>>> with a runtime based scheduler.
>>
>> Do you mean replacing a CPU indexed rb_tree of lists with
>> a CPU indexed rb_tree of counter indexed rb_trees?
>
> No, single tree, just more complicated ordering rules.
>
>>> A trivial variant keeps a simple counter per tree that is incremented
>>> for each rotation. That should end up with the events ordered exactly
>>> like the list. And if you have that comparator like above, expressing
>>> that additional ordering becomes simple ;-)
>>>
>>> Something like:
>>>
>>> struct group {
>>>	u64 vtime;
>>>	rb_tree tree;
>>> };
>>>
>>> bool event_less(left, right)
>>> {
>>>	if (left->cpu < right->cpu)
>>>		return true;
>>>
>>>	if (left->cpu > right->cpu)
>>>		return false;
>>>
>>>	if (left->vtime < right->vtime)
>>>		return true;
>>>
>>>	return false;
>>> }
>>>
>>> insert_group(group, event, tail)
>>> {
>>>	if (tail)
>>>		event->vtime = ++group->vtime;
>>>
>>>	tree_insert(&group->tree, event);
>>> }
>>>
>>> Then every time you use insert_group(.tail=1) it goes to the end of that
>>> CPU's 'list'.
>>
>> Could you elaborate more on how to implement rotation?
>
> It's almost all there, but let me write a complete replacement for your
> perf_event_group_rotate() above.
>
> /* find the leftmost event matching @cpu */
> /* XXX not sure how to best parametrise a subtree search, */
> /* again, C sucks... */
> struct perf_event *__group_find_cpu(group, cpu)
> {
>	struct rb_node *node = group->tree.rb_node;
>	struct perf_event *event, *match = NULL;
>
>	while (node) {
>		event = container_of(node, struct perf_event, group_node);
>
>		if (cpu > event->cpu) {
>			node = node->rb_right;
>		} else if (cpu < event->cpu) {
>			node = node->rb_left;
>		} else {
>			/*
>			 * subtree match, try left subtree for a
>			 * 'smaller' match.
>			 */
>			match = event;
>			node = node->rb_left;
>		}
>	}
>
>	return match;
> }
>
> void perf_event_group_rotate(group, int cpu)
> {
>	struct perf_event *event = __group_find_cpu(group, cpu);
>
>	if (!event)
>		return;
>
>	tree_delete(&group->tree, event);
>	insert_group(group, event, 1);
> }
>
> So we have a tree ordered by {cpu,vtime} and what we do is find the
> leftmost {cpu} entry, that is the smallest vtime entry for that cpu. We
> then take it out and re-insert it with a vtime number larger than any
> other, which places it as the rightmost entry for that cpu.
>
> So given:
>
>          {1,1}
>         /     \
>     {0,5}     {1,2}
>     /   \         \
>  {0,1} {0,6}     {1,4}
>
> __group_find_cpu(.cpu=1) will return {1,1} as being the leftmost entry
> with cpu=1. We'll then remove it, update its vtime to 7 and re-insert,
> resulting in something like:
>
>          {1,2}
>         /     \
>     {0,5}     {1,4}
>     /   \         \
>  {0,1} {0,6}     {1,7}

Makes sense. The implementation becomes a bit simpler. The drawback is
that each rotation now removes and re-inserts a node in a potentially
big tree on the critical path, instead of updating four pointers as in
the tree-of-lists case.