Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753158AbdHDOgm (ORCPT ); Fri, 4 Aug 2017 10:36:42 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:55973 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752220AbdHDOgj (ORCPT ); Fri, 4 Aug 2017 10:36:39 -0400 Date: Fri, 4 Aug 2017 16:36:28 +0200 From: Peter Zijlstra To: Alexey Budankov Cc: Ingo Molnar , Arnaldo Carvalho de Melo , Alexander Shishkin , Andi Kleen , Kan Liang , Dmitri Prokhorov , Valery Cherepennikov , Mark Rutland , Stephane Eranian , David Carrillo-Cisneros , linux-kernel Subject: Re: [PATCH v6 1/3] perf/core: use rb trees for pinned/flexible groups Message-ID: <20170804143628.34c2xqxl2e6k2arj@hirez.programming.kicks-ass.net> References: <96c7776f-1f17-a39e-23e9-658596216d6b@linux.intel.com> <20170803130002.oatczvnaalplrsep@hirez.programming.kicks-ass.net> <86cbe0b0-a1ec-4d5f-addc-87bccf2e97d7@linux.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <86cbe0b0-a1ec-4d5f-addc-87bccf2e97d7@linux.intel.com> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3646 Lines: 152 On Thu, Aug 03, 2017 at 11:30:09PM +0300, Alexey Budankov wrote: > On 03.08.2017 16:00, Peter Zijlstra wrote: > > On Wed, Aug 02, 2017 at 11:13:54AM +0300, Alexey Budankov wrote: > >> +/* > >> + * Find group list by a cpu key and rotate it. > >> + */ > >> +static void > >> +perf_event_groups_rotate(struct rb_root *groups, int cpu) > >> +{ > >> + struct rb_node *node; > >> + struct perf_event *node_event; > >> + > >> + node = groups->rb_node; > >> + > >> + while (node) { > >> + node_event = container_of(node, > >> + struct perf_event, group_node); > >> + > >> + if (cpu < node_event->cpu) { > >> + node = node->rb_left; > >> + } else if (cpu > node_event->cpu) { > >> + node = node->rb_right; > >> + } else { > >> + list_rotate_left(&node_event->group_list); > >> + break; > >> + } > >> + } > >> +} > > > > Ah, you worry about how to rotate inside a tree? > > Exactly. > > > > > You can do that by adding (run)time based ordering, and you'll end up > > with a runtime based scheduler. > > Do you mean replacing a CPU indexed rb_tree of lists with > an CPU indexed rb_tree of counter indexed rb_trees? No, single tree, just more complicated ordering rules. > > A trivial variant keeps a simple counter per tree that is incremented > > for each rotation. That should end up with the events ordered exactly > > like the list. And if you have that comparator like above, expressing > > that additional ordering becomes simple ;-) > > > > Something like: > > > > struct group { > > u64 vtime; > > rb_tree tree; > > }; > > > > bool event_less(left, right) > > { > > if (left->cpu < right->cpu) > > return true; > > > > if (left->cpu > right_cpu) > > return false; > > > > if (left->vtime < right->vtime) > > return true; > > > > return false; > > } > > > > insert_group(group, event, tail) > > { > > if (tail) > > event->vtime = ++group->vtime; > > > > tree_insert(&group->root, event); > > } > > > > Then every time you use insert_group(.tail=1) it goes to the end of that > > CPU's 'list'. > > > > Could you elaborate more on how to implement rotation? Its almost all there, but let me write a complete replacement for your perf_event_group_rotate() above. /* find the leftmost event matching @cpu */ /* XXX not sure how to best parametrise a subtree search, */ /* again, C sucks... */ struct perf_event *__group_find_cpu(group, cpu) { struct rb_node *node = group->tree.rb_node; struct perf_event *event, *match = NULL; while (node) { event = container_of(node, struct perf_event, group_node); if (cpu > event->cpu) { node = node->rb_right; } else if (cpu < event->cpu) { node = node->rb_left; } else { /* * subtree match, try left subtree for a * 'smaller' match. */ match = event; node = node->rb_left; } } return match; } void perf_event_group_rotate(group, int cpu) { struct perf_event *event = __group_find_cpu(cpu); if (!event) return; tree_delete(&group->tree, event); insert_group(group, event, 1); } So we have a tree ordered by {cpu,vtime} and what we do is find the leftmost {cpu} entry, that is the smallest vtime entry for that cpu. We then take it out and re-insert it with a vtime number larger than any other, which places it as the rightmost entry for that cpu. So given: {1,1} / \ {0,5} {1,2} / \ \ {0,1} {0,6} {1,4} __group_find_cpu(.cpu=1) will return {1,1} as being the leftmost entry with cpu=1. We'll then remove it, update its vtime to 7 and re-insert. resulting in something like: {1,2} / \ {0,5} {1,4} / \ \ {0,1} {0,6} {1,7}