Date: 9 Jun 2015 05:43:18 -0400
Message-ID: <20150609094318.12813.qmail@ns.horizon.com>
From: "George Spelvin"
To: linux@horizon.com, tglx@linutronix.de
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, viresh.kumar@linaro.org
Subject: Re: [patch 2/7] timer: Remove FIFO guarantee

Thomas Gleixner wrote:
> The only reason is performance. The wheel has O(1) insertion and
> deletion time while heaps and trees usually have O(log(n)).

Pairing heaps also have O(1) insertion, and rescheduling can be quite
lightweight.  The issue I was worried about is the need for an
additional pointer per timer (a heap node wants left, right and parent
links) and the associated cache line touch when modifying the heap.

> Timer wheel timers are usually timeouts and 99% of them are canceled
> before expiry. Networking is probably the heaviest use case followed
> by disk I/O.

And that rewards very lazy structures, ones that postpone work in the
hope it will become unnecessary.  But a wheel has real problems with
non-tick-based timers, which, as you note, covers both hrtimers and
NOHZ.

Now, two things to note about pairing heaps (and many related heap
structures such as Fibonacci heaps, but pairing heaps have a
particularly good constant factor in practice) are:

1) It is O(1) to "meld" two heaps into one.
2) The meld is O(1) because it's literally "stick it on a linked
   list"; the child pointers stay NULL until the minimum (which is
   kept at the root) is deleted and a new minimum has to be found.

Combining these two facts, we could do something wheel-like: divide
the future into a number of subheaps, link future events into the
correct subheap, and meld the subheaps into the main heap only when
necessary.  Hopefully, by the time a meld is necessary, the subheap
will have been thinned out by timers being canceled.

On reflection, it wouldn't even be necessary to have explicit code to
handle the melding.  Just allocate an array of "internal use" nodes
which are easy to find, and place them in the main heap like normal
timers.  (Each has a timeout which is guaranteed to be earliest in
its subheap, so the subheap will never need sorting.)  When one gets
to the root, the internal node gets recycled (because we set up its
callback function to do that!) and the subheap gets sorted and merged
into the main heap automatically.

Alternatively, the internal-use node could be made smaller (e.g. an
hlist head rather than a full heap node) at the expense of needing
special-case code to handle it.  Have to think on this.

Heapifying the sublist is O(n) work, which is the same as overflowing
a bucket in the current wheel, but it means that subsequent deletions
will be more expensive.  Need to think on this, too.
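
To make the O(1) claims concrete, here's a minimal sketch of the three
pairing-heap primitives (purely illustrative C, not kernel code; all
names are invented, and the parent/back pointer a timer would need for
arbitrary cancellation is left out for brevity):

/*
 * Illustrative pairing-heap sketch.  Each node keeps its children on
 * a singly linked sibling list; "child" points to the first child.
 */
struct pheap_node {
        unsigned long expires;          /* key: absolute expiry time */
        struct pheap_node *child;       /* first child; NULL until a meld hangs something here */
        struct pheap_node *sibling;     /* next entry in the parent's child list */
};

/*
 * Meld two heaps in O(1): the root with the later expiry is pushed
 * onto the other root's child list.  This is the "stick it on a
 * linked list" step -- no traversal, no rebalancing.
 */
static struct pheap_node *pheap_meld(struct pheap_node *a,
                                     struct pheap_node *b)
{
        if (!a)
                return b;
        if (!b)
                return a;
        if (b->expires < a->expires) {
                struct pheap_node *tmp = a;
                a = b;
                b = tmp;
        }
        b->sibling = a->child;
        a->child = b;
        return a;
}

/* Insert is just a meld with a one-element heap: also O(1). */
static struct pheap_node *pheap_insert(struct pheap_node *root,
                                       struct pheap_node *n)
{
        n->child = NULL;
        n->sibling = NULL;
        return pheap_meld(root, n);
}

/*
 * Delete-min is where the postponed work happens: the old root's
 * children are melded back together to find the new minimum.
 */
static struct pheap_node *pheap_delete_min(struct pheap_node *root)
{
        struct pheap_node *merged = NULL;
        struct pheap_node *c = root->child;

        while (c) {
                struct pheap_node *a = c;
                struct pheap_node *b = a->sibling;

                c = b ? b->sibling : NULL;
                a->sibling = NULL;
                if (b)
                        b->sibling = NULL;
                merged = pheap_meld(merged, pheap_meld(a, b));
        }
        return merged;
}

pheap_delete_min() uses the simple single-pass melding for brevity;
the classic pairing heap does a left-to-right pairing pass followed by
a right-to-left meld, but either way that is the only place where the
postponed comparisons get paid for.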
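
And a rough sketch of the "internal use" sentinel idea, again purely
illustrative, reusing struct pheap_node from the sketch above and
hand-waving the re-arm step:

/*
 * Invented names; assumes the expiry path does a pheap_delete_min()
 * before running a timer's callback.
 */
struct timer_sketch {
        struct pheap_node node;
        void (*fn)(struct timer_sketch *);      /* run at expiry */
};

/*
 * One sentinel per future time slot.  Its expiry is the slot's start
 * time, so everything parked under it expires no earlier than it does
 * and hanging entries off its child list never violates heap order.
 */
struct slot_sentinel {
        struct timer_sketch timer;      /* must stay the first member */
};

/*
 * Park a far-future timer: O(1), no comparisons, and it works whether
 * or not the sentinel has already been melded into the main heap.
 */
static void slot_park(struct slot_sentinel *s, struct pheap_node *n)
{
        n->child = NULL;
        n->sibling = s->timer.node.child;
        s->timer.node.child = n;
}

/*
 * The sentinel's callback.  By the time it runs, pheap_delete_min()
 * has already melded the parked children into the main heap as a side
 * effect of removing the sentinel, so all that is left to do is
 * recycle the sentinel for a later, not-yet-covered slot.
 */
static void slot_expired(struct timer_sketch *t)
{
        /* timer is the first member, so a plain cast is enough here */
        struct slot_sentinel *s = (struct slot_sentinel *)t;

        /* re-arm s->timer for the next slot and pheap_insert() it
         * back into the main heap (omitted) */
        (void)s;
}

The attraction is that parking a far-future timer never touches the
main heap at all; the price, as noted at the top, is that canceling a
parked timer needs the extra back pointer.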