Date: 9 Jun 2015 05:43:18 -0400
Message-ID: <20150609094318.12813.qmail@ns.horizon.com>
From: "George Spelvin"
To: linux@horizon.com, tglx@linutronix.de
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, viresh.kumar@linaro.org
Subject: Re: [patch 2/7] timer: Remove FIFO guarantee

Thomas Gleixner wrote:
> The only reason is performance. The wheel has O(1) insertion and
> deletion time while heaps and trees usually have O(log(n)).

Pairing heaps also have O(1) insertion, and rescheduling can be quite
lightweight.  The issue I was worried about is the need for an
additional pointer per timer (a heap node wants left, right and parent
links) and the associated cache line touch when modifying the heap.

> Timer wheel timers are usually timeouts and 99% of them are canceled
> before expiry. Networking is probably the heaviest use case followed
> by disk I/O.

And that rewards very lazy structures, ones that postpone work in the
hope it will become unnecessary.  But a wheel has real problems with
non-tick-based timers, which, as you note, covers both hrtimers and
NOHZ.

Now, two things to note about pairing heaps (and many related heap
structures such as Fibonacci heaps, but pairing heaps have a
particularly good constant factor in practice) are:

1) It is O(1) to "meld" two heaps into one.
2) The meld is O(1) because it's literally "stick it on a linked
   list"; the child pointers stay NULL until the minimum (which is
   kept at the root) is deleted and a new minimum has to be found.

Combining these two facts, we could do something wheel-like: divide
the future into a number of subheaps, link future events into the
correct subheap, and meld the subheaps into the main heap only when
necessary.  Hopefully, by the time a meld is necessary, the subheap
will have been thinned out by timers being canceled.

On reflection, it wouldn't even be necessary to have explicit code to
handle the melding.  Just allocate an array of "internal use" nodes
which are easy to find, and place them in the main heap like normal
timers.  (Each has a timeout which is guaranteed to be earliest in
its subheap, so the subheap will never need sorting.)  When one gets
to the root, the internal node gets recycled (because we set up its
callback function to do that!) and the subheap gets sorted and merged
into the main heap automatically.

Alternatively, the internal-use node could be made smaller (e.g. an
hlist head rather than a full heap node) at the expense of needing
special-case code to handle it.  Have to think on this.

Heapifying the sublist is O(n) work, which is the same as overflowing
a bucket in the current wheel, but it means that subsequent deletions
will be more expensive.  Need to think on this, too.
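
To make the O(1) claims concrete, here's a minimal sketch of the three
pairing-heap primitives (purely illustrative C, not kernel code; all
names are invented, and the parent/back pointer a timer would need for
arbitrary cancellation is left out for brevity):

/*
 * Illustrative pairing-heap sketch.  Each node keeps its children on
 * a singly linked sibling list; "child" points to the first child.
 */
struct pheap_node {
        unsigned long expires;          /* key: absolute expiry time */
        struct pheap_node *child;       /* first child; NULL until a meld hangs something here */
        struct pheap_node *sibling;     /* next entry in the parent's child list */
};

/*
 * Meld two heaps in O(1): the root with the later expiry is pushed
 * onto the other root's child list.  This is the "stick it on a
 * linked list" step -- no traversal, no rebalancing.
 */
static struct pheap_node *pheap_meld(struct pheap_node *a,
                                     struct pheap_node *b)
{
        if (!a)
                return b;
        if (!b)
                return a;
        if (b->expires < a->expires) {
                struct pheap_node *tmp = a;
                a = b;
                b = tmp;
        }
        b->sibling = a->child;
        a->child = b;
        return a;
}

/* Insert is just a meld with a one-element heap: also O(1). */
static struct pheap_node *pheap_insert(struct pheap_node *root,
                                       struct pheap_node *n)
{
        n->child = NULL;
        n->sibling = NULL;
        return pheap_meld(root, n);
}

/*
 * Delete-min is where the postponed work happens: the old root's
 * children are melded back together to find the new minimum.
 */
static struct pheap_node *pheap_delete_min(struct pheap_node *root)
{
        struct pheap_node *merged = NULL;
        struct pheap_node *c = root->child;

        while (c) {
                struct pheap_node *a = c;
                struct pheap_node *b = a->sibling;

                c = b ? b->sibling : NULL;
                a->sibling = NULL;
                if (b)
                        b->sibling = NULL;
                merged = pheap_meld(merged, pheap_meld(a, b));
        }
        return merged;
}

pheap_delete_min() uses the simple single-pass melding for brevity;
the classic pairing heap does a left-to-right pairing pass followed by
a right-to-left meld, but either way that is the only place where the
postponed comparisons get paid for.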
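
And a rough sketch of the "internal use" sentinel idea, again purely
illustrative, reusing struct pheap_node from the sketch above and
hand-waving the re-arm step:

/*
 * Invented names; assumes the expiry path does a pheap_delete_min()
 * before running a timer's callback.
 */
struct timer_sketch {
        struct pheap_node node;
        void (*fn)(struct timer_sketch *);      /* run at expiry */
};

/*
 * One sentinel per future time slot.  Its expiry is the slot's start
 * time, so everything parked under it expires no earlier than it does
 * and hanging entries off its child list never violates heap order.
 */
struct slot_sentinel {
        struct timer_sketch timer;      /* must stay the first member */
};

/*
 * Park a far-future timer: O(1), no comparisons, and it works whether
 * or not the sentinel has already been melded into the main heap.
 */
static void slot_park(struct slot_sentinel *s, struct pheap_node *n)
{
        n->child = NULL;
        n->sibling = s->timer.node.child;
        s->timer.node.child = n;
}

/*
 * The sentinel's callback.  By the time it runs, pheap_delete_min()
 * has already melded the parked children into the main heap as a side
 * effect of removing the sentinel, so all that is left to do is
 * recycle the sentinel for a later, not-yet-covered slot.
 */
static void slot_expired(struct timer_sketch *t)
{
        /* timer is the first member, so a plain cast is enough here */
        struct slot_sentinel *s = (struct slot_sentinel *)t;

        /* re-arm s->timer for the next slot and pheap_insert() it
         * back into the main heap (omitted) */
        (void)s;
}

The attraction is that parking a far-future timer never touches the
main heap at all; the price, as noted at the top, is that canceling a
parked timer needs the extra back pointer.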