2012-05-25 22:48:48

by Daniel Santos

Subject: Generic Red-Black Trees (status update)

For anybody that's keeping up with this, I've gone through multiple
iterations and tests with 9 different gcc versions and concluded that
the search, insert & remove cores need to be coded in rbtree.h, using
the traditional interface (i.e., passing struct rb_node & rb_root
pointers instead of pointers to your specific object types). The reason
is that gcc can't handle the cool fully-generic code until 4.6. In gcc
4.5.x, optimization breaks down completely, expanding the inline functions
into huge bloated monsters. Also, while I'm re-coding it all, I'm
adding find_near & insert_near, for more efficient insertion & retrieval
when you already have a node that should be close to the one you want
(which is often the case when inserting many objects at once).
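
To give a rough idea of what find_near means, here is a minimal sketch. It
assumes a plain (unbalanced) binary search tree with parent pointers rather
than the real rbtree code, and struct node, search_from and insert are
made-up names for illustration only. The idea is to climb from the hint only
until the key is known to lie inside the current subtree, then descend as
usual:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: a plain BST node with a parent pointer. */
struct node {
    int key;
    struct node *parent, *left, *right;
};

/* Ordinary top-down search starting at an arbitrary subtree root. */
static struct node *search_from(struct node *n, int key)
{
    while (n && n->key != key)
        n = key < n->key ? n->left : n->right;
    return n;
}

/* Unbalanced insert, present only to keep the sketch self-contained. */
static void insert(struct node **root, struct node *item)
{
    struct node *parent = NULL, **link = root;

    while (*link) {
        parent = *link;
        link = item->key < parent->key ? &parent->left : &parent->right;
    }
    item->parent = parent;
    item->left = item->right = NULL;
    *link = item;
}

/*
 * Search for key starting from a node that is expected to be close to it.
 * Climb until the parent's key bounds the target on the far side (so the
 * target, if present, must be inside the current subtree), or until the
 * root is reached, then search downwards from there.
 */
static struct node *find_near(struct node *hint, int key)
{
    struct node *cur = hint;

    assert(hint != NULL);
    while (cur->parent && key != hint->key) {
        struct node *p = cur->parent;

        if (key > hint->key && cur == p->left && key < p->key)
            break;  /* hint->key < key < p->key: stay in this subtree */
        if (key < hint->key && cur == p->right && key > p->key)
            break;  /* p->key < key < hint->key: stay in this subtree */
        cur = p;
    }
    return search_from(cur, key);
}
```

When the hint really is close, the climb stops after a few levels and the
descent is correspondingly short; in the worst case it degrades to a normal
search from the root. An insert_near would presumably follow the same climb
and then link the new node at the point where the descent ends.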

So after I'm done with this, I'll start on a new header file (grbtree.h
probably) using the "grb_" prefix for its functions that implements the
gcc 4.6.x+ fully generic & type safe interface, but using cute
pre-processor tricks for pre-4.6.x compatibility (basically, something
to consider using once gcc 4.6+ is more widely used).

Daniel


2012-05-25 23:02:43

by Andi Kleen

Subject: Re: Generic Red-Black Trees (status update)

Daniel Santos <[email protected]> writes:

> For anybody that's keeping up with this, I've gone through multiple
> iterations and tests with 9 different gcc versions and concluded that
> the search, insert & remove cores need to be coded in rbtree.h, using
> the traditional interface (i.e., passing struct rb_node & rb_root
> pointers instead of pointers to your specific object types). The reason
> is that gcc can't handle the cool fully-generic code until 4.6. In gcc
> 4.5.x, optimization completely breaks expanding the inline functions

Can you post details?

> into huge bloated monsters. Also, while I'm re-coding it all, I'm
> adding find_near & insert_near, for more efficient insertion & retrieval
> when you already have a node that should be close to the one you want
> (which is often the case when inserting many objects at once).
>
> So after I'm done with this, I'll start on a new header file (grbtree.h
> probably) using the "grb_" prefix for its functions that implements the
> gcc 4.6.x+ fully generic & type safe interface, but using cute
> pre-processor tricks for pre-4.6.x compatibility (basically, something
> to consider using once gcc 4.6+ is more widely used).

That doesn't make sense. Either it's used or it's not used,
but if it's available it should work with all compilers.

Otherwise you would end up with drivers or subsystems that
are compiler specific.

It's ok to be somewhat slower or bigger on older compilers.



-Andi

--
[email protected] -- Speaking for myself only

2012-05-26 01:12:40

by Daniel Santos

Subject: Re: Generic Red-Black Trees (status update)


> Daniel Santos <[email protected]> writes:
>
>> For anybody that's keeping up with this, I've gone through multiple
>> iterations and tests with 9 different gcc versions and concluded that
>> the search, insert & remove cores need to be coded in rbtree.h, using
>> the traditional interface (i.e., passing struct rb_node & rb_root
>> pointers instead of pointers to your specific object types). The reason
>> is that gcc can't handle the cool fully-generic code until 4.6. In gcc
>> 4.5.x, optimization completely breaks expanding the inline functions
> Can you post details?

Well, I suppose part of this is my own value judgment of what is a
"clean" implementation. By this, I mean balancing these requirements:

1.) minimal dependence on the pre-processor
2.) avoiding pre-processor-expanded code that will break debug
    information (backtraces)
3.) optimal encapsulation of the details of your rbtree in minimal
    source code (this is where you define the relationship between your
    container and contained objects, their types, keys, whether or not
    non-unique objects are allowed, etc.) -- preferably eliminating
    duplication of these details entirely
4.) offering a complete feature set in a single implementation (not
    multiple functions when various features are used)
5.) perfect optimization -- the generic function must be exactly as
    efficient as the hand-coded version
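
To make requirement 5 concrete, here is a rough sketch of the kind of code
it is about. Everything in it is an assumption for illustration: struct
rb_node is a stripped-down stand-in (no colour or parent), and generic_find,
cmp_int, struct thing and thing_find are hypothetical names, not the
interface being proposed. The untyped core takes offsets and a compare
callback; the hope is that, once it is inlined into the typed wrapper, the
compiler sees all of them as compile-time constants and emits the same code
as a hand-written search:

```c
#include <stddef.h>

/* Stand-in for the kernel's struct rb_node (colour and parent omitted). */
struct rb_node {
    struct rb_node *rb_left, *rb_right;
};

typedef int (*cmp_fn)(const void *a, const void *b);

/*
 * Untyped generic core. node_off and key_off are the offsets of the
 * embedded rb_node and of the key inside the containing object.
 * always_inline is a GCC/Clang attribute.
 */
static inline __attribute__((always_inline))
void *generic_find(struct rb_node *node, const void *key,
                   size_t node_off, size_t key_off, cmp_fn cmp)
{
    while (node) {
        char *obj = (char *)node - node_off;
        int c = cmp(key, obj + key_off);

        if (c == 0)
            return obj;
        node = c < 0 ? node->rb_left : node->rb_right;
    }
    return NULL;
}

/* Hypothetical user type with an embedded node and an integer key. */
struct thing {
    int id;
    struct rb_node node;
};

static inline int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Typed wrapper: every argument except root and id is a constant here. */
static struct thing *thing_find(struct rb_node *root, int id)
{
    return generic_find(root, &id, offsetof(struct thing, node),
                        offsetof(struct thing, id), cmp_int);
}
```

Whether the offsets and the callback really get folded away depends on the
compiler version, which is exactly the problem described below.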

So by those standards, the cleanest implementation I've come up with
uses a macro to define an anonymous interface struct something like this:

/* generic non-type-safe function */
static __always_inline void *__generic_func(void *obj);

/* macro to generate type-safe interface object (in practice, the real one
 * defines all the functions in the interface, but I'm keeping it simple for
 * brevity)
 */
#define INTERFACE_A(name, in_type, out_type)                        \
struct {                                                            \
        out_type *(*const func)(in_type *obj);                      \
} name = {                                                          \
        .func = (out_type *(*const)(in_type *obj))__generic_func,   \
}

/* usage looks like this: */
INTERFACE_A(solution_a, struct something, struct something_else);
struct something *s;
struct something_else *se;
se = solution_a.func(s);

Calling solution_a.func(s) optimizes perfectly in 4.6, while in 4.5 and
prior, the call through a struct-member function pointer is never inlined
and nothing passed to it is ever considered a compile-time constant.
Because of how the generic functions are implemented, this bloats the
code unacceptably (3x larger). The following alternative works prior to
4.6, but with different syntax:

/* IMO, this solution is uglier and will break backtraces. */
#define INTERFACE_B(name, in_type, out_type)                        \
static __always_inline out_type *name##_func(in_type *obj)          \
{                                                                   \
        return (out_type *)__generic_func(obj);                     \
}

/* now you call solution_b_func(s) instead of solution_a.func(s) */
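
For anyone who wants to try the INTERFACE_B style outside the kernel, here
is a self-contained sketch. The two struct definitions and the body of
generic_func are made up purely so that it compiles (the real generic core
would walk a tree), and __always_inline is spelled out as the underlying
GCC/Clang attribute:

```c
#include <stddef.h>

/* Hypothetical types standing in for a container and its contents. */
struct something_else {
    int value;
};

struct something {
    int id;
    struct something_else payload;
};

/* Stand-in for the untyped generic core; it just returns the payload. */
static inline __attribute__((always_inline)) void *generic_func(void *obj)
{
    return &((struct something *)obj)->payload;
}

/* Generates a typed wrapper named <name>_func around the untyped core. */
#define INTERFACE_B(name, in_type, out_type)                        \
static inline __attribute__((always_inline))                        \
out_type *name##_func(in_type *obj)                                 \
{                                                                   \
    return (out_type *)generic_func(obj);                           \
}

/* Defines solution_b_func(struct something *). */
INTERFACE_B(solution_b, struct something, struct something_else)
```

With that in place, solution_b_func(&s) returns a struct something_else *
and passing any other pointer type gets a compiler diagnostic. A side
benefit of this style is that the wrapper calls generic_func directly; the
INTERFACE_A style calls it through a function pointer cast to a different
type, which ISO C formally leaves undefined even though gcc handles it.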

>> into huge bloated monsters. Also, while I'm re-coding it all, I'm
>> adding find_near & insert_near, for more efficient insertion & retrieval
>> when you already have a node that should be close to the one you want
>> (which is often the case when inserting many objects at once).
>>
>> So after I'm done with this, I'll start on a new header file (grbtree.h
>> probably) using the "grb_" prefix for its functions that implements the
>> gcc 4.6.x+ fully generic & type safe interface, but using cute
>> pre-processor tricks for pre-4.6.x compatibility (basically, something
>> to consider using once gcc 4.6+ is more widely used).
> That doesn't make sense. Either it's used or it's not used,
> but if it's available it should work with all compilers.
>
> Otherwise you would end up with drivers or subsystems that
> are compiler specific.
>
> It's ok to be somewhat slower or bigger on older compilers.

You have a good point here, although I'm not sure that a 3x larger
function is an acceptable performance hit for a compiler as recent as
4.5. Perhaps it's best to just implement it using the INTERFACE_B style
above, accept the minor loss of backtrace-ability and pre-processor
ugliness and get on with it. There's no advantage to having two
competing syntaxes for usage. I'll post the full details with patch
tomorrow.

Daniel