2012-05-31 23:22:14

by Daniel Santos

[permalink] [raw]
Subject: [PATCH v2 1/3] [RFC] Generic Red-Black Trees

Well, I'm running on 3.4 patched with this generic rb tree
implementation in the fair scheduler (although these are slightly
cleaned up, so I still need to boot this version to make sure there's no
last minute snafus). So I finally settled on a mechanism for the
type-safety, even though it's a big macro (not my first choice), it's
the mechanism that works on all versions of gcc that still matter.

This is still preliminary. I have a user-space test program (that just
compiles the rbtree.h in userspace) and a test module that I want to
refine to do profiling & stress tests so that performance can be easily
checked on various architectures & compilers. So here's the outstanding
list:

* insert_near not finished (low priority)
* find_near not yet tested
* augmented not yet tested



Attachments:
0006-linux-bug.h-Add-BUILD_BUG_ON_NON_CONST-macros.patch (3.02 kB)

2012-05-31 23:30:15

by Daniel Santos

[permalink] [raw]
Subject: [PATCH v2 3/3] [RFC] Generic Red-Black Trees

I've put a few performance notes in comments. Specifically, I'm curious
if an inline function that expands to 128+ bytes like this should
possibly be wrapped in an __attribute__((flatten))
__attribute__((noinline)) function to force full expansion in one place
and then prevent it from getting inlined elsewhere (to keep the
generated code size down).


Attachments:
0008-Use-generic-rbtree-impl-in-fair-scheduler.patch (3.28 kB)

2012-06-01 01:08:36

by Daniel Santos

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] [RFC] Generic Red-Black Trees

Oh yeah, so the patch set I sent boots & runs fine. I've also noticed
that I have a few outdated doc-comments and a mistake in the
doc-comments for
the BUILD_BUG_ON_NON_CONST2() macro, under the "Normal Variables"
section, the text "global non-static const broken until 4.2 (-O1 broken
until 4.4)" contradicts the earlier statement that all const values
should never fail. This only applies to floats (but not doubles), which
we don't have to worry about in the kernel anyway. (incidentally, their
behavior under __builtin_constant_p() varies quite a bit until gcc 4.4,
where it's fine)

It was important for me to figure out what was and wasn't working with
__builtin_constant_p(), and in what versions of gcc because this
implementation relies heavily on the compiler optimizing out code based
upon values pass to it via a const struct pointer. Also, I was looking
for other possibilities, as I continued discovered broken (improperly
optimized) scenarios.

Daniel

2012-06-01 18:02:58

by Daniel Santos

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] [RFC] Generic Red-Black Trees (performance notes)

I've done a bit more analysis of the generated code of
kernel/sched/fair.c (patched) under three versions of gcc.

gcc-4.5.3
* 48 bytes larger
* one instance of a (const_flag & ENUM_VALUE) fails to compile out
* compare function not inlined

gcc-4.6.2
* same size
* ANDed constants compiled out
* compare function inlined

gcc-4.7.0:
* 64 bytes smaller - __enqueue_entity() reduced by 16 bytes (2 byte
smaller and saved some padding, so essentially the same), I'm not sure
where the other 48 bytes came from, I'm guessing just alignment changes
from using different registers (smaller opcodes).

So in summary, the results on 4.5 are worse, but not "horrible". The
problems are fixed in 4.6 and later.

There's still lots of other scenarios to test.

Daniel