Documentation/DocBook/kernel-api.tmpl | 4 +
include/linux/bug.h | 109 +++-
include/linux/compiler-gcc.h | 3 +
include/linux/compiler-gcc3.h | 8 +-
include/linux/compiler-gcc4.h | 31 +-
include/linux/compiler.h | 7 +-
include/linux/rbtree.h | 1044 ++++++++++++++++++++++++-
kernel/sched/fair.c | 82 +--
8 files changed, 1200 insertions(+), 88 deletions(-)
Summary
=======
This patch set improves on Andrea Arcangeli's original Red-Black Tree
implementation by adding generic search and insert functions with
complete support for:
o leftmost - keeps a pointer to the leftmost (lowest value) node cached
in your container struct
o rightmost - ditto for rightmost (greatest value)
o count - optionally update an count variable when you perform inserts
or deletes
o unique or non-unique keys
o find and insert "near" functions - when you already have a node that
is likely near another one you want to search for
o augmented / interval tree support
o type-safe wrapper interface available via pre-processor macro
Outstanding Issues
==================
o insert_near yet not tested
o find_near needs retesting
o find_{first,last,next,prev} untested
o augmented not yet tested
o doc comments still incomplete (but closer)
o need complete test module to perform tests for:
- correctness - test all functions & scenarios and verify results are
correct
- performance - compare generic functions with with hand-coded
versions to verify performance differences (or lack there of) on
various compilers and archs
o add CONFIG_RBTREE_DEBUG option (or similar) to check for usage
problems at run-time.
o need a new home for the general-purpose macros: IFF_EMPTY(), IS_EMPTY()
and OPT_OFFSETOF() (possibly in stddef.h?)
Theory of Operation
===================
Historically, genericity in C meant function pointers, the overhead of a
function call and the inability of the compiler to optimize code across
the function call boundary. GCC has been getting better and better at
optimization and determining when a value is a compile-time constant and
compiling it out. As of gcc 4.6, it has finally reached a point where
it's possible to have generic search & insert cores that optimize
exactly as well as if they were hand-coded. (see also gcc man page:
-findirect-inlining)
This implementation actually consists of two layers written on top of the
existing rbtree implementation.
Layer 1: Type-Specific (But Not Type-Safe)
------------------------------------------
The first layer consists of enum rb_flags, struct rb_relationship and
some generic inline functions(see patch for doc comments).
enum rb_flags {
RB_HAS_LEFTMOST = 0x00000001,
RB_HAS_RIGHTMOST = 0x00000002,
RB_HAS_COUNT = 0x00000004,
RB_UNIQUE_KEYS = 0x00000008,
RB_INSERT_REPLACES = 0x00000010,
RB_IS_AUGMENTED = 0x00000020,
};
struct rb_relationship {
ssize_t root_offset;
ssize_t left_offset;
ssize_t right_offset;
ssize_t count_offset;
ssize_t node_offset;
ssize_t key_offset;
int flags;
const rb_compare_f compare;
const rb_augment_f augment;
};
/* these function for use on all trees */
struct rb_node *rb_find(
struct rb_root *root,
const void *key,
const struct rb_relationship *rel);
struct rb_node *rb_find_near(
struct rb_node *from,
const void *key,
const struct rb_relationship *rel);
struct rb_node *rb_insert(
struct rb_root *root,
struct rb_node *node,
const struct rb_relationship *rel);
struct rb_node *rb_insert_near(
struct rb_root *root,
struct rb_node *start,
struct rb_node *node,
const struct rb_relationship *rel);
void rb_remove( struct rb_root *root,
struct rb_node *node,
const struct rb_relationship *rel);
/* these function for use on trees with non-unique keys */
struct rb_node *rb_find_first(
struct rb_root *root,
const void *key,
const struct rb_relationship *rel);
struct rb_node *rb_find_last(
struct rb_root *root,
const void *key,
const struct rb_relationship *rel);
struct rb_node *rb_find_next(
const struct rb_node *node,
const struct rb_relationship *rel)
struct rb_node *rb_find_prev(
const struct rb_node *node,
const struct rb_relationship *rel)
Using this layer involves initializing a const struct rb_relationship
variable with compile-time constant values and feeding its "address" to
the generic inline functions. The trick being, that (when gcc behaves
properly) it never creates a struct rb_relationship variable, stores an
initializer in the data section of the object file or passes a struct
rb_relationship pointer. Instead, gcc "optimizes out" out the struct,
and uses the compile-time constant values to dictate how the inline
functions will expand.
Thus, this structure can be thought of both as a database's DDL (data
definition language), defining the relationship between two entities and
the template parameters to a C++ templatized function that controls how
the template function is instantiated. This creates type-specific
functions, although type-safety is still not achieved (e.g., you can
pass a pointer to any rb_node you like).
To simplify usage, you can initialize your struct rb_relationship
variable with the RB_RELATIONSHIP macro, feeding it your types, member
names and flags and it will calculate the offsets for you. See doc
comments in patch for examples of using this layer (either with or
without the RB_RELATIONSHIP macro).
Layer 2: Type-Safety
--------------------
In order to achieve type-safety of a generic interface in C, we must
delve deep into the darkened Swamps of The Preprocessor and confront the
Prince of Darkness himself: Big Ugly Macro. To be fair, there is an
alternative solution (discussed in History & Design Goals), the
so-called "x-macro" or "supermacro" where you #define some pre-processor
values and include an unguarded header file. With 17 parameters, I
choose this solution for its ease of use and brevity, but it's an area
worth debate.
So this second layer allows you to use a single macro to define your
relationship as well as type-safe wrapper functions all in one go.
RB_DEFINE_INTERFACE(
prefix,
cont_type, root, left, right, count,
obj_type, node, key,
flags, compare, augment,
find_mod, insert_mod, find_near_mod, insert_near_mod)
To avoid needing multiple versions of the macro, we use a paradigm where
optional values can be left empty. (See RB_DEFINE_INTERFACE doc comments
for details.) Thus, if your container doesn't need to know leftmost,
you leave the parameter empty. Here's a quick example:
struct container {
struct rb_root root;
struct rb_node *leftmost;
unsigned long count;
};
struct object {
struct rb_node node;
long key;
};
static inline long compare_long(long *a, long *b)
{
return *a - *b;
}
RB_DEFINE_INTERFACE(
my_objects,
struct container, root, leftmost, /* no rightmost */, count,
struct object, node, key,
RB_UNIQUE_KEYS | RB_INSERT_REPLACES, compare_long,
/* no augment */,
,,,)
This will do some type-checking, create the struct rb_relationship and
the following static __always_inline wrapper functions. (Note that
"my_objects" is the prefix used in the example above. It will be
whatever you pass as the first parameter to the RB_DEFINE_INTERFACE
macro.)
struct object *my_objects_find(
struct container *cont,
const typeof(((struct object *)0)->key) *_key);
struct object *my_objects_insert(
struct container *cont,
struct object *obj);
struct object *my_objects_find_near(
struct object *near,
const typeof(((struct object *)0)->key) *_key);
struct object *my_objects_insert_near(
struct container *cont,
struct object *near,
struct object *obj);
void my_objects_remove(struct container *cont, struct object *obj);
struct object *my_objects_find_first(
struct container *cont,
const typeof(((struct object *)0)->key) *_key);
struct object *my_objects_find_last(
struct container *cont,
const typeof(((struct object *)0)->key) *_key);
struct object *my_objects_find_next(const struct object *obj);
struct object *my_objects_find_last(const struct object *obj);
struct object *my_objects_next(const struct object *obj);
struct object *my_objects_prev(const struct object *obj);
struct object *my_objects_first(struct container *cont);
struct object *my_objects_last(struct container *cont);
Each of these are each declared static __always_inline. However, you can
change the modifiers for the first four (find, insert, find_near and
insert_near) by populate any of the last 4 parameters with the function
modifiers of the respective function (when empty, they default to static
__always_inline).
Not only does this layer give you type-safety, it removes almost all of
the implementation details of the rbtree from the code using it, thus
making it easier to replace the underlying algorithm at some later
date.
History & Design Goals
======================
I've been through many iterations of various techniques searching for
the perfect "clean" implementation and finally settled on having a huge
macro expand to wrapper functions after exhausting all other
alternatives. The trick is that what one person considers a "clean"
implementation is a bit of a value judgment. So by "clean", I mean
balancing these requirements:
1.) minimal dependence on pre-processor
2.) avoiding pre-processor expanded code that will break debug
information (backtraces)
3.) optimal encapsulation of the details of your rbtree in minimal
source code (this is where you define the relationship between your
container and contained objects, their types, keys, rather or not
non-unique objects are allowed, etc.) -- preferably eliminating
duplication of these details entirely.
4.) offering a complete feature-set in a single implementation (not
multiple functions when various features are used)
5.) perfect optimization -- the generic function must be exactly as
efficient as the hand-coded version
By those standards, the "cleanest" implementation I had come up with
actually used a different mechanism: defining an anonymous interface
struct something like this:
/* generic non-type-safe function */
static __always_inline void *__generic_func(void *obj);
struct { \
out_type *(*const func)(in_type *obj); \
} name = { \
.func = (out_type *(*const)(in_type *obj))__generic_func;\
}
/* usage looks like this: */
DEFINE_INTERFACE(solution_a, struct something, struct something_else);
struct something *s;
struct something_else *se;
se = solution_a.func(s);
Sadly, while solution_a.func(s) optimizes perfectly in 4.6, it
completely bombed in 4.5 and prior -- the call by
struct-member-function-pointer is never inlined and nothing passed to it
is every considered a compile-time constant. Because of the
implementation of the generic functions, this bloated the code
unacceptably (3x larger). Thus, I finally settled on the current
RB_DEFINE_INTERFACE, which is massive, but optimizes perfectly in 4.6+
and close enough in 4.5 and prior (prior to 4.6, the compare function is
never inlined).
The other alternative I briefly considered was to have a header file
that is only included after #defining all of these parameters, relying
primarily on cpp rather than cc & compile-time constants to fill in the
relationship details (the "x-macro" approach). While this mechanism
would perform better on older compilers and never break backtraces, in
the end, I just couldn't stomach it. Aside from that, it would make
using the interface almost as verbose as hand-coding it yourself.
Q&A
===
Q: Why did you add BUILD_BUG_ON_NON_CONST() and
BUILD_BUG_ON_NON_CONST42()?
A: There were initially enough BUILD_BUG_ON(!__builtin_constant_p(arg))
calls to warrant it having a macro for it. However, I've since
discovered that using __builtin_constant_p on a struct member did not
behave very consistently, so after writing some test programs &
scripts, and refining 200k+ test results, I graphed out basically
where __builtin_constant_p() worked and didn't. As it turns out,
using it on struct members is fragile until gcc 4.2, so
BUILD_BUG_ON_NON_CONST42() is intended for use with struct members.
Q: Why empty parameters?
What is IFF_EMPTY() for?
Why don't you just pass zero instead of an empty parameter?
A: Support for caching the left- & right-most nodes in the tree as well
as maintaining a count variable are all optional. Passing the offset
value directly not only means more characters of code to use the
RB_RELATIONSHIP and RB_DEFINE_INTERFACE macros (because now you'll
have to invoke the offsetof macro, supplying your struct types
again), but the offset may actually be zero, so passing zero as "I'm
not using this feature" wont work. (This is the reason why the flags
RB_HAS_LEFTMOST, et. al. exist.) Thus, you would also need to
manually pass the appropriate rb_flag value to specify that you're
using the feature. All of this means more copy, paste & edit code
that is error-prone and a maintenance nightmare. This implementation
allows the caller to pass the name of the struct member or leave the
parameter empty to mean "I'm not using this feature", thus
eliminating all of these other complications.
Q: Using huge macro like RB_DEFINE_INTERFACE prone to usage errors that
create crappy error messages and have zero type-safety. (not really a
question)
A: True. However, much of this is mitigated by creating an
__rb_sanity_check_##name function that is never called, but will
generate meaningful error messages for most mistakes (incorrect
struct member types, etc.)
Revision History
===============
New in v4:
o Added type-safe wrapper functions for rb_{next,prev,first,last}
to RB_DEFINE_INTERFACE. Naming is the same as other type-safe
functions (e.g., prefix##_first wraps rb_first). (thanks Pavel Pisa
for the suggestion)
o Added rb_find_{first,next,last,prev} (for non-unique trees) to find
the first or last occurrence of a key and iterate through them.
Type-safe wrapper functions also added to RB_DEFINE_INTERFACE. (thanks
again Pavel Pisa)
o Added support for an unsigned long count member of the container
struct that will be updated upon insertions & deletions.
o Improve sanity checks performed by RB_DEFINE_INTERFACE -- error
messages are now more specific and clearer. Type safety for compare
function is now enforced.
o Completed implementation of insert_near (still untested).
o Completed testing for find_near. Performance is something like
O(log distance * 2 + 1), so if your start node is a bit closer than
half way across the tree, find_near will be about the same speed as
find. If it is further, it will be slower. Either way, it is larger
than a normal find (which should be taken into account), so should
only be used when you are fairly certain your target objects is near
the start. (code was changed since this, so it needs re-testing)
o Added support for specifying modifiers for functions generated by
RB_DEFINE_INTERFACE. This adds 4 more parameters, but is probably
better than forcing the user to write their own wrapper functions to
macro-generated wrapper functions, just to change their function
attributes.
o Added run-time versions of all of the __rb_xxx_to_xxx inline
functions, for use in those conditions where someone may actually need
to access these using a run-time struct rb_relatinoship value.
o Performed compile tests on gcc 3.4.6 - 4.7.0 and tweaked BUILD_BUG_ON*
macros to not fail on any of these compilers.
New in v3:
o Moved compare & augment functions back into struct rb_relationship
after discovering that calling them will be inlined in gcc 4.6+ if the
function is flattened.
o Improved doc comments.
o Solved problem of compare function not being checked for
type-correctness by adding a __sanity_check_##name() function to
__RB_DEFINE_INTERFACE that generates usable errors when there's a type
or member name problem in the macro parameters. This is helpful since
the errors produced when the RB_RELATIONSHIP macro expands were quite
terrible.
New in v2:
o Added RB_RELATIONSHIP macro (thanks Peter Zijlstra for the
suggestions).
o Added RB_DEFINE_INTERFACE macro.
Signed-off-by: Daniel Santos <[email protected]>
Add generic red-black tree code to rbtree.h
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/rbtree.h | 1044 +++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 1042 insertions(+), 2 deletions(-)
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 033b507..959bd27 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -1,7 +1,8 @@
/*
Red Black Trees
(C) 1999 Andrea Arcangeli <[email protected]>
-
+ (C) 2012 Daniel Santos <[email protected]>
+
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
@@ -96,6 +97,8 @@ static inline struct page * rb_insert_page_cache(struct inode * inode,
#include <linux/kernel.h>
#include <linux/stddef.h>
+#include <linux/typecheck.h>
+#include <linux/bug.h>
struct rb_node
{
@@ -148,6 +151,7 @@ extern void rb_insert_color(struct rb_node *, struct rb_root *);
extern void rb_erase(struct rb_node *, struct rb_root *);
typedef void (*rb_augment_f)(struct rb_node *node, void *data);
+typedef long (*rb_compare_f)(const void *a, const void *b);
extern void rb_augment_insert(struct rb_node *node,
rb_augment_f func, void *data);
@@ -162,7 +166,7 @@ extern struct rb_node *rb_first(const struct rb_root *);
extern struct rb_node *rb_last(const struct rb_root *);
/* Fast replacement of a single node without remove/rebalance/add/rebalance */
-extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
+extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
struct rb_root *root);
static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
@@ -174,4 +178,1040 @@ static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
*rb_link = node;
}
+
+#define __JUNK junk,
+#define _iff_empty(test_or_junk, t, f) __iff_empty(test_or_junk, t, f)
+#define __iff_empty(__ignored1, __ignored2, result, ...) result
+
+/**
+ * IFF_EMPTY() - Expands to the second argument when the first is empty, the
+ * third if non-empty.
+ * @test: An argument to test for emptiness.
+ * @t: A value to expand to if test is empty.
+ * @f: A value to expand to if test is non-empty.
+ *
+ * Caveats:
+ * IFF_EMPTY isn't perfect. The test parmater must either be empty or a valid
+ * pre-processor token as well as result in a valid token when pasted to the
+ * end of a word.
+ *
+ * Valid Examples:
+ * IFF_EMPTY(a, b, c) = c
+ * IFF_EMPTY( , b, c) = b
+ * IFF_EMPTY( , , c) = (nothing)
+ *
+ * Invalid Examples:
+ * IFF_EMPTY(., b, c)
+ * IFF_EMPTY(+, b, c)
+ */
+#define IFF_EMPTY(test, t, f) _iff_empty(__JUNK##test, t, f)
+
+/**
+ * IS_EMPTY() - test if a pre-processor argument is empty.
+ * @arg: An argument (empty or non-empty)
+ *
+ * If empty, expands to 1, 0 otherwise. See IFF_EMPTY() for caveats &
+ * limitations.
+ */
+#define IS_EMPTY(arg) IFF_EMPTY(arg, 1, 0)
+
+/**
+ * OPT_OFFSETOF() - return the offsetof for the supplied expression, or zero
+ * if m is an empty argument.
+ * @type: struct/union type
+ * @m: (optional) struct member name
+ *
+ * Since any offsetof can return zero if the specified member is the first in
+ * the struct/union, you should also check if the argument is empty separately
+ * with IS_EMPTY(m).
+ */
+#define OPT_OFFSETOF(type, member) IFF_EMPTY(member, 0, offsetof(type, member))
+
+enum rb_flags {
+RB_HAS_LEFTMOST = 0x00000001, /* The container has a struct rb_node *leftmost
+ member that will receive a pointer to the
+ leftmost (smallest) object in the tree that
+ is updated during inserts & deletions */
+RB_HAS_RIGHTMOST = 0x00000002, /* same as above (for right side of tree) */
+RB_HAS_COUNT = 0x00000004, /* The container has an unsigned long integer
+ field that will receive updates of the
+ object count in the tree. */
+RB_UNIQUE_KEYS = 0x00000008, /* the tree contains only unique values. */
+RB_INSERT_REPLACES = 0x00000010, /* when set, the rb_insert() will replace a
+ value if it matches the supplied one (valid
+ only when RB_UNIQUE_KEYS is set). */
+RB_IS_AUGMENTED = 0x00000020, /* is an augmented tree */
+RB_CRITICAL_PATH = 0x00000040, /* generate larger code for {find,insert}_near
+ functions that can be faster on some CPUs
+ with large L1 caches (but slower on a
+ Phenom 9850). This may be a complete flop
+ and just need to be removed. */
+RB_ALL_FLAGS = RB_HAS_LEFTMOST | RB_HAS_RIGHTMOST
+ | RB_HAS_COUNT | RB_UNIQUE_KEYS
+ | RB_INSERT_REPLACES | RB_IS_AUGMENTED
+ | RB_CRITICAL_PATH,
+};
+
+/**
+ * struct rb_relationship - Defines relationship between a container and
+ * the objects it contains.
+ * @root_offset: Offset of container's struct rb_root member.
+ * @left_offset: (Used only if has_left.) Offset of the container's struct
+ * rb_node *leftmost member for storing a pointer to the
+ * leftmost node in the tree, which is kept updated as inserts
+ * and deletions are made.
+ * @right_offset: Same as left_offset, except for right side of tree.
+ * @count_offset:
+ * @node_offset: Offset of object's struct rb_node member.
+ * @key_offset: Offset of object's key member.
+ * @flags: TODO
+ * @compare: Pointer to key rb_compare_f function to compare keys.
+ * Although it will be cast to and called as type long (*)(const
+ * void *a, const void *b), you should delcare it as accepting
+ * pointers to your key members, or sanity checks will fail.
+ * Further, it is optimial if the function is declared inline.
+ * @augment: Pointer to the rb_augment_f or zero if tree is not augmented.
+ *
+ * Instances of struct rb_relationship should be compile-time constants (or
+ * rather, the value of its members).
+ */
+struct rb_relationship {
+ ssize_t root_offset;
+ ssize_t left_offset;
+ ssize_t right_offset;
+ ssize_t count_offset;
+ ssize_t node_offset;
+ ssize_t key_offset;
+ int flags;
+ const rb_compare_f compare;
+ const rb_augment_f augment;
+};
+
+#define __RB_PTR(type, ptr, offset) ((type *)((char *)(ptr) + (offset)))
+
+/* public conversion functions for use with run-time values */
+static __always_inline
+struct rb_root *rb_to_root(const void *ptr,
+ const struct rb_relationship *rel)
+{
+ return __RB_PTR(struct rb_root, ptr, rel->root_offset);
+}
+
+static __always_inline
+struct rb_node **rb_root_to_left(struct rb_root *root,
+ const struct rb_relationship *rel)
+{
+ return __RB_PTR(struct rb_node *, root,
+ rel->left_offset - rel->root_offset);
+}
+
+static __always_inline
+struct rb_node **rb_root_to_right(struct rb_root *root,
+ const struct rb_relationship *rel)
+{
+ return __RB_PTR(struct rb_node *, root,
+ rel->right_offset - rel->root_offset);
+}
+
+static __always_inline
+unsigned long *rb_root_to_count(struct rb_root *root,
+ const struct rb_relationship *rel)
+{
+ return __RB_PTR(unsigned long, root,
+ rel->count_offset - rel->root_offset);
+}
+
+static __always_inline
+const void *rb_node_to_key(const struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ return __RB_PTR(const void, node,
+ rel->key_offset - rel->node_offset);
+}
+
+static __always_inline
+void *rb_node_to_obj(const struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ return __RB_PTR(void, node, -rel->node_offset);
+}
+
+static __always_inline
+struct rb_node *rb_to_node(const void *ptr, const struct rb_relationship *rel)
+{
+ return __RB_PTR(struct rb_node, ptr, rel->node_offset);
+}
+
+static __always_inline
+const void *rb_to_key(const void *ptr, const struct rb_relationship *rel)
+{
+ return __RB_PTR(const void, ptr, rel->key_offset);
+}
+
+
+/* checked conversion functions that will error on run-time values */
+static __always_inline
+struct rb_root *__rb_to_root(const void *ptr,
+ const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON_NON_CONST42(rel->root_offset);
+ return rb_to_root(ptr, rel);
+}
+
+static __always_inline
+struct rb_node **__rb_root_to_left(struct rb_root *root,
+ const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON42(!(rel->flags & RB_HAS_LEFTMOST));
+ BUILD_BUG_ON_NON_CONST42(rel->root_offset);
+ BUILD_BUG_ON_NON_CONST42(rel->left_offset);
+ return rb_root_to_left(root, rel);
+}
+
+static __always_inline
+struct rb_node **__rb_root_to_right(struct rb_root *root,
+ const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON42(!(rel->flags & RB_HAS_RIGHTMOST));
+ BUILD_BUG_ON_NON_CONST42(rel->root_offset);
+ BUILD_BUG_ON_NON_CONST42(rel->right_offset);
+ return rb_root_to_right(root, rel);
+}
+
+static __always_inline
+unsigned long *__rb_root_to_count(struct rb_root *root,
+ const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON42(!(rel->flags & RB_HAS_COUNT));
+ BUILD_BUG_ON_NON_CONST42(rel->root_offset);
+ BUILD_BUG_ON_NON_CONST42(rel->count_offset);
+ return rb_root_to_count(root, rel);
+}
+
+static __always_inline
+const void *__rb_node_to_key(const struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON_NON_CONST42(rel->node_offset);
+ BUILD_BUG_ON_NON_CONST42(rel->key_offset);
+ return rb_node_to_key(node, rel);
+}
+
+static __always_inline
+void *__rb_node_to_obj(const struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON_NON_CONST42(rel->node_offset);
+ return rb_node_to_obj(node, rel);
+}
+
+static __always_inline
+struct rb_node *__rb_to_node(const void *ptr, const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON_NON_CONST42(rel->node_offset);
+ return rb_to_node(ptr, rel);
+}
+
+static __always_inline
+const void *__rb_to_key(const void *ptr, const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON_NON_CONST42(rel->key_offset);
+ return rb_to_key(ptr, rel);
+}
+
+/** __rb_assert_good_rel -- perform compile-time sanity checks on a struct
+ * rb_relationship
+ * rel: Pointer to a const struct rb_relationship to check
+ */
+static __always_inline
+void __rb_assert_good_rel(const struct rb_relationship *rel)
+{
+ BUILD_BUG_ON_NON_CONST42(rel->flags);
+ BUILD_BUG_ON42(rel->flags & ~RB_ALL_FLAGS);
+
+ BUILD_BUG_ON_NON_CONST42(rel->root_offset);
+ BUILD_BUG_ON_NON_CONST42(rel->node_offset);
+ BUILD_BUG_ON_NON_CONST42(rel->key_offset);
+
+ if (rel->flags & RB_HAS_LEFTMOST)
+ BUILD_BUG_ON_NON_CONST42(rel->count_offset);
+
+ if (rel->flags & RB_HAS_RIGHTMOST)
+ BUILD_BUG_ON_NON_CONST42(rel->right_offset);
+
+ if (rel->flags & RB_HAS_COUNT)
+ BUILD_BUG_ON_NON_CONST42(rel->left_offset);
+
+ /* need if () construct to avoid false-positive prior to gcc 4.6 */
+ if (rel->flags & RB_INSERT_REPLACES)
+ BUILD_BUG_ON42(!(rel->flags & RB_UNIQUE_KEYS));
+}
+
+
+/** __rb_find - Perform a (normal) search of rbtree starting at the specified
+ * node, traversing downward.
+ * node: Node (root or subtree) to start the search from
+ * key: Pointer to a key to search for
+ * rel: Pointer to a const struct rb_relationship
+ */
+static __always_inline __flatten
+struct rb_node *__rb_find(
+ struct rb_node *node,
+ const void *key,
+ const struct rb_relationship *rel)
+{
+ __rb_assert_good_rel(rel);
+ while (node) {
+ long diff = rel->compare(key, __rb_node_to_key(node, rel));
+
+ if (diff > 0)
+ node = node->rb_right;
+ else if (diff < 0)
+ node = node->rb_left;
+ else
+ return node;
+ }
+
+ return 0;
+}
+
+/** rb_find - Perform a (normal) search of a Red-Black Tree.
+ * root: Root of the tree
+ * key: Pointer to a key to search for
+ * rel: Pointer to a const struct rb_relationship
+ */
+static __always_inline __flatten
+struct rb_node *rb_find(
+ struct rb_root *root,
+ const void *key,
+ const struct rb_relationship *rel)
+{
+ return __rb_find(root->rb_node, key, rel);
+}
+
+static __always_inline __flatten
+struct rb_node *__rb_find_first_last(
+ struct rb_node *node,
+ const void *key,
+ const struct rb_relationship *rel,
+ const int find_first)
+{
+ __rb_assert_good_rel(rel);
+
+ /* calling this function with a unique key tree maps to normal find */
+ if (rel->flags & RB_UNIQUE_KEYS)
+ return __rb_find(node, key, rel);
+
+ /* if this section is compiled, we should non-unqiue only */
+ BUILD_BUG_ON42(rel->flags & RB_UNIQUE_KEYS);
+ BUILD_BUG_ON_NON_CONST(find_first);
+
+ while (node) {
+ long diff = rel->compare(key, __rb_node_to_key(node, rel));
+
+ if (diff > 0)
+ node = node->rb_right;
+ else if (diff < 0)
+ node = node->rb_left;
+ else {
+ if (find_first && node->rb_left)
+ node = node->rb_left;
+ else if (!find_first && node->rb_right)
+ node = node->rb_right;
+ else
+ return node;
+ }
+ }
+
+ return 0;
+}
+
+/** rb_find_first - Search for first occurrence of key in the tree.
+ * root: Root of the tree
+ * key: Pointer to a key
+ * rel: Pointer to a const struct rb_relationship
+ *
+ * This function is intended for use with trees containing non-unique keys.
+ * When called for trees with unique keys, it maps to __rb_find (a normal
+ * search).
+ */
+static __always_inline __flatten
+struct rb_node *rb_find_first(
+ struct rb_root *root,
+ const void *key,
+ const struct rb_relationship *rel)
+{
+ return __rb_find_first_last(root->rb_node, key, rel, 1);
+}
+
+/** rb_find_last - Search for last occurrence of key in the tree.
+ * root: Root of the tree
+ * key: Pointer to a key
+ * rel: Pointer to a const struct rb_relationship
+ *
+ * This function is intended for use with trees containing non-unique keys.
+ * When called for trees with unique keys, it maps to __rb_find (a normal
+ * search).
+ */
+static __always_inline __flatten
+struct rb_node *rb_find_last(
+ struct rb_root *root,
+ const void *key,
+ const struct rb_relationship *rel)
+{
+ return __rb_find_first_last(root->rb_node, key, rel, 0);
+}
+
+/** rb_find_next - Locate the next node (in a non-unique tree) whos key matches
+ * the supplied node.
+ * node: Node of the current object in the tree.
+ * rel: Pointer to a const struct rb_relationship
+ *
+ * Gernerally for use after calling rb_find_first(). Only valid for use with a
+ * tree with non-unique keys.
+ */
+static __always_inline __flatten
+struct rb_node *rb_find_next(
+ const struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ const void *key = __rb_node_to_key(node, rel);
+ struct rb_node *next = rb_next(node);
+
+ BUILD_BUG_ON42(rel->flags & RB_UNIQUE_KEYS);
+ return (next && !rel->compare(key, __rb_node_to_key(next, rel)))
+ ? next : NULL;
+}
+
+/** rb_find_prev - Locate the previous node (in a non-unique tree) whos key
+ * matches the supplied node.
+ * node: Node of the current object in the tree.
+ * rel: Pointer to a const struct rb_relationship
+ *
+ * Gernerally for use after calling rb_find_last(). Only valid for use with a
+ * tree with non-unique keys.
+ */
+static __always_inline __flatten
+struct rb_node *rb_find_prev(
+ const struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ const void *key = __rb_node_to_key(node, rel);
+ struct rb_node *prev = rb_prev(node);
+
+ BUILD_BUG_ON42(rel->flags & RB_UNIQUE_KEYS);
+ return (prev && !rel->compare(key, __rb_node_to_key(prev, rel)))
+ ? prev : NULL;
+}
+
+
+enum rb_find_subtree_match {
+ RB_MATCH_NONE = 0,
+ RB_MATCH_IMMEDIATE = 2,
+ RB_MATCH_LEFT = -1,
+ RB_MATCH_RIGHT = 1,
+};
+
+/**
+ * __rb_find_subtree - Locate the subtree that contains the specified key (if
+ * it exists) starting from a specific node traversing
+ * upwards.
+ *
+ * This function is used by find_near and insert_near, but behaves differently
+ * for each case (maybe it could have been made separate functions).
+ * Specifically, when doing_insert is non-zero, it will set values in the
+ * location provided by populate ret_link & ret_parent.
+ *
+ */
+static __always_inline __flatten
+struct rb_node *__rb_find_subtree(
+ struct rb_root *root,
+ struct rb_node *start,
+ const void *key,
+ int *matched,
+ struct rb_node ***ret_link, /* wow, tripple indirection.
+ Am I smart or just nuts? */
+ struct rb_node **ret_parent,
+ const struct rb_relationship *rel,
+ const int doing_insert)
+{
+ struct rb_node *prev = start;
+ struct rb_node *node = rb_parent(start);
+ long diff;
+
+ __rb_assert_good_rel(rel);
+ BUILD_BUG_ON_NON_CONST(doing_insert);
+ BUG_ON(doing_insert && (!root || !ret_link || !ret_parent));
+
+ /* already at top of tree, so return start value */
+ if (!node) {
+ *matched = RB_MATCH_NONE;
+ if (doing_insert) {
+ *ret_link = &root->rb_node;
+ *ret_parent = **ret_link;
+ }
+ return start;
+ }
+
+ /* The first compare is just to figure out which direction up the tree
+ * we're traveling. When compare returns a value with a different
+ * sign, we'll have found our subtree, or an exact match if zero.
+ */
+ diff = rel->compare(key, __rb_node_to_key(node, rel));
+
+ if (diff) {
+ int go_left = diff < 0;
+ while (1) {
+ prev = node;
+ node = rb_parent(prev);
+ if (!node)
+ /* Reached top of tree. In this case. rather
+ * than having the top down search start from
+ * the root, we'll start on the prev sibling
+ * since we've already tested the root node, we
+ * know that we don't need to go back the way
+ * we came.
+ */
+ break;
+
+ diff = rel->compare(key, __rb_node_to_key(node, rel));
+ if (go_left ? diff > 0 : diff < 0)
+ /* found the divering node, so the child on the
+ * opposite side (of prev) is the subtree that
+ * will contain the key
+ */
+ break;
+ else if (!diff) {
+ /* exact match */
+ *matched = go_left
+ ? RB_MATCH_LEFT
+ : RB_MATCH_RIGHT;
+
+ goto find_parent_link;
+ }
+ }
+
+ *matched = RB_MATCH_NONE;
+ if (doing_insert) {
+ *ret_parent = prev;
+ *ret_link = go_left ? &prev->rb_left : &prev->rb_right;
+ return **ret_link;
+ } else {
+ return go_left ? prev->rb_left : prev->rb_right;
+ }
+ }
+
+ /* start node's parent was an exact match */
+ *matched = RB_MATCH_IMMEDIATE;
+
+find_parent_link:
+ if (doing_insert) {
+ struct rb_node *parent = rb_parent(node);
+
+ if (!parent) {
+ *ret_link = &root->rb_node;
+ *ret_parent = **ret_link;
+ } else if (parent->rb_left == node) {
+ *ret_link = &parent->rb_left;
+ *ret_parent = parent;
+ } else if (parent->rb_right == node) {
+ *ret_link = &parent->rb_left;
+ *ret_parent = parent;
+ } else {
+ BUG();
+ }
+ }
+
+ return node;
+}
+
+static __always_inline __flatten
+struct rb_node *rb_find_near(
+ struct rb_node *from,
+ const void *key,
+ const struct rb_relationship *rel)
+{
+ int matched;
+ struct rb_node *subtree;
+
+ subtree = __rb_find_subtree(NULL, from, key, &matched, NULL, NULL,
+ rel, 0);
+
+ if (matched)
+ return subtree;
+
+ return __rb_find(subtree, key, rel);
+}
+
+/* common insert epilogue used by rb_insert() and rb_insert_near() */
+static __always_inline __flatten
+struct rb_node *__rb_insert_epilogue(
+ struct rb_root *root,
+ struct rb_node *parent,
+ struct rb_node *node,
+ struct rb_node *found,
+ struct rb_node **rb_link,
+ const struct rb_relationship *rel)
+{
+ if ((rel->flags & RB_UNIQUE_KEYS) && found) {
+ if (rel->flags & RB_INSERT_REPLACES) {
+ /* if we're replacing the entry, we don't incrment
+ * count, but we do still need to do augment
+ */
+ rb_replace_node(found, node, root);
+ goto do_augment;
+ } else {
+ /* otherwise, we don't do either */
+ goto done;
+ }
+ } else {
+ rb_link_node(node, parent, rb_link);
+ rb_insert_color(node, root);
+ }
+
+ if ((rel->flags & RB_HAS_COUNT))
+ ++*__rb_root_to_count(root, rel);
+
+do_augment:
+ if (rel->augment)
+ rb_augment_insert(node, rel->augment, NULL);
+
+done:
+ return found;
+}
+
+
+/**
+ * rb_insert() - Inserts an object into a container.
+ * @root: Pointer to struct rb_root.
+ * @node: Pointer to the node of the new object to insert.
+ * @rel: Pointer to the compile-time constant relationship definition.
+ * @compare: Compare function
+ * @augmented: Augmented function (if using an augmented rbtree)
+ *
+ * If an object with the same key already exists and RB_INSERT_REPLACES is set
+ * then it is replaced with new object node; if RB_INSERT_REPLACES is not set,
+ * then no change is made. In either case, a pointer to the existing object
+ * node is returned.
+ *
+ * If no object with the same key exists, then the new object node is inserted
+ * and NULL is returned.
+ */
+static __always_inline __flatten
+struct rb_node *rb_insert(
+ struct rb_root *root,
+ struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ struct rb_node **p = &root->rb_node;
+ struct rb_node *parent = NULL;
+ const void *key = __rb_node_to_key(node, rel);
+ int leftmost = 1;
+ int rightmost = 1;
+
+ __rb_assert_good_rel(rel);
+
+ while (*p) {
+ long diff = rel->compare(key, __rb_node_to_key(*p, rel));
+ parent = *p;
+
+ if (diff > 0) {
+ p = &(*p)->rb_right;
+ if (rel->flags & RB_HAS_LEFTMOST)
+ leftmost = 0;
+ } else if (!(rel->flags & RB_UNIQUE_KEYS) || diff < 0) {
+ p = &(*p)->rb_left;
+ if (rel->flags & RB_HAS_RIGHTMOST)
+ rightmost = 0;
+ } else
+ break;
+ }
+
+ if ((rel->flags & RB_HAS_LEFTMOST) && leftmost) {
+ struct rb_node **left = __rb_root_to_left(root, rel);
+
+ if (!(rel->flags & RB_INSERT_REPLACES) || !(*p) || *left == *p)
+ *left = node;
+ }
+ if ((rel->flags & RB_HAS_RIGHTMOST) && rightmost) {
+ struct rb_node **right = __rb_root_to_right(root, rel);
+
+ if (!(rel->flags & RB_INSERT_REPLACES) || !(*p) || *right == *p)
+ *right = node;
+ }
+
+ return __rb_insert_epilogue(root, parent, node, *p, p, rel);
+}
+
+static __always_inline __flatten
+struct rb_node *rb_insert_near(
+ struct rb_root *root,
+ struct rb_node *start,
+ struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ const void *key = __rb_node_to_key(node, rel);
+ struct rb_node **p;
+ struct rb_node *parent;
+ struct rb_node *ret;
+ int matched;
+ long diff;
+
+ BUILD_BUG_ON_NON_CONST42(rel->flags);
+
+ ret = __rb_find_subtree(root, start, key, &matched, &p, &parent, rel,
+ 1);
+
+ if (!matched) {
+ while (*p) {
+ diff = rel->compare(__rb_node_to_key(*p, rel), key);
+ parent = *p;
+
+ if (diff > 0)
+ p = &(*p)->rb_right;
+ else if (!(rel->flags & RB_UNIQUE_KEYS) || diff < 0)
+ p = &(*p)->rb_left;
+ else
+ break;
+ }
+ ret = *p;
+ }
+
+ /* the longer way to see if we're left- or right-most (since we aren't
+ * starting from the top, we can't use the mechanism rb_insert()
+ * does.)
+ */
+ if (rel->flags & RB_HAS_LEFTMOST) {
+ struct rb_node **left = __rb_root_to_left(root, rel);
+ if (!*left || *left == ret ||
+ (*left == parent && &parent->rb_left == p))
+ *left = node;
+ }
+
+ if (rel->flags & RB_HAS_RIGHTMOST) {
+ struct rb_node **right = __rb_root_to_right(root, rel);
+ if (!*right || *right == ret ||
+ (*right == parent && &parent->rb_right == p))
+ *right = node;
+ }
+
+ return __rb_insert_epilogue(root, parent, node, ret, p, rel);
+}
+
+static __always_inline __flatten
+void rb_remove(
+ struct rb_root *root,
+ struct rb_node *node,
+ const struct rb_relationship *rel)
+{
+ struct rb_node *uninitialized_var(deepest);
+
+ BUILD_BUG_ON_NON_CONST42(rel->flags);
+
+ if (rel->augment)
+ deepest = rb_augment_erase_begin(node);
+
+ if (rel->flags & RB_HAS_LEFTMOST) {
+ struct rb_node **left = __rb_root_to_left(root, rel);
+
+ if (*left == node)
+ *left = rb_next(node);
+ }
+
+ if (rel->flags & RB_HAS_RIGHTMOST) {
+ struct rb_node **right = __rb_root_to_right(root, rel);
+
+ if (*right == node)
+ *right = rb_prev(node);
+ }
+
+ rb_erase(node, root);
+
+ if ((rel->flags & RB_HAS_COUNT))
+ --*__rb_root_to_count(root, rel);
+
+ if (rel->augment)
+ rb_augment_erase_end(deepest, rel->augment, NULL);
+}
+
+
+/**
+ * RB_RELATIONHIP - Define the relationship bewteen a container with a
+ * struct rb_root member, and the objects it contains.
+ * @cont_type: container type
+ * @root: Container's struct rb_root member name
+ * @left: (Optional) If the container needs a pointer to the tree's
+ * leftmost (smallest) object, then specify the container's struct
+ * rb_node *leftmost member. Otherwise, leave this parameter
+ * empty.
+ * @right: (Optional) Same as left, but for the rightmost (largest)
+ * @count (Optional) Name of container's unsigned int member that will be
+ * updated with the number of objects in the tree. It is assumed
+ * that there will never be more than 2^32 objects in a kernel
+ * rbtree, so overlfow is not handled or checked for. Note that if
+ * you add or remove objects from the tree without using the
+ * generic functions, you must update this value yourself.
+ * @obj_type: Type of object stored in container
+ * @node: The struct rb_node memeber of the object
+ * @key: The key member of the object
+ * @flags see enum rb_flags. Note: you do not have to specify
+ * RB_HAS_LEFTMOST, RB_HAS_RIGHTMOST, RB_HAS_COUNT or
+ * RB_IS_AUGMENTED as these will be added automatically if their
+ * respective field is non-empty.
+ * @compare: Pointer to key rb_compare_f function to compare keys.
+ * Although it will be cast to and called as type long (*)(const
+ * void *a, const void *b), you should delcare it as accepting
+ * pointers to your key members, or sanity checks will fail.
+ * Further, it is optimial if the function is declared inline.
+ * @_augment: (optional) pointer to the rb_augment_f or empty if tree is not
+ * augmented.
+ *
+ * Example:
+ * struct my_container {
+ * struct rb_root root;
+ * unsigned int count;
+ * struct rb_node *left;
+ * };
+ *
+ * struct my_object {
+ * struct rb_node node;
+ * int key;
+ * };
+ *
+ * static inline long compare_int(const int *a, const int *b)
+ * {
+ * return (long)*a - (long)*b);
+ * }
+ *
+ * static const struct rb_relationship my_rel = RB_RELATIONSHIP(
+ * struct my_container, root, left, , count, // no rightmost
+ * struct my_object, node, key,
+ * 0, compare_int, ); // no augment
+ */
+#define RB_RELATIONSHIP( \
+ cont_type, root, left, right, count, \
+ obj_type, node, key, \
+ _flags, _compare, _augment) { \
+ .root_offset = offsetof(cont_type, root), \
+ .left_offset = OPT_OFFSETOF(cont_type, left), \
+ .right_offset = OPT_OFFSETOF(cont_type, right), \
+ .count_offset = OPT_OFFSETOF(cont_type, count), \
+ .node_offset = offsetof(obj_type, node), \
+ .key_offset = offsetof(obj_type, key), \
+ .flags = (_flags) \
+ | IFF_EMPTY(left , 0, RB_HAS_LEFTMOST) \
+ | IFF_EMPTY(right, 0, RB_HAS_RIGHTMOST) \
+ | IFF_EMPTY(count, 0, RB_HAS_COUNT) \
+ | IFF_EMPTY(_augment, 0, RB_IS_AUGMENTED), \
+ .compare = (const rb_compare_f) (_compare), \
+ .augment = IFF_EMPTY(_augment, 0, _augment) \
+}
+
+/* type-validation functions used by __rb_sanity_check_##prefix */
+static inline void __rb_verify_root(struct rb_root *root) {}
+static inline void __rb_verify_left(struct rb_node * const *left) {}
+static inline void __rb_verify_right(struct rb_node * const *right) {}
+static inline void __rb_verify_count(const unsigned int *count) {}
+static inline void __rb_verify_node(struct rb_node *node) {}
+static inline void __rb_verify_compare_fn_ret(long *diff) {}
+
+/**
+ * RB_DEFINE_INTERFACE - Defines a complete interface for a relationship
+ * between container and object including a struct
+ * rb_relationship and an interface of type-safe wrapper
+ * functions.
+ * @prefix: name for the relationship (see explanation below)
+ * @cont_type: see RB_RELATIONHIP
+ * @root: see RB_RELATIONHIP
+ * @left: see RB_RELATIONHIP
+ * @right: see RB_RELATIONHIP
+ * @count see RB_RELATIONHIP
+ * @obj_type: see RB_RELATIONHIP
+ * @node: see RB_RELATIONHIP
+ * @key: see RB_RELATIONHIP
+ * @flags see RB_RELATIONHIP
+ * @compare: see RB_RELATIONHIP
+ * @augment: see RB_RELATIONHIP
+ * @insert_mod: (Optional) Function modifiers for insert function,
+ * defaults to "static __always_inline" if left empty.
+ * @insert_near_mod: (Optional) Same as above, for insert_near.
+ * @find_mod: (Optional) Same as above, for find.
+ * @find_near_mod: (Optional) Same as above, for find_near.
+ *
+ * This macro can be delcared in the global scope of either a source or header
+ * file and will generate a static const struct rb_relationship variable named
+ * prefix##_rel as well as similarly named (i.e., prefix##_##func_name)
+ * type-safe wrapper functions for find, find_near, insert, insert_near and
+ * remove. If these function names are not sufficient, you can use the
+ * __RB_DEFINE_INTERFACE macro to specify them explicitly.
+ *
+ * The compare function will be passed pointers to the key members of two
+ * objects. If your compare function needs access to other members of your
+ * struct (e.g., compound keys, etc.) , you can use the rb_entry macro to
+ * access other members. However, if you use this mechanism, your find
+ * function must always pass it's key parameter as a pointer to the key member
+ * of an object of type obj_type, since the compare function is used for both
+ * inserts and lookups (else, you'll be screwed).
+ *
+ * Example:
+ * struct my_container {
+ * struct rb_root root;
+ * unsigned int count;
+ * struct rb_node *left;
+ * };
+ *
+ * struct my_object {
+ * struct rb_node node;
+ * int key;
+ * };
+ *
+ * static inline long compare_int(const int *a, const int *b)
+ * {
+ * return (long)*a - (long)*b);
+ * }
+ *
+ * RB_DEFINE_INTERFACE(
+ * my_tree,
+ * struct my_container, root, left, , count, // no rightmost
+ * struct my_object, node, key,
+ * 0, compare_int,
+ * , // defaults for find
+ * static __flatten, // let gcc decide rather or not to inline insert()
+ * , // defaults on find_near
+ * static __flatten noinline) // don't let gcc inline insert_near()
+ */
+#define RB_DEFINE_INTERFACE( \
+ prefix, \
+ cont_type, root, left, right, count, \
+ obj_type, node, key, \
+ flags, compare, augment, \
+ find_mod, insert_mod, find_near_mod, insert_near_mod) \
+ \
+/* You need not call this function for validation to occur. We define \
+ * __rb_sanity_check function first, so errors will (hopefully) get \
+ * caught here and produce a more helpful error message than a failure \
+ * in the RB_RELATIONSHIP macro expansion. \
+ */ \
+static inline __maybe_unused \
+void __rb_sanity_check_ ## prefix(cont_type *cont, obj_type *obj) \
+{ \
+ /* compare function should accept pointer to key member */ \
+ typeof((compare)(&obj->key, &obj->key)) diff = \
+ (compare)(&obj->key, &obj->key); \
+ __rb_verify_compare_fn_ret(&diff); \
+ \
+ /* validate types of container members */ \
+ __rb_verify_root(&cont->root); \
+ __rb_verify_left (IFF_EMPTY(left , 0, &cont->left)); \
+ __rb_verify_right(IFF_EMPTY(right , 0, &cont->right)); \
+ __rb_verify_count(IFF_EMPTY(count , 0, &cont->count)); \
+ \
+ /* validate types of object node */ \
+ __rb_verify_node(&obj->node); \
+} \
+ \
+static const struct rb_relationship prefix ## _rel = \
+RB_RELATIONSHIP( \
+ cont_type, root, left, right, count, \
+ obj_type, node, key, \
+ flags, compare, augment); \
+ \
+IFF_EMPTY(find_mod, static __always_inline, find_mod) \
+obj_type *prefix ## _find(cont_type *cont, \
+ const typeof(((obj_type *)0)->key) *_key) \
+{ \
+ struct rb_node *ret = rb_find( \
+ &cont->root, _key, &prefix ## _rel); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+IFF_EMPTY(insert_mod, static __always_inline, insert_mod) \
+obj_type *prefix ## _insert(cont_type *cont, obj_type *obj) \
+{ \
+ struct rb_node *ret = rb_insert( \
+ &cont->root, &obj->node, &prefix ## _rel); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+IFF_EMPTY(find_near_mod, static __always_inline, find_near_mod) \
+obj_type *prefix ## _find_near(obj_type *near, \
+ const typeof(((obj_type *)0)->key) *_key) \
+{ \
+ struct rb_node *ret = rb_find_near( \
+ &near->node, _key, &prefix ## _rel); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+IFF_EMPTY(insert_near_mod, static __always_inline, insert_near_mod) \
+obj_type *prefix ## _insert_near(cont_type *cont, obj_type *near, \
+ obj_type *obj) \
+{ \
+ struct rb_node *ret = rb_insert_near( \
+ &cont->root, &near->node, &obj->node, \
+ &prefix ## _rel); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline \
+void prefix ## _remove(cont_type *cont, obj_type *obj) \
+{ \
+ rb_remove(&cont->root, &obj->node, &prefix ## _rel); \
+} \
+ \
+IFF_EMPTY(find_mod, static __always_inline, find_mod) __maybe_unused \
+obj_type *prefix ## _find_first(cont_type *cont, \
+ const typeof(((obj_type *)0)->key) *_key) \
+{ \
+ struct rb_node *ret = rb_find_first( \
+ &cont->root, _key, &prefix ## _rel); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+IFF_EMPTY(find_mod, static __always_inline, find_mod) __maybe_unused \
+obj_type *prefix ## _find_last(cont_type *cont, \
+ const typeof(((obj_type *)0)->key) *_key) \
+{ \
+ struct rb_node *ret = rb_find_last( \
+ &cont->root, _key, &prefix ## _rel); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline \
+obj_type *prefix ## _find_next(const obj_type *obj) \
+{ \
+ struct rb_node *ret = rb_find_next(&obj->node, &prefix ## _rel);\
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline \
+obj_type *prefix ## _find_prev(const obj_type *obj) \
+{ \
+ struct rb_node *ret = rb_find_prev(&obj->node, &prefix ## _rel);\
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline obj_type *prefix ## _next(const obj_type *obj) \
+{ \
+ struct rb_node *ret = rb_next(&obj->node); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline obj_type *prefix ## _prev(const obj_type *obj) \
+{ \
+ struct rb_node *ret = rb_prev(&obj->node); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline obj_type *prefix ## _first(cont_type *cont) \
+{ \
+ struct rb_node *ret = rb_first(&cont->root); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+} \
+ \
+static __always_inline obj_type *prefix ## _last(cont_type *cont) \
+{ \
+ struct rb_node *ret = rb_last(&cont->root); \
+ return ret ? rb_entry(ret, obj_type, node) : 0; \
+}
+
#endif /* _LINUX_RBTREE_H */
--
1.7.3.4
BUILD_BUG_ON42(arg)
BUILD_BUG_ON_CONST42(arg)
Prior to gcc 4.2, the optimizer was unable to determine that many
constant values stored in structs were indeed compile-time constants and
optimize them out. Sometimes, it will find an intergral value to be a
compile-time constant, but fail to perform a bit-wise AND at
compile-time. These two macros provide a mechanism to perform these
build-time checks, but not break on older compilers where we already
know they can't be checked at compile time.
For specific details, consult the doc comments for BUILD_BUG_ON_CONST.
These macros are used in the generic rbtree code.
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/bug.h | 35 +++++++++++++++++++++++++++++++++++
1 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/include/linux/bug.h b/include/linux/bug.h
index e30f600..8e30337 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -129,6 +129,41 @@ struct pt_regs;
#define BUILD_BUG_ON_NON_CONST(exp)
#endif
+
+#if GCC_VERSION >= 40200
+/**
+ * BUILD_BUG_ON_NON_CONST42 - break compile if expression cannot be determined
+ * to be a compile-time constant (disabled prior to
+ * gcc 4.2)
+ * @exp: value to test for compile-time constness
+ *
+ * Use this macro instead of BUILD_BUG_ON_NON_CONST when testing struct
+ * members or dereferenced arrays and pointers. Note that the version checks
+ * for this macro are not perfect. BUILD_BUG_ON_NON_CONST42 expands to nothing
+ * prior to gcc-4.2, after which it is the same as BUILD_BUG_ON_NON_CONST.
+ * However, there are still many checks that will break with this macro (see
+ * the Gory Details section of BUILD_BUG_ON_NON_CONST for more info).
+ *
+ * See also BUILD_BUG_ON_NON_CONST()
+ */
+# define BUILD_BUG_ON_NON_CONST42(exp) BUILD_BUG_ON_NON_CONST(exp)
+
+/**
+ * BUILD_BUG_ON42 - break compile if expression cannot be determined
+ * (disabled prior to gcc 4.2)
+ *
+ * This gcc-version check is necessary due to breakages in testing struct
+ * members prior to gcc 4.2.
+ *
+ * See also BUILD_BUG_ON()
+ */
+# define BUILD_BUG_ON42(arg) BUILD_BUG_ON(arg)
+#else
+# define BUILD_BUG_ON_NON_CONST42(exp)
+# define BUILD_BUG_ON42(arg)
+#endif /* GCC_VERSION >= 40200 */
+
+
#endif /* __CHECKER__ */
#ifdef CONFIG_GENERIC_BUG
--
1.7.3.4
Negative sized arrays wont create a compile-time error in some cases
starting with gcc 4.4 (e.g., inlined functions), but gcc 4.3 introduced
the error function attribute that will. This patch modifies
BUILD_BUG_ON to behave like BUILD_BUG already does, using the error
function attribute so that you don't have to build the entire kernel to
discover that you have a problem, and then enjoy trying to track it down
from a link-time error.
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/bug.h | 24 ++++++++++++++----------
1 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/include/linux/bug.h b/include/linux/bug.h
index 298a916..c70b833 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -42,24 +42,28 @@ struct pt_regs;
* @condition: the condition which the compiler should know is false.
*
* If you have some code which relies on certain constants being equal, or
- * other compile-time-evaluated condition, you should use BUILD_BUG_ON to
+ * some other compile-time-evaluated condition, you should use BUILD_BUG_ON to
* detect if someone changes it.
*
* The implementation uses gcc's reluctance to create a negative array, but
* gcc (as of 4.4) only emits that error for obvious cases (eg. not arguments
- * to inline functions). So as a fallback we use the optimizer; if it can't
- * prove the condition is false, it will cause a link error on the undefined
- * "__build_bug_on_failed". This error message can be harder to track down
- * though, hence the two different methods.
+ * to inline functions). Luckily, in 4.3 they added the "error" function
+ * attribute just for this type of case. Thus, we use a negative sized array
+ * (should always create an error pre-gcc-4.4) and then call an undefined
+ * function with the error attribute (should always creates an error 4.3+). If
+ * for some reason, neither creates a compile-time error, we'll still have a
+ * link-time error, which is harder to track down.
*/
#ifndef __OPTIMIZE__
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
#else
-extern int __build_bug_on_failed;
-#define BUILD_BUG_ON(condition) \
- do { \
- ((void)sizeof(char[1 - 2*!!(condition)])); \
- if (condition) __build_bug_on_failed = 1; \
+#define BUILD_BUG_ON(condition) \
+ do { \
+ extern void __build_bug_on_failed(void) \
+ __compiletime_error("BUILD_BUG_ON failed"); \
+ ((void)sizeof(char[1 - 2*!!(condition)])); \
+ if (condition) \
+ __build_bug_on_failed(); \
} while(0)
#endif
--
1.7.3.4
For gcc 4.1 & later, expands to __attribute__((flatten)) which forces
the compiler to inline everything it can into the function. This is
useful in combination with noinline when you want to control the depth
of inlining, or create a single function where inline expansions will
occur. (see
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bflatten_007d-function-attribute-2512)
Normally, it's best to leave this type of thing up to the compiler.
However, the generic rbtree code uses inline functions just to be able
to inject compile-time constant data that specifies how the caller wants
the function to behave (via struct rb_relationship). This data can be
thought of as the template parameters of a C++ templatized function.
Since some of these functions, once expanded, become quite large, gcc
sometimes decides not to perform some important inlining, in one case,
even generating a few bytes more code by not doing so. (Note: I have not
eliminated the possibility that this was an optimization bug, but the
flatten attribute fixes it in either case.)
Combining __flatten and noinline insures that important optimizations
occur in these cases and that the inline expansion occurs in exactly one
place, thus not leading to unnecissary bloat. However, it also can
eliminate some opportunities for optimization should gcc otherwise
decide the function its self is a good candidate for inlining.
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/compiler-gcc4.h | 7 ++++++-
include/linux/compiler.h | 4 ++++
2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 5755e23..38fb81d 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -15,7 +15,12 @@
#if GCC_VERSION >= 40102
# define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-#endif
+
+/* flatten introduced in 4.1, but broken in 4.6.0 (gcc bug #48731)*/
+# if GCC_VERSION != 40600
+# define __flatten __attribute__((flatten))
+# endif
+#endif /* GCC_VERSION >= 40102 */
#if GCC_VERSION >= 40300
/* Mark functions as cold. gcc will assume any path leading to a call
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 4d9f353..b26d606 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -244,6 +244,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
#define __always_inline inline
#endif
+#ifndef __flatten
+#define __flatten
+#endif
+
#endif /* __KERNEL__ */
/*
--
1.7.3.4
__linktime_error() does the same thing as __compiletime_error() and is
only used in bug.h. Since the macro defines a function attribute that
will cause a failure at compile-time (not link-time), it makes more
sense to keep __compiletime_error(), which is also neatly mated with
__compiletime_warning().
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/compiler-gcc4.h | 2 --
include/linux/compiler.h | 3 ---
2 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 7ad60cd..5755e23 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -33,8 +33,6 @@
the kernel context */
#define __cold __attribute__((__cold__))
-#define __linktime_error(message) __attribute__((__error__(message)))
-
#ifndef __CHECKER__
# define __compiletime_warning(message) __attribute__((warning(message)))
# define __compiletime_error(message) __attribute__((error(message)))
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 923d093..4d9f353 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -293,9 +293,6 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
#ifndef __compiletime_error
# define __compiletime_error(message)
#endif
-#ifndef __linktime_error
-# define __linktime_error(message)
-#endif
/*
* Prevent the compiler from merging or refetching accesses. The compiler
* is also forbidden from reordering successive instances of ACCESS_ONCE(),
--
1.7.3.4
Using GCC_VERSION reduces complexity, is easier to read and is GCC's
recommended mechanism for doing version checks. (Just don't ask me why
they didn't define it in the first place.) This also makes it easy to
merge compiler-gcc{3,4}.h should somebody want to.
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/compiler-gcc3.h | 8 ++++----
include/linux/compiler-gcc4.h | 12 ++++++------
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/linux/compiler-gcc3.h b/include/linux/compiler-gcc3.h
index 37d4124..7d89feb 100644
--- a/include/linux/compiler-gcc3.h
+++ b/include/linux/compiler-gcc3.h
@@ -2,22 +2,22 @@
#error "Please don't include <linux/compiler-gcc3.h> directly, include <linux/compiler.h> instead."
#endif
-#if __GNUC_MINOR__ < 2
+#if GCC_VERSION < 30200
# error Sorry, your compiler is too old - please upgrade it.
#endif
-#if __GNUC_MINOR__ >= 3
+#if GCC_VERSION >= 30300
# define __used __attribute__((__used__))
#else
# define __used __attribute__((__unused__))
#endif
-#if __GNUC_MINOR__ >= 4
+#if GCC_VERSION >= 30400
#define __must_check __attribute__((warn_unused_result))
#endif
#ifdef CONFIG_GCOV_KERNEL
-# if __GNUC_MINOR__ < 4
+# if GCC_VERSION < 30400
# error "GCOV profiling support for gcc versions below 3.4 not included"
# endif /* __GNUC_MINOR__ */
#endif /* CONFIG_GCOV_KERNEL */
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index a334107..7ad60cd 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -4,7 +4,7 @@
/* GCC 4.1.[01] miscompiles __weak */
#ifdef __KERNEL__
-# if __GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL__ <= 1
+# if GCC_VERSION >= 40100 && GCC_VERSION <= 40101
# error Your version of gcc miscompiles the __weak directive
# endif
#endif
@@ -13,11 +13,11 @@
#define __must_check __attribute__((warn_unused_result))
#define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
-#if __GNUC_MINOR__ > 0
+#if GCC_VERSION >= 40102
# define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
#endif
-#if __GNUC_MINOR__ >= 3
+#if GCC_VERSION >= 40300
/* Mark functions as cold. gcc will assume any path leading to a call
to them will be unlikely. This means a lot of manual unlikely()s
are unnecessary now for any paths leading to the usual suspects
@@ -39,9 +39,9 @@
# define __compiletime_warning(message) __attribute__((warning(message)))
# define __compiletime_error(message) __attribute__((error(message)))
#endif /* __CHECKER__ */
-#endif /* __GNUC_MINOR__ >= 3 */
+#endif /* GCC_VERSION >= 40300 */
-#if __GNUC_MINOR__ >= 5
+#if GCC_VERSION >= 40500
/*
* Mark a position in code as unreachable. This can be used to
* suppress control flow warnings after asm blocks that transfer
@@ -56,5 +56,5 @@
/* Mark a function definition as prohibited from being cloned. */
#define __noclone __attribute__((__noclone__))
-#endif /* __GNUC_MINOR__ >= 5 */
+#endif /* GCC_VERSION >= 40500 */
--
1.7.3.4
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/bug.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/bug.h b/include/linux/bug.h
index aaac4bb..298a916 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -73,7 +73,7 @@ extern int __build_bug_on_failed;
#define BUILD_BUG() \
do { \
extern void __build_bug_failed(void) \
- __linktime_error("BUILD_BUG failed"); \
+ __compiletime_error("BUILD_BUG failed");\
__build_bug_failed(); \
} while (0)
--
1.7.3.4
Signed-off-by: Daniel Santos <[email protected]>
---
Documentation/DocBook/kernel-api.tmpl | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 00687ee..210e9aa 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -43,6 +43,10 @@
<sect1><title>Doubly Linked Lists</title>
!Iinclude/linux/list.h
</sect1>
+ <sect1><title>Red-Black Trees</title>
+!Iinclude/linux/rbtree.h
+!Ilib/rbtree.c
+ </sect1>
</chapter>
<chapter id="libc">
--
1.7.3.4
A very common use of __builtin_constant_p is to make sure that a certain
value is a compile time constant and generate a build-time error if it
is not. However, __builtin_constant_p is broken in a variety of ways in
various situations (on various versions of gcc) and never returns one in
an unoptimized build. This macro provide a mechanism to perform these
build-time checks, but not break unoptimized builds (or modules being
build with -O0), of which there probably aren't many people that care
anyway.
This patch documents all of the relevant quirks I could find in the
"Gory Details" section of the doc-comments. For almost all cases,
BUILD_BUG_ON_NON_CONST() should never fail on a primitive, non-pointer
type variable declared const. A subsequent patch provides a separate
macro for performing tests which are known to be broken in older
compilers (pretty much, using __builtin_constant_p on arrays, pointers &
structs as well as testing those values).
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/bug.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 48 insertions(+), 0 deletions(-)
diff --git a/include/linux/bug.h b/include/linux/bug.h
index c70b833..e30f600 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -81,6 +81,54 @@ struct pt_regs;
__build_bug_failed(); \
} while (0)
+/**
+ * BUILD_BUG_ON_NON_CONST - break compile if expression cannot be determined
+ * to be a compile-time constant.
+ * @exp: value to test for compile-time constness
+ *
+ * __builtin_constant_p() is a work in progress and is broken in various ways
+ * on various versions of gcc and optimization levels. It can fail, even when
+ * gcc otherwise determines that the expression is compile-time constant when
+ * performing actual optimizations and thus, compile out the value anyway. Do
+ * not use this macro for struct members or dereferenced pointers and arrays,
+ * as these are broken in many versions of gcc -- use BUILD_BUG_ON_NON_CONST42
+ * or another gcc-version-checked macro instead.
+ *
+ * As long as you are passing a variable declared const (and not modified),
+ * this macro should never fail (except for floats). For information on gcc's
+ * behavior in other cases, see below.
+ *
+ * Gory Details:
+ *
+ * Normal primitive variables
+ * - global non-static non-const values are never compile-time constants (but
+ * you should already know that)
+ * - all const values (global/local, non/static) should never fail this test
+ * (3.4+) with one exception (below)
+ * - floats (which we wont use anyway) are broken in various ways until 4.2
+ * (-O1 broken until 4.4)
+ * - local static non-const broken until 4.2 (-O1 broken until 4.3)
+ * - local non-static non-const broken until 4.0
+ *
+ * Dereferencing pointers & arrays
+ * - all static const derefs broken until 4.4 (except arrays at -O2 or better,
+ * which are fixed in 4.2)
+ * - global non-static const pointer derefs always fail (<=4.7)
+ * - local non-static const derefs broken until 4.3, except for array derefs
+ * to a zero value, which works from 4.0+
+ * - local static non-const pointers always fail (<=4.7)
+ * - local static non-const arrays broken until 4.4
+ * - local non-static non-const arrays broken until 4.0 (unless zero deref,
+ * works in 3.4+)
+
+ */
+#ifdef __OPTIMIZE__
+#define BUILD_BUG_ON_NON_CONST(exp) \
+ BUILD_BUG_ON(!__builtin_constant_p(exp))
+#else
+#define BUILD_BUG_ON_NON_CONST(exp)
+#endif
+
#endif /* __CHECKER__ */
#ifdef CONFIG_GENERIC_BUG
--
1.7.3.4
This helps to keep the file from getting confusing, removes one
duplicate version check and should encourage future editors to put new
macros where they belong.
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/compiler-gcc4.h | 20 +++++++++++---------
1 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 10ce4fa..a334107 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -13,6 +13,10 @@
#define __must_check __attribute__((warn_unused_result))
#define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
+#if __GNUC_MINOR__ > 0
+# define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
+#endif
+
#if __GNUC_MINOR__ >= 3
/* Mark functions as cold. gcc will assume any path leading to a call
to them will be unlikely. This means a lot of manual unlikely()s
@@ -31,6 +35,12 @@
#define __linktime_error(message) __attribute__((__error__(message)))
+#ifndef __CHECKER__
+# define __compiletime_warning(message) __attribute__((warning(message)))
+# define __compiletime_error(message) __attribute__((error(message)))
+#endif /* __CHECKER__ */
+#endif /* __GNUC_MINOR__ >= 3 */
+
#if __GNUC_MINOR__ >= 5
/*
* Mark a position in code as unreachable. This can be used to
@@ -46,13 +56,5 @@
/* Mark a function definition as prohibited from being cloned. */
#define __noclone __attribute__((__noclone__))
-#endif
-#endif
+#endif /* __GNUC_MINOR__ >= 5 */
-#if __GNUC_MINOR__ > 0
-#define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-#endif
-#if __GNUC_MINOR__ >= 3 && !defined(__CHECKER__)
-#define __compiletime_warning(message) __attribute__((warning(message)))
-#define __compiletime_error(message) __attribute__((error(message)))
-#endif
--
1.7.3.4
Signed-off-by: Daniel Santos <[email protected]>
---
kernel/sched/fair.c | 82 +++++++++++++++++----------------------------------
1 files changed, 27 insertions(+), 55 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c099cc6..ac12e303 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -447,6 +447,20 @@ static inline int entity_before(struct sched_entity *a,
return (s64)(a->vruntime - b->vruntime) < 0;
}
+static inline long compare_vruntime(u64 *a, u64 *b)
+{
+#if __BITS_PER_LONG >= 64
+ return (long)((s64)*a - (s64)*b);
+#else
+/* This is hacky, but is done to reduce instructions -- we wont use this for
+ * rbtree lookups, only inserts, and since our relationship is defined as
+ * non-unique, we only need to return positive if a > b and any other value
+ * means less than.
+ */
+ return (long)(*a > *b);
+#endif
+}
+
static void update_min_vruntime(struct cfs_rq *cfs_rq)
{
u64 vruntime = cfs_rq->min_vruntime;
@@ -472,56 +486,14 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
#endif
}
-/*
- * Enqueue an entity into the rb-tree:
- */
-static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
- struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
- struct rb_node *parent = NULL;
- struct sched_entity *entry;
- int leftmost = 1;
-
- /*
- * Find the right place in the rbtree:
- */
- while (*link) {
- parent = *link;
- entry = rb_entry(parent, struct sched_entity, run_node);
- /*
- * We dont care about collisions. Nodes with
- * the same key stay together.
- */
- if (entity_before(se, entry)) {
- link = &parent->rb_left;
- } else {
- link = &parent->rb_right;
- leftmost = 0;
- }
- }
-
- /*
- * Maintain a cache of leftmost tree entries (it is frequently
- * used):
- */
- if (leftmost)
- cfs_rq->rb_leftmost = &se->run_node;
-
- rb_link_node(&se->run_node, parent, link);
- rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
-}
-
-static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
- if (cfs_rq->rb_leftmost == &se->run_node) {
- struct rb_node *next_node;
-
- next_node = rb_next(&se->run_node);
- cfs_rq->rb_leftmost = next_node;
- }
-
- rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
-}
+RB_DEFINE_INTERFACE(
+ fair_tree,
+ struct cfs_rq, tasks_timeline, rb_leftmost, /* no right or count */, ,
+ struct sched_entity, run_node, vruntime,
+ 0, compare_vruntime, /* no augment */,
+ /* find unused */ ,
+ static __flatten, /* let gcc decide rather or not to inline insert */
+ /* find_near unused */, /* insert_near unused */)
struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
{
@@ -1108,7 +1080,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
update_stats_enqueue(cfs_rq, se);
check_spread(cfs_rq, se);
if (se != cfs_rq->curr)
- __enqueue_entity(cfs_rq, se);
+ fair_tree_insert(cfs_rq, se);
se->on_rq = 1;
if (cfs_rq->nr_running == 1) {
@@ -1189,7 +1161,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
clear_buddies(cfs_rq, se);
if (se != cfs_rq->curr)
- __dequeue_entity(cfs_rq, se);
+ fair_tree_remove(cfs_rq, se);
se->on_rq = 0;
update_cfs_load(cfs_rq, 0);
account_entity_dequeue(cfs_rq, se);
@@ -1260,7 +1232,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
* runqueue.
*/
update_stats_wait_end(cfs_rq, se);
- __dequeue_entity(cfs_rq, se);
+ fair_tree_remove(cfs_rq, se);
}
update_stats_curr_start(cfs_rq, se);
@@ -1339,7 +1311,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
if (prev->on_rq) {
update_stats_wait_start(cfs_rq, prev);
/* Put 'current' back into the tree. */
- __enqueue_entity(cfs_rq, prev);
+ fair_tree_insert(cfs_rq, prev);
}
cfs_rq->curr = NULL;
}
@@ -3593,7 +3565,7 @@ void update_group_power(struct sched_domain *sd, int cpu)
/*
* !SD_OVERLAP domains can assume that child groups
* span the current group.
- */
+ */
group = child->groups;
do {
--
1.7.3.4
__attribute__((error(msg))) was introduced in gcc 4.3 (not 4.4) and as I
was unable to find any gcc bugs pertaining to it, I'm presuming that it
has functioned as advertised since 4.3.0.
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/compiler-gcc4.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 2f40791..10ce4fa 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -52,7 +52,7 @@
#if __GNUC_MINOR__ > 0
#define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
#endif
-#if __GNUC_MINOR__ >= 4 && !defined(__CHECKER__)
+#if __GNUC_MINOR__ >= 3 && !defined(__CHECKER__)
#define __compiletime_warning(message) __attribute__((warning(message)))
#define __compiletime_error(message) __attribute__((error(message)))
#endif
--
1.7.3.4
Throughout compiler*.h, many version checks are made. These can be
simplified by using the macro that gcc's documentation recommends.
However, my primary reason for adding this is that I need bug-check
macros that are enabled at certain gcc versions and it's cleaner to use
this macro than the tradition method:
if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ => 2)
If you add patch level, it gets this ugly:
if __GNUC__ > 4 || (__GNUC__ == 4 && (__GNUC_MINOR__ > 2 || \
__GNUC_MINOR__ == 2 __GNUC_PATCHLEVEL__ >= 1))
As opposed to:
if GCC_VERSION >= 40201
While having separate headers for gcc 3 & 4 eliminates some of this
verbosity, they can still be cleaned up by this.
See also:
http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
Signed-off-by: Daniel Santos <[email protected]>
---
include/linux/compiler-gcc.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index e5834aa..7970e31 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -5,6 +5,9 @@
/*
* Common definitions for all gcc versions go here.
*/
+#define GCC_VERSION (__GNUC__ * 10000 \
+ + __GNUC_MINOR__ * 100 \
+ + __GNUC_PATCHLEVEL__)
/* Optimization barrier */
--
1.7.3.4
On 06/22/2012 11:00 PM, Daniel Santos wrote:
> Theory of Operation
> ===================
> Historically, genericity in C meant function pointers, the overhead of a
> function call and the inability of the compiler to optimize code across
> the function call boundary. GCC has been getting better and better at
> optimization and determining when a value is a compile-time constant and
> compiling it out. As of gcc 4.6, it has finally reached a point where
> it's possible to have generic search & insert cores that optimize
> exactly as well as if they were hand-coded. (see also gcc man page:
> -findirect-inlining)
For those of us who stopped upgrading gcc when it went to a non-open
license, and the people trying to escape to llvm/pcc/open64/tcc/qcc/etc
and build the kernel with that, this will simply be "less optimized"
rather than "you're SOL, hail stallman"?
> Layer 2: Type-Safety
> --------------------
> In order to achieve type-safety of a generic interface in C, we must
> delve deep into the darkened Swamps of The Preprocessor and confront the
> Prince of Darkness himself: Big Ugly Macro. To be fair, there is an
> alternative solution (discussed in History & Design Goals), the
> so-called "x-macro" or "supermacro" where you #define some pre-processor
> values and include an unguarded header file. With 17 parameters, I
> choose this solution for its ease of use and brevity, but it's an area
> worth debate.
Because this is just _filling_ me with confidence about portability and
c99 compliance.
(Or I suppose C11!!one! compliance. The new thing that puts asserts in
the base language and makes u8 a keyword since _that_ won't break
existing code and putting utf8 string constants within quotes wasn't
previously possible.)
I'm not saying the standard's perfect, I'm saying a web page that ties
itself to mozilla at the expense of working on firefox, let alone
chrome, might be a bit short-sighted these days. XFree86 begat x.org,
OpenOffice begat libre, etc. The FSF went nuts again and this time
around EGCS is called LLVM, so talking about gcc 4.6-only features
thrills some of us less than you might expect.
I suppose sparse has to be able to cope with this, so that's something...
> To avoid needing multiple versions of the macro, we use a paradigm
Indeed.
I still have trouble remembering how trampolines work when I wander away
for a while. Oh well...
Rob
--
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation. Pick one.
First off, thanks for your lively commentary!
On 06/23/2012 06:01 PM, Rob Landley wrote:
> On 06/22/2012 11:00 PM, Daniel Santos wrote:
>> Theory of Operation
>> ===================
>> Historically, genericity in C meant function pointers, the overhead of a
>> function call and the inability of the compiler to optimize code across
>> the function call boundary. GCC has been getting better and better at
>> optimization and determining when a value is a compile-time constant and
>> compiling it out. As of gcc 4.6, it has finally reached a point where
>> it's possible to have generic search & insert cores that optimize
>> exactly as well as if they were hand-coded. (see also gcc man page:
>> -findirect-inlining)
> For those of us who stopped upgrading gcc when it went to a non-open
> license, and the people trying to escape to llvm/pcc/open64/tcc/qcc/etc
> and build the kernel with that, this will simply be "less optimized"
> rather than "you're SOL, hail stallman"?
Forgive me, but I'm a little stumped here. When did GCC move to a
"non-open" license?
Either way, yes, one thing that must be considered with this is that
code compiled prior to gcc 4.6 is indeed "less optimized." For gcc 4.2
through 4.5, the difference is very minor (the compare function is not
inlined). From there back, it starts to grow. In gcc 3.4.6, it's
fairly ugly.
I haven't performed tests on other compilers yet, but I wouldn't be
surprised if llvm/clang does this better. My next phase is to build a
test module so we can quickly get performance data on various platforms
& compilers. Either way, an alternate compiler will have to support at
least some of gcc's extensions to compile Linux.
Another question that has to be asked is "Is Linux ready for this?" The
answer today may be "yes" or "no", but to say that it can never be ready
is pure folly. Eventually, you have to decide to stop supporting older
compilers and move to newer paradigms that lower the maintenance burden,
enabling your developers to work on more important things. C++ offers
lots of these types of paradigms, but the compilers just aren't good
enough yet (of course, there's the laundry list of reasons to not use C++).
>
>> Layer 2: Type-Safety
>> --------------------
>> In order to achieve type-safety of a generic interface in C, we must
>> delve deep into the darkened Swamps of The Preprocessor and confront the
>> Prince of Darkness himself: Big Ugly Macro. To be fair, there is an
>> alternative solution (discussed in History & Design Goals), the
>> so-called "x-macro" or "supermacro" where you #define some pre-processor
>> values and include an unguarded header file. With 17 parameters, I
>> choose this solution for its ease of use and brevity, but it's an area
>> worth debate.
> Because this is just _filling_ me with confidence about portability and
> c99 compliance.
This is actually C99 compliant, even though it wont work on all
compilers. Somebody tested it on msvc and told me that it broke
(*faints*). The "iffy" macro is IFF_EMPTY(), which uses a similar
mechanism to kconfig.h's IS_ENABLED() macro. I kinda doubt that one
would break and not the other. It's fairly easy to test though.
Aside from that, is the size of the macro. After stripping comments,
the size is roughly 4155 bytes, indeed beyond the C99's minimum required
size of 4096 bytes (even though most compilers use a much larger
buffer). So that's another issue that can break on other compilers,
unless it stores the macro with whitespace condensed, which would bring
it down to 3416 bytes.
> (Or I suppose C11!!one! compliance. The new thing that puts asserts in
> the base language and makes u8 a keyword since _that_ won't break
> existing code and putting utf8 string constants within quotes wasn't
> previously possible.)
>
> I'm not saying the standard's perfect, I'm saying a web page that ties
> itself to mozilla at the expense of working on firefox, let alone
> chrome, might be a bit short-sighted these days. XFree86 begat x.org,
> OpenOffice begat libre, etc. The FSF went nuts again and this time
> around EGCS is called LLVM, so talking about gcc 4.6-only features
> thrills some of us less than you might expect.
It's really just a "not broken on gcc 4.6", since -findirect-inlining
was introduced in gcc 4.4 or some such.
Thanks for your feedback!
Daniel
On 06/23/2012 07:40 PM, Daniel Santos wrote:
> First off, thanks for your lively commentary!
>
> On 06/23/2012 06:01 PM, Rob Landley wrote:
>> On 06/22/2012 11:00 PM, Daniel Santos wrote:
>>> Theory of Operation
>>> ===================
>>> Historically, genericity in C meant function pointers, the overhead of a
>>> function call and the inability of the compiler to optimize code across
>>> the function call boundary. GCC has been getting better and better at
>>> optimization and determining when a value is a compile-time constant and
>>> compiling it out. As of gcc 4.6, it has finally reached a point where
>>> it's possible to have generic search & insert cores that optimize
>>> exactly as well as if they were hand-coded. (see also gcc man page:
>>> -findirect-inlining)
>> For those of us who stopped upgrading gcc when it went to a non-open
>> license, and the people trying to escape to llvm/pcc/open64/tcc/qcc/etc
>> and build the kernel with that, this will simply be "less optimized"
>> rather than "you're SOL, hail stallman"?
> Forgive me, but I'm a little stumped here. When did GCC move to a
> "non-open" license?
About when it decided anti-tivoization was within its mandate, so
delivering the modified code (satisfying "open source") was no longer
good enough for the FSF, so they added additional restrictions on how
you're allowed to install it on devices.
Can 'o worms. The list has been over this at great length.
> Either way, yes, one thing that must be considered with this is that
> code compiled prior to gcc 4.6 is indeed "less optimized." For gcc 4.2
> through 4.5, the difference is very minor (the compare function is not
> inlined). From there back, it starts to grow. In gcc 3.4.6, it's
> fairly ugly.
Less optimized but still works with old compilers is fine. (Less
optimized comes with the territory with old compilers.)
> I haven't performed tests on other compilers yet, but I wouldn't be
> surprised if llvm/clang does this better. My next phase is to build a
> test module so we can quickly get performance data on various platforms
> & compilers. Either way, an alternate compiler will have to support at
> least some of gcc's extensions to compile Linux.
Or patch them out, like tccboot did. Or incorporate them into new
standards, as c99 did. Or submit patches to the kernel ala the c99
structure initializers...
And who's talking "if"?
LLVM/CLANG builds a bootabe linux kernel:
http://lwn.net/Articles/441018/
Open64 builds a bootable linux kernel:
http://article.gmane.org/gmane.comp.compilers.open64.devel/2498
The PCC guys make an adorable token effort which is largely ignored:
http://bsdfund.org/bundle/
Heck, Fabrice bellard did it himself with tinycc back in 2004:
http://bellard.org/tcc/tccboot.html
(Yes, that was a modified subset of an obsolete version, and then a
little side project called "qemu" started taking up all his time, so the
tinycc project stagnated in 2005. But it _was_ the first non-gcc
compiler to pull this off, not counting Intel's closed-source ICC. And
reviving tinycc and gluing qemu's tiny code generator onto it as a
back-end so I don't have to handle target support myself is on my todo
list: http://landley.net/notes-2012.html#14-06-2012 )
LLVM is the leader this space because Apple's funding it, but I really
can't get that enthused about a C compiler written in C++. Of course
gcc's been moving that way since 2008 (http://lwn.net/Articles/286539/)
so LLVM isn't actually worse there. Highly questionable wither either is
really _progress_, though.
> Another question that has to be asked is "Is Linux ready for this?" The
> answer today may be "yes" or "no", but to say that it can never be ready
> is pure folly.
I'm saying that adding complexity is not necessarily an improvement, and
that over-optimizing at the expense of portability may turn out to be a
mistake in the long run.
That said, reducing code duplication is a good thing. I'm just not
convinced "how well gcc 4.6 optimizes this" is the only relevant test
criteria here.
> Eventually, you have to decide to stop supporting older
> compilers and move to newer paradigms that lower the maintenance burden,
You keep using the word "paradigm". Voluntarily. I find this odd.
> enabling your developers to work on more important things. C++ offers
> lots of these types of paradigms, but the compilers just aren't good
> enough yet (of course, there's the laundry list of reasons to not use C++).
Um, yes. Yes there is:
http://landley.net/notes-2011.html#16-03-2011
http://landley.net/notes-2011.html#19-03-2011
http://landley.net/notes-2011.html#20-03-2011
>>> Layer 2: Type-Safety
>>> --------------------
>>> In order to achieve type-safety of a generic interface in C, we must
>>> delve deep into the darkened Swamps of The Preprocessor and confront the
>>> Prince of Darkness himself: Big Ugly Macro. To be fair, there is an
>>> alternative solution (discussed in History & Design Goals), the
>>> so-called "x-macro" or "supermacro" where you #define some pre-processor
>>> values and include an unguarded header file. With 17 parameters, I
>>> choose this solution for its ease of use and brevity, but it's an area
>>> worth debate.
>> Because this is just _filling_ me with confidence about portability and
>> c99 compliance.
> This is actually C99 compliant, even though it wont work on all
> compilers. Somebody tested it on msvc and told me that it broke
> (*faints*).
LLVM/CLANG is bsd licensed (alas, with advertising clause, but code from
it might actually be incorporatable into the linux kernel, unlke gcc,
which is under a license extensively incompatible with the linux kernel).
Open64 is GPLv2, same license as the linux kernel.
The old tcc code was LGPLv2 but I have email from Fabrice saying he's ok
with his code being used under BSD, and I'm working on the clearances
and/or cleaning out the old code from people other than him to do 2
clause BSD on that. (Just for kicks at the moment. This became harder
when the tinycc website bit-rotted so badly 3 years after its last
release that the mailing list archive went away, but I'm dinking away at
it regardless...)
http://pcc.ludd.ltu.se/ is bsd licensed. It seems llvm has taken a bit
of the wind out of its' sails (having Apple devote multiple full-time
engineers will do that), but development is still chugging along.
Somebody glued Linus's Sparse to the LLVM backend last year:
http://lwn.net/Articles/456709/
Rather a large number of people reacted to gcc going GPLv3 the same way
they reacted to David Dawes changing the XFree86 license. They tend to
be quiet about it because the FSF guys scream bloody murder if they find
out (a heretic being worse than an atheist, as always). But the "shut up
and show me the code" aspects have been chugging along steadily.
They all seem to have decided to start over from scratch instead of
doing another egcs style fork like x.org did in response to that
relicensing. Possibly this is because the one thing GPLv3 _did_ manage
to do was undermine GPLv2 to the point that lots of people (myself
included) bit the bullet and started releasing BSD code.
(Now when you hear "the code is GPL", you can't tell from that whether
or not you can incorporate it in a given project. I struggled with that
for years before deciding it was untenable in the long run. Copyleft
only seems to work when there's a single unified pool whose network
effects made a single license a category killer, which GPLv2 was but
GPLv3 isn't. Sun tried splitting the community with CDDL, but it took
the FSF to have the leverage to truly poison the GPL with a CDDL2 six
years _after_ Linus announced he wouldn't switch to such a thing.
Section 6 of http://www.dwheeler.com/essays/gpl-compatible.html is
downright ironic in retrospect.)
Anyway, as I said: can 'o worms. You're welcome to disagree with my
assesment, but I'm not alone in it. I'm just usually open about it. Most
people who've lost faith in the GPL stay quiet about it, and several
mouth the words when everybody else sings from the hymbook to avoid the
inquisition.
(And yes, the transcript of
http://sf.geekitude.com/content/pros-and-cons-gnu-general-public-license-linucon-2005
is also retroactively ironic. Possibly in the Alanis Morisette sense of
"actually just unfortunate".)
> The "iffy" macro is IFF_EMPTY(), which uses a similar
> mechanism to kconfig.h's IS_ENABLED() macro. I kinda doubt that one
> would break and not the other. It's fairly easy to test though.
Yeah, people kept emailing me that because I implemented ENABLE macros
back in 2006 for BusyBox:
http://git.busybox.net/busybox/commit/?id=7bfa88f315d71c7f7a1b76fcec3886c7506aca24
At the time I played around with various preprocessor macros but wasn't
happy with the readability of any of it, so I modified kconfig and threw
some sed in the makefile instead. Alas, when I tried to push some of the
config infrastructure upstream, the kconfig maintainer at the time threw
up all over it, and it devolved into bikeshedding, and I lost interest
(as usual when that happens):
http://lkml.indiana.edu/hypermail/linux/kernel/0707.1/1741.html
(Also, if you typo a macro name then "build break" instead of "silently
always false" is a _feature_, not a bug. Macros that convert #ifdef
CONFIG_BLAH to if (ENABLE_BLAH) tend to go the second way because that's
the input they work with. Yeah, code review will catch it eventually,
but...)
So yay standard C99 micro stuff. I'm a little worried about being able
to unwrap it all and understand what's going on, but unlike templates we
have cc -E to fall back on here, so ok.
It's when you say "won't work on all compilers" that I start to worry.
C99 is a fairly reasonable standard. My understanding is that our
divergences from it are mostly historical (the kernel predates C99).
> Aside from that, is the size of the macro. After stripping comments,
> the size is roughly 4155 bytes, indeed beyond the C99's minimum required
> size of 4096 bytes (even though most compilers use a much larger
> buffer).
Eh, I'm not worried about that. That _is_ a "fix your compiler" issue.
(Hardwired buffer sizes in 2012... tsk.)
> So that's another issue that can break on other compilers,
> unless it stores the macro with whitespace condensed, which would bring
> it down to 3416 bytes.
>> (Or I suppose C11!!one! compliance. The new thing that puts asserts in
>> the base language and makes u8 a keyword since _that_ won't break
>> existing code and putting utf8 string constants within quotes wasn't
>> previously possible.)
>>
>> I'm not saying the standard's perfect, I'm saying a web page that ties
>> itself to mozilla at the expense of working on firefox, let alone
>> chrome, might be a bit short-sighted these days. XFree86 begat x.org,
>> OpenOffice begat libre, etc. The FSF went nuts again and this time
>> around EGCS is called LLVM, so talking about gcc 4.6-only features
>> thrills some of us less than you might expect.
> It's really just a "not broken on gcc 4.6", since -findirect-inlining
> was introduced in gcc 4.4 or some such.
Ok, so "designed with gcc 4.6's optimizer in mind, regression tested
back to 3.4 for at least minimal functionality, and not intentionally
breaking other compilers".
That's what I wanted to know, and it sounds good to me.
Rob
--
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation. Pick one.
Hello Rob,
I follow-up Daniel's RB series from Jonathan Corbet's
summarry on LWN http://lwn.net/Articles/500355/ .
I have attempted ourself to solve implementation of
type safe tree containers for C more than 10 years ago
(GAVL) and it is used in many of my own and our university
projects and even in (at least three) companies for
proprietary projects.
I have done multiple reviews of the Daniel's RB series
and I am confident that his approach is more clean
and generic than mine.
I would be happy if it can get into Linux kernel,
because it makes use of RB trees much simpler (user does
not need to care about details) and with much better
potential to catch user errors.
According to my review, there is only one possibly problematic
preprocessor construct. The Linux kernel is already dependent
on this CPP behavior (IS_ENABLED macro) anyway.
If you are fan of other compilers, please, provide
constructive help and debate there and test attached
C code on compilers which you use and consider
them as candidates to compile linux kernel.
According to my test, there is problem with MSVC
(free WDF version run under WINE to deliver our company
open-source projects to unfortunate users using Windows).
Version
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.207 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
But that is compiler which breaks many standards
and if GCC is not enough open to you then I expect that
you do not mind if Linux kernel cannot be compiled by MSVC.
So I do not see that as a problem.
So again, please, test LLVM or other your beloved
compilers and report if there is a problem.
It would worth to have even user space version of the
kernel RB tree code with Dainie's interface to run feasibility
and performance tests in more environments quickly
(as Daniel mentioned already). But it is not necessary
for me to be convinced that provided changes should go into
kernel because I am convinced already and offer my
"Reviewed-by" confirmation.
It would even worth to have this code available for
user space projects and libraries use. But there is a problem
that core part of the Linux RB code is GPL licensed
and for library use LGPL, MPL or GPL with linking exception
is usually required.
Best wishes,
Pavel Pisa
e-mail: [email protected]
www: http://cmp.felk.cvut.cz/~pisa
university: http://dce.fel.cvut.cz/
company: http://www.pikron.com/
#include <stdio.h>
#define IFF_EMPTY(test, t, f) _iff_empty(__JUNK##test, t, f)
#define __JUNK junk ,
#define _iff_empty(test_or_junk, t, f) __iff_empty(test_or_junk, t, f)
#define __iff_empty(__ignored1, __ignored2, result, ...) result
#define IS_EMPTY(arg) IFF_EMPTY(arg, 1, 0)
#define TESTER_empty
#define TESTER_filled filled
int main(int argc, char *argv[])
{
printf("IS_EMPTY for empty %d (should be 1)\n", IS_EMPTY(TESTER_empty));
printf("IS_EMPTY for filled %d (should be 0)\n", IS_EMPTY(TESTER_filled));
printf("IS_EMPTY for unknown %d (should be 0)\n", IS_EMPTY(TESTER_unknown));
printf("IS_EMPTY for nil %d (should be 1)\n", IS_EMPTY());
printf("IS_EMPTY for space %d (should be 1)\n", IS_EMPTY( ));
printf("IS_EMPTY for comment %d (should be 1)\n", IS_EMPTY(/*test*/));
}
> > Forgive me, but I'm a little stumped here. When did GCC move to a
> > "non-open" license?
It didn't
> About when it decided anti-tivoization was within its mandate, so
A lot of consider the GPLv2 license has the same requirement
> That said, reducing code duplication is a good thing. I'm just not
> convinced "how well gcc 4.6 optimizes this" is the only relevant test
> criteria here.
It is for the other 99.999999999% of users.
> LLVM/CLANG is bsd licensed (alas, with advertising clause, but code from
> it might actually be incorporatable into the linux kernel, unlke gcc,
> which is under a license extensively incompatible with the linux kernel).
Advertising clause BSD is not GPL compatible.
> Ok, so "designed with gcc 4.6's optimizer in mind, regression tested
> back to 3.4 for at least minimal functionality, and not intentionally
> breaking other compilers".
If you want it to work with weird cornercase compilers its for your
benefit so it's your problem and you do the work. Thats how free software
works. At the point lots of people starting using your compiler it might
matter more and if everyone starts using your compiler then it'll end up
being the primary optimisation target.
Alan
On 06/24/2012 02:57 AM, Pavel Pisa wrote:
> So again, please, test LLVM or other your beloved
> compilers and report if there is a problem.
He already addressed my concerns.
Rob
--
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation. Pick one.
Rob,
Thank you for the education. I wasn't up on the "anti-tivoization"
issue, but after reading up I would have to say that I'm deeply in that
camp; if I buy the hardware, I should be able to do what I damn well
please with it -- why I haven't purchased an iphone for example. At
the same time, I also believe in GPL w/link-time exception if the
situation warrants it. But this is a discussion for another time & place.
On 06/23/2012 11:39 PM, Rob Landley wrote:
> Less optimized but still works with old compilers is fine. (Less
> optimized comes with the territory with old compilers.)
This was of particular concern to be because of how much less
optimized. For example, an inline function that expanded to about 128
bytes on gcc 4.6 expanded to 500-ish bytes on gcc 3.4.6. I guess I
should reserve any further comment until I have hard performance numbers.
> And who's talking "if"?
>
> LLVM/CLANG builds a bootabe linux kernel:
> http://lwn.net/Articles/441018/
>
> Open64 builds a bootable linux kernel:
> http://article.gmane.org/gmane.comp.compilers.open64.devel/2498
>
> The PCC guys make an adorable token effort which is largely ignored:
> http://bsdfund.org/bundle/
>
> Heck, Fabrice bellard did it himself with tinycc back in 2004:
> http://bellard.org/tcc/tccboot.html
Frankly, I'm glad to hear about these endeavors. I'm not a fan of Apple
because of their business policies, but I'm a huge fan of run-time
optimization being a hard-core Java programmer (and yes, I admit it :).
I did check out tccboot at one point out of interest.
>> Another question that has to be asked is "Is Linux ready for this?" The
>> answer today may be "yes" or "no", but to say that it can never be ready
>> is pure folly.
>
> I'm saying that adding complexity is not necessarily an improvement, and
> that over-optimizing at the expense of portability may turn out to be a
> mistake in the long run.
Any time you want to add complexity, it must be justified. However, I'm
operating by the compiler compatibility rules adopted by the Linux
kernel project and putting portability (to non-gcc compilers) as a lower
concern than the 99% use case.
> You keep using the word "paradigm". Voluntarily. I find this odd.
I don't tend to back away from a word just because PHBs (Pointy-Haired
Bosses) like to abuse them. I'll even use the word "synergistic" when I
think it applies. What I'm proposing introduces a few new paradigms,
patterns or schools of thought(and I prefer to pronounce it
"pair-uh-dig-uh-me", just because I can).
For one, I'm asking people to use a macro where you leave parameters
empty as a sort of "default" value. This isn't standard and if I'm
going to ask somebody(especially lots of people) to do something
non-standard I want to make sure I've explored it thoroughly, eliminated
all other options and that there's a really good reason for doing so.
If I can prove that there's a good reason for doing it, that there are
no other reasonable options, that it has a benefit and is reusable, then
we're using a new paradigm and we have to start thinking about the
interface differently than we're used to thinking about interfaces.
Many consider this out-right macro abuse -- that is a valid school of
thought. I'm introducing this as an alternate school of thought.
But the other paradigm that seems to be new is using constant function
pointers for a sort-of "type-injection" for generic inline functions.
We usually think about function pointers as overhead. But since gcc's
-findirect-inlining feature has matured, it creates the opportunity for
a new way of thinking about the use of function pointers (when they are
constant) for generic code. I prefer the term "paradigm" when mechanism
requires a distinct way of thinking about the mechanism and its
components and can be re-used as a pattern.
> So yay standard C99 micro stuff. I'm a little worried about being able
> to unwrap it all and understand what's going on, but unlike templates
> we have cc -E to fall back on here, so ok.
I have (or had) a script somewhere around here to parse that huge
one-line macro out and give it some reasonable linebreaks & indentation
(feeding it gcc -E). Developing complex pre-processor stuff is a super
pain, so I just want to make sure that using it is as painless as possible.
> Ok, so "designed with gcc 4.6's optimizer in mind, regression tested back to 3.4 for at least minimal functionality, and not intentionally breaking other compilers".
>
> That's what I wanted to know, and it sounds good to me.
Yes. I wouldn't think it appropriate to propose the patch set if it
degraded performance with a fairly recent compiler (stable). Although,
gcc 4.6.3 still isn't considered "stable" on Gentoo because there's
still a few (10-ish) broken packages blocking it.
Daniel
> It would worth to have even user space version of the
> kernel RB tree code with Dainie's interface to run feasibility
> and performance tests in more environments quickly
> (as Daniel mentioned already). But it is not necessary
> for me to be convinced that provided changes should go into
> kernel because I am convinced already and offer my
> "Reviewed-by" confirmation.
Thanks Pavel!
As far as user-space testing, I have a simple test program that
compilers in user-space using a hacked-up system include directory and
compiles the kernel code unmodified. I'll be sure and write the test
code to work both in-kernel and in user space, although working out the
portability of the build will be a bit of a challenge.
Daniel
On 12-06-23 12:00 AM, Daniel Santos wrote:
> Signed-off-by: Daniel Santos <[email protected]>
> ---
> include/linux/bug.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/bug.h b/include/linux/bug.h
> index aaac4bb..298a916 100644
> --- a/include/linux/bug.h
> +++ b/include/linux/bug.h
> @@ -73,7 +73,7 @@ extern int __build_bug_on_failed;
> #define BUILD_BUG() \
> do { \
> extern void __build_bug_failed(void) \
> - __linktime_error("BUILD_BUG failed"); \
> + __compiletime_error("BUILD_BUG failed");\
At a quick glance of the bug.h parts, I would think you need
this commit _before_ #5 (that deleted __linktime_error) otherwise
you'll have introduced a bisection build failure. Or, alternatively
you could combine #5 and #6 since they are clearly related, and
their separation is more of a per-file CVS mentality than it is of
any existence of distinct and separate/unrelated changesets.
P.
> __build_bug_failed(); \
> } while (0)
>
On 06/25/2012 01:16 PM, Paul Gortmaker wrote:
>
> At a quick glance of the bug.h parts, I would think you need
> this commit _before_ #5 (that deleted __linktime_error) otherwise
> you'll have introduced a bisection build failure. Or, alternatively
> you could combine #5 and #6 since they are clearly related, and
> their separation is more of a per-file CVS mentality than it is of
> any existence of distinct and separate/unrelated changesets.
>
> P.
Thanks, will do.
And after I thought about this more, I realized that both
__build_bug_failed and __build_bug_on_failed could just be declared
globally rather than being part of the macro. It may not be that big of
a deal, but it would reduce the size of pre-processed files at least
(something I look at a lot working with this patch set). But I'll let
you make the final call on that one
Oh, and as it turns out, adding the string-ized condition in the
BUILD_BUG_ON macro is useless (actually confusing) since gcc takes the
attributes of the first occurrence of an externed function in a
translation unit. Thus, the following code:
#include <linux/bug.h>
void func(void)
{
const int a = 0;
BUILD_BUG_ON(a == 1);
BUILD_BUG_ON(1) ;
}
would result in the error message:
call to ?__build_bug_on_failed? declared with attribute error:
BUILD_BUG_ON failed: a == 1
At least the line number is correct however. So my "declare a function
multiple times with with differing attributes" turns out to not work right.
On Fri, 2012-06-22 at 23:00 -0500, Daniel Santos wrote:
> +static inline long compare_vruntime(u64 *a, u64 *b)
> +{
> +#if __BITS_PER_LONG >= 64
> + return (long)((s64)*a - (s64)*b);
> +#else
> +/* This is hacky, but is done to reduce instructions -- we wont use this for
> + * rbtree lookups, only inserts, and since our relationship is defined as
> + * non-unique, we only need to return positive if a > b and any other value
> + * means less than.
> + */
> + return (long)(*a > *b);
> +#endif
> +}
That's wrong.. suppose: a = 10, b = ULLONG_MAX - 10
In that case (s64)(a - b) = 20, however a > b is false.
And yes, vruntime wrap does happen.
On 06/26/2012 07:15 AM, Peter Zijlstra wrote:
> On Fri, 2012-06-22 at 23:00 -0500, Daniel Santos wrote:
>> +static inline long compare_vruntime(u64 *a, u64 *b)
>> +{
>> +#if __BITS_PER_LONG >= 64
>> + return (long)((s64)*a - (s64)*b);
>> +#else
>> +/* This is hacky, but is done to reduce instructions -- we wont use this for
>> + * rbtree lookups, only inserts, and since our relationship is defined as
>> + * non-unique, we only need to return positive if a > b and any other value
>> + * means less than.
>> + */
>> + return (long)(*a > *b);
>> +#endif
>> +}
> That's wrong.. suppose: a = 10, b = ULLONG_MAX - 10
>
> In that case (s64)(a - b) = 20, however a > b is false.
>
> And yes, vruntime wrap does happen.
Oh, I see now! (looking at entity_before)
static inline int entity_before(struct sched_entity *a,
struct sched_entity *b)
{
return (s64)(a->vruntime - b->vruntime) < 0;
}
Do the subtraction unsigned, then evaluate the result as signed. Thank
you very much, I'll fix that.
Also, to address why we're not using entity_before (or a less()
function) directly, there's two main reasons (one that doesn't even
affect CFS). The first reason is that an "is equal" evaluation would
also be required for insertions in trees with unique keys, as well as
all lookups. This doesn't doesn't affect CFS because it isn't doing
lookups (it only cares about leftmost) and duplicate keys are allowed.
The second is that the compare function is only evaluated once by just
returning a diff. This *would* have an better performance benefit on
x86 if only gcc were willing to do the cmp or sub operation and then use
the CPU zero & negative flags to branch. Instead, it seems to like to
do a sub (to subtract the values) and then cmp the result with zero.
This is only once extra instruction in this case, but when you need to
use the (a > b ? 1 : (a < b ? -1 : 0)) construct, it's worse. Off
topic, but something I wanted to mention in light of my "this is hacky"
section.
I guess I just need "get off of my duff", put together a succinct test
case and file a gcc bug report for this.
Daniel
On Tue, 2012-06-26 at 16:59 -0500, Daniel Santos wrote:
> Also, to address why we're not using entity_before (or a less()
> function) directly, there's two main reasons (one that doesn't even
> affect CFS). The first reason is that an "is equal" evaluation would
> also be required for insertions in trees with unique keys,
Right, I saw that feature in your code, are there any such trees in the
kernel? It would be good to have a kernel code conversion for each
variant your code supports.
> as well as
> all lookups. This doesn't doesn't affect CFS because it isn't doing
> lookups (it only cares about leftmost) and duplicate keys are allowed.
>
Right.. there's that.
Too bad, partial order is slightly cheaper at times. A well, nothing for
it.
On Fri, 2012-06-22 at 23:00 -0500, Daniel Santos wrote:
> +typedef long (*rb_compare_f)(const void *a, const void *b);
> +struct rb_relationship {
> + ssize_t root_offset;
> + ssize_t left_offset;
> + ssize_t right_offset;
> + ssize_t count_offset;
> + ssize_t node_offset;
> + ssize_t key_offset;
> + int flags;
> + const rb_compare_f compare;
> + const rb_augment_f augment;
> +};
> +static __always_inline __flatten
> +struct rb_node *__rb_find(
> + struct rb_node *node,
> + const void *key,
> + const struct rb_relationship *rel)
> +{
> + __rb_assert_good_rel(rel);
> + while (node) {
> + long diff = rel->compare(key, __rb_node_to_key(node, rel));
> +
> + if (diff > 0)
> + node = node->rb_right;
> + else if (diff < 0)
> + node = node->rb_left;
> + else
> + return node;
> + }
> +
> + return 0;
> +}
> +#define RB_RELATIONSHIP( \
> + cont_type, root, left, right, count, \
> + obj_type, node, key, \
> + _flags, _compare, _augment) { \
> + .root_offset = offsetof(cont_type, root), \
> + .left_offset = OPT_OFFSETOF(cont_type, left), \
> + .right_offset = OPT_OFFSETOF(cont_type, right), \
> + .count_offset = OPT_OFFSETOF(cont_type, count), \
> + .node_offset = offsetof(obj_type, node), \
> + .key_offset = offsetof(obj_type, key), \
> + .flags = (_flags) \
> + | IFF_EMPTY(left , 0, RB_HAS_LEFTMOST) \
> + | IFF_EMPTY(right, 0, RB_HAS_RIGHTMOST) \
> + | IFF_EMPTY(count, 0, RB_HAS_COUNT) \
> + | IFF_EMPTY(_augment, 0, RB_IS_AUGMENTED), \
> + .compare = (const rb_compare_f) (_compare), \
> + .augment = IFF_EMPTY(_augment, 0, _augment) \
> +}
> +#define RB_DEFINE_INTERFACE( \
> + prefix, \
> + cont_type, root, left, right, count, \
> + obj_type, node, key, \
> + flags, compare, augment, \
> + find_mod, insert_mod, find_near_mod, insert_near_mod) \
> + \
> \
> +static const struct rb_relationship prefix ## _rel = \
> +RB_RELATIONSHIP( \
> + cont_type, root, left, right, count, \
> + obj_type, node, key, \
> + flags, compare, augment); \
> + \
> +IFF_EMPTY(find_mod, static __always_inline, find_mod) \
> +obj_type *prefix ## _find(cont_type *cont, \
> + const typeof(((obj_type *)0)->key) *_key) \
> +{ \
> + struct rb_node *ret = rb_find( \
> + &cont->root, _key, &prefix ## _rel); \
> + return ret ? rb_entry(ret, obj_type, node) : 0; \
> +} \
So the one thing I noticed was your 'fixed' long return type for
compare. I guess that's one of the things that inspired your CFS compare
'hack' since on 32bit that's too short.
I tried playing with typeof() and return types of function pointers, but
I couldn't make it work. Bummer. That leaves explicitly passing it to
the RB_DEFINE_INTERFACE() and pulling all relevant inline functions into
the macro as well.