2013-08-23 14:47:30

by Zhi Yong Wu

[permalink] [raw]
Subject: [PATCH] rbtree: Add some necessary condition checks

From: Zhi Yong Wu <[email protected]>

Signed-off-by: Zhi Yong Wu <[email protected]>
---
include/linux/rbtree_augmented.h | 3 ++-
lib/rbtree.c | 5 +++--
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
index fea49b5..7d19770 100644
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
}

successor->rb_left = tmp = node->rb_left;
- rb_set_parent(tmp, successor);
+ if (tmp)
+ rb_set_parent(tmp, successor);

pc = node->__rb_parent_color;
tmp = __rb_parent(pc);
diff --git a/lib/rbtree.c b/lib/rbtree.c
index c0e31fe..2cb01ba 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
*/
sibling = parent->rb_right;
if (node != sibling) { /* node == parent->rb_left */
- if (rb_is_red(sibling)) {
+ if (sibling && rb_is_red(sibling)) {
/*
* Case 1 - left rotate at parent
*
@@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
*/
parent->rb_right = tmp1 = sibling->rb_left;
sibling->rb_left = parent;
- rb_set_parent_color(tmp1, parent, RB_BLACK);
+ if (tmp1)
+ rb_set_parent_color(tmp1, parent, RB_BLACK);
__rb_rotate_set_parents(parent, sibling, root,
RB_RED);
augment_rotate(parent, sibling);
--
1.7.11.7


2013-08-26 22:01:52

by Michel Lespinasse

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
> From: Zhi Yong Wu <[email protected]>
>
> Signed-off-by: Zhi Yong Wu <[email protected]>
> ---
> include/linux/rbtree_augmented.h | 3 ++-
> lib/rbtree.c | 5 +++--
> 2 files changed, 5 insertions(+), 3 deletions(-)

So, you are saying that the checks are necessary, but you are not saying why.

The way I see it, the checks are *not* necessary, because the rbtree
invariants guarantee them to be true. The only way for the checks to
fail would be if people directly manipulate the rbtrees without going
through the proper APIs, and if they do that then I think they're on
their own. So to me, I think it's the same situation as dereferencing
a pointer without checking if it's NULL, because you know it should
never be NULL - which in my eyes is perfectly acceptable.

If you really feel this is a problem, you should explain why. I would
also prefer if any checks you add could be limited to debug builds.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-09-02 06:30:50

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
>> From: Zhi Yong Wu <[email protected]>
>>
>> Signed-off-by: Zhi Yong Wu <[email protected]>
>> ---
>> include/linux/rbtree_augmented.h | 3 ++-
>> lib/rbtree.c | 5 +++--
>> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> So, you are saying that the checks are necessary, but you are not saying why.
>
> The way I see it, the checks are *not* necessary, because the rbtree
> invariants guarantee them to be true. The only way for the checks to
> fail would be if people directly manipulate the rbtrees without going
> through the proper APIs, and if they do that then I think they're on
> their own. So to me, I think it's the same situation as dereferencing
> a pointer without checking if it's NULL, because you know it should
> never be NULL - which in my eyes is perfectly acceptable.
In my patchset, some rbtree APIs to be invoked, and I think that those
rbtree APIs are used corrently, Below is the pointer of its code:
https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
But I hit some issues when using compilebench to do perf benchmark.
compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
[ 411.597890] general protection fault: 0000 [#1] SMP
[ 411.598011] Modules linked in:
[ 411.598011] CPU: 0 PID: 24 Comm: kswapd0 Not tainted 3.11.0-rc7+ #1235
[ 411.598011] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 411.598011] task: ffff88029366df70 ti: ffff88029211e000 task.ti:
ffff88029211e000
[ 411.598011] RIP: 0010:[<ffffffff8149952d>] [<ffffffff8149952d>]
rb_erase+0x1bd/0x390
[ 411.598011] RSP: 0018:ffff88029211fb38 EFLAGS: 00010206
[ 411.598011] RAX: ffff8801484805f8 RBX: ffff8801484804e0 RCX: 00ffff880155f00a
[ 411.598011] RDX: ffff880155f00ae1 RSI: ffff880288630000 RDI: ffff880155f00ad8
[ 411.598011] RBP: ffff88029211fb38 R08: ffff880155f00ae1 R09: 0000000000000000
[ 411.598011] R10: ffff8801484805f8 R11: 0000000000000000 R12: ffff880288630000
[ 411.598011] R13: ffff880148480520 R14: 0000000000003af0 R15: ffff8801484805b0
[ 411.598011] FS: 0000000000000000(0000) GS:ffff88029fc00000(0000)
knlGS:0000000000000000
[ 411.598011] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 411.598011] CR2: 00000000f7798000 CR3: 000000028a69d000 CR4: 00000000000006f0
[ 411.598011] Stack:
[ 411.598011] ffff88029211fb68 ffffffff811dd6f9 ffff8801484804e0
ffff880288630000
[ 411.598011] ffffffff811dd1d0 0000000000003af0 ffff88029211fb78
ffffffff811dd79c
[ 411.598011] ffff88029211fbe8 ffffffff811dd8f9 ffff88029211fba8
ffff880288632008
[ 411.598011] Call Trace:
[ 411.598011] [<ffffffff811dd6f9>] hot_inode_item_free+0x29/0xa0
[ 411.598011] [<ffffffff811dd1d0>] ? hot_mem_limit_sum+0x10/0x10
[ 411.598011] [<ffffffff811dd79c>] hot_inode_item_put+0x2c/0x30
[ 411.598011] [<ffffffff811dd8f9>] hot_item_evict.part.5+0xa9/0x120
[ 411.598011] [<ffffffff811dd9aa>] hot_track_prune+0x3a/0x70
[ 411.598011] [<ffffffff81159f33>] shrink_slab+0x153/0x2f0
[ 411.598011] [<ffffffff81080cd3>] ? up_read+0x23/0x40
[ 411.598011] [<ffffffff8115d4d1>] balance_pgdat+0x491/0x5d0
[ 411.598011] [<ffffffff8115d7a0>] kswapd+0x190/0x470
[ 411.598011] [<ffffffff8107ca10>] ? wake_up_bit+0x40/0x40
[ 411.598011] [<ffffffff8115d610>] ? balance_pgdat+0x5d0/0x5d0
[ 411.598011] [<ffffffff8107c4fa>] kthread+0xea/0xf0
[ 411.598011] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 411.598011] [<ffffffff818b9a1c>] ret_from_fork+0x7c/0xb0
[ 411.598011] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 411.598011] Code: 49 89 d0 49 83 c8 01 4c 89 07 48 89 d7 48 89 ca
48 8b 4a 08 49 89 d0 49 83 c8 01 48 85 c9 48 89 48 10 48 89 42 08 4c
89 07 74 0c <48> 8b 39 83 e7 01 48 09 c7 48 89 39 48 8b 08 48 89 0a 48
83 e1
[ 411.598011] RIP [<ffffffff8149952d>] rb_erase+0x1bd/0x390
[ 411.598011] RSP <ffff88029211fb38>
[ 411.669881] ---[ end trace 89d346eca258dcf8 ]---

(gdb) l *rb_erase+0x1bd
0xffffffff8149952d is in rb_erase (include/linux/rbtree_augmented.h:101).
96 #define rb_is_red(rb) __rb_is_red((rb)->__rb_parent_color)
97 #define rb_is_black(rb) __rb_is_black((rb)->__rb_parent_color)
98
99 static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
100 {
101 rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
102 }
103
104 static inline void rb_set_parent_color(struct rb_node *rb,
105 struct rb_node *p, int color)
(gdb)

compile dir kernel-27 691MB in 14.26 seconds (48.50 MB/s)
create dir kernel-64334 222MB in 12.78 seconds (17.40 MB/s)
[ 1630.646485] BUG: unable to handle kernel NULL pointer dereference
at (null)
[ 1630.647010] IP: [<ffffffff814993d2>] rb_erase+0x62/0x390
[ 1630.647010] PGD 289f78067 PUD 291de4067 PMD 0
[ 1630.647010] Oops: 0000 [#1] SMP
[ 1630.647010] Modules linked in:
[ 1630.647010] CPU: 0 PID: 24 Comm: kswapd0 Not tainted 3.11.0-rc7+ #1235
[ 1630.647010] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1630.647010] task: ffff88029366df70 ti: ffff88029211e000 task.ti:
ffff88029211e000
[ 1630.647010] RIP: 0010:[<ffffffff814993d2>] [<ffffffff814993d2>]
rb_erase+0x62/0x390
[ 1630.647010] RSP: 0018:ffff88029211fb38 EFLAGS: 00010282
[ 1630.647010] RAX: ffff8800a57b3a08 RBX: ffff8800a57b39c0 RCX: 0000000000000000
[ 1630.647010] RDX: ffff8800a57b3ad8 RSI: ffff88013011c000 RDI: ffff8800a57b3a08
[ 1630.647010] RBP: ffff88029211fb38 R08: ffff8801cd31b118 R09: 0000000000000000
[ 1630.647010] R10: ffff8800a57b3ad8 R11: 0000000000000000 R12: ffff88013011c000
[ 1630.647010] R13: ffff8800a57b3a00 R14: 0000000000003f58 R15: ffff8800a57b3a90
[ 1630.647010] FS: 0000000000000000(0000) GS:ffff88029fc00000(0000)
knlGS:0000000000000000
[ 1630.647010] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1630.647010] CR2: 0000000000000000 CR3: 0000000256e71000 CR4: 00000000000006f0
[ 1630.647010] Stack:
[ 1630.647010] ffff88029211fb68 ffffffff811dd6f9 ffff8800a57b39c0
ffff88013011c000
[ 1630.647010] ffffffff811dd1d0 0000000000003f58 ffff88029211fb78
ffffffff811dd79c
[ 1630.647010] ffff88029211fbe8 ffffffff811dd8f9 ffff88029211fba8
ffff88013011e008
[ 1630.647010] Call Trace:
[ 1630.647010] [<ffffffff811dd6f9>] hot_inode_item_free+0x29/0xa0
[ 1630.647010] [<ffffffff811dd1d0>] ? hot_mem_limit_sum+0x10/0x10
[ 1630.647010] [<ffffffff811dd79c>] hot_inode_item_put+0x2c/0x30
[ 1630.647010] [<ffffffff811dd8f9>] hot_item_evict.part.5+0xa9/0x120
[ 1630.647010] [<ffffffff811dd9aa>] hot_track_prune+0x3a/0x70
[ 1630.647010] [<ffffffff81159f33>] shrink_slab+0x153/0x2f0
[ 1630.647010] [<ffffffff81080cd3>] ? up_read+0x23/0x40
[ 1630.647010] [<ffffffff8115d4d1>] balance_pgdat+0x491/0x5d0
[ 1630.647010] [<ffffffff8115d7a0>] kswapd+0x190/0x470
[ 1630.647010] [<ffffffff8107ca10>] ? wake_up_bit+0x40/0x40
[ 1630.647010] [<ffffffff8115d610>] ? balance_pgdat+0x5d0/0x5d0
[ 1630.647010] [<ffffffff8107c4fa>] kthread+0xea/0xf0
[ 1630.647010] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 1630.647010] [<ffffffff818b9a1c>] ret_from_fork+0x7c/0xb0
[ 1630.647010] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 1630.647010] Code: 8b 4a 10 48 85 c9 75 f1 4c 8b 4a 08 49 89 d2 4c
89 48 10 4c 89 42 08 49 8b 08 83 e1 01 48 09 d1 49 89 08 48 8b 4f 10
48 89 4a 10 <4c> 8b 01 41 83 e0 01 4d 09 d0 4c 89 01 48 8b 0f 49 89 c8
49 83
[ 1630.647010] RIP [<ffffffff814993d2>] rb_erase+0x62/0x390
[ 1630.647010] RSP <ffff88029211fb38>
[ 1630.647010] CR2: 0000000000000000
[ 1630.724782] ---[ end trace aec3e0601deedd6a ]---

60 inc.head = ACCESS_ONCE(lock->tickets.head);
(gdb) l *rb_erase+0x62
0xffffffff814993d2 is in rb_erase (include/linux/rbtree_augmented.h:101).
96 #define rb_is_red(rb) __rb_is_red((rb)->__rb_parent_color)
97 #define rb_is_black(rb) __rb_is_black((rb)->__rb_parent_color)
98
99 static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
100 {
101 rb->__rb_parent_color = rb_color(rb) | (unsigned long)p;
102 }
103
104 static inline void rb_set_parent_color(struct rb_node *rb,
105 struct rb_node *p, int color)
(gdb)


clean kernel-7 691MB in 0.42 seconds (1646.66 MB/s)
delete kernel-27 in 3.63 seconds
[ 623.474634] BUG: unable to handle kernel paging request at fffffffffffffffc
[ 623.475009] IP: [<ffffffff81499a10>] rb_insert_color+0xa0/0x170
[ 623.475009] PGD 1e0c067 PUD 1e0e067 PMD 0
[ 623.475009] Oops: 0000 [#1] SMP
[ 623.475009] Modules linked in:
[ 623.475009] CPU: 0 PID: 1696 Comm: btrfs-transacti Not tainted
3.11.0-rc7+ #1235
[ 623.475009] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 623.475009] task: ffff88028c691fd0 ti: ffff880291ef2000 task.ti:
ffff880291ef2000
[ 623.475009] RIP: 0010:[<ffffffff81499a10>] [<ffffffff81499a10>]
rb_insert_color+0xa0/0x170
[ 623.475009] RSP: 0018:ffff880291ef3958 EFLAGS: 00010286
[ 623.475009] RAX: fffffffffffffffc RBX: ffff880289ed0000 RCX: ffffffff820ea8c0
[ 623.475009] RDX: ffff8801f83bd1e8 RSI: ffff880289ed0000 RDI: ffff8801f83bd1e9
[ 623.475009] RBP: ffff880291ef3958 R08: ffff8801f83bd1e8 R09: 0000000000000001
[ 623.475009] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880289ed2008
[ 623.475009] R13: ffff8802365b4ea0 R14: 0000000000000101 R15: ffff8802365b4e18
[ 623.475009] FS: 0000000000000000(0000) GS:ffff88029fc00000(0000)
knlGS:0000000000000000
[ 623.475009] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 623.475009] CR2: fffffffffffffffc CR3: 0000000289ee2000 CR4: 00000000000006f0
[ 623.475009] Stack:
[ 623.475009] ffff880291ef39a8 ffffffff811ddfc6 ffff880291ef3988
00000001813a5e52
[ 623.475009] ffff880291ef39c0 0000000000040000 0000000000040000
0000000000040000
[ 623.475009] 7fffffffffffffff 0000000000000000 ffff880291ef3a08
ffffffff811de338
[ 623.475009] Call Trace:
[ 623.475009] [<ffffffff811ddfc6>] hot_inode_item_lookup+0x166/0x1b0
[ 623.475009] [<ffffffff811de338>] hot_freqs_update+0x78/0x160
[ 623.475009] [<ffffffff81390c30>] ?
insert_reserved_file_extent.constprop.62+0x2c0/0x2c0
[ 623.475009] [<ffffffff811542fa>] do_writepages+0x6a/0xa0
[ 623.475009] [<ffffffff81148f19>] __filemap_fdatawrite_range+0x59/0x60
[ 623.475009] [<ffffffff81149d83>] filemap_fdatawrite_range+0x13/0x20
[ 623.475009] [<ffffffff813a571d>] btrfs_wait_ordered_range+0x4d/0x120
[ 623.475009] [<ffffffff813cca94>] __btrfs_write_out_cache+0x764/0x970
[ 623.475009] [<ffffffff813ccfe8>] btrfs_write_out_cache+0x98/0xf0
[ 623.475009] [<ffffffff8137bf50>] btrfs_write_dirty_block_groups+0x580/0x660
[ 623.475009] [<ffffffff813ac4e9>] ? free_extent_buffer+0x49/0xc0
[ 623.475009] [<ffffffff818a5f50>] commit_cowonly_roots+0x154/0x226
[ 623.475009] [<ffffffff8138c5b5>] btrfs_commit_transaction+0x445/0x950
[ 623.475009] [<ffffffff81383efd>] transaction_kthread+0x1bd/0x240
[ 623.475009] [<ffffffff81383d40>] ? cleaner_kthread+0x1a0/0x1a0
[ 623.475009] [<ffffffff8107c4fa>] kthread+0xea/0xf0
[ 623.475009] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 623.475009] [<ffffffff818b9a1c>] ret_from_fork+0x7c/0xb0
[ 623.475009] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 623.475009] Code: 00 48 89 41 08 5d c3 0f 1f 40 00 48 89 d7 48 83
cf 01 48 89 39 48 89 38 48 8b 02 48 83 e0 fc 48 85 c0 48 89 02 0f 84
8e 00 00 00 <48> 8b 10 4c 89 c7 f6 c2 01 75 cf 48 8b 4a 08 49 89 d0 48
39 c8
[ 623.475009] RIP [<ffffffff81499a10>] rb_insert_color+0xa0/0x170
[ 623.475009] RSP <ffff880291ef3958>
[ 623.475009] CR2: fffffffffffffffc
[ 623.475009] ---[ end trace 0e8e9ac8daa97bd0 ]---

(gdb) l *rb_insert_color+0xa0
0xffffffff81499a10 is in rb_insert_color (lib/rbtree.c:89).
84 * want a red root or two consecutive red nodes.
85 */
86 if (!parent) {
87 rb_set_parent_color(node, NULL, RB_BLACK);
88 break;
89 } else if (rb_is_black(parent))
90 break;
91
92 gparent = rb_red_parent(parent);
93
(gdb)

create dir kernel-23 222MB in 6.01 seconds (37.00 MB/s)
[ 6622.862142] BUG: unable to handle kernel NULL pointer dereference
at (null)
[ 6622.863012] IP: [<ffffffff81499488>] rb_erase+0x118/0x390
[ 6622.863012] PGD 231f4f067 PUD 238297067 PMD 0
[ 6622.863012] Oops: 0002 [#1] SMP
[ 6622.863012] Modules linked in:
[ 6622.863012] CPU: 0 PID: 24 Comm: kswapd0 Not tainted 3.11.0-rc7+ #1235
[ 6622.863012] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 6622.863012] task: ffff88029366df70 ti: ffff88029211e000 task.ti:
ffff88029211e000
[ 6622.863012] RIP: 0010:[<ffffffff81499488>] [<ffffffff81499488>]
rb_erase+0x118/0x390
[ 6622.863012] RSP: 0018:ffff88029211fb38 EFLAGS: 00010282
[ 6622.863012] RAX: ffff88019c1f1ee8 RBX: ffff88019c1f1d00 RCX: 0000000000000000
[ 6622.863012] RDX: ffff88017f964a08 RSI: ffff880233798000 RDI: ffff88019c1f1ee9
[ 6622.863012] RBP: ffff88029211fb38 R08: ffff88019c1f1ee8 R09: 0000000000000000
[ 6622.863012] R10: ffff88019c1f1e18 R11: 0000000000000000 R12: ffff880233798000
[ 6622.863012] R13: ffff88019c1f1d40 R14: 0000000000003d0c R15: ffff88019c1f1dd0
[ 6622.863012] FS: 0000000000000000(0000) GS:ffff88029fc00000(0000)
knlGS:0000000000000000
[ 6622.863012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6622.863012] CR2: 0000000000000000 CR3: 00000002336fc000 CR4: 00000000000006f0
[ 6622.863012] Stack:
[ 6622.863012] ffff88029211fb68 ffffffff811dd6f9 ffff88019c1f1d00
ffff880233798000
[ 6622.863012] ffffffff811dd1d0 0000000000003d0c ffff88029211fb78
ffffffff811dd79c
[ 6622.863012] ffff88029211fbe8 ffffffff811dd8f9 ffff88029211fba8
ffff88023379a008
[ 6622.863012] Call Trace:
[ 6622.863012] [<ffffffff811dd6f9>] hot_inode_item_free+0x29/0xa0
[ 6622.863012] [<ffffffff811dd1d0>] ? hot_mem_limit_sum+0x10/0x10
[ 6622.863012] [<ffffffff811dd79c>] hot_inode_item_put+0x2c/0x30
[ 6622.863012] [<ffffffff811dd8f9>] hot_item_evict.part.5+0xa9/0x120
[ 6622.863012] [<ffffffff811dd9aa>] hot_track_prune+0x3a/0x70
[ 6622.863012] [<ffffffff81159f33>] shrink_slab+0x153/0x2f0
[ 6622.863012] [<ffffffff81080cd3>] ? up_read+0x23/0x40
[ 6622.863012] [<ffffffff8115d4d1>] balance_pgdat+0x491/0x5d0
[ 6622.863012] [<ffffffff8115d7a0>] kswapd+0x190/0x470
[ 6622.863012] [<ffffffff8107ca10>] ? wake_up_bit+0x40/0x40
[ 6622.863012] [<ffffffff8115d610>] ? balance_pgdat+0x5d0/0x5d0
[ 6622.863012] [<ffffffff8107c4fa>] kthread+0xea/0xf0
[ 6622.863012] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 6622.863012] [<ffffffff818b9a1c>] ret_from_fork+0x7c/0xb0
[ 6622.863012] [<ffffffff8107c410>] ? flush_kthread_work+0x1b0/0x1b0
[ 6622.863012] Code: e2 fc 74 ab 48 89 c1 48 89 d0 48 8b 50 08 48 39
ca 74 48 f6 02 01 75 b3 48 8b 4a 10 48 89 c7 48 83 cf 01 48 89 48 08
48 89 42 10 <48> 89 39 48 8b 38 48 89 3a 48 83 e7 fc 48 89 10 0f 84 02
01 00
[ 6622.863012] RIP [<ffffffff81499488>] rb_erase+0x118/0x390
[ 6622.863012] RSP <ffff88029211fb38>
[ 6622.863012] CR2: 0000000000000000
[ 6622.938771] ---[ end trace 75fedfe9cb5f4943 ]---

(gdb) l *rb_erase+0x118
0xffffffff81499488 is in rb_erase (include/linux/rbtree_augmented.h:107).
102 }
103
104 static inline void rb_set_parent_color(struct rb_node *rb,
105 struct rb_node *p, int color)
106 {
107 rb->__rb_parent_color = (unsigned long)p | color;
108 }
109
110 static inline void
111 __rb_change_child(struct rb_node *old, struct rb_node *new,
(gdb)


>
> If you really feel this is a problem, you should explain why. I would
> also prefer if any checks you add could be limited to debug builds.
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



--
Regards,

Zhi Yong Wu

2013-09-02 08:57:36

by Michel Lespinasse

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
>>> From: Zhi Yong Wu <[email protected]>
>>>
>>> Signed-off-by: Zhi Yong Wu <[email protected]>
>>> ---
>>> include/linux/rbtree_augmented.h | 3 ++-
>>> lib/rbtree.c | 5 +++--
>>> 2 files changed, 5 insertions(+), 3 deletions(-)
>>
>> So, you are saying that the checks are necessary, but you are not saying why.
>>
>> The way I see it, the checks are *not* necessary, because the rbtree
>> invariants guarantee them to be true. The only way for the checks to
>> fail would be if people directly manipulate the rbtrees without going
>> through the proper APIs, and if they do that then I think they're on
>> their own. So to me, I think it's the same situation as dereferencing
>> a pointer without checking if it's NULL, because you know it should
>> never be NULL - which in my eyes is perfectly acceptable.
> In my patchset, some rbtree APIs to be invoked, and I think that those
> rbtree APIs are used corrently, Below is the pointer of its code:
> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
> But I hit some issues when using compilebench to do perf benchmark.
> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)

Thanks for the link - I now better understand where you are coming
from with these fixes.

Going back to the original message:

> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
> index fea49b5..7d19770 100644
> --- a/include/linux/rbtree_augmented.h
> +++ b/include/linux/rbtree_augmented.h
> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
> }
>
> successor->rb_left = tmp = node->rb_left;
> - rb_set_parent(tmp, successor);
> + if (tmp)
> + rb_set_parent(tmp, successor);
>
> pc = node->__rb_parent_color;
> tmp = __rb_parent(pc);

Note that node->rb_left was already fetched at the top of
__rb_erase_augmented(), and was checked to be non-NULL at the time -
otherwise we would have executed 'Case 1' in that function. So, you
are not expected to find tmp == NULL here.

> diff --git a/lib/rbtree.c b/lib/rbtree.c
> index c0e31fe..2cb01ba 100644
> --- a/lib/rbtree.c
> +++ b/lib/rbtree.c
> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
> */
> sibling = parent->rb_right;
> if (node != sibling) { /* node == parent->rb_left */
> - if (rb_is_red(sibling)) {
> + if (sibling && rb_is_red(sibling)) {
> /*
> * Case 1 - left rotate at parent
> *

Note the loop invariants quoted just above:

/*
* Loop invariants:
* - node is black (or NULL on first iteration)
* - node is not the root (parent is not NULL)
* - All leaf paths going through parent and node have a
* black node count that is 1 lower than other leaf paths.
*/

Because of these, each path from sibling to a leaf must include at
least one black node, which implies that sibling can't be NULL - or to
put it another way, if sibling is null then the expected invariants
were violated before we even got there.

> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
> */
> parent->rb_right = tmp1 = sibling->rb_left;
> sibling->rb_left = parent;
> - rb_set_parent_color(tmp1, parent, RB_BLACK);
> + if (tmp1)
> + rb_set_parent_color(tmp1, parent, RB_BLACK);
> __rb_rotate_set_parents(parent, sibling, root,
> RB_RED);
> augment_rotate(parent, sibling);

This is actually the same invariant here - each path from sibling to a
leaf must include at least one black node, and sibling is now known to
be red, so it must have two black children.


Now I had a quick look at your code and I couldn't tell at which point
the invariants are violated. However I did notice a couple suspicious
things in the very first patch
(f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):

1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
destroy rb trees by iterating on each node with rb_next() and then
freeing them. Note that rb_next() can reference prior nodes, which
have already been freed in your scheme, so that seems quite unsafe.

The simplest fix would be to do a full rb_erase() on each node before
freeing it. (you may be able to avoid rebalancing the tree here as
you're going to destroy it all, but if you really have that need it
would be better to come up with a new API to cover it rather than
hardcode it where you need it - I think it's easiest to start with the
simple dumb fix of using rb_erase).

2- I did not look long enough to understand the locking, but it wasn't
clear to me if you lock the rbtrees when doing rb_erase() on them
(while I could more clearly see that you do it for insertions).

I'm really not sure if either of these will fix the issues you're
seeing, though. What I would try next would be to add explicit rbtree
invariant checks before and after rbtree manipulations, like what the
check() function does in lib/rbtree_test.c, to see at which point do
they get broken.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-09-03 04:45:55

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
> On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
>> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
>>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
>>>> From: Zhi Yong Wu <[email protected]>
>>>>
>>>> Signed-off-by: Zhi Yong Wu <[email protected]>
>>>> ---
>>>> include/linux/rbtree_augmented.h | 3 ++-
>>>> lib/rbtree.c | 5 +++--
>>>> 2 files changed, 5 insertions(+), 3 deletions(-)
>>>
>>> So, you are saying that the checks are necessary, but you are not saying why.
>>>
>>> The way I see it, the checks are *not* necessary, because the rbtree
>>> invariants guarantee them to be true. The only way for the checks to
>>> fail would be if people directly manipulate the rbtrees without going
>>> through the proper APIs, and if they do that then I think they're on
>>> their own. So to me, I think it's the same situation as dereferencing
>>> a pointer without checking if it's NULL, because you know it should
>>> never be NULL - which in my eyes is perfectly acceptable.
>> In my patchset, some rbtree APIs to be invoked, and I think that those
>> rbtree APIs are used corrently, Below is the pointer of its code:
>> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
>> But I hit some issues when using compilebench to do perf benchmark.
>> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
>
> Thanks for the link - I now better understand where you are coming
> from with these fixes.
>
> Going back to the original message:
>
>> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>> index fea49b5..7d19770 100644
>> --- a/include/linux/rbtree_augmented.h
>> +++ b/include/linux/rbtree_augmented.h
>> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>> }
>>
>> successor->rb_left = tmp = node->rb_left;
>> - rb_set_parent(tmp, successor);
>> + if (tmp)
>> + rb_set_parent(tmp, successor);
>>
>> pc = node->__rb_parent_color;
>> tmp = __rb_parent(pc);
>
> Note that node->rb_left was already fetched at the top of
> __rb_erase_augmented(), and was checked to be non-NULL at the time -
> otherwise we would have executed 'Case 1' in that function. So, you
If 'Case 1' is executed, this line of code is also done, how about the result?
'Case 1' seems *not* to change node->rb_left at all.

> are not expected to find tmp == NULL here.
>
>> diff --git a/lib/rbtree.c b/lib/rbtree.c
>> index c0e31fe..2cb01ba 100644
>> --- a/lib/rbtree.c
>> +++ b/lib/rbtree.c
>> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> */
>> sibling = parent->rb_right;
>> if (node != sibling) { /* node == parent->rb_left */
>> - if (rb_is_red(sibling)) {
>> + if (sibling && rb_is_red(sibling)) {
>> /*
>> * Case 1 - left rotate at parent
>> *
>
> Note the loop invariants quoted just above:
>
> /*
> * Loop invariants:
> * - node is black (or NULL on first iteration)
> * - node is not the root (parent is not NULL)
> * - All leaf paths going through parent and node have a
> * black node count that is 1 lower than other leaf paths.
> */
>
> Because of these, each path from sibling to a leaf must include at
> least one black node, which implies that sibling can't be NULL - or to
> put it another way, if sibling is null then the expected invariants
> were violated before we even got there.
In theory, i can understand what you mean, But don't know why and
where it got violated.
>
>> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> */
>> parent->rb_right = tmp1 = sibling->rb_left;
>> sibling->rb_left = parent;
>> - rb_set_parent_color(tmp1, parent, RB_BLACK);
>> + if (tmp1)
>> + rb_set_parent_color(tmp1, parent, RB_BLACK);
>> __rb_rotate_set_parents(parent, sibling, root,
>> RB_RED);
>> augment_rotate(parent, sibling);
>
> This is actually the same invariant here - each path from sibling to a
> leaf must include at least one black node, and sibling is now known to
> be red, so it must have two black children.
Ditto.
>
>
> Now I had a quick look at your code and I couldn't tell at which point
> the invariants are violated. However I did notice a couple suspicious
> things in the very first patch
> (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
>
> 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
> destroy rb trees by iterating on each node with rb_next() and then
yes, but this item may not been freed immediately, You can know each item
has its ref count.
> freeing them. Note that rb_next() can reference prior nodes, which
> have already been freed in your scheme, so that seems quite unsafe.
I checked rb_next() function, and found that if its prior nodes are
freed, is this node's parent not NULL? If parent is NULL, 'node ==
parent->rb_right' will not been executed, so i don't think any issue
will happen when rb_next() is called here. right?

while ((parent = rb_parent(node)) && node == parent->rb_right)
node = parent;

>
> The simplest fix would be to do a full rb_erase() on each node before
full rb_erase()? sorry, i don't get what you mean. Do you mean we
should erase all nodes from rbtree, then begin to free them? If yes,
how to iterate them? If no, can you elaborate it?

> freeing it. (you may be able to avoid rebalancing the tree here as
> you're going to destroy it all, but if you really have that need it
> would be better to come up with a new API to cover it rather than
> hardcode it where you need it - I think it's easiest to start with the
> simple dumb fix of using rb_erase).
>
> 2- I did not look long enough to understand the locking, but it wasn't
> clear to me if you lock the rbtrees when doing rb_erase() on them
> (while I could more clearly see that you do it for insertions).
Yes, it get locking when doing rb_erase() or rb_insert(). You can see
there are multiple functions maybe rbtree at the same time. To sync
them, we need to lock the rbtree.

>
> I'm really not sure if either of these will fix the issues you're
> seeing, though. What I would try next would be to add explicit rbtree
> invariant checks before and after rbtree manipulations, like what the
> check() function does in lib/rbtree_test.c, to see at which point do
> they get broken.
Great, any progress so far? :)

>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



--
Regards,

Zhi Yong Wu

2013-09-03 05:49:03

by Michel Lespinasse

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Mon, Sep 2, 2013 at 9:45 PM, Zhi Yong Wu <[email protected]> wrote:
> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
>> Thanks for the link - I now better understand where you are coming
>> from with these fixes.
>>
>> Going back to the original message:
>>
>>> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>>> index fea49b5..7d19770 100644
>>> --- a/include/linux/rbtree_augmented.h
>>> +++ b/include/linux/rbtree_augmented.h
>>> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>>> }
>>>
>>> successor->rb_left = tmp = node->rb_left;
>>> - rb_set_parent(tmp, successor);
>>> + if (tmp)
>>> + rb_set_parent(tmp, successor);
>>>
>>> pc = node->__rb_parent_color;
>>> tmp = __rb_parent(pc);
>>
>> Note that node->rb_left was already fetched at the top of
>> __rb_erase_augmented(), and was checked to be non-NULL at the time -
>> otherwise we would have executed 'Case 1' in that function. So, you
> If 'Case 1' is executed, this line of code is also done, how about the result?
> 'Case 1' seems *not* to change node->rb_left at all.

Wait, I believe this line of code is executed only in Case 2 and Case 3 ?

>>> diff --git a/lib/rbtree.c b/lib/rbtree.c
>>> index c0e31fe..2cb01ba 100644
>>> --- a/lib/rbtree.c
>>> +++ b/lib/rbtree.c
>>> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>>> */
>>> sibling = parent->rb_right;
>>> if (node != sibling) { /* node == parent->rb_left */
>>> - if (rb_is_red(sibling)) {
>>> + if (sibling && rb_is_red(sibling)) {
>>> /*
>>> * Case 1 - left rotate at parent
>>> *
>>
>> Note the loop invariants quoted just above:
>>
>> /*
>> * Loop invariants:
>> * - node is black (or NULL on first iteration)
>> * - node is not the root (parent is not NULL)
>> * - All leaf paths going through parent and node have a
>> * black node count that is 1 lower than other leaf paths.
>> */
>>
>> Because of these, each path from sibling to a leaf must include at
>> least one black node, which implies that sibling can't be NULL - or to
>> put it another way, if sibling is null then the expected invariants
>> were violated before we even got there.
> In theory, i can understand what you mean, But don't know why and
> where it got violated.

Same here. My point is, I don't think we can fix the issue without
answering that question.

>> Now I had a quick look at your code and I couldn't tell at which point
>> the invariants are violated. However I did notice a couple suspicious
>> things in the very first patch
>> (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
>>
>> 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
>> destroy rb trees by iterating on each node with rb_next() and then
> yes, but this item may not been freed immediately, You can know each item
> has its ref count.

Are items guaranteed to have another refcount than the one we're dropping ?

>> freeing them. Note that rb_next() can reference prior nodes, which
>> have already been freed in your scheme, so that seems quite unsafe.
> I checked rb_next() function, and found that if its prior nodes are
> freed, is this node's parent not NULL?

No, if the parent was freed with just a put() operation, the child
will still have a pointer to it. This is why I suggested using
rb_erase() on each node before freeing them, so that we don't keep
pointers to freed nodes.

>> The simplest fix would be to do a full rb_erase() on each node before
> full rb_erase()? sorry, i don't get what you mean. Do you mean we
> should erase all nodes from rbtree, then begin to free them? If yes,
> how to iterate them? If no, can you elaborate it?

No, I meant to call rb_erase() on each individual node right before
the corresponding put() operation.

>> 2- I did not look long enough to understand the locking, but it wasn't
>> clear to me if you lock the rbtrees when doing rb_erase() on them
>> (while I could more clearly see that you do it for insertions).
> Yes, it get locking when doing rb_erase() or rb_insert(). You can see
> there are multiple functions maybe rbtree at the same time. To sync
> them, we need to lock the rbtree.

Yes, agree we need to lock rbtree in all such operations. I just
wasn't able to determine if it's done around rb_erase() calls, but it
definitely needs to be.

>> I'm really not sure if either of these will fix the issues you're
>> seeing, though. What I would try next would be to add explicit rbtree
>> invariant checks before and after rbtree manipulations, like what the
>> check() function does in lib/rbtree_test.c, to see at which point do
>> they get broken.
> Great, any progress so far? :)

Unfortunately no.

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

2013-09-03 06:22:17

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Tue, Sep 3, 2013 at 1:48 PM, Michel Lespinasse <[email protected]> wrote:
> On Mon, Sep 2, 2013 at 9:45 PM, Zhi Yong Wu <[email protected]> wrote:
>> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
>>> Thanks for the link - I now better understand where you are coming
>>> from with these fixes.
>>>
>>> Going back to the original message:
>>>
>>>> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>>>> index fea49b5..7d19770 100644
>>>> --- a/include/linux/rbtree_augmented.h
>>>> +++ b/include/linux/rbtree_augmented.h
>>>> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>>>> }
>>>>
>>>> successor->rb_left = tmp = node->rb_left;
>>>> - rb_set_parent(tmp, successor);
>>>> + if (tmp)
>>>> + rb_set_parent(tmp, successor);
>>>>
>>>> pc = node->__rb_parent_color;
>>>> tmp = __rb_parent(pc);
>>>
>>> Note that node->rb_left was already fetched at the top of
>>> __rb_erase_augmented(), and was checked to be non-NULL at the time -
>>> otherwise we would have executed 'Case 1' in that function. So, you
>> If 'Case 1' is executed, this line of code is also done, how about the result?
>> 'Case 1' seems *not* to change node->rb_left at all.
>
> Wait, I believe this line of code is executed only in Case 2 and Case 3 ?
Yes.
>
>>>> diff --git a/lib/rbtree.c b/lib/rbtree.c
>>>> index c0e31fe..2cb01ba 100644
>>>> --- a/lib/rbtree.c
>>>> +++ b/lib/rbtree.c
>>>> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>>>> */
>>>> sibling = parent->rb_right;
>>>> if (node != sibling) { /* node == parent->rb_left */
>>>> - if (rb_is_red(sibling)) {
>>>> + if (sibling && rb_is_red(sibling)) {
>>>> /*
>>>> * Case 1 - left rotate at parent
>>>> *
>>>
>>> Note the loop invariants quoted just above:
>>>
>>> /*
>>> * Loop invariants:
>>> * - node is black (or NULL on first iteration)
>>> * - node is not the root (parent is not NULL)
>>> * - All leaf paths going through parent and node have a
>>> * black node count that is 1 lower than other leaf paths.
>>> */
>>>
>>> Because of these, each path from sibling to a leaf must include at
>>> least one black node, which implies that sibling can't be NULL - or to
>>> put it another way, if sibling is null then the expected invariants
>>> were violated before we even got there.
>> In theory, i can understand what you mean, But don't know why and
>> where it got violated.
>
> Same here. My point is, I don't think we can fix the issue without
> answering that question.
Yes, but how to look for this answer is the key. :)
>
>>> Now I had a quick look at your code and I couldn't tell at which point
>>> the invariants are violated. However I did notice a couple suspicious
>>> things in the very first patch
>>> (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
>>>
>>> 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
>>> destroy rb trees by iterating on each node with rb_next() and then
>> yes, but this item may not been freed immediately, You can know each item
>> has its ref count.
>
> Are items guaranteed to have another refcount than the one we're dropping ?
Those items maybe have another refcount. You know, other function may
be referencing those items. e.g. hot_freqs_update().
>
>>> freeing them. Note that rb_next() can reference prior nodes, which
>>> have already been freed in your scheme, so that seems quite unsafe.
>> I checked rb_next() function, and found that if its prior nodes are
>> freed, is this node's parent not NULL?
>
> No, if the parent was freed with just a put() operation, the child
> will still have a pointer to it. This is why I suggested using
> rb_erase() on each node before freeing them, so that we don't keep
> pointers to freed nodes.
ah.
>
>>> The simplest fix would be to do a full rb_erase() on each node before
>> full rb_erase()? sorry, i don't get what you mean. Do you mean we
>> should erase all nodes from rbtree, then begin to free them? If yes,
>> how to iterate them? If no, can you elaborate it?
>
> No, I meant to call rb_erase() on each individual node right before
> the corresponding put() operation.
rb_erase() has been called in current code. You can see
static void hot_range_item_free(struct kref *kref)
{
struct hot_range_item *hr = container_of(kref,
struct hot_range_item, refs);
struct hot_info *root = hr->hot_inode->hot_root;

if (!RB_EMPTY_NODE(&hr->rb_node))
rb_erase(&hr->rb_node, &hr->hot_inode->hot_range_tree);
spin_lock(&root->m_lock);
if (!list_empty(&hr->track_list))
list_del_init(&hr->track_list);
spin_unlock(&root->m_lock);
call_rcu(&hr->rcu, hot_range_item_free_cb);
}

void hot_range_item_put(struct hot_range_item *hr)
{
kref_put(&hr->refs, hot_range_item_free);
}

But hot_range_item_free() may be not executed if this item's refcount
is more than one when put() is called.


>
>>> 2- I did not look long enough to understand the locking, but it wasn't
>>> clear to me if you lock the rbtrees when doing rb_erase() on them
>>> (while I could more clearly see that you do it for insertions).
>> Yes, it get locking when doing rb_erase() or rb_insert(). You can see
>> there are multiple functions maybe rbtree at the same time. To sync
>> them, we need to lock the rbtree.
>
> Yes, agree we need to lock rbtree in all such operations. I just
> wasn't able to determine if it's done around rb_erase() calls, but it
> definitely needs to be.
>
>>> I'm really not sure if either of these will fix the issues you're
>>> seeing, though. What I would try next would be to add explicit rbtree
>>> invariant checks before and after rbtree manipulations, like what the
>>> check() function does in lib/rbtree_test.c, to see at which point do
>>> they get broken.
>> Great, any progress so far? :)
>
> Unfortunately no.
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



--
Regards,

Zhi Yong Wu

2013-09-03 06:58:11

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Tue, Sep 3, 2013 at 1:48 PM, Michel Lespinasse <[email protected]> wrote:
> On Mon, Sep 2, 2013 at 9:45 PM, Zhi Yong Wu <[email protected]> wrote:
>> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
>>> Thanks for the link - I now better understand where you are coming
>>> from with these fixes.
>>>
>>> Going back to the original message:
>>>
>>>> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>>>> index fea49b5..7d19770 100644
>>>> --- a/include/linux/rbtree_augmented.h
>>>> +++ b/include/linux/rbtree_augmented.h
>>>> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>>>> }
>>>>
>>>> successor->rb_left = tmp = node->rb_left;
>>>> - rb_set_parent(tmp, successor);
>>>> + if (tmp)
>>>> + rb_set_parent(tmp, successor);
>>>>
>>>> pc = node->__rb_parent_color;
>>>> tmp = __rb_parent(pc);
>>>
>>> Note that node->rb_left was already fetched at the top of
>>> __rb_erase_augmented(), and was checked to be non-NULL at the time -
>>> otherwise we would have executed 'Case 1' in that function. So, you
>> If 'Case 1' is executed, this line of code is also done, how about the result?
>> 'Case 1' seems *not* to change node->rb_left at all.
>
> Wait, I believe this line of code is executed only in Case 2 and Case 3 ?
>
>>>> diff --git a/lib/rbtree.c b/lib/rbtree.c
>>>> index c0e31fe..2cb01ba 100644
>>>> --- a/lib/rbtree.c
>>>> +++ b/lib/rbtree.c
>>>> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>>>> */
>>>> sibling = parent->rb_right;
>>>> if (node != sibling) { /* node == parent->rb_left */
>>>> - if (rb_is_red(sibling)) {
>>>> + if (sibling && rb_is_red(sibling)) {
>>>> /*
>>>> * Case 1 - left rotate at parent
>>>> *
>>>
>>> Note the loop invariants quoted just above:
>>>
>>> /*
>>> * Loop invariants:
>>> * - node is black (or NULL on first iteration)
>>> * - node is not the root (parent is not NULL)
>>> * - All leaf paths going through parent and node have a
>>> * black node count that is 1 lower than other leaf paths.
>>> */
>>>
>>> Because of these, each path from sibling to a leaf must include at
>>> least one black node, which implies that sibling can't be NULL - or to
>>> put it another way, if sibling is null then the expected invariants
>>> were violated before we even got there.
>> In theory, i can understand what you mean, But don't know why and
>> where it got violated.
>
> Same here. My point is, I don't think we can fix the issue without
> answering that question.
>
>>> Now I had a quick look at your code and I couldn't tell at which point
>>> the invariants are violated. However I did notice a couple suspicious
>>> things in the very first patch
>>> (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
>>>
>>> 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
>>> destroy rb trees by iterating on each node with rb_next() and then
>> yes, but this item may not been freed immediately, You can know each item
>> has its ref count.
>
> Are items guaranteed to have another refcount than the one we're dropping ?
>
>>> freeing them. Note that rb_next() can reference prior nodes, which
>>> have already been freed in your scheme, so that seems quite unsafe.
>> I checked rb_next() function, and found that if its prior nodes are
>> freed, is this node's parent not NULL?
>
> No, if the parent was freed with just a put() operation, the child
> will still have a pointer to it. This is why I suggested using
> rb_erase() on each node before freeing them, so that we don't keep
> pointers to freed nodes.
>
>>> The simplest fix would be to do a full rb_erase() on each node before
>> full rb_erase()? sorry, i don't get what you mean. Do you mean we
>> should erase all nodes from rbtree, then begin to free them? If yes,
>> how to iterate them? If no, can you elaborate it?
>
> No, I meant to call rb_erase() on each individual node right before
> the corresponding put() operation.
It has been done in current code.
>
>>> 2- I did not look long enough to understand the locking, but it wasn't
>>> clear to me if you lock the rbtrees when doing rb_erase() on them
>>> (while I could more clearly see that you do it for insertions).
>> Yes, it get locking when doing rb_erase() or rb_insert(). You can see
>> there are multiple functions maybe rbtree at the same time. To sync
>> them, we need to lock the rbtree.
>
> Yes, agree we need to lock rbtree in all such operations. I just
> wasn't able to determine if it's done around rb_erase() calls, but it
Yes, the locking has been done around rb_erase().
> definitely needs to be.
>
>>> I'm really not sure if either of these will fix the issues you're
>>> seeing, though. What I would try next would be to add explicit rbtree
>>> invariant checks before and after rbtree manipulations, like what the
>>> check() function does in lib/rbtree_test.c, to see at which point do
>>> they get broken.
>> Great, any progress so far? :)
>
> Unfortunately no.
Look forward to seeing it.
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



--
Regards,

Zhi Yong Wu

2013-09-04 17:22:46

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
> On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
>> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
>>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
>>>> From: Zhi Yong Wu <[email protected]>
>>>>
>>>> Signed-off-by: Zhi Yong Wu <[email protected]>
>>>> ---
>>>> include/linux/rbtree_augmented.h | 3 ++-
>>>> lib/rbtree.c | 5 +++--
>>>> 2 files changed, 5 insertions(+), 3 deletions(-)
>>>
>>> So, you are saying that the checks are necessary, but you are not saying why.
>>>
>>> The way I see it, the checks are *not* necessary, because the rbtree
>>> invariants guarantee them to be true. The only way for the checks to
>>> fail would be if people directly manipulate the rbtrees without going
>>> through the proper APIs, and if they do that then I think they're on
>>> their own. So to me, I think it's the same situation as dereferencing
>>> a pointer without checking if it's NULL, because you know it should
>>> never be NULL - which in my eyes is perfectly acceptable.
>> In my patchset, some rbtree APIs to be invoked, and I think that those
>> rbtree APIs are used corrently, Below is the pointer of its code:
>> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
>> But I hit some issues when using compilebench to do perf benchmark.
>> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
>
> Thanks for the link - I now better understand where you are coming
> from with these fixes.
>
> Going back to the original message:
>
>> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>> index fea49b5..7d19770 100644
>> --- a/include/linux/rbtree_augmented.h
>> +++ b/include/linux/rbtree_augmented.h
>> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>> }
>>
>> successor->rb_left = tmp = node->rb_left;
>> - rb_set_parent(tmp, successor);
>> + if (tmp)
>> + rb_set_parent(tmp, successor);
>>
>> pc = node->__rb_parent_color;
>> tmp = __rb_parent(pc);
>
> Note that node->rb_left was already fetched at the top of
> __rb_erase_augmented(), and was checked to be non-NULL at the time -
> otherwise we would have executed 'Case 1' in that function. So, you
> are not expected to find tmp == NULL here.
>
>> diff --git a/lib/rbtree.c b/lib/rbtree.c
>> index c0e31fe..2cb01ba 100644
>> --- a/lib/rbtree.c
>> +++ b/lib/rbtree.c
>> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> */
>> sibling = parent->rb_right;
>> if (node != sibling) { /* node == parent->rb_left */
>> - if (rb_is_red(sibling)) {
>> + if (sibling && rb_is_red(sibling)) {
>> /*
>> * Case 1 - left rotate at parent
>> *
>
> Note the loop invariants quoted just above:
>
> /*
> * Loop invariants:
> * - node is black (or NULL on first iteration)
> * - node is not the root (parent is not NULL)
> * - All leaf paths going through parent and node have a
> * black node count that is 1 lower than other leaf paths.
> */
>
> Because of these, each path from sibling to a leaf must include at
> least one black node, which implies that sibling can't be NULL - or to
> put it another way, if sibling is null then the expected invariants
> were violated before we even got there.
>
>> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> */
>> parent->rb_right = tmp1 = sibling->rb_left;
>> sibling->rb_left = parent;
>> - rb_set_parent_color(tmp1, parent, RB_BLACK);
>> + if (tmp1)
>> + rb_set_parent_color(tmp1, parent, RB_BLACK);
>> __rb_rotate_set_parents(parent, sibling, root,
>> RB_RED);
>> augment_rotate(parent, sibling);
>
> This is actually the same invariant here - each path from sibling to a
> leaf must include at least one black node, and sibling is now known to
> be red, so it must have two black children.
If sibling is red, it can be made sure to have two non-null black
children? but my patchset sometimes trigger red sibling to have no
non-null black children. Do you know what reason usually cause this?
You know rbtree code is very tricky.

>
>
> Now I had a quick look at your code and I couldn't tell at which point
> the invariants are violated. However I did notice a couple suspicious
> things in the very first patch
> (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
>
> 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
> destroy rb trees by iterating on each node with rb_next() and then
> freeing them. Note that rb_next() can reference prior nodes, which
> have already been freed in your scheme, so that seems quite unsafe.
>
> The simplest fix would be to do a full rb_erase() on each node before
> freeing it. (you may be able to avoid rebalancing the tree here as
> you're going to destroy it all, but if you really have that need it
> would be better to come up with a new API to cover it rather than
> hardcode it where you need it - I think it's easiest to start with the
> simple dumb fix of using rb_erase).
>
> 2- I did not look long enough to understand the locking, but it wasn't
> clear to me if you lock the rbtrees when doing rb_erase() on them
> (while I could more clearly see that you do it for insertions).
>
> I'm really not sure if either of these will fix the issues you're
> seeing, though. What I would try next would be to add explicit rbtree
> invariant checks before and after rbtree manipulations, like what the
> check() function does in lib/rbtree_test.c, to see at which point do
> they get broken.
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.



--
Regards,

Zhi Yong Wu

2013-09-04 23:59:17

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Thu, 2013-09-05 at 01:22 +0800, Zhi Yong Wu wrote:
> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
> > On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
> >> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
> >>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
> >>>> From: Zhi Yong Wu <[email protected]>
> >>>>
> >>>> Signed-off-by: Zhi Yong Wu <[email protected]>
> >>>> ---
> >>>> include/linux/rbtree_augmented.h | 3 ++-
> >>>> lib/rbtree.c | 5 +++--
> >>>> 2 files changed, 5 insertions(+), 3 deletions(-)
> >>>
> >>> So, you are saying that the checks are necessary, but you are not saying why.
> >>>
> >>> The way I see it, the checks are *not* necessary, because the rbtree
> >>> invariants guarantee them to be true. The only way for the checks to
> >>> fail would be if people directly manipulate the rbtrees without going
> >>> through the proper APIs, and if they do that then I think they're on
> >>> their own. So to me, I think it's the same situation as dereferencing
> >>> a pointer without checking if it's NULL, because you know it should
> >>> never be NULL - which in my eyes is perfectly acceptable.
> >> In my patchset, some rbtree APIs to be invoked, and I think that those
> >> rbtree APIs are used corrently, Below is the pointer of its code:
> >> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
> >> But I hit some issues when using compilebench to do perf benchmark.
> >> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
> >
> > Thanks for the link - I now better understand where you are coming
> > from with these fixes.
> >
> > Going back to the original message:
> >
> >> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
> >> index fea49b5..7d19770 100644
> >> --- a/include/linux/rbtree_augmented.h
> >> +++ b/include/linux/rbtree_augmented.h
> >> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
> >> }
> >>
> >> successor->rb_left = tmp = node->rb_left;
> >> - rb_set_parent(tmp, successor);
> >> + if (tmp)
> >> + rb_set_parent(tmp, successor);
> >>
> >> pc = node->__rb_parent_color;
> >> tmp = __rb_parent(pc);
> >
> > Note that node->rb_left was already fetched at the top of
> > __rb_erase_augmented(), and was checked to be non-NULL at the time -
> > otherwise we would have executed 'Case 1' in that function. So, you
> > are not expected to find tmp == NULL here.
> >
> >> diff --git a/lib/rbtree.c b/lib/rbtree.c
> >> index c0e31fe..2cb01ba 100644
> >> --- a/lib/rbtree.c
> >> +++ b/lib/rbtree.c
> >> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
> >> */
> >> sibling = parent->rb_right;
> >> if (node != sibling) { /* node == parent->rb_left */
> >> - if (rb_is_red(sibling)) {
> >> + if (sibling && rb_is_red(sibling)) {
> >> /*
> >> * Case 1 - left rotate at parent
> >> *
> >
> > Note the loop invariants quoted just above:
> >
> > /*
> > * Loop invariants:
> > * - node is black (or NULL on first iteration)
> > * - node is not the root (parent is not NULL)
> > * - All leaf paths going through parent and node have a
> > * black node count that is 1 lower than other leaf paths.
> > */
> >
> > Because of these, each path from sibling to a leaf must include at
> > least one black node, which implies that sibling can't be NULL - or to
> > put it another way, if sibling is null then the expected invariants
> > were violated before we even got there.
> >
> >> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
> >> */
> >> parent->rb_right = tmp1 = sibling->rb_left;
> >> sibling->rb_left = parent;
> >> - rb_set_parent_color(tmp1, parent, RB_BLACK);
> >> + if (tmp1)
> >> + rb_set_parent_color(tmp1, parent, RB_BLACK);
> >> __rb_rotate_set_parents(parent, sibling, root,
> >> RB_RED);
> >> augment_rotate(parent, sibling);
> >
> > This is actually the same invariant here - each path from sibling to a
> > leaf must include at least one black node, and sibling is now known to
> > be red, so it must have two black children.
> If sibling is red, it can be made sure to have two non-null black
> children?

This is guaranteed by cases 1 and 2 in __rb_insert().

> but my patchset sometimes trigger red sibling to have no
> non-null black children. Do you know what reason usually cause this?
> You know rbtree code is very tricky.

I haven't looked at your code, but a good way of verifying the tree
integrity is running rbtree_test.

Thanks,
Davidlohr

>
> >
> >
> > Now I had a quick look at your code and I couldn't tell at which point
> > the invariants are violated. However I did notice a couple suspicious
> > things in the very first patch
> > (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
> >
> > 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
> > destroy rb trees by iterating on each node with rb_next() and then
> > freeing them. Note that rb_next() can reference prior nodes, which
> > have already been freed in your scheme, so that seems quite unsafe.
> >
> > The simplest fix would be to do a full rb_erase() on each node before
> > freeing it. (you may be able to avoid rebalancing the tree here as
> > you're going to destroy it all, but if you really have that need it
> > would be better to come up with a new API to cover it rather than
> > hardcode it where you need it - I think it's easiest to start with the
> > simple dumb fix of using rb_erase).
> >
> > 2- I did not look long enough to understand the locking, but it wasn't
> > clear to me if you lock the rbtrees when doing rb_erase() on them
> > (while I could more clearly see that you do it for insertions).
> >
> > I'm really not sure if either of these will fix the issues you're
> > seeing, though. What I would try next would be to add explicit rbtree
> > invariant checks before and after rbtree manipulations, like what the
> > check() function does in lib/rbtree_test.c, to see at which point do
> > they get broken.
> >
> > --
> > Michel "Walken" Lespinasse
> > A program is never fully debugged until the last user dies.
>
>
>

2013-09-05 00:37:21

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Thu, Sep 5, 2013 at 7:59 AM, Davidlohr Bueso <[email protected]> wrote:
> On Thu, 2013-09-05 at 01:22 +0800, Zhi Yong Wu wrote:
>> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
>> > On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
>> >> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
>> >>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
>> >>>> From: Zhi Yong Wu <[email protected]>
>> >>>>
>> >>>> Signed-off-by: Zhi Yong Wu <[email protected]>
>> >>>> ---
>> >>>> include/linux/rbtree_augmented.h | 3 ++-
>> >>>> lib/rbtree.c | 5 +++--
>> >>>> 2 files changed, 5 insertions(+), 3 deletions(-)
>> >>>
>> >>> So, you are saying that the checks are necessary, but you are not saying why.
>> >>>
>> >>> The way I see it, the checks are *not* necessary, because the rbtree
>> >>> invariants guarantee them to be true. The only way for the checks to
>> >>> fail would be if people directly manipulate the rbtrees without going
>> >>> through the proper APIs, and if they do that then I think they're on
>> >>> their own. So to me, I think it's the same situation as dereferencing
>> >>> a pointer without checking if it's NULL, because you know it should
>> >>> never be NULL - which in my eyes is perfectly acceptable.
>> >> In my patchset, some rbtree APIs to be invoked, and I think that those
>> >> rbtree APIs are used corrently, Below is the pointer of its code:
>> >> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
>> >> But I hit some issues when using compilebench to do perf benchmark.
>> >> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
>> >
>> > Thanks for the link - I now better understand where you are coming
>> > from with these fixes.
>> >
>> > Going back to the original message:
>> >
>> >> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>> >> index fea49b5..7d19770 100644
>> >> --- a/include/linux/rbtree_augmented.h
>> >> +++ b/include/linux/rbtree_augmented.h
>> >> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>> >> }
>> >>
>> >> successor->rb_left = tmp = node->rb_left;
>> >> - rb_set_parent(tmp, successor);
>> >> + if (tmp)
>> >> + rb_set_parent(tmp, successor);
>> >>
>> >> pc = node->__rb_parent_color;
>> >> tmp = __rb_parent(pc);
>> >
>> > Note that node->rb_left was already fetched at the top of
>> > __rb_erase_augmented(), and was checked to be non-NULL at the time -
>> > otherwise we would have executed 'Case 1' in that function. So, you
>> > are not expected to find tmp == NULL here.
>> >
>> >> diff --git a/lib/rbtree.c b/lib/rbtree.c
>> >> index c0e31fe..2cb01ba 100644
>> >> --- a/lib/rbtree.c
>> >> +++ b/lib/rbtree.c
>> >> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> >> */
>> >> sibling = parent->rb_right;
>> >> if (node != sibling) { /* node == parent->rb_left */
>> >> - if (rb_is_red(sibling)) {
>> >> + if (sibling && rb_is_red(sibling)) {
>> >> /*
>> >> * Case 1 - left rotate at parent
>> >> *
>> >
>> > Note the loop invariants quoted just above:
>> >
>> > /*
>> > * Loop invariants:
>> > * - node is black (or NULL on first iteration)
>> > * - node is not the root (parent is not NULL)
>> > * - All leaf paths going through parent and node have a
>> > * black node count that is 1 lower than other leaf paths.
>> > */
>> >
>> > Because of these, each path from sibling to a leaf must include at
>> > least one black node, which implies that sibling can't be NULL - or to
>> > put it another way, if sibling is null then the expected invariants
>> > were violated before we even got there.
>> >
>> >> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> >> */
>> >> parent->rb_right = tmp1 = sibling->rb_left;
>> >> sibling->rb_left = parent;
>> >> - rb_set_parent_color(tmp1, parent, RB_BLACK);
>> >> + if (tmp1)
>> >> + rb_set_parent_color(tmp1, parent, RB_BLACK);
>> >> __rb_rotate_set_parents(parent, sibling, root,
>> >> RB_RED);
>> >> augment_rotate(parent, sibling);
>> >
>> > This is actually the same invariant here - each path from sibling to a
>> > leaf must include at least one black node, and sibling is now known to
>> > be red, so it must have two black children.
>> If sibling is red, it can be made sure to have two non-null black
>> children?
>
> This is guaranteed by cases 1 and 2 in __rb_insert().
ah, but this code is very tricky.

>
>> but my patchset sometimes trigger red sibling to have no
>> non-null black children. Do you know what reason usually cause this?
>> You know rbtree code is very tricky.
>
> I haven't looked at your code, but a good way of verifying the tree
> integrity is running rbtree_test.
rbtree_test seem to be not available for my patchset. my perf testing
is super large scale, and it will try to create 1,0000,000 rb_nodes,
while rbtree_test try to verify the rbtree when every rb_node is
reserted or erased. This will cause my perf testing to be running very
very slowly.

>
> Thanks,
> Davidlohr
>
>>
>> >
>> >
>> > Now I had a quick look at your code and I couldn't tell at which point
>> > the invariants are violated. However I did notice a couple suspicious
>> > things in the very first patch
>> > (f5c8f2b256d87ac0bf789a787e6b795ac0c736e8):
>> >
>> > 1- In both hot_range_tree_free() and and hot_tree_exit(), you try to
>> > destroy rb trees by iterating on each node with rb_next() and then
>> > freeing them. Note that rb_next() can reference prior nodes, which
>> > have already been freed in your scheme, so that seems quite unsafe.
>> >
>> > The simplest fix would be to do a full rb_erase() on each node before
>> > freeing it. (you may be able to avoid rebalancing the tree here as
>> > you're going to destroy it all, but if you really have that need it
>> > would be better to come up with a new API to cover it rather than
>> > hardcode it where you need it - I think it's easiest to start with the
>> > simple dumb fix of using rb_erase).
>> >
>> > 2- I did not look long enough to understand the locking, but it wasn't
>> > clear to me if you lock the rbtrees when doing rb_erase() on them
>> > (while I could more clearly see that you do it for insertions).
>> >
>> > I'm really not sure if either of these will fix the issues you're
>> > seeing, though. What I would try next would be to add explicit rbtree
>> > invariant checks before and after rbtree manipulations, like what the
>> > check() function does in lib/rbtree_test.c, to see at which point do
>> > they get broken.
>> >
>> > --
>> > Michel "Walken" Lespinasse
>> > A program is never fully debugged until the last user dies.
>>
>>
>>
>
>



--
Regards,

Zhi Yong Wu

2013-09-05 01:12:24

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Thu, 2013-09-05 at 08:37 +0800, Zhi Yong Wu wrote:
> On Thu, Sep 5, 2013 at 7:59 AM, Davidlohr Bueso <[email protected]> wrote:
> > On Thu, 2013-09-05 at 01:22 +0800, Zhi Yong Wu wrote:
> >> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
> >> > On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
> >> >> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
> >> >>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
> >> >>>> From: Zhi Yong Wu <[email protected]>
> >> >>>>
> >> >>>> Signed-off-by: Zhi Yong Wu <[email protected]>
> >> >>>> ---
> >> >>>> include/linux/rbtree_augmented.h | 3 ++-
> >> >>>> lib/rbtree.c | 5 +++--
> >> >>>> 2 files changed, 5 insertions(+), 3 deletions(-)
> >> >>>
> >> >>> So, you are saying that the checks are necessary, but you are not saying why.
> >> >>>
> >> >>> The way I see it, the checks are *not* necessary, because the rbtree
> >> >>> invariants guarantee them to be true. The only way for the checks to
> >> >>> fail would be if people directly manipulate the rbtrees without going
> >> >>> through the proper APIs, and if they do that then I think they're on
> >> >>> their own. So to me, I think it's the same situation as dereferencing
> >> >>> a pointer without checking if it's NULL, because you know it should
> >> >>> never be NULL - which in my eyes is perfectly acceptable.
> >> >> In my patchset, some rbtree APIs to be invoked, and I think that those
> >> >> rbtree APIs are used corrently, Below is the pointer of its code:
> >> >> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
> >> >> But I hit some issues when using compilebench to do perf benchmark.
> >> >> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
> >> >
> >> > Thanks for the link - I now better understand where you are coming
> >> > from with these fixes.
> >> >
> >> > Going back to the original message:
> >> >
> >> >> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
> >> >> index fea49b5..7d19770 100644
> >> >> --- a/include/linux/rbtree_augmented.h
> >> >> +++ b/include/linux/rbtree_augmented.h
> >> >> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
> >> >> }
> >> >>
> >> >> successor->rb_left = tmp = node->rb_left;
> >> >> - rb_set_parent(tmp, successor);
> >> >> + if (tmp)
> >> >> + rb_set_parent(tmp, successor);
> >> >>
> >> >> pc = node->__rb_parent_color;
> >> >> tmp = __rb_parent(pc);
> >> >
> >> > Note that node->rb_left was already fetched at the top of
> >> > __rb_erase_augmented(), and was checked to be non-NULL at the time -
> >> > otherwise we would have executed 'Case 1' in that function. So, you
> >> > are not expected to find tmp == NULL here.
> >> >
> >> >> diff --git a/lib/rbtree.c b/lib/rbtree.c
> >> >> index c0e31fe..2cb01ba 100644
> >> >> --- a/lib/rbtree.c
> >> >> +++ b/lib/rbtree.c
> >> >> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
> >> >> */
> >> >> sibling = parent->rb_right;
> >> >> if (node != sibling) { /* node == parent->rb_left */
> >> >> - if (rb_is_red(sibling)) {
> >> >> + if (sibling && rb_is_red(sibling)) {
> >> >> /*
> >> >> * Case 1 - left rotate at parent
> >> >> *
> >> >
> >> > Note the loop invariants quoted just above:
> >> >
> >> > /*
> >> > * Loop invariants:
> >> > * - node is black (or NULL on first iteration)
> >> > * - node is not the root (parent is not NULL)
> >> > * - All leaf paths going through parent and node have a
> >> > * black node count that is 1 lower than other leaf paths.
> >> > */
> >> >
> >> > Because of these, each path from sibling to a leaf must include at
> >> > least one black node, which implies that sibling can't be NULL - or to
> >> > put it another way, if sibling is null then the expected invariants
> >> > were violated before we even got there.
> >> >
> >> >> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
> >> >> */
> >> >> parent->rb_right = tmp1 = sibling->rb_left;
> >> >> sibling->rb_left = parent;
> >> >> - rb_set_parent_color(tmp1, parent, RB_BLACK);
> >> >> + if (tmp1)
> >> >> + rb_set_parent_color(tmp1, parent, RB_BLACK);
> >> >> __rb_rotate_set_parents(parent, sibling, root,
> >> >> RB_RED);
> >> >> augment_rotate(parent, sibling);
> >> >
> >> > This is actually the same invariant here - each path from sibling to a
> >> > leaf must include at least one black node, and sibling is now known to
> >> > be red, so it must have two black children.
> >> If sibling is red, it can be made sure to have two non-null black
> >> children?
> >
> > This is guaranteed by cases 1 and 2 in __rb_insert().
> ah, but this code is very tricky.
>

While it's not trivial, it is a lot more readable than a whole bunch of
red-black tree implementations out there - not to mention optimized.

AFAICT by the thread, you have yet to provide a case where, by properly
using the rbtree API, the tree implementation does not comply.

> >
> >> but my patchset sometimes trigger red sibling to have no
> >> non-null black children. Do you know what reason usually cause this?
> >> You know rbtree code is very tricky.
> >
> > I haven't looked at your code, but a good way of verifying the tree
> > integrity is running rbtree_test.
> rbtree_test seem to be not available for my patchset.

Why not? Is this an older kernel you're dealing with?

> my perf testing
> is super large scale, and it will try to create 1,0000,000 rb_nodes,
> while rbtree_test try to verify the rbtree when every rb_node is
> reserted or erased. This will cause my perf testing to be running very
> very slowly.
>

May I ask what you are attempting to do here? Are you trying to stress
the kernel's rbtree implementation?

Well, performance isn't a concern when doing this kind of testing. Yes,
the tree integrity verification (check() calls) is done when inserting
and deleting every node, which is the whole purpose of such tests. I
admit that the rbtree_test module is a bit limited as to user options -
ie: making the amount of nodes be a parameter is on my todo list. That
said, the check() function does properly test the rbtree properties, and
so far it complies.

Thanks,
Davidlohr

2013-09-05 01:25:42

by Zhi Yong Wu

[permalink] [raw]
Subject: Re: [PATCH] rbtree: Add some necessary condition checks

On Thu, Sep 5, 2013 at 9:12 AM, Davidlohr Bueso <[email protected]> wrote:
> On Thu, 2013-09-05 at 08:37 +0800, Zhi Yong Wu wrote:
>> On Thu, Sep 5, 2013 at 7:59 AM, Davidlohr Bueso <[email protected]> wrote:
>> > On Thu, 2013-09-05 at 01:22 +0800, Zhi Yong Wu wrote:
>> >> On Mon, Sep 2, 2013 at 4:57 PM, Michel Lespinasse <[email protected]> wrote:
>> >> > On Sun, Sep 1, 2013 at 11:30 PM, Zhi Yong Wu <[email protected]> wrote:
>> >> >> In Tue, Aug 27, 2013 at 6:01 AM, Michel Lespinasse <[email protected]> wrote:
>> >> >>> On Fri, Aug 23, 2013 at 7:45 AM, <[email protected]> wrote:
>> >> >>>> From: Zhi Yong Wu <[email protected]>
>> >> >>>>
>> >> >>>> Signed-off-by: Zhi Yong Wu <[email protected]>
>> >> >>>> ---
>> >> >>>> include/linux/rbtree_augmented.h | 3 ++-
>> >> >>>> lib/rbtree.c | 5 +++--
>> >> >>>> 2 files changed, 5 insertions(+), 3 deletions(-)
>> >> >>>
>> >> >>> So, you are saying that the checks are necessary, but you are not saying why.
>> >> >>>
>> >> >>> The way I see it, the checks are *not* necessary, because the rbtree
>> >> >>> invariants guarantee them to be true. The only way for the checks to
>> >> >>> fail would be if people directly manipulate the rbtrees without going
>> >> >>> through the proper APIs, and if they do that then I think they're on
>> >> >>> their own. So to me, I think it's the same situation as dereferencing
>> >> >>> a pointer without checking if it's NULL, because you know it should
>> >> >>> never be NULL - which in my eyes is perfectly acceptable.
>> >> >> In my patchset, some rbtree APIs to be invoked, and I think that those
>> >> >> rbtree APIs are used corrently, Below is the pointer of its code:
>> >> >> https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
>> >> >> But I hit some issues when using compilebench to do perf benchmark.
>> >> >> compile dir kernel-7 691MB in 8.92 seconds (77.53 MB/s)
>> >> >
>> >> > Thanks for the link - I now better understand where you are coming
>> >> > from with these fixes.
>> >> >
>> >> > Going back to the original message:
>> >> >
>> >> >> diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
>> >> >> index fea49b5..7d19770 100644
>> >> >> --- a/include/linux/rbtree_augmented.h
>> >> >> +++ b/include/linux/rbtree_augmented.h
>> >> >> @@ -199,7 +199,8 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
>> >> >> }
>> >> >>
>> >> >> successor->rb_left = tmp = node->rb_left;
>> >> >> - rb_set_parent(tmp, successor);
>> >> >> + if (tmp)
>> >> >> + rb_set_parent(tmp, successor);
>> >> >>
>> >> >> pc = node->__rb_parent_color;
>> >> >> tmp = __rb_parent(pc);
>> >> >
>> >> > Note that node->rb_left was already fetched at the top of
>> >> > __rb_erase_augmented(), and was checked to be non-NULL at the time -
>> >> > otherwise we would have executed 'Case 1' in that function. So, you
>> >> > are not expected to find tmp == NULL here.
>> >> >
>> >> >> diff --git a/lib/rbtree.c b/lib/rbtree.c
>> >> >> index c0e31fe..2cb01ba 100644
>> >> >> --- a/lib/rbtree.c
>> >> >> +++ b/lib/rbtree.c
>> >> >> @@ -214,7 +214,7 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> >> >> */
>> >> >> sibling = parent->rb_right;
>> >> >> if (node != sibling) { /* node == parent->rb_left */
>> >> >> - if (rb_is_red(sibling)) {
>> >> >> + if (sibling && rb_is_red(sibling)) {
>> >> >> /*
>> >> >> * Case 1 - left rotate at parent
>> >> >> *
>> >> >
>> >> > Note the loop invariants quoted just above:
>> >> >
>> >> > /*
>> >> > * Loop invariants:
>> >> > * - node is black (or NULL on first iteration)
>> >> > * - node is not the root (parent is not NULL)
>> >> > * - All leaf paths going through parent and node have a
>> >> > * black node count that is 1 lower than other leaf paths.
>> >> > */
>> >> >
>> >> > Because of these, each path from sibling to a leaf must include at
>> >> > least one black node, which implies that sibling can't be NULL - or to
>> >> > put it another way, if sibling is null then the expected invariants
>> >> > were violated before we even got there.
>> >> >
>> >> >> @@ -226,7 +226,8 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
>> >> >> */
>> >> >> parent->rb_right = tmp1 = sibling->rb_left;
>> >> >> sibling->rb_left = parent;
>> >> >> - rb_set_parent_color(tmp1, parent, RB_BLACK);
>> >> >> + if (tmp1)
>> >> >> + rb_set_parent_color(tmp1, parent, RB_BLACK);
>> >> >> __rb_rotate_set_parents(parent, sibling, root,
>> >> >> RB_RED);
>> >> >> augment_rotate(parent, sibling);
>> >> >
>> >> > This is actually the same invariant here - each path from sibling to a
>> >> > leaf must include at least one black node, and sibling is now known to
>> >> > be red, so it must have two black children.
>> >> If sibling is red, it can be made sure to have two non-null black
>> >> children?
>> >
>> > This is guaranteed by cases 1 and 2 in __rb_insert().
>> ah, but this code is very tricky.
>>
>
> While it's not trivial, it is a lot more readable than a whole bunch of
> red-black tree implementations out there - not to mention optimized.
>
> AFAICT by the thread, you have yet to provide a case where, by properly
> using the rbtree API, the tree implementation does not comply.
>
>> >
>> >> but my patchset sometimes trigger red sibling to have no
>> >> non-null black children. Do you know what reason usually cause this?
>> >> You know rbtree code is very tricky.
>> >
>> > I haven't looked at your code, but a good way of verifying the tree
>> > integrity is running rbtree_test.
>> rbtree_test seem to be not available for my patchset.
>
> Why not? Is this an older kernel you're dealing with?
It was built with latest kernel upstream. As i said below, it will
cause this perf testing to be running very very slowly.
>
>> my perf testing
>> is super large scale, and it will try to create 1,0000,000 rb_nodes,
>> while rbtree_test try to verify the rbtree when every rb_node is
>> reserted or erased. This will cause my perf testing to be running very
>> very slowly.
>>
>
> May I ask what you are attempting to do here? Are you trying to stress
My patchset is using the rbtree to record I/O frequency for each inode
and its range in VFS layer. Its rbtree may be very, very big if you
create a lot of files.
> the kernel's rbtree implementation?
No. I am trying to stress the rbtree which is created by my patchset.
Below is my patchset, and i don't find it is calling the rbtree APIs
uncorrectly. right?
https://github.com/wuzhy/kernel/compare/torvalds:master...hot_tracking
>
> Well, performance isn't a concern when doing this kind of testing. Yes,
> the tree integrity verification (check() calls) is done when inserting
> and deleting every node, which is the whole purpose of such tests. I
> admit that the rbtree_test module is a bit limited as to user options -
> ie: making the amount of nodes be a parameter is on my todo list. That
> said, the check() function does properly test the rbtree properties, and
> so far it complies.
>
> Thanks,
> Davidlohr
>



--
Regards,

Zhi Yong Wu