2008-01-06 22:03:28

by Benjamin LaHaise

[permalink] [raw]
Subject: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

Hello folks,

2.6.24-rc6 regresses on the 10000 network interface creation test relative to
2.6.23. The cause appears to be the new code in sysctl_check_lookup(), which
shows up as the #1 item while profiling. Is a revert of this new code
possible until its scaling issues are fixed? 2.6.23 can do more than 100 new
network interfaces per second for the first few thousand devices, but with
2.6.24-rc6 the results drop off rather dramatically to less than 10 interfaces
per second. The 10000 interface test is unbearable with the new sysctl_check
code.

time to creat 100 more interfaces
interfaces v2.6.24-rc6 sysctl_check disabled
0 0.729s 0.222s
100 1.791s 0.223s
200 3.966s 0.230s
300 7.460s 0.236s
400 10.747s 0.241s
500 13.633s 0.252s

samples % app name symbol name
524598 33.4231 vmlinux sysctl_check_lookup
297996 18.9859 vmlinux cpu_idle
130263 8.2993 vmlinux __rcu_pending
123953 7.8973 vmlinux sysctl_head_next
121691 7.7532 vmlinux quicklist_trim
89624 5.7101 vmlinux strcmp
87257 5.5593 vmlinux rcu_pending
78475 4.9998 vmlinux poll_idle
43721 2.7855 vmlinux check_pgt_cache
34454 2.1951 vmlinux sysctl_parent
7494 0.4775 vmlinux rt_run_flush

-ben


diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index a68425a..a4468e2 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1459,6 +1459,8 @@ static void sysctl_check_bin_path(struct ctl_table *table, const char **fail)
int sysctl_check_table(struct ctl_table *table)
{
int error = 0;
+ return 0;
+
for (; table->ctl_name || table->procname; table++) {
const char *fail = NULL;


2008-01-07 06:58:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

Benjamin LaHaise <[email protected]> writes:

> Hello folks,
>
> 2.6.24-rc6 regresses on the 10000 network interface creation test relative to
> 2.6.23. The cause appears to be the new code in sysctl_check_lookup(), which
> shows up as the #1 item while profiling. Is a revert of this new code
> possible until its scaling issues are fixed? 2.6.23 can do more than 100 new
> network interfaces per second for the first few thousand devices, but with
> 2.6.24-rc6 the results drop off rather dramatically to less than 10 interfaces
> per second. The 10000 interface test is unbearable with the new sysctl_check
> code.

Why do we need 10000 interfaces? Why isn't network device creation a
slow path?

The problem seems to be in the data structures used by sysctl.
You are increasing the length of the linked list each time you
add a network interface. So sysctl lookups slow down.
At 10000 entries that is a long linked list walk.

At 100000 things get even longer. Now the numbers you report still
seem like a lot of time to me. My guess would be that we are getting
badly into cache miss territory.

If what you describe is a real scenario where users care we need
to fix the sysctl data structures so that they scale.

Because of this bug report and another one I got earlier today
about a real bug in the parallel port code detected by the very
lookup that is slowing you down. I am quite reluctant to contemplate
pulling this code. It seems to be doing it's job, if in some cases
uncomfortably so.

So is this a bug report telling me that there are users with
10k or 100k interfaces that care. So we need to fix sysctl.

Is there a specific kernel test case that is run often that having
slow sysctl performance matters for? CONFIG_SYSCTL=n should solve
that if it is specialized.

Eric

2008-01-07 07:10:50

by David Miller

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

From: [email protected] (Eric W. Biederman)
Date: Sun, 06 Jan 2008 23:57:57 -0700

> Why do we need 10000 interfaces? Why isn't network device creation a
> slow path?

Because people create virtual devices like mad.

> So is this a bug report telling me that there are users with
> 10k or 100k interfaces that care. So we need to fix sysctl.

Unquestionably, we do, it's a major regression.

People create thousands of VLAN devices, as one of many examples, all
the time.

That's why we even bother hashing network devices in the network code.

2008-01-07 10:25:54

by Eric W. Biederman

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

David Miller <[email protected]> writes:

> From: [email protected] (Eric W. Biederman)
> Date: Sun, 06 Jan 2008 23:57:57 -0700
>
>> Why do we need 10000 interfaces? Why isn't network device creation a
>> slow path?
>
> Because people create virtual devices like mad.
>
>> So is this a bug report telling me that there are users with
>> 10k or 100k interfaces that care. So we need to fix sysctl.
>
> Unquestionably, we do, it's a major regression.
>
> People create thousands of VLAN devices, as one of many examples, all
> the time.
>
> That's why we even bother hashing network devices in the network code.

Cool thanks. Although I think that was only a 256 way hash. So it
is a bit stretched at 10,000 chain length of 39 and approaching ugly
at 100,000. Still it should perform much better the sysctl.

I think someone failed to notice that using /proc/sys slowed to a crawl
in that event, and now that I am doing a lookup on register it seems to
showing up in the benchmarks.

At 256 or fewer that we that the network device hash is optimized for
the sysctl data structures didn't look to be falling over to badly.
2s vs .2s if I read Benjamin numbers right.

Given that it is late in the release cycle (so we can't do the
surgery that it appears the internal sysctl data structures need)
I propose doing something like the patch below.

Benjamin can you test the patch below and tell me if it also
keeps the network device performance at acceptable levels.

I really don't want to remove the check for invalid binary sysctl names
or the other ones if I can help it. But that should be a small constant
cost and not make things progressively worse.

Eric

diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index a68425a..d69ef6d 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -1343,6 +1343,7 @@ static void sysctl_repair_table(struct ctl_table *table)
}
}

+#if 0
static struct ctl_table *sysctl_check_lookup(struct ctl_table *table)
{
struct ctl_table_header *head;
@@ -1385,6 +1386,7 @@ out:
sysctl_head_finish(head);
return ref;
}
+#endif

static void set_fail(const char **fail, struct ctl_table *table, const char *str)
{
@@ -1397,6 +1399,10 @@ static void set_fail(const char **fail, struct ctl_table *table, const char *str
*fail = str;
}

+#if 0
+/* Temporarily disabled to improve network device creation speed
+ * Reenable after we have fixed the sysctl data structures.
+ */
static int sysctl_check_dir(struct ctl_table *table)
{
struct ctl_table *ref;
@@ -1436,6 +1442,15 @@ static void sysctl_check_leaf(struct ctl_table *table, const char **fail)
if (ref && (ref != table))
set_fail(fail, table, "Sysctl already exists");
}
+#else
+static int sysctl_check_dir(struct ctl_table *table)
+{
+ return 0;
+}
+static void sysctl_check_leaf(struct ctl_table *table, const char **fail)
+{
+}
+#endif

static void sysctl_check_bin_path(struct ctl_table *table, const char **fail)
{

2008-01-07 20:45:31

by Andi Kleen

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

David Miller <[email protected]> writes:
>
>> So is this a bug report telling me that there are users with
>> 10k or 100k interfaces that care. So we need to fix sysctl.
>
> Unquestionably, we do, it's a major regression.
>
> People create thousands of VLAN devices, as one of many examples, all
> the time.

It might be an reasonable option to just stop creating sysctl entries
for interfaces after some threshold. I presume people who have that
many interfaces will mostly work through {default,all}/* anyways.

I think that would be a better option than to complicate sysctl.c
for this uncommon case.

-Andi

2008-01-07 21:33:48

by Alan

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

> I think that would be a better option than to complicate sysctl.c
> for this uncommon case.

What is so complicated about hashing the entries if you are checking for
duplicates when debugging. You can set the hash function to "0" and the
array size to 1 when the debug is off and it'll all go away nicely

2008-01-07 22:55:30

by Andi Kleen

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

On Mon, Jan 07, 2008 at 09:30:54PM +0000, Alan Cox wrote:
> > I think that would be a better option than to complicate sysctl.c
> > for this uncommon case.
>
> What is so complicated about hashing the entries if you are checking for

One thing I'm worrying about is memory bloat (yes I know that's not
popular but someone has to do it ;-)

You would need a hash table for each table. To handle 100k entries
you would need a larger hash tables with at least a few hundred entries.
And that for each subdirectory.

% find /proc/sys -type d | wc -l
64

Assuming e.g. a 128 byte entry hash table (which is probably too small
for 100k entries) that would require 64 * 128 * 8 = 64k of memory.
Not gigantic, but lots of small fry bloat also adds up. Now if you
chose an actually realistic hash table size it gets even bigger.

Most likely you would need to implement a tree or a resizeable hash table
to do this sanely and then you quickly go into complicated territory.

> duplicates when debugging. You can set the hash function to "0" and the

My understanding was that the code was always on; not only for debugging.

-Andi

2008-01-08 01:47:23

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: regression: sysctl_check changes in 2.6.24 are O(n) resulting in slow creation of 10000 network interfaces

On Mon, Jan 07, 2008 at 03:25:00AM -0700, Eric W. Biederman wrote:
> I think someone failed to notice that using /proc/sys slowed to a crawl
> in that event, and now that I am doing a lookup on register it seems to
> showing up in the benchmarks.

The directory that is problematic rarely gets accessed. Fixing sysfs
properly at some point would be a good idea, it's just one of those things
that isn't quite high priority yet.

> Benjamin can you test the patch below and tell me if it also
> keeps the network device performance at acceptable levels.

I can confirm that this patch does indeed put the regression at bay for
now.

-ben