2010-11-12 05:45:52

by Andres Salomon

[permalink] [raw]
Subject: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)


Calling alloc_bootmem() for tiny chunks of memory over and over is really
slow; on an XO-1, it caused the time between when the kernel started
booting and when the display came alive (post-lxfb probe) to increase
to 44s. This patch optimizes the prom_early_alloc function by
calling alloc_bootmem for 4k-sized blocks of memory, and handing out
chunks of that to callers. With this patch, the time between kernel load
and display initialization decreased to 23s. If there's a better way to
do this early in the boot process, please let me know.

(Note: increasing the chunk size to 16k didn't noticably affect boot time,
and wasted 9k.)

v2: reorder function as suggested by Grant.

Signed-off-by: Andres Salomon <[email protected]>
---
arch/x86/platform/olpc/olpc_dt.c | 27 ++++++++++++++++++++++-----
1 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/olpc/olpc_dt.c b/arch/x86/platform/olpc/olpc_dt.c
index b8c8ff9..0ab824d 100644
--- a/arch/x86/platform/olpc/olpc_dt.c
+++ b/arch/x86/platform/olpc/olpc_dt.c
@@ -126,14 +126,31 @@ static unsigned int prom_early_allocated __initdata;

void * __init prom_early_alloc(unsigned long size)
{
+ static u8 *mem = NULL;
+ static size_t free_mem = 0;
void *res;

- res = alloc_bootmem(size);
- if (res)
- memset(res, 0, size);
-
- prom_early_allocated += size;
+ if (free_mem < size) {
+ const size_t chunk_size = max(PAGE_SIZE, size);
+
+ /*
+ * To mimimize the number of allocations, grab at least 4k of
+ * memory (that's an arbitrary choice that matches PAGE_SIZE on
+ * the platforms we care about, and minimizes wasted bootmem)
+ * and hand off chunks of it to callers.
+ */
+ res = mem = alloc_bootmem(chunk_size);
+ if (!res)
+ return NULL;
+ prom_early_allocated += chunk_size;
+ memset(res, 0, chunk_size);
+ free_mem = chunk_size;
+ }

+ /* allocate from the local cache */
+ free_mem -= size;
+ res = mem;
+ mem += size;
return res;
}

--
1.7.2.3


2010-11-12 07:48:33

by Milton Miller

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On Thu, 11 Nov 2010 around 21:45:46 -0800, Andres Salomon wrote:
> diff --git a/arch/x86/platform/olpc/olpc_dt.c b/arch/x86/platform/olpc/olpc_dt.c
> index b8c8ff9..0ab824d 100644
> --- a/arch/x86/platform/olpc/olpc_dt.c
> +++ b/arch/x86/platform/olpc/olpc_dt.c
> @@ -126,14 +126,31 @@ static unsigned int prom_early_allocated __initdata;
>
> void * __init prom_early_alloc(unsigned long size)
> {
> + static u8 *mem = NULL;
> + static size_t free_mem = 0;

Static variables are implicitly 0 and NULL

> void *res;
>
> - res = alloc_bootmem(size);
> - if (res)
> - memset(res, 0, size);
> -
> - prom_early_allocated += size;
> + if (free_mem < size) {
> + const size_t chunk_size = max(PAGE_SIZE, size);
> +
> + /*
> + * To mimimize the number of allocations, grab at least 4k of
> + * memory (that's an arbitrary choice that matches PAGE_SIZE on
> + * the platforms we care about, and minimizes wasted bootmem)
> + * and hand off chunks of it to callers.
> + */
> + res = mem = alloc_bootmem(chunk_size);
> + if (!res)
> + return NULL;

Oops. If alloc_bootmem fails, we loose mem but don't reset free_mem, so a
later call (possibly for a smaller chunk) may return memory starting
at NULL.

I suggest just assinging res above and then add mem = res inside this if.

Oh, this is alloc_bootmem not alloc_bootmem_nopainc ... should it be?

> + prom_early_allocated += chunk_size;
> + memset(res, 0, chunk_size);
> + free_mem = chunk_size;
> + }
>
> + /* allocate from the local cache */
> + free_mem -= size;
> + res = mem;
> + mem += size;
> return res;
> }
>

milton

2010-11-12 08:27:11

by Andres Salomon

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On Fri, 12 Nov 2010 01:48:30 -0600
Milton Miller <[email protected]> wrote:

> On Thu, 11 Nov 2010 around 21:45:46 -0800, Andres Salomon wrote:
> > diff --git a/arch/x86/platform/olpc/olpc_dt.c
> > b/arch/x86/platform/olpc/olpc_dt.c index b8c8ff9..0ab824d 100644
> > --- a/arch/x86/platform/olpc/olpc_dt.c
> > +++ b/arch/x86/platform/olpc/olpc_dt.c
> > @@ -126,14 +126,31 @@ static unsigned int prom_early_allocated
> > __initdata;
> > void * __init prom_early_alloc(unsigned long size)
> > {
> > + static u8 *mem = NULL;
> > + static size_t free_mem = 0;
>
> Static variables are implicitly 0 and NULL

That's true for global static variables; is it also true for static
variables that are local to a function?


>
> > void *res;
> >
> > - res = alloc_bootmem(size);
> > - if (res)
> > - memset(res, 0, size);
> > -
> > - prom_early_allocated += size;
> > + if (free_mem < size) {
> > + const size_t chunk_size = max(PAGE_SIZE, size);
> > +
> > + /*
> > + * To mimimize the number of allocations, grab at
> > least 4k of
> > + * memory (that's an arbitrary choice that matches
> > PAGE_SIZE on
> > + * the platforms we care about, and minimizes
> > wasted bootmem)
> > + * and hand off chunks of it to callers.
> > + */
> > + res = mem = alloc_bootmem(chunk_size);
> > + if (!res)
> > + return NULL;
>
> Oops. If alloc_bootmem fails, we loose mem but don't reset free_mem,
> so a later call (possibly for a smaller chunk) may return memory
> starting at NULL.

You're right, good catch. Thanks!


>
> I suggest just assinging res above and then add mem = res inside this
> if.
>
> Oh, this is alloc_bootmem not alloc_bootmem_nopainc ... should it be?
>

Currently, nothing in drivers/of/pdt.c actually checks alloc_bootmem
results.. so we're panicking whether we like it or not. It would be
good to actually handle memory failures gracefully and not oops (the DT
certainly isn't essential to booting in OLPC's case), but that's a
separate patch (series).


> > + prom_early_allocated += chunk_size;
> > + memset(res, 0, chunk_size);
> > + free_mem = chunk_size;
> > + }
> >
> > + /* allocate from the local cache */
> > + free_mem -= size;
> > + res = mem;
> > + mem += size;
> > return res;
> > }
> >
>
> milton

2010-11-14 09:50:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)


* Andres Salomon <[email protected]> wrote:

> On Fri, 12 Nov 2010 01:48:30 -0600
> Milton Miller <[email protected]> wrote:
>
> > On Thu, 11 Nov 2010 around 21:45:46 -0800, Andres Salomon wrote:
> > > diff --git a/arch/x86/platform/olpc/olpc_dt.c
> > > b/arch/x86/platform/olpc/olpc_dt.c index b8c8ff9..0ab824d 100644
> > > --- a/arch/x86/platform/olpc/olpc_dt.c
> > > +++ b/arch/x86/platform/olpc/olpc_dt.c
> > > @@ -126,14 +126,31 @@ static unsigned int prom_early_allocated
> > > __initdata;
> > > void * __init prom_early_alloc(unsigned long size)
> > > {
> > > + static u8 *mem = NULL;
> > > + static size_t free_mem = 0;
> >
> > Static variables are implicitly 0 and NULL
>
> That's true for global static variables; is it also true for static
> variables that are local to a function?

It is the same allocation and initialization wise - just not visible to functions
outside this one.

It's being frowned upon btw, because it's easy to overlook (and for other reasons) -
please dont use statics inside functions - please put them in front of the function
or further up into file scope.

Ingo

2010-11-15 04:22:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On 11/14/2010 01:50 AM, Ingo Molnar wrote:
>
> It's being frowned upon btw, because it's easy to overlook (and for other reasons) -
> please dont use statics inside functions - please put them in front of the function
> or further up into file scope.
>

What? What is wrong with static variables in functions? It really
doesn't seem to be a good idea to make them file-scope if they don't
need to be.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-11-15 07:03:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)


* H. Peter Anvin <[email protected]> wrote:

> What? What is wrong with static variables in functions? It really doesn't seem
> to be a good idea to make them file-scope if they don't need to be.

They are very easy to overlook and mix up with regular stack variables and i've seen
(and introduced myself) a number of bugs due to them.

They also often are used in buggy ways (with SMP not taken into consideration), so
overlooking them during review compounds their negative effects. Putting them in
front of the function isnt a big deal in exchange.

There are people who never overlook them (like yourself), but my brain is wired up
differently.

Thanks,

Ingo

2010-11-15 17:43:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On 11/14/2010 11:02 PM, Ingo Molnar wrote:
>
> * H. Peter Anvin <[email protected]> wrote:
>
>> What? What is wrong with static variables in functions? It really doesn't seem
>> to be a good idea to make them file-scope if they don't need to be.
>
> They are very easy to overlook and mix up with regular stack variables and i've seen
> (and introduced myself) a number of bugs due to them.
>
> They also often are used in buggy ways (with SMP not taken into consideration), so
> overlooking them during review compounds their negative effects. Putting them in
> front of the function isnt a big deal in exchange.
>
> There are people who never overlook them (like yourself), but my brain is wired up
> differently.
>

However, I have to vehemently object to putting them in a wider scope
than is otherwise necessary. I agree that static variables should be
used sparsely if at all (there really are vary few uses of them that are
valid), but putting them in a larger scope screams "I'm used in more
than one function", and that is *not* a good thing.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-11-17 06:12:56

by Andres Salomon

[permalink] [raw]
Subject: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v3)


Calling alloc_bootmem() for tiny chunks of memory over and over is really
slow; on an XO-1, it caused the time between when the kernel started
booting and when the display came alive (post-lxfb probe) to increase
to 44s. This patch optimizes the prom_early_alloc function by
calling alloc_bootmem for 4k-sized blocks of memory, and handing out
chunks of that to callers. With this patch, the time between kernel load
and display initialization decreased to 23s. If there's a better way to
do this early in the boot process, please let me know.

(Note: increasing the chunk size to 16k didn't noticably affect boot time,
and wasted 9k.)

v3: fix wasted memory buglet found by Milton Miller, and style fixes.
v2: reorder prom_early_alloc as suggested by Grant Likely.

Signed-off-by: Andres Salomon <[email protected]>
---
arch/x86/platform/olpc/olpc_dt.c | 28 +++++++++++++++++++++++-----
1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/olpc/olpc_dt.c b/arch/x86/platform/olpc/olpc_dt.c
index 7054697..6614ec4 100644
--- a/arch/x86/platform/olpc/olpc_dt.c
+++ b/arch/x86/platform/olpc/olpc_dt.c
@@ -126,14 +126,32 @@ static unsigned int prom_early_allocated __initdata;

void * __init prom_early_alloc(unsigned long size)
{
+ static u8 *mem;
+ static size_t free_mem;
void *res;

- res = alloc_bootmem(size);
- if (res)
- memset(res, 0, size);
-
- prom_early_allocated += size;
+ if (free_mem < size) {
+ const size_t chunk_size = max(PAGE_SIZE, size);
+
+ /*
+ * To mimimize the number of allocations, grab at least 4k of
+ * memory (that's an arbitrary choice that matches PAGE_SIZE on
+ * the platforms we care about, and minimizes wasted bootmem)
+ * and hand off chunks of it to callers.
+ */
+ res = alloc_bootmem(chunk_size);
+ if (!res)
+ return NULL;
+ prom_early_allocated += chunk_size;
+ memset(res, 0, chunk_size);
+ free_mem = chunk_size;
+ mem = res;
+ }

+ /* allocate from the local cache */
+ free_mem -= size;
+ res = mem;
+ mem += size;
return res;
}

--
1.7.2.3


2010-11-18 08:34:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)


* H. Peter Anvin <[email protected]> wrote:

> On 11/14/2010 11:02 PM, Ingo Molnar wrote:
> >
> > * H. Peter Anvin <[email protected]> wrote:
> >
> >> What? What is wrong with static variables in functions? It really doesn't seem
> >> to be a good idea to make them file-scope if they don't need to be.
> >
> > They are very easy to overlook and mix up with regular stack variables and i've seen
> > (and introduced myself) a number of bugs due to them.
> >
> > They also often are used in buggy ways (with SMP not taken into consideration), so
> > overlooking them during review compounds their negative effects. Putting them in
> > front of the function isnt a big deal in exchange.
> >
> > There are people who never overlook them (like yourself), but my brain is wired up
> > differently.
> >
>
> However, I have to vehemently object to putting them in a wider scope
> than is otherwise necessary. I agree that static variables should be
> used sparsely if at all (there really are vary few uses of them that are
> valid), but putting them in a larger scope screams "I'm used in more
> than one function", and that is *not* a good thing.

That's why we sometimes use the (imperfect) compromise to put them in front of that
function, not at the top of the file.

Look at the general balance of hardship: very little harm is done (it's not a big
deal if a variable is only used in a single function) but having it with local
variables can be _really_ harmful - for example i overlooked them when i reviewed
this patch. I dont like important details obscured - i like them to be apparent.
Again, this is something that some people can parse immediately on the visual level
- me and many others cannot.

Thanks,

Ingo

2010-11-18 11:02:18

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On Thu, 2010-11-18 at 09:34 +0100, Ingo Molnar wrote:
> * H. Peter Anvin <[email protected]> wrote:
>
> > On 11/14/2010 11:02 PM, Ingo Molnar wrote:
> > >
> > > * H. Peter Anvin <[email protected]> wrote:
> > >
> > >> What? What is wrong with static variables in functions? It really doesn't seem
> > >> to be a good idea to make them file-scope if they don't need to be.
> > >
> > > They are very easy to overlook and mix up with regular stack variables and i've seen
> > > (and introduced myself) a number of bugs due to them.
> > >
> > > They also often are used in buggy ways (with SMP not taken into consideration), so
> > > overlooking them during review compounds their negative effects. Putting them in
> > > front of the function isnt a big deal in exchange.
> > >
> > > There are people who never overlook them (like yourself), but my brain is wired up
> > > differently.
> > >
> >
> > However, I have to vehemently object to putting them in a wider scope
> > than is otherwise necessary. I agree that static variables should be
> > used sparsely if at all (there really are vary few uses of them that are
> > valid), but putting them in a larger scope screams "I'm used in more
> > than one function", and that is *not* a good thing.
>
> That's why we sometimes use the (imperfect) compromise to put them in front of that
> function, not at the top of the file.
>
> Look at the general balance of hardship: very little harm is done (it's not a big
> deal if a variable is only used in a single function) but having it with local
> variables can be _really_ harmful - for example i overlooked them when i reviewed
> this patch. I dont like important details obscured - i like them to be apparent.
> Again, this is something that some people can parse immediately on the visual level
> - me and many others cannot.

What about:

int foo(void)
{
static int bar;

struct thing_struct *thing;
int other_var;
char *p;

...
}

I think the visual wrongness of that formatting would be enough for me
to stop and look twice. Though I guess it doesn't work if you have few,
or no other variables other than the statics to declare.

cheers


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-11-18 15:04:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On 11/18/2010 03:02 AM, Michael Ellerman wrote:
> On Thu, 2010-11-18 at 09:34 +0100, Ingo Molnar wrote:
>>
>> Look at the general balance of hardship: very little harm is done (it's not a big
>> deal if a variable is only used in a single function) but having it with local
>> variables can be _really_ harmful - for example i overlooked them when i reviewed
>> this patch. I dont like important details obscured - i like them to be apparent.
>> Again, this is something that some people can parse immediately on the visual level
>> - me and many others cannot.
>

No, sorry, this sounds like a personal preference that is well out of
line with the vast majority of C programmers I've ever come across, not
just in the Linux kernel world but outside of it.

> What about:
>
> int foo(void)
> {
> static int bar;
>
> struct thing_struct *thing;
> int other_var;
> char *p;
>
> ...
> }
>
> I think the visual wrongness of that formatting would be enough for me
> to stop and look twice. Though I guess it doesn't work if you have few,
> or no other variables other than the statics to declare.
>

I wouldn't object to a convention like that, but let's bloody well
realize that that is a brand new convention, and if this convention is
going to stick at all it needs to be made official and put in CodingStyle.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-11-18 17:42:00

by Andres Salomon

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On Thu, 18 Nov 2010 07:04:04 -0800
"H. Peter Anvin" <[email protected]> wrote:

> On 11/18/2010 03:02 AM, Michael Ellerman wrote:
> > On Thu, 2010-11-18 at 09:34 +0100, Ingo Molnar wrote:
> >>
> >> Look at the general balance of hardship: very little harm is done
> >> (it's not a big deal if a variable is only used in a single
> >> function) but having it with local variables can be _really_
> >> harmful - for example i overlooked them when i reviewed this
> >> patch. I dont like important details obscured - i like them to be
> >> apparent. Again, this is something that some people can parse
> >> immediately on the visual level
> >> - me and many others cannot.
> >
>
> No, sorry, this sounds like a personal preference that is well out of
> line with the vast majority of C programmers I've ever come across,
> not just in the Linux kernel world but outside of it.


This is actually one of the reasons I specifically like initialized
static variables (inside of functions). Take the following code:

int foo(void)
{
static char *frob = NULL;
int p;

if (frob) {
...
}


Upon seeing that and thinking "whoa, how could frob be
initialized and then checked?", I realize that it's either a bug or I
look back at the initialization and realize that frob is static. It's
less obvious (to me) with non-explicit initialization.

2010-11-18 17:50:54

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On 11/18/2010 09:41 AM, Andres Salomon wrote:
>>
>> No, sorry, this sounds like a personal preference that is well out of
>> line with the vast majority of C programmers I've ever come across,
>> not just in the Linux kernel world but outside of it.
>
>
> This is actually one of the reasons I specifically like initialized
> static variables (inside of functions). Take the following code:
>
> int foo(void)
> {
> static char *frob = NULL;
> int p;
>
> if (frob) {
> ...
> }
>
>
> Upon seeing that and thinking "whoa, how could frob be
> initialized and then checked?", I realize that it's either a bug or I
> look back at the initialization and realize that frob is static. It's
> less obvious (to me) with non-explicit initialization.

I have to agree with this one. In general I dislike relying on an
implicit (even well-defined) initialized value; unfortunately we ripped
out explicit initializations across the Linux kernel, not due to
readability but due to the fact that long-since-obsolete versions of gcc
would put explicitly-initialized variables in data rather than bss even
if the initial value is zero.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2010-11-19 20:24:18

by Andres Salomon

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)

On Thu, 18 Nov 2010 09:48:59 -0800
"H. Peter Anvin" <[email protected]> wrote:

> On 11/18/2010 09:41 AM, Andres Salomon wrote:
> >>
> >> No, sorry, this sounds like a personal preference that is well out
> >> of line with the vast majority of C programmers I've ever come
> >> across, not just in the Linux kernel world but outside of it.
> >
> >
> > This is actually one of the reasons I specifically like initialized
> > static variables (inside of functions). Take the following code:
> >
> > int foo(void)
> > {
> > static char *frob = NULL;
> > int p;
> >
> > if (frob) {
> > ...
> > }
> >
> >
> > Upon seeing that and thinking "whoa, how could frob be
> > initialized and then checked?", I realize that it's either a bug or
> > I look back at the initialization and realize that frob is static.
> > It's less obvious (to me) with non-explicit initialization.
>
> I have to agree with this one. In general I dislike relying on an
> implicit (even well-defined) initialized value; unfortunately we
> ripped out explicit initializations across the Linux kernel, not due
> to readability but due to the fact that long-since-obsolete versions
> of gcc would put explicitly-initialized variables in data rather than
> bss even if the initial value is zero.
>
> -hpa
>
>

Note that I sent another update for this patch the other day
(Tuesday). It uses implicit initialization. Some Acks would be
awesome if folks are happy w/ the way I've done things.. ;)

2010-11-29 23:39:57

by Andres Salomon

[permalink] [raw]
Subject: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v4)


Calling alloc_bootmem() for tiny chunks of memory over and over is really
slow; on an XO-1, it caused the time between when the kernel started
booting and when the display came alive (post-lxfb probe) to increase
to 44s. This patch optimizes the prom_early_alloc function by
calling alloc_bootmem for 4k-sized blocks of memory, and handing out
chunks of that to callers. With this patch, the time between kernel load
and display initialization decreased to 23s. If there's a better way to
do this early in the boot process, please let me know.

(Note: increasing the chunk size to 16k didn't noticably affect boot time,
and wasted 9k.)

v4: clarify comment, requested by hpa
v3: fix wasted memory buglet found by Milton Miller, and style fix.
v2: reorder prom_early_alloc as suggested by Grant.

Signed-off-by: Andres Salomon <[email protected]>
---
arch/x86/platform/olpc/olpc_dt.c | 28 +++++++++++++++++++++++-----
1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/olpc/olpc_dt.c b/arch/x86/platform/olpc/olpc_dt.c
index 7054697..dab8746 100644
--- a/arch/x86/platform/olpc/olpc_dt.c
+++ b/arch/x86/platform/olpc/olpc_dt.c
@@ -126,14 +126,32 @@ static unsigned int prom_early_allocated __initdata;

void * __init prom_early_alloc(unsigned long size)
{
+ static u8 *mem;
+ static size_t free_mem;
void *res;

- res = alloc_bootmem(size);
- if (res)
- memset(res, 0, size);
-
- prom_early_allocated += size;
+ if (free_mem < size) {
+ const size_t chunk_size = max(PAGE_SIZE, size);
+
+ /*
+ * To mimimize the number of allocations, grab at least
+ * PAGE_SIZE of memory (that's an arbitrary choice that's
+ * fast enough on the platforms we care about while minimizing
+ * wasted bootmem) and hand off chunks of it to callers.
+ */
+ res = alloc_bootmem(chunk_size);
+ if (!res)
+ return NULL;
+ prom_early_allocated += chunk_size;
+ memset(res, 0, chunk_size);
+ free_mem = chunk_size;
+ mem = res;
+ }

+ /* allocate from the local cache */
+ free_mem -= size;
+ res = mem;
+ mem += size;
return res;
}

--
1.7.2.3

2010-12-16 02:58:59

by Andres Salomon

[permalink] [raw]
Subject: [tip:x86/olpc] x86, olpc: Speed up device tree creation during boot

Commit-ID: b5318d302f8a20eacbbfc01b0ee35b108085a363
Gitweb: http://git.kernel.org/tip/b5318d302f8a20eacbbfc01b0ee35b108085a363
Author: Andres Salomon <[email protected]>
AuthorDate: Mon, 29 Nov 2010 23:39:51 +0000
Committer: H. Peter Anvin <[email protected]>
CommitDate: Wed, 15 Dec 2010 17:11:40 -0800

x86, olpc: Speed up device tree creation during boot

Calling alloc_bootmem() for tiny chunks of memory over and over is really
slow; on an XO-1, it caused the time between when the kernel started
booting and when the display came alive (post-lxfb probe) to increase
to 44s. This patch optimizes the prom_early_alloc function by
calling alloc_bootmem for 4k-sized blocks of memory, and handing out
chunks of that to callers. With this patch, the time between kernel load
and display initialization decreased to 23s. If there's a better way to
do this early in the boot process, please let me know.

(Note: increasing the chunk size to 16k didn't noticably affect boot time,
and wasted 9k.)

v4: clarify comment, requested by hpa
v3: fix wasted memory buglet found by Milton Miller, and style fix.
v2: reorder prom_early_alloc as suggested by Grant.

Signed-off-by: Andres Salomon <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
---
arch/x86/platform/olpc/olpc_dt.c | 28 +++++++++++++++++++++++-----
1 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/olpc/olpc_dt.c b/arch/x86/platform/olpc/olpc_dt.c
index 7054697..dab8746 100644
--- a/arch/x86/platform/olpc/olpc_dt.c
+++ b/arch/x86/platform/olpc/olpc_dt.c
@@ -126,14 +126,32 @@ static unsigned int prom_early_allocated __initdata;

void * __init prom_early_alloc(unsigned long size)
{
+ static u8 *mem;
+ static size_t free_mem;
void *res;

- res = alloc_bootmem(size);
- if (res)
- memset(res, 0, size);
-
- prom_early_allocated += size;
+ if (free_mem < size) {
+ const size_t chunk_size = max(PAGE_SIZE, size);
+
+ /*
+ * To mimimize the number of allocations, grab at least
+ * PAGE_SIZE of memory (that's an arbitrary choice that's
+ * fast enough on the platforms we care about while minimizing
+ * wasted bootmem) and hand off chunks of it to callers.
+ */
+ res = alloc_bootmem(chunk_size);
+ if (!res)
+ return NULL;
+ prom_early_allocated += chunk_size;
+ memset(res, 0, chunk_size);
+ free_mem = chunk_size;
+ mem = res;
+ }

+ /* allocate from the local cache */
+ free_mem -= size;
+ res = mem;
+ mem += size;
return res;
}

2010-12-23 11:57:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/3] x86: OLPC: speed up device tree creation during boot (v2)


* Ingo Molnar <[email protected]> wrote:

> > However, I have to vehemently object to putting them in a wider scope
> > than is otherwise necessary. I agree that static variables should be
> > used sparsely if at all (there really are vary few uses of them that are
> > valid), but putting them in a larger scope screams "I'm used in more
> > than one function", and that is *not* a good thing.
>
> That's why we sometimes use the (imperfect) compromise to put them in front of
> that function, not at the top of the file.
>
> Look at the general balance of hardship: very little harm is done (it's not a big
> deal if a variable is only used in a single function) but having it with local
> variables can be _really_ harmful - for example i overlooked them when i reviewed
> this patch. I dont like important details obscured - i like them to be apparent.
> Again, this is something that some people can parse immediately on the visual
> level - me and many others cannot.

As an addendum, beyond my own bad experience with them, see below a recent upstream
fix that shows the kinds of problems that overlooked function scope statics can
cause.

Ingo

------------->
>From 3cb50ddf97a0a1ca4c68bc12fa1e727a6b45fbf2 Mon Sep 17 00:00:00 2001
From: Al Viro <[email protected]>
Date: Mon, 20 Dec 2010 15:53:18 +0000
Subject: [PATCH] Fix btrfs b0rkage

Buggered-in: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy
ioctl")

Signed-off-by: Al Viro <[email protected]>
Acked-by: Chris Mason <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
fs/btrfs/export.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
index 6f04444..659f532 100644
--- a/fs/btrfs/export.c
+++ b/fs/btrfs/export.c
@@ -166,7 +166,7 @@ static struct dentry *btrfs_fh_to_dentry(struct super_block *sb, struct fid *fh,
static struct dentry *btrfs_get_parent(struct dentry *child)
{
struct inode *dir = child->d_inode;
- static struct dentry *dentry;
+ struct dentry *dentry;
struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_path *path;
struct extent_buffer *leaf;