Hi,
I'm just sending this message to report about a work I've recently done
to speed-up symbol resolution for modules by using a SysV ELF hash table
(without relying upon binutils support).
This work has been presented few days ago at the Embedded Linux Conference Europe.
Patches are already publicly available for 2.6.23 kernel @STLinux git
(http://git.stlinux.com/?p=stm/linux-sh4-2.6.23.y.git;a=summary)
For 2.6.30 already ported but not yet available.
Benchmarks have shown an average reduction of 96% in time spent for symbol resolution
(that is 25x faster).
All details can be found at
http://tree.celinuxforum.org/CelfPubWiki/ELCEurope2009Presentations?action=AttachFile&do=view&target=C_AMOROSO_Fast_lkm_loader_ELC-E_2009.pdf
I'm working to update them to mainline and post for review and discussion.
We are also working right now to update this work too to use GNU hash instead of SysV ELF hash
Carmelo
On 10/18/09, Carmelo Amoroso <[email protected]> wrote:
> Hi,
> I'm just sending this message to report about a work I've recently done
> to speed-up symbol resolution for modules by using a SysV ELF hash table
> (without relying upon binutils support).
> This work has been presented few days ago at the Embedded Linux Conference
> Europe.
>
> Patches are already publicly available for 2.6.23 kernel @STLinux git
> (http://git.stlinux.com/?p=stm/linux-sh4-2.6.23.y.git;a=summary)
>
> For 2.6.30 already ported but not yet available.
>
> Benchmarks have shown an average reduction of 96% in time spent for symbol
> resolution
> (that is 25x faster).
>
> All details can be found at
> http://tree.celinuxforum.org/CelfPubWiki/ELCEurope2009Presentations?action=AttachFile&do=view&target=C_AMOROSO_Fast_lkm_loader_ELC-E_2009.pdf
>
> I'm working to update them to mainline and post for review and discussion.
> We are also working right now to update this work too to use GNU hash
> instead of SysV ELF hash
Hi!
I found this very interesting. I recently posted a prototype to use
binary search to optimize symbol lookup[1]. I guess it's unlikely for
more than one such optimization to be merged into mainline :).
The nice thing about binary search is that it doesn't require
increased memory structures. You just have to sort the existing
tables (although it's easier said than done). Anyway, this means I
didn't have to worry about making it optional, or being accused of
bloat. I also managed to patch into the existing modpost run, instead
of adding another intermediate build step.
---
We should certainly expect hash tables to be faster. Strictly
speaking our numbers are incomparable, because your test machine is a
bit different to my x86 netbook :). I didn't even report the same
numbers. That said, I have some saved "perf report" output, and it
_looks_ like using bsearch cut down find_symbol()+strcmp() by 96%
<grin>.
If look at the total savings hash tables made in your slides, I
actually get 98%. I guess either the analysis was conservative or
there were more modules which were omitted for brevity.
---
Hypothetically: imagine we both finish our work and testing on the
same machine shows hash tables saving 100% and bsearch saving 90%. In
absolute terms, hash tables might have an advantage of 0.03s on my
system (where bsearch saved 0.3s), and a total advantage of 0.015s for
the modules you tested (where hash tables saved ~0.15s).
Would you accept bsearch in this case? Or would you feel that the
performance of hash tables outweighed the extra memory requirements?
(This leaves the question of why you need to load 0.015s worth of
always-needed in-tree kernel code as modules. For those who haven't
read the slides, the reasoning is that built-in code would take
_longer_ to load. The boot-loader is often slower at IO, and it
doesn't allow other initialization to occur in parallel).
Warm regards
Alan
---
[1] My bsearch prototype has several undisclosed problems which I'm
working on. If you're curious you can find the patches by searching
"from:[email protected] to:Rusty". At the moment the series
is blocked on ARM. I want to kill EXPORT_SYMBOL_ALIAS in armksyms.c,
because it breaks some simplifying assumptions I was relying on.
The protoype also limits the optimization to built-in symbols to avoid
extra modpost overhead. However, this is an orthogonal decision - it
should not be hard to change if desired.
On 10/18/09, Alan Jenkins <[email protected]> wrote:
> On 10/18/09, Carmelo Amoroso <[email protected]> wrote:
>> Hi,
>> I'm just sending this message to report about a work I've recently done
>> to speed-up symbol resolution for modules by using a SysV ELF hash table
>> (without relying upon binutils support).
>> This work has been presented few days ago at the Embedded Linux
>> Conference
>> Europe.
>>
>> Patches are already publicly available for 2.6.23 kernel @STLinux git
>> (http://git.stlinux.com/?p=stm/linux-sh4-2.6.23.y.git;a=summary)
>>
>> For 2.6.30 already ported but not yet available.
>>
>> Benchmarks have shown an average reduction of 96% in time spent for
>> symbol
>> resolution
>> (that is 25x faster).
>>
>> All details can be found at
>> http://tree.celinuxforum.org/CelfPubWiki/ELCEurope2009Presentations?action=AttachFile&do=view&target=C_AMOROSO_Fast_lkm_loader_ELC-E_2009.pdf
>>
>> I'm working to update them to mainline and post for review and
>> discussion.
>> We are also working right now to update this work too to use GNU hash
>> instead of SysV ELF hash
>
> Hi!
>
> I found this very interesting. I recently posted a prototype to use
> binary search to optimize symbol lookup[1]. I guess it's unlikely for
> more than one such optimization to be merged into mainline :).
>
> The nice thing about binary search is that it doesn't require
> increased memory structures. You just have to sort the existing
> tables (although it's easier said than done). Anyway, this means I
> didn't have to worry about making it optional, or being accused of
> bloat. I also managed to patch into the existing modpost run, instead
> of adding another intermediate build step.
>
> ---
> We should certainly expect hash tables to be faster. Strictly
> speaking our numbers are incomparable, because your test machine is a
> bit different to my x86 netbook :). I didn't even report the same
> numbers. That said, I have some saved "perf report" output, and it
> _looks_ like using bsearch cut down find_symbol()+strcmp() by 96%
> <grin>.
>
> If look at the total savings hash tables made in your slides, I
> actually get 98%. I guess either the analysis was conservative or
> there were more modules which were omitted for brevity.
> ---
>
> Hypothetically: imagine we both finish our work and testing on the
> same machine shows hash tables saving 100% and bsearch saving 90%. In
> absolute terms, hash tables might have an advantage of 0.03s on my
> system (where bsearch saved 0.3s), and a total advantage of 0.015s for
> the modules you tested (where hash tables saved ~0.15s).
>
> Would you accept bsearch in this case? Or would you feel that the
> performance of hash tables outweighed the extra memory requirements?
>
> (This leaves the question of why you need to load 0.015s worth of
> always-needed in-tree kernel code as modules. For those who haven't
> read the slides, the reasoning is that built-in code would take
> _longer_ to load. The boot-loader is often slower at IO, and it
> doesn't allow other initialization to occur in parallel).
>
> Warm regards
> Alan
>
> ---
> [1] My bsearch prototype has several undisclosed problems which I'm
> working on.
So here's a dump of my current state. It fixes the known issues (some
ugliness is still marked as TODO). Boot-tested for i386 only; it
should also build for ARM.
<http://github.com/sourcejedi/linux-2.6/commits/module-V2-alpha1>
It's definitely supposed to be cross-platform, so if you do try it and
fail then I'd be very interested to hear that. Even if all you say is
"idiot, it doesn't work, why haven't you tested it in qemu yet" :-).
> At the moment the series
> is blocked on ARM. I want to kill EXPORT_SYMBOL_ALIAS in armksyms.c,
> because it breaks some simplifying assumptions I was relying on.
>
> The protoype also limits the optimization to built-in symbols to avoid
> extra modpost overhead. However, this is an orthogonal decision - it
> should not be hard to change if desired.
On Sun, Oct 18, 2009 at 01:44:04PM +0100, Alan Jenkins wrote:
> Hypothetically: imagine we both finish our work and testing on the
> same machine shows hash tables saving 100% and bsearch saving 90%. In
> absolute terms, hash tables might have an advantage of 0.03s on my
> system (where bsearch saved 0.3s), and a total advantage of 0.015s for
> the modules you tested (where hash tables saved ~0.15s).
>
> Would you accept bsearch in this case? Or would you feel that the
> performance of hash tables outweighed the extra memory requirements?
The performance difference in "raw" time speed might be much different
on embedded platforms with slower processors, so it might be worth the
tiny complexity to get that much noticed speed.
> (This leaves the question of why you need to load 0.015s worth of
> always-needed in-tree kernel code as modules. For those who haven't
> read the slides, the reasoning is that built-in code would take
> _longer_ to load. The boot-loader is often slower at IO, and it
> doesn't allow other initialization to occur in parallel).
Distros are forced to build almost everything as modules. I've played
with building some modules into the kernel directly (see the openSUSE
moblin kernels for examples of that), but when you have to suport a much
larger range of hardware types and features that some users use and
others don't, and the ability to override specific drivers by others due
to manufacturer requests for specific updates, the need to keep things
as modules is the only way to solve the problem.
So I'm glad to see this speedup happen, very nice work everyone.
thanks,
greg k-h
On 10/18/09, Greg KH <[email protected]> wrote:
> On Sun, Oct 18, 2009 at 01:44:04PM +0100, Alan Jenkins wrote:
>> Hypothetically: imagine we both finish our work and testing on the
>> same machine shows hash tables saving 100% and bsearch saving 90%. In
>> absolute terms, hash tables might have an advantage of 0.03s on my
>> system (where bsearch saved 0.3s), and a total advantage of 0.015s for
>> the modules you tested (where hash tables saved ~0.15s).
>>
>> Would you accept bsearch in this case? Or would you feel that the
>> performance of hash tables outweighed the extra memory requirements?
>
> The performance difference in "raw" time speed might be much different
> on embedded platforms with slower processors, so it might be worth the
> tiny complexity to get that much noticed speed.
>
>> (This leaves the question of why you need to load 0.015s worth of
>> always-needed in-tree kernel code as modules. For those who haven't
>> read the slides, the reasoning is that built-in code would take
>> _longer_ to load. The boot-loader is often slower at IO, and it
>> doesn't allow other initialization to occur in parallel).
>
> Distros are forced to build almost everything as modules. I've played
> with building some modules into the kernel directly (see the openSUSE
> moblin kernels for examples of that), but when you have to suport a much
> larger range of hardware types and features that some users use and
> others don't, and the ability to override specific drivers by others due
> to manufacturer requests for specific updates, the need to keep things
> as modules is the only way to solve the problem.
Those are all good reasons. They're just not the reasons for loading
usbcore as a module on a 266Mhz processor as shown in the slides.
That module accounted for 84ms out of the 145ms total (without the
optimisation). So I feel obliged to stipulate this specific reasoning
before quoting the numbers on the slide.
> So I'm glad to see this speedup happen, very nice work everyone.
>
> thanks,
>
> greg k-h
>
I neglected to say that in the first place, but let me second it.
It's great to see this work from STLinux being proposed for inclusion
in mainline. If all my work does is serve as a baseline for some even
better numbers - I can survive that :). My netbook will still boot
0.3s faster :).
Alan
2009/10/18 Greg KH <[email protected]>:
> On Sun, Oct 18, 2009 at 01:44:04PM +0100, Alan Jenkins wrote:
>> Hypothetically: imagine we both finish our work and testing on the
>> same machine shows hash tables saving 100% and bsearch saving 90%. ?In
>> absolute terms, hash tables might have an advantage of 0.03s on my
>> system (where bsearch saved 0.3s), and a total advantage of 0.015s for
>> the modules you tested (where hash tables saved ~0.15s).
>>
>> Would you accept bsearch in this case? ?Or would you feel that the
>> performance of hash tables outweighed the extra memory requirements?
>
> The performance difference in "raw" time speed might be much different
> on embedded platforms with slower processors, so it might be worth the
> tiny complexity to get that much noticed speed.
>
yes, right. Further the figures I reported on the slides were taken
from a development
system with the rootfs mounted via NFS (no HDD or flash), with a debug
kernel and so on.
>> (This leaves the question of why you need to load 0.015s worth of
>> always-needed in-tree kernel code as modules. ?For those who haven't
>> read the slides, the ?reasoning is that built-in code would take
>> _longer_ to load. ?The boot-loader is often slower at IO, and it
>> doesn't allow other initialization to occur in parallel).
>
> Distros are forced to build almost everything as modules. ?I've played
> with building some modules into the kernel directly (see the openSUSE
> moblin kernels for examples of that), but when you have to suport a much
> larger range of hardware types and features that some users use and
> others don't, and the ability to override specific drivers by others due
> to manufacturer requests for specific updates, the need to keep things
> as modules is the only way to solve the problem.
>
> So I'm glad to see this speedup happen, very nice work everyone.
>
Just a few other notes. The current implementation I did based on SysV
has a drawback
that is not backward compatible, so you cannot use old modules with a kernel
with the option enabled due to changes on struct kernel_symbol.
Anyway I've just figured out how to change it to remove this limitation.
I need some time to review these patches.
Further, the newer implementation based on GNU hash which we are
working on right now,
will not require the extra .undef.hash ELF sections because hash
values are already embedded
into the GNU hash table, with a reduction in terms of footprint.
> thanks,
>
> greg k-h
>
Thanks to you too,
Carmelo
On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
> Just a few other notes. The current implementation I did based on SysV
> has a drawback that is not backward compatible, so you cannot use old
> modules with a kernel with the option enabled due to changes on struct
> kernel_symbol.
Why would this be a problem? Whenever making a kernel config change,
you should be able to rebuild everything, as lots of other configuration
options are that way.
> Anyway I've just figured out how to change it to remove this limitation.
> I need some time to review these patches. Further, the newer
> implementation based on GNU hash which we are working on right now,
> will not require the extra .undef.hash ELF sections because hash
> values are already embedded into the GNU hash table, with a reduction
> in terms of footprint.
Footprint in the memory for the loaded module, or just in the footprint
for the module on the disk?
I'd be interested in seeing your patches when you have something that
works for the current Linus kernel tree.
thanks,
greg k-h
2009/10/19 Greg KH <[email protected]>:
> On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
>> Just a few other notes. The current implementation I did based on SysV
>> has a drawback that is not backward compatible, so you cannot use old
>> modules with a kernel with the option enabled due to changes on struct
>> kernel_symbol.
>
> Why would this be a problem? ?Whenever making a kernel config change,
> you should be able to rebuild everything, as lots of other configuration
> options are that way.
>
This is not always true... there could be cases in which you cannot
recompile old modules
(e.g vendors that provide non GPL modules)
>> Anyway I've just figured out how to change it to remove this limitation.
>> I need some time to review these patches. ?Further, the newer
>> implementation based on GNU hash which we are working on right now,
>> will not require the extra .undef.hash ELF sections because hash
>> values are already embedded into the GNU hash table, with a reduction
>> in terms of footprint.
>
> Footprint in the memory for the loaded module, or just in the footprint
> for the module on the disk?
>
both
> I'd be interested in seeing your patches when you have something that
> works for the current Linus kernel tree.
>
sure. I tested weeks ago it on a 2.6.30 tree on x86 at it worked
without problems.
I had to hack the x86 linker scripts to remove a check on file size
that sounded strange
to me.
carmelo
> thanks,
>
> greg k-h
>
On Mon, Oct 19, 2009 at 05:02:51PM +0200, Carmelo Amoroso wrote:
> 2009/10/19 Greg KH <[email protected]>:
> > On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
> >> Just a few other notes. The current implementation I did based on SysV
> >> has a drawback that is not backward compatible, so you cannot use old
> >> modules with a kernel with the option enabled due to changes on struct
> >> kernel_symbol.
> >
> > Why would this be a problem? ?Whenever making a kernel config change,
> > you should be able to rebuild everything, as lots of other configuration
> > options are that way.
> >
>
> This is not always true... there could be cases in which you cannot
> recompile old modules (e.g vendors that provide non GPL modules)
But we do not care at all about that kind of thing, sorry.
> >> Anyway I've just figured out how to change it to remove this limitation.
> >> I need some time to review these patches. ?Further, the newer
> >> implementation based on GNU hash which we are working on right now,
> >> will not require the extra .undef.hash ELF sections because hash
> >> values are already embedded into the GNU hash table, with a reduction
> >> in terms of footprint.
> >
> > Footprint in the memory for the loaded module, or just in the footprint
> > for the module on the disk?
> >
>
> both
Why would the already-loaded module size increase?
I guess I'll just wait to see the code before worrying about this :)
thanks,
greg k-h
On 10/19/09, Greg KH <[email protected]> wrote:
> On Mon, Oct 19, 2009 at 05:02:51PM +0200, Carmelo Amoroso wrote:
>> 2009/10/19 Greg KH <[email protected]>:
>> > On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
>> >> Just a few other notes. The current implementation I did based on SysV
>> >> has a drawback that is not backward compatible, so you cannot use old
>> >> modules with a kernel with the option enabled due to changes on struct
>> >> kernel_symbol.
>> >
>> > Why would this be a problem? ?Whenever making a kernel config change,
>> > you should be able to rebuild everything, as lots of other configuration
>> > options are that way.
>> >
>>
>> This is not always true... there could be cases in which you cannot
>> recompile old modules (e.g vendors that provide non GPL modules)
>
> But we do not care at all about that kind of thing, sorry.
>
>> >> Anyway I've just figured out how to change it to remove this
>> >> limitation.
>> >> I need some time to review these patches. ?Further, the newer
>> >> implementation based on GNU hash which we are working on right now,
>> >> will not require the extra .undef.hash ELF sections because hash
>> >> values are already embedded into the GNU hash table, with a reduction
>> >> in terms of footprint.
>> >
>> > Footprint in the memory for the loaded module, or just in the footprint
>> > for the module on the disk?
>> >
>>
>> both
>
> Why would the already-loaded module size increase?
>
> I guess I'll just wait to see the code before worrying about this :)
Modules export symbols, as well as importing them :).
On Tue, 20 Oct 2009 01:32:51 am Carmelo Amoroso wrote:
> 2009/10/19 Greg KH <[email protected]>:
> > On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
> >> Just a few other notes. The current implementation I did based on SysV
> >> has a drawback that is not backward compatible, so you cannot use old
> >> modules with a kernel with the option enabled due to changes on struct
> >> kernel_symbol.
> >
> > Why would this be a problem? Whenever making a kernel config change,
> > you should be able to rebuild everything, as lots of other configuration
> > options are that way.
>
> This is not always true... there could be cases in which you cannot
> recompile old modules
> (e.g vendors that provide non GPL modules)
And breaking them is a feature. I do not go out of my way to avoid breaking
out-of-tree modules; it's certainly more important to have simple maintainable
code.
You guys figure out what the best speed/size tradeoff is, and send me the
patch for review.
Thanks!
Rusty.
On 10/19/2009 09:02 AM, Carmelo Amoroso wrote:
> 2009/10/19 Greg KH<[email protected]>:
>> On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
>>> Just a few other notes. The current implementation I did based on SysV
>>> has a drawback that is not backward compatible, so you cannot use old
>>> modules with a kernel with the option enabled due to changes on struct
>>> kernel_symbol.
>>
>> Why would this be a problem? Whenever making a kernel config change,
>> you should be able to rebuild everything, as lots of other configuration
>> options are that way.
>>
>
> This is not always true... there could be cases in which you cannot
> recompile old modules
> (e.g vendors that provide non GPL modules)
Even non-GPL modules can normally be rebuilt as far as the module format
is concerned, there's usually an object file blob that gets compiled
into a module on install or something, like the Nvidia graphics driver.
If anyone's providing binary-only fully built modules (which would be
inherently tied to one exact kernel version and one configuration) they
really need to have their head examined..
On Tue, Oct 20, 2009 at 11:43:44PM -0600, Robert Hancock wrote:
> On 10/19/2009 09:02 AM, Carmelo Amoroso wrote:
>> 2009/10/19 Greg KH<[email protected]>:
>>> On Mon, Oct 19, 2009 at 01:45:20PM +0200, Carmelo Amoroso wrote:
>>>> Just a few other notes. The current implementation I did based on SysV
>>>> has a drawback that is not backward compatible, so you cannot use old
>>>> modules with a kernel with the option enabled due to changes on struct
>>>> kernel_symbol.
>>>
>>> Why would this be a problem? Whenever making a kernel config change,
>>> you should be able to rebuild everything, as lots of other configuration
>>> options are that way.
>>>
>>
>> This is not always true... there could be cases in which you cannot
>> recompile old modules
>> (e.g vendors that provide non GPL modules)
>
> Even non-GPL modules can normally be rebuilt as far as the module format is
> concerned, there's usually an object file blob that gets compiled into a
> module on install or something, like the Nvidia graphics driver. If
> anyone's providing binary-only fully built modules (which would be
> inherently tied to one exact kernel version and one configuration) they
> really need to have their head examined..
Welcome to embedded Linux :)