2015-04-14 17:20:58

by Alexey Dobriyan

Subject: [PATCH] tags: much faster, parallel "make tags"

ctags is single-threaded program. Split list of files to be tagged into
equal parts, 1 part for each CPU and then merge the results.

Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
On another 4-way box: ~120 s => ~65 s (-46%!).

Resulting "tags" files aren't byte-for-byte identical because ctags
program numbers anon struct and enum declarations with "__anonNNN"
symbols. If those lines are removed, "tags" file becomes byte-for-byte
identical with those generated with current code.

Signed-off-by: Alexey Dobriyan <[email protected]>
---

scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)

--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -152,7 +152,24 @@ dogtags()

exuberant()
{
- all_target_sources | xargs $1 -a \
+ NR_CPUS=1
+ if [ -e /proc/cpuinfo ]; then
+ NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)
+ fi
+
+ rm -f .make-tags.src.* .make-tags.*
+
+ all_target_sources >.make-tags.src
+ # seems like Useless Use of cat(1) but not really
+ NR_LINES=$(cat .make-tags.src | wc -l)
+ NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
+
+ split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.
+
+ for i in .make-tags.src.*; do
+ N=$(echo $i | sed -e 's/.*\.//')
+ # -u: don't sort now, sort later
+ cat $i | xargs $1 -a -f .make-tags.$N -u \
-I __initdata,__exitdata,__initconst, \
-I __cpuinitdata,__initdata_memblock \
-I __refdata,__attribute,__maybe_unused,__always_unused \
@@ -211,7 +228,20 @@ exuberant()
--regex-c='/DEFINE_PCI_DEVICE_TABLE\((\w*)/\1/v/' \
--regex-c='/(^\s)OFFSET\((\w*)/\2/v/' \
--regex-c='/(^\s)DEFINE\((\w*)/\2/v/' \
- --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'
+ --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/' \
+ &
+ done
+ wait
+ rm -f .make-tags.src .make-tags.src.*
+
+ # write header
+ $1 -f tags /dev/null
+ # remove header
+ for i in .make-tags.*; do
+ sed -i -e '/^!/d' $i
+ done
+ sort .make-tags.* >>tags
+ rm -f .make-tags.*

all_kconfigs | xargs $1 -a \
--langdef=kconfig --language-force=kconfig \


2015-04-14 20:05:26

by Randy Dunlap

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On 04/14/15 10:20, Alexey Dobriyan wrote:
> ctags is single-threaded program. Split list of files to be tagged into
> equal parts, 1 part for each CPU and then merge the results.
>
> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> On another 4-way box: ~120 s => ~65 s (-46%!).
>
> Resulting "tags" files aren't byte-for-byte identical because ctags
> program numbers anon struct and enum declarations with "__anonNNN"
> symbols. If those lines are removed, "tags" file becomes byte-for-byte
> identical with those generated with current code.
>
> Signed-off-by: Alexey Dobriyan <[email protected]>
> ---
>
> scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
>
> --- a/scripts/tags.sh
> +++ b/scripts/tags.sh
> @@ -152,7 +152,24 @@ dogtags()
>
> exuberant()
> {
> - all_target_sources | xargs $1 -a \
> + NR_CPUS=1
> + if [ -e /proc/cpuinfo ]; then
> + NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)

That grep is rather arch-specific. If an arch does not have that string
(with an embedded tab), won't NR_CPUS be zero? so at least, set it back to 1?

or (if 'getconf' is installed):
NR_CPUS=`getconf _NPROCESSORS_ONLN`

> + fi
> +
> + rm -f .make-tags.src.* .make-tags.*
> +
> + all_target_sources >.make-tags.src
> + # seems like Useless Use of cat(1) but not really
> + NR_LINES=$(cat .make-tags.src | wc -l)
> + NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
> +
> + split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.
> +
> + for i in .make-tags.src.*; do
> + N=$(echo $i | sed -e 's/.*\.//')
> + # -u: don't sort now, sort later
> + cat $i | xargs $1 -a -f .make-tags.$N -u \
> -I __initdata,__exitdata,__initconst, \
> -I __cpuinitdata,__initdata_memblock \
> -I __refdata,__attribute,__maybe_unused,__always_unused \
> @@ -211,7 +228,20 @@ exuberant()
> --regex-c='/DEFINE_PCI_DEVICE_TABLE\((\w*)/\1/v/' \
> --regex-c='/(^\s)OFFSET\((\w*)/\2/v/' \
> --regex-c='/(^\s)DEFINE\((\w*)/\2/v/' \
> - --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'
> + --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/' \
> + &
> + done
> + wait
> + rm -f .make-tags.src .make-tags.src.*
> +
> + # write header
> + $1 -f tags /dev/null
> + # remove header
> + for i in .make-tags.*; do
> + sed -i -e '/^!/d' $i
> + done
> + sort .make-tags.* >>tags
> + rm -f .make-tags.*
>
> all_kconfigs | xargs $1 -a \
> --langdef=kconfig --language-force=kconfig \
> --


--
~Randy

2015-04-14 20:25:34

by Guenter Roeck

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On Tue, Apr 14, 2015 at 01:05:09PM -0700, Randy Dunlap wrote:
> On 04/14/15 10:20, Alexey Dobriyan wrote:
> > ctags is single-threaded program. Split list of files to be tagged into
> > equal parts, 1 part for each CPU and then merge the results.
> >
> > Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> > On another 4-way box: ~120 s => ~65 s (-46%!).
> >
> > Resulting "tags" files aren't byte-for-byte identical because ctags
> > program numbers anon struct and enum declarations with "__anonNNN"
> > symbols. If those lines are removed, "tags" file becomes byte-for-byte
> > identical with those generated with current code.
> >
> > Signed-off-by: Alexey Dobriyan <[email protected]>
> > ---
> >
> > scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
> > 1 file changed, 32 insertions(+), 2 deletions(-)
> >
> > --- a/scripts/tags.sh
> > +++ b/scripts/tags.sh
> > @@ -152,7 +152,24 @@ dogtags()
> >
> > exuberant()
> > {
> > - all_target_sources | xargs $1 -a \
> > + NR_CPUS=1
> > + if [ -e /proc/cpuinfo ]; then
> > + NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)
>
> That grep is rather arch-specific. If an arch does not have that string
> (with an embedded tab), won't NR_CPUS be zero? so at least, set it back to 1?
>
> or (if 'getconf' is installed):
> NR_CPUS=`getconf _NPROCESSORS_ONLN`
>
What is wrong with nproc ?

Guenter

2015-04-15 09:36:59

by Michal Marek

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On 2015-04-14 22:24, Guenter Roeck wrote:
> On Tue, Apr 14, 2015 at 01:05:09PM -0700, Randy Dunlap wrote:
>> On 04/14/15 10:20, Alexey Dobriyan wrote:
>>> ctags is single-threaded program. Split list of files to be tagged into
>>> equal parts, 1 part for each CPU and then merge the results.
>>>
>>> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
>>> On another 4-way box: ~120 s => ~65 s (-46%!).
>>>
>>> Resulting "tags" files aren't byte-for-byte identical because ctags
>>> program numbers anon struct and enum declarations with "__anonNNN"
>>> symbols. If those lines are removed, "tags" file becomes byte-for-byte
>>> identical with those generated with current code.
>>>
>>> Signed-off-by: Alexey Dobriyan <[email protected]>
>>> ---
>>>
>>> scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>
>>> --- a/scripts/tags.sh
>>> +++ b/scripts/tags.sh
>>> @@ -152,7 +152,24 @@ dogtags()
>>>
>>> exuberant()
>>> {
>>> - all_target_sources | xargs $1 -a \
>>> + NR_CPUS=1
>>> + if [ -e /proc/cpuinfo ]; then
>>> + NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)
>>
>> That grep is rather arch-specific. If an arch does not have that string
>> (with an embedded tab), won't NR_CPUS be zero? so at least, set it back to 1?
>>
>> or (if 'getconf' is installed):
>> NR_CPUS=`getconf _NPROCESSORS_ONLN`
>>
> What is wrong with nproc ?

It's too new. 'getconf _NPROCESSORS_ONLN' has been available since 1996
according to glibc.git.

Michal
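
The CPU-count concerns raised in this subthread (a zero count on architectures whose /proc/cpuinfo lacks the expected string, and the availability of getconf versus nproc) can be sketched as a small helper. This is only an illustration: the function name detect_nr_cpus is hypothetical and does not appear in the patch, and the sketch assumes that a missing or non-numeric getconf result should simply fall back to a single job.

```shell
# Hypothetical helper (not part of the patch): print the number of
# online CPUs, never less than 1.
detect_nr_cpus()
{
	n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=
	# Reject empty or non-numeric output and fall back to 1.
	case "$n" in
	''|*[!0-9]*) n=1 ;;
	esac
	# Guard against a reported count of zero.
	[ "$n" -ge 1 ] || n=1
	echo "$n"
}

NR_CPUS=$(detect_nr_cpus)
echo "using $NR_CPUS parallel ctags jobs"
```

Keeping the fallback inside one function means the later ceiling division by NR_CPUS can never divide by zero.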

2015-04-15 09:38:50

by Michal Marek

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On 2015-04-14 19:20, Alexey Dobriyan wrote:
> ctags is single-threaded program. Split list of files to be tagged into
> equal parts, 1 part for each CPU and then merge the results.
>
> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> On another 4-way box: ~120 s => ~65 s (-46%!).

I want this! :-)


> + # seems like Useless Use of cat(1) but not really
> + NR_LINES=$(cat .make-tags.src | wc -l)

wc -l <.make-tags.src

Michal
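
The difference between the two spellings can be shown with a short sketch. It assumes a throwaway file created with mktemp; the file names in it are made up for the example. Given a file operand, wc prints the file name after the count, which is presumably why the original patch piped through cat; reading from redirected stdin avoids both the name and the extra process.

```shell
tmp=$(mktemp)
printf 'a.c\nb.c\nc.c\n' >"$tmp"

# With a file operand, wc prints the count followed by the file name.
with_name=$(wc -l "$tmp")

# Reading from stdin, wc has no name to print, so only the count comes out.
# Some implementations pad the number with spaces; the arithmetic
# expansion strips that padding.
count=$(wc -l <"$tmp")
count=$(( $count + 0 ))

echo "$with_name"
echo "$count"
rm -f "$tmp"
```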

2015-04-15 09:51:16

by Alexey Dobriyan

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On Wed, Apr 15, 2015 at 12:38 PM, Michal Marek <[email protected]> wrote:
> On 2015-04-14 19:20, Alexey Dobriyan wrote:
>> ctags is single-threaded program. Split list of files to be tagged into
>> equal parts, 1 part for each CPU and then merge the results.
>>
>> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
>> On another 4-way box: ~120 s => ~65 s (-46%!).
>
> I want this! :-)
>
>
>> + # seems like Useless Use of cat(1) but not really
>> + NR_LINES=$(cat .make-tags.src | wc -l)
>
> wc -l <.make-tags.src

Indeed.

I'll change the script to use getconf, since it appears to be standardized.

2015-04-15 13:24:35

by Michal Marek

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On 2015-04-14 19:20, Alexey Dobriyan wrote:
> ctags is single-threaded program. Split list of files to be tagged into
> equal parts, 1 part for each CPU and then merge the results.
>
> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> On another 4-way box: ~120 s => ~65 s (-46%!).
>
> Resulting "tags" files aren't byte-for-byte identical because ctags
> program numbers anon struct and enum declarations with "__anonNNN"
> symbols. If those lines are removed, "tags" file becomes byte-for-byte
> identical with those generated with current code.
>
> Signed-off-by: Alexey Dobriyan <[email protected]>
> ---
>
> scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
>
> --- a/scripts/tags.sh
> +++ b/scripts/tags.sh
> @@ -152,7 +152,24 @@ dogtags()
>
> exuberant()
> {
> - all_target_sources | xargs $1 -a \
> + NR_CPUS=1
> + if [ -e /proc/cpuinfo ]; then
> + NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)
> + fi

I wonder if we should rather respect the -j option to make here. But
then most people probably won't realize that make tags is parallel and
will not use -j when generating tags. So let's leave it as is.


> +
> + rm -f .make-tags.src.* .make-tags.*

.make-tags.src.* is a subset of .make-tags.*


> +
> + all_target_sources >.make-tags.src
> + # seems like Useless Use of cat(1) but not really
> + NR_LINES=$(cat .make-tags.src | wc -l)
> + NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
> +
> + split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.
> +
> + for i in .make-tags.src.*; do
> + N=$(echo $i | sed -e 's/.*\.//')
> + # -u: don't sort now, sort later
> + cat $i | xargs $1 -a -f .make-tags.$N -u \

xargs <$i $1 ... if you are concerned about uses of cat(1) ;) and the -a
option is not necessary since we are creating the tmp files.


> + # write header
> + $1 -f tags /dev/null
> + # remove header
> + for i in .make-tags.*; do
> + sed -i -e '/^!/d' $i
> + done
> + sort .make-tags.* >>tags

The hardcoded "tags" filename will break 'make TAGS' when using
exuberant ctags via an 'etags' symlink.

Michal

2015-04-15 13:41:38

by Michal Marek

Subject: Re: [PATCH] tags: much faster, parallel "make tags"

On 2015-04-15 15:24, Michal Marek wrote:
> On 2015-04-14 19:20, Alexey Dobriyan wrote:
>> ctags is single-threaded program. Split list of files to be tagged into
>> equal parts, 1 part for each CPU and then merge the results.
>>
>> Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
>> On another 4-way box: ~120 s => ~65 s (-46%!).
>>
>> Resulting "tags" files aren't byte-for-byte identical because ctags
>> program numbers anon struct and enum declarations with "__anonNNN"
>> symbols. If those lines are removed, "tags" file becomes byte-for-byte
>> identical with those generated with current code.
>>
>> Signed-off-by: Alexey Dobriyan <[email protected]>
>> ---
>>
>> scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>
>> --- a/scripts/tags.sh
>> +++ b/scripts/tags.sh
>> @@ -152,7 +152,24 @@ dogtags()
>>
>> exuberant()
>> {
>> - all_target_sources | xargs $1 -a \
>> + NR_CPUS=1
>> + if [ -e /proc/cpuinfo ]; then
>> + NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)
>> + fi
>
> I wonder if we should rather respect the -j option to make here. But
> then most people probably won't realize that make tags is parallel and
> will not use -j when generating tags. So let's leave it as is.

I meant, leave the concept as is, but fix the detection of the number of
cpus.


>> + # write header
>> + $1 -f tags /dev/null
>> + # remove header
>> + for i in .make-tags.*; do
>> + sed -i -e '/^!/d' $i
>> + done
>> + sort .make-tags.* >>tags
>
> The hardcoded "tags" filename will break 'make TAGS' when using
> exuberant ctags via an 'etags' symlink.

Additionally, the TAGS file must not be sorted.

Michal
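
The two review points about the merge step (do not hardcode the output name, and do not sort an emacs-style TAGS file) can be sketched as follows. The helper merge_tags and its "ctags"/"etags" format argument are hypothetical names introduced for this illustration; the patch itself does not contain them. The sketch assumes that the per-chunk files in vi format carry "!"-prefixed header lines that must be dropped before sorting, as the patch does with sed.

```shell
# Hypothetical helper (not part of the patch): append per-chunk tag
# files to the output file $1.  $2 is "ctags" (vi format, sorted) or
# "etags" (emacs format, which must keep its sections in order).
merge_tags()
{
	out=$1
	fmt=$2
	shift 2
	if [ "$fmt" = ctags ]; then
		# Drop the "!" header lines, then sort bytewise.
		sed -e '/^!/d' "$@" | LC_ALL=C sort >>"$out"
	else
		# Emacs format: concatenate without reordering.
		cat "$@" >>"$out"
	fi
}

dir=$(mktemp -d)
printf '!_TAG_FILE_SORTED\t0\nzeta\tz.c\t1\n' >"$dir/part0"
printf '!_TAG_FILE_SORTED\t0\nalpha\ta.c\t1\n' >"$dir/part1"
: >"$dir/tags"
merge_tags "$dir/tags" ctags "$dir/part0" "$dir/part1"
merged=$(cat "$dir/tags")
echo "$merged"
rm -rf "$dir"
```

Passing the output name as an argument keeps the helper usable for both "make tags" and "make TAGS".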

2015-04-15 19:45:44

by Alexey Dobriyan

Subject: [PATCH v2] tags: much faster, parallel "make tags"

On Wed, Apr 15, 2015 at 03:24:26PM +0200, Michal Marek wrote:
> On 2015-04-14 19:20, Alexey Dobriyan wrote:
> > ctags is single-threaded program. Split list of files to be tagged into
> > equal parts, 1 part for each CPU and then merge the results.
> >
> > Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
> > On another 4-way box: ~120 s => ~65 s (-46%!).
> >
> > Resulting "tags" files aren't byte-for-byte identical because ctags
> > program numbers anon struct and enum declarations with "__anonNNN"
> > symbols. If those lines are removed, "tags" file becomes byte-for-byte
> > identical with those generated with current code.
> >
> > Signed-off-by: Alexey Dobriyan <[email protected]>
> > ---
> >
> > scripts/tags.sh | 34 ++++++++++++++++++++++++++++++++--
> > 1 file changed, 32 insertions(+), 2 deletions(-)
> >
> > --- a/scripts/tags.sh
> > +++ b/scripts/tags.sh
> > @@ -152,7 +152,24 @@ dogtags()
> >
> > exuberant()
> > {
> > - all_target_sources | xargs $1 -a \
> > + NR_CPUS=1
> > + if [ -e /proc/cpuinfo ]; then
> > + NR_CPUS=$(grep -e '^processor : ' /proc/cpuinfo | wc -l)
> > + fi
>
> I wonder if we should rather respect the -j option to make here. But
> then most people probably won't realize that make tags is parallel and
> will not use -j when generating tags. So let's leave it as is.
>
>
> > +
> > + rm -f .make-tags.src.* .make-tags.*
>
> .make-tags.src.* is a subset of .make-tags.*
>
>
> > +
> > + all_target_sources >.make-tags.src
> > + # seems like Useless Use of cat(1) but not really
> > + NR_LINES=$(cat .make-tags.src | wc -l)
> > + NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
> > +
> > + split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.
> > +
> > + for i in .make-tags.src.*; do
> > + N=$(echo $i | sed -e 's/.*\.//')
> > + # -u: don't sort now, sort later
> > + cat $i | xargs $1 -a -f .make-tags.$N -u \
>
> xargs <$i $1 ... if you are concerned about uses of cat(1) ;) and the -a
> option is not necessary since we are creating the tmp files.

If "-a" is omitted, the tags file ends up broken for some reason,
e.g. "file_operations" isn't there anymore.
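
A plausible explanation, not confirmed anywhere in this thread: xargs runs the command several times when the argument list is too long for one invocation, so without append mode each run would overwrite the output of the previous one. The sketch below imitates this with stand-in commands; "xargs -n 2" forces multiple invocations, and the shell redirections ">" and ">>" play the roles of ctags without and with "-a". The file names are made up.

```shell
dir=$(mktemp -d)

# Stand-in for "ctags -f FILE": each run overwrites the output file.
printf '%s\n' a.c b.c c.c d.c e.c |
	xargs -n 2 sh -c 'echo "$@" >"$0"' "$dir/overwrite"

# Stand-in for "ctags -a -f FILE": each run appends to the output file.
printf '%s\n' a.c b.c c.c d.c e.c |
	xargs -n 2 sh -c 'echo "$@" >>"$0"' "$dir/append"

overwritten=$(cat "$dir/overwrite")
appended=$(cat "$dir/append")
echo "overwrite keeps only the last batch: $overwritten"
echo "append keeps every batch:"
echo "$appended"
rm -rf "$dir"
```
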

> > + # write header
> > + $1 -f tags /dev/null
> > + # remove header
> > + for i in .make-tags.*; do
> > + sed -i -e '/^!/d' $i
> > + done
> > + sort .make-tags.* >>tags
>
> The hardcoded "tags" filename will break 'make TAGS' when using
> exuberant ctags via an 'etags' symlink.

I don't understand why you would fake etags with ctags.
It would go via the "emacs" path, which is not changed.
----------------------------------------------------------------------
[PATCH] tags: much faster, parallel "make tags"

ctags is single-threaded program. Split list of files to be tagged into
equal parts, 1 part for each CPU and then merge the results.

Speedup on one 2-way box I have is ~143 s => ~99 s (-31%).
On another 4-way box: ~120 s => ~65 s (-46%!).

Resulting "tags" files aren't byte-for-byte identical because ctags
program numbers anon struct and enum declarations with "__anonNNN"
symbols. If those lines are removed, "tags" file becomes byte-for-byte
identical with those generated with current code.

Signed-off-by: Alexey Dobriyan <[email protected]>
---

scripts/tags.sh | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)

--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -152,7 +152,19 @@ dogtags()

exuberant()
{
- all_target_sources | xargs $1 -a \
+ rm -f .make-tags.*
+
+ all_target_sources >.make-tags.src
+ NR_CPUS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
+ NR_LINES=$(wc -l <.make-tags.src)
+ NR_LINES=$((($NR_LINES + $NR_CPUS - 1) / $NR_CPUS))
+
+ split -a 6 -d -l $NR_LINES .make-tags.src .make-tags.src.
+
+ for i in .make-tags.src.*; do
+ N=$(echo $i | sed -e 's/.*\.//')
+ # -u: don't sort now, sort later
+ xargs <$i $1 -a -f .make-tags.$N -u \
-I __initdata,__exitdata,__initconst, \
-I __cpuinitdata,__initdata_memblock \
-I __refdata,__attribute,__maybe_unused,__always_unused \
@@ -211,7 +223,21 @@ exuberant()
--regex-c='/DEFINE_PCI_DEVICE_TABLE\((\w*)/\1/v/' \
--regex-c='/(^\s)OFFSET\((\w*)/\2/v/' \
--regex-c='/(^\s)DEFINE\((\w*)/\2/v/' \
- --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/'
+ --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/' \
+ &
+ done
+ wait
+ rm -f .make-tags.src .make-tags.src.*
+
+ # write header
+ $1 -f $2 /dev/null
+ # remove headers
+ for i in .make-tags.*; do
+ sed -i -e '/^!/d' $i &
+ done
+ wait
+ sort .make-tags.* >>$2
+ rm -f .make-tags.*

all_kconfigs | xargs $1 -a \
--langdef=kconfig --language-force=kconfig \
@@ -276,7 +302,7 @@ emacs()
xtags()
{
if $1 --version 2>&1 | grep -iq exuberant; then
- exuberant $1
+ exuberant $1 $2
elif $1 --version 2>&1 | grep -iq emacs; then
emacs $1
else
@@ -322,13 +348,13 @@ case "$1" in

"tags")
rm -f tags
- xtags ctags
+ xtags ctags tags
remove_structs=y
;;

"TAGS")
rm -f TAGS
- xtags etags
+ xtags etags TAGS
remove_structs=y
;;
esac
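
The split, run-in-parallel, and merge pattern used by the patch can be tried in isolation with the sketch below. It is a simplified illustration, not the patch itself: the CPU count is fixed at 4 for reproducibility, a sed command stands in for the ctags invocation, and all file names are made up. It uses the portable alphabetic suffixes of split rather than "-d".

```shell
dir=$(mktemp -d)
list=$dir/src

# Ten fake "source files" stand in for the output of all_target_sources.
i=1
while [ "$i" -le 10 ]; do
	echo "file$i.c"
	i=$((i + 1))
done >"$list"

NR_CPUS=4	# fixed here; the patch queries getconf instead
NR_LINES=$(wc -l <"$list")
# Ceiling division: 10 files on 4 CPUs gives chunks of 3, 3, 3 and 1.
NR_LINES=$(( ($NR_LINES + $NR_CPUS - 1) / $NR_CPUS ))

split -l "$NR_LINES" "$list" "$list."
nr_chunks=$(( $(ls "$list".* | wc -l) + 0 ))

for chunk in "$list".*; do
	# Stand-in worker; a real run would invoke ctags on the chunk.
	sed -e 's/$/ tagged/' "$chunk" >"$dir/out.${chunk##*.}" &
done
wait	# block until every background worker has finished

LC_ALL=C sort "$dir"/out.* >"$dir/merged"
nr_merged=$(( $(wc -l <"$dir/merged") + 0 ))
echo "$nr_chunks chunks of up to $NR_LINES lines, $nr_merged lines merged"
rm -rf "$dir"
```

Rounding the chunk size up rather than down ensures that split never produces more chunks than there are CPUs.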