Hi Greg,
This is a series of improvements for get_abi.pl. It is based on top of next-20210923.
With these changes, on my development tree, the script takes about 6 seconds to run
on my desktop:
$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
real 0m6,292s
user 0m5,640s
sys 0m0,634s
6838 undefined_after
808 undefined_symbols
7646 total
And 7 seconds on a Dell Precision 5820:
$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
real 0m7.162s
user 0m5.836s
sys 0m1.329s
6548 undefined
772 undefined_symbols
Both tests were done against this tree (based on today's linux-next):
https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
Note that, as my tree has several ABI fixes, the script likely runs faster there
than it would on your tree: there are fewer symbols to report, and the algorithm
is optimized to reduce the number of regexes checked when a symbol is found.
Besides optimizing and improving the seek logic, this series also changes the
debug logic. It now receives a bitmap, where "8" means to print the regexes
that will be used by the "undefined" command:
$ time ./scripts/get_abi.pl undefined --debug 8 >foo
real 0m17,189s
user 0m13,940s
sys 0m2,404s
$ wc -l foo
18421939 foo
$ cat foo
...
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
...
In other words, on my desktop, the /sys match performs more than 18M regular
expression searches, which takes 6.2 seconds (or 17.2 seconds if debug is
enabled and the output is written to a file on my NVMe storage).
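A minimal sketch of how such a debug bitmap could be tested, assuming the flag values used in this series (1, 2, 4 and 8). The debug_enabled() helper is hypothetical and only illustrates the bitwise check; it is not part of get_abi.pl:

```perl
use strict;
use warnings;

# Flag values as used in the series; each one is a distinct bit.
my $dbg_what_parsing     = 1;
my $dbg_what_open        = 2;
my $dbg_dump_abi_structs = 4;
my $dbg_undefined        = 8;

# Hypothetical helper: true when $flag is set in the --debug bitmap.
sub debug_enabled {
    my ($debug, $flag) = @_;
    return ($debug & $flag) ? 1 : 0;
}

my $debug = 8 | 1;    # what "--debug 9" would carry
print "regex tracing on\n" if debug_enabled($debug, $dbg_undefined);
print "ABI dump on\n"      if debug_enabled($debug, $dbg_dump_abi_structs);
```

With a bitmap, several debug outputs can be combined by adding their values, while each one can still be checked independently.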
Regards,
Mauro
---
Mauro Carvalho Chehab (13):
scripts: get_abi.pl: Better handle multiple What parameters
scripts: get_abi.pl: Check for missing symbols at the ABI specs
scripts: get_abi.pl: detect softlinks
scripts: get_abi.pl: add an option to filter undefined results
scripts: get_abi.pl: don't skip what that ends with wildcards
scripts: get_abi.pl: Ignore fs/cgroup sysfs nodes earlier
scripts: get_abi.pl: add a graph to speedup the undefined algorithm
scripts: get_abi.pl: improve debug logic
scripts: get_abi.pl: Better handle leaves with wildcards
scripts: get_abi.pl: ignore some sysfs nodes earlier
scripts: get_abi.pl: stop check loop earlier when regex is found
scripts: get_abi.pl: precompile what match regexes
scripts: get_abi.pl: ensure that "others" regex will be parsed
scripts/get_abi.pl | 388 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 372 insertions(+), 16 deletions(-)
--
2.31.1
When checking for undefined symbols, some nodes aren't easy to check,
or it doesn't make sense to check them right now. Avoid allocating
memory for those, as they'll be ignored anyway.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 3c0063d0e05e..42eb16eb78e9 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -628,6 +628,14 @@ sub parse_existing_sysfs {
# Ignore cgroup and firmware
return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+ # Ignore some sysfs nodes
+ return if ($file =~ m#/(sections|notes)/#);
+
+ # Would need to check at
+ # Documentation/admin-guide/kernel-parameters.txt, but this
+ # is not easily parseable.
+ return if ($file =~ m#/parameters/#);
+
my $mode = (lstat($file))[2];
my $abs_file = abs_path($file);
@@ -709,14 +717,6 @@ sub check_undefined_symbols {
next if ($exact);
- # Ignore some sysfs nodes
- next if ($file =~ m#/(sections|notes)/#);
-
- # Would need to check at
- # Documentation/admin-guide/kernel-parameters.txt, but this
- # is not easily parseable.
- next if ($file =~ m#/parameters/#);
-
if ($hint && $defined && (!$search_string || $found_string)) {
$what =~ s/\xac/\n\t/g;
if ($leave ne "others") {
--
2.31.1
Check for the symbols that exist under /sys but aren't
defined at Documentation/ABI.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 90 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 88 insertions(+), 2 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 48077feea89c..e714bf75f5c2 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -13,7 +13,9 @@ my $help = 0;
my $man = 0;
my $debug = 0;
my $enable_lineno = 0;
+my $show_warnings = 1;
my $prefix="Documentation/ABI";
+my $sysfs_prefix="/sys";
#
# If true, assumes that the description is formatted with ReST
@@ -36,7 +38,7 @@ pod2usage(2) if (scalar @ARGV < 1 || @ARGV > 2);
my ($cmd, $arg) = @ARGV;
-pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate");
+pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate" && $cmd ne "undefined");
pod2usage(2) if ($cmd eq "search" && !$arg);
require Data::Dumper if ($debug);
@@ -50,6 +52,8 @@ my %symbols;
sub parse_error($$$$) {
my ($file, $ln, $msg, $data) = @_;
+ return if (!$show_warnings);
+
$data =~ s/\s+$/\n/;
print STDERR "Warning: file $file#$ln:\n\t$msg";
@@ -522,11 +526,88 @@ sub search_symbols {
}
}
+# Exclude /sys/kernel/debug and /sys/kernel/tracing from the search path
+sub skip_debugfs {
+ if (($File::Find::dir =~ m,^/sys/kernel,)) {
+ return grep {!/(debug|tracing)/ } @_;
+ }
+
+ if (($File::Find::dir =~ m,^/sys/fs,)) {
+ return grep {!/(pstore|bpf|fuse)/ } @_;
+ }
+
+ return @_
+}
+
+my %leaf;
+
+my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xff]) }x;
+sub parse_existing_sysfs {
+ my $file = $File::Find::name;
+
+ my $mode = (stat($file))[2];
+ return if ($mode & S_IFDIR);
+
+ my $leave = $file;
+ $leave =~ s,.*/,,;
+
+ if (defined($leaf{$leave})) {
+ # FIXME: need to check if the path makes sense
+ my $what = $leaf{$leave};
+
+ $what =~ s/,/ /g;
+
+ $what =~ s/\<[^\>]+\>/.*/g;
+ $what =~ s/\{[^\}]+\}/.*/g;
+ $what =~ s/\[[^\]]+\]/.*/g;
+ $what =~ s,/\.\.\./,/.*/,g;
+ $what =~ s,/\*/,/.*/,g;
+
+ $what =~ s/\s+/ /g;
+
+ # Escape all other symbols
+ $what =~ s/$escape_symbols/\\$1/g;
+
+ foreach my $i (split / /,$what) {
+ if ($file =~ m#^$i$#) {
+# print "$file: $i: OK!\n";
+ return;
+ }
+ }
+
+ print "$file: $leave is defined at $what\n";
+
+ return;
+ }
+
+ print "$file not found.\n";
+}
+
+sub undefined_symbols {
+ foreach my $w (sort keys %data) {
+ foreach my $what (split /\xac /,$w) {
+ my $leave = $what;
+ $leave =~ s,.*/,,;
+
+ if (defined($leaf{$leave})) {
+ $leaf{$leave} .= " " . $what;
+ } else {
+ $leaf{$leave} = $what;
+ }
+ }
+ }
+
+ find({wanted =>\&parse_existing_sysfs, preprocess =>\&skip_debugfs, no_chdir => 1}, $sysfs_prefix);
+}
+
# Ensure that the prefix will always end with a slash
# While this is not needed for find, it makes the patch nicer
# with --enable-lineno
$prefix =~ s,/?$,/,;
+if ($cmd eq "undefined" || $cmd eq "search") {
+ $show_warnings = 0;
+}
#
# Parses all ABI files located at $prefix dir
#
@@ -537,7 +618,9 @@ print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug);
#
# Handles the command
#
-if ($cmd eq "search") {
+if ($cmd eq "undefined") {
+ undefined_symbols;
+} elsif ($cmd eq "search") {
search_symbols;
} else {
if ($cmd eq "rest") {
@@ -576,6 +659,9 @@ B<rest> - output the ABI in ReST markup language
B<validate> - validate the ABI contents
+B<undefined> - existing symbols at the system that aren't
+ defined at Documentation/ABI
+
=back
=head1 OPTIONS
--
2.31.1
The output of this script can be too big. Add an option to
filter the results, in order to help find issues in the
ABI files.
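As a rough illustration of the filtering idea, here is a self-contained sketch. The option name matches the one added by this patch, but the file list is made up and the argument array stands in for @ARGV:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for @ARGV; the real script parses its own command line.
my @args = ('--search-string', '/sys/class/net/');
my $search_string;
GetOptionsFromArray(\@args, "search-string=s" => \$search_string)
    or die "bad options\n";

# Hypothetical sysfs nodes that were not found in Documentation/ABI.
my @files = ('/sys/class/net/eth0/mtu', '/sys/block/sda/size');

# Keep everything when no filter is given, else only the matching paths.
my @selected = grep { !$search_string || m#$search_string# } @files;
print "$_ not found.\n" foreach @selected;
```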
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 37 +++++++++++++++++++++++++++++++------
1 file changed, 31 insertions(+), 6 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index a7cb4be6886c..40f10175bb98 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -18,6 +18,7 @@ my $enable_lineno = 0;
my $show_warnings = 1;
my $prefix="Documentation/ABI";
my $sysfs_prefix="/sys";
+my $search_string;
#
# If true, assumes that the description is formatted with ReST
@@ -31,6 +32,7 @@ GetOptions(
"dir=s" => \$prefix,
'help|?' => \$help,
"show-hints" => \$hint,
+ "search-string=s" => \$search_string,
man => \$man
) or pod2usage(2);
@@ -569,16 +571,13 @@ sub parse_existing_sysfs {
sub check_undefined_symbols {
foreach my $file (sort @files) {
- # sysfs-module is special, as its definitions are inside
- # a text. For now, just ignore them.
- next if ($file =~ m#^/sys/module/#);
-
# Ignore cgroup and firmware
next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
my $defined = 0;
my $exact = 0;
my $whats = "";
+ my $found_string;
my $leave = $file;
$leave =~ s,.*/,,;
@@ -586,6 +585,12 @@ sub check_undefined_symbols {
my $path = $file;
$path =~ s,(.*/).*,$1,;
+ if ($search_string) {
+ next if (!($file =~ m#$search_string#));
+ $found_string = 1;
+ }
+
+ print "--> $file\n" if ($found_string && $hint);
if (defined($leaf{$leave})) {
my $what = $leaf{$leave};
$whats .= " $what" if (!($whats =~ m/$what/));
@@ -611,6 +616,7 @@ sub check_undefined_symbols {
if (substr($file, 0, $len) eq $new) {
my $newf = $a . substr($file, $len);
+ print " $newf\n" if ($found_string && $hint);
foreach my $w (split / /, $what) {
if ($newf =~ m#^$w$#) {
$exact = 1;
@@ -633,10 +639,10 @@ sub check_undefined_symbols {
next if ($file =~ m#/parameters/#);
if ($hint && $defined) {
- print "$leave at $path might be one of:$whats\n";
+ print "$leave at $path might be one of:$whats\n" if (!$search_string || $found_string);
next;
}
- print "$file not found.\n";
+ print "$file not found.\n" if (!$search_string || $found_string);
}
}
@@ -702,16 +708,29 @@ sub undefined_symbols {
$what =~ s/\\([\[\]\(\)\|])/$1/g;
$what =~ s/(\d+)\\(-\d+)/$1$2/g;
+ $what =~ s/\xff/\\d+/g;
+
+
+ # Special case: IIO ABI which a parenthesis.
+ $what =~ s/sqrt(.*)/sqrt\(.*\)/;
+
$leave =~ s/[\(\)]//g;
+ my $added = 0;
foreach my $l (split /\|/, $leave) {
if (defined($leaf{$l})) {
next if ($leaf{$l} =~ m/$what/);
$leaf{$l} .= " " . $what;
+ $added = 1;
} else {
$leaf{$l} = $what;
+ $added = 1;
}
}
+ if ($search_string && $added) {
+ print "What: $what\n" if ($what =~ m#$search_string#);
+ }
+
}
}
check_undefined_symbols;
@@ -765,6 +784,7 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
[--(no-)rst-source] [--dir=<dir>] [--show-hints]
+ [--search-string <regex>]
<COMAND> [<ARGUMENT>]
Where <COMMAND> can be:
@@ -812,6 +832,11 @@ times, to increase verbosity.
Show hints about possible definitions for the missing ABI symbols.
Used only when B<undefined>.
+=item B<--search-string> [regex string]
+
+Show only occurences that match a search string.
+Used only when B<undefined>.
+
=item B<--help>
Prints a brief help message and exits.
--
2.31.1
Using a comma as the separator here is problematic, as some What:
expressions may already contain a comma. So, use the \xac character instead.
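The ambiguity can be reproduced with a small sketch. The two What: strings below are invented for illustration and are not taken from Documentation/ABI:

```perl
use strict;
use warnings;

# Two hypothetical What: entries; the first one already contains ", ".
my @whats = (
    "/sys/devices/example/attr[a, b]",
    "/sys/devices/example/name",
);

# Joining with ", " cannot be undone reliably: the first entry is cut in two.
my @split_comma = split(/, /, join(", ", @whats));

# \xac does not show up in the entries, so the round trip is lossless.
my @split_xac = split(/\xac/, join("\xac", @whats));

printf "comma: %d entries, \\xac: %d entries\n",
       scalar(@split_comma), scalar(@split_xac);
```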
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d7aa82094296..48077feea89c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -129,12 +129,12 @@ sub parse_abi {
push @{$symbols{$content}->{file}}, " $file:" . ($ln - 1);
if ($tag =~ m/what/) {
- $what .= ", " . $content;
+ $what .= "\xac" . $content;
} else {
if ($what) {
parse_error($file, $ln, "What '$what' doesn't have a description", "") if (!$data{$what}->{description});
- foreach my $w(split /, /, $what) {
+ foreach my $w(split /\xac/, $what) {
$symbols{$w}->{xref} = $what;
};
}
@@ -239,7 +239,7 @@ sub parse_abi {
if ($what) {
parse_error($file, $ln, "What '$what' doesn't have a description", "") if (!$data{$what}->{description});
- foreach my $w(split /, /,$what) {
+ foreach my $w(split /\xac/,$what) {
$symbols{$w}->{xref} = $what;
};
}
@@ -328,7 +328,7 @@ sub output_rest {
printf ".. _%s:\n\n", $data{$what}->{label};
- my @names = split /, /,$w;
+ my @names = split /\xac/,$w;
my $len = 0;
foreach my $name (@names) {
@@ -492,6 +492,7 @@ sub search_symbols {
my $file = $data{$what}->{filepath};
+ $what =~ s/\xac/, /g;
my $bar = $what;
$bar =~ s/./-/g;
--
2.31.1
In order to speed up the parser and store less data, handle
fs/cgroup exceptions a lot earlier.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 8f69acec4ae5..41a49ae31c25 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -551,6 +551,10 @@ my @files;
my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
sub parse_existing_sysfs {
my $file = $File::Find::name;
+
+ # Ignore cgroup and firmware
+ return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+
my $mode = (lstat($file))[2];
my $abs_file = abs_path($file);
@@ -571,9 +575,6 @@ sub parse_existing_sysfs {
sub check_undefined_symbols {
foreach my $file (sort @files) {
- # Ignore cgroup and firmware
- next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
-
my $defined = 0;
my $exact = 0;
my $whats = "";
--
2.31.1
When the leaf of a regex ends with a wildcard, the speedup
algorithm to reduce the number of regexes to seek won't work.
So, when those are found, place them in the "others" group.
That slows down the search from 0.14s to 1 minute on my
machine, but the results are a lot more consistent.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index bb80303fea22..3c0063d0e05e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -665,7 +665,7 @@ sub get_leave($)
# However, there are a few occurences where the leave is
# either a wildcard or a number. Just group such cases
# altogether.
- if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+ if ($leave =~ m/\.\*/ || $leave eq "" || $leave =~ /\\d/) {
$leave = "others";
}
--
2.31.1
The search algorithm used inside check_undefined_symbols
has an optimization: it seeks only the whats that have the same
leave name. This not only speeds up the search, but
also allows providing a hint about a partial match.
There's a drawback, however: when "what:" finishes with a
wildcard, the logic will skip the what, reporting it as
"not found".
Fix it by grouping the remaining cases all together, and
disabling any hints for such cases.
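The grouping described above can be sketched as follows. leaf_key() is a hypothetical, simplified stand-in for the leave computation done in the script, and the What: regexes are examples only:

```perl
use strict;
use warnings;

# Hypothetical helper: pick the group key for a What: regex.
sub leaf_key {
    my ($what) = @_;
    (my $leaf = $what) =~ s{/$}{};    # drop a trailing slash
    $leaf =~ s{.*/}{};                # keep only the last path component
    $leaf =~ s/[\(\)]//g;
    # Wildcards, plain numbers and empty names cannot be used as a lookup key.
    return "others" if ($leaf eq "" || $leaf =~ m/\.\*/ || $leaf =~ m/^\d+$/);
    return $leaf;
}

# Group the regexes by key; everything unusable ends up under "others".
my %leaf;
foreach my $what ('/sys/bus/pci/drivers/.*/bind', '/sys/class/example/.*') {
    push @{ $leaf{leaf_key($what)} }, $what;
}
print "$_: @{ $leaf{$_} }\n" foreach sort keys %leaf;
```

Entries under "others" have to be tried against every file, which is why no hint can be given for them.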
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 74 +++++++++++++++++++++++++++-------------------
1 file changed, 43 insertions(+), 31 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 40f10175bb98..8f69acec4ae5 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -590,44 +590,47 @@ sub check_undefined_symbols {
$found_string = 1;
}
+ if ($leave =~ /^\d+$/ || !defined($leaf{$leave})) {
+ $leave = "others";
+ }
+
print "--> $file\n" if ($found_string && $hint);
- if (defined($leaf{$leave})) {
- my $what = $leaf{$leave};
- $whats .= " $what" if (!($whats =~ m/$what/));
+ my $what = $leaf{$leave};
+ $whats .= " $what" if (!($whats =~ m/$what/));
- foreach my $w (split / /, $what) {
- if ($file =~ m#^$w$#) {
- $exact = 1;
- last;
- }
+ foreach my $w (split / /, $what) {
+ if ($file =~ m#^$w$#) {
+ $exact = 1;
+ last;
}
- # Check for aliases
- #
- # TODO: this algorithm is O(w * n²). It can be
- # improved in the future in order to handle it
- # faster, by changing parse_existing_sysfs to
- # store the sysfs inside a tree, at the expense
- # on making the code less readable and/or using some
- # additional perl library.
- foreach my $a (keys %aliases) {
- my $new = $aliases{$a};
- my $len = length($new);
+ }
+ # Check for aliases
+ #
+ # TODO: this algorithm is O(w * n²). It can be
+ # improved in the future in order to handle it
+ # faster, by changing parse_existing_sysfs to
+ # store the sysfs inside a tree, at the expense
+ # on making the code less readable and/or using some
+ # additional perl library.
+ foreach my $a (keys %aliases) {
+ my $new = $aliases{$a};
+ my $len = length($new);
- if (substr($file, 0, $len) eq $new) {
- my $newf = $a . substr($file, $len);
+ if (substr($file, 0, $len) eq $new) {
+ my $newf = $a . substr($file, $len);
- print " $newf\n" if ($found_string && $hint);
- foreach my $w (split / /, $what) {
- if ($newf =~ m#^$w$#) {
- $exact = 1;
- last;
- }
+ print " $newf\n" if ($found_string && $hint);
+ foreach my $w (split / /, $what) {
+ if ($newf =~ m#^$w$#) {
+ $exact = 1;
+ last;
}
}
}
-
- $defined++;
}
+
+ $defined++;
+
next if ($exact);
# Ignore some sysfs nodes
@@ -638,7 +641,7 @@ sub check_undefined_symbols {
# is not easily parseable.
next if ($file =~ m#/parameters/#);
- if ($hint && $defined) {
+ if ($hint && $defined && $leave ne "others") {
print "$leave at $path might be one of:$whats\n" if (!$search_string || $found_string);
next;
}
@@ -700,7 +703,16 @@ sub undefined_symbols {
my $leave = $what;
$leave =~ s,.*/,,;
- next if ($leave =~ m/^\.\*/ || $leave eq "");
+ # $leave is used to improve search performance at
+ # check_undefined_symbols, as the algorithm there can seek
+ # for a small number of "what". It also allows giving a
+ # hint about a leave with the same name somewhere else.
+ # However, there are a few occurences where the leave is
+ # either a wildcard or a number. Just group such cases
+ # altogether.
+ if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+ $leave = "others" ;
+ }
# Escape all other symbols
$what =~ s/$escape_symbols/\\$1/g;
--
2.31.1
Right now, there are two loops used to seek for a regex. Make
sure that both are exited when a match is found.
While here, drop the unused $defined variable.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 42eb16eb78e9..d45e5ba56f9c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -685,7 +685,6 @@ sub check_undefined_symbols {
my @names = @{$$file_ref{"__name"}};
my $file = $names[0];
- my $defined = 0;
my $exact = 0;
my $found_string;
@@ -711,13 +710,11 @@ sub check_undefined_symbols {
last;
}
}
+ last if ($exact);
}
-
- $defined++;
-
next if ($exact);
- if ($hint && $defined && (!$search_string || $found_string)) {
+ if ($hint && (!$search_string || $found_string)) {
$what =~ s/\xac/\n\t/g;
if ($leave ne "others") {
print " more likely regexes:\n\t$what\n";
--
2.31.1
Searching for symlinks is an expensive operation with the current
logic, as it is on the order of O(n^3). In practice, the check takes
2-3 minutes to go through all symbols.
Fix it by storing the directory tree in a graph, and using
a Breadth First Search (BFS) to find the links for each sysfs node.
With this improvement, it can now report issues in ~11 seconds
on my machine.
It comes with a price, though: more symbols are reported
as undefined after this change. I suspect this is due to some
sysfs circular loops that are dropped by BFS. Despite the
increase, the reports now seem more coherent.
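To illustrate the idea, here is a simplified sketch of a nested-hash tree with a BFS that attaches alias names to every node below a link target. add_path() and add_link() are hypothetical helpers written for this example; they are not the graph_add_file()/graph_add_link() functions from the patch, and the paths are made up:

```perl
use strict;
use warnings;

# Directory tree as nested hashes; "__name" holds all known names of a node.
my %root;

sub add_path {
    my ($path) = @_;
    my $node = \%root;
    my $name = "";
    foreach my $edge (grep { length } split m{/}, $path) {
        $name .= "/$edge";
        $node->{$edge} //= { "__name" => [ $name ] };
        $node = $node->{$edge};
    }
    return $node;
}

# Give every node below $target an extra name that starts with $link.
sub add_link {
    my ($target, $link) = @_;
    my $node = \%root;
    foreach my $edge (grep { length } split m{/}, $target) {
        $node = $node->{$edge} or die "Missing node: $target";
    }
    my @queue = ($node);
    my %seen  = ($node => 1);
    while (my $v = shift @queue) {          # BFS over the subtree
        foreach my $c (sort keys %$v) {
            next if $c eq "__name";
            my $child = $v->{$c};
            next if $seen{$child}++;
            (my $alias = $child->{"__name"}[0]) =~ s/^\Q$target\E/$link/;
            push @{ $child->{"__name"} }, $alias;
            push @queue, $child;
        }
    }
}

my $leaf = add_path("/sys/devices/pci0/dev0/address");
add_link("/sys/devices/pci0/dev0", "/sys/class/net/eth0");
print join("\n", @{ $leaf->{"__name"} }), "\n";
```

Each node is visited once per link, so the alias names are collected without comparing every file against every symlink.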
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 189 ++++++++++++++++++++++++++++++---------------
1 file changed, 127 insertions(+), 62 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 41a49ae31c25..9eb8a033d363 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -547,6 +547,73 @@ sub dont_parse_special_attributes {
my %leaf;
my %aliases;
my @files;
+my %root;
+
+sub graph_add_file {
+ my $file = shift;
+ my $type = shift;
+
+ my $dir = $file;
+ $dir =~ s,^(.*/).*,$1,;
+ $file =~ s,.*/,,;
+
+ my $name;
+ my $file_ref = \%root;
+ foreach my $edge(split "/", $dir) {
+ $name .= "$edge/";
+ if (!defined ${$file_ref}{$edge}) {
+ ${$file_ref}{$edge} = { };
+ }
+ $file_ref = \%{$$file_ref{$edge}};
+ ${$file_ref}{"__name"} = [ $name ];
+ }
+ $name .= "$file";
+ ${$file_ref}{$file} = {
+ "__name" => [ $name ]
+ };
+
+ return \%{$$file_ref{$file}};
+}
+
+sub graph_add_link {
+ my $file = shift;
+ my $link = shift;
+
+ # Traverse graph to find the reference
+ my $file_ref = \%root;
+ foreach my $edge(split "/", $file) {
+ $file_ref = \%{$$file_ref{$edge}} || die "Missing node!";
+ }
+
+ # do a BFS
+
+ my @queue;
+ my %seen;
+ my $base_name;
+ my $st;
+
+ push @queue, $file_ref;
+ $seen{$start}++;
+
+ while (@queue) {
+ my $v = shift @queue;
+ my @child = keys(%{$v});
+
+ foreach my $c(@child) {
+ next if $seen{$$v{$c}};
+ next if ($c eq "__name");
+
+ # Add new name
+ my $name = @{$$v{$c}{"__name"}}[0];
+ if ($name =~ s#^$file/#$link/#) {
+ push @{$$v{$c}{"__name"}}, $name;
+ }
+ # Add child to the queue and mark as seen
+ push @queue, $$v{$c};
+ $seen{$c}++;
+ }
+ }
+}
my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
sub parse_existing_sysfs {
@@ -569,19 +636,50 @@ sub parse_existing_sysfs {
return if (defined($data{$file}));
return if (defined($data{$abs_file}));
- push @files, $abs_file;
+ push @files, graph_add_file($abs_file, "file");
+}
+
+sub get_leave($)
+{
+ my $what = shift;
+ my $leave;
+
+ my $l = $what;
+ my $stop = 1;
+
+ $leave = $l;
+ $leave =~ s,/$,,;
+ $leave =~ s,.*/,,;
+ $leave =~ s/[\(\)]//g;
+
+ # $leave is used to improve search performance at
+ # check_undefined_symbols, as the algorithm there can seek
+ # for a small number of "what". It also allows giving a
+ # hint about a leave with the same name somewhere else.
+ # However, there are a few occurences where the leave is
+ # either a wildcard or a number. Just group such cases
+ # altogether.
+ if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+ $leave = "others";
+ }
+
+ return $leave;
}
sub check_undefined_symbols {
- foreach my $file (sort @files) {
+ foreach my $file_ref (sort @files) {
+ my @names = @{$$file_ref{"__name"}};
+ my $file = $names[0];
my $defined = 0;
my $exact = 0;
- my $whats = "";
my $found_string;
- my $leave = $file;
- $leave =~ s,.*/,,;
+ my $leave = get_leave($file);
+ if (!defined($leaf{$leave})) {
+ $leave = "others";
+ }
+ my $what = $leaf{$leave};
my $path = $file;
$path =~ s,(.*/).*,$1,;
@@ -591,41 +689,12 @@ sub check_undefined_symbols {
$found_string = 1;
}
- if ($leave =~ /^\d+$/ || !defined($leaf{$leave})) {
- $leave = "others";
- }
-
- print "--> $file\n" if ($found_string && $hint);
- my $what = $leaf{$leave};
- $whats .= " $what" if (!($whats =~ m/$what/));
-
- foreach my $w (split / /, $what) {
- if ($file =~ m#^$w$#) {
- $exact = 1;
- last;
- }
- }
- # Check for aliases
- #
- # TODO: this algorithm is O(w * n²). It can be
- # improved in the future in order to handle it
- # faster, by changing parse_existing_sysfs to
- # store the sysfs inside a tree, at the expense
- # on making the code less readable and/or using some
- # additional perl library.
- foreach my $a (keys %aliases) {
- my $new = $aliases{$a};
- my $len = length($new);
-
- if (substr($file, 0, $len) eq $new) {
- my $newf = $a . substr($file, $len);
-
- print " $newf\n" if ($found_string && $hint);
- foreach my $w (split / /, $what) {
- if ($newf =~ m#^$w$#) {
- $exact = 1;
- last;
- }
+ foreach my $a (@names) {
+ print "--> $a\n" if ($found_string && $hint);
+ foreach my $w (split /\xac/, $what) {
+ if ($a =~ m#^$w$#) {
+ $exact = 1;
+ last;
}
}
}
@@ -642,8 +711,13 @@ sub check_undefined_symbols {
# is not easily parseable.
next if ($file =~ m#/parameters/#);
- if ($hint && $defined && $leave ne "others") {
- print "$leave at $path might be one of:$whats\n" if (!$search_string || $found_string);
+ if ($hint && $defined && (!$search_string || $found_string)) {
+ $what =~ s/\xac/\n\t/g;
+ if ($leave ne "others") {
+ print " more likely regexes:\n\t$what\n";
+ } else {
+ print " tested regexes:\n\t$what\n";
+ }
next;
}
print "$file not found.\n" if (!$search_string || $found_string);
@@ -657,8 +731,10 @@ sub undefined_symbols {
no_chdir => 1
}, $sysfs_prefix);
+ $leaf{"others"} = "";
+
foreach my $w (sort keys %data) {
- foreach my $what (split /\xac /,$w) {
+ foreach my $what (split /\xac/,$w) {
next if (!($what =~ m/^$sysfs_prefix/));
# Convert what into regular expressions
@@ -701,20 +777,6 @@ sub undefined_symbols {
# (this happens on a few IIO definitions)
$what =~ s,\s*\=.*$,,;
- my $leave = $what;
- $leave =~ s,.*/,,;
-
- # $leave is used to improve search performance at
- # check_undefined_symbols, as the algorithm there can seek
- # for a small number of "what". It also allows giving a
- # hint about a leave with the same name somewhere else.
- # However, there are a few occurences where the leave is
- # either a wildcard or a number. Just group such cases
- # altogether.
- if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
- $leave = "others" ;
- }
-
# Escape all other symbols
$what =~ s/$escape_symbols/\\$1/g;
$what =~ s/\\\\/\\/g;
@@ -723,17 +785,15 @@ sub undefined_symbols {
$what =~ s/\xff/\\d+/g;
-
# Special case: IIO ABI which a parenthesis.
$what =~ s/sqrt(.*)/sqrt\(.*\)/;
- $leave =~ s/[\(\)]//g;
-
+ my $leave = get_leave($what);
my $added = 0;
foreach my $l (split /\|/, $leave) {
if (defined($leaf{$l})) {
- next if ($leaf{$l} =~ m/$what/);
- $leaf{$l} .= " " . $what;
+ next if ($leaf{$l} =~ m/\b$what\b/);
+ $leaf{$l} .= "\xac" . $what;
$added = 1;
} else {
$leaf{$l} = $what;
@@ -746,6 +806,11 @@ sub undefined_symbols {
}
}
+ # Take links into account
+ foreach my $link (keys %aliases) {
+ my $abs_file = $aliases{$link};
+ graph_add_link($abs_file, $link);
+ }
check_undefined_symbols;
}
--
2.31.1
The search algorithm works by reducing the number of regular
expressions that are checked for a given file entry at sysfs. It
does that by looking at the devnode name. For instance, when it checks
this file:
/sys/bus/pci/drivers/iosf_mbi_pci/bind
The logic will seek only the "What:" expressions that end with "bind".
Currently, there are just a couple of What expressions that match
it:
What: /sys/bus/fsl\-mc/drivers/.*/bind
What: /sys/bus/pci/drivers/.*/bind
It will then run an O(n²) algorithm to seek, which runs quickly
when there are few regexes to check. There are, however, some What:
expressions that end with a wildcard. Those are harder to process.
Right now, they're all grouped together in the "others" group.
As those don't depend on the basename of the node, add an extra
loop to ensure that they will be processed at the end, if
not done yet.
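A minimal sketch of that two-step lookup, assuming a %leaf hash keyed by basename. is_documented() is a hypothetical name, and the "others" entry is an invented wildcard-terminated expression:

```perl
use strict;
use warnings;

# Regexes grouped by the last path component of each What: expression.
my %leaf = (
    bind   => [ qr{^/sys/bus/fsl\-mc/drivers/.*/bind$},
                qr{^/sys/bus/pci/drivers/.*/bind$} ],
    others => [ qr{^/sys/kernel/example/.*$} ],    # hypothetical entry
);

sub is_documented {
    my ($file) = @_;
    (my $name = $file) =~ s{.*/}{};

    # Try the small per-basename group first, then fall back to "others".
    my @groups = exists $leaf{$name} ? ($name, "others") : ("others");
    foreach my $g (@groups) {
        foreach my $re (@{ $leaf{$g} }) {
            return 1 if $file =~ $re;
        }
    }
    return 0;
}

foreach my $f ('/sys/bus/pci/drivers/iosf_mbi_pci/bind', '/sys/kernel/other/bind') {
    print "$f not found.\n" unless is_documented($f);
}
```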
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index f2b5efef9c30..f25c98b1971e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -723,6 +723,22 @@ sub check_undefined_symbols {
}
next if ($exact);
+ if ($leave ne "others") {
+ my @expr = @{$leaf{$leave}->{expr}};
+ for (my $i = 0; $i < @names; $i++) {
+ foreach my $re (@expr) {
+ print "$names[$i] =~ /^$re\$/\n" if ($debug && $dbg_undefined);
+ if ($names[$i] =~ $re) {
+ $exact = 1;
+ last;
+ }
+ }
+ last if ($exact);
+ }
+ last if ($exact);
+ }
+ next if ($exact);
+
if ($hint && (!$search_string || $found_string)) {
my $what = $leaf{$leave}->{what};
$what =~ s/\xac/\n\t/g;
--
2.31.1
In order to save some time during matches, pre-compile the regexes.
Before this patch:
$ time ./scripts/get_abi.pl undefined |wc -l
6970
real 0m54,751s
user 0m54,022s
sys 0m0,592s
Afterwards:
$ time ./scripts/get_abi.pl undefined |wc -l
6970
real 0m5,888s
user 0m5,310s
sys 0m0,562s
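The technique can be sketched like this: each pattern string is turned into a compiled regex object once with qr//, and the inner loop only reuses those objects. The pattern and file lists are examples only:

```perl
use strict;
use warnings;

my @what  = ('/sys/bus/pci/drivers/.*/bind',
             '/sys/bus/fsl\-mc/drivers/.*/bind');
my @files = ('/sys/bus/pci/drivers/iosf_mbi_pci/bind',
             '/sys/bus/pci/drivers/iosf_mbi_pci/unbind');

# Compile each pattern once, anchored at both ends.
my @expr;
foreach my $w (@what) {
    push @expr, qr/^$w$/;
}

# The hot loop matches against the precompiled objects only.
my $matches = 0;
foreach my $f (@files) {
    foreach my $re (@expr) {
        if ($f =~ $re) {
            $matches++;
            last;
        }
    }
}
print "$matches of ", scalar(@files), " files matched\n";
```

Without qr//, a pattern interpolated from a variable may have to be recompiled whenever its text changes between iterations, which adds up over millions of matches.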
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d45e5ba56f9c..f2b5efef9c30 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -25,6 +25,7 @@ my $search_string;
my $dbg_what_parsing = 1;
my $dbg_what_open = 2;
my $dbg_dump_abi_structs = 4;
+my $dbg_undefined = 8;
#
# If true, assumes that the description is formatted with ReST
@@ -692,7 +693,8 @@ sub check_undefined_symbols {
if (!defined($leaf{$leave})) {
$leave = "others";
}
- my $what = $leaf{$leave};
+ my @expr = @{$leaf{$leave}->{expr}};
+ die ("missing rules for $leave") if (!defined($leaf{$leave}));
my $path = $file;
$path =~ s,(.*/).*,$1,;
@@ -702,10 +704,17 @@ sub check_undefined_symbols {
$found_string = 1;
}
- foreach my $a (@names) {
- print "--> $a\n" if ($found_string && $hint);
- foreach my $w (split /\xac/, $what) {
- if ($a =~ m#^$w$#) {
+ for (my $i = 0; $i < @names; $i++) {
+ if ($found_string && $hint) {
+ if (!$i) {
+ print "--> $names[$i]\n";
+ } else {
+ print " $names[$i]\n";
+ }
+ }
+ foreach my $re (@expr) {
+ print "$names[$i] =~ /^$re\$/\n" if ($debug && $dbg_undefined);
+ if ($names[$i] =~ $re) {
$exact = 1;
last;
}
@@ -715,6 +724,7 @@ sub check_undefined_symbols {
next if ($exact);
if ($hint && (!$search_string || $found_string)) {
+ my $what = $leaf{$leave}->{what};
$what =~ s/\xac/\n\t/g;
if ($leave ne "others") {
print " more likely regexes:\n\t$what\n";
@@ -734,7 +744,7 @@ sub undefined_symbols {
no_chdir => 1
}, $sysfs_prefix);
- $leaf{"others"} = "";
+ $leaf{"others"}->{what} = "";
foreach my $w (sort keys %data) {
foreach my $what (split /\xac/,$w) {
@@ -792,14 +802,15 @@ sub undefined_symbols {
$what =~ s/sqrt(.*)/sqrt\(.*\)/;
my $leave = get_leave($what);
+
my $added = 0;
foreach my $l (split /\|/, $leave) {
if (defined($leaf{$l})) {
- next if ($leaf{$l} =~ m/\b$what\b/);
- $leaf{$l} .= "\xac" . $what;
+ next if ($leaf{$l}->{what} =~ m/\b$what\b/);
+ $leaf{$l}->{what} .= "\xac" . $what;
$added = 1;
} else {
- $leaf{$l} = $what;
+ $leaf{$l}->{what} = $what;
$added = 1;
}
}
@@ -809,6 +820,15 @@ sub undefined_symbols {
}
}
+ # Compile regexes
+ foreach my $l (keys %leaf) {
+ my @expr;
+ foreach my $w(split /\xac/, $leaf{$l}->{what}) {
+ push @expr, qr /^$w$/;
+ }
+ $leaf{$l}->{expr} = \@expr;
+ }
+
# Take links into account
foreach my $link (keys %aliases) {
my $abs_file = $aliases{$link};
--
2.31.1
The way sysfs works is that the same leave may be present under
/sys/devices, /sys/bus, /sys/class, etc., linked via
symlinks.
To make parsing harder, the ABI definition usually refers
to only one of those locations.
So, improve the logic in order to retrieve the symlinks.
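A small sketch of how symlinks can be told apart from regular entries and mapped to their real location, using lstat() and abs_path(). It builds a throwaway directory instead of walking /sys, so the layout below is purely illustrative:

```perl
use strict;
use warnings;
use Cwd 'abs_path';
use Fcntl ':mode';
use File::Temp 'tempdir';

# Throwaway tree: "class" is a symlink that points to "devices".
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/devices" or die $!;
open(my $fh, '>', "$dir/devices/attr") or die $!;
close($fh);
symlink("$dir/devices", "$dir/class") or die $!;

# lstat() looks at the link itself; abs_path() resolves where it points.
my %aliases;
foreach my $path ("$dir/devices", "$dir/class") {
    my $mode = (lstat($path))[2];
    $aliases{$path} = abs_path($path) if S_ISLNK($mode);
}

print "$_ -> $aliases{$_}\n" foreach sort keys %aliases;
```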
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 207 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 165 insertions(+), 42 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index e714bf75f5c2..a7cb4be6886c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -8,8 +8,10 @@ use Pod::Usage;
use Getopt::Long;
use File::Find;
use Fcntl ':mode';
+use Cwd 'abs_path';
my $help = 0;
+my $hint = 0;
my $man = 0;
my $debug = 0;
my $enable_lineno = 0;
@@ -28,6 +30,7 @@ GetOptions(
"rst-source!" => \$description_is_rst,
"dir=s" => \$prefix,
'help|?' => \$help,
+ "show-hints" => \$hint,
man => \$man
) or pod2usage(2);
@@ -527,7 +530,7 @@ sub search_symbols {
}
# Exclude /sys/kernel/debug and /sys/kernel/tracing from the search path
-sub skip_debugfs {
+sub dont_parse_special_attributes {
if (($File::Find::dir =~ m,^/sys/kernel,)) {
return grep {!/(debug|tracing)/ } @_;
}
@@ -540,64 +543,178 @@ sub skip_debugfs {
}
my %leaf;
+my %aliases;
+my @files;
-my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xff]) }x;
+my $escape_symbols = qr { ([\x01-\x08\x0e-\x1f\x21-\x29\x2b-\x2d\x3a-\x40\x7b-\xfe]) }x;
sub parse_existing_sysfs {
my $file = $File::Find::name;
+ my $mode = (lstat($file))[2];
+ my $abs_file = abs_path($file);
- my $mode = (stat($file))[2];
- return if ($mode & S_IFDIR);
-
- my $leave = $file;
- $leave =~ s,.*/,,;
-
- if (defined($leaf{$leave})) {
- # FIXME: need to check if the path makes sense
- my $what = $leaf{$leave};
-
- $what =~ s/,/ /g;
-
- $what =~ s/\<[^\>]+\>/.*/g;
- $what =~ s/\{[^\}]+\}/.*/g;
- $what =~ s/\[[^\]]+\]/.*/g;
- $what =~ s,/\.\.\./,/.*/,g;
- $what =~ s,/\*/,/.*/,g;
-
- $what =~ s/\s+/ /g;
-
- # Escape all other symbols
- $what =~ s/$escape_symbols/\\$1/g;
-
- foreach my $i (split / /,$what) {
- if ($file =~ m#^$i$#) {
-# print "$file: $i: OK!\n";
- return;
- }
- }
-
- print "$file: $leave is defined at $what\n";
-
+ if (S_ISLNK($mode)) {
+ $aliases{$file} = $abs_file;
return;
}
- print "$file not found.\n";
+ return if (S_ISDIR($mode));
+
+ # Trivial: file is defined exactly the same way at ABI What:
+ return if (defined($data{$file}));
+ return if (defined($data{$abs_file}));
+
+ push @files, $abs_file;
+}
+
+sub check_undefined_symbols {
+ foreach my $file (sort @files) {
+
+ # sysfs-module is special, as its definitions are inside
+ # a text. For now, just ignore them.
+ next if ($file =~ m#^/sys/module/#);
+
+ # Ignore cgroup and firmware
+ next if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+
+ my $defined = 0;
+ my $exact = 0;
+ my $whats = "";
+
+ my $leave = $file;
+ $leave =~ s,.*/,,;
+
+ my $path = $file;
+ $path =~ s,(.*/).*,$1,;
+
+ if (defined($leaf{$leave})) {
+ my $what = $leaf{$leave};
+ $whats .= " $what" if (!($whats =~ m/$what/));
+
+ foreach my $w (split / /, $what) {
+ if ($file =~ m#^$w$#) {
+ $exact = 1;
+ last;
+ }
+ }
+ # Check for aliases
+ #
+ # TODO: this algorithm is O(w * n²). It can be
+ # improved in the future in order to handle it
+ # faster, by changing parse_existing_sysfs to
+ # store the sysfs inside a tree, at the expense
+ # on making the code less readable and/or using some
+ # additional perl library.
+ foreach my $a (keys %aliases) {
+ my $new = $aliases{$a};
+ my $len = length($new);
+
+ if (substr($file, 0, $len) eq $new) {
+ my $newf = $a . substr($file, $len);
+
+ foreach my $w (split / /, $what) {
+ if ($newf =~ m#^$w$#) {
+ $exact = 1;
+ last;
+ }
+ }
+ }
+ }
+
+ $defined++;
+ }
+ next if ($exact);
+
+ # Ignore some sysfs nodes
+ next if ($file =~ m#/(sections|notes)/#);
+
+ # Would need to check at
+ # Documentation/admin-guide/kernel-parameters.txt, but this
+ # is not easily parseable.
+ next if ($file =~ m#/parameters/#);
+
+ if ($hint && $defined) {
+ print "$leave at $path might be one of:$whats\n";
+ next;
+ }
+ print "$file not found.\n";
+ }
}
sub undefined_symbols {
+ find({
+ wanted =>\&parse_existing_sysfs,
+ preprocess =>\&dont_parse_special_attributes,
+ no_chdir => 1
+ }, $sysfs_prefix);
+
foreach my $w (sort keys %data) {
foreach my $what (split /\xac /,$w) {
+ next if (!($what =~ m/^$sysfs_prefix/));
+
+ # Convert what into regular expressions
+
+ $what =~ s,/\.\.\./,/*/,g;
+ $what =~ s,\*,.*,g;
+
+ # Temporarily change [0-9]+ type of patterns
+ $what =~ s/\[0\-9\]\+/\xff/g;
+
+ # Temporarily change [\d+-\d+] type of patterns
+ $what =~ s/\[0\-\d+\]/\xff/g;
+ $what =~ s/\[(\d+)\]/\xf4$1\xf5/g;
+
+ # Temporarily change [0-9] type of patterns
+ $what =~ s/\[(\d)\-(\d)\]/\xf4$1-$2\xf5/g;
+
+ # Handle multiple option patterns
+ $what =~ s/[\{\<\[]([\w_]+)(?:[,|]+([\w_]+)){1,}[\}\>\]]/($1|$2)/g;
+
+ # Handle wildcards
+ $what =~ s/\<[^\>]+\>/.*/g;
+ $what =~ s/\{[^\}]+\}/.*/g;
+ $what =~ s/\[[^\]]+\]/.*/g;
+
+ $what =~ s/[XYZ]/.*/g;
+
+ # Recover [0-9] type of patterns
+ $what =~ s/\xf4/[/g;
+ $what =~ s/\xf5/]/g;
+
+ # Remove duplicated spaces
+ $what =~ s/\s+/ /g;
+
+ # Special case: this ABI has a parenthesis on it
+ $what =~ s/sqrt\(x^2\+y^2\+z^2\)/sqrt\(x^2\+y^2\+z^2\)/;
+
+ # Special case: drop comparition as in:
+ # What: foo = <something>
+ # (this happens on a few IIO definitions)
+ $what =~ s,\s*\=.*$,,;
+
my $leave = $what;
$leave =~ s,.*/,,;
- if (defined($leaf{$leave})) {
- $leaf{$leave} .= " " . $what;
- } else {
- $leaf{$leave} = $what;
+ next if ($leave =~ m/^\.\*/ || $leave eq "");
+
+ # Escape all other symbols
+ $what =~ s/$escape_symbols/\\$1/g;
+ $what =~ s/\\\\/\\/g;
+ $what =~ s/\\([\[\]\(\)\|])/$1/g;
+ $what =~ s/(\d+)\\(-\d+)/$1$2/g;
+
+ $leave =~ s/[\(\)]//g;
+
+ foreach my $l (split /\|/, $leave) {
+ if (defined($leaf{$l})) {
+ next if ($leaf{$l} =~ m/$what/);
+ $leaf{$l} .= " " . $what;
+ } else {
+ $leaf{$l} = $what;
+ }
}
}
}
-
- find({wanted =>\&parse_existing_sysfs, preprocess =>\&skip_debugfs, no_chdir => 1}, $sysfs_prefix);
+ check_undefined_symbols;
}
# Ensure that the prefix will always end with a slash
@@ -647,7 +764,8 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
=head1 SYNOPSIS
B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
- [--(no-)rst-source] [--dir=<dir>] <COMAND> [<ARGUMENT>]
+ [--(no-)rst-source] [--dir=<dir>] [--show-hints]
+ <COMAND> [<ARGUMENT>]
Where <COMMAND> can be:
@@ -689,6 +807,11 @@ Enable output of #define LINENO lines.
Put the script in verbose mode, useful for debugging. Can be called multiple
times, to increase verbosity.
+=item B<--show-hints>
+
+Show hints about possible definitions for the missing ABI symbols.
+Used only when B<undefined>.
+
=item B<--help>
Prints a brief help message and exits.
--
2.31.1
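The symlink handling in the patch above can be illustrated with a small standalone sketch. It is not the code from get_abi.pl: the helper names scan_tree() and names_for() are hypothetical, and the sketch only assumes what the patch itself uses (lstat(), S_ISLNK()/S_ISDIR() from Fcntl, and abs_path() from Cwd). It records each symlink together with its resolved target, and then lists every name under which a file can be reached:

```perl
use strict;
use warnings;
use Cwd 'abs_path';
use Fcntl ':mode';
use File::Find;

# Walk $root, recording "symlink => resolved target" pairs and the
# absolute names of all non-directory, non-symlink entries.
sub scan_tree {
    my ($root) = @_;
    my (%aliases, @files);

    find({
        no_chdir => 1,
        wanted   => sub {
            my $file = $File::Find::name;
            my $mode = (lstat($file))[2];    # lstat: do not follow links
            return unless defined $mode;

            if (S_ISLNK($mode)) {
                my $target = abs_path($file);
                $aliases{$file} = $target if defined $target;
                return;
            }
            return if S_ISDIR($mode);
            push @files, abs_path($file);
        },
    }, $root);

    return (\%aliases, \@files);
}

# Return $file plus every alternative name reachable through a symlink
# whose target is a parent directory of $file.
sub names_for {
    my ($file, $aliases) = @_;
    my @names = ($file);

    foreach my $link (sort keys %$aliases) {
        my $target = $aliases->{$link};
        my $len    = length($target);
        next unless substr($file, 0, $len + 1) eq "$target/";
        push @names, $link . substr($file, $len);
    }
    return @names;
}
```

With such a list of names, a What: expression written for /sys/class/... can still match a file that was found under /sys/devices/..., which is the case the commit message describes.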
On Thu, Sep 23, 2021 at 03:29:58PM +0200, Mauro Carvalho Chehab wrote:
> Hi Greg,
>
> It follows a series of improvements for get_abi.pl. it is on the top of next-20210923.
Hm, looks like I hadn't pushed my -testing tree out so that it will show
up in linux-next yet, so we got a bunch of conflicts here.
I've done so now, can you rebase against my tree and resend? I think
only 4 patches are new here.
thanks,
greg k-h
Hi Greg,
As requested, these are exactly the same changes, rebased on top of
driver-core/driver-core-next.
-
This is a series of improvements for get_abi.pl, on top of driver-core/driver-core-next.
With these changes, the script takes 6 seconds to run against my development
tree on my desktop:
$ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
real 0m6,292s
user 0m5,640s
sys 0m0,634s
6838 undefined_after
808 undefined_symbols
7646 total
And 7 seconds on a Dell Precision 5820:
$ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
real 0m7.162s
user 0m5.836s
sys 0m1.329s
6548 undefined
772 undefined_symbols
Both tests were done against this tree (based on today's linux-next):
$ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
Note that, as my tree has several ABI fixes, the script will likely run
faster there than on your tree, as there will be fewer symbols to
report, and the algorithm is optimized to reduce the number of regexes
checked when a symbol is found.
Besides optimizing and improving the seek logic, this series also changes the
debug logic. It now receives a bitmap, where "8" means to print the regexes
that will be used by the "undefined" command:
$ time ./scripts/get_abi.pl undefined --debug 8 >foo
real 0m17,189s
user 0m13,940s
sys 0m2,404s
$wc -l foo
18421939 foo
$ cat foo
...
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
/sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
...
In other words, on my desktop, the /sys match performs >18M regular
expression searches, which takes 6.2 seconds (or 17.2 seconds if debug is
enabled and the output is written to my NVMe storage).
Regards,
Mauro
Mauro Carvalho Chehab (8):
scripts: get_abi.pl: Fix get_abi.pl search output
scripts: get_abi.pl: call get_leave() a little late
scripts: get_abi.pl: improve debug logic
scripts: get_abi.pl: Better handle leaves with wildcards
scripts: get_abi.pl: ignore some sysfs nodes earlier
scripts: get_abi.pl: stop check loop earlier when regex is found
scripts: get_abi.pl: precompile what match regexes
scripts: get_abi.pl: ensure that "others" regex will be parsed
scripts/get_abi.pl | 109 +++++++++++++++++++++++++++++++--------------
1 file changed, 76 insertions(+), 33 deletions(-)
--
2.31.1
Add a level for debug, in order to allow it to be extended to
debug other parts of the script.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 9eb8a033d363..bb80303fea22 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -9,6 +9,7 @@ use Getopt::Long;
use File::Find;
use Fcntl ':mode';
use Cwd 'abs_path';
+use Data::Dumper;
my $help = 0;
my $hint = 0;
@@ -20,13 +21,18 @@ my $prefix="Documentation/ABI";
my $sysfs_prefix="/sys";
my $search_string;
+# Debug options
+my $dbg_what_parsing = 1;
+my $dbg_what_open = 2;
+my $dbg_dump_abi_structs = 4;
+
#
# If true, assumes that the description is formatted with ReST
#
my $description_is_rst = 1;
GetOptions(
- "debug|d+" => \$debug,
+ "debug=i" => \$debug,
"enable-lineno" => \$enable_lineno,
"rst-source!" => \$description_is_rst,
"dir=s" => \$prefix,
@@ -46,7 +52,7 @@ my ($cmd, $arg) = @ARGV;
pod2usage(2) if ($cmd ne "search" && $cmd ne "rest" && $cmd ne "validate" && $cmd ne "undefined");
pod2usage(2) if ($cmd eq "search" && !$arg);
-require Data::Dumper if ($debug);
+require Data::Dumper if ($debug & $dbg_dump_abi_structs);
my %data;
my %symbols;
@@ -106,7 +112,7 @@ sub parse_abi {
my @labels;
my $label = "";
- print STDERR "Opening $file\n" if ($debug > 1);
+ print STDERR "Opening $file\n" if ($debug & $dbg_what_open);
open IN, $file;
while(<IN>) {
$ln++;
@@ -178,7 +184,7 @@ sub parse_abi {
$data{$what}->{filepath} .= " " . $file;
}
}
- print STDERR "\twhat: $what\n" if ($debug > 1);
+ print STDERR "\twhat: $what\n" if ($debug & $dbg_what_parsing);
$data{$what}->{line_no} = $ln;
} else {
$data{$what}->{line_no} = $ln if (!defined($data{$what}->{line_no}));
@@ -827,7 +833,7 @@ if ($cmd eq "undefined" || $cmd eq "search") {
#
find({wanted =>\&parse_abi, no_chdir => 1}, $prefix);
-print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug);
+print STDERR Data::Dumper->Dump([\%data], [qw(*data)]) if ($debug & $dbg_dump_abi_structs);
#
# Handles the command
@@ -860,7 +866,7 @@ abi_book.pl - parse the Linux ABI files and produce a ReST book.
=head1 SYNOPSIS
-B<abi_book.pl> [--debug] [--enable-lineno] [--man] [--help]
+B<abi_book.pl> [--debug <level>] [--enable-lineno] [--man] [--help]
[--(no-)rst-source] [--dir=<dir>] [--show-hints]
[--search-string <regex>]
<COMAND> [<ARGUMENT>]
@@ -900,10 +906,14 @@ logic (--no-rst-source).
Enable output of #define LINENO lines.
-=item B<--debug>
+=item B<--debug> I<debug level>
-Put the script in verbose mode, useful for debugging. Can be called multiple
-times, to increase verbosity.
+Print debug information according to the level, which is given by the
+following bitmask:
+
+ - 1: Debug parsing What entries from ABI files;
+ - 2: Shows what files are opened from ABI files;
+ - 4: Dump the structs used to store the contents of the ABI files.
=item B<--show-hints>
--
2.31.1
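As a small illustration of the bitmask scheme introduced by the patch above, here is a standalone sketch. The bit values (1, 2, 4) and the "debug=i" option spec come from the patch; the enabled_debug() helper and the %dbg hash are hypothetical and exist only for this example:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Bit values taken from the patch; each one enables one kind of output.
my %dbg = (
    what_parsing     => 1,
    what_open        => 2,
    dump_abi_structs => 4,
);

# Parse "--debug <level>" from an argument list and return the names of
# the debug classes that are enabled by that level.
sub enabled_debug {
    my @args  = @_;
    my $debug = 0;

    GetOptionsFromArray(\@args, "debug=i" => \$debug)
        or die "invalid options\n";

    # Bitwise "&" tests a single bit; a logical "&&" would be true for
    # any non-zero level.
    return grep { $debug & $dbg{$_} } sort keys %dbg;
}

print join(",", enabled_debug("--debug", 5)), "\n";
```

Level 5 is 1 + 4, so it enables the What: parsing messages and the struct dump, but not the "Opening <file>" messages.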
When the leaf of a regex ends with a wildcard, the speedup
algorithm to reduce the number of regexes to seek won't work.
So, when those are found, place them in the "others" exception group.
That slows down the search from 0.14s to 1 minute on my
machine, but the results are a lot more consistent.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index bb80303fea22..3c0063d0e05e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -665,7 +665,7 @@ sub get_leave($)
# However, there are a few occurences where the leave is
# either a wildcard or a number. Just group such cases
# altogether.
- if ($leave =~ m/^\.\*/ || $leave eq "" || $leave =~ /^\d+$/) {
+ if ($leave =~ m/\.\*/ || $leave eq "" || $leave =~ /\\d/) {
$leave = "others";
}
--
2.31.1
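To make the effect of the new condition easier to see, here is a minimal sketch. The real get_leave() is not shown in full in this thread, so get_leave_sketch() below is a hypothetical stand-in that only takes the last path component and applies the condition from the patch above:

```perl
use strict;
use warnings;

# Hypothetical simplification of get_leave(): take the last path
# component of an (already converted) What: regex and decide whether it
# can be used as an index key.
sub get_leave_sketch {
    my ($what) = @_;

    my $leave = $what;
    $leave =~ s,.*/,,;    # keep only the text after the last "/"

    # Condition from the patch: a wildcard or a \d anywhere in the leaf
    # (or an empty leaf) makes it unusable as a key.
    if ($leave =~ m/\.\*/ || $leave eq "" || $leave =~ /\\d/) {
        $leave = "others";
    }
    return $leave;
}

foreach my $what ('/sys/bus/pci/drivers/.*/bind',
                  '/sys/.*/iio\:device.*/in_voltage.*_scale_available',
                  '/sys/bus/pci/drivers/') {
    printf "%-55s -> %s\n", $what, get_leave_sketch($what);
}
```

Only the first expression keeps its own key ("bind"); the other two end up in "others", which is why more regexes have to be tried for each file after this change.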
Currently, get_abi.pl prints an invalid symbol (the \xac character)
in its search output. Fix it.
Fixes: ab9c14805b37 ("scripts: get_abi.pl: Better handle multiple What parameters")
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 1 +
1 file changed, 1 insertion(+)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index c52a1cf0f49d..65261f464e25 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -501,6 +501,7 @@ sub search_symbols {
my $file = $data{$what}->{filepath};
+ $what =~ s/\xac/, /g;
my $bar = $what;
$bar =~ s/./-/g;
--
2.31.1
The $what conversions need to replace some characters to avoid
breaking regular expressions found in some What: entries.
Only after replacing them back should the script get the
$leave devnode.
Fixes: ca8e055c2215 ("scripts: get_abi.pl: add a graph to speedup the undefined algorithm")
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 65261f464e25..9eb8a033d363 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -777,8 +777,6 @@ sub undefined_symbols {
# (this happens on a few IIO definitions)
$what =~ s,\s*\=.*$,,;
- my $leave = get_leave($what);
-
# Escape all other symbols
$what =~ s/$escape_symbols/\\$1/g;
$what =~ s/\\\\/\\/g;
@@ -790,6 +788,7 @@ sub undefined_symbols {
# Special case: IIO ABI which a parenthesis.
$what =~ s/sqrt(.*)/sqrt\(.*\)/;
+ my $leave = get_leave($what);
my $added = 0;
foreach my $l (split /\|/, $leave) {
if (defined($leaf{$l})) {
--
2.31.1
When checking for undefined symbols, some nodes aren't easy to
check, or it doesn't make sense to check them right now. Prevent
allocating memory for those, as they'll be ignored anyway.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 3c0063d0e05e..42eb16eb78e9 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -628,6 +628,14 @@ sub parse_existing_sysfs {
# Ignore cgroup and firmware
return if ($file =~ m#^/sys/(fs/cgroup|firmware)/#);
+ # Ignore some sysfs nodes
+ return if ($file =~ m#/(sections|notes)/#);
+
+ # Would need to check at
+ # Documentation/admin-guide/kernel-parameters.txt, but this
+ # is not easily parseable.
+ return if ($file =~ m#/parameters/#);
+
my $mode = (lstat($file))[2];
my $abs_file = abs_path($file);
@@ -709,14 +717,6 @@ sub check_undefined_symbols {
next if ($exact);
- # Ignore some sysfs nodes
- next if ($file =~ m#/(sections|notes)/#);
-
- # Would need to check at
- # Documentation/admin-guide/kernel-parameters.txt, but this
- # is not easily parseable.
- next if ($file =~ m#/parameters/#);
-
if ($hint && $defined && (!$search_string || $found_string)) {
$what =~ s/\xac/\n\t/g;
if ($leave ne "others") {
--
2.31.1
Right now, there are two loops used to seek for a regex. Make
sure that both will be skipped when a match is found.
While here, drop the unused $defined variable.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index 42eb16eb78e9..d45e5ba56f9c 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -685,7 +685,6 @@ sub check_undefined_symbols {
my @names = @{$$file_ref{"__name"}};
my $file = $names[0];
- my $defined = 0;
my $exact = 0;
my $found_string;
@@ -711,13 +710,11 @@ sub check_undefined_symbols {
last;
}
}
+ last if ($exact);
}
-
- $defined++;
-
next if ($exact);
- if ($hint && $defined && (!$search_string || $found_string)) {
+ if ($hint && (!$search_string || $found_string)) {
$what =~ s/\xac/\n\t/g;
if ($leave ne "others") {
print " more likely regexes:\n\t$what\n";
--
2.31.1
The search algorithm works by reducing the number of regular
expressions that will be checked for a given file entry at sysfs. It
does that by looking at the devnode name. For instance, when it checks
this file:
/sys/bus/pci/drivers/iosf_mbi_pci/bind
the logic will seek only the "What:" expressions that end with "bind".
Currently, there are just a couple of What expressions that match
it:
What: /sys/bus/fsl\-mc/drivers/.*/bind
What: /sys/bus/pci/drivers/.*/bind
It will then run an O(n²) algorithm to seek, which runs quickly
when there are few regexes to seek. There are, however, some What:
expressions that end with a wildcard. Those are harder to process.
Right now, they're all grouped together in the "others" group.
As those don't depend on the basename of the node, add an extra
loop to ensure that those will be processed at the end, if
not done yet.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index f2b5efef9c30..f25c98b1971e 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -723,6 +723,22 @@ sub check_undefined_symbols {
}
next if ($exact);
+ if ($leave ne "others") {
+ my @expr = @{$leaf{$leave}->{expr}};
+ for (my $i = 0; $i < @names; $i++) {
+ foreach my $re (@expr) {
+ print "$names[$i] =~ /^$re\$/\n" if ($debug & $dbg_undefined);
+ if ($names[$i] =~ $re) {
+ $exact = 1;
+ last;
+ }
+ }
+ last if ($exact);
+ }
+ last if ($exact);
+ }
+ next if ($exact);
+
if ($hint && (!$search_string || $found_string)) {
my $what = $leaf{$leave}->{what};
$what =~ s/\xac/\n\t/g;
--
2.31.1
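The lookup described in the commit message above can be sketched as follows. This is a simplified illustration, not the script's code: the two "bind" expressions come from the commit message, while the "others" entry, the %leaf_sketch layout and is_documented() are hypothetical.

```perl
use strict;
use warnings;

# Regexes indexed by the basename they end with, plus an "others" group
# for expressions whose last component is a wildcard.
my %leaf_sketch = (
    bind   => [ qr{^/sys/bus/fsl\-mc/drivers/.*/bind$},
                qr{^/sys/bus/pci/drivers/.*/bind$} ],
    others => [ qr{^/sys/kernel/slab/.*/.*$} ],    # made-up example
);

# Try the regexes indexed by the file's basename first, then fall back
# to the "others" group, which does not depend on the basename.
sub is_documented {
    my ($file) = @_;

    (my $leave = $file) =~ s,.*/,,;

    my @groups = exists $leaf_sketch{$leave} ? ($leave, "others") : ("others");
    foreach my $group (@groups) {
        foreach my $re (@{ $leaf_sketch{$group} }) {
            return 1 if $file =~ $re;
        }
    }
    return 0;
}

print is_documented("/sys/bus/pci/drivers/iosf_mbi_pci/bind") ? "found\n" : "not found\n";
```

The point of the extra pass is that a file whose basename has its own group may still only be described by a wildcard expression stored under "others".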
In order to save some time during matches, pre-compile regexes.
Before this patch:
$ time ./scripts/get_abi.pl undefined |wc -l
6970
real 0m54,751s
user 0m54,022s
sys 0m0,592s
Afterwards:
$ time ./scripts/get_abi.pl undefined |wc -l
6970
real 0m5,888s
user 0m5,310s
sys 0m0,562s
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
scripts/get_abi.pl | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/scripts/get_abi.pl b/scripts/get_abi.pl
index d45e5ba56f9c..f2b5efef9c30 100755
--- a/scripts/get_abi.pl
+++ b/scripts/get_abi.pl
@@ -25,6 +25,7 @@ my $search_string;
my $dbg_what_parsing = 1;
my $dbg_what_open = 2;
my $dbg_dump_abi_structs = 4;
+my $dbg_undefined = 8;
#
# If true, assumes that the description is formatted with ReST
@@ -692,7 +693,8 @@ sub check_undefined_symbols {
if (!defined($leaf{$leave})) {
$leave = "others";
}
- my $what = $leaf{$leave};
+ my @expr = @{$leaf{$leave}->{expr}};
+ die ("missing rules for $leave") if (!defined($leaf{$leave}));
my $path = $file;
$path =~ s,(.*/).*,$1,;
@@ -702,10 +704,17 @@ sub check_undefined_symbols {
$found_string = 1;
}
- foreach my $a (@names) {
- print "--> $a\n" if ($found_string && $hint);
- foreach my $w (split /\xac/, $what) {
- if ($a =~ m#^$w$#) {
+ for (my $i = 0; $i < @names; $i++) {
+ if ($found_string && $hint) {
+ if (!$i) {
+ print "--> $names[$i]\n";
+ } else {
+ print " $names[$i]\n";
+ }
+ }
+ foreach my $re (@expr) {
+ print "$names[$i] =~ /^$re\$/\n" if ($debug & $dbg_undefined);
+ if ($names[$i] =~ $re) {
$exact = 1;
last;
}
@@ -715,6 +724,7 @@ sub check_undefined_symbols {
next if ($exact);
if ($hint && (!$search_string || $found_string)) {
+ my $what = $leaf{$leave}->{what};
$what =~ s/\xac/\n\t/g;
if ($leave ne "others") {
print " more likely regexes:\n\t$what\n";
@@ -734,7 +744,7 @@ sub undefined_symbols {
no_chdir => 1
}, $sysfs_prefix);
- $leaf{"others"} = "";
+ $leaf{"others"}->{what} = "";
foreach my $w (sort keys %data) {
foreach my $what (split /\xac/,$w) {
@@ -792,14 +802,15 @@ sub undefined_symbols {
$what =~ s/sqrt(.*)/sqrt\(.*\)/;
my $leave = get_leave($what);
+
my $added = 0;
foreach my $l (split /\|/, $leave) {
if (defined($leaf{$l})) {
- next if ($leaf{$l} =~ m/\b$what\b/);
- $leaf{$l} .= "\xac" . $what;
+ next if ($leaf{$l}->{what} =~ m/\b$what\b/);
+ $leaf{$l}->{what} .= "\xac" . $what;
$added = 1;
} else {
- $leaf{$l} = $what;
+ $leaf{$l}->{what} = $what;
$added = 1;
}
}
@@ -809,6 +820,15 @@ sub undefined_symbols {
}
}
+ # Compile regexes
+ foreach my $l (keys %leaf) {
+ my @expr;
+ foreach my $w(split /\xac/, $leaf{$l}->{what}) {
+ push @expr, qr /^$w$/;
+ }
+ $leaf{$l}->{expr} = \@expr;
+ }
+
# Take links into account
foreach my $link (keys %aliases) {
my $abs_file = $aliases{$link};
--
2.31.1
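For readers not familiar with qr//, here is a minimal sketch of the idea behind the patch above. It mirrors the {what, expr} layout and the \xac separator used by the patch, but compile_leaf() and matches_any() are hypothetical helper names used only in this example:

```perl
use strict;
use warnings;

# Split a "\xac"-separated list of regex sources and compile each one
# once, keeping both the original string and the compiled objects.
sub compile_leaf {
    my ($what) = @_;

    my @expr;
    foreach my $w (split /\xac/, $what) {
        push @expr, qr/^$w$/;
    }
    return { what => $what, expr => \@expr };
}

# Match a sysfs name against the compiled regexes; nothing is
# recompiled inside this loop.
sub matches_any {
    my ($name, $entry) = @_;

    foreach my $re (@{ $entry->{expr} }) {
        return 1 if $name =~ $re;
    }
    return 0;
}

my $bind = compile_leaf(join("\xac",
    '/sys/bus/fsl\-mc/drivers/.*/bind',
    '/sys/bus/pci/drivers/.*/bind'));

print matches_any('/sys/bus/pci/drivers/iosf_mbi_pci/bind', $bind)
    ? "documented\n" : "not found\n";
```

The saving comes from doing the compilation once per What: expression instead of once per (file, expression) pair, which matters when millions of matches are performed.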
On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> On other words, on my desktop, the /sys match is performing >18M regular
> expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is
> enabled and sent to an area on my nvme storage).
Better, it's down to 10 minutes on my machine now:
real 10m39.218s
user 10m37.742s
sys 0m0.775s
thanks!
greg k-h
On Thu, 23 Sep 2021 19:13:04 +0200
Greg Kroah-Hartman <[email protected]> wrote:
> Better, it's down to 10 minutes on my machine now:
>
> real 10m39.218s
> user 10m37.742s
> sys 0m0.775s
A lot better, but it's not clear why it still takes ~40x longer than here...
It could well be due to the other ABI changes yet to be applied
(I'll probably submit them later today), but it could also be related to
something else. Could this be due to disk writes?
Thanks,
Mauro
On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> A lot better, but not clear why it is still taking ~40x more than here...
> It could well be due to the other ABI changes yet to be applied
> (I'll submit it probably later today), but it could also be related to
> something else. Could this be due to disk writes?
Disk writes to where for what? This is a very fast disk (nvme raid
array). It's also a very "big" system, with lots of sysfs files:
$ find /sys/devices/ -type f | wc -l
44334
compared to my laptop that only has 17k entries in /sys/devices/
I'll run this updated script on my laptop later today and give you some
numbers. And any Documentation/ABI/ updates you might have I'll gladly
take as well.
thanks,
greg k-h
On Mon, 27 Sep 2021 11:23:20 +0200
Greg Kroah-Hartman <[email protected]> wrote:
>
> Disk writes to where for what? This is a very fast disk (nvme raid
> array) It's also a very "big" system, with lots of sysfs files:
>
> $ find /sys/devices/ -type f | wc -l
> 44334
Ok. Maybe that partially explains why it is taking so long, as the
number of regexes to compare grows non-linearly with the number of
sysfs files.
> compared to my laptop that only has 17k entries in /sys/devices/
>
> I'll run this updated script on my laptop later today and give you some
> numbers.
Ok, thanks!
> And any Documentation/ABI/ updates you might have I'll gladly
> take as well.
I'll be submitting it soon enough. Got sidetracked by a problem
with my INBOX caused by a fetchmail regression[1].
> thanks,
>
> greg k-h
[1] https://gitlab.com/fetchmail/fetchmail/-/issues/39
Thanks,
Mauro
On Mon, Sep 27, 2021 at 03:39:42PM +0200, Mauro Carvalho Chehab wrote:
> Em Mon, 27 Sep 2021 11:23:20 +0200
> Greg Kroah-Hartman <[email protected]> escreveu:
>
> > On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> > > Em Thu, 23 Sep 2021 19:13:04 +0200
> > > Greg Kroah-Hartman <[email protected]> escreveu:
> > >
> > > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> > > > > Hi Greg,
> > > > >
> > > > > As requested, this is exactly the same changes, rebased on the top of
> > > > > driver-core/driver-core-next.
> > > > >
> > > > > -
> > > > >
> > > > > It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> > > > >
> > > > > With such changes, on my development tree, the script is taking 6 seconds to run
> > > > > on my desktop:
> > > > >
> > > > > $ !1076
> > > > > $ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> > > > >
> > > > > real 0m6,292s
> > > > > user 0m5,640s
> > > > > sys 0m0,634s
> > > > > 6838 undefined_after
> > > > > 808 undefined_symbols
> > > > > 7646 total
> > > > >
> > > > > And 7 seconds on a Dell Precision 5820:
> > > > >
> > > > > $ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > > >
> > > > > real 0m7.162s
> > > > > user 0m5.836s
> > > > > sys 0m1.329s
> > > > > 6548 undefined
> > > > > 772 undefined_symbols
> > > > >
> > > > > Both tests were done against this tree (based on today's linux-next):
> > > > >
> > > > > $ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> > > > >
> > > > > It should be noticed that, as my tree has several ABI fixes, the time to run the
> > > > > script is likely less than if you run on your tree, as there will be less symbols to
> > > > > be reported, and the algorithm is optimized to reduce the number of regexes
> > > > > when a symbol is found.
> > > > >
> > > > > Besides optimizing and improving the seek logic, this series also change the
> > > > > debug logic. It how receives a bitmap, where "8" means to print the regexes
> > > > > that will be used by "undefined" command:
> > > > >
> > > > > $ time ./scripts/get_abi.pl undefined --debug 8 >foo
> > > > > real 0m17,189s
> > > > > user 0m13,940s
> > > > > sys 0m2,404s
> > > > >
> > > > > $wc -l foo
> > > > > 18421939 foo
> > > > >
> > > > > $ cat foo
> > > > > ...
> > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> > > > > ...
> > > > >
> > > > > On other words, on my desktop, the /sys match is performing >18M regular
> > > > > expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is
> > > > > enabled and sent to an area on my nvme storage).
> > > >
> > > > Better, it's down to 10 minutes on my machine now:
> > > >
> > > > real 10m39.218s
> > > > user 10m37.742s
> > > > sys 0m0.775s
> > >
> > > A lot better, but not clear why it is still taking ~40x more than here...
> > > It could well be due to the other ABI changes yet to be applied
> > > (I'll submit it probably later today), but it could also be related to
> > > something else. Could this be due to disk writes?
> >
> > Disk writes to where for what? This is a very fast disk (nvme raid
> > array) It's also a very "big" system, with lots of sysfs files:
> >
> > $ find /sys/devices/ -type f | wc -l
> > 44334
>
> Ok. Maybe that partially explains why it is taking so long, as the
> number of regex to compare will increase (not linearly).
No idea. I just ran it on my laptop and it took only 5 seconds.
Hm, you aren't reading the values of the sysfs files, right?
Anything I can run to help figure out where the script is taking
so long?
> > And any Documentation/ABI/ updates you might have I'll gladly
> > take as well.
>
> I'll be submitting it soon enough. Got sidetracked by a regression
> on my INBOX due to a fetchmail regression[1].
Ick, fetchmail. I recommend getmail instead, much more robust and a
sane maintainer :)
I'll take a look at those patches now.
thanks,
greg k-h
Em Mon, 27 Sep 2021 17:48:05 +0200
Greg Kroah-Hartman <[email protected]> escreveu:
> On Mon, Sep 27, 2021 at 03:39:42PM +0200, Mauro Carvalho Chehab wrote:
> > Em Mon, 27 Sep 2021 11:23:20 +0200
> > Greg Kroah-Hartman <[email protected]> escreveu:
> >
> > > On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> > > > Em Thu, 23 Sep 2021 19:13:04 +0200
> > > > Greg Kroah-Hartman <[email protected]> escreveu:
> > > >
> > > > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > As requested, this is exactly the same changes, rebased on the top of
> > > > > > driver-core/driver-core-next.
> > > > > >
> > > > > > -
> > > > > >
> > > > > > It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> > > > > >
> > > > > > With such changes, on my development tree, the script is taking 6 seconds to run
> > > > > > on my desktop:
> > > > > >
> > > > > > $ !1076
> > > > > > $ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> > > > > >
> > > > > > real 0m6,292s
> > > > > > user 0m5,640s
> > > > > > sys 0m0,634s
> > > > > > 6838 undefined_after
> > > > > > 808 undefined_symbols
> > > > > > 7646 total
> > > > > >
> > > > > > And 7 seconds on a Dell Precision 5820:
> > > > > >
> > > > > > $ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > > > >
> > > > > > real 0m7.162s
> > > > > > user 0m5.836s
> > > > > > sys 0m1.329s
> > > > > > 6548 undefined
> > > > > > 772 undefined_symbols
> > > > > >
> > > > > > Both tests were done against this tree (based on today's linux-next):
> > > > > >
> > > > > > $ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> > > > > >
> > > > > > It should be noticed that, as my tree has several ABI fixes, the time to run the
> > > > > > script is likely less than if you run on your tree, as there will be less symbols to
> > > > > > be reported, and the algorithm is optimized to reduce the number of regexes
> > > > > > when a symbol is found.
> > > > > >
> > > > > > Besides optimizing and improving the seek logic, this series also change the
> > > > > > debug logic. It how receives a bitmap, where "8" means to print the regexes
> > > > > > that will be used by "undefined" command:
> > > > > >
> > > > > > $ time ./scripts/get_abi.pl undefined --debug 8 >foo
> > > > > > real 0m17,189s
> > > > > > user 0m13,940s
> > > > > > sys 0m2,404s
> > > > > >
> > > > > > $wc -l foo
> > > > > > 18421939 foo
> > > > > >
> > > > > > $ cat foo
> > > > > > ...
> > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> > > > > > ...
> > > > > >
> > > > > > On other words, on my desktop, the /sys match is performing >18M regular
> > > > > > expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is
> > > > > > enabled and sent to an area on my nvme storage).
> > > > >
> > > > > Better, it's down to 10 minutes on my machine now:
> > > > >
> > > > > real 10m39.218s
> > > > > user 10m37.742s
> > > > > sys 0m0.775s
> > > >
> > > > A lot better, but not clear why it is still taking ~40x more than here...
> > > > It could well be due to the other ABI changes yet to be applied
> > > > (I'll submit it probably later today), but it could also be related to
> > > > something else. Could this be due to disk writes?
> > >
> > > Disk writes to where for what? This is a very fast disk (nvme raid
> > > array) It's also a very "big" system, with lots of sysfs files:
> > >
> > > $ find /sys/devices/ -type f | wc -l
> > > 44334
> >
> > Ok. Maybe that partially explains why it is taking so long, as the
> > number of regex to compare will increase (not linearly).
>
> No idea. I just ran it on my laptop and it took only 5 seconds.
Ok, 5 seconds is similar to what I got here on the machines I
tested so far. I'm waiting for a (shared) big machine to be available
in order to be able to do some tests on it.
> Hm, you aren't reading the values of the sysfs files, right?
No. Just retrieving the directory contents. That part is actually
fast: it takes less than 2 seconds here to read all ABI + traverse
sysfs directories. Also, from your past logs, the time is spent
later on, when it is handling the regexes. At that point, the script
is only matching regexes and printing the results.
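To illustrate what "just retrieving the directory contents" means, here is a minimal shell sketch. It is not code from get_abi.pl (which is written in Perl), and the count_nodes helper name is made up for this example. It only walks the tree and counts regular files, so no attribute value is ever opened or read:

```shell
# List sysfs-style nodes without opening any of them.
# find(1) only reads directories and file metadata here, so the
# contents of the attribute files are never read.
# count_nodes is a hypothetical helper, not part of get_abi.pl.
count_nodes() {
    # $1 = root directory to walk
    find "$1" -type f | wc -l | tr -d ' '
}

# Example (on a real system): count_nodes /sys/devices
```

This is the same kind of walk as the find command you ran on /sys/devices.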
> Anything I can do to run to help figure out where the script is taking
> so long?
Not sure if it is worth the effort. I mean, the relationship
between the number of processed sysfs nodes and the number of regexes
to be tested (using big-oh and big-omega notation) should be between
Ω(n . log(n)) and O(n^2 . log(n)). There's not much room left for
optimizing it, I guess.
So, I would expect that a big server would take a lot more time to
process it, due to the larger number of sysfs entries.
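As a rough sanity check of those bounds, the sketch below estimates how much the regex work would grow between the two node counts mentioned in this thread (about 17k on the laptop, 44334 on the big system). Using the /sys/devices file count as "n" and ignoring constant factors are both assumptions, and scale_estimate is a made-up helper name:

```shell
# Rough growth estimate under the two bounds above.
# Assumption: "n" is the number of files under /sys/devices.
scale_estimate() {
    # $1 = smaller node count, $2 = larger node count
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN {
        r = b / a              # ratio of node counts
        l = log(b) / log(a)    # ratio of the log factors
        printf "n*log(n): %.1fx  n^2*log(n): %.1fx\n", r * l, r * r * l
    }'
}

scale_estimate 17000 44334
```

With those inputs it prints about 2.9x for the lower bound and 7.5x for the upper bound. For reference, the runs reported in this thread were about 5 seconds on the laptop and 10m39s on the big system.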
Also, if one wants to speed it up on a big machine, one could either
exclude some pattern, like:
# Won't parse any PCI devices
$ time ./scripts/get_abi.pl undefined --search-string '^(?!.*pci)' |wc -l
8438
real 0m3,494s
user 0m2,829s
sys 0m0,658s
or (more likely) just search for a specific part of the ABI:
# Seek ABI only for PCI devices
$ ./scripts/get_abi.pl undefined --search-string pci
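To make the two patterns above concrete: '^(?!.*pci)' is a negative lookahead, so it matches only strings that do not contain "pci" anywhere, while a plain "pci" matches only strings that do. The sketch below shows the same selection on a list of paths with grep, assuming the search string is applied to one path at a time. The exclude_pci and only_pci helpers and the sample paths are made up for illustration:

```shell
# Keep only paths that do NOT contain "pci". For one path per line,
# this selects the same lines as the lookahead ^(?!.*pci).
exclude_pci() { grep -v pci; }

# Keep only paths that contain "pci", like a plain "pci" search string.
only_pci() { grep pci; }

# Sample paths, for illustration only.
printf '%s\n' \
    /sys/devices/pci0000:00/0000:00:1f.2/power/control \
    /sys/kernel/kexec_crash_loaded \
    | exclude_pci
```

Only /sys/kernel/kexec_crash_loaded survives the first filter. Piping the same list through only_pci keeps the other path instead.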
---
After sleeping on it, I opted to implement some progress information.
That will help to identify any issues that might be causing the
script to take so long to finish.
I'll send the patches on a new series.
>
> > > And any Documentation/ABI/ updates you might have I'll gladly
> > > take as well.
> >
> > I'll be submitting it soon enough. Got sidetracked by a regression
> > on my INBOX due to a fetchmail regression[1].
>
> Ick, fetchmail. I recommend getmail instead, much more robust and a
> sane maintainer :)
Hmm... interesting. Never tried getmail. I guess I'll give it a
try. It is a shame that Fedora doesn't package it yet.
>
> I'll take a look at those patches now.
>
> thanks,
>
> greg k-h
Thanks,
Mauro
On Tue, Sep 28, 2021 at 12:03:04PM +0200, Mauro Carvalho Chehab wrote:
> Em Mon, 27 Sep 2021 17:48:05 +0200
> Greg Kroah-Hartman <[email protected]> escreveu:
>
> > On Mon, Sep 27, 2021 at 03:39:42PM +0200, Mauro Carvalho Chehab wrote:
> > > Em Mon, 27 Sep 2021 11:23:20 +0200
> > > Greg Kroah-Hartman <[email protected]> escreveu:
> > >
> > > > On Mon, Sep 27, 2021 at 10:55:53AM +0200, Mauro Carvalho Chehab wrote:
> > > > > Em Thu, 23 Sep 2021 19:13:04 +0200
> > > > > Greg Kroah-Hartman <[email protected]> escreveu:
> > > > >
> > > > > > On Thu, Sep 23, 2021 at 05:41:11PM +0200, Mauro Carvalho Chehab wrote:
> > > > > > > Hi Greg,
> > > > > > >
> > > > > > > As requested, this is exactly the same changes, rebased on the top of
> > > > > > > driver-core/driver-core-next.
> > > > > > >
> > > > > > > -
> > > > > > >
> > > > > > > It follows a series of improvements for get_abi.pl. it is on the top of driver-core/driver-core-next.
> > > > > > >
> > > > > > > With such changes, on my development tree, the script is taking 6 seconds to run
> > > > > > > on my desktop:
> > > > > > >
> > > > > > > $ !1076
> > > > > > > $ time ./scripts/get_abi.pl undefined |sort >undefined_after && cat undefined_after| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined_after undefined_symbols
> > > > > > >
> > > > > > > real 0m6,292s
> > > > > > > user 0m5,640s
> > > > > > > sys 0m0,634s
> > > > > > > 6838 undefined_after
> > > > > > > 808 undefined_symbols
> > > > > > > 7646 total
> > > > > > >
> > > > > > > And 7 seconds on a Dell Precision 5820:
> > > > > > >
> > > > > > > $ time ./scripts/get_abi.pl undefined |sort >undefined && cat undefined| perl -ne 'print "$1\n" if (m#.*/(\S+) not found#)'|sort|uniq -c|sort -nr >undefined_symbols; wc -l undefined; wc -l undefined_symbols
> > > > > > >
> > > > > > > real 0m7.162s
> > > > > > > user 0m5.836s
> > > > > > > sys 0m1.329s
> > > > > > > 6548 undefined
> > > > > > > 772 undefined_symbols
> > > > > > >
> > > > > > > Both tests were done against this tree (based on today's linux-next):
> > > > > > >
> > > > > > > $ https://git.kernel.org/pub/scm/linux/kernel/git/mchehab/devel.git/log/?h=get_abi_undefined-latest
> > > > > > >
> > > > > > > It should be noticed that, as my tree has several ABI fixes, the time to run the
> > > > > > > script is likely less than if you run on your tree, as there will be less symbols to
> > > > > > > be reported, and the algorithm is optimized to reduce the number of regexes
> > > > > > > when a symbol is found.
> > > > > > >
> > > > > > > Besides optimizing and improving the seek logic, this series also change the
> > > > > > > debug logic. It how receives a bitmap, where "8" means to print the regexes
> > > > > > > that will be used by "undefined" command:
> > > > > > >
> > > > > > > $ time ./scripts/get_abi.pl undefined --debug 8 >foo
> > > > > > > real 0m17,189s
> > > > > > > user 0m13,940s
> > > > > > > sys 0m2,404s
> > > > > > >
> > > > > > > $wc -l foo
> > > > > > > 18421939 foo
> > > > > > >
> > > > > > > $ cat foo
> > > > > > > ...
> > > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_voltage.*_scale_available$)$/
> > > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_voltage.*_scale_available$)$/
> > > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/out_altvoltage.*_scale_available$)$/
> > > > > > > /sys/kernel/kexec_crash_loaded =~ /^(?^:^/sys/.*/iio\:device.*/in_pressure.*_scale_available$)$/
> > > > > > > ...
> > > > > > >
> > > > > > > On other words, on my desktop, the /sys match is performing >18M regular
> > > > > > > expression searches, which takes 6,2 seconds (or 17,2 seconds, if debug is
> > > > > > > enabled and sent to an area on my nvme storage).
> > > > > >
> > > > > > Better, it's down to 10 minutes on my machine now:
> > > > > >
> > > > > > real 10m39.218s
> > > > > > user 10m37.742s
> > > > > > sys 0m0.775s
> > > > >
> > > > > A lot better, but not clear why it is still taking ~40x more than here...
> > > > > It could well be due to the other ABI changes yet to be applied
> > > > > (I'll submit it probably later today), but it could also be related to
> > > > > something else. Could this be due to disk writes?
> > > >
> > > > Disk writes to where for what? This is a very fast disk (nvme raid
> > > > array) It's also a very "big" system, with lots of sysfs files:
> > > >
> > > > $ find /sys/devices/ -type f | wc -l
> > > > 44334
> > >
> > > Ok. Maybe that partially explains why it is taking so long, as the
> > > number of regex to compare will increase (not linearly).
> >
> > No idea. I just ran it on my laptop and it took only 5 seconds.
>
> Ok, 5 seconds is similar to what I got here on the machines I
> tested so far. I'm waiting for a (shared) big machine to be available
> in order to be able to do some tests on it.
>
> > Hm, you aren't reading the values of the sysfs files, right?
>
> No. Just retrieving the directory contents. That part is actually
> fast: it takes less than 2 seconds here to read all ABI + traverse
> sysfs directories. Also, from your past logs, the time is spent
> later on, when it is handling the regex. On that time, there are
> just the regex parsing and printing the results.
>
> > Anything I can do to run to help figure out where the script is taking
> > so long?
>
> Not sure if it is worth the efforts. I mean, the relationship
> between the number of processed sysfs nodes and the number of regex
> to be tested (using big-oh and big-omega notation) should be between
> Ω(n . log(n)) and O(n^2 . log(n)). There's not much space left for
> optimizing it, I guess.
>
> So, I would expect that a big server would take a log more time to
> process, it, due to the larger number of sysfs entries.
>
> Also, if one wants to speedup on a big machine, it could either
> exclude some pattern, like:
>
> # Won't parse any PCI devices
> $time ./scripts/get_abi.pl undefined --search-string '^(?!.*pci)' |wc -l
> 8438
>
> real 0m3,494s
> user 0m2,829s
> sys 0m0,658s
That only takes 8 seconds on this box:
$ time ./scripts/get_abi.pl undefined --search-string '^(?!.*pci)' |wc -l
18872
real 0m8.026s
user 0m7.300s
sys 0m0.726s
> or (more likely) just search for an specific part of the ABI:
>
> # Seek ABI only for PCI devices
> $ ./scripts/get_abi.pl undefined --search-string pci
This takes much longer, I didn't want to wait the 10 minutes :)
> ---
>
> After sleeping on it, I opted to implement some progress information.
>
> That will help to identify any issues that might be causing the
> script to take so long to finish.
>
> I'll send the patches on a new series.
Thanks, I'll go try those now...
greg k-h