Hi,
I was doing some 'git grep SPDX-License-Identifier' statistics, but
noticed that I had to do a lot more normalization than expected (clearly
handling different comment markers is needed).
How about running something like the below after -rc1? The end result is
2558 files changed, 2558 insertions(+), 2558 deletions(-)
mostly from the last fixup, before that it's merely
90 files changed, 90 insertions(+), 90 deletions(-)
Rasmus
#!/bin/sh
fixup() {
	gp="$1"
	cmd="$2"
	git grep --files-with-matches "SPDX-License-Identifier:$gp" | grep -v COPYING | \
		xargs -r -P8 sed -E -s -i -e "1,3 { /SPDX-License-Identifier/ { $cmd } }"
	git diff --stat | tail -n1
}
# tab->space, the first string is "dot asterisk tab" (tab built with printf)
fixup ".*$(printf '\t')" 's/\t/ /g'
# trailing space
fixup '.* $' 's/ *$//'
# collapse multiple spaces (both patterns contain two consecutive spaces)
fixup '.*  ' 's/  */ /g'
# or -> OR
fixup '.* or ' 's/ or / OR /g'
# Remove outer parentheses - when that pair is the only set of
# parentheses. Only a missing or */ trailing comment marker is handled.
fixup ' (' 's|Identifier: \(([^()]*)\)( \*/)?$|Identifier: \1\2|'
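
To see what the substitutions do without touching the tree, something like the following can be fed sample lines. This is only a sketch: normalize_tag is a hypothetical helper that is not part of the script above, it assumes GNU sed (for \t), and it spells the space-collapsing step as ' {2,}' so it does not depend on whitespace surviving mail transport.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above): apply the same
# substitutions, in the same order, to lines read from stdin.
normalize_tag() {
	sed -E \
		-e 's/\t/ /g' \
		-e 's/ +$//' \
		-e 's/ {2,}/ /g' \
		-e 's/ or / OR /g' \
		-e 's|Identifier: \(([^()]*)\)( \*/)?$|Identifier: \1\2|'
}

# Sample lines in the styles discussed in this thread.
printf '%s\n' '// SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)' | normalize_tag
printf '%s\n' '/* SPDX-License-Identifier:  (GPL-2.0) */' | normalize_tag
```

The first sample comes out as "// SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause" and the second as "/* SPDX-License-Identifier: GPL-2.0 */".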
On Fri, Feb 26, 2021 at 01:32:04PM +0100, Rasmus Villemoes wrote:
> Hi,
>
> I was doing some 'git grep SPDX-License-Identifier' statistics, but
> noticed that I had to do a lot more normalization than expected (clearly
> handling different comment markers is needed).
>
> How about running something like the below after -rc1? The end result is
>
> 2558 files changed, 2558 insertions(+), 2558 deletions(-)
>
> mostly from the last fixup, before that it's merely
>
> 90 files changed, 90 insertions(+), 90 deletions(-)
>
> Rasmus
>
> #!/bin/sh
>
> fixup() {
> 	gp="$1"
> 	cmd="$2"
>
> 	git grep --files-with-matches "SPDX-License-Identifier:$gp" | grep -v COPYING | \
> 		xargs -r -P8 sed -E -s -i -e "1,3 { /SPDX-License-Identifier/ { $cmd } }"
> 	git diff --stat | tail -n1
> }
>
> # tab->space, the first string is "dot asterisk tab" (tab built with printf)
> fixup ".*$(printf '\t')" 's/\t/ /g'
>
> # trailing space
> fixup '.* $' 's/ *$//'
>
> # collapse multiple spaces (both patterns contain two consecutive spaces)
> fixup '.*  ' 's/  */ /g'
>
> # or -> OR
> fixup '.* or ' 's/ or / OR /g'
>
> # Remove outer parentheses - when that pair is the only set of
> # parentheses. Only a missing or */ trailing comment marker is handled.
> fixup ' (' 's|Identifier: \(([^()]*)\)( \*/)?$|Identifier: \1\2|'
What exactly are you trying to "clean up" here? What tool are you using
that cannot properly parse the tags that we currently have?
confused,
greg k-h
On 26/02/2021 13.58, Greg Kroah-Hartman wrote:
> What exactly are you trying to "clean up" here?
For example, just inside
Documentation/devicetree/bindings/display/panel/, the exact same thing
is expressed in four different ways:
(GPL-2.0-only or BSD-2-Clause)
GPL-2.0-only or BSD-2-Clause
(GPL-2.0-only OR BSD-2-Clause)
GPL-2.0-only OR BSD-2-Clause
AFAICT, lower-case "or" isn't even compliant with the spec. The outer
parentheses don't really serve any purpose (even single-item
"(GPL-2.0)" appears), and they, together with the random whitespace
garbage, just make the tags appear different unless one does a lot of
ad hoc normalization.
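
The kind of statistic in question can be sketched roughly as below. count_spellings is a hypothetical helper, and the git grep invocation in the comment is an assumption about how one would feed it from the top of a kernel tree; it strips the comment marker in front of the tag and a trailing */, then counts each remaining spelling.

```shell
#!/bin/sh
# Hypothetical helper: read lines containing an SPDX tag on stdin, drop
# everything up to and including the tag name, drop a trailing "*/"
# comment marker, and count how often each spelling occurs.
count_spellings() {
	sed -E \
		-e 's/^.*SPDX-License-Identifier:[[:space:]]*//' \
		-e 's|[[:space:]]*\*/[[:space:]]*$||' |
		sort | uniq -c | sort -rn
}

# Assumed use from the top of a kernel tree:
#   git grep -h 'SPDX-License-Identifier:' -- \
#       Documentation/devicetree/bindings/display/panel/ | count_spellings

# Small self-contained demonstration on made-up input lines.
printf '%s\n' \
	'# SPDX-License-Identifier: (GPL-2.0-only or BSD-2-Clause)' \
	'# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause' \
	'/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */' |
	count_spellings
```

Without further normalization the parenthesized lower-case form is counted separately from the plain one, which is the problem described above.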
But feel free to nak/drop it, just a suggestion.
Rasmus