Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text
While python will barf at it with:
FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 244, in <module>
    spdx = read_spdxdata(repo)
  File "scripts/spdxcheck.py", line 47, in read_spdxdata
    for l in open(el.path).readlines():
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
While it is indeed debatable if 'Licensor.' used in the license file
needs unicode quotes, let us force spdxcheck to read utf-8 instead.
Reported-by: Rahul T R <[email protected]>
Signed-off-by: Nishanth Menon <[email protected]>
---
scripts/spdxcheck.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 3e784cf9f401..ebd06ae642c9 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -44,7 +44,7 @@ def read_spdxdata(repo):
                 continue
 
             exception = None
-            for l in open(el.path).readlines():
+            for l in open(el.path, encoding="utf-8").readlines():
                 if l.startswith('Valid-License-Identifier:'):
                     lid = l.split(':')[1].strip().upper()
                     if lid in spdx.licenses:
--
2.32.0
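For illustration, the failure and the fix can be reproduced outside the kernel tree. The following is a minimal sketch, not part of spdxcheck.py: the sample text, the temporary file, and the helper names read_lines() and demo() are assumptions made up for this example. The "ascii" codec is passed explicitly to stand in for an environment where Python 3.6 falls back to ASCII as the default text encoding.

```python
import os
import tempfile

# Hypothetical sample imitating the curly quotes around "Licensor." in
# LICENSES/dual/CC-BY-4.0 (U+201C encodes to the bytes e2 80 9c in UTF-8).
SAMPLE = "Valid-License-Identifier: CC-BY-4.0\n\u201cLicensor.\u201d\n"


def read_lines(path, encoding):
    # Same read pattern as read_spdxdata(), with the codec made explicit.
    with open(path, encoding=encoding) as f:
        return f.readlines()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "CC-BY-4.0")
        with open(path, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        try:
            read_lines(path, "ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            # This is the error shown in the traceback above.
            ascii_failed = True
        lines = read_lines(path, "utf-8")
    return ascii_failed, lines


if __name__ == "__main__":
    failed, lines = demo()
    print("ascii decode failed:", failed)
    print("utf-8 read", len(lines), "lines")
```

Passing encoding="utf-8" to open() removes the dependency on the caller's locale, which is the same approach the patch takes.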
Nishanth,
On Fri, Jul 02 2021 at 20:21, Nishanth Menon wrote:
> Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
> unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text
Sigh. Why are people adding such things w/o running this script in the
first place?
> While python will barf at it with:
>
> FAIL: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
> Traceback (most recent call last):
>   File "scripts/spdxcheck.py", line 244, in <module>
>     spdx = read_spdxdata(repo)
>   File "scripts/spdxcheck.py", line 47, in read_spdxdata
>     for l in open(el.path).readlines():
>   File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>     return codecs.ascii_decode(input, self.errors)[0]
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2109: ordinal not in range(128)
>
> While it is indeed debatable if 'Licensor.' used in the license file
> needs unicode quotes, let us force spdxcheck to read utf-8 instead.
s/let us//
Ditto for the $subject. See Documentation/process/ for further enlightenment.
> Reported-by: Rahul T R <[email protected]>
> Signed-off-by: Nishanth Menon <[email protected]>
With that fixed:
Reviewed-by: Thomas Gleixner <[email protected]>
Thomas Gleixner <[email protected]> writes:
> Nishanth,
> On Fri, Jul 02 2021 at 20:21, Nishanth Menon wrote:
>> Commit bc41a7f36469 ("LICENSES: Add the CC-BY-4.0 license")
>> unfortunately introduced LICENSES/dual/CC-BY-4.0 in UTF-8 Unicode text
>
> Sigh. Why are people adding such things w/o running this script in the
> first place?
I have a guess on that front ... there is nothing in our documentation
that says anybody should run it, and the script itself gives no
indication of what it does, when it should be run, or how to run it.
That might just reduce uptake a little bit...:)
I increasingly believe that anything we add to scripts/ should start
with a "usage" header describing why it exists and how to make it do its
thing. That would be a welcome addition to spdxcheck.py. Adding
something to Documentation/process/license-rules.html would be a nice
bonus.
Thanks,
jon
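A "usage" header of the kind suggested above might look like the following sketch. Everything in it is an assumption for illustration: the script name examplecheck.py, the docstring wording, and the single positional argument are hypothetical and do not describe the actual spdxcheck.py interface.

```python
"""examplecheck.py (hypothetical script name)

Why it exists: one or two sentences on the problem the script solves.
When to run it: for example, before sending a patch that touches LICENSES/.
How to run it:  ./examplecheck.py [path ...]
"""
import argparse


def build_parser():
    # Reuse the module docstring as the --help text so that the header
    # and the command-line help cannot drift apart.
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("path", nargs="*",
                        help="files or directories to check")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Placeholder: a real script would perform its checks here.
    print("would check:", args.path or ["."])
    return 0

# A real script would end by calling main() under the usual
# __name__ == "__main__" guard.
```

With this layout, both reading the top of the file and running the script with --help answer what it does and how to invoke it.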