2021-04-12 18:22:08

by Wu XiangCheng

Subject: [RFC 0/2] Add a new translation tool scripts/trslt.py

Hi all,

This patch set aims to add a new translation tool, trslt.py, which
records which version of a source file each translation corresponds to.

For a long time, kernel documentation translations have lacked a way to track
which version of the source file they correspond to. If you translate a file
and someone later updates the source file, it is hard to know which version the
existing translation corresponds to, and even harder to sync them.

The common approach today is to check the date, but this is not accurate,
especially for documents that are updated often. Some translators write the
corresponding commit ID in the commit log for reference. This works, but it is
still somewhat troublesome.

Thus, the purpose of ``trslt.py`` is to add a new annotation tag to the file
that indicates the corresponding version of the source file::

.. translation_origin_commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The script automatically copies the file and generates the tag when creating a
new translation, and gives update suggestions based on those tags when updating
translations.
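
As an illustration of how such a tag can be read back, here is a minimal
Python sketch. It is not the code from the patch (trslt.py uses plain string
searches); the regular expression and the name find_origin_commit are
assumptions made for this example, and the tag is assumed to hold a full
40-digit commit ID on a line of its own.

```python
import re

# Assumed tag format, taken from the cover letter: an RST comment line
# ".. translation_origin_commit: <40 hex digits>" on a line of its own.
TAG_RE = re.compile(
    r"^\.\. translation_origin_commit: ([0-9a-f]{40})\s*$", re.MULTILINE)


def find_origin_commit(text):
    """Return the commit ID stored in the tag, or None if there is no tag."""
    match = TAG_RE.search(text)
    return match.group(1) if match else None


# Sample header in the shape shown in the documentation patch.
sample = (
    ".. SPDX-License-Identifier: GPL-2.0\n"
    "\n"
    ":Original: Documentation/admin-guide/perf-security.rst\n"
    "\n"
    ".. translation_origin_commit: a15cb2c1658417f9e8c7e84fe5d6ee0b63cbb9b0\n"
)
print(find_origin_commit(sample))
```

Because the tag is an RST comment, it does not show up in the rendered
document; only tools that read the source file see it.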

For more details, please read the documentation in [Patch 2/2].

Still to do:
- improve verbose mode
- test on more Python 3.x versions
- only Linux is supported for now; macOS still needs testing, and Windows
  is unsupported due to '\'

Any suggestions are welcome!

Thanks!

Wu XiangCheng (2):
scripts: Add new translation tool trslt.py
docs: doc-guide: Add document for scripts/trslt.py

Documentation/doc-guide/index.rst | 1 +
Documentation/doc-guide/trslt.rst | 233 ++++++++++++++++++++++++++
scripts/trslt.py | 267 ++++++++++++++++++++++++++++++
3 files changed, 501 insertions(+)
create mode 100644 Documentation/doc-guide/trslt.rst
create mode 100755 scripts/trslt.py

--
2.20.1



2021-04-12 18:22:24

by Wu XiangCheng

Subject: [RFC 2/2] docs: doc-guide: Add document for scripts/trslt.py

Add documentation for the new translation tool scripts/trslt.py
and link it from doc-guide/index.rst

Signed-off-by: Wu XiangCheng <[email protected]>
---
Documentation/doc-guide/index.rst | 1 +
Documentation/doc-guide/trslt.rst | 233 ++++++++++++++++++++++++++++++
2 files changed, 234 insertions(+)
create mode 100644 Documentation/doc-guide/trslt.rst

diff --git a/Documentation/doc-guide/index.rst b/Documentation/doc-guide/index.rst
index 7c7d97784626..441722cdd3fc 100644
--- a/Documentation/doc-guide/index.rst
+++ b/Documentation/doc-guide/index.rst
@@ -12,6 +12,7 @@ How to write kernel documentation
parse-headers
contributing
maintainer-profile
+ trslt

.. only:: subproject and html

diff --git a/Documentation/doc-guide/trslt.rst b/Documentation/doc-guide/trslt.rst
new file mode 100644
index 000000000000..df77c5a13500
--- /dev/null
+++ b/Documentation/doc-guide/trslt.rst
@@ -0,0 +1,233 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+.. _trslt:
+
+===========================================
+Kernel Documentation Translation File tool
+===========================================
+
+:Author: Wu XiangCheng <[email protected]>
+
+This document describes ``scripts/trslt.py``.
+
+Motivation
+-----------
+
+For a long time, kernel documentation translations have lacked a way to track
+which version of the source file they correspond to. If you translate a file
+and someone later updates the source file, it is hard to know which version
+the existing translation corresponds to, and even harder to sync them. The
+common approach today is to check the date, but this is not accurate,
+especially for documents that are updated often.
+
+Some translators write the corresponding commit ID in the commit log for
+reference. This works, but it is still somewhat troublesome.
+
+Thus, the purpose of ``trslt.py`` is to add a new annotation tag to the file
+that indicates the corresponding version of the source file::
+
+ .. translation_origin_commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+The script automatically copies the file and generates the tag when creating a
+new translation, and gives update suggestions based on those tags when updating
+translations.
+
+Dependencies
+------------
+
+:Language: Python 3.x
+
+:Python Libraries:
+
+ os
+
+ argparse
+
+ subprocess
+
+Usage
+------
+
+``trslt.py`` comes with a help message::
+
+ ➜ scripts/trslt.py -h
+ usage: trslt.py [-h] [-v] [-l {it_IT,ja_JP,ko_KR,zh_CN}] (-c | -u) file
+
+ Linux Kernel Documentation Translation File Tool
+
+ positional arguments:
+   file                  specific file path
+
+ optional arguments:
+   -h, --help            show this help message and exit
+   -v, --verbose         enable verbose mode
+   -l {it_IT,ja_JP,ko_KR,zh_CN}, --language {it_IT,ja_JP,ko_KR,zh_CN}
+                         choose translation language, default: zh_CN
+   -c, --copy            copy a origin file to translation directory
+   -u, --update          get a translation file's update information
+
+The help message summarizes the basic operations. See below
+for details.
+
+.. note::
+
+ ``trslt.py`` should be run from the Linux kernel source **ROOT** directory, or
+ from "Documentation/", "Documentation/translations/" or
+ "Documentation/translations/ll_NN/". It will tell you if you run it elsewhere.
+
+Verbose mode
+~~~~~~~~~~~~~
+
+``-v, --verbose``
+
+As the name suggests, ``-v`` turns on verbose mode, which prints extra
+information such as the parsed arguments, the resolved file path and the exit code.
+
+
+Choose language
+~~~~~~~~~~~~~~~~
+
+``-l, --language``
+
+As a translator, you need to select the language you are translating into; the
+script uses it to decide which language directory to work in.
+
+Simply give the language after ``-l``, like ``-l zh_CN``. If you do not give
+one, the default is ``zh_CN``.
+
+Four languages (it_IT, ja_JP, ko_KR, zh_CN) are currently available. If you
+need another, feel free to add it: just modify the language choice list in
+``arg()`` of ``trslt.py`` and this document.
+
+Copy mode
+~~~~~~~~~~
+
+``-c, --copy``
+
+This action copies an original file to the translation directory. If the file
+already exists, it gives a warning::
+
+ ➜ scripts/trslt.py -c Documentation/admin-guide/perf-security.rst
+ INFO: Documentation/translations/zh_CN/admin-guide/perf-security.rst has been created, please remember to edit it.
+
+ ➜ scripts/trslt.py -c Documentation/admin-guide/perf-security.rst
+ WARNING: Documentation/translations/zh_CN/admin-guide/perf-security.rst is existing, can not use copy, please try -u/--update!
+
+It also automatically adds a commit-id tag and a language-specific header::
+
+ :Original: Documentation/admin-guide/perf-security.rst
+
+ .. translation_origin_commit: a15cb2c1658417f9e8c7e84fe5d6ee0b63cbb9b0
+
+ :Translator: Name <[email protected]>
+
+The header can be used to include a unified declaration or localization tag.
+If you need a special header for your language, modify ``la_head(fp, la)``
+in ``trslt.py`` by adding an ``elif`` condition.
+
+
+Update mode
+~~~~~~~~~~~~
+
+``-u, --update``
+
+This action updates an existing translation file. The translation file must
+have a commit-id tag so that a diff of the original text can be generated. If
+there is no commit-id tag, or no update is needed, it will tell you::
+
+ ➜ scripts/trslt.py -u Documentation/translations/zh_CN/admin-guide/perf-security.rst
+ INFO: Documentation/translations/zh_CN/admin-guide/perf-security.rst.diff file has generated
+ INFO: if you want to update Documentation/translations/zh_CN/admin-guide/perf-security.rst, please Do Not Forget to update the translation_origin_commit tag.
+
+ .. translation_origin_commit: a15cb2c1658417f9e8c7e84fe5d6ee0b63cbb9b0
+
+ ➜ scripts/trslt.py -u Documentation/translations/zh_CN/admin-guide/perf-security.rst
+ INFO: Documentation/admin-guide/perf-security.rst does not have any change since a15cb2c1658417f9e8c7e84fe5d6ee0b63cbb9b0
+
+ ➜ scripts/trslt.py -u Documentation/translations/zh_CN/admin-guide/index.rst
+ WARNING: Documentation/translations/zh_CN/admin-guide/index.rst does not have a translation_origin_commit tag, can not generate a diff file, please add a tag if you want to update it.
+
+ .. translation_origin_commit: da514157c4f063527204adc8e9642a18a77fccc9
+
+.. important::
+
+ Please note that this action automatically generates a diff file, but it
+ **will not automatically add or change the commit-id**. It only prints it;
+ you need to add or change it yourself!
+
+Workflow
+----------
+
+This section describes two common workflows: starting a new translation and updating an existing one.
+
+Start a new translation
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+To start a new translation, use the ``-c`` action::
+
+ ➜ scripts/trslt.py -c Documentation/any-file
+
+If all goes well, the translation file is created::
+
+ INFO: Documentation/translations/ll_NN/any-file has been created, please remember to edit it.
+
+Then you can start translating.
+
+Otherwise, you get a warning::
+
+ WARNING: Documentation/translations/ll_NN/any-file is existing, can not use copy, please try -u/--update!
+
+ WARNING: seems you are copying a file only exist in translations/ dir
+
+Or an error::
+
+ ERROR: file does not in Linux Kernel source Documentation
+
+Update an existing translation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To update an existing translation, use the ``-u`` action::
+
+ ➜ scripts/trslt.py -u Documentation/translations/ll_NN/any-file
+
+If everything is OK, the script generates a diff of the original text from the
+commit recorded in the commit-id tag to the newest one, and prints the newest commit-id tag::
+
+ INFO: Documentation/translations/ll_NN/any-file.diff file has generated
+ INFO: if you want to update Documentation/translations/ll_NN/any-file, please Do Not Forget to update the translation_origin_commit tag.
+
+ .. translation_origin_commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+Then simply review the diff and update the translation, and do not forget to
+modify the commit-id tag.
+
+Or the translation needs no update::
+
+ INFO: Documentation/any-file does not have any change since xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+If the translation file does not have a commit-id tag::
+
+ WARNING: Documentation/translations/ll_NN/any-file does not have a translation_origin_commit tag, can not generate a diff file, please add a tag if you want to update it.
+
+ .. translation_origin_commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+
+Please add the tag by hand if you want to update it.
+
+If you give the wrong file::
+
+ ERROR: Documentation/any-file does not belong to ll_NN translation!
+
+Why the name?
+--------------
+
+``trslt.py``: tr(an)sl(a)t(or).
+
+Issues
+-------
+
+If you find any problems, please report them to Wu XiangCheng <[email protected]>
+
+Thanks
+--------
+
+Will be completed after RFC.
--
2.20.1

2021-04-12 18:22:39

by Wu XiangCheng

Subject: [PATCH 1/2] scripts: Add new translation tool trslt.py

Add a new translation tool, scripts/trslt.py, for:
- helping to create and update translation files
- tracking which source version a translation corresponds to

Signed-off-by: Wu XiangCheng <[email protected]>
---
scripts/trslt.py | 267 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 267 insertions(+)
create mode 100755 scripts/trslt.py

diff --git a/scripts/trslt.py b/scripts/trslt.py
new file mode 100755
index 000000000000..1acc6f2e69f3
--- /dev/null
+++ b/scripts/trslt.py
@@ -0,0 +1,267 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0+
+#
+# Kernel Documentation Translation File tool
+# Document see: Documentation/doc-guide/trslt.rst
+#
+# Wu XiangCheng <[email protected]>, 2021.
+
+import os
+import argparse
+import subprocess
+
+# global verbose mode flag
+VERBOSE_FLAG = False
+
+# change to source root dir
+def cdpath():
+    # at source ROOT
+    if os.path.isdir("Documentation/translations") and os.path.isfile("MAINTAINERS"):
+        return 0
+    # at Documentation/
+    elif os.path.isdir("translations") and os.path.isfile("../MAINTAINERS"):
+        os.chdir("../")
+        return 0
+    # at Documentation/translations/
+    elif os.path.isdir("zh_CN") and os.path.isfile("../../MAINTAINERS"):
+        os.chdir("../../")
+        return 0
+    # at Documentation/translations/ll_NN/
+    elif os.path.isdir("translations") == False and os.path.isdir("../../translations") and os.path.isfile("../../../MAINTAINERS"):
+        os.chdir("../../../")
+        return 0
+    # anywhere else
+    else:
+        print("ERROR: Please run this script under linux kernel source ROOT dir")
+        return -1
+
+# argv
+def arg():
+    parser = argparse.ArgumentParser(
+        description='Linux Kernel Documentation Translation File Tool')
+    # file path
+    parser.add_argument('file', help="specific file path")
+    # verbose mode
+    parser.add_argument('-v', '--verbose',
+                        help="enable verbose mode",
+                        action='store_true')
+    # language choose
+    parser.add_argument('-l', '--language',
+                        help="choose translation language, default: zh_CN",
+                        type=str,
+                        choices=["it_IT", "ja_JP", "ko_KR", "zh_CN"],
+                        default="zh_CN")
+    # required action group
+    ch = parser.add_mutually_exclusive_group(required=True)
+    # \_ copy
+    ch.add_argument('-c', '--copy',
+                    help="copy a origin file to translation directory",
+                    action='store_true')
+    # \_ update
+    ch.add_argument('-u', '--update',
+                    help="get a translation file's update information",
+                    action='store_true')
+
+    argv_ = parser.parse_args()
+
+    # modify global VERBOSE_FLAG
+    if argv_.verbose:
+        global VERBOSE_FLAG
+        VERBOSE_FLAG = True
+        print(argv_)
+
+    return argv_
+
+# get newest commit id of a origin doc file
+def get_newest_commit(fp):
+    cmd = "git log --format=oneline --no-merges "+fp
+    p = subprocess.Popen(cmd,
+                         shell=True,
+                         stdout=subprocess.PIPE,
+                         errors="replace")
+    log = p.stdout.readline()
+    commit_id = log[:log.find(' ')]
+    return commit_id
+
+# add language special header
+def la_head(fp, la):
+    if la == "zh_CN":
+        cfp = fp[0:14]+"translations/"+la+'/'+fp[14:]
+        r = ".. include:: " + \
+            os.path.relpath(
+                "Documentation/translations/zh_CN/disclaimer-zh_CN.rst",
+                cfp[0:cfp.rfind('/')]) + "\n\n"
+        r += ":Original: "+fp+"\n\n"
+        r += ".. translation_origin_commit: "+get_newest_commit(fp)+"\n\n"
+        r += ":译者: 姓名 EnglishName <[email protected]>\n\n"
+    else:
+        r = ":Original: "+fp+"\n\n"
+        r += ".. translation_origin_commit: "+get_newest_commit(fp)+"\n\n"
+        r += ":Translator: Name <[email protected]>\n\n"
+
+    return r
+
+# copy mode
+def copy(fp, la):
+    if os.path.isfile(fp) == False:
+        return -2
+
+    if fp.find("/translations/") != fp.rfind("/translations/"):
+        print("WARNING: seems you are copying a file only exist in translations/ dir")
+        return -3
+
+    f = open(fp, 'r')
+    try:
+        first = f.read(2048)
+    except:
+        print("ERROR: can not read file", fp)
+        return -2
+
+    spdx_id = first.find(".. SPDX-License-Identifier: ")
+    if spdx_id != -1:
+        insert_id = first.find('\n', spdx_id)+1
+        first = first[:insert_id]+'\n'+la_head(fp, la)+first[insert_id:]
+    else:
+        first = la_head(fp, la)+first
+
+    if fp[0:14] == "Documentation/":
+        cfp = fp[0:14]+"translations/"+la+'/'+fp[14:]
+
+        if cfp[cfp.rfind('.'):] != ".rst":
+            print("WARNING: this is not a rst file, may cause problems.",
+                  "copy will continue, but please \033[31mcheck it!\033[0m")
+
+        cfp_dir = cfp[0:cfp.rfind('/')]
+
+        if not os.path.exists(cfp_dir):
+            os.makedirs(cfp_dir)
+
+        if os.path.isfile(cfp):
+            print("WARNING:\033[31m", cfp,
+                  "\033[0mis existing, can not use copy, please try -u/--update!")
+            return -3
+
+        cf = open(cfp, 'w')
+        cf.write(first)
+
+        while True:
+            a = f.read(2048)
+            if a != '':
+                cf.write(a)
+            else:
+                break
+
+        cf.close()
+        print("INFO: \033[32m" + cfp +
+              "\033[0m has been created, please remember to edit it.")
+    else:
+        return -2
+
+    return 0
+
+# generate origin text diff file for update
+def gen_diff(ofp, old_id):
+    new_id = get_newest_commit(ofp)
+    if old_id == new_id:
+        return 1
+
+    cmd = "git show "+old_id+".."+new_id+" "+ofp
+    p = subprocess.Popen(cmd,
+                         shell=True,
+                         stdout=subprocess.PIPE,
+                         errors="replace")
+    log = p.stdout.read()
+    log = cmd+"\n\n"+log
+    return log
+
+# update mode
+def update(fp, la):
+    if os.path.isfile(fp) == False:
+        return -2
+    if fp.find("Documentation/translations/"+la) == -1:
+        print("ERROR:", fp, "does not belong to", la, "translation!")
+        return -3
+
+    # origin file path
+    ofp = fp[:fp.find("translations/"+la)] + \
+        fp[fp.find("translations/"+la)+14+len(la):]
+
+    if not os.path.isfile(ofp):
+        print("ERROR: origin file",ofp,"does not exist or not a file")
+        return -2
+
+    f = open(fp, 'r')
+    try:
+        first = f.read(3072)
+    except:
+        print("ERROR: can not read file", fp)
+        return -2
+
+    commit_id = first.find("\n.. translation_origin_commit: ")
+    if commit_id == -1:
+        print("WARNING:", fp, "\033[31mdoes not have a translation_origin_commit tag,",
+              "can not generate a diff file\033[0m, please add a tag if you want to update it.")
+        print("\n\033[33m.. translation_origin_commit: " +
+              get_newest_commit(ofp) + "\033[0m")
+        return -4
+    else:
+        commit_id = commit_id+1 # '\n'
+        commit_id = first[commit_id:first.find('\n', commit_id)]
+        commit_id = commit_id[commit_id.find(' ')+1:]
+        commit_id = commit_id[commit_id.find(' ')+1:]
+
+    diff = gen_diff(ofp, commit_id)
+    if diff == 1:
+        print("INFO:", ofp, "does not have any change since", commit_id)
+    else:
+        with open(fp+".diff", 'w') as d:
+            d.write(diff)
+        print("INFO: \033[32m"+fp+".diff\033[0m file has generated",)
+        print("INFO: if you want to update " + fp +
+              ", please \033[31mDo Not Forget\033[0m to update the translation_origin_commit tag.",
+              "\n\n\033[33m.. translation_origin_commit: " +
+              get_newest_commit(ofp) + "\033[0m")
+
+    return 0
+
+# main entry
+def main():
+    argv_ = arg()
+
+    # get file's abspath before cdpath
+    file_path = os.path.abspath(argv_.file)
+    if VERBOSE_FLAG:
+        print(file_path)
+
+    if cdpath() != 0:
+        return -1
+
+    # if file_path valid
+    if file_path.find("Documentation") == -1:
+        print("ERROR: file does not in Linux Kernel source Documentation")
+        return -2
+    elif os.path.isfile(file_path[file_path.find("Documentation"):]) == False:
+        print("ERROR: file does not exist or not a file")
+        return -2
+    else:
+        file_path = file_path[file_path.find("Documentation"):]
+
+    if VERBOSE_FLAG:
+        print(file_path)
+
+    if argv_.copy:
+        return copy(file_path, argv_.language)
+    elif argv_.update:
+        return update(file_path, argv_.language)
+
+    return 0
+
+
+if __name__ == "__main__":
+    exit_code = main()
+    if VERBOSE_FLAG:
+        if exit_code == 0:
+            print("exit with code:\033[32m", exit_code, "\033[0m")
+        else:
+            print("exit with code:\033[31m", exit_code, "\033[0m")
+    exit(exit_code)
--
2.20.1

2021-04-14 09:26:14

by Federico Vaga

Subject: Re: [RFC 0/2] Add a new translation tool scripts/trslt.py

Hi,

Yes, you are touching on a good point where things can be improved. I admit that
I have only had a very quick look at the code so far, so perhaps I'm missing
something. However, let me give you my two cents based on what I usually do.

I do not like the idea of adding tags to the file and having tools to modify it.
I would prefer to keep the text as clean as possible.

Instead, what can be done without manipulating the text file is something like
this:

# Take the commit ID of the last time a document was translated
LAST_TRANS=$(git log -n 1 --oneline Documentation/translations/<lang>/<path-to-file> | cut -d " " -f 1)

# Take the history of the same file in the main Documentation tree
git log --oneline $LAST_TRANS..doc/docs-next Documentation/<path-to-file>

This will give you the list of commits that changed <path-to-file>, and that
probably need to be translated. The problem with this approach is that by the
time you submit a translation, other people may have changed the very same
files. The correctness of this approach depends on patch order in docs-next,
and this can't be guaranteed.
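
For readers who prefer Python to shell, the two commands above can be sketched
as follows. This is only an illustration: the function names are made up for
this example, doc/docs-next is the remote branch name used above, and the
result inherits the patch-ordering caveat just described.

```python
import subprocess


def last_translation_commit(translated_path):
    """Abbreviated ID of the newest commit that touched the translated file."""
    out = subprocess.run(
        ["git", "log", "-n", "1", "--oneline", "--", translated_path],
        check=True, capture_output=True, text=True,
    ).stdout
    # "--oneline" prints "<abbreviated id> <subject>"; keep the first field.
    return out.split(" ", 1)[0].strip()


def pending_log_command(last_trans, origin_path, upstream="doc/docs-next"):
    """Build the git command listing origin commits made after last_trans."""
    return ["git", "log", "--oneline",
            f"{last_trans}..{upstream}", "--", origin_path]
```

Inside a kernel tree, passing the list returned by pending_log_command() to
subprocess.run() prints the commits that probably still need translating.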

So, instead of relying on LAST_TRANS, I rely on a special git branch that acts
as a marker. But this works only for me and not for other translators of the
same language, so you can run into trouble in this case too.

What we can actually do is to exploit the git commit message to store the tag
you mentioned. Hence, we can get the last Id with something like this:

LAST_ID=$(git log -n 1 Documentation/translations/<lang>/<path-to-file> | grep -E "Translated-on-top-of: commit [0-9a-f]{12}")

The ID we store in the tag does not need to be the commit ID of the last change
to <path-to-file>, but just the commit you were on when you did the
translation. This is because it simplifies managing the tag when translating
multiple files/patches in a single patch (to avoid spamming the mailing list
with dozens of small patches).



2021-04-15 21:02:44

by Jonathan Corbet

Subject: Re: [RFC 0/2] Add a new translation tool scripts/trslt.py

Wu XiangCheng <[email protected]> writes:

> Hi all,
>
> This set of patches aim to add a new translation tool - trslt.py, which
> can control the transltions version corresponding to source files.
>
> For a long time, kernel documentation translations lacks a way to control the
> version corresponding to the source files. If you translate a file and then
> someone updates the source file, there will be a problem. It's hard to know
> which version the existing translation corresponds to, and even harder to sync
> them.
>
> The common way now is to check the date, but this is not exactly accurate,
> especially for documents that are often updated. And some translators write
> corresponding commit ID in the commit log for reference, it is a good way,
> but still a little troublesome.
>
> Thus, the purpose of ``trslt.py`` is to add a new annotating tag to the file
> to indicate corresponding version of the source file::
>
> .. translation_origin_commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> The script will automatically copy file and generate tag when creating new
> translation, and give update suggestions based on those tags when updating
> translations.
>
> More details please read doc in [Patch 2/2].

So, like Federico, I'm unconvinced about putting this into the
translated text itself. This is metadata, and I'd put it with the rest
of the metadata. My own suggestion would be a tag like:

Translates: 6161a4b18a66 ("docs: reporting-issues: make people CC the regressions list")

It would be an analogue to the Fixes tag in this regard; you could have
more than one of them if need be.
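
To make the format concrete, here is a small Python sketch that collects
every such tag from a commit message. The regular expression and the
function name are assumptions for illustration; no format beyond the single
example line above has been agreed on.

```python
import re

# One tag per line: Translates: <12-40 hex digits> ("<subject>")
TRANSLATES_RE = re.compile(
    r'^Translates: ([0-9a-f]{12,40}) \("(.+)"\)$', re.MULTILINE)


def translates_tags(message):
    """Return a list of (commit_id, subject) pairs, one per Translates: tag."""
    return TRANSLATES_RE.findall(message)


# Made-up commit message. The first tag is the example from this mail, the
# second one (ID and subject) is invented to show that several are allowed.
message = (
    "docs/zh_CN: update reporting-issues\n"
    "\n"
    'Translates: 6161a4b18a66 ("docs: reporting-issues: make people CC the regressions list")\n'
    'Translates: 0123456789ab ("docs: an invented subject line")\n'
)
for commit_id, subject in translates_tags(message):
    print(commit_id, subject)
```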

I'm not sure we really need a script in the kernel tree for this; it
seems like what you really want is some sort of git commit hook. That
said, if you come up with something useful, we can certainly find a
place for it.
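
As a rough idea of what such a hook could look like, here is a sketch of a
commit-msg hook written in Python. Everything in it is an assumption for
illustration: the policy (complain when a commit touches
Documentation/translations/ but carries no Translates: tag), the function
names and the error text.

```python
import subprocess
import sys

TRANSLATIONS_PREFIX = "Documentation/translations/"


def missing_translates_tag(staged_paths, message):
    """True if a translation is being committed without a Translates: tag."""
    touches = any(p.startswith(TRANSLATIONS_PREFIX) for p in staged_paths)
    has_tag = any(line.startswith("Translates: ")
                  for line in message.splitlines())
    return touches and not has_tag


def main(argv):
    # git runs the commit-msg hook with the path of the message file as
    # its only argument.
    with open(argv[1], encoding="utf-8") as msg_file:
        message = msg_file.read()
    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    if missing_translates_tag(staged, message):
        print("commit touches translations but has no Translates: tag",
              file=sys.stderr)
        return 1
    return 0
```

To try it, the file would be saved as .git/hooks/commit-msg, made executable,
and ended with a call such as sys.exit(main(sys.argv)); a non-zero exit
status makes git abort the commit.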

Thanks,

jon

2021-04-16 08:48:56

by Wu XiangCheng

Subject: Re: [RFC 0/2] Add a new translation tool scripts/trslt.py

Hi Federico,

On Wed, Apr 14, 2021 at 01:27:23AM +0200, Federico Vaga wrote:
> Hi,
>
> Yes, you are touching a good point where things can be improved. I admit that I
> did not have a look at the code yet, if not very quickly. Perhaps I'm missing
> something. However, let me give you my two cents based on what I usually do.
>
> I do not like the idea of adding tags to the file and having tools to modify it.
> I would prefer to keep the text as clean as possible.

Yeah, I also considered that, so I designed the tag to be a single-line
comment, hoping to keep the text clean.

>
> Instead, what can be done without touching manipulating the text file is to do
> something like this:
>
> # Take the commit ID of the last time a document has translated
> LAST_TRANS=$(git log -n 1 --oneline Documentation/translations/<lang>/<path-to-file> | cut -d " " -f 1)
>
> # Take the history of the same file in the main Documentation tree
> git log --oneline $LAST_TRANS..doc/docs-next Documentation/<path-to-file>
>
> This will give you the list of commits that changed <path-to-file>, and that
> probably need to be translated. The problem of this approach is that by the time
> you submit a translation, other people may change the very same files. The
> correctness of this approach depends on patch order in docs-next, and this can't
> be guaranteed.

Thanks for sharing your experiences!

Yes, the ordering is why I started thinking about translation version control.
It gets really messy, especially when a file is updated frequently.
Some old files are also hard to maintain.

>
> So, instead of relying on LAST_DIR, I rely on a special git branch that acts as
> marker. But this works only for me and not for other translator of the same
> languages, so you can get troubles also in this case.
>
> What we can actually do is to exploit the git commit message to store the tag
> you mentioned. Hence, we can get the last Id with something like this:
>
> LAST_ID=$(git log -n 1 Documentation/translations/<lang>/<path-to-file> | grep -E "Translated-on-top-of: commit [0-9a-f]{12}")
>
> The ID we store in the tag does not need to be the commit ID of the last change
> to <path-to-file>, but just the commit on which you were when you did the
> translation. This because it will simplify the management of this tag when
> translating multiple files/patches in a single patch (to avoid to spam the
> mailing list with dozens of small patches).

Yes, I also thought about storing the relevant commit ID in the commit message.
Making it a git hook is easy for now, but if we'd like to add something in
the future, it may need another script. Or just a tool which shows the
relevant information and lets translators add it themselves?

But to be honest, I'd like the tool to gain more functions in the future,
such as automatically starting the workflow. More and more people will
join the translation work, and some new developers also start their way
from here. There is a clear need to make the work more standardized and
easier.

Thanks!

Wu X.C.


2021-04-17 02:49:24

by Wu XiangCheng

Subject: Re: [RFC 0/2] Add a new translation tool scripts/trslt.py

On Thu, Apr 15, 2021 at 03:00:36PM -0600, Jonathan Corbet wrote:
> Wu XiangCheng <[email protected]> writes:
>
> > Hi all,
> >
> > This set of patches aim to add a new translation tool - trslt.py, which
> > can control the transltions version corresponding to source files.
> >
> > For a long time, kernel documentation translations lacks a way to control the
> > version corresponding to the source files. If you translate a file and then
> > someone updates the source file, there will be a problem. It's hard to know
> > which version the existing translation corresponds to, and even harder to sync
> > them.
> >
> > The common way now is to check the date, but this is not exactly accurate,
> > especially for documents that are often updated. And some translators write
> > corresponding commit ID in the commit log for reference, it is a good way,
> > but still a little troublesome.
> >
> > Thus, the purpose of ``trslt.py`` is to add a new annotating tag to the file
> > to indicate corresponding version of the source file::
> >
> > .. translation_origin_commit: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > The script will automatically copy file and generate tag when creating new
> > translation, and give update suggestions based on those tags when updating
> > translations.
> >
> > More details please read doc in [Patch 2/2].
>
> So, like Federico, I'm unconvinced about putting this into the
> translated text itself. This is metadata, and I'd put it with the rest
> of the metadata. My own suggestion would be a tag like:
>
> Translates: 6161a4b18a66 ("docs: reporting-issues: make people CC the regressions list")
>
> It would be an analogue to the Fixes tag in this regard; you could have
> more than one of them if need be.

Yes, that's also a good idea, rather than adding a tag to the text itself.

>
> I'm not sure we really need a script in the kernel tree for this; it
> seems like what you really want is some sort of git commit hook. That
> said, if you come up with something useful, we can certainly find a
> place for it.

Hmm, thinking about it again.

Maybe we just need a doc describing the recommended practice, with a
script or hook included in the doc.

Whether to use it would be up to each translator. That may be easier, but I'm
worried about whether such a loose approach will work well.

Thanks!

Wu X.C.

