2020-10-14 07:32:10

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: [PATCH v2 0/5] docs: automarkup.py: Make automarkup ready for Sphinx 3.1+

Hi,

this patch series makes the automatic markup extension ready for Sphinx 3.1+.
It was based on Mauro's Sphinx patch series, and requires it for the namespaces
to work, but could also be merged through the docs tree without regressions
(other than the increased build time explained below).

The first three patches make automarkup compatible with Sphinx 3.1. The first
patch makes use of the new C roles in Sphinx3 instead of the generic type role
from Sphinx 2, while patches 2 and 3 solve the warnings caused by Sphinx3's
stricter C domain.

Patch 4 adds cross-referencing to C macros with parameters for Sphinx 3.

Patch 5 enables cross-referencing inside C namespaces, which are new to Sphinx
3.1.

On an importante note:
In order to be able to support automatic cross-referencing inside C namespaces,
I needed to disable parallel source reading for Sphinx in patch 5. On my
machine, this makes the build process take about 4 additional minutes. This is
very bad, since the documentation building process already takes too long, but I
couldn't think of a way to sidestep this issue. If anyone has any idea, it would
be greatly appreciated.

Also, for some reason, disabling the source read parallelization makes
Sphinx output 2 warnings saying so, which is another annoyance.

Thanks,
Nícolas

Changes in v2:
- Split the single patch into patches 1, 2 and 3
- Change sphinx version verification in patch 1
- Thanks Mauro for the clarifications in v1:
- Add patches 4 and 5 for the missing functionalities

Nícolas F. R. A. Prado (5):
docs: automarkup.py: Use new C roles in Sphinx 3
docs: automarkup.py: Fix regexes to solve sphinx 3 warnings
docs: automarkup.py: Skip C reserved words when cross-referencing
docs: automarkup.py: Add cross-reference for parametrized C macros
docs: automarkup.py: Allow automatic cross-reference inside C
namespace

Documentation/sphinx/automarkup.py | 188 +++++++++++++++++++++++------
1 file changed, 154 insertions(+), 34 deletions(-)

--
2.28.0



2020-10-14 07:37:34

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: [PATCH v2 4/5] docs: automarkup.py: Add cross-reference for parametrized C macros

Sphinx 3 added support for declaring C macros with parameters using the
:c:macro role.

To support automarkup for both functions and parametrized macros using
the same regex (words ending in ()), try to cross-reference to both, and
only fall back to regular text if neither exist.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Documentation/sphinx/automarkup.py | 49 +++++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index 1cc3a2cf2a88..409dbc4100de 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -74,7 +74,7 @@ def markup_refs(docname, app, node):
RE_generic_type: markup_c_ref}

markup_func_sphinx3 = {RE_doc: markup_doc_ref,
- RE_function: markup_c_ref,
+ RE_function: markup_func_ref_sphinx3,
RE_struct: markup_c_ref,
RE_union: markup_c_ref,
RE_enum: markup_c_ref,
@@ -109,12 +109,47 @@ def markup_refs(docname, app, node):
return repl

#
-# Try to replace a C reference (function() or struct/union/enum/typedef
-# type_name) with an appropriate cross reference.
+# In sphinx3 we can cross-reference to C macro and function, each one with its
+# own C role, but both match the same regex, so we try both.
#
+def markup_func_ref_sphinx3(docname, app, match):
+ class_str = ['c-func', 'c-macro']
+ reftype_str = ['function', 'macro']
+
+ cdom = app.env.domains['c']
+ #
+ # Go through the dance of getting an xref out of the C domain
+ #
+ target = match.group(2)
+ target_text = nodes.Text(match.group(0))
+ xref = None
+ if not (target in Skipfuncs or target in Skipnames):
+ for class_s, reftype_s in zip(class_str, reftype_str):
+ lit_text = nodes.literal(classes=['xref', 'c', class_s])
+ lit_text += target_text
+ pxref = addnodes.pending_xref('', refdomain = 'c',
+ reftype = reftype_s,
+ reftarget = target, modname = None,
+ classname = None)
+ #
+ # XXX The Latex builder will throw NoUri exceptions here,
+ # work around that by ignoring them.
+ #
+ try:
+ xref = cdom.resolve_xref(app.env, docname, app.builder,
+ reftype_s, target, pxref,
+ lit_text)
+ except NoUri:
+ xref = None
+
+ if xref:
+ return xref
+
+ return target_text
+
def markup_c_ref(docname, app, match):
- class_str = {RE_function: 'c-func',
- # Sphinx 2 only
+ class_str = {# Sphinx 2 only
+ RE_function: 'c-func',
RE_generic_type: 'c-type',
# Sphinx 3+ only
RE_struct: 'c-struct',
@@ -122,8 +157,8 @@ def markup_c_ref(docname, app, match):
RE_enum: 'c-enum',
RE_typedef: 'c-type',
}
- reftype_str = {RE_function: 'function',
- # Sphinx 2 only
+ reftype_str = {# Sphinx 2 only
+ RE_function: 'function',
RE_generic_type: 'type',
# Sphinx 3+ only
RE_struct: 'struct',
--
2.28.0


2020-10-14 07:39:20

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: [PATCH v2 5/5] docs: automarkup.py: Allow automatic cross-reference inside C namespace

Sphinx 3.1 introduced namespaces for C cross-references. With this,
each C domain type/function declaration is put inside the namespace that
was active at the time of its declaration.

To support automatic cross-reference inside C namespaces:
- Save the C namespace used in each doc file (if any) at the source-read
phase, in the beginning of the Sphinx process.
- When making the automarkup, if any namespace was used in the current
file, try to cross-reference to the symbol inside of it before trying
in the global namespace.

To make the first step possible, disable the parallel_read_safe option
in Sphinx, since the dictionary that maps the files to the C namespaces
can't be concurrently updated. This unfortunately increases the build
time of the documentation.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Documentation/sphinx/automarkup.py | 130 ++++++++++++++++++-----------
1 file changed, 82 insertions(+), 48 deletions(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index 409dbc4100de..bca8cf5f519d 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -45,6 +45,8 @@ RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
#
RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')

+RE_namespace = re.compile(r'^\s*..\s*c:namespace::\s*(\S+)\s*$')
+
#
# Reserved C words that we should skip when cross-referencing
#
@@ -62,6 +64,8 @@ Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
'socket' ]

+c_namespace = {}
+
def markup_refs(docname, app, node):
t = node.astext()
done = 0
@@ -120,30 +124,42 @@ def markup_func_ref_sphinx3(docname, app, match):
#
# Go through the dance of getting an xref out of the C domain
#
- target = match.group(2)
+ base_target = match.group(2)
target_text = nodes.Text(match.group(0))
xref = None
- if not (target in Skipfuncs or target in Skipnames):
- for class_s, reftype_s in zip(class_str, reftype_str):
- lit_text = nodes.literal(classes=['xref', 'c', class_s])
- lit_text += target_text
- pxref = addnodes.pending_xref('', refdomain = 'c',
- reftype = reftype_s,
- reftarget = target, modname = None,
- classname = None)
- #
- # XXX The Latex builder will throw NoUri exceptions here,
- # work around that by ignoring them.
- #
- try:
- xref = cdom.resolve_xref(app.env, docname, app.builder,
- reftype_s, target, pxref,
- lit_text)
- except NoUri:
- xref = None
-
- if xref:
- return xref
+ possible_targets = [base_target]
+ # Check if this document has a namespace, and if so, try
+ # cross-referencing inside it first.
+ try:
+ namespace = c_namespace[docname]
+ except KeyError:
+ pass
+ else:
+ possible_targets.insert(0, namespace + "." + base_target)
+
+ if base_target not in Skipnames:
+ for target in possible_targets:
+ if target not in Skipfuncs:
+ for class_s, reftype_s in zip(class_str, reftype_str):
+ lit_text = nodes.literal(classes=['xref', 'c', class_s])
+ lit_text += target_text
+ pxref = addnodes.pending_xref('', refdomain = 'c',
+ reftype = reftype_s,
+ reftarget = target, modname = None,
+ classname = None)
+ #
+ # XXX The Latex builder will throw NoUri exceptions here,
+ # work around that by ignoring them.
+ #
+ try:
+ xref = cdom.resolve_xref(app.env, docname, app.builder,
+ reftype_s, target, pxref,
+ lit_text)
+ except NoUri:
+ xref = None
+
+ if xref:
+ return xref

return target_text

@@ -171,34 +187,43 @@ def markup_c_ref(docname, app, match):
#
# Go through the dance of getting an xref out of the C domain
#
- target = match.group(2)
+ base_target = match.group(2)
target_text = nodes.Text(match.group(0))
xref = None
- if not ((match.re == RE_function and target in Skipfuncs)
- or (target in Skipnames)):
- lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
- lit_text += target_text
- pxref = addnodes.pending_xref('', refdomain = 'c',
- reftype = reftype_str[match.re],
- reftarget = target, modname = None,
- classname = None)
- #
- # XXX The Latex builder will throw NoUri exceptions here,
- # work around that by ignoring them.
- #
- try:
- xref = cdom.resolve_xref(app.env, docname, app.builder,
- reftype_str[match.re], target, pxref,
- lit_text)
- except NoUri:
- xref = None
- #
- # Return the xref if we got it; otherwise just return the plain text.
- #
- if xref:
- return xref
+ possible_targets = [base_target]
+ # Check if this document has a namespace, and if so, try
+ # cross-referencing inside it first.
+ try:
+ namespace = c_namespace[docname]
+ except KeyError:
+ pass
else:
- return target_text
+ possible_targets.insert(0, namespace + "." + base_target)
+
+ if base_target not in Skipnames:
+ for target in possible_targets:
+ if not (match.re == RE_function and target in Skipfuncs):
+ lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
+ lit_text += target_text
+ pxref = addnodes.pending_xref('', refdomain = 'c',
+ reftype = reftype_str[match.re],
+ reftarget = target, modname = None,
+ classname = None)
+ #
+ # XXX The Latex builder will throw NoUri exceptions here,
+ # work around that by ignoring them.
+ #
+ try:
+ xref = cdom.resolve_xref(app.env, docname, app.builder,
+ reftype_str[match.re], target, pxref,
+ lit_text)
+ except NoUri:
+ xref = None
+
+ if xref:
+ return xref
+
+ return target_text

#
# Try to replace a documentation reference of the form Documentation/... with a
@@ -246,9 +271,18 @@ def auto_markup(app, doctree, name):
if not isinstance(node.parent, nodes.literal):
node.parent.replace(node, markup_refs(name, app, node))

+def save_c_namespace(app, docname, source):
+ lines = iter(source[0].splitlines(True))
+ for l in lines:
+ match = RE_namespace.search(l)
+ if match:
+ c_namespace[docname] = match.group(1)
+ return
+
def setup(app):
+ app.connect('source-read', save_c_namespace)
app.connect('doctree-resolved', auto_markup)
return {
- 'parallel_read_safe': True,
+ 'parallel_read_safe': False,
'parallel_write_safe': True,
}
--
2.28.0


2020-10-14 10:44:08

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings

With the transition to Sphinx 3, new warnings were generated by
automarkup, exposing bugs in the regexes.

The warnings were caused by the expressions matching words in the
translated versions of the documentation, since any unicode character
was matched.

Fix the regular expression by making the C regexes use ASCII and
ensuring the expressions only match the beginning of words.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Documentation/sphinx/automarkup.py | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index db13fb15cedc..43dd9025fc77 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -22,12 +22,13 @@ from itertools import chain
# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
# bit tries to restrict matches to things that won't create trouble.
#
-RE_function = re.compile(r'(([\w_][\w\d_]+)\(\))')
+RE_function = re.compile(r'\b(([a-zA-Z_]\w+)\(\))', flags=re.ASCII)

#
# Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
#
-RE_generic_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
+RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
+ flags=re.ASCII)

#
# Sphinx 3 uses a different C role for each one of struct, union, enum and
@@ -42,7 +43,7 @@ RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
# Detects a reference to a documentation page of the form Documentation/... with
# an optional extension
#
-RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
+RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')

#
# Many places in the docs refer to common system calls. It is
--
2.28.0


2020-10-14 10:44:09

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: [PATCH v2 3/5] docs: automarkup.py: Skip C reserved words when cross-referencing

With the transition to Sphinx 3, new warnings were caused by
automarkup, exposing bugs in the name matching.

When automarkup parsed a text like "struct struct" in the documentation,
it tried to cross-reference to a "struct" symbol, which is recognized as
a C reserved word by Sphinx 3, generating a warning.

Add some C reserved words (only the ones that were causing warnings) to
a list and skip them while trying to cross-reference.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Documentation/sphinx/automarkup.py | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index 43dd9025fc77..1cc3a2cf2a88 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -45,6 +45,12 @@ RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
#
RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')

+#
+# Reserved C words that we should skip when cross-referencing
+#
+Skipnames = [ 'for', 'if', 'register', 'sizeof', 'struct', 'unsigned' ]
+
+
#
# Many places in the docs refer to common system calls. It is
# pointless to try to cross-reference them and, as has been known
@@ -133,7 +139,8 @@ def markup_c_ref(docname, app, match):
target = match.group(2)
target_text = nodes.Text(match.group(0))
xref = None
- if not (match.re == RE_function and target in Skipfuncs):
+ if not ((match.re == RE_function and target in Skipfuncs)
+ or (target in Skipnames)):
lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
lit_text += target_text
pxref = addnodes.pending_xref('', refdomain = 'c',
--
2.28.0


2020-10-14 19:20:20

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] docs: automarkup.py: Allow automatic cross-reference inside C namespace

On Wed, 14 Oct 2020 11:56:44 +0200
Mauro Carvalho Chehab <[email protected]> wrote:

> > To make the first step possible, disable the parallel_read_safe option
> > in Sphinx, since the dictionary that maps the files to the C namespaces
> > can't be concurrently updated. This unfortunately increases the build
> > time of the documentation.
>
> Disabling parallel_read_safe will make performance very poor.
> Doesn't the C domain store the current namespace somewhere?
> If so, then, instead of using the source-read phase, something
> else could be used instead.

That seems like the best solution if it exists, yes. Otherwise a simple
lock could be used around c_namespace to serialize access there, right?

Thanks,

jon

2020-10-14 20:20:21

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] docs: automarkup.py: Allow automatic cross-reference inside C namespace

Em Tue, 13 Oct 2020 23:13:40 +0000
Nícolas F. R. A. Prado <[email protected]> escreveu:

> Sphinx 3.1 introduced namespaces for C cross-references. With this,
> each C domain type/function declaration is put inside the namespace that
> was active at the time of its declaration.
>
> To support automatic cross-reference inside C namespaces:
> - Save the C namespace used in each doc file (if any) at the source-read
> phase, in the beginning of the Sphinx process.
> - When making the automarkup, if any namespace was used in the current
> file, try to cross-reference to the symbol inside of it before trying
> in the global namespace.
>
> To make the first step possible, disable the parallel_read_safe option
> in Sphinx, since the dictionary that maps the files to the C namespaces
> can't be concurrently updated. This unfortunately increases the build
> time of the documentation.

Disabling parallel_read_safe will make performance very poor.
Doesn't the C domain store the current namespace somewhere?
If so, then, instead of using the source-read phase, something
else could be used instead.

>
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
> ---
> Documentation/sphinx/automarkup.py | 130 ++++++++++++++++++-----------
> 1 file changed, 82 insertions(+), 48 deletions(-)
>
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> index 409dbc4100de..bca8cf5f519d 100644
> --- a/Documentation/sphinx/automarkup.py
> +++ b/Documentation/sphinx/automarkup.py
> @@ -45,6 +45,8 @@ RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> #
> RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
>
> +RE_namespace = re.compile(r'^\s*..\s*c:namespace::\s*(\S+)\s*$')
> +
> #
> # Reserved C words that we should skip when cross-referencing
> #
> @@ -62,6 +64,8 @@ Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
> 'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
> 'socket' ]
>
> +c_namespace = {}
> +
> def markup_refs(docname, app, node):
> t = node.astext()
> done = 0
> @@ -120,30 +124,42 @@ def markup_func_ref_sphinx3(docname, app, match):
> #
> # Go through the dance of getting an xref out of the C domain
> #
> - target = match.group(2)
> + base_target = match.group(2)
> target_text = nodes.Text(match.group(0))
> xref = None
> - if not (target in Skipfuncs or target in Skipnames):
> - for class_s, reftype_s in zip(class_str, reftype_str):
> - lit_text = nodes.literal(classes=['xref', 'c', class_s])
> - lit_text += target_text
> - pxref = addnodes.pending_xref('', refdomain = 'c',
> - reftype = reftype_s,
> - reftarget = target, modname = None,
> - classname = None)
> - #
> - # XXX The Latex builder will throw NoUri exceptions here,
> - # work around that by ignoring them.
> - #
> - try:
> - xref = cdom.resolve_xref(app.env, docname, app.builder,
> - reftype_s, target, pxref,
> - lit_text)
> - except NoUri:
> - xref = None
> -
> - if xref:
> - return xref
> + possible_targets = [base_target]
> + # Check if this document has a namespace, and if so, try
> + # cross-referencing inside it first.
> + try:
> + namespace = c_namespace[docname]
> + except KeyError:
> + pass
> + else:
> + possible_targets.insert(0, namespace + "." + base_target)
> +
> + if base_target not in Skipnames:
> + for target in possible_targets:
> + if target not in Skipfuncs:
> + for class_s, reftype_s in zip(class_str, reftype_str):
> + lit_text = nodes.literal(classes=['xref', 'c', class_s])
> + lit_text += target_text
> + pxref = addnodes.pending_xref('', refdomain = 'c',
> + reftype = reftype_s,
> + reftarget = target, modname = None,
> + classname = None)
> + #
> + # XXX The Latex builder will throw NoUri exceptions here,
> + # work around that by ignoring them.
> + #
> + try:
> + xref = cdom.resolve_xref(app.env, docname, app.builder,
> + reftype_s, target, pxref,
> + lit_text)
> + except NoUri:
> + xref = None
> +
> + if xref:
> + return xref
>
> return target_text
>
> @@ -171,34 +187,43 @@ def markup_c_ref(docname, app, match):
> #
> # Go through the dance of getting an xref out of the C domain
> #
> - target = match.group(2)
> + base_target = match.group(2)
> target_text = nodes.Text(match.group(0))
> xref = None
> - if not ((match.re == RE_function and target in Skipfuncs)
> - or (target in Skipnames)):
> - lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
> - lit_text += target_text
> - pxref = addnodes.pending_xref('', refdomain = 'c',
> - reftype = reftype_str[match.re],
> - reftarget = target, modname = None,
> - classname = None)
> - #
> - # XXX The Latex builder will throw NoUri exceptions here,
> - # work around that by ignoring them.
> - #
> - try:
> - xref = cdom.resolve_xref(app.env, docname, app.builder,
> - reftype_str[match.re], target, pxref,
> - lit_text)
> - except NoUri:
> - xref = None
> - #
> - # Return the xref if we got it; otherwise just return the plain text.
> - #
> - if xref:
> - return xref
> + possible_targets = [base_target]
> + # Check if this document has a namespace, and if so, try
> + # cross-referencing inside it first.
> + try:
> + namespace = c_namespace[docname]
> + except KeyError:
> + pass
> else:
> - return target_text
> + possible_targets.insert(0, namespace + "." + base_target)
> +
> + if base_target not in Skipnames:
> + for target in possible_targets:
> + if not (match.re == RE_function and target in Skipfuncs):
> + lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
> + lit_text += target_text
> + pxref = addnodes.pending_xref('', refdomain = 'c',
> + reftype = reftype_str[match.re],
> + reftarget = target, modname = None,
> + classname = None)
> + #
> + # XXX The Latex builder will throw NoUri exceptions here,
> + # work around that by ignoring them.
> + #
> + try:
> + xref = cdom.resolve_xref(app.env, docname, app.builder,
> + reftype_str[match.re], target, pxref,
> + lit_text)
> + except NoUri:
> + xref = None
> +
> + if xref:
> + return xref
> +
> + return target_text
>
> #
> # Try to replace a documentation reference of the form Documentation/... with a
> @@ -246,9 +271,18 @@ def auto_markup(app, doctree, name):
> if not isinstance(node.parent, nodes.literal):
> node.parent.replace(node, markup_refs(name, app, node))
>
> +def save_c_namespace(app, docname, source):
> + lines = iter(source[0].splitlines(True))
> + for l in lines:
> + match = RE_namespace.search(l)
> + if match:
> + c_namespace[docname] = match.group(1)
> + return
> +
> def setup(app):
> + app.connect('source-read', save_c_namespace)
> app.connect('doctree-resolved', auto_markup)
> return {
> - 'parallel_read_safe': True,
> + 'parallel_read_safe': False,
> 'parallel_write_safe': True,
> }



Thanks,
Mauro

2020-10-14 23:11:25

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings

On Tue, 13 Oct 2020 23:13:17 +0000
Nícolas F. R. A. Prado <[email protected]> wrote:

> The warnings were caused by the expressions matching words in the
> translated versions of the documentation, since any unicode character
> was matched.
>
> Fix the regular expression by making the C regexes use ASCII

I don't quite understand this part, can you give an example of the kinds
of warnings you were seeing?

Thanks,

jon

2020-10-15 02:06:11

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings

On Wed Oct 14, 2020 at 4:11 PM -03, Jonathan Corbet wrote:
>
> On Tue, 13 Oct 2020 23:13:17 +0000
> Nícolas F. R. A. Prado <[email protected]> wrote:
>
> > The warnings were caused by the expressions matching words in the
> > translated versions of the documentation, since any unicode character
> > was matched.
> >
> > Fix the regular expression by making the C regexes use ASCII
>
> I don't quite understand this part, can you give an example of the kinds
> of warnings you were seeing?

Hi Jon,
sure.

One I had noted down was:

WARNING: Unparseable C cross-reference: '调用debugfs_rename'

which I believe occurred in the chinese translation.

I think the problem is that in chinese there normally isn't space between the
words, so even if I had made the regexes only match the beginning of the word
(which I didn't, but I fixed this in this patch with the \b), it would still try
to cross-reference to that symbol containing chinese characters, which is
unparsable to sphinx.

So since valid identifiers in C are only in ASCII anyway, I used the ASCII flag
to make \w, and \d only match ASCII characters, otherwise they match any unicode
character.

If you want to have a look at other warnings or more complete output let me know
and I will recompile those versions. That sentence was the only thing I noted
down, but I think it gives a good idea of the problem.

Thanks,
Nícolas

>
> Thanks,
>
> jon


2020-10-15 02:07:00

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings

On Wed, 14 Oct 2020 20:09:10 +0000
Nícolas F. R. A. Prado <[email protected]> wrote:

> One I had noted down was:
>
> WARNING: Unparseable C cross-reference: '调用debugfs_rename'
>
> which I believe occurred in the chinese translation.
>
> I think the problem is that in chinese there normally isn't space between the
> words, so even if I had made the regexes only match the beginning of the word
> (which I didn't, but I fixed this in this patch with the \b), it would still try
> to cross-reference to that symbol containing chinese characters, which is
> unparsable to sphinx.
>
> So since valid identifiers in C are only in ASCII anyway, I used the ASCII flag
> to make \w, and \d only match ASCII characters, otherwise they match any unicode
> character.

OK, this all makes sense, as does your fix. The one thing I would ask
would be to put that warning into the changelog for future reference.

Thanks,

jon

2020-10-15 11:04:09

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] docs: automarkup.py: Fix regexes to solve sphinx 3 warnings

Em Wed, 14 Oct 2020 14:16:16 -0600
Jonathan Corbet <[email protected]> escreveu:

> On Wed, 14 Oct 2020 20:09:10 +0000
> Nícolas F. R. A. Prado <[email protected]> wrote:
>
> > One I had noted down was:
> >
> > WARNING: Unparseable C cross-reference: '调用debugfs_rename'
> >
> > which I believe occurred in the chinese translation.
> >
> > I think the problem is that in chinese there normally isn't space between the
> > words, so even if I had made the regexes only match the beginning of the word
> > (which I didn't, but I fixed this in this patch with the \b), it would still try
> > to cross-reference to that symbol containing chinese characters, which is
> > unparsable to sphinx.
> >
> > So since valid identifiers in C are only in ASCII anyway, I used the ASCII flag
> > to make \w, and \d only match ASCII characters, otherwise they match any unicode
> > character.
>
> OK, this all makes sense, as does your fix. The one thing I would ask
> would be to put that warning into the changelog for future reference.

I added yesterday patches 1 to 4 from Nícolas series on my -next tree:

https://git.linuxtv.org/mchehab/media-next.git/log/

Today, I changed the changelog in order to better describe the ASCII issue:

https://git.linuxtv.org/mchehab/media-next.git/commit/?id=f66e47f98c1e827a85654a8cfa1ba539bb381a1b

If this is enough, I'll likely send the PR to Linus later today or tomorrow,
depending on next- merge results.

Patch 5 can be added later, after we find a way to keep it safe for
parallel reading.

Thanks,
Mauro

2020-11-02 15:37:37

by Nícolas F. R. A. Prado

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] docs: automarkup.py: Allow automatic cross-reference inside C namespace

On Wed Oct 14, 2020 at 4:19 PM -03, Jonathan Corbet wrote:
>
> On Wed, 14 Oct 2020 11:56:44 +0200
> Mauro Carvalho Chehab <[email protected]> wrote:
>
> > > To make the first step possible, disable the parallel_read_safe option
> > > in Sphinx, since the dictionary that maps the files to the C namespaces
> > > can't be concurrently updated. This unfortunately increases the build
> > > time of the documentation.
> >
> > Disabling parallel_read_safe will make performance very poor.
> > Doesn't the C domain store the current namespace somewhere?
> > If so, then, instead of using the source-read phase, something
> > else could be used instead.

The issue is that C domain parsing happens at an earlier phase in the Sphinx
process, and the current stack containing the C namespace is long gone when we
get to do the automatic cross-referencing at the doctree-resolved phase.

Not only that, but the namespace isn't assigned to the file it's in, and
vice-versa, because Sphinx's interest is in assigning a C directive it is
currently reading to the current namespace, so there isn't any point in saving
which namespaces appeared at a given file. That is exactly what we want, but
Sphinx doesn't have that information.

For instance, printing all symbols from app.env.domaindata['c']['root_symbol']
shows every single C namespace, but the docname field in each of them is None.

That's why the way to go is to assign the namespaces to the files at the
source-read phase on our own.

> That seems like the best solution if it exists, yes. Otherwise a simple
> lock could be used around c_namespace to serialize access there, right?

Actually I was wrong when I said that the issue was that "they can't be
concurrently updated". When parallel_read_safe is enabled, Sphinx spawns
multiple processes rather than multiple threads, to get true concurrency by
sidestepping python's GIL. So the same c_namespace variable isn't even
accessible across the multiple processes.

Reading multiprocessing's documentation [1] it seems that memory could be shared
between the processes using Value or Array, but both would need to be passed to
the processes by the one who spawned them, that is, it would need to be done
from Sphinx's side.

So, at the moment I'm not really seeing a way to have this information be shared
concurrently by the python processes but I will keep searching.

Thanks,
Nícolas

[1] https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes