LinuxLists.cc - [PATCH] docs: Make automarkup ready for Sphinx 3.1+

2020-10-07 23:59:44

Subject: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

While Sphinx 2 used a single c:type role for struct, union, enum and
typedef, Sphinx 3 uses a specific role for each one.
To keep backward compatibility, detect the Sphinx version and use the
correct roles for that version.

Also, Sphinx 3 is more strict with its C domain and generated warnings,
exposing issues in the parsing.
To fix the warnings, make the C regexes use ASCII, ensure the
expressions only match the beginning of words and skip trying to
cross-reference C reserved words.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---

Hi,

after Mauro's series making everything ready for Sphinx 3.1, only the automarkup
was left to be ported.
This patch makes the needed changes to automarkup so that we can soon flip the
switch to Sphinx 3.1.

This change was tested both with Sphinx 2.4.4 and Sphinx 3.1.

This change doesn't add any warnings to the Documentation build.
I tested it with Mauro's series but later rebased it to docs-next, and it can be
accepted independently of that series.

I ended up doing more than one thing in this single patch, but since it was all
changing the same lines and for the same purpose, I felt it would be better to
keep it as a single commit.

Mauro,
if this patch is ok, the 3rd patch in your series, which disables automarkup for
sphinx 3, should be dropped.
Although I'm not sure what the implications of your patches adding namespaces
and using the c:macro for functions are.
All I did here was use the specific roles for sphinx 3 and fix the warnings, but
that was enough to get correct cross-references even after your series.

Thanks,
Nícolas

Documentation/sphinx/automarkup.py | 69 ++++++++++++++++++++++++++----
1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index a1b0f554cd82..fd1e927408ad 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -22,13 +22,34 @@ from itertools import chain
# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
# bit tries to restrict matches to things that won't create trouble.
#
-RE_function = re.compile(r'(([\w_][\w\d_]+))')
-RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
+RE_function = re.compile(r'\b(([a-zA-Z_]\w+))', flags=re.ASCII)
+
+#
+# Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
+#
+RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
+ flags=re.ASCII)
+
+#
+# Sphinx 3 uses a different C role for each one of struct, union, enum and
+# typedef
+#
+RE_struct = re.compile(r'\b(struct)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
+RE_union = re.compile(r'\b(union)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
+RE_enum = re.compile(r'\b(enum)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
+RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
+
#
# Detects a reference to a documentation page of the form Documentation/... with
# an optional extension
#
-RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
+RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
+
+#
+# Reserved C words that we should skip when cross-referencing
+#
+Skipnames = [ 'for', 'if', 'register', 'sizeof', 'struct', 'unsigned' ]
+

#
# Many places in the docs refer to common system calls. It is
@@ -48,9 +69,22 @@ def markup_refs(docname, app, node):
#
# Associate each regex with the function that will markup its matches
#
- markup_func = {RE_type: markup_c_ref,
- RE_function: markup_c_ref,
- RE_doc: markup_doc_ref}
+ markup_func_sphinx2 = {RE_doc: markup_doc_ref,
+ RE_function: markup_c_ref,
+ RE_generic_type: markup_c_ref}
+
+ markup_func_sphinx3 = {RE_doc: markup_doc_ref,
+ RE_function: markup_c_ref,
+ RE_struct: markup_c_ref,
+ RE_union: markup_c_ref,
+ RE_enum: markup_c_ref,
+ RE_typedef: markup_c_ref}
+
+ if sphinx.__version__[0] == '3':
+ markup_func = markup_func_sphinx3
+ else:
+ markup_func = markup_func_sphinx2
+
match_iterators = [regex.finditer(t) for regex in markup_func]
#
# Sort all references by the starting position in text
@@ -79,8 +113,24 @@ def markup_refs(docname, app, node):
# type_name) with an appropriate cross reference.
#
def markup_c_ref(docname, app, match):
- class_str = {RE_function: 'c-func', RE_type: 'c-type'}
- reftype_str = {RE_function: 'function', RE_type: 'type'}
+ class_str = {RE_function: 'c-func',
+ # Sphinx 2 only
+ RE_generic_type: 'c-type',
+ # Sphinx 3+ only
+ RE_struct: 'c-struct',
+ RE_union: 'c-union',
+ RE_enum: 'c-enum',
+ RE_typedef: 'c-type',
+ }
+ reftype_str = {RE_function: 'function',
+ # Sphinx 2 only
+ RE_generic_type: 'type',
+ # Sphinx 3+ only
+ RE_struct: 'struct',
+ RE_union: 'union',
+ RE_enum: 'enum',
+ RE_typedef: 'type',
+ }

cdom = app.env.domains['c']
#
@@ -89,7 +139,8 @@ def markup_c_ref(docname, app, match):
target = match.group(2)
target_text = nodes.Text(match.group(0))
xref = None
- if not (match.re == RE_function and target in Skipfuncs):
+ if not ((match.re == RE_function and target in Skipfuncs)
+ or (target in Skipnames)):
lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
lit_text += target_text
pxref = addnodes.pending_xref('', refdomain = 'c',
--
2.28.0

2020-10-08 00:46:28

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

On Wed, Oct 07, 2020 at 11:12:25PM +0000, N?colas F. R. A. Prado wrote:
> While Sphinx 2 used a single c:type role for struct, union, enum and
> typedef, Sphinx 3 uses a specific role for each one.
> To keep backward compatibility, detect the Sphinx version and use the
> correct roles for that version.
>
> Also, Sphinx 3 is more strict with its C domain and generated warnings,
> exposing issues in the parsing.
> To fix the warnings, make the C regexes use ASCII, ensure the
> expressions only match the beginning of words and skip trying to
> cross-reference C reserved words.

Thank you for doing this!

I have a feature request ... could you automarkup NULL as being :c:macro?
Or maybe just anything matching \<[[:upper:]_[:digit:]]*\>
(i may have my regex syntax confused ... a word composed of any
arrangement of upper-case, digits and underscores.)

2020-10-08 02:17:49

by Nícolas F. R. A. Prado

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

On Wed Oct 7, 2020 at 8:40 PM -03, Matthew Wilcox wrote:
>
> On Wed, Oct 07, 2020 at 11:12:25PM +0000, Nícolas F. R. A. Prado wrote:
> > While Sphinx 2 used a single c:type role for struct, union, enum and
> > typedef, Sphinx 3 uses a specific role for each one.
> > To keep backward compatibility, detect the Sphinx version and use the
> > correct roles for that version.
> >
> > Also, Sphinx 3 is more strict with its C domain and generated warnings,
> > exposing issues in the parsing.
> > To fix the warnings, make the C regexes use ASCII, ensure the
> > expressions only match the beginning of words and skip trying to
> > cross-reference C reserved words.
>
> Thank you for doing this!
>
> I have a feature request ... could you automarkup NULL as being
> :c:macro?
> Or maybe just anything matching \<[[:upper:]_[:digit:]]*\>
> (i may have my regex syntax confused ... a word composed of any
> arrangement of upper-case, digits and underscores.)

I think what you are suggesting are two separate things.

For NULL, what you're interested in is that it appears in a monospaced font, as
if written ``NULL``, right? As I don't think a cross-reference to "the NULL
macro definition" would make much sense.

While "anything containing only upper-case, digits and underscores" would
actually be for cross-referencing to the definition of the macro symbol in
question, right?

At the moment, this automarkup script is being used only for cross-referencing,
but it is indeed a generic automarkup script, and could be used for the
formatting of NULL. But we also can't just make every upper-case word written
in monospaced font, as that doesn't always makes sense.

So if I understood your two requests correctly, I think we could:
1. Always automatically format NULL using a literal ``.
2. Try to cross-reference every upper-case word with the macro definition using
:c:macro, but if the cross-reference doesn't exist, format it normally, since
it's just normal text (this is what we're doing for C references at the moment).

What do you think?

Thanks,
Nícolas

2020-10-08 04:17:17

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

On Thu, Oct 08, 2020 at 02:15:24AM +0000, N?colas F. R. A. Prado wrote:
> > I have a feature request ... could you automarkup NULL as being
> > :c:macro?
> > Or maybe just anything matching \<[[:upper:]_[:digit:]]*\>
> > (i may have my regex syntax confused ... a word composed of any
> > arrangement of upper-case, digits and underscores.)
>
> I think what you are suggesting are two separate things.
>
> For NULL, what you're interested in is that it appears in a monospaced font, as
> if written ``NULL``, right? As I don't think a cross-reference to "the NULL
> macro definition" would make much sense.
>
> While "anything containing only upper-case, digits and underscores" would
> actually be for cross-referencing to the definition of the macro symbol in
> question, right?

Well, maybe! What I'd really like is to remove all the markup from
xarray.rst. Jon managed to get rid of most of it with the (), but
there's still markup on:

LONG_MAX
NULL
-EBUSY
true
XA_MARK_[012]
XA_FLAGS_*
ENOMEM
EINVAL

I'm not sure there's much that automarkup can do about ``true``, but all
the others fit the all-caps-and-underscore-and-digits pattern.

I don't know how much we want errnos to link to anything in particular.
So maybe split these into 'well-known' (eg defined by ANSI C or POSIX)
definitions and things which are local macros:

LONG_MAX
NULL
-EBUSY
ENOMEM
EINVAL

vs

XA_MARK_[012]
XA_FLAGS_*

I'm willing to add more inline kernel-doc to get this to work better.
Or even convert #defines to enums ... whatever gets this working better.

> At the moment, this automarkup script is being used only for cross-referencing,
> but it is indeed a generic automarkup script, and could be used for the
> formatting of NULL. But we also can't just make every upper-case word written
> in monospaced font, as that doesn't always makes sense.
>
> So if I understood your two requests correctly, I think we could:
> 1. Always automatically format NULL using a literal ``.
> 2. Try to cross-reference every upper-case word with the macro definition using
> :c:macro, but if the cross-reference doesn't exist, format it normally, since
> it's just normal text (this is what we're doing for C references at the moment).
>
> What do you think?

I think this works well, except that we need to match not just NULL
but other well-known ANSI/POSIX keywords.

Thanks for entertaining this!

2020-10-08 05:32:46

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

Hi Nícolas,

Em Wed, 07 Oct 2020 23:12:25 +0000
Nícolas F. R. A. Prado <[email protected]> escreveu:

> While Sphinx 2 used a single c:type role for struct, union, enum and
> typedef, Sphinx 3 uses a specific role for each one.
> To keep backward compatibility, detect the Sphinx version and use the
> correct roles for that version.
>
> Also, Sphinx 3 is more strict with its C domain and generated warnings,
> exposing issues in the parsing.
> To fix the warnings, make the C regexes use ASCII, ensure the
> expressions only match the beginning of words and skip trying to
> cross-reference C reserved words.
>
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
> ---
>
> Hi,
>
> after Mauro's series making everything ready for Sphinx 3.1, only the automarkup
> was left to be ported.
> This patch makes the needed changes to automarkup so that we can soon flip the
> switch to Sphinx 3.1.
>
> This change was tested both with Sphinx 2.4.4 and Sphinx 3.1.
>
> This change doesn't add any warnings to the Documentation build.
> I tested it with Mauro's series but later rebased it to docs-next, and it can be
> accepted independently of that series.
>
> I ended up doing more than one thing in this single patch, but since it was all
> changing the same lines and for the same purpose, I felt it would be better to
> keep it as a single commit.
>

Thanks for doing this! That was the last missing part on fully supporting
Sphinx 3.1+.

> Mauro,
> if this patch is ok, the 3rd patch in your series, which disables automarkup for
> sphinx 3, should be dropped.

Yeah, sure.

> Although I'm not sure what the implications of your patches adding namespaces
> and using the c:macro for functions are.

With regards to namespaces:

Currently, only the media docs use namespaces, and it declares it at the
beginning of each file that needs it, without overriding it later[1].

[1] btw, the cdomain.py backward compat code doesn't support namespace
changes - as it parses namespaces before handling the C domain tags.
I doubt that we'll need to have a single .rst file using more than
one namespace anyway.

The main usage is to avoid conflicts for uAPI documentation for
syscalls - actually for libc userspace wrappers to syscalls. It documents
things like: open, close, read, write, ioctl, poll, select.

I'm not sure if the automarkup should be aware of it, or if the c.py code
at Sphinx 3.x will add the namespace automatically, but I suspect that
automarkup will need to handle it as well.

One file you could use for checking it is this one:

Documentation/userspace-api/media/v4l/hist-v4l2.rst

It contains a namespace directive and documents what changed without
using any explicit reference (after my patch series + linux-next).

With regards to c:macro vs c:function:

I suspect that automarkup should test both when trying to do
cross-references for function-like calls. E. g. test first if
there is a :c:function, falling back to check for :c:macro.

I would add a "sphinx3_c_func_ref" function that would handle
such special case, e. g. something like:

markup_func_sphinx3 = {RE_doc: markup_doc_ref,
RE_function: sphinx3_c_func_ref,
RE_struct: markup_c_ref,
RE_union: markup_c_ref,
RE_enum: markup_c_ref,
RE_typedef: markup_c_ref}

> All I did here was use the specific roles for sphinx 3 and fix the warnings, but
> that was enough to get correct cross-references even after your series.
>
> Thanks,
> Nícolas

>
>
> Documentation/sphinx/automarkup.py | 69 ++++++++++++++++++++++++++----
> 1 file changed, 60 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> index a1b0f554cd82..fd1e927408ad 100644
> --- a/Documentation/sphinx/automarkup.py
> +++ b/Documentation/sphinx/automarkup.py
> @@ -22,13 +22,34 @@ from itertools import chain
> # :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> # bit tries to restrict matches to things that won't create trouble.
> #
> -RE_function = re.compile(r'(([\w_][\w\d_]+))')
> -RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
> +RE_function = re.compile(r'\b(([a-zA-Z_]\w+))', flags=re.ASCII)
> +
> +#
> +# Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
> +#
> +RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
> + flags=re.ASCII)
> +
> +#
> +# Sphinx 3 uses a different C role for each one of struct, union, enum and
> +# typedef
> +#
> +RE_struct = re.compile(r'\b(struct)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> +RE_union = re.compile(r'\b(union)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> +RE_enum = re.compile(r'\b(enum)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> +RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> +
> #
> # Detects a reference to a documentation page of the form Documentation/... with
> # an optional extension
> #
> -RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
> +RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
> +
> +#
> +# Reserved C words that we should skip when cross-referencing
> +#
> +Skipnames = [ 'for', 'if', 'register', 'sizeof', 'struct', 'unsigned' ]
> +
>
> #
> # Many places in the docs refer to common system calls. It is
> @@ -48,9 +69,22 @@ def markup_refs(docname, app, node):
> #
> # Associate each regex with the function that will markup its matches
> #
> - markup_func = {RE_type: markup_c_ref,
> - RE_function: markup_c_ref,
> - RE_doc: markup_doc_ref}
> + markup_func_sphinx2 = {RE_doc: markup_doc_ref,
> + RE_function: markup_c_ref,
> + RE_generic_type: markup_c_ref}
> +
> + markup_func_sphinx3 = {RE_doc: markup_doc_ref,
> + RE_function: markup_c_ref,
> + RE_struct: markup_c_ref,
> + RE_union: markup_c_ref,
> + RE_enum: markup_c_ref,
> + RE_typedef: markup_c_ref}
> +
> + if sphinx.__version__[0] == '3':
> + markup_func = markup_func_sphinx3
> + else:
> + markup_func = markup_func_sphinx2
> +
> match_iterators = [regex.finditer(t) for regex in markup_func]
> #
> # Sort all references by the starting position in text
> @@ -79,8 +113,24 @@ def markup_refs(docname, app, node):
> # type_name) with an appropriate cross reference.
> #
> def markup_c_ref(docname, app, match):
> - class_str = {RE_function: 'c-func', RE_type: 'c-type'}
> - reftype_str = {RE_function: 'function', RE_type: 'type'}
> + class_str = {RE_function: 'c-func',
> + # Sphinx 2 only
> + RE_generic_type: 'c-type',
> + # Sphinx 3+ only
> + RE_struct: 'c-struct',
> + RE_union: 'c-union',
> + RE_enum: 'c-enum',
> + RE_typedef: 'c-type',
> + }
> + reftype_str = {RE_function: 'function',
> + # Sphinx 2 only
> + RE_generic_type: 'type',
> + # Sphinx 3+ only
> + RE_struct: 'struct',
> + RE_union: 'union',
> + RE_enum: 'enum',
> + RE_typedef: 'type',
> + }
>
> cdom = app.env.domains['c']
> #
> @@ -89,7 +139,8 @@ def markup_c_ref(docname, app, match):
> target = match.group(2)
> target_text = nodes.Text(match.group(0))
> xref = None
> - if not (match.re == RE_function and target in Skipfuncs):
> + if not ((match.re == RE_function and target in Skipfuncs)
> + or (target in Skipnames)):
> lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
> lit_text += target_text
> pxref = addnodes.pending_xref('', refdomain = 'c',

Thanks,
Mauro

2020-10-08 06:18:10

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

Em Thu, 8 Oct 2020 03:47:06 +0100
Matthew Wilcox <[email protected]> escreveu:

> On Thu, Oct 08, 2020 at 02:15:24AM +0000, Nícolas F. R. A. Prado wrote:
> > > I have a feature request ... could you automarkup NULL as being
> > > :c:macro?
> > > Or maybe just anything matching \<[[:upper:]_[:digit:]]*\>
> > > (i may have my regex syntax confused ... a word composed of any
> > > arrangement of upper-case, digits and underscores.)
> >
> > I think what you are suggesting are two separate things.
> >
> > For NULL, what you're interested in is that it appears in a monospaced font, as
> > if written ``NULL``, right? As I don't think a cross-reference to "the NULL
> > macro definition" would make much sense.
> >
> > While "anything containing only upper-case, digits and underscores" would
> > actually be for cross-referencing to the definition of the macro symbol in
> > question, right?
>
> Well, maybe! What I'd really like is to remove all the markup from
> xarray.rst. Jon managed to get rid of most of it with the (), but
> there's still markup on:
>
> LONG_MAX
> NULL
> -EBUSY
> true
> XA_MARK_[012]
> XA_FLAGS_*
> ENOMEM
> EINVAL
>
> I'm not sure there's much that automarkup can do about ``true``, but all
> the others fit the all-caps-and-underscore-and-digits pattern.
>
> I don't know how much we want errnos to link to anything in particular.
> So maybe split these into 'well-known' (eg defined by ANSI C or POSIX)
> definitions and things which are local macros:
>
> LONG_MAX
> NULL
> -EBUSY
> ENOMEM
> EINVAL

Yeah, a nice improvement would be to auto-markup error codes and NULL as
literal blocks.

>
> vs
>
> XA_MARK_[012]

> XA_FLAGS_*

Actually, things that end with an * (but doesn't start with an *)
are good candidates for being literals - although extra care should
be taken on such case, as parsing those automatically will likely hit
lots of false-positives.

> I'm willing to add more inline kernel-doc to get this to work better.

Why? inline kernel-doc should be evaluated just like normal blocks.

Right now, kernel-doc handles constants like NULL and XA_FLAGS_* using
two ways:

%FOO
or
``FOO``

The regex for those are:

my $type_constant = '\b``([^\`]+)``\b';
my $type_constant2 = '\%([-_\w]+)';

In other words, "%FOO" should not contain any symbol, except for
'-' and '_'.

If there is any other symbol, like in "XA_FLAGS_*", the alternative
syntax is needed.

No matter if you use inline or block definitions, the same regexes
are used.

> Or even convert #defines to enums ... whatever gets this working better.

Using enums where possible[1] is nicer, IMHO.

[1] enums shouldn't be used on uAPI, as its size depends on the C
compiler implementation.

Thanks,
Mauro

2020-10-08 12:29:21

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

Em Thu, 8 Oct 2020 12:31:27 +0100
Matthew Wilcox <[email protected]> escreveu:

> On Thu, Oct 08, 2020 at 08:03:06AM +0200, Mauro Carvalho Chehab wrote:
> > Em Thu, 8 Oct 2020 03:47:06 +0100
> > Matthew Wilcox <[email protected]> escreveu:
> >
> > > On Thu, Oct 08, 2020 at 02:15:24AM +0000, Nícolas F. R. A. Prado wrote:
> > > > > I have a feature request ... could you automarkup NULL as being
> > > > > :c:macro?
> > > > > Or maybe just anything matching \<[[:upper:]_[:digit:]]*\>
> > > > > (i may have my regex syntax confused ... a word composed of any
> > > > > arrangement of upper-case, digits and underscores.)
> > > >
> > > > I think what you are suggesting are two separate things.
> > > >
> > > > For NULL, what you're interested in is that it appears in a monospaced font, as
> > > > if written ``NULL``, right? As I don't think a cross-reference to "the NULL
> > > > macro definition" would make much sense.
> > > >
> > > > While "anything containing only upper-case, digits and underscores" would
> > > > actually be for cross-referencing to the definition of the macro symbol in
> > > > question, right?
> > >
> > > Well, maybe! What I'd really like is to remove all the markup from
> > > xarray.rst. Jon managed to get rid of most of it with the (), but
> > > there's still markup on:
> > >
> > > LONG_MAX
> > > NULL
> > > -EBUSY
> > > true
> > > XA_MARK_[012]
> > > XA_FLAGS_*
> > > ENOMEM
> > > EINVAL
> > >
> > > I'm not sure there's much that automarkup can do about ``true``, but all
> > > the others fit the all-caps-and-underscore-and-digits pattern.
> > >
> > > I don't know how much we want errnos to link to anything in particular.
> > > So maybe split these into 'well-known' (eg defined by ANSI C or POSIX)
> > > definitions and things which are local macros:
> > >
> > > LONG_MAX
> > > NULL
> > > -EBUSY
> > > ENOMEM
> > > EINVAL
> >
> > Yeah, a nice improvement would be to auto-markup error codes and NULL as
> > literal blocks.
> >
> > >
> > > vs
> > >
> > > XA_MARK_[012]
> >
> > > XA_FLAGS_*
> >
> > Actually, things that end with an * (but doesn't start with an *)
> > are good candidates for being literals - although extra care should
> > be taken on such case, as parsing those automatically will likely hit
> > lots of false-positives.
>
> I do apologise. I was trying to be concise in email. In the actual
> text file, I currently have:
>
> ``XA_FLAGS_ALLOC``
> ``XA_FLAGS_ALLOC1``
> ``XA_FLAGS_LOCK_IRQ``
> ``XA_FLAGS_LOCK_BH``
> ``XA_FLAGS_TRACK_FREE``

Ah, OK!

>
> > > I'm willing to add more inline kernel-doc to get this to work better.
> >
> > Why? inline kernel-doc should be evaluated just like normal blocks.
> >
> > Right now, kernel-doc handles constants like NULL and XA_FLAGS_* using
> > two ways:
> >
> > %FOO
> > or
> > ``FOO``
> >
> > The regex for those are:
> >
> > my $type_constant = '\b``([^\`]+)``\b';
> > my $type_constant2 = '\%([-_\w]+)';
>
> Right, but that's in kernel-doc ... in a .rst file, I believe we have
> to use the ``SYMBOL`` syntax.

As you mentioned that you're "willing to add more inline kernel-doc",
I assumed that you were talking about kernel-doc markups at the C files.

Yeah, inside a .rst file, this should be ``SYMBOL``.

As you suggested, the automarkup.py could help with replacing some
of those.

-

Just my two cents: a documentation writer hat, it sounds weird to me to
mix ``SYMBOL`` (with markup) with NULL (without explicit markup) at the
same file.

Thanks,
Mauro

2020-10-08 13:59:07

by Nícolas F. R. A. Prado

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

On Thu Oct 8, 2020 at 2:27 AM -03, Mauro Carvalho Chehab wrote:
>
> Hi Nícolas,
>
> Em Wed, 07 Oct 2020 23:12:25 +0000
> Nícolas F. R. A. Prado <[email protected]> escreveu:
>
> > While Sphinx 2 used a single c:type role for struct, union, enum and
> > typedef, Sphinx 3 uses a specific role for each one.
> > To keep backward compatibility, detect the Sphinx version and use the
> > correct roles for that version.
> >
> > Also, Sphinx 3 is more strict with its C domain and generated warnings,
> > exposing issues in the parsing.
> > To fix the warnings, make the C regexes use ASCII, ensure the
> > expressions only match the beginning of words and skip trying to
> > cross-reference C reserved words.
> >
> > Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
> > ---
> >
> > Hi,
> >
> > after Mauro's series making everything ready for Sphinx 3.1, only the automarkup
> > was left to be ported.
> > This patch makes the needed changes to automarkup so that we can soon flip the
> > switch to Sphinx 3.1.
> >
> > This change was tested both with Sphinx 2.4.4 and Sphinx 3.1.
> >
> > This change doesn't add any warnings to the Documentation build.
> > I tested it with Mauro's series but later rebased it to docs-next, and it can be
> > accepted independently of that series.
> >
> > I ended up doing more than one thing in this single patch, but since it was all
> > changing the same lines and for the same purpose, I felt it would be better to
> > keep it as a single commit.
> >
>
> Thanks for doing this! That was the last missing part on fully
> supporting
> Sphinx 3.1+.
>
> > Mauro,
> > if this patch is ok, the 3rd patch in your series, which disables automarkup for
> > sphinx 3, should be dropped.
>
> Yeah, sure.
>
> > Although I'm not sure what the implications of your patches adding namespaces
> > and using the c:macro for functions are.
>
> With regards to namespaces:
>
> Currently, only the media docs use namespaces, and it declares it at the
> beginning of each file that needs it, without overriding it later[1].
>
> [1] btw, the cdomain.py backward compat code doesn't support namespace
> changes - as it parses namespaces before handling the C domain tags.
> I doubt that we'll need to have a single .rst file using more than
> one namespace anyway.
>
> The main usage is to avoid conflicts for uAPI documentation for
> syscalls - actually for libc userspace wrappers to syscalls. It
> documents
> things like: open, close, read, write, ioctl, poll, select.

If it's mainly for that, I think automarkup could skip handling namespaces.
From automarkup.py:

#
# Many places in the docs refer to common system calls. It is
# pointless to try to cross-reference them and, as has been known
# to happen, somebody defining a function by these names can lead
# to the creation of incorrect and confusing cross references. So
# just don't even try with these names.
#
Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
'socket' ]

So unless I'm confusing things and the namespaces actually sidestep that issue,
the namespace handling could be left out of automarkup.

>
> I'm not sure if the automarkup should be aware of it, or if the c.py
> code
> at Sphinx 3.x will add the namespace automatically, but I suspect that
> automarkup will need to handle it as well.
>
> One file you could use for checking it is this one:
>
> Documentation/userspace-api/media/v4l/hist-v4l2.rst
>
> It contains a namespace directive and documents what changed without
> using any explicit reference (after my patch series + linux-next).
>
> With regards to c:macro vs c:function:
>
> I suspect that automarkup should test both when trying to do
> cross-references for function-like calls. E. g. test first if
> there is a :c:function, falling back to check for :c:macro.
>
> I would add a "sphinx3_c_func_ref" function that would handle
> such special case, e. g. something like:
>
> markup_func_sphinx3 = {RE_doc: markup_doc_ref,
> RE_function: sphinx3_c_func_ref,
> RE_struct: markup_c_ref,
> RE_union: markup_c_ref,
> RE_enum: markup_c_ref,
> RE_typedef: markup_c_ref}

Sounds good.

I'll make this patch into a series and add that function/macro handling as a new
patch, and the namespace handling depending on your answer on the above comment,
for v2.

>
> > All I did here was use the specific roles for sphinx 3 and fix the warnings, but
> > that was enough to get correct cross-references even after your series.
> >
> > Thanks,
> > Nícolas
>
> >
> >
> > Documentation/sphinx/automarkup.py | 69 ++++++++++++++++++++++++++----
> > 1 file changed, 60 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> > index a1b0f554cd82..fd1e927408ad 100644
> > --- a/Documentation/sphinx/automarkup.py
> > +++ b/Documentation/sphinx/automarkup.py
> > @@ -22,13 +22,34 @@ from itertools import chain
> > # :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > # bit tries to restrict matches to things that won't create trouble.
> > #
> > -RE_function = re.compile(r'(([\w_][\w\d_]+))')
> > -RE_type = re.compile(r'(struct|union|enum|typedef)\s+([\w_][\w\d_]+)')
> > +RE_function = re.compile(r'\b(([a-zA-Z_]\w+))', flags=re.ASCII)
> > +
> > +#
> > +# Sphinx 2 uses the same :c:type role for struct, union, enum and typedef
> > +#
> > +RE_generic_type = re.compile(r'\b(struct|union|enum|typedef)\s+([a-zA-Z_]\w+)',
> > + flags=re.ASCII)
> > +
> > +#
> > +# Sphinx 3 uses a different C role for each one of struct, union, enum and
> > +# typedef
> > +#
> > +RE_struct = re.compile(r'\b(struct)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +RE_union = re.compile(r'\b(union)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +RE_enum = re.compile(r'\b(enum)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +RE_typedef = re.compile(r'\b(typedef)\s+([a-zA-Z_]\w+)', flags=re.ASCII)
> > +
> > #
> > # Detects a reference to a documentation page of the form Documentation/... with
> > # an optional extension
> > #
> > -RE_doc = re.compile(r'Documentation(/[\w\-_/]+)(\.\w+)*')
> > +RE_doc = re.compile(r'\bDocumentation(/[\w\-_/]+)(\.\w+)*')
> > +
> > +#
> > +# Reserved C words that we should skip when cross-referencing
> > +#
> > +Skipnames = [ 'for', 'if', 'register', 'sizeof', 'struct', 'unsigned' ]
> > +
> >
> > #
> > # Many places in the docs refer to common system calls. It is
> > @@ -48,9 +69,22 @@ def markup_refs(docname, app, node):
> > #
> > # Associate each regex with the function that will markup its matches
> > #
> > - markup_func = {RE_type: markup_c_ref,
> > - RE_function: markup_c_ref,
> > - RE_doc: markup_doc_ref}
> > + markup_func_sphinx2 = {RE_doc: markup_doc_ref,
> > + RE_function: markup_c_ref,
> > + RE_generic_type: markup_c_ref}
> > +
> > + markup_func_sphinx3 = {RE_doc: markup_doc_ref,
> > + RE_function: markup_c_ref,
> > + RE_struct: markup_c_ref,
> > + RE_union: markup_c_ref,
> > + RE_enum: markup_c_ref,
> > + RE_typedef: markup_c_ref}
> > +
> > + if sphinx.__version__[0] == '3':
> > + markup_func = markup_func_sphinx3
> > + else:
> > + markup_func = markup_func_sphinx2
> > +
> > match_iterators = [regex.finditer(t) for regex in markup_func]
> > #
> > # Sort all references by the starting position in text
> > @@ -79,8 +113,24 @@ def markup_refs(docname, app, node):
> > # type_name) with an appropriate cross reference.
> > #
> > def markup_c_ref(docname, app, match):
> > - class_str = {RE_function: 'c-func', RE_type: 'c-type'}
> > - reftype_str = {RE_function: 'function', RE_type: 'type'}
> > + class_str = {RE_function: 'c-func',
> > + # Sphinx 2 only
> > + RE_generic_type: 'c-type',
> > + # Sphinx 3+ only
> > + RE_struct: 'c-struct',
> > + RE_union: 'c-union',
> > + RE_enum: 'c-enum',
> > + RE_typedef: 'c-type',
> > + }
> > + reftype_str = {RE_function: 'function',
> > + # Sphinx 2 only
> > + RE_generic_type: 'type',
> > + # Sphinx 3+ only
> > + RE_struct: 'struct',
> > + RE_union: 'union',
> > + RE_enum: 'enum',
> > + RE_typedef: 'type',
> > + }
> >
> > cdom = app.env.domains['c']
> > #
> > @@ -89,7 +139,8 @@ def markup_c_ref(docname, app, match):
> > target = match.group(2)
> > target_text = nodes.Text(match.group(0))
> > xref = None
> > - if not (match.re == RE_function and target in Skipfuncs):
> > + if not ((match.re == RE_function and target in Skipfuncs)
> > + or (target in Skipnames)):
> > lit_text = nodes.literal(classes=['xref', 'c', class_str[match.re]])
> > lit_text += target_text
> > pxref = addnodes.pending_xref('', refdomain = 'c',
>
>
>
> Thanks,
> Mauro

Thanks,
Nícolas

2020-10-08 16:40:10

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

On Thu, Oct 08, 2020 at 08:03:06AM +0200, Mauro Carvalho Chehab wrote:
> Em Thu, 8 Oct 2020 03:47:06 +0100
> Matthew Wilcox <[email protected]> escreveu:
>
> > On Thu, Oct 08, 2020 at 02:15:24AM +0000, N?colas F. R. A. Prado wrote:
> > > > I have a feature request ... could you automarkup NULL as being
> > > > :c:macro?
> > > > Or maybe just anything matching \<[[:upper:]_[:digit:]]*\>
> > > > (i may have my regex syntax confused ... a word composed of any
> > > > arrangement of upper-case, digits and underscores.)
> > >
> > > I think what you are suggesting are two separate things.
> > >
> > > For NULL, what you're interested in is that it appears in a monospaced font, as
> > > if written ``NULL``, right? As I don't think a cross-reference to "the NULL
> > > macro definition" would make much sense.
> > >
> > > While "anything containing only upper-case, digits and underscores" would
> > > actually be for cross-referencing to the definition of the macro symbol in
> > > question, right?
> >
> > Well, maybe! What I'd really like is to remove all the markup from
> > xarray.rst. Jon managed to get rid of most of it with the (), but
> > there's still markup on:
> >
> > LONG_MAX
> > NULL
> > -EBUSY
> > true
> > XA_MARK_[012]
> > XA_FLAGS_*
> > ENOMEM
> > EINVAL
> >
> > I'm not sure there's much that automarkup can do about ``true``, but all
> > the others fit the all-caps-and-underscore-and-digits pattern.
> >
> > I don't know how much we want errnos to link to anything in particular.
> > So maybe split these into 'well-known' (eg defined by ANSI C or POSIX)
> > definitions and things which are local macros:
> >
> > LONG_MAX
> > NULL
> > -EBUSY
> > ENOMEM
> > EINVAL
>
> Yeah, a nice improvement would be to auto-markup error codes and NULL as
> literal blocks.
>
> >
> > vs
> >
> > XA_MARK_[012]
>
> > XA_FLAGS_*
>
> Actually, things that end with an * (but doesn't start with an *)
> are good candidates for being literals - although extra care should
> be taken on such case, as parsing those automatically will likely hit
> lots of false-positives.

I do apologise. I was trying to be concise in email. In the actual
text file, I currently have:

``XA_FLAGS_ALLOC``
``XA_FLAGS_ALLOC1``
``XA_FLAGS_LOCK_IRQ``
``XA_FLAGS_LOCK_BH``
``XA_FLAGS_TRACK_FREE``

> > I'm willing to add more inline kernel-doc to get this to work better.
>
> Why? inline kernel-doc should be evaluated just like normal blocks.
>
> Right now, kernel-doc handles constants like NULL and XA_FLAGS_* using
> two ways:
>
> %FOO
> or
> ``FOO``
>
> The regex for those are:
>
> my $type_constant = '\b``([^\`]+)``\b';
> my $type_constant2 = '\%([-_\w]+)';

Right, but that's in kernel-doc ... in a .rst file, I believe we have
to use the ``SYMBOL`` syntax.

2020-10-09 06:37:26

by Mauro Carvalho Chehab

[permalink] [raw]

Subject: Re: [PATCH] docs: Make automarkup ready for Sphinx 3.1+

Em Thu, 08 Oct 2020 13:54:59 +0000
Nícolas F. R. A. Prado <[email protected]> escreveu:

> On Thu Oct 8, 2020 at 2:27 AM -03, Mauro Carvalho Chehab wrote:
> >
> > Hi Nícolas,
> >
> > Em Wed, 07 Oct 2020 23:12:25 +0000
> > Nícolas F. R. A. Prado <[email protected]> escreveu:
> >
> > > While Sphinx 2 used a single c:type role for struct, union, enum and
> > > typedef, Sphinx 3 uses a specific role for each one.
> > > To keep backward compatibility, detect the Sphinx version and use the
> > > correct roles for that version.
> > >
> > > Also, Sphinx 3 is more strict with its C domain and generated warnings,
> > > exposing issues in the parsing.
> > > To fix the warnings, make the C regexes use ASCII, ensure the
> > > expressions only match the beginning of words and skip trying to
> > > cross-reference C reserved words.
> > >
> > > Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
> > > ---
> > >
> > > Hi,
> > >
> > > after Mauro's series making everything ready for Sphinx 3.1, only the automarkup
> > > was left to be ported.
> > > This patch makes the needed changes to automarkup so that we can soon flip the
> > > switch to Sphinx 3.1.
> > >
> > > This change was tested both with Sphinx 2.4.4 and Sphinx 3.1.
> > >
> > > This change doesn't add any warnings to the Documentation build.
> > > I tested it with Mauro's series but later rebased it to docs-next, and it can be
> > > accepted independently of that series.
> > >
> > > I ended up doing more than one thing in this single patch, but since it was all
> > > changing the same lines and for the same purpose, I felt it would be better to
> > > keep it as a single commit.
> > >
> >
> > Thanks for doing this! That was the last missing part on fully
> > supporting
> > Sphinx 3.1+.
> >
> > > Mauro,
> > > if this patch is ok, the 3rd patch in your series, which disables automarkup for
> > > sphinx 3, should be dropped.
> >
> > Yeah, sure.
> >
> > > Although I'm not sure what the implications of your patches adding namespaces
> > > and using the c:macro for functions are.
> >
> > With regards to namespaces:
> >
> > Currently, only the media docs use namespaces, and it declares it at the
> > beginning of each file that needs it, without overriding it later[1].
> >
> > [1] btw, the cdomain.py backward compat code doesn't support namespace
> > changes - as it parses namespaces before handling the C domain tags.
> > I doubt that we'll need to have a single .rst file using more than
> > one namespace anyway.
> >
> > The main usage is to avoid conflicts for uAPI documentation for
> > syscalls - actually for libc userspace wrappers to syscalls. It
> > documents
> > things like: open, close, read, write, ioctl, poll, select.
>
> If it's mainly for that, I think automarkup could skip handling namespaces.
> From automarkup.py:
>
> #
> # Many places in the docs refer to common system calls. It is
> # pointless to try to cross-reference them and, as has been known
> # to happen, somebody defining a function by these names can lead
> # to the creation of incorrect and confusing cross references. So
> # just don't even try with these names.
> #
> Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
> 'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
> 'socket' ]
>
> So unless I'm confusing things and the namespaces actually sidestep that issue,
> the namespace handling could be left out of automarkup.

Maybe I didn't express well enough. We need namespaces due to the
syscals.

Yet, if a .rst file uses it, *all* functions, structs, ... declared
there will be under the namespace.

In other words, looking at the V4L docs, for instance, all
functions there will be under "V4L" namespace.

It should be noticed that a side effect of this change is that
we may need to use namespaces on *all* (or almost all) uAPI
media documents. I'll double-check this for v5.11.

If automarkup would try to generate a cross-reference for one
of the many V4L2 API structs without using the "V4L" namespace,
it will fail.

Btw, considering that the namespace will solve the issues
with those functions, I suspect that we can avoid skipping them,
at least with Sphinx 3+.

> >
> > I'm not sure if the automarkup should be aware of it, or if the c.py
> > code
> > at Sphinx 3.x will add the namespace automatically, but I suspect that
> > automarkup will need to handle it as well.
> >
> > One file you could use for checking it is this one:
> >
> > Documentation/userspace-api/media/v4l/hist-v4l2.rst
> >
> > It contains a namespace directive and documents what changed without
> > using any explicit reference (after my patch series + linux-next).
> >
> > With regards to c:macro vs c:function:
> >
> > I suspect that automarkup should test both when trying to do
> > cross-references for function-like calls. E. g. test first if
> > there is a :c:function, falling back to check for :c:macro.
> >
> > I would add a "sphinx3_c_func_ref" function that would handle
> > such special case, e. g. something like:
> >
> > markup_func_sphinx3 = {RE_doc: markup_doc_ref,
> > RE_function: sphinx3_c_func_ref,
> > RE_struct: markup_c_ref,
> > RE_union: markup_c_ref,
> > RE_enum: markup_c_ref,
> > RE_typedef: markup_c_ref}
>
> Sounds good.
>
> I'll make this patch into a series and add that function/macro handling as a new
> patch, and the namespace handling depending on your answer on the above comment,
> for v2.

Thank you!

Thanks,
Mauro