There are a handful of instances where kernel doc comments want an
actual '%' in the final output, e.g. vsnprintf() wants to display "%n"
and "%p" to document format specifiers, and assembly functions that use
a custom call ABI may want to document their register usage, e.g. %eax.
Because kernel-doc unconditionally interprets '%' followed by a word
character as a constant definition, i.e. %CONST, it's impossible to get
an actual '%\w' when kernel-doc is used to translate comments into rst
format. Treat backtick and backlash as escaping '%', the former to
handle '%' in a ``LITERAL``, and the latter to allow '%' when using
standard formatting.
An alternative option would be to define a fancier set of rules for
interpreting '%' so that explicit escaping would not be required. For
example, require "%CONST" to be preceded by a recognized set of
characters, e.g. whitespace, opening parenthesis, etc... But the list
of recognized characters is quite large even in the current code base,
and using '\' to escape is more common and intuitive, i.e. most people
will naturally try doing "\%..." to get the desired formatting, whereas
losing %CONST formatting because of an unrecognized character is likely
to cause confusion.
Except for the aforementioned vsnprintf(), all .html output files from
`make htmldocs` are identical before and after this change.
Signed-off-by: Sean Christopherson <[email protected]>
---
scripts/kernel-doc | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index 3350e498b4ce..1890deb16725 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -210,7 +210,7 @@ my $anon_struct_union = 0;
# match expressions used to find embedded type information
my $type_constant = '\b``([^\`]+)``\b';
-my $type_constant2 = '\%([-_\w]+)';
+my $type_constant2 = '(^|[^\`\\\])\%([-_\w]+)';
my $type_func = '(\w+)\(\)';
my $type_param = '\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)';
my $type_fp_param = '\@(\w+)\(\)'; # Special RST handling for func ptr params
@@ -229,7 +229,7 @@ my $type_member_func = $type_member . '\(\)';
# these are pretty rough
my @highlights_man = (
[$type_constant, "\$1"],
- [$type_constant2, "\$1"],
+ [$type_constant2, "\$1\$2"],
[$type_func, "\\\\fB\$1\\\\fP"],
[$type_enum, "\\\\fI\$1\\\\fP"],
[$type_struct, "\\\\fI\$1\\\\fP"],
@@ -244,7 +244,7 @@ my $blankline_man = "";
# rst-mode
my @highlights_rst = (
[$type_constant, "``\$1``"],
- [$type_constant2, "``\$1``"],
+ [$type_constant2, "\$1``\$2``"],
# Note: need to escape () to avoid func matching later
[$type_member_func, "\\:c\\:type\\:`\$1\$2\$3\\\\(\\\\) <\$1>`"],
[$type_member, "\\:c\\:type\\:`\$1\$2\$3 <\$1>`"],
--
2.21.0
On Fri, Apr 05, 2019 at 02:18:20PM -0700, Sean Christopherson wrote:
> There are a handful of instances where kernel doc comments want an
> actual '%' in the final output, e.g. vsnprintf() wants to display "%n"
> and "%p" to document format specifiers, and assembly functions that use
> a custom call ABI may want to document their register usage, e.g. %eax.
>
> Because kernel-doc unconditionally interprets '%' followed by a word
> character as a constant definition, i.e. %CONST, it's impossible to get
> an actual '%\w' when kernel-doc is used to translate comments into rst
> format. Treat backtick and backlash as escaping '%', the former to
> handle '%' in a ``LITERAL``, and the latter to allow '%' when using
> standard formatting.
>
> An alternative option would be to define a fancier set of rules for
> interpreting '%' so that explicit escaping would not be required. For
> example, require "%CONST" to be preceded by a recognized set of
> characters, e.g. whitespace, opening parenthesis, etc... But the list
> of recognized characters is quite large even in the current code base,
> and using '\' to escape is more common and intuitive, i.e. most people
> will naturally try doing "\%..." to get the desired formatting, whereas
> losing %CONST formatting because of an unrecognized character is likely
> to cause confusion.
Would it make sense to have %% turn into % rather than forcing quotation
marks?
On Fri, 05 Apr 2019, Matthew Wilcox <[email protected]> wrote:
> On Fri, Apr 05, 2019 at 02:18:20PM -0700, Sean Christopherson wrote:
>> There are a handful of instances where kernel doc comments want an
>> actual '%' in the final output, e.g. vsnprintf() wants to display "%n"
>> and "%p" to document format specifiers, and assembly functions that use
>> a custom call ABI may want to document their register usage, e.g. %eax.
>>
>> Because kernel-doc unconditionally interprets '%' followed by a word
>> character as a constant definition, i.e. %CONST, it's impossible to get
>> an actual '%\w' when kernel-doc is used to translate comments into rst
>> format. Treat backtick and backlash as escaping '%', the former to
>> handle '%' in a ``LITERAL``, and the latter to allow '%' when using
>> standard formatting.
>>
>> An alternative option would be to define a fancier set of rules for
>> interpreting '%' so that explicit escaping would not be required. For
>> example, require "%CONST" to be preceded by a recognized set of
>> characters, e.g. whitespace, opening parenthesis, etc... But the list
>> of recognized characters is quite large even in the current code base,
>> and using '\' to escape is more common and intuitive, i.e. most people
>> will naturally try doing "\%..." to get the desired formatting, whereas
>> losing %CONST formatting because of an unrecognized character is likely
>> to cause confusion.
>
> Would it make sense to have %% turn into % rather than forcing quotation
> marks?
The problem is not limited to % though.
I see two long-term solutions to the problem:
1) Define proper escaping and quoting rules for kernel-doc to pass
through stuff as-is to Sphinx. This may be difficult to implement
because, well, see kernel-doc source.
2) Figure out how to do the kernel-doc references and highlights in the
Sphinx extension after all the reStructuredText processing.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
On Fri, 5 Apr 2019 14:18:20 -0700
Sean Christopherson <[email protected]> wrote:
> There are a handful of instances where kernel doc comments want an
> actual '%' in the final output, e.g. vsnprintf() wants to display "%n"
> and "%p" to document format specifiers, and assembly functions that use
> a custom call ABI may want to document their register usage, e.g. %eax.
>
> Because kernel-doc unconditionally interprets '%' followed by a word
> character as a constant definition, i.e. %CONST, it's impossible to get
> an actual '%\w' when kernel-doc is used to translate comments into rst
> format. Treat backtick and backlash as escaping '%', the former to
> handle '%' in a ``LITERAL``, and the latter to allow '%' when using
> standard formatting.
So I'm sympathetic toward the goal; we want this stuff to format properly.
But I'm less convinced about this specific solution. Starting with the
details:
> -my $type_constant2 = '\%([-_\w]+)';
> +my $type_constant2 = '(^|[^\`\\\])\%([-_\w]+)';
This only escapes the % if it occurs *immediately* after the backtick, so
something like ``foo %p`` will still get processed incorrectly. Somebody
will surely run into that at some point and waste a bunch of time trying to
figure out what's going on.
Also, believe it or not, I don't think you have enough backslashes; that
inner expression, I believe, should be:
[^\`\\\\]
Gosh Perl regexes are fun... This highlights the danger of adding
functionality that isn't exercised anywhere; I don't think it works here.
Now to more general considerations. Willy's suggestion of using %% instead
makes some sense, though it may lead to pushback from the "no extra markup
ever" contingent. It should be more straightforward to implement
correctly.
I have to wonder if the % thing is actually buying us much, honestly. It's
another form of markup that kind of duplicates the Sphinx notation, and
we've kind of agreed that most of the time, we don't want to clutter our
text with ``explicit markup`` like that. I'm curious what people think:
might the best solution be to just make %const do nothing special, with the
idea of phasing it out?
Thanks,
jon
On Mon, Apr 08, 2019 at 04:03:14PM -0600, Jonathan Corbet wrote:
> On Fri, 5 Apr 2019 14:18:20 -0700
> Sean Christopherson <[email protected]> wrote:
>
> > There are a handful of instances where kernel doc comments want an
> > actual '%' in the final output, e.g. vsnprintf() wants to display "%n"
> > and "%p" to document format specifiers, and assembly functions that use
> > a custom call ABI may want to document their register usage, e.g. %eax.
> >
> > Because kernel-doc unconditionally interprets '%' followed by a word
> > character as a constant definition, i.e. %CONST, it's impossible to get
> > an actual '%\w' when kernel-doc is used to translate comments into rst
> > format. Treat backtick and backlash as escaping '%', the former to
> > handle '%' in a ``LITERAL``, and the latter to allow '%' when using
> > standard formatting.
>
> So I'm sympathetic toward the goal; we want this stuff to format properly.
> But I'm less convinced about this specific solution. Starting with the
> details:
>
> > -my $type_constant2 = '\%([-_\w]+)';
> > +my $type_constant2 = '(^|[^\`\\\])\%([-_\w]+)';
>
> This only escapes the % if it occurs *immediately* after the backtick, so
> something like ``foo %p`` will still get processed incorrectly. Somebody
> will surely run into that at some point and waste a bunch of time trying to
> figure out what's going on.
Argh, yeah.
> Also, believe it or not, I don't think you have enough backslashes; that
> inner expression, I believe, should be:
>
> [^\`\\\\]
>
> Gosh Perl regexes are fun... This highlights the danger of adding
> functionality that isn't exercised anywhere; I don't think it works here.
I honestly just added backslashes until it did work (for unmerged docs).
But Perl isn't exactly my strong suit, so it's entirely possible it worked
by sheer dumb luck.
>
> Now to more general considerations. Willy's suggestion of using %% instead
> makes some sense, though it may lead to pushback from the "no extra markup
> ever" contingent. It should be more straightforward to implement
> correctly.
I considered %% as well. I basically did a mental coin flip between %% and \%.
> I have to wonder if the % thing is actually buying us much, honestly. It's
> another form of markup that kind of duplicates the Sphinx notation, and
> we've kind of agreed that most of the time, we don't want to clutter our
> text with ``explicit markup`` like that. I'm curious what people think:
> might the best solution be to just make %const do nothing special, with the
> idea of phasing it out?
Along those lines, what about adding flags into the kernel-doc directive
to opt-out of specific markup? That would allow removing the old markup
on a case-by-case basis instead of having to do a tree-wide change. And
once we've reached critical mass the flag can be removed (in theory).
E.g.
.. kernel-doc:: arch/x86/kernel/cpu/sgx/driver/ioctl.c
:noconsts:
:functions: sgx_ioc_create_enclave
sgx_ioc_enclave_add_page
sgx_ioc_enclave_init
.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h
:noconsts:
:nosymbols: sgx_enclave_exception
Another alternative would be to version the kernel-doc directive. But that
lacks the granularity of opting out of specific markup, and would probably
lead to endless discussion on what exactly should be in "v2" or whatever,
and inevitably a v3, v4, etc...
E.g.:
.. kernel-doc-v2:: arch/x86/kernel/cpu/sgx/driver/ioctl.c
:functions: sgx_ioc_create_enclave
sgx_ioc_enclave_add_page
sgx_ioc_enclave_init
.. kernel-doc-v2:: arch/x86/include/uapi/asm/sgx.h
:nosymbols: sgx_enclave_exception