Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Date:   Fri, 26 Apr 2019 17:51:58 -0300
From:   Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
To:     Jonathan Corbet <corbet@lwn.net>
Cc:     linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
        Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
Message-ID: <20190426175158.0111c437@coco.lan>
In-Reply-To: <20190426133719.5d30d4a4@lwn.net>
References: <20190425200125.12302-1-corbet@lwn.net>
        <20190425200125.12302-2-corbet@lwn.net>
        <20190426153255.7e424a45@coco.lan>
        <20190426133719.5d30d4a4@lwn.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

Em Fri, 26 Apr 2019 13:37:19 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> On Fri, 26 Apr 2019 15:32:55 -0300
> Mauro Carvalho Chehab <mchehab+samsung@kernel.org> wrote:
> 
> > > +# Try to identify "function()" that's not already marked up some
> > > +# other way.  Sphinx doesn't like a lot of stuff right after a
> > > +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > > +# bit tries to restrict matches to things that won't create trouble.
> > > +#
> > > +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')    
> > 
> > IMHO, this looks good enough to avoid trouble, maybe except if one
> > wants to write a document explaining this functionality at the
> > doc-guide/kernel-doc.rst.  
> 
> Adding something to the docs is definitely on my list.
> 
> > Anyway, the way it is written, we could still explain it by adding
> > a "\ " after the func, e. g.:
> > 
> > 	When you write a function like: func()\ , the automarkup
> > 	extension will automatically convert it into:
> > 	``:c:func:`func()```.
> > 
> > So, this looks OK on my eyes.  
> 
> Not sure I like that; the whole point is to avoid extra markup here.  Plus
> I like that it catches all function references whether the author thought
> to mark them or not.

Yes, but I'm pretty sure that there will be cases where one may want
to explicitly force the parser to not recognize it. One of such examples
is the document explaining this feature.

> 
> > > +#
> > > +# Lines consisting of a single underline character.
> > > +#
> > > +RE_underline = re.compile(r'^([-=~])\1+$')    
> > 
> > Hmm... why are you calling this "underline"? Sounds a bad name to me,
> > as it took me a while to understand what you meant.  
> 
> Seemed OK to me, but I can change it :)

I'm pretty sure that, on my last /79 patch series, I used some
patterns that would be placing function names at the title,
and that were not using '-', '=' or '~'. If the parser
would pick it or not is a separate matter[1] :-)

[1] It would probably reject on titles, as very often what
we write on titles are things like:

foo(int i, int j)
.................

As it has the variables inside, the parser won't likely get it.

Yet, I vaguely remember I saw or wrote some title that had
a pattern like:

Usage of foo()
^^^^^^^^^^^^^^

(but I can't really remember what was the used markup)

I would prefer if you change. I usually use myself:

	'=' '-' '^' and '.'

(usually on the above order - as it makes some sense to my
brain  to use the above indentation levels)

In general, function descriptions are sub-sub-title or
sub-sub-sub-title, so, on the places I wrote, it would likely
be either '^' or '.'.

But I've seen other symbols being used too to mark titles
(like '*' and '#').

> > From the code I'm inferring that this is meant to track 3 of the
> > possible symbols used as a (sub).*title markup. On several places 
> > we use other symbols:'^', '~', '.', '*' (and others) as sub-sub(sub..)
> > title markups.  
> 
> I picked the ones that were suggested in our docs; it was enough to catch
> all of the problems in the current kernel docs.
> 
> Anyway, The real documentation gives the actual set, so I'll maybe make it:
> 
> 	=-'`":~^_*+#<>

I'm pretty sure a single dot works as well, as I used this already.

> I'd prefer that to something more wildcardish.

Yeah, makes sense, provided that it will reflect what Sphinx actually
uses internally.

> > You should probably need another regex for the title itself:
> > 
> > 	RE_possible_title = re.compile(r'^(\S.*\S)\s*$')
> > 
> > in order to get the size of the matched line. Doing a doing len(previous)
> > will get you false positives.  
> 
> This I don't quite get.  It's easy enough to trim off the spaces with
> strip() if that turns out to be a problem (which it hasn't so far).  I can
> add that.


What I'm saying is that the title markup should always start at the first
position. So, this is a valid title:

Foo valid title
===============

But this causes Sphinx to crash badly:

 Foo invalid title
 =================

Knowing that, we can use a regex for the previous line assuming that it
will always start with a non-spaced character[2], and checking only the 
length of non-blank characters.

[2] Strictly speaking, I guess Sphinx would accept something like:

   Foo weirdly marked title - probably non-compliant with ReST spec
===================================================================

But I don't think we have any occurrence of something like that - and
I don't think we should concern about that, as it would be a very bad 
documentation style anyway.

So, what I'm saying is that we could use such knowledge in our benefit,
considering a valid title to be something like: ^(\S.*\S)\s*$ - e. g.
the title itself starts on a non-space char and ends on another 
non-space char.

> 
> > on a separate matter (but related to automarkup matter - and to what
> > I would name underline), as a future feature, perhaps we could also add
> > a parser for:
> > 
> > 	_something that requires underlines_
> > 
> > Underlined text is probably the only feature that we use on several docs
> > with Sphinx doesn't support (there are some extensions for that - I guess,
> > but it sounds simple enough to have a parser here).
> > 
> > This can be tricky to get it right, as just underlines_ is a
> > cross reference markup - so, I would only add this after we improve the
> > script to come after Sphinx own markup processing.  
> 
> That does indeed sound tricky.  It would also probably have to come
> *before* Sphinx does its thing or it's unlikely to survive.
> 
> > > +        #
> > > +        # Is this an underline line?  If so, and it is the same length
> > > +        # as the previous line, we may have mangled a heading line in
> > > +        # error, so undo it.
> > > +        #
> > > +        elif RE_underline.match(line):
> > > +            if len(line) == len(previous):    
> > 
> > No, that doesn't seem enough. I would, instead, use the regex I
> > proposed before, in order to check if the previous line starts with
> > a non-space, and getting the length only up to the last non-space
> > (yeah, unfortunately, we have some text files that have extra blank
> > spaces at line's tail).  
> 
> So I'll make it "if len(line) == len(previous.strip())
> 
> Thanks,
> 
> jon


Thanks,
Mauro