Date: Fri, 26 Apr 2019 17:51:58 -0300
From: Mauro Carvalho Chehab
To: Jonathan Corbet
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Matthew Wilcox
Subject: Re: [PATCH 1/2] Docs: An initial automarkup extension for sphinx
Message-ID: <20190426175158.0111c437@coco.lan>
In-Reply-To: <20190426133719.5d30d4a4@lwn.net>
References: <20190425200125.12302-1-corbet@lwn.net>
 <20190425200125.12302-2-corbet@lwn.net>
 <20190426153255.7e424a45@coco.lan>
 <20190426133719.5d30d4a4@lwn.net>

On Fri, 26 Apr 2019 13:37:19 -0600, Jonathan Corbet wrote:

> On Fri, 26 Apr 2019 15:32:55 -0300
> Mauro Carvalho Chehab wrote:
>
> > > +# Try to identify "function()" that's not already marked up some
> > > +# other way. Sphinx doesn't like a lot of stuff right after a
> > > +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> > > +# bit tries to restrict matches to things that won't create trouble.
> > > +#
> > > +RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')
> >
> > IMHO, this looks good enough to avoid trouble, except maybe if one
> > wants to write a document explaining this functionality in
> > doc-guide/kernel-doc.rst.
>
> Adding something to the docs is definitely on my list.
>
> > Anyway, the way it is written, we could still explain it by adding
> > a "\ " after the function name, e.g.:
> >
> >     When you write a function like: func()\ , the automarkup
> >     extension will automatically convert it into:
> >     ``:c:func:`func()```.
> >
> > So, this looks OK to my eyes.
>
> Not sure I like that; the whole point is to avoid extra markup here.
> Plus I like that it catches all function references whether the author
> thought to mark them or not.

Yes, but I'm pretty sure that there will be cases where one may want to
explicitly force the parser not to recognize a function. One such example
is the document explaining this feature.

>
> > > +#
> > > +# Lines consisting of a single underline character.
> > > +#
> > > +RE_underline = re.compile(r'^([-=~])\1+$')
> >
> > Hmm... why are you calling this "underline"? It sounds like a bad name
> > to me, as it took me a while to understand what you meant.
>
> Seemed OK to me, but I can change it :)

I'm pretty sure that, in my last /79 patch series, I used some patterns
that place function names in titles without using '-', '=' or '~'.
Whether the parser would pick them up or not is a separate matter[1] :-)

[1] It would probably reject them in titles, as what we very often write
in titles are things like:

foo(int i, int j)
.................

As it has the parameters inside the parentheses, the parser likely won't
match it. Yet, I vaguely remember seeing or writing a title with a
pattern like:

Usage of foo()
^^^^^^^^^^^^^^

(but I can't really remember which markup was used).

I would prefer that you change it. I myself usually use '=', '-', '^'
and '.' (usually in that order, as those title levels make sense to my
brain).

In general, function descriptions are sub-sub-titles or
sub-sub-sub-titles, so, in the places I wrote, the markup would likely
be either '^' or '.'. But I've seen other symbols used to mark titles
too (like '*' and '#').

> > From the code I'm inferring that this is meant to track 3 of the
> > possible symbols used as (sub).*title markup. In several places
> > we use other symbols: '^', '~', '.', '*' (and others) as
> > sub-sub(sub..)title markups.
>
> I picked the ones that were suggested in our docs; it was enough to catch
> all of the problems in the current kernel docs.
>
> Anyway, the real documentation gives the actual set, so I'll maybe make it:
>
>     =-'`":~^_*+#<>

I'm pretty sure a single dot ('.') works as well, as I have already
used it.

> I'd prefer that to something more wildcardish.

Yeah, makes sense, provided that it reflects what Sphinx actually uses
internally.

> > You probably need another regex for the title itself:
> >
> >     RE_possible_title = re.compile(r'^(\S.*\S)\s*$')
> >
> > in order to get the size of the matched line. Doing just len(previous)
> > will get you false positives.
>
> This I don't quite get. It's easy enough to trim off the spaces with
> strip() if that turns out to be a problem (which it hasn't so far). I can
> add that.

What I'm saying is that the title markup should always start at the
first column. So, this is a valid title:

Foo valid title
===============

But this causes Sphinx to crash badly:

 Foo invalid title
=================

Knowing that, we can use a regex for the previous line that assumes it
always starts with a non-space character[2], and compare only the length
up to its last non-blank character.

[2] Strictly speaking, I guess Sphinx would accept something like:

Foo weirdly marked title - probably non-compliant with ReST spec
===================================================================

But I don't think we have any occurrences of something like that, and I
don't think we should worry about it, as it would be very bad
documentation style anyway.

So, what I'm saying is that we could use this knowledge to our benefit,
considering a valid title to be something like:

    ^(\S.*\S)\s*$

i.e. the title itself starts with a non-space char and ends with another
non-space char.
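
A minimal Python sketch of the title check described above, assuming the adornment set Jonathan proposes plus the single dot. TITLE_CHARS and is_title_underline() are hypothetical names for illustration, not code from the patch; only the RE_possible_title pattern is taken verbatim from this thread.

```python
import re

# Adornment characters: the set proposed in this thread, plus the single
# dot mentioned in the reply. The exact set is an assumption here.
TITLE_CHARS = "=-'`\":~^_*+#<>" + "."

# A line made of one adornment character repeated (at least 2 chars).
RE_underline = re.compile(r'^([' + re.escape(TITLE_CHARS) + r'])\1+$')

# Proposed title regex: starts and ends with a non-space character.
RE_possible_title = re.compile(r'^(\S.*\S)\s*$')


def is_title_underline(line, previous):
    """Return True if `line` looks like the underline of a title in `previous`."""
    if not RE_underline.match(line):
        return False
    m = RE_possible_title.match(previous)
    if m is None:
        # Empty or indented previous line: not treated as a title.
        return False
    # Compare against the title text only, ignoring trailing blanks.
    return len(line) == len(m.group(1))
```

With this check, an indented " Foo invalid title" followed by a 17-character underline is rejected, whereas comparing against len(previous.strip()) would consider the lengths equal.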

> > On a separate matter (but related to automarkup, and to what I
> > would call underline), as a future feature, perhaps we could also
> > add a parser for:
> >
> >     _something that requires underlines_
> >
> > Underlined text is probably the only feature that we use in several
> > docs that Sphinx doesn't support (there are some extensions for that,
> > I guess, but it sounds simple enough to have a parser here).
> >
> > This can be tricky to get right, as underlines_ alone is
> > cross-reference markup, so I would only add this after we improve the
> > script to run after Sphinx's own markup processing.
>
> That does indeed sound tricky. It would also probably have to come
> *before* Sphinx does its thing or it's unlikely to survive.
>
> > > +        #
> > > +        # Is this an underline line? If so, and it is the same length
> > > +        # as the previous line, we may have mangled a heading line in
> > > +        # error, so undo it.
> > > +        #
> > > +        elif RE_underline.match(line):
> > > +            if len(line) == len(previous):
> >
> > No, that doesn't seem enough. I would, instead, use the regex I
> > proposed before, in order to check if the previous line starts with
> > a non-space, and to get the length only up to the last non-space
> > (yeah, unfortunately, we have some text files that have trailing
> > blank spaces).
>
> So I'll make it "if len(line) == len(previous.strip())"
>
> Thanks,
>
> jon

Thanks,
Mauro
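
For illustration, a minimal sketch of a line-by-line pass in which the elif branch quoted above could live, using the len(previous.strip()) comparison Jonathan settles on. Only RE_function and RE_underline come from the patch; mangle_lines(), the replacement template and the loop structure are assumptions, not the actual extension code.

```python
import re

# Regexes quoted from the patch under review.
RE_function = re.compile(r'(^|\s+)([\w\d_]+\(\))([.,/\s]|$)')
RE_underline = re.compile(r'^([-=~])\1+$')


def mangle_lines(text):
    """Hypothetical pass: add :c:func: markup, then undo it on headings.

    The replacement template keeps the leading whitespace (group 1) and
    the trailing delimiter (group 3); it is an assumption for this sketch.
    """
    ret = []
    previous = ''  # original (unmarked) text of the previous line
    for line in text.split('\n'):
        if RE_function.search(line):
            ret.append(RE_function.sub(r'\1:c:func:`\2`\3', line))
        elif RE_underline.match(line):
            # Same length as the previous line's text (ignoring leading
            # and trailing blanks): treat that line as a heading and put
            # back its original, unmarked text.
            if ret and len(line) == len(previous.strip()):
                ret[-1] = previous
            ret.append(line)
        else:
            ret.append(line)
        previous = line
    return '\n'.join(ret)
```

Running it on "Usage of kmalloc()" followed by an 18-character "=" line keeps the heading unmarked while "Call kmalloc() first." on the next line becomes "Call :c:func:`kmalloc()` first.". Lines such as "mmap()s" or "func()\ ," are left alone, since the character after "()" is not in [.,/\s]. Because group 3 consumes the delimiter after "()", two references separated by a single space, such as "a() b()", only get the first one marked up; a lookahead such as (?=[.,/\s]|$) in place of group 3 would avoid that.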