LinuxLists.cc - Re: Kernel docs: muddying the waters a bit

2016-03-08 13:39:38

Subject: Re: Kernel docs: muddying the waters a bit

On Tue, 08 Mar 2016 05:13:13 -0700
Dan Allen wrote:

> On Tue, Mar 8, 2016 at 4:29 AM, Mauro Carvalho Chehab wrote:
>
> > pandoc did a really crap job on the conversion. To convert this
> > into something useful, we'll need to spend a lot of time, as it lost
> > most of the cross-references, as they were defined via DocBook macros.
> >
>
> I agree pandoc creates crappy AsciiDoc. We have a much better converter in
> the works called DocBookRx.
>
> https://github.com/opendevise/docbookrx
>
> It has converted several very serious DocBook documents and we're
> continuing to improve it. It's also a lot easier to hack than pandoc.

Didn't work:

$ ./bin/docbookrx ~/devel/docbook_test/v4l2.xml
No visitor defined for <part>! Skipping.
No visitor defined for <part>! Skipping.
No visitor defined for <part>! Skipping.
No visitor defined for <part>! Skipping.
No visitor defined for <appendixinfo>! Skipping.
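These warnings suggest that DocBookRx simply has no visitor for some element types (<part>, <appendixinfo>, and <refentry> in the next message). Before picking a converter, it can help to know which elements a book actually uses. Below is a minimal Python sketch (the function names are ours). It assumes a well-formed, self-contained XML file. The real media sources pull in external entities, which ElementTree will not resolve, so they would first need to be flattened, for example with `xmllint --noent`.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_inventory(path):
    """Count how often each element name occurs in an XML file."""
    counts = Counter()
    # iterparse streams the file, so even large books need little memory.
    for _event, elem in ET.iterparse(path, events=("start",)):
        counts[elem.tag] += 1
    return counts


def unhandled(counts, handled):
    """Return, sorted, the element names missing from a converter's handled set."""
    return sorted(tag for tag in counts if tag not in handled)
```

For example, `element_inventory("v4l2.xml").most_common(20)` lists the most frequent elements. The `handled` set passed to `unhandled()` would have to be compiled by hand from the converter's sources.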

>
> -Dan
>
>

--
Thanks,
Mauro

2016-03-08 15:39:37

by Mauro Carvalho Chehab

Subject: Re: Kernel docs: muddying the waters a bit

On Tue, 8 Mar 2016 10:39:22 -0300
Mauro Carvalho Chehab wrote:

> On Tue, 08 Mar 2016 05:13:13 -0700
> Dan Allen wrote:
>
> > On Tue, Mar 8, 2016 at 4:29 AM, Mauro Carvalho Chehab wrote:
> >
> > > pandoc did a really crap job on the conversion. To convert this
> > > into something useful, we'll need to spend a lot of time, as it lost
> > > most of the cross-references, as they were defined via DocBook macros.
> > >
> >
> > I agree pandoc creates crappy AsciiDoc. We have a much better converter in
> > the works called DocBookRx.
> >
> > https://github.com/opendevise/docbookrx
> >
> > It has converted several very serious DocBook documents and we're
> > continuing to improve it. It's also a lot easier to hack than pandoc.
>
> Didn't work:
>
> $ ./bin/docbookrx ~/devel/docbook_test/v4l2.xml
> No visitor defined for <part>! Skipping.
> No visitor defined for <part>! Skipping.
> No visitor defined for <part>! Skipping.
> No visitor defined for <part>! Skipping.
> No visitor defined for <appendixinfo>! Skipping.

I tried to use docbookrx for the bits that were not properly converted,
like the manpage-like pages:

$ ../docbookrx/bin/docbookrx Documentation/DocBook/media/v4l/func-ioctl.xml
No visitor defined for <refentry>! Skipping.

Dan, if you want to take a look at what's going wrong here,
the XML I'm trying to convert is:

https://git.linuxtv.org/media_tree.git/tree/Documentation/DocBook/media/v4l/func-ioctl.xml

If this worked, it should generate something like:
https://git.linuxtv.org/mchehab/asciidoc-poc.git/tree/func-ioctl.adoc

Pandoc failed to fully convert it, but at least it kept all the text,
which saved me from rewriting it from scratch. This is the manual fix
I applied to it:
https://git.linuxtv.org/mchehab/asciidoc-poc.git/commit/func-ioctl.adoc?id=801d336c3742f26731e08c284290c32c0b4632fc

FYI, we have 133 xml files at the media uAPI doc with refmeta.
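We don't know how that count was obtained. One way to reproduce a similar count is sketched below in Python (the function name is ours). A plain text search is used instead of an XML parser, because the individual media files rely on entities defined elsewhere in the book. It is therefore a rough count: a `<refmeta>` inside an XML comment would also match.

```python
import re
from pathlib import Path


def files_with_element(root, element="refmeta"):
    """Return the sorted .xml files under root that contain an <element> tag."""
    # Match "<refmeta>", "<refmeta id=...>" or "<refmeta/>", but not "<refmetaX>".
    tag = re.compile(r"<" + re.escape(element) + r"[\s>/]")
    hits = []
    for path in Path(root).rglob("*.xml"):
        # A text search tolerates files that use undefined external entities.
        text = path.read_text(encoding="utf-8", errors="replace")
        if tag.search(text):
            hits.append(path)
    return sorted(hits)
```

For example: `len(files_with_element("Documentation/DocBook/media"))`.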

--
Thanks,
Mauro

2016-03-09 21:27:26

by Mauro Carvalho Chehab

Subject: Re: Kernel docs: muddying the waters a bit

On Tue, 8 Mar 2016 12:39:21 -0300
Mauro Carvalho Chehab wrote:

> Pandoc failed to fully convert it, but at least it kept all the text,
> which saved me from rewriting it from scratch. This is the manual fix
> I applied to it:
> https://git.linuxtv.org/mchehab/asciidoc-poc.git/commit/func-ioctl.adoc?id=801d336c3742f26731e08c284290c32c0b4632fc
>
> FYI, we have 133 xml files at the media uAPI doc with refmeta.

I used pandoc to convert from the html files and manually edited it.
I also fixed lots of other issues with the conversion.

I guess the conversion to asciidoc format is now in good shape,
at least to demonstrate that it is possible to use this format for the
media docbook. Still, there are lots of broken references.

The proof of concept html file is at:
https://mchehab.fedorapeople.org/media-kabi-docs-test/asciidoc_tests/media_api.html

I also added the AsciiDoc files there, at:
https://mchehab.fedorapeople.org/media-kabi-docs-test/asciidoc_tests/

And I'm keeping the git tree, which helps to identify the work that was
needed to make it work:
https://git.linuxtv.org/mchehab/asciidoc-poc.git

In summary, AsciiDoc, processed with Asciidoctor, worked fine to produce
an HTML file.

PROBLEMS
========

1)

I was not able to produce outputs on any other format.

For example, when trying to generate docbook45 output, it seems that
part of the trouble is due to the pandoc conversion. It produces
links like:

link:#ftn.id-1.4.11.43.5.11.2.7.2.6.2[^[a]^]

Which causes errors with DocBook parsers, like xmllint:

media_api.xml:32300: parser error : Opening and ending tag mismatch: superscript line 32300 and ulink
<ulink url="#id-1.4.11.43.5.11.2.7.2.6.2"><superscript>[a</ulink></superscript>]
^

I suspect that this is fixable. I may try to fix it later.
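Here is our reading of the xmllint error. In `link:target[text]`, the first unescaped `]` ends the link text, so the `]` of the `[a]` marker closes the link early, and the superscript ends up straddling `</ulink>`. Below is a minimal Python sketch of a post-processing pass over the pandoc output (the names are ours). It moves the superscript outside the link macro and escapes the inner bracket. It assumes that Asciidoctor accepts a backslash-escaped `\]` inside link text, and the regular expression only covers the exact `link:#id[^[label]^]` shape shown above.

```python
import re

# Matches pandoc's footnote-marker links such as
#   link:#ftn.id-1.4.11.43.5.11.2.7.2.6.2[^[a]^]
# capturing the target ("ftn.id-...") and the label ("a").
FOOTNOTE_LINK = re.compile(r"link:#([^\[\s]+)\[\^\[([^\]]*)\]\^\]")


def fix_footnote_links(text):
    """Rewrite link:#id[^[a]^] as ^link:#id[[a\\]]^.

    The superscript now wraps the whole link, and the label's closing
    bracket is escaped so it no longer terminates the link text.
    """
    return FOOTNOTE_LINK.sub(r"^link:#\1[[\2\\]]^", text)
```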

2) It seems that Asciidoctor doesn't allow appendices per document part.
It numbers them as chapters, instead of using A, B, C, ...

3) Even though the HTML is produced without trouble, it emits an error:
asciidoctor: ERROR: media_api.adoc: line 57: invalid part, must have at least one section (e.g., chapter, appendix, etc.)

I was unable to discover why, or to suppress this error message.

4) Some things got lost during the conversion, like
copyright notes and revision notes. This could simply be a problem
with the pandoc conversion. Nothing serious, I guess, as we could insert
the lost data manually. Yet, it means that, to move from the PoC to
the Kernel, there is still a lot of work to do.

Yet, from my side, if we're willing to get rid of DocBook, then
Asciidoctor seems to be the *only* alternative so far to parse the
complex media documents.

Regards,
Mauro

2016-03-10 10:26:34

by Jani Nikula

Subject: Re: Kernel docs: muddying the waters a bit

TL;DR? Skip to the last paragraph.

On Wed, 09 Mar 2016, Mauro Carvalho Chehab wrote:
> I guess the conversion to asciidoc format is now in good shape,
> at least to demonstrate that it is possible to use this format for the
> media docbook. Still, there are lots of broken references.

Getting references right with asciidoc is a big problem in the
kernel-doc side. As I wrote before, the proofs of concept only worked
because everything was processed as one big file (via includes). The
Asciidoctor inter-document references won't help, because we won't know
the target document name while processing kernel-doc.

Sphinx is massively better at handling cross references for
kernel-doc. We can use domains (C language) and roles (e.g. functions,
types, etc.) for the references, which provide kind of
namespaces. Sphinx warns for referencing non-existing targets, but
doesn't generate broken links in the result like Asciidoctor does.

For example, in the documentation for a function that has struct foo as
parameter or return type, a cross reference to struct foo is added
automagically, but only if documentation for struct foo actually
exists. In Asciidoctor, we would have to blindly generate the references
ourselves, and try to resolve broken links ourselves by somehow
post-processing the result.
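The Asciidoctor alternative Jani describes can be illustrated with a small sketch. It assumes that a first pass has already collected the set of anchor IDs that actually exist. A generator can then emit an AsciiDoc `<<id,text>>` cross reference only for known targets, fall back to plain monospace text otherwise, and report the misses much like warnings. The `{{struct foo}}` placeholder syntax and the `struct-foo` anchor naming are hypothetical, invented for this example.

```python
import re

# Hypothetical placeholder syntax emitted by a kernel-doc-like generator,
# e.g. "{{struct foo}}"; invented for this sketch.
REF = re.compile(r"\{\{(struct|union|enum) (\w+)\}\}")


def anchor_id(kind, name):
    """Hypothetical anchor naming scheme: ("struct", "foo") -> "struct-foo"."""
    return f"{kind}-{name}"


def resolve_refs(text, known_anchors):
    """Link only to documented targets; degrade the rest to monospace text.

    Returns the converted AsciiDoc text and the list of missing anchors, which
    a build could print as warnings instead of shipping broken links.
    """
    missing = []

    def replace(match):
        kind, name = match.groups()
        target = anchor_id(kind, name)
        if target in known_anchors:
            return f"<<{target},{kind} {name}>>"
        missing.append(target)
        return f"`{kind} {name}`"

    return REF.sub(replace, text), missing
```

The catch is the two-pass design. The set of anchors must be known before any page is written, which is exactly what is hard when kernel-doc processes one file at a time.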

> Yet, from my side, if we're willing to get rid of DocBook, then
> Asciidoctor seems to be the *only* alternative so far to parse the
> complex media documents.

I think you mean, "get rid of DocBook as source format", not altogether?
I'm yet to be convinced we could rely on Asciidoctor's native formats.

---

Mauro, I truly appreciate your efforts at evaluating both
alternatives. I also appreciate Dan's inputs on Asciidoctor.

Despite your evaluation that Asciidoctor is the only alternative for
media documents, it is my opinion that we should go with Sphinx.

It's an opinion, it's subjective, it's from my perspective, especially
from the kernel-doc POV, so please don't take it as a slap in the face
after all the work you've done. With that out of the way, here's why.

For starters, Jon's Sphinx proof-of-concept at
http://static.lwn.net/kerneldoc/ is pretty amazing. It's beautiful and
usable. Cross references work, there are no broken links (I hacked a bit
more on kernel-doc and it gets even better). There's embedded search
(and if this gets exported to https://readthedocs.org/ the search is
even better). The API documentation is sensible and the headings aren't
mixed up with other headings. It's all there. It's what we've been
looking for.

The toolchain gets faster, easier to debug and simplified a lot with
DocBook out of the equation completely. Sphinx itself is stable, widely
available, and well documented. IMO there's sufficient native output
format support. There are plenty of really nice extensions
available. There's a possibility of doing kernel-doc as an extension in
the future (either by calling current kernel-doc from the extension or
by rewriting it).

Dan keeps bringing up the active community in Asciidoctor, and how
they're fixing things up as we speak... which is great, but Sphinx is
here now, packaged and shipping in distros ready to use. It seems that
of the two, an Asciidoctor based toolchain is currently more in need of
hacking and extending to meet our needs. Which brings us to the
implementation language, Python vs. Ruby.

I won't make the mistake of comparing the relative merits of the
languages, but I'll boldly claim the set of kernel developers who know
Python is likely larger than the set of kernel developers who know Ruby
[citation needed]. AFAICT there are no Ruby tools in the kernel tree,
but there is a bunch of Python. My own very limited and subjective
experience with other tools around the kernel is that Python is much
more popular than Ruby. So my claim here is that we're in a better
position to hack on Sphinx extensions ourselves than Asciidoctor.

My conclusion is that Sphinx covers the vast majority of the needs of
our documentation producers and consumers, in an amazing way, out of the
box, better than Asciidoctor.

Which brings us to the minority and the parts where Sphinx falls short,
media documentation in particular. It's complex documentation, with very
specific requirements on the output, especially that many things remain
exactly as they are now. It also feels like the target is more to have
standalone media documentation, and not so much to be aligned with and
be part of the rest of the kernel documentation.

I want to question the need to have all kernel documentation use tools
that meet the strict requirements of the outlier, when there's a better
alternative for the vast majority of the documentation. Especially when
Asciidoctor isn't a ready solution for media documentation either.

In summary, my proposal is to go with Sphinx, leave media docs as
DocBook for now, and see if and how they can be converted to
Sphinx/reStructuredText later on when we have everything else in
place. It's not the perfect outcome, but IMHO it's the best overall
choice.

BR,
Jani.

--
Jani Nikula, Intel Open Source Technology Center

2016-03-10 15:21:16

by Mauro Carvalho Chehab

Subject: Re: Kernel docs: muddying the waters a bit

On Thu, 10 Mar 2016 12:25:58 +0200
Jani Nikula wrote:

> TL;DR? Skip to the last paragraph.
>
> On Wed, 09 Mar 2016, Mauro Carvalho Chehab wrote:
> > I guess the conversion to asciidoc format is now in good shape,
> > at least to demonstrate that it is possible to use this format for the
> > media docbook. Still, there are lots of broken references.
>
> Getting references right with asciidoc is a big problem in the
> kernel-doc side. As I wrote before, the proofs of concept only worked
> because everything was processed as one big file (via includes). The
> Asciidoctor inter-document references won't help, because we won't know
> the target document name while processing kernel-doc.

I was able to produce chunked HTML pages here with:

asciidoctor -b docbook45 media_api.adoc
xmlto -o html-dir html media_api.xml

The results are at:
https://mchehab.fedorapeople.org/media-kabi-docs-test/asciidoc_tests/chunked/

But yeah, all references seem to be broken there. It could be due to some
conversion issue (I didn't actually try to check what's wrong there),
but I think that there's something not ok with docbook45
output for multi-part documents (on both AsciiDoc and Asciidoctor).
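To see which intra-document links in the chunked output are broken, the pages can be checked mechanically. Below is a minimal sketch that uses only the Python standard library (the names are ours). It assumes the flat directory of pages that `xmlto -o html-dir html` writes. It collects every `id` and `<a name>` anchor per page and reports local `href`s whose page or fragment does not exist.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collect the anchors (id, <a name>) and link targets (href) of one page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.anchors.add(attrs["name"])
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def broken_links(html_dir):
    """Return (page, href) pairs whose local target page or fragment is missing."""
    pages = {}
    for path in Path(html_dir).glob("*.html"):
        collector = AnchorCollector()
        collector.feed(path.read_text(encoding="utf-8", errors="replace"))
        collector.close()
        pages[path.name] = collector

    broken = []
    for page, collector in sorted(pages.items()):
        for href in collector.hrefs:
            if "://" in href or href.startswith("mailto:"):
                continue  # external links are not checked here
            target_page, fragment = urldefrag(href)
            target = pages.get(target_page or page)  # "#id" means this page
            if target is None or (fragment and fragment not in target.anchors):
                broken.append((page, href))
    return broken
```

Grouping the result by missing fragment would help tell apart a converter bug, such as IDs renamed between passes, and real dangling references in the source.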

> Sphinx is massively better at handling cross references for
> kernel-doc. We can use domains (C language) and roles (e.g. functions,
> types, etc.) for the references, which provide kind of
> namespaces. Sphinx warns for referencing non-existing targets, but
> doesn't generate broken links in the result like Asciidoctor does.
>
> For example, in the documentation for a function that has struct foo as
> parameter or return type, a cross reference to struct foo is added
> automagically, but only if documentation for struct foo actually
> exists. In Asciidoctor, we would have to blindly generate the references
> ourselves, and try to resolve broken links ourselves by somehow
> post-processing the result.
>
> > Yet, from my side, if we're willing to get rid of DocBook, then
> > Asciidoctor seems to be the *only* alternative so far to parse the
> > complex media documents.
>
> I think you mean, "get rid of DocBook as source format", not altogether?
> I'm yet to be convinced we could rely on Asciidoctor's native formats.

What I mean is that, right now, I see only two alternatives for the
media uAPI documentation:
1) keep using DocBook;
2) AsciiDoc/Asciidoctor.

Sphinx doesn't have what's needed to support the complexity of the
media books, especially since cell spans seem to be possible only
with ASCII-art tables. Writing a big table in ASCII art is
a *real pain*. Also, in my tests, Sphinx failed to parse such
ASCII-art tables when they were too big. So, as long as Sphinx
doesn't have a decent way to describe tables, we can't use it.

If it starts implementing it, then we can check if the other
features used by the media documentation are also supported.
Probably, multi-part books would be another pain with Sphinx.
We actually have 4 books inside a common body. A few chapters
(like book licensing, bibliography, error codes) are shared
by all 4 documents.
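On the multi-part question: Sphinx's LaTeX builder can produce several documents from one source tree. The `latex_documents` setting in `conf.py` takes one entry per output document, and each entry has its own start document. Below is a sketch of that idea for the four media books. The directory names, file names, titles and author string are hypothetical. We have also not checked whether listing the shared chapters in several toctrees works cleanly for the HTML output.

```python
# conf.py fragment: a sketch, all names below are hypothetical.
project = "Linux Media Infrastructure userspace API"

# One (start document, LaTeX file, title) triple per book. Chapters shared
# by all books (licensing, bibliography, error codes) would be listed in the
# toctree of every start document, so each PDF gets its own copy.
_BOOKS = [
    ("v4l/index", "v4l2.tex", "Video for Linux API"),
    ("dvb/index", "dvb.tex", "Digital TV API"),
    ("rc/index", "rc.tex", "Remote Controller API"),
    ("mc/index", "media-controller.tex", "Media Controller API"),
]

# Sphinx expects (startdocname, targetname, title, author, "manual" or "howto").
latex_documents = [
    (start, target, title, "The media subsystem developers", "manual")
    for start, target, title in _BOOKS
]
```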

But, so far, I can't see any way to port the media books without
a lot of work to develop new features in the Sphinx code.

> ---
>
> Mauro, I truly appreciate your efforts at evaluating both
> alternatives. I also appreciate Dan's inputs on Asciidoctor.
>
> Despite your evaluation that Asciidoctor is the only alternative for
> media documents, it is my opinion that we should go with Sphinx.
>
> It's an opinion, it's subjective, it's from my perspective, especially
> from the kernel-doc POV, so please don't take it as a slap in the face
> after all the work you've done. With that out of the way, here's why.
>
> For starters, Jon's Sphinx proof-of-concept at
> http://static.lwn.net/kerneldoc/ is pretty amazing. It's beautiful and
> usable. Cross references work, there are no broken links (I hacked a bit
> more on kernel-doc and it gets even better). There's embedded search
> (and if this gets exported to https://readthedocs.org/ the search is
> even better). The API documentation is sensible and the headings aren't
> mixed up with other headings. It's all there. It's what we've been
> looking for.
>
> The toolchain gets faster, easier to debug and simplified a lot with
> DocBook out of the equation completely. Sphinx itself is stable, widely
> available, and well documented. IMO there's sufficient native output
> format support. There are plenty of really nice extensions
> available. There's a possibility of doing kernel-doc as an extension in
> the future (either by calling current kernel-doc from the extension or
> by rewriting it).

Well, if we go to Sphinx for kernel-doc, that means that we'll need
2 different tools for the documentation:
- Sphinx for kernel-doc
- either DocBook or Asciidoctor/AsciiDoc for media.

IMHO, this is the worst scenario, as we'll keep depending on
DocBook plus requiring Sphinx, but it is up to Jon to decide.

> Dan keeps bringing up the active community in Asciidoctor, and how
> they're fixing things up as we speak... which is great, but Sphinx is
> here now, packaged and shipping in distros ready to use. It seems that
> of the two, an Asciidoctor based toolchain is currently more in need of
> hacking and extending to meet our needs. Which brings us to the
> implementation language, Python vs. Ruby.
>
> I won't make the mistake of comparing the relative merits of the
> languages, but I'll boldly claim the set of kernel developers who know
> Python is likely larger than the set of kernel developers who know Ruby
> [citation needed]. AFAICT there are no Ruby tools in the kernel tree,
> but there is a bunch of Python. My own very limited and subjective
> experience with other tools around the kernel is that Python is much
> more popular than Ruby. So my claim here is that we're in a better
> position to hack on Sphinx extensions ourselves than Asciidoctor.

Sorry, but I don't buy it. Python is, IMHO, a mess: each new version
is incompatible with the previous one, and requires the source to
change, in order to use a newer version than the one used to write
the code. So, when talking about Python, we're actually talking about
several different dialects that don't talk well to each other.

I don't know about Ruby. So far, I don't have anything against it (or in
favor of it). I bet most Kernel developers would actually prefer a
toolchain in C. If such a tool doesn't exist, anything else seems
about the same ;)

> My conclusion is that Sphinx covers the vast majority of the needs of
> our documentation producers and consumers, in an amazing way, out of the
> box, better than Asciidoctor.
>
> Which brings us to the minority and the parts where Sphinx falls short,
> media documentation in particular. It's complex documentation, with very
> specific requirements on the output, especially that many things remain
> exactly as they are now. It also feels like the target is more to have
> standalone media documentation, and not so much to be aligned with and
> be part of the rest of the kernel documentation.
>
> I want to question the need to have all kernel documentation use tools
> that meet the strict requirements of the outlier, when there's a better
> alternative for the vast majority of the documentation. Especially when
> Asciidoctor isn't a ready solution for media documentation either.
>
> In summary, my proposal is to go with Sphinx, leave media docs as
> DocBook for now, and see if and how they can be converted to
> Sphinx/reStructuredText later on when we have everything else in
> place. It's not the perfect outcome, but IMHO it's the best overall
> choice.

Well, this could be done. We don't have any good reason to move
the media docs out of DocBook. On the contrary, it means extra
work. The only advantage is that it is much simpler to write
documentation with a markup language, but getting from the PoC
to its integration in the Kernel tree still requires lots of work,
especially due to the cross-refs "magic" scripts that we have under
Documentation/DocBook/media/Makefile.

As I said, the only big drawback is to keep depending on two
different tools for kernel-doc and for media documentation.

--
Thanks,
Mauro

2016-03-13 15:42:29

by Markus Heiser

Subject: Re: Kernel docs: muddying the waters a bit

On 10.03.2016 at 16:21, Mauro Carvalho Chehab wrote:

> On Thu, 10 Mar 2016 12:25:58 +0200
> Jani Nikula wrote:
>
>> TL;DR? Skip to the last paragraph.
>>
>> On Wed, 09 Mar 2016, Mauro Carvalho Chehab wrote:
>>> I guess the conversion to asciidoc format is now in good shape,
>>> at least to demonstrate that it is possible to use this format for the
>>> media docbook. Still, there are lots of broken references.
>>
>> Getting references right with asciidoc is a big problem in the
>> kernel-doc side. As I wrote before, the proofs of concept only worked
>> because everything was processed as one big file (via includes). The
>> Asciidoctor inter-document references won't help, because we won't know
>> the target document name while processing kernel-doc.
>
> I was able to produce chunked HTML pages here with:
>
> asciidoctor -b docbook45 media_api.adoc
> xmlto -o html-dir html media_api.xml
>
> The results are at:
> https://mchehab.fedorapeople.org/media-kabi-docs-test/asciidoc_tests/chunked/
>
> But yeah, all references seem to be broken there. It could be due to some
> conversion issue (I didn't actually try to check what's wrong there),
> but I think that there's something not ok with docbook45
> output for multi-part documents (on both AsciiDoc and Asciidoctor).
>
>> Sphinx is massively better at handling cross references for
>> kernel-doc. We can use domains (C language) and roles (e.g. functions,
>> types, etc.) for the references, which provide kind of
>> namespaces. Sphinx warns for referencing non-existing targets, but
>> doesn't generate broken links in the result like Asciidoctor does.
>>
>> For example, in the documentation for a function that has struct foo as
>> parameter or return type, a cross reference to struct foo is added
>> automagically, but only if documentation for struct foo actually
>> exists. In Asciidoctor, we would have to blindly generate the references
>> ourselves, and try to resolve broken links ourselves by somehow
>> post-processing the result.
>>
>>> Yet, from my side, if we're willing to get rid of DocBook, then
>>> Asciidoctor seems to be the *only* alternative so far to parse the
>>> complex media documents.
>>
>> I think you mean, "get rid of DocBook as source format", not altogether?
>> I'm yet to be convinced we could rely on Asciidoctor's native formats.
>
> What I mean is that, right now, I see only two alternatives for the
> media uAPI documentation:
> 1) keep using DocBook;
> 2) AsciiDoc/Asciidoctor.
>
> Sphinx doesn't have what's needed to support the complexity of the
> media books, especially since cell spans seem to be possible only
> with ASCII-art tables. Writing a big table in ASCII art is
> a *real pain*. Also, in my tests, Sphinx failed to parse such
> ASCII-art tables when they were too big. So, as long as Sphinx
> doesn't have a decent way to describe tables, we can't use it.

Huge tables and cell-spans are the *real pain* ;-) ... with sphinx-doc,
(mostly) you have more than one choice .. e.g. import CSV tables ..
but this should be discussed by example ...
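To illustrate one of those choices: the ASCII art does not have to be written by hand. Below is a minimal Python sketch that renders rows of strings as a reST grid table (the function name is ours). It does not generate cell spans. Docutils grid tables express spans by leaving out the separator between cells, so generated output could serve as a starting point to be edited by hand.

```python
def grid_table(rows, header=True):
    """Render rows (lists of strings) as a reStructuredText grid table.

    The first row becomes the header when header=True. Cell spans are not
    generated; every row must have the same number of single-line cells.
    """
    if not rows:
        raise ValueError("need at least one row")
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows must have the same number of cells")
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]

    def border(char):
        # e.g. "+-------+-----+" (or "=" under the header row)
        return "+" + "+".join(char * (w + 2) for w in widths) + "+"

    def line(row):
        # Pad every cell to its column width, with one space of margin.
        return "|" + "|".join(f" {cell:<{w}} " for cell, w in zip(row, widths)) + "|"

    out = [border("-")]
    for index, row in enumerate(rows):
        out.append(line(row))
        out.append(border("=" if header and index == 0 else "-"))
    return "\n".join(out)
```

For example: `print(grid_table([["ioctl", "Description"], ["VIDIOC_QUERYCAP", "Query device capabilities"]]))`.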

> If it starts implementing it, then we can check if the other
> features used by the media documentation are also supported.
> Probably, multi-part books would be another pain with Sphinx.
> We actually have 4 books inside a common body. A few chapters
> (like book licensing, bibliography, error codes) are shared
> by all 4 documents.
>
> But, so far, I can't see any way to port the media books without
> a lot of work to develop new features in the Sphinx code.

maybe I can help you ...

>> The toolchain gets faster, easier to debug and simplified a lot with
>> DocBook out of the equation completely. Sphinx itself is stable, widely
>> available, and well documented. IMO there's sufficient native output
>> format support. There are plenty of really nice extensions
>> available. There's a possibility of doing kernel-doc as an extension in
>> the future (either by calling current kernel-doc from the extension or
>> by rewriting it).
>
> Well, if we go to Sphinx for kernel-doc, that means that we'll need
> 2 different tools for the documentation:
> - Sphinx for kernel-doc
> - either DocBook or Asciidoctor/AsciiDoc for media.
>
> IMHO, this is the worst scenario, as we'll keep depending on
> DocBook plus requiring Sphinx, but it is up to Jon to decide.
>

The migration of kernel-doc is a long-term project, not a
one-shot job. The scope of documents to migrate is not limited
to the files with DocBook markup; most documents have no
real markup at all.

Please take a look at my thoughts and efforts about migration.

* https://sphkerneldoc.readthedocs.org

* https://github.com/return42/sphkerneldoc.git

sphkerneldoc.git is a small project I started this weekend. In
this project I show how the migration could be done, and
we can discuss concerns like "tables and cell-spans" by example.

Believe me, most concerns discussed in this thread come from a lack of
knowledge. I have been working with sphinx-doc for 7 years, having switched
over from DocBook (escaping from an 8-year-long XML hell).
DocBook and sphinx-doc are completely different, so sphinx-doc
might feel odd at first, but if you switch
like I did, you will never go back again.

>> Dan keeps bringing up the active community in Asciidoctor, and how
>> they're fixing things up as we speak... which is great, but Sphinx is
>> here now, packaged and shipping in distros ready to use. It seems that
>> of the two, an Asciidoctor based toolchain is currently more in need of
>> hacking and extending to meet our needs. Which brings us to the
>> implementation language, Python vs. Ruby.
>>
>> I won't make the mistake of comparing the relative merits of the
>> languages, but I'll boldly claim the set of kernel developers who know
>> Python is likely larger than the set of kernel developers who know Ruby
>> [citation needed]. AFAICT there are no Ruby tools in the kernel tree,
>> but there is a bunch of Python. My own very limited and subjective
>> experience with other tools around the kernel is that Python is much
>> more popular than Ruby. So my claim here is that we're in a better
>> position to hack on Sphinx extensions ourselves than Asciidoctor.
>
> Sorry, but I don't buy it. Python is, IMHO, a mess: each new version
> is incompatible with the previous one, and requires the source to
> change, in order to use a newer version than the one used to write
> the code. So, when talking about Python, we're actually talking about
> several different dialects that don't talk well to each other.

Sorry, you are completely wrong ... I have been a Python programmer for 15 years
and have shipped huge projects with my customers ... we have never seen
these problems ... sorry ...

> I don't know about Ruby. So far, I don't have anything against it (or in
> favor of it). I bet most Kernel developers would actually prefer a
> toolchain in C. If such a tool doesn't exist, anything else seems
> about the same ;)

Why are we talking about scripting languages? What is needed is an
authoring system that is as close as possible to the developers,
who are the authors.

Sphinx-Doc is a standard authoring tool, versioned, maintained
and extended by thousands of developers ...

>> My conclusion is that Sphinx covers the vast majority of the needs of
>> our documentation producers and consumers, in an amazing way, out of the
>> box, better than Asciidoctor.
>>
>> Which brings us to the minority and the parts where Sphinx falls short,
>> media documentation in particular. It's complex documentation, with very
>> specific requirements on the output, especially that many things remain
>> exactly as they are now. It also feels like the target is more to have
>> standalone media documentation, and not so much to be aligned with and
>> be part of the rest of the kernel documentation.
>>
>> I want to question the need to have all kernel documentation use tools
>> that meet the strict requirements of the outlier, when there's a better
>> alternative for the vast majority of the documentation. Especially when
>> Asciidoctor isn't a ready solution for media documentation either.
>>
>> In summary, my proposal is to go with Sphinx, leave media docs as
>> DocBook for now, and see if and how they can be converted to
>> Sphinx/reStructuredText later on when we have everything else in
>> place. It's not the perfect outcome, but IMHO it's the best overall
>> choice.
>
> Well, this could be done. We don't have any good reason to move
> the media docs out of DocBook.

Sorry, but again wrong: you lose many authors who are
frustrated by XML markup, and you lose many developers who could improve
the toolchain, frustrated by a complicated DocBook-XML/XSLT
toolchain with SGML markup from the middle of the last epoch.

> On the contrary, it means extra
> work. The only advantage is that it is much simpler to write
> documentation with a markup language, but getting from the PoC
> to its integration in the Kernel tree still requires lots of work,
> especially due to the cross-refs "magic" scripts that we have under
> Documentation/DocBook/media/Makefile.

Yes, you are right: migration is a process, not a one-shot
job, as I mentioned before. You are a great programmer, your
documentation is also great, and this investment should be preserved.
So let's give it a try. It would be an honor for me to show
you all these steps by example in my repository (see above).

> As I said, the only big drawback is to keep depending on two
> different tools for kernel-doc and for media documentation.

-- Markus --

2016-04-08 15:13:32

by Markus Heiser

Subject: Re: Kernel docs: muddying the waters a bit

Hi kernel-doc authors,

motivated by this thread, I implemented a toolchain to migrate the kernel's
DocBook XML documentation to reST markup.

It converts 99% of the docs well ... to get an impression of how the
kernel docs could benefit from it, visit my sphkerneldoc project page
on GitHub:

http://return42.github.io/sphkerneldoc/

The sources are available at:

https://github.com/return42/sphkerneldoc

The work is underway, suggestions are welcome!
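The toolchain itself is not shown here, and we have not looked at its internals. Purely to illustrate the kind of mapping such a migration involves, the sketch below converts a DocBook `<para>` with a few inline elements to one line of reST. The element-to-markup table is our own assumption, including the choice of Sphinx's `:c:func:` role for `<function>`. Real documents need many more cases, such as links, cross-references and footnotes.

```python
import xml.etree.ElementTree as ET

# Our own, hypothetical mapping of a few DocBook inline elements to reST.
INLINE = {
    "emphasis": ("*", "*"),
    "literal": ("``", "``"),
    "constant": ("``", "``"),
    "function": (":c:func:`", "`"),  # Sphinx C-domain cross reference
}


def para_to_rst(xml_fragment):
    """Convert one DocBook <para> with simple inline markup to a reST line."""
    para = ET.fromstring(xml_fragment)
    parts = [para.text or ""]
    for child in para:
        before, after = INLINE.get(child.tag, ("", ""))  # unknown tags: text only
        parts.append(before + "".join(child.itertext()) + after)
        parts.append(child.tail or "")
    # Collapse the whitespace and line breaks that DocBook sources carry.
    return " ".join("".join(parts).split())
```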

.. have a nice weekend ..

--M--

On 13.03.2016 at 16:33, Markus Heiser wrote:

>
> On 10.03.2016 at 16:21, Mauro Carvalho Chehab wrote:
>
>> On Thu, 10 Mar 2016 12:25:58 +0200
>> Jani Nikula wrote:
>>
>>> TL;DR? Skip to the last paragraph.
>>>
>>> On Wed, 09 Mar 2016, Mauro Carvalho Chehab wrote:
>>>> I guess the conversion to asciidoc format is now in good shape,
>>>> at least to demonstrate that it is possible to use this format for the
>>>> media docbook. Still, there are lots of broken references.
>>>
>>> Getting references right with asciidoc is a big problem in the
>>> kernel-doc side. As I wrote before, the proofs of concept only worked
>>> because everything was processed as one big file (via includes). The
>>> Asciidoctor inter-document references won't help, because we won't know
>>> the target document name while processing kernel-doc.
>>
>> I was able to produce chunked HTML pages here with:
>>
>> asciidoctor -b docbook45 media_api.adoc
>> xmlto -o html-dir html media_api.xml
>>
>> The results are at:
>> https://mchehab.fedorapeople.org/media-kabi-docs-test/asciidoc_tests/chunked/
>>
>> But yeah, all references seem to be broken there. It could be due to some
>> conversion issue (I didn't actually try to check what's wrong there),
>> but I think that there's something not ok with docbook45
>> output for multi-part documents (on both AsciiDoc and Asciidoctor).
>>
>>> Sphinx is massively better at handling cross references for
>>> kernel-doc. We can use domains (C language) and roles (e.g. functions,
>>> types, etc.) for the references, which provide kind of
>>> namespaces. Sphinx warns for referencing non-existing targets, but
>>> doesn't generate broken links in the result like Asciidoctor does.
>>>
>>> For example, in the documentation for a function that has struct foo as
>>> parameter or return type, a cross reference to struct foo is added
>>> automagically, but only if documentation for struct foo actually
>>> exists. In Asciidoctor, we would have to blindly generate the references
>>> ourselves, and try to resolve broken links ourselves by somehow
>>> post-processing the result.
>>>
>>>> Yet, from my side, if we're willing to get rid of DocBook, then
>>>> Asciidoctor seems to be the *only* alternative so far to parse the
>>>> complex media documents.
>>>
>>> I think you mean, "get rid of DocBook as source format", not altogether?
>>> I'm yet to be convinced we could rely on Asciidoctor's native formats.
>>
>> What I mean is that, right now, I see only two alternatives for the
>> media uAPI documentation:
>> 1) keep using DocBook;
>> 2) AsciiDoc/Asciidoctor.
>>
>> Sphinx doesn't have what's needed to support the complexity of the
>> media books, especially since cell span seems to be possible only
>> by using asciiArt formats. Writing a big table using asciiArt is
>> something that is a *real pain*. Also, as tested, if the table is
>> too big, it fails to parse such asciiArt tables. So, while Sphinx
>> doesn't have a decent way to describe tables, we can't use it.
>
>
> Huge tables and cell-spans are the *real pain* ;-) ... with sphinx-doc,
> (mostly) you have more than one choice .. e.g. import csv tables ..
> but this should be discussed by example ...
>
>
>> If it starts implementing it, then we can check if the other
>> features used by the media documentation are also supported.
>> Probably, multi-part books would be another pain with Sphinx.
>> We have actually 4 books inside a common body. A few chapters
>> (like book licensing, bibliography, error codes) are shared
>> by all 4 documents.
>>
>> But, so far, I can't see any way to port media books without
>> lots of work to develop new features in the Sphinx code.
>
>
> maybe I can help you ...
>
>
>>> The toolchain gets faster, easier to debug and simplified a lot with
>>> DocBook out of the equation completely. Sphinx itself is stable, widely
>>> available, and well documented. IMO there's sufficient native output
>>> format support. There are plenty of really nice extensions
>>> available. There's a possibility of doing kernel-doc as an extension in
>>> the future (either by calling current kernel-doc from the extension or
>>> by rewriting it).
>>
>> Well, if we go to Sphinx for kernel-doc, that means that we'll need
>> 2 different tools for the documentation:
>> - Sphinx for kernel-doc
>> - either DocBook or Asciidoctor/AsciiDoc for media.
>>
>> IMHO, this is the worst scenario, as we'll keep depending on
>> DocBook plus requiring Sphinx, but it is up to Jon to decide.
>>
>
> The migration of kernel-doc is a long-term project, not a
> one-shot job. The scope of documents to migrate is not limited
> to the files with DocBook markup; most documents have no
> real markup at all.
>
> Please take a look at my thoughts and efforts about migration.
>
> * https://sphkerneldoc.readthedocs.org
>
> * https://github.com/return42/sphkerneldoc.git
>
> sphkerneldoc.git is a small project I started this weekend; in it
> I show how the migration could be done, and
> we can discuss concerns like "tables and cell-spans" by example.
>
> Believe me, most concerns discussed in this thread come from a lack of
> knowledge. I've been working with sphinx-doc for 7 years, having switched
> over from DocBook (escaping from an 8-year-long XML hell).
> DocBook and sphinx-doc are completely different, so sphinx-doc
> might feel odd at first, but if you have switched
> like me, you will never go back again.
>
>>> Dan keeps bringing up the active community in Asciidoctor, and how
>>> they're fixing things up as we speak... which is great, but Sphinx is
>>> here now, packaged and shipping in distros ready to use. It seems that
>>> of the two, an Asciidoctor based toolchain is currently more in need of
>>> hacking and extending to meet our needs. Which brings us to the
>>> implementation language, Python vs. Ruby.
>>>
>>> I won't make the mistake of comparing the relative merits of the
>>> languages, but I'll boldly claim the set of kernel developers who know
>>> Python is likely larger than the set of kernel developers who know Ruby
>>> [citation needed]. AFAICT there are no Ruby tools in the kernel tree,
>>> but there is a bunch of Python. My own very limited and subjective
>>> experience with other tools around the kernel is that Python is much
>>> more popular than Ruby. So my claim here is that we're in a better
>>> position to hack on Sphinx extensions ourselves than Asciidoctor.
>>
>> Sorry, but I don't buy it. Python is, IMHO, a mess: each new version
>> is incompatible with the previous one, and requires the source to
>> change, in order to use a newer version than the one used to write
>> the code. So, when talking about Python, we're actually talking about
>> several different dialects that don't talk well to each other.
>
> Sorry, you are completely wrong ... I've been a Python programmer for 15 years
> and have shipped huge projects to my customers ... we have never seen
> these problems ... sorry ...
>
>
>> I don't know about Ruby. So far, I don't have anything against (or in
>> favor) of it. I bet most Kernel developers would actually prefer a
>> toolchain in C. If such tool doesn't exist, anything else seems
>> equally the same ;)
>
> Why are we talking about scripting languages? What is needed is an
> authoring system that is as close as possible to the developers,
> who are the authors.
>
> Sphinx-Doc is a standard authoring tool, versioned, maintained
> and extended by thousands of developers ...
>
>
>>> My conclusion is that Sphinx covers the vast majority of the needs of
>>> our documentation producers and consumers, in an amazing way, out of the
>>> box, better than Asciidoctor.
>>>
>>> Which brings us to the minority and the parts where Sphinx falls short,
>>> media documentation in particular. It's complex documentation, with very
>>> specific requirements on the output, especially that many things remain
>>> exactly as they are now. It also feels like the target is more to have
>>> standalone media documentation, and not so much to be aligned with and
>>> be part of the rest of the kernel documentation.
>>>
>>> I want to question the need to have all kernel documentation use tools
>>> that meet the strict requirements of the outlier, when there's a better
>>> alternative for the vast majority of the documentation. Especially when
>>> Asciidoctor isn't a ready solution for media documentation either.
>>>
>>> In summary, my proposal is to go with Sphinx, leave media docs as
>>> DocBook for now, and see if and how they can be converted to
>>> Sphinx/reStructuredText later on when we have everything else in
>>> place. It's not the perfect outcome, but IMHO it's the best overall
>>> choice.
>>
>> Well, this could be done. We don't have any good reason to move
>> the media docs out of DocBook.
>
> Sorry, but again wrong: you lose many authors who are
> frustrated by XML markup, and you lose many developers who could improve
> the toolchain, frustrated by a complicated DocBook-XML XSLT
> toolchain with SGML markup from the middle of the last epoch.
>
>> On the contrary, this means extra
>> work. The only advantage is that it is way simpler to write
>> documentation with a markup language, but going from the PoC
>> to its integration into the Kernel tree still requires lots of work,
>> especially due to the cross-refs "magic" scripts that we have under
>> Documentation/DocBook/media/Makefile.
>
> Yes, you are right, migration is a process, not a one-shot
> job, as I mentioned before. You are a great programmer, and your
> documentation is also great; this investment should be preserved.
> So let's give it a try. It would be an honor for me to show
> you all these steps by example in my repository (see above).
>
>> As I said, the only big drawback is to keep depending on two
>> different tools for kernel-doc and for media documentation.
>
> -- Markus --
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-04-12 09:19:12

by Hans Verkuil


Subject: Re: Kernel docs: muddying the waters a bit

Hi Markus,

On 04/08/16 17:12, Markus Heiser wrote:
> Hi kernel-doc authors,
>
> motivated by this thread, I implemented a toolchain to migrate the kernel’s
> DocBook XML documentation to reST markup.
>
> It converts 99% of the docs well ... to get an impression of how the
> kernel docs could benefit, visit my sphkerneldoc project page
> on GitHub:
>
> http://return42.github.io/sphkerneldoc/
>
> The sources are available at:
>
> https://github.com/return42/sphkerneldoc
>
> The work is underway, suggestions are welcome!

I have to admit that this looks pretty good :-)

My main remark based on my quick scan through the docs is that anything in
typewriter font seems to be shown as red text with a rectangle around it.
That's quite jarring for me and I think it should just be shown as normal
text in a non-proportional font, like in the original.

I also noticed that the 'title' of tables ends with a '?' character for
some reason.

See e.g. the struct v4l2_audioout table in
http://return42.github.io/sphkerneldoc/books/linux_tv/media/v4l/vidioc-g-audioout.html

Regards,

Hans

>
> .. have a nice weekend ..
>
> --M--
>
>
> Am 13.03.2016 um 16:33 schrieb Markus Heiser <[email protected]>:
>

2016-04-12 15:46:27

by Jonathan Corbet


Subject: Re: Kernel docs: muddying the waters a bit

On Fri, 8 Apr 2016 17:12:27 +0200
Markus Heiser <[email protected]> wrote:

> motivated by this thread, I implemented a toolchain to migrate the kernel’s
> DocBook XML documentation to reST markup.
>
> It converts 99% of the docs well ... to get an impression of how the
> kernel docs could benefit, visit my sphkerneldoc project page
> on GitHub:
>
> http://return42.github.io/sphkerneldoc/

So I've obviously been pretty quiet on this recently. Apologies...I've
been dealing with an extended death-in-the-family experience, and there is
still a fair amount of cleanup to be done.

Looking quickly at this work, it seems similar to the results I got. But
there's a lot of code there that came from somewhere? I'd put together a
fairly simple conversion using pandoc and a couple of short sed scripts;
is there a reason for a more complex solution?

Thanks for looking into this, anyway; I hope to be able to focus more on
it shortly.

jon

2016-04-18 09:50:34

by Markus Heiser


Subject: Re: Kernel docs: muddying the waters a bit

Hi Jonathan,

Am 12.04.2016 um 17:46 schrieb Jonathan Corbet <[email protected]>:

> On Fri, 8 Apr 2016 17:12:27 +0200
> Markus Heiser <[email protected]> wrote:
>
>> motivated by this thread, I implemented a toolchain to migrate the kernel’s
>> DocBook XML documentation to reST markup.
>>
>> It converts 99% of the docs well ... to get an impression of how the
>> kernel docs could benefit, visit my sphkerneldoc project page
>> on GitHub:
>>
>> http://return42.github.io/sphkerneldoc/
>
> So I've obviously been pretty quiet on this recently. Apologies...I've
> been dealing with an extended death-in-the-family experience, and there is
> still a fair amount of cleanup to be done.
>
> Looking quickly at this work, it seems similar to the results I got. But
> there's a lot of code there that came from somewhere?

From me? ... except the kernel-doc script, which is forked from your

git://git.lwn.net/linux.git doc/sphinx

> I'd put together a
> fairly simple conversion using pandoc and a couple of short sed scripts;
> is there a reason for a more complex solution?

It depends. If you have a simple DocBook document with less varied markup, maybe not.
You may want to read my remarks about migration tools, especially pandoc:

* https://return42.github.io/sphkerneldoc/articles/dbtools.html#remarks-on-pandoc

A few more words about what I have done:

I wrote a library of XML filters (dbxml) which might also be useful in other
migration projects.

* https://github.com/return42/sphkerneldoc/blob/master/scripts/dbxml.py

It uses an XML parser, pandoc, pandoc filters and regular expressions. Because
I did not implement a whole converter, I hacked around pandoc. That's why the
conversion is done in several steps:

1. copy xml file(s) to a cache space

2. substitute unresolved internal and external entities

3. filter all xml files

* run custom hooks on every node

* apply filters on every node and inject reST into the XML-tree where pandoc fails.
https://github.com/return42/sphkerneldoc/blob/master/scripts/dbxml.py#L515

4. convert the intermediate XML result with pandoc to JSON (needed by the pandoc filters)

5. apply the pandoc filters and clean up the injected reST markup from step 3

6. convert the filtered JSON to reST

7. fix the produced reST with regular expressions

... the last step is similar to your sed scripts.
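Step 2 (substituting entities) is what lets a stock XML parser read the books at all, since an undefined entity makes the parse fail outright. A minimal sketch with the Python standard library, assuming a hand-written entity table; this is not dbxml's actual code, and `ENTITIES` and `substitute_entities` are hypothetical names. The real linux-tv book pulls its entities from external files, which this sketch does not read.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity table. The real linux-tv book defines its entities in
# external files; here they are simply hard-coded for illustration.
ENTITIES = {
    "v4l2-audioout": '<link linkend="v4l2-audioout">'
                     '<structname>v4l2_audioout</structname></link>',
}

# The five entities predefined by XML itself must be left untouched.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Named references only; numeric ones such as &#160; never match.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def substitute_entities(xml_text, entities):
    """Expand known named entities in one pass, before the XML is parsed.

    Raises KeyError for an entity that is neither predefined nor known, so
    gaps in the table show up instead of silently producing broken output.
    Entities nested inside replacement text are not expanded again.
    """
    def repl(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)
        if name in entities:
            return entities[name]
        raise KeyError("unresolved entity: &%s;" % name)

    return ENTITY_REF.sub(repl, xml_text)


if __name__ == "__main__":
    raw = "<para>See &v4l2-audioout; &amp; friends.</para>"
    # Parsing `raw` directly would raise ParseError (undefined entity).
    root = ET.fromstring(substitute_entities(raw, ENTITIES))
    print(root.find("link").get("linkend"))  # prints v4l2-audioout
```

Expanding before parsing keeps the parser itself unmodified; the cost is that the table has to be complete, which is why unknown names raise instead of being skipped.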

And I wrote a command-line interface to use this library (see func db2rst):

* https://github.com/return42/sphkerneldoc/blob/master/scripts/dbtools.py#L146

With db2rst, all of the kernel's DocBook XML books can be migrated except the
linux-tv book, which is much more complex. For that one there is a separate
command called media2rst

* https://github.com/return42/sphkerneldoc/blob/master/scripts/dbtools.py#L107

media2rst needs several special cases, which are implemented in
hooks (the dbxml interface method)

* https://github.com/return42/sphkerneldoc/blob/master/scripts/media.py
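The hook idea (step 3 above, and the special handling for the media book) might look roughly like this: a generic walker calls every registered hook on every element, and a media2rst-style run simply registers extra hooks. This is a hypothetical sketch using ElementTree, not the actual dbxml interface; the hook names, the `<remark>` clean-up and the linkend-to-`:ref:` rewrite are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def apply_hooks(root, hooks):
    """Call each hook once on each element of the tree.

    The element list is snapshotted first because ElementTree does not
    define what happens if the tree is modified while iterating over it.
    """
    for node in list(root.iter()):
        for hook in hooks:
            hook(node)
    return root


def _replace_with_text(parent, child, text):
    """Remove `child` from `parent`, leaving `text` (plus child's tail) in its place."""
    text += child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text
    parent.remove(child)


def drop_remarks(node):
    """Generic hook (hypothetical): drop editorial <remark> elements."""
    for child in list(node):
        if child.tag == "remark":
            _replace_with_text(node, child, "")


def links_to_ref_role(node):
    """Media hook (hypothetical): inject reST ':ref:' text for <link linkend=...>.

    The injected markup has to survive the later pandoc steps untouched.
    """
    for child in list(node):
        if child.tag == "link" and child.get("linkend"):
            label = "".join(child.itertext())
            role = ":ref:`%s <%s>`" % (label, child.get("linkend"))
            _replace_with_text(node, child, role)


GENERIC_HOOKS = [drop_remarks]                     # db2rst-style run
MEDIA_HOOKS = GENERIC_HOOKS + [links_to_ref_role]  # media2rst-style run
```

Run with `MEDIA_HOOKS` on `<para>See <link linkend="v4l2-audioout">struct v4l2_audioout</link> now.</para>`, the paragraph text becomes ``See :ref:`struct v4l2_audioout <v4l2-audioout>` now.``, while the generic hook list leaves the link for pandoc to handle.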

To summarize, why should one prefer these tools over pandoc + sed?

* Pandoc's coverage is limited on both reading and writing; this is where
dbxml comes into play

- reading DocBook:
https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Readers/DocBook.hs#L23

- writing reST has many bugs and gaps
(you fixed some of them with sed)

* Pandoc does not support external entities (linux-tv), covered by dbxml

* dbxml brings the ability to chunk one large XML book into small
reST chunks, e.g. the kernel-hacking book:

https://github.com/return42/sphkerneldoc/tree/master/doc/books/kernel-hacking

* dbxml lets you manipulate the XML source before you convert it to reST

this might be helpful, e.g. if you have to convert single-column informal tables
to lists or do other things ... in short, dbxml and its hooks are the key to hacking
everything you need into a fully automated DocBook-->reST migration workflow.

--Markus--

> Thanks for looking into this, anyway; I hope to be able to focus more on
> it shortly.
>
> jon

2016-04-27 14:29:05

by Grant Likely


Subject: Re: Kernel docs: muddying the waters a bit

On Tue, Apr 12, 2016 at 4:46 PM, Jonathan Corbet <[email protected]> wrote:
> On Fri, 8 Apr 2016 17:12:27 +0200
> Markus Heiser <[email protected]> wrote:
>
>> motivated by this thread, I implemented a toolchain to migrate the kernel’s
>> DocBook XML documentation to reST markup.
>>
>> It converts 99% of the docs well ... to get an impression of how the
>> kernel docs could benefit, visit my sphkerneldoc project page
>> on GitHub:
>>
>> http://return42.github.io/sphkerneldoc/
>
> So I've obviously been pretty quiet on this recently. Apologies...I've
> been dealing with an extended death-in-the-family experience, and there is
> still a fair amount of cleanup to be done.
>
> Looking quickly at this work, it seems similar to the results I got. But
> there's a lot of code there that came from somewhere? I'd put together a
> fairly simple conversion using pandoc and a couple of short sed scripts;
> is there a reason for a more complex solution?
>
> Thanks for looking into this, anyway; I hope to be able to focus more on
> it shortly.

Hi Jon,

Thanks for digging into this. FWIW, here is my $0.02. I've been
working on restarting the devicetree specification, and after looking
at both reStructuredText and Asciidoc(tor) I thought I liked the
Asciidoc markup better, so chose that. I then proceeded to spend weeks
trying to get reasonable output from the toolchain. When I got fed up
and gave Sphinx a try, I was up and running with reasonable PDF and
HTML output in a day and a half.

Honestly, in the end I think we could make either tool do what is
needed of it. However, my impression after trying to do a document
that needs to have nice publishable output with both tools is that
Sphinx is easier to work with, simpler to extend, and better supported. My
vote is firmly behind Sphinx/reStructuredText.

g.