2021-05-10 10:40:26

by Mauro Carvalho Chehab

Subject: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

While UTF-8 characters can be used in the Linux documentation,
it is best to use them only when ASCII doesn't offer a good replacement.
So, replace the occurrences of the following UTF-8 characters:

- U+2014 ('—'): EM DASH

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
---
Documentation/dev-tools/testing-overview.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
index b5b46709969c..8adffc26a2ec 100644
--- a/Documentation/dev-tools/testing-overview.rst
+++ b/Documentation/dev-tools/testing-overview.rst
@@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
groups of tests easier, as well as providing helpers to aid in writing new
tests.

-If you're looking to verify the behaviour of the Kernel — particularly specific
-parts of the kernel — then you'll want to use KUnit or kselftest.
+If you're looking to verify the behaviour of the Kernel - particularly specific
+parts of the kernel - then you'll want to use KUnit or kselftest.


The Difference Between KUnit and kselftest
--
2.30.2


2021-05-10 11:51:45

by Marco Elver

Subject: Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

On Mon, 10 May 2021 at 12:27, Mauro Carvalho Chehab
<[email protected]> wrote:
>
> While UTF-8 characters can be used in the Linux documentation,
> it is best to use them only when ASCII doesn't offer a good replacement.
> So, replace the occurrences of the following UTF-8 characters:
>
> - U+2014 ('—'): EM DASH
>
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> ---
> Documentation/dev-tools/testing-overview.rst | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> index b5b46709969c..8adffc26a2ec 100644
> --- a/Documentation/dev-tools/testing-overview.rst
> +++ b/Documentation/dev-tools/testing-overview.rst
> @@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
> groups of tests easier, as well as providing helpers to aid in writing new
> tests.
>
> -If you're looking to verify the behaviour of the Kernel — particularly specific
> -parts of the kernel — then you'll want to use KUnit or kselftest.
> +If you're looking to verify the behaviour of the Kernel - particularly specific
> +parts of the kernel - then you'll want to use KUnit or kselftest.

Single dash is incorrect punctuation here. So that Sphinx gives us the
correct em dash, these should be '--'.

Thanks,
-- Marco

2021-05-10 23:37:36

by David Gow

Subject: Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

On Mon, May 10, 2021 at 6:27 PM Mauro Carvalho Chehab
<[email protected]> wrote:
>
> While UTF-8 characters can be used in the Linux documentation,
> it is best to use them only when ASCII doesn't offer a good replacement.
> So, replace the occurrences of the following UTF-8 characters:
>
> - U+2014 ('—'): EM DASH
>
> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> ---

Oh dear, I do have a habit of overusing em-dashes. I've no problem in
theory with exchanging them for an ASCII approximation.
I suppose there's a reason it's the one dash to rule them all: :-)
https://twitter.com/FakeUnicode/status/727888721312260096/photo/1

> Documentation/dev-tools/testing-overview.rst | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> index b5b46709969c..8adffc26a2ec 100644
> --- a/Documentation/dev-tools/testing-overview.rst
> +++ b/Documentation/dev-tools/testing-overview.rst
> @@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
> groups of tests easier, as well as providing helpers to aid in writing new
> tests.
>
> -If you're looking to verify the behaviour of the Kernel — particularly specific
> -parts of the kernel — then you'll want to use KUnit or kselftest.
> +If you're looking to verify the behaviour of the Kernel - particularly specific
> +parts of the kernel - then you'll want to use KUnit or kselftest.

As Marco pointed out, having multiple HYPHEN-MINUS symbols in a row is
probably a better replacement, as it does distinguish the em-dash from
smaller dashes better.
However, I need three for sphinx to output an em-dash here (2 hyphens
only gives me an en-dash).

So, if we want to get rid of the UTF-8 em-dash, my preferences would
be (in descending order):
1. Three hyphens: '---' (sphinx generates an em-dash)
2. Two hyphens: '--' (worst case, an en-dash surrounded by spaces --
as sphinx generates for me -- is still readable, and it's still
readable as an em-dash in plain text)
3. One hyphen as in this patch (which I don't like as much, but will
no doubt learn to live with)

But it looks like you've got several similar comments on other patches
in this series, so I'm happy for you to use whatever ends up being
agreed upon generally.

Cheers,
-- David

2021-05-12 08:16:22

by Mauro Carvalho Chehab

Subject: Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

Em Tue, 11 May 2021 07:35:29 +0800
David Gow <[email protected]> escreveu:

> On Mon, May 10, 2021 at 6:27 PM Mauro Carvalho Chehab
> <[email protected]> wrote:
> >
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> > - U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> > ---
>
> Oh dear, I do have a habit of overusing em-dashes. I've no problem in
> theory with exchanging them for an ASCII approximation.
> I suppose there's a reason it's the one dash to rule them all: :-)
> https://twitter.com/FakeUnicode/status/727888721312260096/photo/1

No, there's no such rule, although there's a preference to keep
the texts easy to edit/read as text files[1]. The main rationale for
this series is that the conversion from other formats to ReST ended
up introducing a lot of UTF-8 noise.

[1] IMO, it is best to use UTF-8 characters for symbols that
aren't properly represented in ASCII, such as accented Latin
letters, Greek letters, etc.
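
To get a feel for how much "UTF-8 noise" a file carries, something like the sketch below can list the non-ASCII characters in a piece of text, in the same "U+XXXX ('c'): NAME" form used in the patch description. The function name non_ascii_report and the sample string are made up for illustration; this is not the tool used to prepare the series.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text):
    """Return one summary line per distinct non-ASCII character in text."""
    # Count every character outside the 7-bit ASCII range.
    counts = Counter(ch for ch in text if ord(ch) > 127)
    lines = []
    for ch, n in sorted(counts.items()):
        # Fall back to a placeholder for code points without a Unicode name.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} ('{ch}'): {name} x{n}")
    return lines


# Hypothetical sample, modelled on the sentence touched by the patch.
sample = "the Kernel \u2014 particularly specific parts of the kernel \u2014 then"
for line in non_ascii_report(sample):
    print(line)
```

Characters such as accented letters would show up in the same report, so the output still needs a human decision about which ones have a good ASCII replacement.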

In the specific case of dashes, you can use:

"--" for EN DASH
"---" for EM DASH

Those will automatically be translated by Sphinx when building
the docs. Using ASCII there usually makes life simpler for
developers whose editors can't easily type EN/EM DASH.

Btw, Sphinx will also automatically replace straight quotes with
curly quotes in its output (except in literal blocks).
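
As a rough illustration of the dash mapping described above, the sketch below mimics it with plain string replacement. It is a simplified stand-in, not the actual Sphinx/docutils code: render_dashes is a hypothetical name, and the sketch ignores literal blocks, escapes and quote handling.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def render_dashes(text):
    """Map '---' to EM DASH and '--' to EN DASH; leave single hyphens alone."""
    # Replace the longer run first, otherwise '---' would be consumed
    # as '--' followed by a stray '-'.
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


print(render_dashes("the Kernel --- particularly specific parts"))
print(render_dashes("pages 10--20"))
print(render_dashes("a well-known issue"))
```

With this mapping, the single hyphen used in the patch stays a plain hyphen in the output, which is why it does not render as a dash.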

Thanks,
Mauro

2021-05-12 08:29:51

by Mauro Carvalho Chehab

Subject: Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

Em Tue, 11 May 2021 07:35:29 +0800
David Gow <[email protected]> escreveu:

> On Mon, May 10, 2021 at 6:27 PM Mauro Carvalho Chehab
> <[email protected]> wrote:
> >
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> > - U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> > ---
>
> Oh dear, I do have a habit of overusing em-dashes. I've no problem in
> theory with exchanging them for an ASCII approximation.
> I suppose there's a reason it's the one dash to rule them all: :-)
> https://twitter.com/FakeUnicode/status/727888721312260096/photo/1
>
> > Documentation/dev-tools/testing-overview.rst | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> > index b5b46709969c..8adffc26a2ec 100644
> > --- a/Documentation/dev-tools/testing-overview.rst
> > +++ b/Documentation/dev-tools/testing-overview.rst
> > @@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
> > groups of tests easier, as well as providing helpers to aid in writing new
> > tests.
> >
> > -If you're looking to verify the behaviour of the Kernel — particularly specific
> > -parts of the kernel — then you'll want to use KUnit or kselftest.
> > +If you're looking to verify the behaviour of the Kernel - particularly specific
> > +parts of the kernel - then you'll want to use KUnit or kselftest.
>
> As Marco pointed out, having multiple HYPHEN-MINUS symbols in a row is
> probably a better replacement, as it does distinguish the em-dash from
> smaller dashes better.
> However, I need three for sphinx to output an em-dash here (2 hyphens
> only gives me an en-dash).
>
> So, if we want to get rid of the UTF-8 em-dash, my preferences would
> be (in descending order):
> 1. Three hyphens: '---' (sphinx generates an em-dash)
> 2. Two hyphens: '--' (worst case, an en-dash surrounded by spaces --
> as sphinx generates for me -- is still readable, and it's still
> readable as an em-dash in plain text)
> 3. One hyphen as in this patch (which I don't like as much, but will
> no doubt learn to live with)
>
> But it looks like you've got several similar comments on other patches
> in this series, so I'm happy for you to use whatever ends up being
> agreed upon generally.

Yeah, from the comments I received so far, it seems that most developers
want to use '---' for EM DASH and '--' for EN DASH, typing it as ASCII
instead of using U+<number> as this is easier on most editors.

Yet, my understanding is that we don't have a consensus in that
regard, as some patches I sent using a single hyphen were
accepted/reviewed/acked.

So, I sent (and it was already applied) a small patch series (/5)
fixing the cases where UTF-8 chars (including DASH) were added
by mistake (probably due to some conversion tool).

For the remaining issues, my plan is to split this series into two
parts:

The first one with the non-controversial UTF-8 changes, and a second one
with just EM/EN DASH, using '---' to replace EM DASH and '--' to replace
EN DASH, so that the produced HTML/LaTeX/PDF docs won't change.

This should make it easier to discuss the EM/EN DASH changes on
each patch, and to see whether the above default is the best fit for a
particular use case.
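
The claim that the produced output won't change can be sanity-checked with a small round trip: convert the UTF-8 dashes to their ASCII spellings, then apply the '---'/'--' mapping again and compare with the original. The sketch below assumes a simplified model of that mapping; to_ascii_dashes and render_dashes are hypothetical helpers, not part of Sphinx or of this series.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def to_ascii_dashes(text):
    """Spell EM DASH as '---' and EN DASH as '--' in the source text."""
    return text.replace(EM_DASH, "---").replace(EN_DASH, "--")


def render_dashes(text):
    """Simplified model of the output side: '---' -> EM DASH, '--' -> EN DASH."""
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


original = (
    "If you're looking to verify the behaviour of the Kernel \u2014 particularly "
    "specific parts of the kernel \u2014 then you'll want to use KUnit or kselftest."
)
ascii_source = to_ascii_dashes(original)

# The ASCII source no longer contains any non-ASCII character ...
assert ascii_source.isascii()
# ... and rendering it gives back the original text.
assert render_dashes(ascii_source) == original
```

The round trip only holds when the text has no pre-existing '--' runs outside literal blocks and no dash directly adjacent to a hyphen, so each file would still need a look.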

Thanks,
Mauro

2021-05-12 08:53:47

by Mauro Carvalho Chehab

Subject: Re: [PATCH 39/53] docs: dev-tools: testing-overview.rst: avoid using UTF-8 chars

Em Mon, 10 May 2021 12:48:22 +0200
Marco Elver <[email protected]> escreveu:

> On Mon, 10 May 2021 at 12:27, Mauro Carvalho Chehab
> <[email protected]> wrote:
> >
> > While UTF-8 characters can be used in the Linux documentation,
> > it is best to use them only when ASCII doesn't offer a good replacement.
> > So, replace the occurrences of the following UTF-8 characters:
> >
> > - U+2014 ('—'): EM DASH
> >
> > Signed-off-by: Mauro Carvalho Chehab <[email protected]>
> > ---
> > Documentation/dev-tools/testing-overview.rst | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
> > index b5b46709969c..8adffc26a2ec 100644
> > --- a/Documentation/dev-tools/testing-overview.rst
> > +++ b/Documentation/dev-tools/testing-overview.rst
> > @@ -18,8 +18,8 @@ frameworks. These both provide infrastructure to help make running tests and
> > groups of tests easier, as well as providing helpers to aid in writing new
> > tests.
> >
> > -If you're looking to verify the behaviour of the Kernel — particularly specific
> > -parts of the kernel — then you'll want to use KUnit or kselftest.
> > +If you're looking to verify the behaviour of the Kernel - particularly specific
> > +parts of the kernel - then you'll want to use KUnit or kselftest.
>
> Single dash is incorrect punctuation here. So that Sphinx gives us the
> correct em dash, these should be '--'.

In Sphinx[1]:

-- is equivalent to EN DASH;
--- is equivalent to EM DASH.

[1] https://docutils.sourceforge.io/docs/user/smartquotes.html

I'll change this in the next spin.

Thanks,
Mauro