Hi,
a while back Matthew suggested adding automatic markup of known constants, like
NULL and error codes, as literals [1]. This is a great idea since those literals
are very frequently used throughout the documentation and are fixed names, so we
can easily match them without false positives and make the source files less
cluttered.
Patch 1 adds that feature to automarkup.py, while patch 2 removes the no longer
needed markup from the xarray.rst file which was given as an example.
Some things I'd like to discuss:
* As a first draft I added the constants I found on xarray.rst plus all error
codes from errno-base.h and errno.h. We don't need to start with everything,
but we can certainly do better than this. What are common constants that
should be added here? (Matthew mentioned ANSI C or POSIX, some reference to
the constants in those would be great, even more if they are easily parsable)
* The Literals list added is already a bit big with just the error codes, so
perhaps we should move them to a separate plain text file that is read on
startup?
* There was also mention of automatically converting uppercase words to
literals. I'm not so sure about that one so I left it out for now.
The example given was XA_MARK_0, which is a #define in xarray.h. The way
to go here is certainly to first use kernel-doc to get it included in the
documentation, and then cross-reference to it. FWICT at the moment kernel-doc
for defines should be done like for functions (but the parenthesis can be
omitted, although they show up in the output):
/**
* XA_MARK_0 - Brief description.
*
* Description of the type.
*/
At the current state, the cross-reference would then need to be done either
through plain :c:macro:`XA_MARK_0`, or, by relying on automarkup, typedef
XA_MARK_0 (which is not ideal since this isn't a typedef). Options for
improvements would be to add 'macro' as a possible prefix (so eg. macro
XA_MARK_0), or go all out and try to cross-reference on every uppercase word
like suggested. We should strive for what is the most natural to write, as
long as the regex isn't too broad.
Since automarkup.py is opt-out rather than opt-in, I think we should be extra
careful not to make the regexes too broad, even if in this case it relies on a
C symbol being present.
One other idea I had was that we could limit to all uppercase words as
long as it has an _ on it, since I don't expect we would get false positives
with that. The downside is that it might be very confusing to people why their
MACRONAME isn't being automatically cross-referenced to...
Thanks,
Nícolas
[1] https://lore.kernel.org/linux-doc/[email protected]/
Nícolas F. R. A. Prado (2):
docs: automarkup.py: Add literal markup of known constants
XArray: Remove literal markup from known constants
Documentation/core-api/xarray.rst | 48 +++++++++++++++---------------
Documentation/sphinx/automarkup.py | 45 ++++++++++++++++++++++++++--
2 files changed, 67 insertions(+), 26 deletions(-)
--
2.32.0
Remove literal markup from known constants, instead relying on
automarkup.py to make them into literals.
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Documentation/core-api/xarray.rst | 48 +++++++++++++++----------------
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
index a137a0e6d068..6e608e2e8e5b 100644
--- a/Documentation/core-api/xarray.rst
+++ b/Documentation/core-api/xarray.rst
@@ -22,7 +22,7 @@ The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
-``ULONG_MAX`` then the XArray is not the data type for you. The most
+ULONG_MAX then the XArray is not the data type for you. The most
important user of the XArray is the page cache.
Normal pointers may be stored in the XArray directly. They must be 4-byte
@@ -31,7 +31,7 @@ alloc_page(). It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.
-You can also store integers between 0 and ``LONG_MAX`` in the XArray.
+You can also store integers between 0 and LONG_MAX in the XArray.
You must first convert it into an entry using xa_mk_value().
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling xa_is_value(), and convert it back to
@@ -52,7 +52,7 @@ An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Storing to any index will store to all of them. Multi-index
-entries can be explicitly split into smaller entries, or storing ``NULL``
+entries can be explicitly split into smaller entries, or storing NULL
into any entry will cause the XArray to forget about the range.
Normal API
@@ -60,16 +60,16 @@ Normal API
Start by initialising an XArray, either with DEFINE_XARRAY()
for statically allocated XArrays or xa_init() for dynamically
-allocated ones. A freshly-initialised XArray contains a ``NULL``
+allocated ones. A freshly-initialised XArray contains a NULL
pointer at every index.
You can then set entries using xa_store() and get entries
using xa_load(). xa_store will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use xa_erase() instead of calling xa_store() with a
-``NULL`` entry. There is no difference between an entry that has never
+NULL a entry. There is no difference between an entry that has never
been stored to, one that has been erased and one that has most recently
-had ``NULL`` stored to it.
+had NULL stored to it.
You can conditionally replace an entry at an index by using
xa_cmpxchg(). Like cmpxchg(), it will only succeed if
@@ -78,8 +78,8 @@ which was at that index; if it returns the same entry which was passed as
'old', then xa_cmpxchg() succeeded.
If you want to only store a new entry to an index if the current entry
-at that index is ``NULL``, you can use xa_insert() which
-returns ``-EBUSY`` if the entry is not empty.
+at that index is NULL, you can use xa_insert() which
+returns -EBUSY if the entry is not empty.
You can copy entries out of the XArray into a plain array by calling
xa_extract(). Or you can iterate over the present entries in the XArray
@@ -97,14 +97,14 @@ some, but not all of the other indices changing.
Sometimes you need to ensure that a subsequent call to xa_store()
will not need to allocate memory. The xa_reserve() function
will store a reserved entry at the indicated index. Users of the
-normal API will see this entry as containing ``NULL``. If you do
+normal API will see this entry as containing NULL. If you do
not need to use the reserved entry, you can call xa_release()
to remove the unused entry. If another user has stored to the entry
in the meantime, xa_release() will do nothing; if instead you
-want the entry to become ``NULL``, you should use xa_erase().
+want the entry to become NULL, you should use xa_erase().
Using xa_insert() on a reserved entry will fail.
-If all entries in the array are ``NULL``, the xa_empty() function
+If all entries in the array are NULL, the xa_empty() function
will return ``true``.
Finally, you can remove all entries from an XArray by calling
@@ -120,7 +120,7 @@ Each mark may be set or cleared independently of the others. You can
iterate over marked entries by using the xa_for_each_marked() iterator.
You can enquire whether a mark is set on an entry by using
-xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it
+xa_get_mark(). If the entry is not NULL, you can set a mark on it
by using xa_set_mark() and remove the mark from an entry by calling
xa_clear_mark(). You can ask whether any entry in the XArray has a
particular mark set by calling xa_marked(). Erasing an entry from the
@@ -151,9 +151,9 @@ interrupts while allocating the ID.
Using xa_store(), xa_cmpxchg() or xa_insert() will
also mark the entry as being allocated. Unlike a normal XArray, storing
-``NULL`` will mark the entry as being in use, like xa_reserve().
+NULL will mark the entry as being in use, like xa_reserve().
To free an entry, use xa_erase() (or xa_release() if
-you only want to free the entry if it's ``NULL``).
+you only want to free the entry if it's NULL).
By default, the lowest free entry is allocated starting from 0. If you
want to allocate entries starting at 1, it is more efficient to use
@@ -326,11 +326,11 @@ xas_error() to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
-errors currently generated by the XArray code itself are ``ENOMEM`` and
-``EINVAL``, but it supports arbitrary errors in case you want to call
+errors currently generated by the XArray code itself are ENOMEM and
+EINVAL, but it supports arbitrary errors in case you want to call
xas_set_err() yourself.
-If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
+If the xa_state is holding an ENOMEM error, calling xas_nomem()
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
@@ -340,7 +340,7 @@ can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
-``ENOMEM``), then it will free the memory previously allocated.
+ENOMEM), then it will free the memory previously allocated.
Internal Entries
----------------
@@ -375,10 +375,10 @@ to xas_retry(), and retry the operation if it returns ``true``.
* - Zero
- xa_is_zero()
- - Zero entries appear as ``NULL`` through the Normal API, but occupy
+ - Zero entries appear as NULL through the Normal API, but occupy
an entry in the XArray which can be used to reserve the index for
future use. This is used by allocating XArrays for allocated entries
- which are ``NULL``.
+ which are NULL.
Other internal entries may be added in the future. As far as possible, they
will be handled by xas_retry().
@@ -461,9 +461,9 @@ You can create a multi-index entry by using XA_STATE_ORDER()
or xas_set_order() followed by a call to xas_store().
Calling xas_load() with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
-meaningful, potentially being an internal entry or ``NULL`` even when there
+meaningful, potentially being an internal entry or NULL even when there
is an entry stored within the range. Calling xas_find_conflict()
-will return the first entry within the range or ``NULL`` if there are no
+will return the first entry within the range or NULL if there are no
entries in the range. The xas_for_each_conflict() iterator will
iterate over every entry which overlaps the specified range.
@@ -479,8 +479,8 @@ Using xas_next() or xas_prev() with a multi-index xa_state is not
supported. Using either of these functions on a multi-index entry will
reveal sibling entries; these should be skipped over by the caller.
-Storing ``NULL`` into any index of a multi-index entry will set the
-entry at every index to ``NULL`` and dissolve the tie. A multi-index
+Storing NULL into any index of a multi-index entry will set the
+entry at every index to NULL and dissolve the tie. A multi-index
entry can be split into entries occupying smaller ranges by calling
xas_split_alloc() without the xa_lock held, followed by taking the lock
and calling xas_split().
--
2.32.0
There are some known constants that are used throughout the
documentation, like NULL and error codes, and which are better formatted
as literals by Sphinx. Make these words automatically literals.
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
---
Documentation/sphinx/automarkup.py | 45 ++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
index acf5473002f3..eb219783d9e3 100644
--- a/Documentation/sphinx/automarkup.py
+++ b/Documentation/sphinx/automarkup.py
@@ -72,6 +72,40 @@ Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
'socket' ]
+#
+# Words that are automatically converted to literals
+#
+Literals = [ 'NULL', 'ULONG_MAX', 'LONG_MAX', 'EPERM', 'ENOENT', 'ESRCH',
+ 'EINTR', 'EIO', 'ENXIO', 'E2BIG', 'ENOEXEC', 'EBADF', 'ECHILD',
+ 'EAGAIN', 'ENOMEM', 'EACCES', 'EFAULT', 'ENOTBLK', 'EBUSY',
+ 'EEXIST', 'EXDEV', 'ENODEV', 'ENOTDIR', 'EISDIR', 'EINVAL',
+ 'ENFILE', 'EMFILE', 'ENOTTY', 'ETXTBSY', 'EFBIG', 'ENOSPC',
+ 'ESPIPE', 'EROFS', 'EMLINK', 'EPIPE', 'EDOM', 'ERANGE', 'EDEADLK',
+ 'ENAMETOOLONG', 'ENOLCK', 'ENOSYS', 'ENOTEMPTY', 'ELOOP',
+ 'EWOULDBLOCK', 'ENOMSG', 'EIDRM', 'ECHRNG', 'EL2NSYNC', 'EL3HLT',
+ 'EL3RST', 'ELNRNG', 'EUNATCH', 'ENOCSI', 'EL2HLT', 'EBADE', 'EBADR',
+ 'EXFULL', 'ENOANO', 'EBADRQC', 'EBADSLT', 'EDEADLOCK', 'EBFONT',
+ 'ENOSTR', 'ENODATA', 'ETIME', 'ENOSR', 'ENONET', 'ENOPKG',
+ 'EREMOTE', 'ENOLINK', 'EADV', 'ESRMNT', 'ECOMM', 'EPROTO',
+ 'EMULTIHOP', 'EDOTDOT', 'EBADMSG', 'EOVERFLOW', 'ENOTUNIQ',
+ 'EBADFD', 'EREMCHG', 'ELIBACC', 'ELIBBAD', 'ELIBSCN', 'ELIBMAX',
+ 'ELIBEXEC', 'EILSEQ', 'ERESTART', 'ESTRPIPE', 'EUSERS', 'ENOTSOCK',
+ 'EDESTADDRREQ', 'EMSGSIZE', 'EPROTOTYPE', 'ENOPROTOOPT',
+ 'EPROTONOSUPPORT', 'ESOCKTNOSUPPORT', 'EOPNOTSUPP', 'EPFNOSUPPORT',
+ 'EAFNOSUPPORT', 'EADDRINUSE', 'EADDRNOTAVAIL', 'ENETDOWN',
+ 'ENETUNREACH', 'ENETRESET', 'ECONNABORTED', 'ECONNRESET', 'ENOBUFS',
+ 'EISCONN', 'ENOTCONN', 'ESHUTDOWN', 'ETOOMANYREFS', 'ETIMEDOUT',
+ 'ECONNREFUSED', 'EHOSTDOWN', 'EHOSTUNREACH', 'EALREADY',
+ 'EINPROGRESS', 'ESTALE', 'EUCLEAN', 'ENOTNAM', 'ENAVAIL', 'EISNAM',
+ 'EREMOTEIO', 'EDQUOT', 'ENOMEDIUM', 'EMEDIUMTYPE', 'ECANCELED',
+ 'ENOKEY', 'EKEYEXPIRED', 'EKEYREVOKED', 'EKEYREJECTED',
+ 'EOWNERDEAD', 'ENOTRECOVERABLE', 'ERFKILL', 'EHWPOISON' ]
+
+#
+# Any of the words in Literals, optionally prefixed with a '-'
+#
+RE_literal = re.compile(r'-?\b(' + str(r'|'.join(Literals)) + r')\b')
+
c_namespace = ''
def markup_refs(docname, app, node):
@@ -83,14 +117,18 @@ def markup_refs(docname, app, node):
#
markup_func_sphinx2 = {RE_doc: markup_doc_ref,
RE_function: markup_c_ref,
- RE_generic_type: markup_c_ref}
+ RE_generic_type: markup_c_ref,
+ RE_literal: markup_literal,
+ }
markup_func_sphinx3 = {RE_doc: markup_doc_ref,
RE_function: markup_func_ref_sphinx3,
RE_struct: markup_c_ref,
RE_union: markup_c_ref,
RE_enum: markup_c_ref,
- RE_typedef: markup_c_ref}
+ RE_typedef: markup_c_ref,
+ RE_literal: markup_literal,
+ }
if sphinx.version_info[0] >= 3:
markup_func = markup_func_sphinx3
@@ -225,6 +263,9 @@ def markup_c_ref(docname, app, match):
return target_text
+def markup_literal(docname, app, match):
+ return nodes.literal('', match.group(0))
+
#
# Try to replace a documentation reference of the form Documentation/... with a
# cross reference to that page
--
2.32.0
Hi Nícolas,
Em Tue, 8 Jun 2021 22:43:06 -0300
Nícolas F. R. A. Prado <[email protected]> escreveu:
> Hi,
>
> a while back Matthew suggested adding automatic markup of known constants, like
> NULL and error codes, as literals [1]. This is a great idea since those literals
> are very frequently used throughout the documentation and are fixed names, so we
> can easily match them without false positives and make the source files less
> cluttered.
>
> Patch 1 adds that feature to automarkup.py, while patch 2 removes the no longer
> needed markup from the xarray.rst file which was given as an example.
>
> Some things I'd like to discuss:
>
> * As a first draft I added the constants I found on xarray.rst plus all error
> codes from errno-base.h and errno.h. We don't need to start with everything,
> but we can certainly do better than this. What are common constants that
> should be added here? (Matthew mentioned ANSI C or POSIX, some reference to
> the constants in those would be great, even more if they are easily parsable)
>
> * The Literals list added is already a bit big with just the error codes, so
> perhaps we should move them to a separate plain text file that is read on
> startup?
If we're willing to keep a long list, they should be parsed from files,
instead of directly included. For things like errno-base.h/errno.h, the
better would be if automarkup would read them directly from the header
file, as otherwise there will be a maintainance burden.
Yet, a regex like "-E[A-Z\d]+\b" would likely get all cases, but this
will produce false positives, like EXAMPLE, EXAMPLES and some other
non-trivial exceptions.
With regards to NULL, I would just use a trivial regex like: "\bNULL\b"
(but see below my notes about using "\b").
> * There was also mention of automatically converting uppercase words to
> literals. I'm not so sure about that one so I left it out for now.
>
> The example given was XA_MARK_0, which is a #define in xarray.h. The way
> to go here is certainly to first use kernel-doc to get it included in the
> documentation, and then cross-reference to it. FWICT at the moment kernel-doc
> for defines should be done like for functions (but the parenthesis can be
> omitted, although they show up in the output):
>
> /**
> * XA_MARK_0 - Brief description.
> *
> * Description of the type.
> */
>
> At the current state, the cross-reference would then need to be done either
> through plain :c:macro:`XA_MARK_0`, or, by relying on automarkup, typedef
> XA_MARK_0 (which is not ideal since this isn't a typedef). Options for
> improvements would be to add 'macro' as a possible prefix (so eg. macro
> XA_MARK_0), or go all out and try to cross-reference on every uppercase word
> like suggested. We should strive for what is the most natural to write, as
> long as the regex isn't too broad.
>
> Since automarkup.py is opt-out rather than opt-in, I think we should be extra
> careful not to make the regexes too broad, even if in this case it relies on a
> C symbol being present.
>
> One other idea I had was that we could limit to all uppercase words as
> long as it has an _ on it, since I don't expect we would get false positives
> with that. The downside is that it might be very confusing to people why their
> MACRONAME isn't being automatically cross-referenced to...
What it can be done would be to first check/apply cross-references. Only if it
fails, then replace everything in uppercase to literals.
There is a drawback, tough: this would cause texts in uppercase. We used to
have lots of them before ReST conversion. I won't doubt that some files
would have kept a few of them. So, in this case, the regex would need to
require at least one _, e. g. something like:
\b[\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*\b
Again, tests are needed in order to check if the regex won't get anything
that shouldn't be caught. So, I would grep the source codes in order to
check if the regexes won't bring false positives, e. g.:
$ git grep -E "\b[\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*\b" Documentation/
Btw, the "\b" at the end won't actually work, due to things like:
GLOBAL_GEN_STORAGE{0:3}
GAUDI_ENGINE_ID_*
If you take a look at the scripts/get_abi.pl that I wrote, I faced
the same problem there with \b. So, I ended implementing my own set
of \b:
my $start = qr {(^|\s|\() }x;
my $bondary = qr { ([,.:;\)\s]|\z) }x;
Using the above replacements for \b, I would do something like this to
double-check if the regex won't be producing false-positives:
$ for i in $(find Documentation/ -name '*.rst'); do perl -ne 'print "$1 \e[0;31m$2\e[0;37m $3\n" if /(^|\s|\()([\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*)([,.:;\)\s]|\z)/' $i; done
Btw, on a quick look, this specific regex:
(^|\s|\()([\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*)([,.:;\)\s]|\z)
Seems to be producing just one false positive: a sequence of _, e. g.:
\b\_+\b
While we could make the regex more complex, I would just test if the second
match is '^\_+$', skipping it if found. On other words, a python-equivalent
to this:
$ for i in $(find Documentation/ -name '*.rst'); do perl -ne 'print "$2\n" if /(^|\s|\()([\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*)([,.:;\)\s]|\z)/' $i|grep -vE "^\_+$"; done|sort|uniq
-
That's said, it is worth to also mention the false negatives:
$ for i in $(find Documentation/ -name '*.rst'); do perl -ne 'print "$2\n" if /(^|\s|\()([\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*)([,.:;\)\s]|\z)/' $i|grep -vE "^\_+$"; done|sort|uniq >OK_list
$ for i in $(find Documentation/ -name '*.rst'); do perl -ne 'print "$2\n" if /(^|\s|\()([\-]?[A-Z_][A-Z\d_]*)([,.:;\)\s]|\z)/' $i|grep -vE " "; done|sort|uniq >FULL_list
$ grep -v -f OK_list FULL_list >excluded_list
$ wc -l OK_list FULL_list excluded_list
7398 OK_list
14487 FULL_list
7520 excluded_list
29405 total
There are a lot of constants that won't be parsed if we require at least
one '_'. Looking at the excluded_list, indeed there are things there
which should be excluded, like EXAMPLE, EXAMPLES, EXCEPTION, WITH, WITHIN,
WITHOUT..., but also there are a several matches that are constants,
like KERNELRELEASE, DISCONTIGMEM, SIOCGIFINDEX, SIGALRM, SETREGSET,
CDROMREADCOOKED, CDROMREADRAW, ...
I doubt that there would be an easy way to handle such cases, as
a file with a ~7000 constants would be a maintenance nightmare.
---
In summary, my suggestion is that we should stay out of having a big list
of constants. So, I would start with:
1. a regex to get FOO_BAR cases, like:
(^|\s|\()([\-]?[A-Z_][A-Z\d_]*_[A-Z\d_]*)([,.:;\)\s]|\z)
with a test to exclude this pattern:
^\_+$
2. a logic that will pick and use errno codes from the errno*.h
files;
3. an specific handler handler for the NULL special case.
This would get 7000+ different constants, which seems a very good
start. If needed, later we could get a few other relevant constants
by checking the most-used ones with something like:
for i in $(find Documentation/ -name '*.rst'); do perl -ne 'print "$2\n" if /(^|\s|\()([\-]?[A-Z_][A-Z\d_]*)([,.:;\)\s]|\z)/' $i|grep -v "_"; done|sort|uniq -c|sort -n
in order to grab the most common ones.
Regards,
Mauro
Em Tue, 8 Jun 2021 22:43:08 -0300
Nícolas F. R. A. Prado <[email protected]> escreveu:
> Remove literal markup from known constants, instead relying on
> automarkup.py to make them into literals.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Once we implement automarkup.py support, this patch seems OK on my eyes.
you can add my reviewed by: here on a series with the automarkup
logic.
> ---
> Documentation/core-api/xarray.rst | 48 +++++++++++++++----------------
> 1 file changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst
> index a137a0e6d068..6e608e2e8e5b 100644
> --- a/Documentation/core-api/xarray.rst
> +++ b/Documentation/core-api/xarray.rst
> @@ -22,7 +22,7 @@ The XArray implementation is efficient when the indices used are densely
> clustered; hashing the object and using the hash as the index will not
> perform well. The XArray is optimised for small indices, but still has
> good performance with large indices. If your index can be larger than
> -``ULONG_MAX`` then the XArray is not the data type for you. The most
> +ULONG_MAX then the XArray is not the data type for you. The most
> important user of the XArray is the page cache.
>
> Normal pointers may be stored in the XArray directly. They must be 4-byte
> @@ -31,7 +31,7 @@ alloc_page(). It isn't true for arbitrary user-space pointers,
> nor for function pointers. You can store pointers to statically allocated
> objects, as long as those objects have an alignment of at least 4.
>
> -You can also store integers between 0 and ``LONG_MAX`` in the XArray.
> +You can also store integers between 0 and LONG_MAX in the XArray.
> You must first convert it into an entry using xa_mk_value().
> When you retrieve an entry from the XArray, you can check whether it is
> a value entry by calling xa_is_value(), and convert it back to
> @@ -52,7 +52,7 @@ An unusual feature of the XArray is the ability to create entries which
> occupy a range of indices. Once stored to, looking up any index in
> the range will return the same entry as looking up any other index in
> the range. Storing to any index will store to all of them. Multi-index
> -entries can be explicitly split into smaller entries, or storing ``NULL``
> +entries can be explicitly split into smaller entries, or storing NULL
> into any entry will cause the XArray to forget about the range.
>
> Normal API
> @@ -60,16 +60,16 @@ Normal API
>
> Start by initialising an XArray, either with DEFINE_XARRAY()
> for statically allocated XArrays or xa_init() for dynamically
> -allocated ones. A freshly-initialised XArray contains a ``NULL``
> +allocated ones. A freshly-initialised XArray contains a NULL
> pointer at every index.
>
> You can then set entries using xa_store() and get entries
> using xa_load(). xa_store will overwrite any entry with the
> new entry and return the previous entry stored at that index. You can
> use xa_erase() instead of calling xa_store() with a
> -``NULL`` entry. There is no difference between an entry that has never
> +NULL a entry. There is no difference between an entry that has never
> been stored to, one that has been erased and one that has most recently
> -had ``NULL`` stored to it.
> +had NULL stored to it.
>
> You can conditionally replace an entry at an index by using
> xa_cmpxchg(). Like cmpxchg(), it will only succeed if
> @@ -78,8 +78,8 @@ which was at that index; if it returns the same entry which was passed as
> 'old', then xa_cmpxchg() succeeded.
>
> If you want to only store a new entry to an index if the current entry
> -at that index is ``NULL``, you can use xa_insert() which
> -returns ``-EBUSY`` if the entry is not empty.
> +at that index is NULL, you can use xa_insert() which
> +returns -EBUSY if the entry is not empty.
>
> You can copy entries out of the XArray into a plain array by calling
> xa_extract(). Or you can iterate over the present entries in the XArray
> @@ -97,14 +97,14 @@ some, but not all of the other indices changing.
> Sometimes you need to ensure that a subsequent call to xa_store()
> will not need to allocate memory. The xa_reserve() function
> will store a reserved entry at the indicated index. Users of the
> -normal API will see this entry as containing ``NULL``. If you do
> +normal API will see this entry as containing NULL. If you do
> not need to use the reserved entry, you can call xa_release()
> to remove the unused entry. If another user has stored to the entry
> in the meantime, xa_release() will do nothing; if instead you
> -want the entry to become ``NULL``, you should use xa_erase().
> +want the entry to become NULL, you should use xa_erase().
> Using xa_insert() on a reserved entry will fail.
>
> -If all entries in the array are ``NULL``, the xa_empty() function
> +If all entries in the array are NULL, the xa_empty() function
> will return ``true``.
>
> Finally, you can remove all entries from an XArray by calling
> @@ -120,7 +120,7 @@ Each mark may be set or cleared independently of the others. You can
> iterate over marked entries by using the xa_for_each_marked() iterator.
>
> You can enquire whether a mark is set on an entry by using
> -xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it
> +xa_get_mark(). If the entry is not NULL, you can set a mark on it
> by using xa_set_mark() and remove the mark from an entry by calling
> xa_clear_mark(). You can ask whether any entry in the XArray has a
> particular mark set by calling xa_marked(). Erasing an entry from the
> @@ -151,9 +151,9 @@ interrupts while allocating the ID.
>
> Using xa_store(), xa_cmpxchg() or xa_insert() will
> also mark the entry as being allocated. Unlike a normal XArray, storing
> -``NULL`` will mark the entry as being in use, like xa_reserve().
> +NULL will mark the entry as being in use, like xa_reserve().
> To free an entry, use xa_erase() (or xa_release() if
> -you only want to free the entry if it's ``NULL``).
> +you only want to free the entry if it's NULL).
>
> By default, the lowest free entry is allocated starting from 0. If you
> want to allocate entries starting at 1, it is more efficient to use
> @@ -326,11 +326,11 @@ xas_error() to retrieve the error. All operations check whether
> the xa_state is in an error state before proceeding, so there's no need
> for you to check for an error after each call; you can make multiple
> calls in succession and only check at a convenient point. The only
> -errors currently generated by the XArray code itself are ``ENOMEM`` and
> -``EINVAL``, but it supports arbitrary errors in case you want to call
> +errors currently generated by the XArray code itself are ENOMEM and
> +EINVAL, but it supports arbitrary errors in case you want to call
> xas_set_err() yourself.
>
> -If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
> +If the xa_state is holding an ENOMEM error, calling xas_nomem()
> will attempt to allocate more memory using the specified gfp flags and
> cache it in the xa_state for the next attempt. The idea is that you take
> the xa_lock, attempt the operation and drop the lock. The operation
> @@ -340,7 +340,7 @@ can try harder to allocate more memory. It will return ``true`` if it
> is worth retrying the operation (i.e. that there was a memory error *and*
> more memory was allocated). If it has previously allocated memory, and
> that memory wasn't used, and there is no error (or some error that isn't
> -``ENOMEM``), then it will free the memory previously allocated.
> +ENOMEM), then it will free the memory previously allocated.
>
> Internal Entries
> ----------------
> @@ -375,10 +375,10 @@ to xas_retry(), and retry the operation if it returns ``true``.
>
> * - Zero
> - xa_is_zero()
> - - Zero entries appear as ``NULL`` through the Normal API, but occupy
> + - Zero entries appear as NULL through the Normal API, but occupy
> an entry in the XArray which can be used to reserve the index for
> future use. This is used by allocating XArrays for allocated entries
> - which are ``NULL``.
> + which are NULL.
>
> Other internal entries may be added in the future. As far as possible, they
> will be handled by xas_retry().
> @@ -461,9 +461,9 @@ You can create a multi-index entry by using XA_STATE_ORDER()
> or xas_set_order() followed by a call to xas_store().
> Calling xas_load() with a multi-index xa_state will walk the
> xa_state to the right location in the tree, but the return value is not
> -meaningful, potentially being an internal entry or ``NULL`` even when there
> +meaningful, potentially being an internal entry or NULL even when there
> is an entry stored within the range. Calling xas_find_conflict()
> -will return the first entry within the range or ``NULL`` if there are no
> +will return the first entry within the range or NULL if there are no
> entries in the range. The xas_for_each_conflict() iterator will
> iterate over every entry which overlaps the specified range.
>
> @@ -479,8 +479,8 @@ Using xas_next() or xas_prev() with a multi-index xa_state is not
> supported. Using either of these functions on a multi-index entry will
> reveal sibling entries; these should be skipped over by the caller.
>
> -Storing ``NULL`` into any index of a multi-index entry will set the
> -entry at every index to ``NULL`` and dissolve the tie. A multi-index
> +Storing NULL into any index of a multi-index entry will set the
> +entry at every index to NULL and dissolve the tie. A multi-index
> entry can be split into entries occupying smaller ranges by calling
> xas_split_alloc() without the xa_lock held, followed by taking the lock
> and calling xas_split().
Thanks,
Mauro
Em Tue, 8 Jun 2021 22:43:07 -0300
Nícolas F. R. A. Prado <[email protected]> escreveu:
> There are some known constants that are used throughout the
> documentation, like NULL and error codes, and which are better formatted
> as literals by Sphinx. Make these words automatically literals.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
> ---
> Documentation/sphinx/automarkup.py | 45 ++++++++++++++++++++++++++++--
> 1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> index acf5473002f3..eb219783d9e3 100644
> --- a/Documentation/sphinx/automarkup.py
> +++ b/Documentation/sphinx/automarkup.py
> @@ -72,6 +72,40 @@ Skipfuncs = [ 'open', 'close', 'read', 'write', 'fcntl', 'mmap',
> 'select', 'poll', 'fork', 'execve', 'clone', 'ioctl',
> 'socket' ]
>
> +#
> +# Words that are automatically converted to literals
> +#
> +Literals = [ 'NULL', 'ULONG_MAX', 'LONG_MAX', 'EPERM', 'ENOENT', 'ESRCH',
> + 'EINTR', 'EIO', 'ENXIO', 'E2BIG', 'ENOEXEC', 'EBADF', 'ECHILD',
> + 'EAGAIN', 'ENOMEM', 'EACCES', 'EFAULT', 'ENOTBLK', 'EBUSY',
> + 'EEXIST', 'EXDEV', 'ENODEV', 'ENOTDIR', 'EISDIR', 'EINVAL',
> + 'ENFILE', 'EMFILE', 'ENOTTY', 'ETXTBSY', 'EFBIG', 'ENOSPC',
> + 'ESPIPE', 'EROFS', 'EMLINK', 'EPIPE', 'EDOM', 'ERANGE', 'EDEADLK',
> + 'ENAMETOOLONG', 'ENOLCK', 'ENOSYS', 'ENOTEMPTY', 'ELOOP',
> + 'EWOULDBLOCK', 'ENOMSG', 'EIDRM', 'ECHRNG', 'EL2NSYNC', 'EL3HLT',
> + 'EL3RST', 'ELNRNG', 'EUNATCH', 'ENOCSI', 'EL2HLT', 'EBADE', 'EBADR',
> + 'EXFULL', 'ENOANO', 'EBADRQC', 'EBADSLT', 'EDEADLOCK', 'EBFONT',
> + 'ENOSTR', 'ENODATA', 'ETIME', 'ENOSR', 'ENONET', 'ENOPKG',
> + 'EREMOTE', 'ENOLINK', 'EADV', 'ESRMNT', 'ECOMM', 'EPROTO',
> + 'EMULTIHOP', 'EDOTDOT', 'EBADMSG', 'EOVERFLOW', 'ENOTUNIQ',
> + 'EBADFD', 'EREMCHG', 'ELIBACC', 'ELIBBAD', 'ELIBSCN', 'ELIBMAX',
> + 'ELIBEXEC', 'EILSEQ', 'ERESTART', 'ESTRPIPE', 'EUSERS', 'ENOTSOCK',
> + 'EDESTADDRREQ', 'EMSGSIZE', 'EPROTOTYPE', 'ENOPROTOOPT',
> + 'EPROTONOSUPPORT', 'ESOCKTNOSUPPORT', 'EOPNOTSUPP', 'EPFNOSUPPORT',
> + 'EAFNOSUPPORT', 'EADDRINUSE', 'EADDRNOTAVAIL', 'ENETDOWN',
> + 'ENETUNREACH', 'ENETRESET', 'ECONNABORTED', 'ECONNRESET', 'ENOBUFS',
> + 'EISCONN', 'ENOTCONN', 'ESHUTDOWN', 'ETOOMANYREFS', 'ETIMEDOUT',
> + 'ECONNREFUSED', 'EHOSTDOWN', 'EHOSTUNREACH', 'EALREADY',
> + 'EINPROGRESS', 'ESTALE', 'EUCLEAN', 'ENOTNAM', 'ENAVAIL', 'EISNAM',
> + 'EREMOTEIO', 'EDQUOT', 'ENOMEDIUM', 'EMEDIUMTYPE', 'ECANCELED',
> + 'ENOKEY', 'EKEYEXPIRED', 'EKEYREVOKED', 'EKEYREJECTED',
> + 'EOWNERDEAD', 'ENOTRECOVERABLE', 'ERFKILL', 'EHWPOISON' ]
There are some arch-specific error codes missing there:
ECANCELLED
EINIT
EMAXERRNO
ENOSYM
EPROCLIM
EREFUSED
EREMDEV
EREMOTERELEASE
ERREMOTE
It sounds a nightmare to maintain this by hand, as a list of used
constants will only grow. IMO, an explicit list should be kept only to
with the absolute minimum, e.g. for highly-used constants that aren't
error codes nor FOO_BAR. The only case that occurs to me that fits
on this rule is 'NULL'.
As I said on patch 0/2, IMO the error codes should be parsed from the
relevant errno files:
./arch/powerpc/include/uapi/asm/errno.h
./arch/parisc/include/uapi/asm/errno.h
./arch/sparc/include/uapi/asm/errno.h
./arch/alpha/include/uapi/asm/errno.h
./arch/mips/include/uapi/asm/errno.h
./arch/mips/include/asm/errno.h
./include/uapi/asm-generic/errno-base.h
./include/uapi/asm-generic/errno.h
./include/linux/errno.h
Also, as pointed on patch 0/2 too, I would automatically parse all
uppercase with a _ on it, as all of them look constants on my eyes
(but only after checking at the references dictionary if they aren't
a cross-reference to a macro).
> +
> +#
> +# Any of the words in Literals, optionally prefixed with a '-'
> +#
> +RE_literal = re.compile(r'-?\b(' + str(r'|'.join(Literals)) + r')\b')
see my notes about "\b" on patch 0/2.
Regards,
Mauro
Mauro Carvalho Chehab <[email protected]> writes:
> It sounds a nightmare to maintain this by hand, as a list of used
> constants will only grow. IMO, an explicit list should be kept only to
> with the absolute minimum, e.g. for highly-used constants that aren't
> error codes nor FOO_BAR. The only case that occurs to me that fits
> on this rule is 'NULL'.
This is my concern as well. It seems to me like we would always have a
situation where some constants are magically marked up and others
aren't, and people will spend a lot of time trying to figure out why.
Might it not be better to just adopt the convention that these constants
don't need to be marked up at all? NULL is entirely understandable even
when presented in a proportional font. Seems like maybe the best of
both worlds?
Thanks,
jon
Em Sun, 13 Jun 2021 17:13:45 -0600
Jonathan Corbet <[email protected]> escreveu:
> Mauro Carvalho Chehab <[email protected]> writes:
>
> > It sounds a nightmare to maintain this by hand, as a list of used
> > constants will only grow. IMO, an explicit list should be kept only to
> > with the absolute minimum, e.g. for highly-used constants that aren't
> > error codes nor FOO_BAR. The only case that occurs to me that fits
> > on this rule is 'NULL'.
>
> This is my concern as well. It seems to me like we would always have a
> situation where some constants are magically marked up and others
> aren't, and people will spend a lot of time trying to figure out why.
Yeah, people wasting their time trying to fix it doesn't sound
good.
>
> Might it not be better to just adopt the convention that these constants
> don't need to be marked up at all? NULL is entirely understandable even
> when presented in a proportional font. Seems like maybe the best of
> both worlds?
I doubt we'll have a consensus here. IMO it is a lot easier for the
reader to have constants displayed on a different way. I guess other
developers may have similar opinions, while others won't care or would
think otherwise ;-)
Granted, the ones who care could always explicitly add ``FOO`` at
the rst file (we do that on media, as this was imported from our
old docbooks - and we try to preserve it on newer symbols), but I bet
that people will still do things like: "FOO", 'FOO' or ‘FOO’ in order
to distinguish at least some of them.
So, if we want consistency, I can see only two alternatives:
- a treewide patchset manually replacing the conts;
- use automarkup.py.
IMO, the latter is better. We can also agree that there won't be
an agreement, and keep as is ;-)
-
Assuming that this would be addressed by automarkup.py, there are
3 cases that automarkup.py can identify without much efforts:
1. uppercases with underscore;
2. error codes: Easy to parse the errno files and get the codes at
runtme;
3. NULL.
Dealing with the remaining cases, however, are a lot more complex, as
documents may have:
4. Simple uppercase texts:
THIS README FILE IS PROVIDED BY ADAPTEC AND CONTRIBUTORS ``AS IS`` AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, ANY
...
FILE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
(this one is at Documentation/scsi/aic79xx.rst, but there are
other variants of similar licensing style texts on other places)
5. uppercase acronyms, like DIMM, RAM, etc;
6. names of boards, manufacturers, etc;
7. syscalls, like:
SIOCDEVPRIVATE
SIOCETHTOOL
SIOCGHWTSTAMP
...
8. other constants
It should also be noticed that some files use uppercase letters for
variables like "A", "B" and "C". On those, using a different font
would make a lot easier for the readers.
It sounds to me that one size doesn't fit all. I can't see a
way to address it on a way that it would make easy to maintain
while keeping it fully transparent.
Maybe there's one alternative: automarkup could gain support to read
extra files with non-treewide regex rules.
For instance, a Documentation/foo/.automarkup could contain extra
regex rules for the files under Documentation/foo/.
Another possibility would be to have a ".. automarkup_file" tag.
This way, (1), (2) and (3) would be handled automatically,
treewide, but things like sockios.h (SIO* ioctls) would be there
only for network, for instance, using some file with the extra
regexes for automarkup stored under Documentation/networking.
Not sure how hard/easy would be to implement it, nor if one
directory can/should inherit extra regexes from its parent,
but, at a first glance, this seems to be a way to go.
Thanks,
Mauro