Path lookup is a very complex subject in the VFS. The path-lookup
document provides detailed guidance to help people understand
how path lookup works in the kernel. This document was originally
written based on three LWN articles five years ago. As time goes by,
some of the content has become outdated. This patchset is intended to
update the document to make it more relevant to the current codebase.
---
v1: https://lore.kernel.org/lkml/[email protected]/
v2: https://lore.kernel.org/lkml/[email protected]/
- Fix problems in v1 reviewed by Neil:
1. In Patch 01 and 02, write a new paragraph to describe step_into()
2. In Patch 01, instead of changing it to traverse_mounts, remove follow_managed()
3. In Patch 03, retell the story rather than adding notes
4. In Patch 04, do_open() should be outside of the loop; fix it and fix other problems
in the following paragraph
5. In Patch 07, use "drop out of RCU-walk"
6. In Patch 08, "latter" should be "later"; fix it and restructure the next paragraph,
removing "Finally"
v3:
- Fix problems in v2 according to Neil's review:
1. Fix minor problems in Patch 1,2,3,8,9,11,12
2. In Patch 4, remove a redundant paragraph, condense information, and
make the transitions between paragraphs smoother.
3. In Patch 10, fix the WALK_NOFOLLOW and WALK_MORE descriptions and
document WALK_TRAILING
- As suggested by Matthew Wilcox and Jonathan Corbet, remove the ``...``
literals around function names in this patchset. I put this in a
standalone patch (Patch 13) because the automarkup extension doesn't
work on my side. You can choose whether or not to take it.
To help with review, I've put a compiled HTML version here:
http://linux-docs.54fox.com/linux_docs/filesystems/path-lookup-v3.html
and highlighted the changes using Hypothesis:
https://hyp.is/go?url=http%3A%2F%2Flinux-docs.54fox.com%2Flinux_docs%2Ffilesystems%2Fpath-lookup-v3.html&group=__world__
Fox Chen (13):
docs: path-lookup: update follow_managed() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update do_last() part
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: no get_link()
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update symlink description
docs: path-lookup: use bare function() rather than literals
Documentation/filesystems/path-lookup.rst | 194 ++++++++++------------
1 file changed, 85 insertions(+), 109 deletions(-)
--
2.31.1
As suggested by Matthew Wilcox and Jonathan Corbet, drop the ``...``
literals around function names in this patchset.
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 70 +++++++++++------------
1 file changed, 35 insertions(+), 35 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 9ac742530e46..33d58fca662a 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -450,13 +450,13 @@ If that doesn't get a good result, it calls "``lookup_slow()``" which
takes ``i_rwsem``, rechecks the cache, and then asks the filesystem
to find a definitive answer.
-As the last step of ``walk_component()``, ``step_into()`` will be called either
+As the last step of walk_component(), step_into() will be called either
directly from walk_component() or from handle_dots(). It calls
-``handle_mounts()``, to check and handle mount points, in which a new
+handle_mounts(), to check and handle mount points, in which a new
``struct path`` is created containing a counted reference to the new dentry and
a reference to the new ``vfsmount`` which is only counted if it is
different from the previous ``vfsmount``. Then if there is
-a symbolic link, ``step_into()`` calls ``pick_link()`` to deal with it,
+a symbolic link, step_into() calls pick_link() to deal with it,
otherwise it installs the new ``struct path`` in the ``struct nameidata``, and
drops the unneeded references.
@@ -472,8 +472,8 @@ Handling the final component
``nd->last_type`` to refer to the final component of the path. It does
not call ``walk_component()`` that last time. Handling that final
component remains for the caller to sort out. Those callers are
-``path_lookupat()``, ``path_parentat()`` and
-``path_openat()`` each of which handles the differing requirements of
+path_lookupat(), path_parentat() and
+path_openat() each of which handles the differing requirements of
different system calls.
``path_parentat()`` is clearly the simplest - it just wraps a little bit
@@ -489,17 +489,17 @@ object is wanted such as by ``stat()`` or ``chmod()``. It essentially just
calls ``walk_component()`` on the final component through a call to
``lookup_last()``. ``path_lookupat()`` returns just the final dentry.
It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set,
-``path_lookupat()`` will unset LOOKUP_JUMPED in nameidata so that in the
-subsequent path traversal ``d_weak_revalidate()`` won't be called.
+path_lookupat() will unset LOOKUP_JUMPED in nameidata so that in the
+subsequent path traversal d_weak_revalidate() won't be called.
This is important when unmounting a filesystem that is inaccessible, such as
one provided by a dead NFS server.
Finally ``path_openat()`` is used for the ``open()`` system call; it
-contains, in support functions starting with "``open_last_lookups()``", all the
+contains, in support functions starting with "open_last_lookups()", all the
complexity needed to handle the different subtleties of O_CREAT (with
or without O_EXCL), final "``/``" characters, and trailing symbolic
links. We will revisit this in the final part of this series, which
-focuses on those symbolic links. "``open_last_lookups()``" will sometimes, but
+focuses on those symbolic links. "open_last_lookups()" will sometimes, but
not always, take ``i_rwsem``, depending on what it finds.
Each of these, or the functions which call them, need to be alert to
@@ -651,9 +651,9 @@ RCU-walk finds it cannot stop gracefully, it simply gives up and
restarts from the top with REF-walk.
This pattern of "try RCU-walk, if that fails try REF-walk" can be
-clearly seen in functions like ``filename_lookup()``,
-``filename_parentat()``,
-``do_filp_open()``, and ``do_file_open_root()``. These four
+clearly seen in functions like filename_lookup(),
+filename_parentat(),
+do_filp_open(), and do_file_open_root(). These four
correspond roughly to the three ``path_*()`` functions we met earlier,
each of which calls ``link_path_walk()``. The ``path_*()`` functions are
called using different mode flags until a mode is found which works.
@@ -1069,8 +1069,8 @@ all the data structures it references are safe to be accessed while
holding no counted reference, only the RCU lock. A callback
``struct delayed_called`` will be passed to ``->get_link()``:
file systems can set their own put_link function and argument through
-``set_delayed_call()``. Later on, when VFS wants to put link, it will call
-``do_delayed_call()`` to invoke that callback function with the argument.
+set_delayed_call(). Later on, when VFS wants to put link, it will call
+do_delayed_call() to invoke that callback function with the argument.
In order for the reference to each symlink to be dropped when the walk completes,
whether in RCU-walk or REF-walk, the symlink stack needs to contain,
@@ -1103,7 +1103,7 @@ doesn't need to notice. Getting this ``name`` variable on and off the
stack is very straightforward; pushing and popping the references is
a little more complex.
-When a symlink is found, ``walk_component()`` calls ``pick_link()`` via ``step_into()``
+When a symlink is found, walk_component() calls pick_link() via step_into()
which returns the link from the filesystem.
Providing that operation is successful, the old path ``name`` is placed on the
stack, and the new value is used as the ``name`` for a while. When the end of
@@ -1136,10 +1136,10 @@ Symlinks with no final component
A pair of special-case symlinks deserve a little further explanation.
Both result in a new ``struct path`` (with mount and dentry) being set
-up in the ``nameidata``, and result in ``pick_link()`` returning ``NULL``.
+up in the ``nameidata``, and result in pick_link() returning ``NULL``.
The more obvious case is a symlink to "``/``". All symlinks starting
-with "``/``" are detected in ``pick_link()`` which resets the ``nameidata``
+with "``/``" are detected in pick_link() which resets the ``nameidata``
to point to the effective filesystem root. If the symlink only
contains "``/``" then there is nothing more to do, no components at all,
so ``NULL`` is returned to indicate that the symlink can be released and
@@ -1157,9 +1157,9 @@ target file, not just the name of it. When you ``readlink`` these
objects you get a name that might refer to the same file - unless it
has been unlinked or mounted over. When ``walk_component()`` follows
one of these, the ``->get_link()`` method in "procfs" doesn't return
-a string name, but instead calls ``nd_jump_link()`` which updates the
+a string name, but instead calls nd_jump_link() which updates the
``nameidata`` in place to point to that target. ``->get_link()`` then
-returns ``NULL``. Again there is no final component and ``pick_link()``
+returns ``NULL``. Again there is no final component and pick_link()
returns ``NULL``.
Following the symlink in the final component
@@ -1177,35 +1177,35 @@ potentially need to call ``link_path_walk()`` again and again on
successive symlinks until one is found that doesn't point to another
symlink.
-This case is handled by relevant callers of ``link_path_walk()``, such as
-``path_lookupat()``, ``path_openat()`` using a loop that calls ``link_path_walk()``,
-and then handles the final component by calling ``open_last_lookups()`` or
-``lookup_last()``. If it is a symlink that needs to be followed,
-``open_last_lookups()`` or ``lookup_last()`` will set things up properly and
+This case is handled by relevant callers of link_path_walk(), such as
+path_lookupat(), path_openat() using a loop that calls link_path_walk(),
+and then handles the final component by calling open_last_lookups() or
+lookup_last(). If it is a symlink that needs to be followed,
+open_last_lookups() or lookup_last() will set things up properly and
return the path so that the loop repeats, calling
-``link_path_walk()`` again. This could loop as many as 40 times if the last
+link_path_walk() again. This could loop as many as 40 times if the last
component of each symlink is another symlink.
Of the various functions that examine the final component,
-``open_last_lookups()`` is the most interesting as it works in tandem
-with ``do_open()`` for opening a file. Part of ``open_last_lookups()`` runs
-with ``i_rwsem`` held and this part is in a separate function: ``lookup_open()``.
+open_last_lookups() is the most interesting as it works in tandem
+with do_open() for opening a file. Part of open_last_lookups() runs
+with ``i_rwsem`` held and this part is in a separate function: lookup_open().
-Explaining ``open_last_lookups()`` and ``do_open()`` completely is beyond the scope
+Explaining open_last_lookups() and do_open() completely is beyond the scope
of this article, but a few highlights should help those interested in exploring
the code.
-1. Rather than just finding the target file, ``do_open()`` is used after
- ``open_last_lookup()`` to open
+1. Rather than just finding the target file, do_open() is used after
+ open_last_lookup() to open
it. If the file was found in the dcache, then ``vfs_open()`` is used for
this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
directly. In the later case the actual "open" of this newly found or
- created file will be performed by ``vfs_open()``, just as if the name
+ created file will be performed by vfs_open(), just as if the name
were found in the dcache.
-2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
+2. vfs_open() can fail with ``-EOPENSTALE`` if the cached information
wasn't quite current enough. If it's in RCU-walk ``-ECHILD`` will be returned
otherwise ``-ESTALE`` is returned. When ``-ESTALE`` is returned, the caller may
retry with ``LOOKUP_REVAL`` flag set.
@@ -1218,8 +1218,8 @@ the code.
will create a file called ``/tmp/bar``. This is not permitted if
``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
- like for a non-creating open: ``lookup_last()`` or ``open_last_lookup()``
- returns a non ``NULL`` value, and ``link_path_walk()`` gets called and the
+ like for a non-creating open: lookup_last() or open_last_lookup()
+ returns a non ``NULL`` value, and link_path_walk() gets called and the
open process continues on the symlink that was found.
Updating the access time
--
2.31.1
No path_to_nameidata() anymore; step_into() will be called instead.
Related commit: commit c99687a03a78 ("fold path_to_nameidata()
into its only remaining caller")
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 751082d469e8..6ea0880fb982 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -453,11 +453,12 @@ to find a definitive answer.
As the last step of ``walk_component()``, ``step_into()`` will be called either
directly from walk_component() or from handle_dots(). It calls
``handle_mounts()``, to check and handle mount points, in which a new
-``struct path`` containing a counted reference to the new dentry and a
-reference to the new ``vfsmount`` which is only counted if it is
-different from the previous ``vfsmount``. It then calls
-``path_to_nameidata()`` to install the new ``struct path`` in the
-``struct nameidata`` and drop the unneeded references.
+``struct path`` is created containing a counted reference to the new dentry and
+a reference to the new ``vfsmount`` which is only counted if it is
+different from the previous ``vfsmount``. Then if there is
+a symbolic link, ``step_into()`` calls ``pick_link()`` to deal with it,
+otherwise it installs the new ``struct path`` in the ``struct nameidata``, and
+drops the unneeded references.
This "hand-over-hand" sequencing of getting a reference to the new
dentry before dropping the reference to the previous dentry may
--
2.31.1
On Thu, 27 May 2021, Fox Chen wrote:
> The Path lookup is a very complex subject in VFS. The path-lookup
> document provides a very detailed guidance to help people understand
> how path lookup works in the kernel. This document was originally
> written based on three lwn articles five years ago. As times goes by,
> some of the content is outdated. This patchset is intended to update
> the document to make it more relevant to current codebase.
>
Thanks for persisting. Sorry for the delay.
All:
Reviewed-by: NeilBrown <[email protected]>
I've noted a couple of little issues with one patch. Hopefully Jon can
simply fix those up rather than requiring a resubmission of the whole
series.
To be honest, I haven't examined patch 4 in as much detail as I'd like,
and it required the biggest change since last time. But I think it is
good enough. It might even be excellent.
NeilBrown
On Fri, Jun 18, 2021 at 6:36 AM NeilBrown <[email protected]> wrote:
>
> On Thu, 27 May 2021, Fox Chen wrote:
> > The Path lookup is a very complex subject in VFS. The path-lookup
> > document provides a very detailed guidance to help people understand
> > how path lookup works in the kernel. This document was originally
> > written based on three lwn articles five years ago. As times goes by,
> > some of the content is outdated. This patchset is intended to update
> > the document to make it more relevant to current codebase.
> >
>
> Thanks for persisting. Sorry for the delay.
Thanks for the review. :D
> All:
> Reviewed-by: NeilBrown <[email protected]>
>
> I've noted a couple of little issues with one patch. Hopefully Jon can
> simply fix those up rather than requiring a resubmission of the whole
> series.
If needed, I can resubmit just this single patch.
> To be honest, I haven't examined patch 4 in as much detail as I'd like,
> and it required the biggest change since last time. But I think it is
> good enough. It might even be excellent.
>
> NeilBrown
thanks,
fox
Fox Chen <[email protected]> writes:
> The Path lookup is a very complex subject in VFS. The path-lookup
> document provides a very detailed guidance to help people understand
> how path lookup works in the kernel. This document was originally
> written based on three lwn articles five years ago. As times goes by,
> some of the content is outdated. This patchset is intended to update
> the document to make it more relevant to current codebase.
OK, I have applied this set. I took the liberty of making the changes
suggested by Neil to patch 10.
Thanks for doing this work, and thanks to Neil for reviewing it!
jon