Path lookup is a very complex subject in the VFS. The path-lookup
document provides very detailed guidance to help people understand
how path lookup works in the kernel. This document was originally
written based on three LWN articles five years ago. As time goes by,
some of the content has become outdated. This patchset is intended to
update the document to make it more relevant to the current codebase.
---
v1: https://lore.kernel.org/lkml/[email protected]/
v2:
- Fix problems in v1 reviewed by Neil:
1. In Patch 01 and 02, rewrite a new paragraph to describe step_into()
2. In Patch 01, instead of changing it to traverse_mounts, remove follow_managed()
3. In Patch 03, retell the story rather than adding notes
4. In Patch 04, do_open() should be outside of the loop; fix it and fix other problems
in the following paragraph
5. In Patch 07, use "drop out of RCU-walk"
6. In Patch 08, "latter" should be "later"; fix it and restructure the next paragraph,
removing "Finally"
To help review, I've put a compiled html version here:
http://linux-docs.54fox.com/linux_docs/filesystems/path-lookup-v2.html
Fox Chen (12):
docs: path-lookup: update follow_managed() part
docs: path-lookup: update path_to_nameidata() part
docs: path-lookup: update path_mountpoint() part
docs: path-lookup: update do_last() part
docs: path-lookup: remove filename_mountpoint
docs: path-lookup: Add macro name to symlink limit description
docs: path-lookup: i_op->follow_link replaced with i_op->get_link
docs: path-lookup: update i_op->put_link and cookie description
docs: path-lookup: no get_link()
docs: path-lookup: update WALK_GET, WALK_PUT desc
docs: path-lookup: update get_link() ->follow_link description
docs: path-lookup: update symlink description
Documentation/filesystems/path-lookup.rst | 164 ++++++++++------------
1 file changed, 71 insertions(+), 93 deletions(-)
--
2.30.2
filename_mountpoint() no longer exists.
See commit 161aff1d93ab ("LOOKUP_MOUNTPOINT:
fold path_mountpointat() into path_lookupat()").
Without filename_mountpoint() and path_mountpoint(), the
numbers should be four and three:
"These four correspond roughly to the three path_*() functions"
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index a65cb477d524..66697db74955 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -652,9 +652,9 @@ restarts from the top with REF-walk.
This pattern of "try RCU-walk, if that fails try REF-walk" can be
clearly seen in functions like ``filename_lookup()``,
-``filename_parentat()``, ``filename_mountpoint()``,
-``do_filp_open()``, and ``do_file_open_root()``. These five
-correspond roughly to the four ``path_*()`` functions we met earlier,
+``filename_parentat()``,
+``do_filp_open()``, and ``do_file_open_root()``. These four
+correspond roughly to the three ``path_*()`` functions we met earlier,
each of which calls ``link_path_walk()``. The ``path_*()`` functions are
called using different mode flags until a mode is found which works.
They are first called with ``LOOKUP_RCU`` set to request "RCU-walk". If
--
2.30.2
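For readers following along, the "try RCU-walk, if that fails try REF-walk"
pattern that this paragraph describes can be sketched in userspace C. This is
only a model under stated assumptions: the MODEL_* flag values, the
model_walk_fn type and both functions are made up for illustration and are not
the kernel's definitions. Only the use of -ECHILD and -ESTALE as the "retry"
signals follows the document.

```c
#include <errno.h>

/* Illustrative flag values only; the kernel's real values differ. */
#define MODEL_LOOKUP_RCU   0x1u
#define MODEL_LOOKUP_REVAL 0x2u

/* Hypothetical stand-in for a path_*() function: returns 0 or -errno. */
typedef int (*model_walk_fn)(const char *name, unsigned int flags, void *ctx);

/*
 * Try the cheap mode first, then fall back:
 *   1. RCU-walk; -ECHILD means "cannot continue without sleeping".
 *   2. REF-walk; -ESTALE means "cached data was not current enough".
 *   3. REF-walk again, asking for revalidation.
 */
int model_filename_lookup(model_walk_fn walk, const char *name,
                          unsigned int flags, void *ctx)
{
    int ret = walk(name, flags | MODEL_LOOKUP_RCU, ctx);

    if (ret == -ECHILD)
        ret = walk(name, flags, ctx);
    if (ret == -ESTALE)
        ret = walk(name, flags | MODEL_LOOKUP_REVAL, ctx);
    return ret;
}

/* Sample walker: fails under RCU, looks stale once, then succeeds. */
int model_flaky_walk(const char *name, unsigned int flags, void *ctx)
{
    int *calls = ctx;   /* counts how many attempts were made */

    (void)name;
    (*calls)++;
    if (flags & MODEL_LOOKUP_RCU)
        return -ECHILD;
    if (!(flags & MODEL_LOOKUP_REVAL))
        return -ESTALE;
    return 0;
}
```

With the sample walker, a single call to model_filename_lookup() makes three
attempts, one per mode, before it succeeds.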
->follow_link() has been replaced by ->get_link(), which can be
called in RCU mode.
See commit 6b2553918d8b ("replace ->follow_link() with
new method that could stay in RCU mode").
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index af5c20fecfef..e6b6c43ff0f6 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1060,13 +1060,11 @@ filesystem cannot successfully get a reference in RCU-walk mode, it
must return ``-ECHILD`` and ``unlazy_walk()`` will be called to return to
REF-walk mode in which the filesystem is allowed to sleep.
-The place for all this to happen is the ``i_op->follow_link()`` inode
-method. In the present mainline code this is never actually called in
-RCU-walk mode as the rewrite is not quite complete. It is likely that
-in a future release this method will be passed an ``inode`` pointer when
-called in RCU-walk mode so it both (1) knows to be careful, and (2) has the
-validated pointer. Much like the ``i_op->permission()`` method we
-looked at previously, ``->follow_link()`` would need to be careful that
+The place for all this to happen is the ``i_op->get_link()`` inode
+method. This is called both in RCU-walk and REF-walk. In RCU-walk the
+``dentry*`` argument is NULL, ``->get_link()`` can return -ECHILD to drop out of
+RCU-walk. Much like the ``i_op->permission()`` method we
+looked at previously, ``->get_link()`` would need to be careful that
all the data structures it references are safe to be accessed while
holding no counted reference, only the RCU lock. Though getting a
reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
--
2.30.2
There is no inode->put_link operation anymore. We use delayed_call to
deal with link destruction. The cookie has been replaced with
struct delayed_call.
Related commit: fceef393a538 ("switch ->get_link() to
delayed_call, kill ->put_link()")
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 30 ++++++-----------------
1 file changed, 8 insertions(+), 22 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index e6b6c43ff0f6..8ab95dd9046e 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1066,34 +1066,20 @@ method. This is called both in RCU-walk and REF-walk. In RCU-walk the
RCU-walk. Much like the ``i_op->permission()`` method we
looked at previously, ``->get_link()`` would need to be careful that
all the data structures it references are safe to be accessed while
-holding no counted reference, only the RCU lock. Though getting a
-reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
-code is ready to release the reference when that does happen.
-
-This need to drop the reference to a symlink adds significant
-complexity. It requires a reference to the inode so that the
-``i_op->put_link()`` inode operation can be called. In REF-walk, that
-reference is kept implicitly through a reference to the dentry, so
-keeping the ``struct path`` of the symlink is easiest. For RCU-walk,
-the pointer to the inode is kept separately. To allow switching from
-RCU-walk back to REF-walk in the middle of processing nested symlinks
-we also need the seq number for the dentry so we can confirm that
-switching back was safe.
-
-Finally, when providing a reference to a symlink, the filesystem also
-provides an opaque "cookie" that must be passed to ``->put_link()`` so that it
-knows what to free. This might be the allocated memory area, or a
-pointer to the ``struct page`` in the page cache, or something else
-completely. Only the filesystem knows what it is.
+holding no counted reference, only the RCU lock. A callback
+``struct delayed_call`` will be passed to get_link,
+file systems can set their own put_link function and argument through
+``set_delayed_call``. Later on, when vfs wants to put link, it will call
+``do_delayed_call`` to invoke that callback function with the argument.
In order for the reference to each symlink to be dropped when the walk completes,
whether in RCU-walk or REF-walk, the symlink stack needs to contain,
along with the path remnants:
-- the ``struct path`` to provide a reference to the inode in REF-walk
-- the ``struct inode *`` to provide a reference to the inode in RCU-walk
+- the ``struct path`` to provide a reference to the previous path
+- the ``const char *`` to provide a reference to the previous name
- the ``seq`` to allow the path to be safely switched from RCU-walk to REF-walk
-- the ``cookie`` that tells ``->put_path()`` what to put.
+- the ``struct delayed_call`` for later invocation.
This means that each entry in the symlink stack needs to hold five
pointers and an integer instead of just one pointer (the path
--
2.30.2
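The delayed_call mechanism described in this patch can be sketched as a small
userspace model. The struct layout (a function pointer plus an argument) and
the set/do helpers are intended to mirror the idea behind the kernel's
delayed_call helpers, but every name carrying a model_ prefix or _model suffix
here is hypothetical, and model_get_link() is only a stand-in for a
filesystem's ->get_link() method that allocates the link body.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Model of a delayed call: what to run later, and on what. */
struct delayed_call_model {
    void (*fn)(void *);
    void *arg;
};

int model_links_freed;   /* counts cleanups, for demonstration only */

void model_set_delayed_call(struct delayed_call_model *call,
                            void (*fn)(void *), void *arg)
{
    call->fn = fn;
    call->arg = arg;
}

/* Run the registered cleanup, if the filesystem registered one. */
void model_do_delayed_call(struct delayed_call_model *call)
{
    if (call->fn)
        call->fn(call->arg);
}

/* The filesystem's private "put link" routine; only it knows what to free. */
static void model_free_link(void *arg)
{
    free(arg);
    model_links_freed++;
}

/*
 * Hypothetical ->get_link(): returns the link body and tells the caller
 * how to release it later. Returns NULL if allocation fails.
 */
const char *model_get_link(const char *target, struct delayed_call_model *done)
{
    size_t len = strlen(target) + 1;
    char *body = malloc(len);

    if (!body)
        return NULL;
    memcpy(body, target, len);
    model_set_delayed_call(done, model_free_link, body);
    return body;
}
```

The caller never needs to know whether the body was allocated, came from a
page cache page, or needs no cleanup at all; it just invokes the delayed call
once it has finished with the name.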
follow_managed() no longer exists; handle_mounts() and
traverse_mounts() now do the job.
See commit 9deed3ebca24 ("new helper: traverse_mounts()").
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index c482e1619e77..d07766375e13 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -448,10 +448,11 @@ described. If it finds a ``LAST_NORM`` component it first calls
filesystem to revalidate the result if it is that sort of filesystem.
If that doesn't get a good result, it calls "``lookup_slow()``" which
takes ``i_rwsem``, rechecks the cache, and then asks the filesystem
-to find a definitive answer. Each of these will call
-``follow_managed()`` (as described below) to handle any mount points.
+to find a definitive answer.
-In the absence of symbolic links, ``walk_component()`` creates a new
+As the last step of ``walk_component()``, ``step_into()`` will be called either
+directly from walk_component() or from handle_dots(). It calls
+``handle_mount()``, to check and handle mount points, in which a new
``struct path`` containing a counted reference to the new dentry and a
reference to the new ``vfsmount`` which is only counted if it is
different from the previous ``vfsmount``. It then calls
@@ -535,8 +536,7 @@ covered in greater detail in autofs.txt in the Linux documentation
tree, but a few notes specifically related to path lookup are in order
here.
-The Linux VFS has a concept of "managed" dentries which is reflected
-in function names such as "``follow_managed()``". There are three
+The Linux VFS has a concept of "managed" dentries. There are three
potentially interesting things about these dentries corresponding
to three different flags that might be set in ``dentry->d_flags``:
--
2.30.2
path_to_nameidata() no longer exists; step_into() is called instead.
Related commit: c99687a03a78 ("fold path_to_nameidata()
into its only remaining caller")
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index d07766375e13..a29d714431a3 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -455,9 +455,10 @@ directly from walk_component() or from handle_dots(). It calls
``handle_mount()``, to check and handle mount points, in which a new
``struct path`` containing a counted reference to the new dentry and a
reference to the new ``vfsmount`` which is only counted if it is
-different from the previous ``vfsmount``. It then calls
-``path_to_nameidata()`` to install the new ``struct path`` in the
-``struct nameidata`` and drop the unneeded references.
+different from the previous ``vfsmount`` will be created. Then if there is
+symbolic link, ``step_into()`` calls ``pick_link()`` to deal with it, otherwise
+installs the new ``struct path`` in the ``struct nameidata`` and drop the
+unneeded references.
This "hand-over-hand" sequencing of getting a reference to the new
dentry before dropping the reference to the previous dentry may
--
2.30.2
trailing_symlink() was merged into lookup_last() and do_last().
do_last() has later been split into open_last_lookups()
and do_open().
See related commit c5971b8c6354 ("take post-lookup
part of do_last() out of loop").
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 35 ++++++++++++-----------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index b6a301b78121..a65cb477d524 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -495,11 +495,11 @@ This is important when unmounting a filesystem that is inaccessible, such as
one provided by a dead NFS server.
Finally ``path_openat()`` is used for the ``open()`` system call; it
-contains, in support functions starting with "``do_last()``", all the
+contains, in support functions starting with "``open_last_lookups()``", all the
complexity needed to handle the different subtleties of O_CREAT (with
or without O_EXCL), final "``/``" characters, and trailing symbolic
links. We will revisit this in the final part of this series, which
-focuses on those symbolic links. "``do_last()``" will sometimes, but
+focuses on those symbolic links. "``open_last_lookups()``" will sometimes, but
not always, take ``i_rwsem``, depending on what it finds.
Each of these, or the functions which call them, need to be alert to
@@ -1199,26 +1199,27 @@ symlink.
This case is handled by the relevant caller of ``link_path_walk()``, such as
``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then
handles the final component. If the final component is a symlink
-that needs to be followed, then ``trailing_symlink()`` is called to set
-things up properly and the loop repeats, calling ``link_path_walk()``
-again. This could loop as many as 40 times if the last component of
-each symlink is another symlink.
+that needs to be followed, then ``open_last_lookups()`` is
+called to set things up properly and the loop repeats, calling
+``link_path_walk()`` again. This could loop as many as 40 times if the last
+component of each symlink is another symlink.
The various functions that examine the final component and possibly
-report that it is a symlink are ``lookup_last()``, ``mountpoint_last()``
-and ``do_last()``, each of which use the same convention as
-``walk_component()`` of returning ``1`` if a symlink was found that needs
-to be followed.
+report that it is a symlink are ``lookup_last()``, ``open_last_lookups()``
+, each of which use the same convention as
+``walk_component()`` of returning ``char *name`` if a symlink was found that
+needs to be followed.
-Of these, ``do_last()`` is the most interesting as it is used for
-opening a file. Part of ``do_last()`` runs with ``i_rwsem`` held and this
-part is in a separate function: ``lookup_open()``.
+Of these, ``open_last_lookups()`` is the most interesting as it works in tandem
+with ``do_open()`` for opening a file. Part of ``open_last_lookups()`` runs
+with ``i_rwsem`` held and this part is in a separate function: ``lookup_open()``.
-Explaining ``do_last()`` completely is beyond the scope of this article,
-but a few highlights should help those interested in exploring the
-code.
+Explaining ``open_last_lookups()`` and ``do_open()`` completely is beyond the scope
+of this article, but a few highlights should help those interested in exploring
+the code.
-1. Rather than just finding the target file, ``do_last()`` needs to open
+1. Rather than just finding the target file, ``do_open()`` is used after
+ ``open_last_lookups()`` to open
it. If the file was found in the dcache, then ``vfs_open()`` is used for
this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
--
2.30.2
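The changed convention, where the final-component helpers hand back the
symlink body (or NULL) rather than returning 1, can be modelled with a short
loop. This is a sketch under stated assumptions: the model_* functions, the
model_nd structure and the tiny link table are all invented for illustration,
and only the loop shape is modelled loosely on how path_lookupat() repeats
link_path_walk() while a trailing symlink keeps turning up.

```c
#include <stddef.h>
#include <string.h>

struct model_nd {
    const char *last;   /* final component left over by the prefix walk */
    int walks;          /* how many times the prefix walk has run */
};

/* Hypothetical table of trailing symlinks: "a" -> "b" -> "c". */
struct model_symlink {
    const char *name;
    const char *target;
};

static const struct model_symlink model_links[] = {
    { "a", "b" },
    { "b", "c" },
};

/* Stand-in for link_path_walk(): records the final component, returns 0. */
int model_link_path_walk(const char *name, struct model_nd *nd)
{
    const char *slash = strrchr(name, '/');

    nd->last = slash ? slash + 1 : name;
    nd->walks++;
    return 0;
}

/*
 * Stand-in for lookup_last(): returns the symlink body if the final
 * component is a symlink that must be followed, NULL otherwise.
 */
const char *model_lookup_last(struct model_nd *nd)
{
    size_t i;

    for (i = 0; i < sizeof(model_links) / sizeof(model_links[0]); i++)
        if (strcmp(nd->last, model_links[i].name) == 0)
            return model_links[i].target;
    return NULL;
}

/* Repeat the walk for as long as the last component is a symlink. */
int model_path_lookupat(const char *name, struct model_nd *nd)
{
    const char *s = name;
    int err;

    while (!(err = model_link_path_walk(s, nd)) &&
           (s = model_lookup_last(nd)) != NULL)
        ;
    return err;
}
```

Looking up "/tmp/a" with this table runs the prefix walk three times, once for
the original name and once for each trailing symlink, and ends with "c" as the
final component.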
Add macro name MAXSYMLINKS to the symlink limit description, so
that it is consistent with path name length description above.
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 66697db74955..af5c20fecfef 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -992,8 +992,8 @@ is 4096. There are a number of reasons for this limit; not letting the
kernel spend too much time on just one path is one of them. With
symbolic links you can effectively generate much longer paths so some
sort of limit is needed for the same reason. Linux imposes a limit of
-at most 40 symlinks in any one path lookup. It previously imposed a
-further limit of eight on the maximum depth of recursion, but that was
+at most 40 (MAXSYMLINKS) symlinks in any one path lookup. It previously imposed
+a further limit of eight on the maximum depth of recursion, but that was
raised to 40 when a separate stack was implemented, so there is now
just the one limit.
--
2.30.2
get_link() no longer exists; we have step_into() and pick_link().
walk_component() calls step_into(), which in turn calls pick_link()
and returns the symlink name.
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 8ab95dd9046e..0d41c61f7e4f 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1103,12 +1103,10 @@ doesn't need to notice. Getting this ``name`` variable on and off the
stack is very straightforward; pushing and popping the references is
a little more complex.
-When a symlink is found, ``walk_component()`` returns the value ``1``
-(``0`` is returned for any other sort of success, and a negative number
-is, as usual, an error indicator). This causes ``get_link()`` to be
-called; it then gets the link from the filesystem. Providing that
-operation is successful, the old path ``name`` is placed on the stack,
-and the new value is used as the ``name`` for a while. When the end of
+When a symlink is found, ``walk_component()`` calls ``pick_link()``,
+it then gets the link from the filesystem returning new path ``name``.
+Providing that operation is successful, the old path ``name`` is placed on the
+stack, and the new value is used as the ``name`` for a while. When the end of
the path is found (i.e. ``*name`` is ``'\0'``) the old ``name`` is restored
off the stack and path walking continues.
--
2.30.2
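The push/pop handling of ``name`` that this paragraph describes can be
illustrated with a small userspace model. It is a deliberately simplified
sketch: symlinks are looked up by component name alone in a hypothetical
table, "." and ".." are not handled, and trailing symlinks get no special
treatment. The struct, the function names and the MODEL_MAXSYMLINKS macro are
assumptions made for the example; only the limit of 40 comes from the
document.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAXSYMLINKS 40   /* the limit quoted in the document */

struct model_link {            /* hypothetical: one symlink, keyed by name */
    const char *name;
    const char *target;
};

static const char *model_find_link(const struct model_link *links, size_t n,
                                   const char *comp, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strlen(links[i].name) == len &&
            memcmp(links[i].name, comp, len) == 0)
            return links[i].target;
    return NULL;
}

/* Resolve name into out; returns 0, -ELOOP or -ENAMETOOLONG. */
int model_walk(const struct model_link *links, size_t n,
               const char *name, char *out, size_t outlen)
{
    const char *stack[MODEL_MAXSYMLINKS];   /* saved remainders of old names */
    int depth = 0;                          /* entries currently on the stack */
    int total = 0;                          /* symlinks followed so far */
    size_t used = 0;

    if (outlen == 0)
        return -ENAMETOOLONG;
    out[0] = '\0';

    for (;;) {
        while (*name == '/')
            name++;
        if (*name == '\0') {
            if (depth == 0)
                return 0;                   /* nothing left anywhere */
            name = stack[--depth];          /* restore the old name */
            continue;
        }
        size_t len = strcspn(name, "/");
        const char *target = model_find_link(links, n, name, len);
        if (target) {
            if (++total > MODEL_MAXSYMLINKS)
                return -ELOOP;
            stack[depth++] = name + len;    /* push the rest of the old name */
            name = target;                  /* walk the link body for a while */
            if (*name == '/') {             /* absolute link: restart at root */
                used = 0;
                out[0] = '\0';
            }
            continue;
        }
        if (used + 1 + len + 1 > outlen)
            return -ENAMETOOLONG;
        out[used++] = '/';
        memcpy(out + used, name, len);
        used += len;
        out[used] = '\0';
        name += len;
    }
}
```

Because at most MODEL_MAXSYMLINKS links are ever followed, the stack can never
hold more than that many saved names, which is why a fixed-size array is
enough in this model.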
path_mountpoint() doesn't exist anymore. It has been folded
into path_lookupat(), which handles this case when the
LOOKUP_MOUNTPOINT flag is set.
See commit 161aff1d93abf0e ("LOOKUP_MOUNTPOINT: fold
path_mountpointat() into path_lookupat()").
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index a29d714431a3..b6a301b78121 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -472,7 +472,7 @@ Handling the final component
``nd->last_type`` to refer to the final component of the path. It does
not call ``walk_component()`` that last time. Handling that final
component remains for the caller to sort out. Those callers are
-``path_lookupat()``, ``path_parentat()``, ``path_mountpoint()`` and
+``path_lookupat()``, ``path_parentat()`` and
``path_openat()`` each of which handles the differing requirements of
different system calls.
@@ -488,12 +488,10 @@ perform their operation.
object is wanted such as by ``stat()`` or ``chmod()``. It essentially just
calls ``walk_component()`` on the final component through a call to
``lookup_last()``. ``path_lookupat()`` returns just the final dentry.
-
-``path_mountpoint()`` handles the special case of unmounting which must
-not try to revalidate the mounted filesystem. It effectively
-contains, through a call to ``mountpoint_last()``, an alternate
-implementation of ``lookup_slow()`` which skips that step. This is
-important when unmounting a filesystem that is inaccessible, such as
+It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set,
+``path_lookupat()`` will unset LOOKUP_JUMPED in nameidata so that in the further
+path traversal ``d_weak_revalidate()`` won't be called.
+This is important when unmounting a filesystem that is inaccessible, such as
one provided by a dead NFS server.
Finally ``path_openat()`` is used for the ``open()`` system call; it
--
2.30.2
WALK_GET is changed to WALK_TRAILING with a different meaning.
Here it should be WALK_NOFOLLOW. WALK_PUT doesn't exist; we have
WALK_MORE.
WALK_PUT == !WALK_MORE
And there is no should_follow_link() anymore.
Related commits:
commit 8c4efe22e7c4 ("namei: invert the meaning of WALK_FOLLOW")
commit 1c4ff1a87e46 ("namei: invert WALK_PUT logics")
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index 0d41c61f7e4f..abd0153e2415 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1123,13 +1123,11 @@ stack in ``walk_component()`` immediately when the symlink is found;
old symlink as it walks that last component. So it is quite
convenient for ``walk_component()`` to release the old symlink and pop
the references just before pushing the reference information for the
-new symlink. It is guided in this by two flags; ``WALK_GET``, which
-gives it permission to follow a symlink if it finds one, and
-``WALK_PUT``, which tells it to release the current symlink after it has been
-followed. ``WALK_PUT`` is tested first, leading to a call to
-``put_link()``. ``WALK_GET`` is tested subsequently (by
-``should_follow_link()``) leading to a call to ``pick_link()`` which sets
-up the stack frame.
+new symlink. It is guided in this by two flags; ``WALK_NOFOLLOW``, which
+suggests whether to follow a symlink if it finds one, and
+``WALK_MORE``, which tells whether to release the current symlink after it has
+been followed. ``WALK_MORE`` is tested first, leading to a call to
+``put_link()``.
Symlinks with no final component
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--
2.30.2
Instead of lookup_real()/vfs_create(), i_op->lookup() and
i_op->create() are called directly.
Update the vfs_open() logic.
should_follow_link() is merged into lookup_last() or open_last_lookups(),
which return the symlink name instead of an integer.
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index eef6e9f68fba..adbc714740c2 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1202,16 +1202,15 @@ the code.
it. If the file was found in the dcache, then ``vfs_open()`` is used for
this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
the filesystem provides it) to combine the final lookup with the open, or
- will perform the separate ``lookup_real()`` and ``vfs_create()`` steps
+ will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
directly. In the later case the actual "open" of this newly found or
created file will be performed by ``vfs_open()``, just as if the name
were found in the dcache.
2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
- wasn't quite current enough. Rather than restarting the lookup from
- the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead,
- giving the filesystem a chance to resolve small inconsistencies.
- If that doesn't work, only then is the lookup restarted from the top.
+ wasn't quite current enough. In RCU-walk ``-ECHILD`` will be returned;
+ otherwise ``-ESTALE`` is returned. When ``-ESTALE`` is returned, the caller may
+ retry with the ``LOOKUP_REVAL`` flag set.
3. An open with O_CREAT **does** follow a symlink in the final component,
unlike other creation system calls (like ``mkdir``). So the sequence::
@@ -1221,8 +1220,8 @@ the code.
will create a file called ``/tmp/bar``. This is not permitted if
``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
- like for a non-creating open: ``should_follow_link()`` returns ``1``, and
- so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
+ like for a non-creating open: ``lookup_last()`` or ``open_last_lookups()``
+ returns a non-``NULL`` value, and ``link_path_walk()`` gets called and the
open process continues on the symlink that was found.
Updating the access time
--
2.30.2
get_link() is merged into pick_link(). i_op->follow_link is
replaced with i_op->get_link(). get_link() can return ERR_PTR(0)
which equals NULL.
Signed-off-by: Fox Chen <[email protected]>
---
Documentation/filesystems/path-lookup.rst | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
index abd0153e2415..eef6e9f68fba 100644
--- a/Documentation/filesystems/path-lookup.rst
+++ b/Documentation/filesystems/path-lookup.rst
@@ -1134,10 +1134,10 @@ Symlinks with no final component
A pair of special-case symlinks deserve a little further explanation.
Both result in a new ``struct path`` (with mount and dentry) being set
-up in the ``nameidata``, and result in ``get_link()`` returning ``NULL``.
+up in the ``nameidata``, and result in ``pick_link()`` returning ``NULL``.
The more obvious case is a symlink to "``/``". All symlinks starting
-with "``/``" are detected in ``get_link()`` which resets the ``nameidata``
+with "``/``" are detected in ``pick_link()`` which resets the ``nameidata``
to point to the effective filesystem root. If the symlink only
contains "``/``" then there is nothing more to do, no components at all,
so ``NULL`` is returned to indicate that the symlink can be released and
@@ -1154,12 +1154,11 @@ something that looks like a symlink. It is really a reference to the
target file, not just the name of it. When you ``readlink`` these
objects you get a name that might refer to the same file - unless it
has been unlinked or mounted over. When ``walk_component()`` follows
-one of these, the ``->follow_link()`` method in "procfs" doesn't return
+one of these, the ``->get_link()`` method in "procfs" doesn't return
a string name, but instead calls ``nd_jump_link()`` which updates the
-``nameidata`` in place to point to that target. ``->follow_link()`` then
-returns ``NULL``. Again there is no final component and ``get_link()``
-reports this by leaving the ``last_type`` field of ``nameidata`` as
-``LAST_BIND``.
+``nameidata`` in place to point to that target. ``->get_link()`` then
+returns ``0``. Again there is no final component and ``pick_link()``
+returns NULL.
Following the symlink in the final component
--------------------------------------------
--
2.30.2
Fox Chen <[email protected]> writes:
> The Path lookup is a very complex subject in VFS. The path-lookup
> document provides a very detailed guidance to help people understand
> how path lookup works in the kernel. This document was originally
> written based on three lwn articles five years ago. As times goes by,
> some of the content is outdated. This patchset is intended to update
> the document to make it more relevant to current codebase.
Neil, have you had a chance to take a look at these? I'm reluctant to
apply them without your ack...
Thanks,
jon
On Tue, Apr 13 2021, Jonathan Corbet wrote:
> Fox Chen <[email protected]> writes:
>
>> The Path lookup is a very complex subject in VFS. The path-lookup
>> document provides a very detailed guidance to help people understand
>> how path lookup works in the kernel. This document was originally
>> written based on three lwn articles five years ago. As times goes by,
>> some of the content is outdated. This patchset is intended to update
>> the document to make it more relevant to current codebase.
>
> Neil, have you had a chance to take a look at these? I'm reluctant to
> apply them without your ack...
No I haven't, I'm sorry. And I'm on leave at the moment so my attention
is mostly elsewhere. However I'll try to make time to have a look
sometime in the next week or so.
Thanks for the prompt.
NeilBrown
On Tue, Mar 16 2021, Fox Chen wrote:
> No follow_managed() anymore, handle_mounts(),
> traverse_mounts(), will do the job.
> see commit 9deed3ebca24 ("new helper: traverse_mounts()")
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index c482e1619e77..d07766375e13 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -448,10 +448,11 @@ described. If it finds a ``LAST_NORM`` component it first calls
> filesystem to revalidate the result if it is that sort of filesystem.
> If that doesn't get a good result, it calls "``lookup_slow()``" which
> takes ``i_rwsem``, rechecks the cache, and then asks the filesystem
> -to find a definitive answer. Each of these will call
> -``follow_managed()`` (as described below) to handle any mount points.
> +to find a definitive answer.
>
> -In the absence of symbolic links, ``walk_component()`` creates a new
> +As the last step of ``walk_component()``, ``step_into()`` will be called either
> +directly from walk_component() or from handle_dots(). It calls
> +``handle_mount()``, to check and handle mount points, in which a new
Typo - it is "handle_mounts", not "handle_mount"
With that fixed:
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> ``struct path`` containing a counted reference to the new dentry and a
> reference to the new ``vfsmount`` which is only counted if it is
> different from the previous ``vfsmount``. It then calls
> @@ -535,8 +536,7 @@ covered in greater detail in autofs.txt in the Linux documentation
> tree, but a few notes specifically related to path lookup are in order
> here.
>
> -The Linux VFS has a concept of "managed" dentries which is reflected
> -in function names such as "``follow_managed()``". There are three
> +The Linux VFS has a concept of "managed" dentries. There are three
> potentially interesting things about these dentries corresponding
> to three different flags that might be set in ``dentry->d_flags``:
>
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> No path_to_namei() anymore, step_into() will be called.
> Related commit: commit c99687a03a78 ("fold path_to_nameidata()
> into its only remaining caller")
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index d07766375e13..a29d714431a3 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -455,9 +455,10 @@ directly from walk_component() or from handle_dots(). It calls
> ``handle_mount()``, to check and handle mount points, in which a new
> ``struct path`` containing a counted reference to the new dentry and a
> reference to the new ``vfsmount`` which is only counted if it is
> -different from the previous ``vfsmount``. It then calls
> -``path_to_nameidata()`` to install the new ``struct path`` in the
> -``struct nameidata`` and drop the unneeded references.
> +different from the previous ``vfsmount`` will be created. Then if there is
That "will be created" messes up the sentence.
It would probably work to put it earlier:
It calls handle_mounts() to check and handle mount points, in which a
new struct path is created containing a counted reference to the new
dentry and a reference to the new vfsmount, which is only counted if
it is different from the previous vfsmount.
(I'm not sure about the comma I put in before the 'which' - Jon often
removes my commas, and sometimes changes 'which' to 'that'...)
> +symbolic link, ``step_into()`` calls ``pick_link()`` to deal with it, otherwise
"a symbolic link"
> +installs the new ``struct path`` in the ``struct nameidata`` and drop the
"it installs". And maybe "into the". And "drops".
> +unneeded references.
So sentence is:
Then if there is a symbolic link, step_into() calls pick_link() to
deal with it, otherwise it installs the new struct path into the
struct nameidata, and drops the unneeded references.
With those changes,
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
>
> This "hand-over-hand" sequencing of getting a reference to the new
> dentry before dropping the reference to the previous dentry may
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> path_mountpoint() doesn't exist anymore. It has been folded
> into path_lookupat(), which handles it when the LOOKUP_MOUNTPOINT flag is set.
> Check commit: commit 161aff1d93abf0e ("LOOKUP_MOUNTPOINT: fold
> path_mountpointat() into path_lookupat()")
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index a29d714431a3..b6a301b78121 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -472,7 +472,7 @@ Handling the final component
> ``nd->last_type`` to refer to the final component of the path. It does
> not call ``walk_component()`` that last time. Handling that final
> component remains for the caller to sort out. Those callers are
> -``path_lookupat()``, ``path_parentat()``, ``path_mountpoint()`` and
> +``path_lookupat()``, ``path_parentat()`` and
> ``path_openat()`` each of which handles the differing requirements of
> different system calls.
>
> @@ -488,12 +488,10 @@ perform their operation.
> object is wanted such as by ``stat()`` or ``chmod()``. It essentially just
> calls ``walk_component()`` on the final component through a call to
> ``lookup_last()``. ``path_lookupat()`` returns just the final dentry.
> -
> -``path_mountpoint()`` handles the special case of unmounting which must
> -not try to revalidate the mounted filesystem. It effectively
> -contains, through a call to ``mountpoint_last()``, an alternate
> -implementation of ``lookup_slow()`` which skips that step. This is
> -important when unmounting a filesystem that is inaccessible, such as
> +It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set,
> +``path_lookupat()`` will unset LOOKUP_JUMPED in nameidata so that in the further
I would say "subsequent" rather than "further".
Either way:
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> +path traversal ``d_weak_revalidate()`` won't be called.
> +This is important when unmounting a filesystem that is inaccessible, such as
> one provided by a dead NFS server.
>
> Finally ``path_openat()`` is used for the ``open()`` system call; it
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> trailing_symlink() was merged into lookup_last() and do_last().
>
> do_last() has later been split into open_last_lookups()
> and do_open().
>
> see related commit: commit c5971b8c6354 ("take post-lookup
> part of do_last() out of loop")
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 35 ++++++++++++-----------
> 1 file changed, 18 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index b6a301b78121..a65cb477d524 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -495,11 +495,11 @@ This is important when unmounting a filesystem that is inaccessible, such as
> one provided by a dead NFS server.
>
> Finally ``path_openat()`` is used for the ``open()`` system call; it
> -contains, in support functions starting with "``do_last()``", all the
> +contains, in support functions starting with "``open_last_lookups()``", all the
> complexity needed to handle the different subtleties of O_CREAT (with
> or without O_EXCL), final "``/``" characters, and trailing symbolic
> links. We will revisit this in the final part of this series, which
> -focuses on those symbolic links. "``do_last()``" will sometimes, but
> +focuses on those symbolic links. "``open_last_lookups()``" will sometimes, but
> not always, take ``i_rwsem``, depending on what it finds.
>
> Each of these, or the functions which call them, need to be alert to
> @@ -1199,26 +1199,27 @@ symlink.
> This case is handled by the relevant caller of ``link_path_walk()``, such as
> ``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then
> handles the final component. If the final component is a symlink
> -that needs to be followed, then ``trailing_symlink()`` is called to set
> -things up properly and the loop repeats, calling ``link_path_walk()``
> -again. This could loop as many as 40 times if the last component of
> -each symlink is another symlink.
> +that needs to be followed, then ``open_last_lookups()`` is
> +called to set things up properly and the loop repeats, calling
> +``link_path_walk()`` again. This could loop as many as 40 times if the last
> +component of each symlink is another symlink.
>
> The various functions that examine the final component and possibly
> -report that it is a symlink are ``lookup_last()``, ``mountpoint_last()``
> -and ``do_last()``, each of which use the same convention as
> -``walk_component()`` of returning ``1`` if a symlink was found that needs
> -to be followed.
> +report that it is a symlink are ``lookup_last()``, ``open_last_lookups()``
> +, each of which use the same convention as
> +``walk_component()`` of returning ``char *name`` if a symlink was found that
> +needs to be followed.
This para no longer makes sense.
There is only one function that examines the final component:
step_into()
It is called from open_last_lookups() directly and indirectly from
lookup_last() through walk_component().
But saying that here might be duplicating earlier text.
I think the key point in the para is that convention of returning a
'char *name' if a symlink was found. The rest might now be redundant.
I think this needs a larger revision.
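To make the convention concrete, here is a minimal user-space sketch, not kernel code. It assumes a hypothetical in-memory symlink table and hypothetical helpers (toy_lookup_last(), toy_resolve()). The only things it borrows from the discussion are the idea that the "examine the final component" step returns the symlink body (a 'char *name') when a link must be followed and NULL otherwise, and the limit of 40 followed links.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAXSYMLINKS 40	/* same value as the limit quoted in the document */

/* Hypothetical symlink table: a name and the target it points to. */
struct toy_link {
	const char *name;
	const char *target;
};

static const struct toy_link toy_links[] = {
	{ "a", "b" },
	{ "b", "c" },
	{ "loop", "loop" },
};

/*
 * Toy stand-in for the step that examines the final component: return
 * the symlink body if `name` is a symlink that must be followed, or
 * NULL if it is an ordinary object and the walk is finished.
 */
static const char *toy_lookup_last(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(toy_links) / sizeof(toy_links[0]); i++)
		if (strcmp(toy_links[i].name, name) == 0)
			return toy_links[i].target;
	return NULL;
}

/*
 * Keep looping while the final component turns out to be a symlink.
 * Returns the final name, or NULL once more than TOY_MAXSYMLINKS links
 * have been followed (where the kernel would report -ELOOP).
 */
static const char *toy_resolve(const char *name)
{
	int followed = 0;
	const char *next;

	while ((next = toy_lookup_last(name)) != NULL) {
		if (followed++ >= TOY_MAXSYMLINKS)
			return NULL;
		name = next;
	}
	return name;
}
```

With this table, toy_resolve("a") walks a, then b, and stops at "c", while toy_resolve("loop") gives up after 40 follows. The point of the pointer-returning convention is that the loop condition and the "what to walk next" value are the same thing.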
Thanks,
NeilBrown
>
> -Of these, ``do_last()`` is the most interesting as it is used for
> -opening a file. Part of ``do_last()`` runs with ``i_rwsem`` held and this
> -part is in a separate function: ``lookup_open()``.
> +Of these, ``open_last_lookups()`` is the most interesting as it works in tandem
> +with ``do_open()`` for opening a file. Part of ``open_last_lookups()`` runs
> +with ``i_rwsem`` held and this part is in a separate function: ``lookup_open()``.
>
> -Explaining ``do_last()`` completely is beyond the scope of this article,
> -but a few highlights should help those interested in exploring the
> -code.
> +Explaining ``open_last_lookups()`` and ``do_open()`` completely is beyond the scope
> +of this article, but a few highlights should help those interested in exploring
> +the code.
>
> -1. Rather than just finding the target file, ``do_last()`` needs to open
> +1. Rather than just finding the target file, ``do_open()`` is used after
> + ``open_last_lookup()`` to open
> it. If the file was found in the dcache, then ``vfs_open()`` is used for
> this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
> the filesystem provides it) to combine the final lookup with the open, or
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> No filename_mountpoint any more
> see commit: commit 161aff1d93ab ("LOOKUP_MOUNTPOINT:
> fold path_mountpointat() into path_lookupat()")
>
> Without filename_mountpoint and path_mountpoint(), the
> numbers should be four & three:
>
> "These four correspond roughly to the three path_*() functions"
>
> Signed-off-by: Fox Chen <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> ---
> Documentation/filesystems/path-lookup.rst | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index a65cb477d524..66697db74955 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -652,9 +652,9 @@ restarts from the top with REF-walk.
>
> This pattern of "try RCU-walk, if that fails try REF-walk" can be
> clearly seen in functions like ``filename_lookup()``,
> -``filename_parentat()``, ``filename_mountpoint()``,
> -``do_filp_open()``, and ``do_file_open_root()``. These five
> -correspond roughly to the four ``path_*()`` functions we met earlier,
> +``filename_parentat()``,
> +``do_filp_open()``, and ``do_file_open_root()``. These four
> +correspond roughly to the three ``path_*()`` functions we met earlier,
> each of which calls ``link_path_walk()``. The ``path_*()`` functions are
> called using different mode flags until a mode is found which works.
> They are first called with ``LOOKUP_RCU`` set to request "RCU-walk". If
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> Add macro name MAXSYMLINKS to the symlink limit description, so
> that it is consistent with path name length description above.
>
> Signed-off-by: Fox Chen <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> ---
> Documentation/filesystems/path-lookup.rst | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index 66697db74955..af5c20fecfef 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -992,8 +992,8 @@ is 4096. There are a number of reasons for this limit; not letting the
> kernel spend too much time on just one path is one of them. With
> symbolic links you can effectively generate much longer paths so some
> sort of limit is needed for the same reason. Linux imposes a limit of
> -at most 40 symlinks in any one path lookup. It previously imposed a
> -further limit of eight on the maximum depth of recursion, but that was
> +at most 40 (MAXSYMLINKS) symlinks in any one path lookup. It previously imposed
> +a further limit of eight on the maximum depth of recursion, but that was
> raised to 40 when a separate stack was implemented, so there is now
> just the one limit.
>
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> follow_link has been replaced by get_link() which can be
> called in RCU mode.
>
> see commit: commit 6b2553918d8b ("replace ->follow_link() with
> new method that could stay in RCU mode")
>
> Signed-off-by: Fox Chen <[email protected]>
Reviewed-By: NeilBrown <[email protected]>
Thanks,
NeilBrown
> ---
> Documentation/filesystems/path-lookup.rst | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index af5c20fecfef..e6b6c43ff0f6 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -1060,13 +1060,11 @@ filesystem cannot successfully get a reference in RCU-walk mode, it
> must return ``-ECHILD`` and ``unlazy_walk()`` will be called to return to
> REF-walk mode in which the filesystem is allowed to sleep.
>
> -The place for all this to happen is the ``i_op->follow_link()`` inode
> -method. In the present mainline code this is never actually called in
> -RCU-walk mode as the rewrite is not quite complete. It is likely that
> -in a future release this method will be passed an ``inode`` pointer when
> -called in RCU-walk mode so it both (1) knows to be careful, and (2) has the
> -validated pointer. Much like the ``i_op->permission()`` method we
> -looked at previously, ``->follow_link()`` would need to be careful that
> +The place for all this to happen is the ``i_op->get_link()`` inode
> +method. This is called both in RCU-walk and REF-walk. In RCU-walk the
> +``dentry*`` argument is NULL, ``->get_link()`` can return -ECHILD to drop out of
> +RCU-walk. Much like the ``i_op->permission()`` method we
> +looked at previously, ``->get_link()`` would need to be careful that
> all the data structures it references are safe to be accessed while
> holding no counted reference, only the RCU lock. Though getting a
> reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> No inode->put_link operation anymore. We use delayed_call to
> deal with link destruction. Cookie has been replaced with
> struct delayed_call.
>
> Related commit: commit fceef393a538 ("switch ->get_link() to
> delayed_call, kill ->put_link()")
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 30 ++++++-----------------
> 1 file changed, 8 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index e6b6c43ff0f6..8ab95dd9046e 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -1066,34 +1066,20 @@ method. This is called both in RCU-walk and REF-walk. In RCU-walk the
> RCU-walk. Much like the ``i_op->permission()`` method we
> looked at previously, ``->get_link()`` would need to be careful that
> all the data structures it references are safe to be accessed while
> -holding no counted reference, only the RCU lock. Though getting a
> -reference with ``->follow_link()`` is not yet done in RCU-walk mode, the
> -code is ready to release the reference when that does happen.
> -
> -This need to drop the reference to a symlink adds significant
> -complexity. It requires a reference to the inode so that the
> -``i_op->put_link()`` inode operation can be called. In REF-walk, that
> -reference is kept implicitly through a reference to the dentry, so
> -keeping the ``struct path`` of the symlink is easiest. For RCU-walk,
> -the pointer to the inode is kept separately. To allow switching from
> -RCU-walk back to REF-walk in the middle of processing nested symlinks
> -we also need the seq number for the dentry so we can confirm that
> -switching back was safe.
> -
> -Finally, when providing a reference to a symlink, the filesystem also
> -provides an opaque "cookie" that must be passed to ``->put_link()`` so that it
> -knows what to free. This might be the allocated memory area, or a
> -pointer to the ``struct page`` in the page cache, or something else
> -completely. Only the filesystem knows what it is.
> +holding no counted reference, only the RCU lock. A callback
> +``struct delayed_call`` will be passed to get_link,
I'd put a ":", not "," at the end of above line.
> +file systems can set their own put_link function and argument through
> +``set_delayed_call``. Later on, when vfs wants to put link, it will call
() after function names please, both above and below.
Also: "when VFS wants to put the link"
With these changes:
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> +``do_delayed_call`` to invoke that callback function with the argument.
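For readers following along, the callback mechanism described here can be modelled in user space roughly as below. This is only a sketch under stated assumptions: the toy_ names are hypothetical, the struct simply pairs a function pointer with its argument, and toy_get_link() stands in for a filesystem's ->get_link() method that registers its own cleanup.

```c
#include <stddef.h>

/* User-space model of a delayed call: a callback plus its argument. */
struct toy_delayed_call {
	void (*fn)(void *);
	void *arg;
};

/* Model of set_delayed_call(): remember what to run later, and on what. */
static void toy_set_delayed_call(struct toy_delayed_call *call,
				 void (*fn)(void *), void *arg)
{
	call->fn = fn;
	call->arg = arg;
}

/* Model of do_delayed_call(): run the callback if one was registered. */
static void toy_do_delayed_call(struct toy_delayed_call *call)
{
	if (call->fn)
		call->fn(call->arg);
}

/* Hypothetical filesystem cleanup: count how often the link was put. */
static void toy_put_link(void *arg)
{
	int *released = arg;

	(*released)++;
}

/* Hypothetical ->get_link(): return the link body and register cleanup. */
static const char *toy_get_link(struct toy_delayed_call *done, int *released)
{
	toy_set_delayed_call(done, toy_put_link, released);
	return "target/of/link";
}
```

The caller never needs to know what the argument is. It only keeps the struct around and invokes it when the link body is no longer needed, which is what replaces the old opaque cookie.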
>
> In order for the reference to each symlink to be dropped when the walk completes,
> whether in RCU-walk or REF-walk, the symlink stack needs to contain,
> along with the path remnants:
>
> -- the ``struct path`` to provide a reference to the inode in REF-walk
> -- the ``struct inode *`` to provide a reference to the inode in RCU-walk
> +- the ``struct path`` to provide a reference to the previous path
> +- the ``const char *`` to provide a reference to the previous name
> - the ``seq`` to allow the path to be safely switched from RCU-walk to REF-walk
> -- the ``cookie`` that tells ``->put_path()`` what to put.
> +- the ``struct delayed_call`` for later invocation.
>
> This means that each entry in the symlink stack needs to hold five
> pointers and an integer instead of just one pointer (the path
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> No get_link() anymore. We have step_into() and pick_link().
>
> walk_component() will call step_into(), which in turn calls pick_link()
> and returns the symlink name.
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index 8ab95dd9046e..0d41c61f7e4f 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -1103,12 +1103,10 @@ doesn't need to notice. Getting this ``name`` variable on and off the
> stack is very straightforward; pushing and popping the references is
> a little more complex.
>
> -When a symlink is found, ``walk_component()`` returns the value ``1``
> -(``0`` is returned for any other sort of success, and a negative number
> -is, as usual, an error indicator). This causes ``get_link()`` to be
> -called; it then gets the link from the filesystem. Providing that
> -operation is successful, the old path ``name`` is placed on the stack,
> -and the new value is used as the ``name`` for a while. When the end of
> +When a symlink is found, ``walk_component()`` calls ``pick_link()``,
walk_component() calls pick_link() via step_into()
??
> +it then gets the link from the filesystem returning new path ``name``.
"which returns the link from the filesystem."
With those changes (assuming you agree with them)
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> +Providing that operation is successful, the old path ``name`` is placed on the
> +stack, and the new value is used as the ``name`` for a while. When the end of
> the path is found (i.e. ``*name`` is ``'\0'``) the old ``name`` is restored
> off the stack and path walking continues.
>
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> WALK_GET is changed to WALK_TRAILING with a different meaning.
> Here it should be WALK_NOFOLLOW. WALK_PUT doesn't exist, we have
> WALK_MORE.
>
> WALK_PUT == !WALK_MORE
>
> And there is no should_follow_link().
>
> Related commits:
> commit 8c4efe22e7c4 ("namei: invert the meaning of WALK_FOLLOW")
> commit 1c4ff1a87e46 ("namei: invert WALK_PUT logics")
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index 0d41c61f7e4f..abd0153e2415 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -1123,13 +1123,11 @@ stack in ``walk_component()`` immediately when the symlink is found;
> old symlink as it walks that last component. So it is quite
> convenient for ``walk_component()`` to release the old symlink and pop
> the references just before pushing the reference information for the
> -new symlink. It is guided in this by two flags; ``WALK_GET``, which
> -gives it permission to follow a symlink if it finds one, and
> -``WALK_PUT``, which tells it to release the current symlink after it has been
> -followed. ``WALK_PUT`` is tested first, leading to a call to
> -``put_link()``. ``WALK_GET`` is tested subsequently (by
> -``should_follow_link()``) leading to a call to ``pick_link()`` which sets
> -up the stack frame.
> +new symlink. It is guided in this by two flags; ``WALK_NOFOLLOW``, which
There are 3 flags now. You haven't documented WALK_TRAILING.
> +suggests whether to follow a symlink if it finds one, and
I don't think it is a suggestion.
.. which forbids it from following a symlink if it finds one, and
WALK_MORE which indicates that it is yet too early to release the
current symlink.
> +``WALK_MORE``, which tells whether to release the current symlink after it has
> +been followed. ``WALK_MORE`` is tested first, leading to a call to
> +``put_link()``.
I don't think that "tested first" sentence is relevant any more.
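A small sketch of how the three flags could combine may help when rewording this paragraph. It is a user-space model, not the kernel's code: the numeric values, the toy_ helper names and the lookup_follow parameter are assumptions for illustration. In particular, the reading of WALK_TRAILING as "this is the trailing component, so follow only if the caller asked for it" is my own understanding and should be checked against fs/namei.c.

```c
#include <stdbool.h>

/* Illustrative flag values; treat the numbers as assumptions. */
enum {
	TOY_WALK_TRAILING = 1,
	TOY_WALK_MORE     = 2,
	TOY_WALK_NOFOLLOW = 4,
};

/* Should a symlink found at this step be followed? */
static bool toy_should_follow(bool is_symlink, int walk_flags,
			      bool lookup_follow)
{
	if (!is_symlink)
		return false;
	if (walk_flags & TOY_WALK_NOFOLLOW)
		return false;	/* following is forbidden outright */
	if ((walk_flags & TOY_WALK_TRAILING) && !lookup_follow)
		return false;	/* trailing link, caller did not ask to follow */
	return true;
}

/* Should the current symlink be released before this component is handled? */
static bool toy_should_release(int walk_flags, int depth)
{
	/* WALK_MORE means it is too early; depth 0 means nothing to release. */
	return !(walk_flags & TOY_WALK_MORE) && depth > 0;
}
```

Written this way it is clear that WALK_NOFOLLOW is a prohibition rather than a suggestion, and that WALK_MORE is the inverse of the old WALK_PUT.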
Thanks,
NeilBrown
>
> Symlinks with no final component
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> get_link() is merged into pick_link(). i_op->follow_link is
> replaced with i_op->get_link(). get_link() can return ERR_PTR(0)
> which equals NULL.
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index abd0153e2415..eef6e9f68fba 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -1134,10 +1134,10 @@ Symlinks with no final component
>
> A pair of special-case symlinks deserve a little further explanation.
> Both result in a new ``struct path`` (with mount and dentry) being set
> -up in the ``nameidata``, and result in ``get_link()`` returning ``NULL``.
> +up in the ``nameidata``, and result in ``pick_link()`` returning ``NULL``.
>
> The more obvious case is a symlink to "``/``". All symlinks starting
> -with "``/``" are detected in ``get_link()`` which resets the ``nameidata``
> +with "``/``" are detected in ``pick_link()`` which resets the ``nameidata``
> to point to the effective filesystem root. If the symlink only
> contains "``/``" then there is nothing more to do, no components at all,
> so ``NULL`` is returned to indicate that the symlink can be released and
> @@ -1154,12 +1154,11 @@ something that looks like a symlink. It is really a reference to the
> target file, not just the name of it. When you ``readlink`` these
> objects you get a name that might refer to the same file - unless it
> has been unlinked or mounted over. When ``walk_component()`` follows
> -one of these, the ``->follow_link()`` method in "procfs" doesn't return
> +one of these, the ``->get_link()`` method in "procfs" doesn't return
> a string name, but instead calls ``nd_jump_link()`` which updates the
> -``nameidata`` in place to point to that target. ``->follow_link()`` then
> -returns ``NULL``. Again there is no final component and ``get_link()``
> -reports this by leaving the ``last_type`` field of ``nameidata`` as
> -``LAST_BIND``.
> +``nameidata`` in place to point to that target. ``->get_link()`` then
> +returns ``0``. Again there is no final component and ``pick_link()``
Why did you change NULL to 0? ->get_link returns a pointer.
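The commit message's remark that ERR_PTR(0) equals NULL can be checked with a user-space model of the error-pointer idiom. The helpers below are hypothetical re-implementations written for this example (the toy_ prefix and the 4095 bound are assumptions), not the kernel's own definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095	/* assumed upper bound on errno values */

/* Encode a (negative) errno value in a pointer. */
static void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top TOY_MAX_ERRNO addresses; NULL is not one. */
static int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}
```

So an error value of zero and NULL are the same bit pattern, but the documentation should still say NULL, because what the reader sees in the prototype is a pointer return.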
Without that change:
Reviewed-by: NeilBrown <[email protected]>
Thanks,
NeilBrown
> +returns NULL.
>
> Following the symlink in the final component
> --------------------------------------------
> --
> 2.30.2
On Tue, Mar 16 2021, Fox Chen wrote:
> instead of lookup_real()/vfs_create(), i_op->lookup() and
> i_op->create() will be called directly.
>
> update vfs_open() logic
>
> should_follow_link() is merged into lookup_last() or open_last_lookups()
> which returns symlink name instead of an integer.
>
> Signed-off-by: Fox Chen <[email protected]>
> ---
> Documentation/filesystems/path-lookup.rst | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> index eef6e9f68fba..adbc714740c2 100644
> --- a/Documentation/filesystems/path-lookup.rst
> +++ b/Documentation/filesystems/path-lookup.rst
> @@ -1202,16 +1202,15 @@ the code.
> it. If the file was found in the dcache, then ``vfs_open()`` is used for
> this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
> the filesystem provides it) to combine the final lookup with the open, or
> - will perform the separate ``lookup_real()`` and ``vfs_create()`` steps
> + will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
> directly. In the later case the actual "open" of this newly found or
> created file will be performed by ``vfs_open()``, just as if the name
> were found in the dcache.
>
> 2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
> - wasn't quite current enough. Rather than restarting the lookup from
> - the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead,
> - giving the filesystem a chance to resolve small inconsistencies.
> - If that doesn't work, only then is the lookup restarted from the top.
> + wasn't quite current enough. If it's in RCU-walk -ECHILD will be returned
> + otherwise will return -ESTALE. When -ESTALE is returned, the caller may
"otherwise -ESTALE is returned".
If you don't like repeating "is returned", then maybe:
"... -ECHILD will be returned, otherwise the result is -ESTALE".
> + retry with LOOKUP_REVAL flag set.
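The retry behaviour this item describes follows the usual "RCU-walk, then REF-walk, then revalidate" ladder. A minimal user-space sketch of that ladder is below. The flag values, the toy_walk_fn type and the toy_flaky_walk() helper are hypothetical, chosen only so the example is self-contained.

```c
#include <errno.h>

#define TOY_LOOKUP_RCU   0x1	/* illustrative flag values */
#define TOY_LOOKUP_REVAL 0x2

/* Hypothetical walker: returns 0 on success or a negative errno. */
typedef int (*toy_walk_fn)(unsigned int flags, void *ctx);

/* Try the cheap mode first and escalate only on the matching error. */
static int toy_lookup(toy_walk_fn walk, void *ctx)
{
	int err = walk(TOY_LOOKUP_RCU, ctx);

	if (err == -ECHILD)		/* had to drop out of RCU-walk */
		err = walk(0, ctx);
	if (err == -ESTALE)		/* cached information was stale */
		err = walk(TOY_LOOKUP_REVAL, ctx);
	return err;
}

/* Example walker that only succeeds once revalidation is requested. */
static int toy_flaky_walk(unsigned int flags, void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	if (flags & TOY_LOOKUP_RCU)
		return -ECHILD;
	if (!(flags & TOY_LOOKUP_REVAL))
		return -ESTALE;
	return 0;
}
```

Each error code selects exactly one escalation step, which is why the text needs to be precise about which of -ECHILD and -ESTALE is returned in which mode.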
>
> 3. An open with O_CREAT **does** follow a symlink in the final component,
> unlike other creation system calls (like ``mkdir``). So the sequence::
> @@ -1221,8 +1220,8 @@ the code.
>
> will create a file called ``/tmp/bar``. This is not permitted if
> ``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
> - like for a non-creating open: ``should_follow_link()`` returns ``1``, and
> - so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
> + like for a non-creating open: ``lookup_last()`` or ``open_last_lookup()``
> + returns a non ``Null`` value, and ``link_path_walk()`` gets called and the
"NULL", not "Null".
With those changes,
Reviewed-by: NeilBrown <[email protected]>
Thanks a lot for all these improvements, and apologies for the delay
in the review.
Thanks,
NeilBrown
> open process continues on the symlink that was found.
>
> Updating the access time
> --
> 2.30.2
On Tue, Mar 16, 2021 at 01:47:16PM +0800, Fox Chen wrote:
> -In the absence of symbolic links, ``walk_component()`` creates a new
> +As the last step of ``walk_component()``, ``step_into()`` will be called either
You can drop ``..`` from around function names which are followed by
(). d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
marks them up automatically.
On Mon, Apr 19, 2021 at 9:59 AM NeilBrown <[email protected]> wrote:
>
> On Tue, Mar 16 2021, Fox Chen wrote:
>
> > instead of lookup_real()/vfs_create(), i_op->lookup() and
> > i_op->create() will be called directly.
> >
> > update vfs_open() logic
> >
> > should_follow_link is merged into lookup_last() or open_last_lookup()
> > which returns symlink name instead of an integer.
> >
> > Signed-off-by: Fox Chen <[email protected]>
> > ---
> > Documentation/filesystems/path-lookup.rst | 13 ++++++-------
> > 1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst
> > index eef6e9f68fba..adbc714740c2 100644
> > --- a/Documentation/filesystems/path-lookup.rst
> > +++ b/Documentation/filesystems/path-lookup.rst
> > @@ -1202,16 +1202,15 @@ the code.
> > it. If the file was found in the dcache, then ``vfs_open()`` is used for
> > this. If not, then ``lookup_open()`` will either call ``atomic_open()`` (if
> > the filesystem provides it) to combine the final lookup with the open, or
> > - will perform the separate ``lookup_real()`` and ``vfs_create()`` steps
> > + will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
> > directly. In the later case the actual "open" of this newly found or
> > created file will be performed by ``vfs_open()``, just as if the name
> > were found in the dcache.
> >
> > 2. ``vfs_open()`` can fail with ``-EOPENSTALE`` if the cached information
> > - wasn't quite current enough. Rather than restarting the lookup from
> > - the top with ``LOOKUP_REVAL`` set, ``lookup_open()`` is called instead,
> > - giving the filesystem a chance to resolve small inconsistencies.
> > - If that doesn't work, only then is the lookup restarted from the top.
> > + wasn't quite current enough. If it's in RCU-walk -ECHILD will be returned
> > + otherwise will return -ESTALE. When -ESTALE is returned, the caller may
>
> "otherwise -ESTALE is returned".
> If you don't like repeating "is returned", then maybe:
> "... -ECHILD will be returned, otherwise the result is -ESTALE".
>
>
> > + retry with LOOKUP_REVAL flag set.
> >
> > 3. An open with O_CREAT **does** follow a symlink in the final component,
> > unlike other creation system calls (like ``mkdir``). So the sequence::
> > @@ -1221,8 +1220,8 @@ the code.
> >
> > will create a file called ``/tmp/bar``. This is not permitted if
> > ``O_EXCL`` is set but otherwise is handled for an O_CREAT open much
> > - like for a non-creating open: ``should_follow_link()`` returns ``1``, and
> > - so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
> > + like for a non-creating open: ``lookup_last()`` or ``open_last_lookup()``
> > + returns a non ``Null`` value, and ``link_path_walk()`` gets called and the
>
> "NULL", not "Null".
>
> With those changes,
> Reviewed-by: NeilBrown <[email protected]>
>
> Thanks a lot for all these improvements, and apologies for the delay
> in the review.
Thanks for the review, I will fix them and send the next version back.
> Thanks,
> NeilBrown
>
>
> > open process continues on the symlink that was found.
> >
> > Updating the access time
> > --
> > 2.30.2
thanks,
fox
On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Mar 16, 2021 at 01:47:16PM +0800, Fox Chen wrote:
> > -In the absence of symbolic links, ``walk_component()`` creates a new
> > +As the last step of ``walk_component()``, ``step_into()`` will be called either
>
> You can drop ``..`` from around function names which are followed by
> (). d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> marks them up automatically.
>
Got it, thanks for letting me know. But I will still use them in this
patch series to keep consistency with the remaining parts of the
document.
thanks,
fox
On Mon, Apr 19, 2021 at 10:33:00AM +0800, Fox Chen wrote:
> On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox <[email protected]> wrote:
> >
> > On Tue, Mar 16, 2021 at 01:47:16PM +0800, Fox Chen wrote:
> > > -In the absence of symbolic links, ``walk_component()`` creates a new
> > > +As the last step of ``walk_component()``, ``step_into()`` will be called either
> >
> > You can drop ``..`` from around function names which are followed by
> > (). d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> > marks them up automatically.
> >
>
> Got it, thanks for letting me know. But I will still use them in this
> patch series to keep consistency with the remaining parts of the
> document.
Well, you weren't. For example:
+As the last step of ``walk_component()``, ``step_into()`` will be called either
+directly from walk_component() or from handle_dots(). It calls
+``handle_mount()``, to check and handle mount points, in which a new
Neither of the functions on the second line were using ``.
On Mon, Apr 19, 2021 at 11:25 AM Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Apr 19, 2021 at 10:33:00AM +0800, Fox Chen wrote:
> > On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Tue, Mar 16, 2021 at 01:47:16PM +0800, Fox Chen wrote:
> > > > -In the absence of symbolic links, ``walk_component()`` creates a new
> > > > +As the last step of ``walk_component()``, ``step_into()`` will be called either
> > >
> > > You can drop ``..`` from around function names which are followed by
> > > (). d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> > > marks them up automatically.
> > >
> >
> > Got it, thanks for letting me know. But I will still use them in this
> > patch series to keep consistency with the remaining parts of the
> > document.
>
> Well, you weren't. For example:
>
> +As the last step of ``walk_component()``, ``step_into()`` will be called either
> +directly from walk_component() or from handle_dots(). It calls
> +``handle_mount()``, to check and handle mount points, in which a new
>
> Neither of the functions on the second line were using ``.
Oh, that was a mistake; they should've been wrapped with ``.
Thanks for pointing it out. I will go through the whole patch set and
fix this type of inconsistency in v3.
thanks,
fox
Fox Chen <[email protected]> writes:
> On Mon, Apr 19, 2021 at 11:25 AM Matthew Wilcox <[email protected]> wrote:
>>
>> On Mon, Apr 19, 2021 at 10:33:00AM +0800, Fox Chen wrote:
>> > On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox <[email protected]> wrote:
>> > > You can drop ``..`` from around function named which are followed with
>> > > (). d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
>> > > marks them up automatically.
>> > >
>> >
>> > Got it, thanks for letting me know. But I will still use them in this
>> > patch series to keep consistency with the remaining parts of the
>> > document.
>>
>> Well, you weren't. For example:
>>
>> +As the last step of ``walk_component()``, ``step_into()`` will be called either
>> +directly from walk_component() or from handle_dots(). It calls
>> +``handle_mount()``, to check and handle mount points, in which a new
>>
>> Neither of the functions on the second line were using ``.
>
> Oh, That was a mistake, They should've been wrapped with ``.
> Thanks for pointing it out. I will go through the whole patch set and
> fix this type of inconsistency in V3.
Please, if possible, go toward the bare function() form rather than
using literals...it's easier to read and the docs system will
automatically create cross references for you.
Thanks,
jon
On Tue, Apr 20, 2021 at 3:22 AM Jonathan Corbet <[email protected]> wrote:
>
> Fox Chen <[email protected]> writes:
>
> > On Mon, Apr 19, 2021 at 11:25 AM Matthew Wilcox <[email protected]> wrote:
> >>
> >> On Mon, Apr 19, 2021 at 10:33:00AM +0800, Fox Chen wrote:
> >> > On Mon, Apr 19, 2021 at 10:17 AM Matthew Wilcox <[email protected]> wrote:
> >> > > You can drop ``..`` from around function named which are followed with
> >> > > (). d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
> >> > > marks them up automatically.
> >> > >
> >> >
> >> > Got it, thanks for letting me know. But I will still use them in this
> >> > patch series to keep consistency with the remaining parts of the
> >> > document.
> >>
> >> Well, you weren't. For example:
> >>
> >> +As the last step of ``walk_component()``, ``step_into()`` will be called either
> >> +directly from walk_component() or from handle_dots(). It calls
> >> +``handle_mount()``, to check and handle mount points, in which a new
> >>
> >> Neither of the functions on the second line were using ``.
> >
> > Oh, That was a mistake, They should've been wrapped with ``.
> > Thanks for pointing it out. I will go through the whole patch set and
> > fix this type of inconsistency in V3.
>
> Please, if possible, go toward the bare function() form rather than
> using literals...it's easier to read and the docs system will
> automatically create cross references for you.
>
> Thanks,
>
> jon
OK, if you have no problem with that inconsistency, I will go with the
bare form in v3.
thanks,
fox