2022-02-15 17:03:00

by Al Viro

Subject: Re: race between vfs_rename and do_linkat (mv and link)

On Tue, Feb 15, 2022 at 04:17:11PM +0000, Matthew Wilcox wrote:
> On Tue, Feb 15, 2022 at 04:06:06PM +0000, Al Viro wrote:
> > On Tue, Feb 15, 2022 at 01:37:40PM +0000, Al Viro wrote:
> > > On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:
> > >
> > > > Doing "lock_rename() + lookup last components" would fix this race.
> >
> > "Fucking ugly" is inadequate for the likely results of that approach.
> > It's guaranteed to be a source of headache for pretty much ever after.
> >
> > Does POSIX actually make any promises in that area? That would affect
> > how high a cost we ought to pay for that - I agree that it would be nicer
> > to have atomicity from userland point of view, but there's a difference
> > between hard bug and QoI issue.
>
> As I understand the original report, it relies on us hitting the nlink ==
> 0 at exactly the wrong moment. Can't we just restart the entire path
> resolution if we find a target with nlink == 0? Sure, it's a lot of
> extra work, but you've got to be trying hard to hit it in the first place.

touch /tmp/blah
exec 42</tmp/blah
rm /tmp/blah
... call linkat() with AT_SYMLINK_FOLLOW and /proc/self/fd/42 for source

Your variant will loop indefinitely on that...


2022-02-16 09:34:17

by Miklos Szeredi

Subject: Re: race between vfs_rename and do_linkat (mv and link)

On Tue, 15 Feb 2022 at 17:20, Al Viro <[email protected]> wrote:
>
> On Tue, Feb 15, 2022 at 04:17:11PM +0000, Matthew Wilcox wrote:
> > On Tue, Feb 15, 2022 at 04:06:06PM +0000, Al Viro wrote:
> > > On Tue, Feb 15, 2022 at 01:37:40PM +0000, Al Viro wrote:
> > > > On Tue, Feb 15, 2022 at 10:56:29AM +0100, Miklos Szeredi wrote:
> > > >
> > > > > Doing "lock_rename() + lookup last components" would fix this race.
> > >
> > > "Fucking ugly" is inadequate for the likely results of that approach.
> > > It's guaranteed to be a source of headache for pretty much ever after.

So this is a fairly special situation. How about adding a new rwsem
(could possibly be global or per-fs)?

- acquired for read in lock_rename() before inode locks
- acquired for write in do_linkat before inode locks, but only on retry

Thanks,
Miklos

2022-02-16 10:29:28

by Miklos Szeredi

Subject: Re: race between vfs_rename and do_linkat (mv and link)

On Wed, Feb 16, 2022 at 10:28:20AM +0100, Miklos Szeredi wrote:

> So this is a fairly special situation. How about adding a new rwsem
> (could possibly be global or per-fs)?
>
> - acquired for read in lock_rename() before inode locks
> - acquired for write in do_linkat before inode locks, but only on retry

Something like this:

diff --git a/fs/namei.c b/fs/namei.c
index 3f1829b3ab5b..dd6908cee49d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -122,6 +122,8 @@
  * PATH_MAX includes the nul terminator --RR.
  */
 
+static DECLARE_RWSEM(link_rwsem);
+
 #define EMBEDDED_NAME_MAX (PATH_MAX - offsetof(struct filename, iname))
 
 struct filename *
@@ -2961,6 +2963,8 @@ struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
 {
 	struct dentry *p;
 
+	down_read(&link_rwsem);
+
 	if (p1 == p2) {
 		inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
 		return NULL;
@@ -2995,6 +2999,8 @@ void unlock_rename(struct dentry *p1, struct dentry *p2)
 		inode_unlock(p2->d_inode);
 		mutex_unlock(&p1->d_sb->s_vfs_rename_mutex);
 	}
+
+	up_read(&link_rwsem);
 }
 EXPORT_SYMBOL(unlock_rename);
 
@@ -4456,6 +4462,7 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 	struct path old_path, new_path;
 	struct inode *delegated_inode = NULL;
 	int how = 0;
+	bool lock = false;
 	int error;
 
 	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0) {
@@ -4474,10 +4481,13 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+retry_lock:
+	if (lock)
+		down_write(&link_rwsem);
 retry:
 	error = filename_lookup(olddfd, old, how, &old_path, NULL);
 	if (error)
-		goto out_putnames;
+		goto out_unlock_link;
 
 	new_dentry = filename_create(newdfd, new, &new_path,
 					(how & LOOKUP_REVAL));
@@ -4511,8 +4521,16 @@ int do_linkat(int olddfd, struct filename *old, int newdfd,
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
+	if (!lock && error == -ENOENT) {
+		path_put(&old_path);
+		lock = true;
+		goto retry_lock;
+	}
 out_putpath:
 	path_put(&old_path);
+out_unlock_link:
+	if (lock)
+		up_write(&link_rwsem);
 out_putnames:
 	putname(old);
 	putname(new);

2022-02-16 13:56:54

by Xavier Roche

Subject: Re: race between vfs_rename and do_linkat (mv and link)

On Wed, Feb 16, 2022 at 11:28:18AM +0100, Miklos Szeredi wrote:
> Something like this:
> diff --git a/fs/namei.c b/fs/namei.c
> index 3f1829b3ab5b..dd6908cee49d 100644

Tested-by: Xavier Roche <[email protected]>

I confirm this completely fixes at least this specific race. Tested on an
unpatched and then a patched 5.16.5 kernel, first with the trivial bash test
and then with a C++ torture test.

Before:
-------

$ time ./linkbug
Failed after 4 with No such file or directory

real 0m0,004s
user 0m0,000s
sys 0m0,004s

After:
------

(no error after ten minutes of running the program)

Torture test program:
---------------------

/* Linux rename vs. linkat race condition.
 * Rationale: doing both (1) moving a file onto a target and (2) hard-linking
 * that target to another name in parallel leads to a race in the Linux kernel.
 * Sample file courtesy of Xavier Grand at Algolia
 * g++ -pthread linkbug.c -o linkbug
 * Note: build without -DNDEBUG; several assert()s below carry side effects.
 */

#include <thread>
#include <unistd.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <iostream>
#include <string.h>
#include <errno.h>   // errno
#include <stdlib.h>  // exit, EXIT_FAILURE

static const char* producedDir = "/tmp";
static const char* producedFile = "/tmp/file.txt";
static const char* producedTmpFile = "/tmp/file.txt.tmp";
static const char* producedThreadDir = "/tmp/tmp";
static const char* producedThreadFile = "/tmp/file.txt.tmp.2";

bool createFile(const char* path)
{
    const int fdOut = open(path,
                           O_WRONLY | O_CREAT | O_TRUNC | O_EXCL | O_CLOEXEC,
                           S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    assert(fdOut != -1);
    assert(write(fdOut, "Foo", 4) == 4);
    assert(close(fdOut) == 0);
    return true;
}

void func()
{
    int nbSuccess = 0;
    // Loop creating a hard link to the file, then removing the link
    while (true) {
        if (link(producedFile, producedThreadFile) != 0) {
            std::cout << "Failed after " << nbSuccess << " with " << strerror(errno) << std::endl;
            exit(EXIT_FAILURE);
        } else {
            nbSuccess++;
        }
        assert(unlink(producedThreadFile) == 0);
    }
}

int main()
{
    // Setup env
    unlink(producedTmpFile);
    unlink(producedFile);
    unlink(producedThreadFile);
    createFile(producedFile);
    mkdir(producedThreadDir, 0777);

    // Async thread repeatedly hard-linking the file and unlinking the link
    std::thread t(func);
    // Loop creating a .tmp and moving it over the file
    while (true) {
        assert(createFile(producedTmpFile));
        assert(rename(producedTmpFile, producedFile) == 0);
    }
    return 0;
}