2021-09-08 17:47:14

by David Howells

[permalink] [raw]
Subject: [PATCH 0/6] afs: Fixes for 3rd party-induced data corruption


Here are some fixes for AFS that can cause data corruption due to
interaction with another client modifying data cached locally[1].

(1) When d_revalidating a dentry, don't look at the inode to which it
points. Only check the directory to which the dentry belongs. This
was confusing things and causing the silly-rename cleanup code to
remove the file now at the dentry of a file that got deleted.

(2) Fix mmap data coherency. When a callback break is received that
relates to a file that we have cached, the data content may have been
changed (there are other reasons, such as the user's rights having
been changed). However, we're checking it lazily, only on entry to
the kernel, which doesn't happen if we have a writeable shared mapped
page on that file.

We make the kernel keep track of mmapped files and clear all PTEs
mapping to that file as soon as the callback comes in by calling
unmap_mapping_pages() (we don't necessarily want to zap the
pagecache). This causes the kernel to be reentered when userspace
tries to access the mmapped address range again - and at that point we
can query the server and, if we need to, zap the page cache.

Ideally, I would check each file at the point of notification, but
that involves poking the server[*] - which is holding up final closure
of the change it is making, waiting for all the clients it notified to
reply. This could then deadlock against the server. Further,
invalidating the pagecache might call ->launder_page(), which would
try to write to the file, which would definitely deadlock. (AFS
doesn't lease file access).

[*] Checking to see if the file content has changed is a matter of
comparing the current data version number, but we have to ask the
server for that. We also need to get a new callback promise and
we need to poke the server for that too.

(3) Add some more points at which the inode is validated, since we're
doing it lazily, notably in ->read_iter() and ->page_mkwrite(), but
also when performing some directory operations.

Ideally, checking in ->read_iter() would be done in some derivation of
filemap_read(). If we're going to call the server to read the file,
then we get the file status fetch as part of that.

(4) The above is now causing us to make a lot more calls to afs_validate()
to check the inode - and afs_validate() takes the RCU read lock each
time to make a quick check (ie. afs_check_validity()). This is
entirely for the purpose of checking cb_s_break to see if the server
we're using reinitialised its list of callbacks - however this isn't a
very common operation, so most of the time we're taking this
needlessly.

Add a new cell-wide counter to count the number of reinitialisations
done by any server and check that - and only if that changes, take the
RCU read lock and check the server list (the server list may change,
but the cell a file is part of won't).

(5) Don't update vnode->cb_s_break and ->cb_v_break inside the validity
checking loop. The cb_lock is done with read_seqretry, so we might go
round the loop a second time after resetting those values - and that
could cause someone else checking validity to miss something (I
think).

Also included are patches for fixes for some bugs encountered whilst
debugging this.

(6) Fix a leak of afs_read objects and fix a leak of keys hidden by that.

(7) Fix a leak of pages that couldn't be added to extend a writeback.


The patches can be found here:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-fixes

David

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217 [1]

---
David Howells (6):
afs: Fix missing put on afs_read objects and missing get on the key therein
afs: Fix page leak
afs: Add missing vnode validation checks
afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation
afs: Fix mmap coherency vs 3rd-party changes
afs: Try to avoid taking RCU read lock when checking vnode validity


fs/afs/callback.c | 44 ++++++++++++++++++-
fs/afs/cell.c | 2 +
fs/afs/dir.c | 57 ++++++++----------------
fs/afs/file.c | 83 ++++++++++++++++++++++++++++++++++-
fs/afs/inode.c | 88 +++++++++++++++++++-------------------
fs/afs/internal.h | 10 +++++
fs/afs/rotate.c | 1 +
fs/afs/server.c | 2 +
fs/afs/super.c | 1 +
fs/afs/write.c | 27 ++++++++++--
include/trace/events/afs.h | 8 +++-
mm/memory.c | 1 +
12 files changed, 230 insertions(+), 94 deletions(-)



2021-09-08 17:47:17

by David Howells

[permalink] [raw]
Subject: [PATCH 4/6] afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation

The AFS filesystem is currently triggering the silly-rename cleanup from
afs_d_revalidate() when it sees that a dentry has been changed by a third
party[1]. It should not be doing this as the cleanup includes deleting the
silly-rename target file on iput.

Fix this by removing the places in the d_revalidate handling that validate
anything other than the directory and the dirent. It probably should not
be looking to validate the target inode of the dentry also.

This includes removing the point in afs_d_revalidate() where the inode that
a dentry used to point to was marked as being deleted (AFS_VNODE_DELETED).
We don't know it got deleted. It could have been renamed or it could have
hard links remaining.

This was reproduced by cloning a git repo onto an afs volume on one
machine, switching to another machine and doing "git status", then
switching back to the first and doing "git status". The second status
would show weird output due to ".git/index" getting deleted by the above
mentioned mechanism.

A simpler way to do it is to do:

machine 1: touch a
machine 2: touch b; mv -f b a
machine 1: stat a

on an afs volume. The bug shows up as the stat failing with ENOENT and the
file server log showing that machine 1 deleted "a".

Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename")
Reported-by: Markus Suvanto <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c4 [1]
---

fs/afs/dir.c | 46 +++++++---------------------------------------
1 file changed, 7 insertions(+), 39 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index a8e3ae55f1f9..4579bbda4634 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1077,9 +1077,9 @@ static struct dentry *afs_lookup(struct inode *dir, struct dentry *dentry,
*/
static int afs_d_revalidate_rcu(struct dentry *dentry)
{
- struct afs_vnode *dvnode, *vnode;
+ struct afs_vnode *dvnode;
struct dentry *parent;
- struct inode *dir, *inode;
+ struct inode *dir;
long dir_version, de_version;

_enter("%p", dentry);
@@ -1109,18 +1109,6 @@ static int afs_d_revalidate_rcu(struct dentry *dentry)
return -ECHILD;
}

- /* Check to see if the vnode referred to by the dentry still
- * has a callback.
- */
- if (d_really_is_positive(dentry)) {
- inode = d_inode_rcu(dentry);
- if (inode) {
- vnode = AFS_FS_I(inode);
- if (!afs_check_validity(vnode))
- return -ECHILD;
- }
- }
-
return 1; /* Still valid */
}

@@ -1156,17 +1144,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
if (IS_ERR(key))
key = NULL;

- if (d_really_is_positive(dentry)) {
- inode = d_inode(dentry);
- if (inode) {
- vnode = AFS_FS_I(inode);
- afs_validate(vnode, key);
- if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
- goto out_bad;
- }
- }
-
- /* lock down the parent dentry so we can peer at it */
+ /* Hold the parent dentry so we can peer at it */
parent = dget_parent(dentry);
dir = AFS_FS_I(d_inode(parent));

@@ -1175,7 +1153,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)

if (test_bit(AFS_VNODE_DELETED, &dir->flags)) {
_debug("%pd: parent dir deleted", dentry);
- goto out_bad_parent;
+ goto not_found;
}

/* We only need to invalidate a dentry if the server's copy changed
@@ -1201,12 +1179,12 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
case 0:
/* the filename maps to something */
if (d_really_is_negative(dentry))
- goto out_bad_parent;
+ goto not_found;
inode = d_inode(dentry);
if (is_bad_inode(inode)) {
printk("kAFS: afs_d_revalidate: %pd2 has bad inode\n",
dentry);
- goto out_bad_parent;
+ goto not_found;
}

vnode = AFS_FS_I(inode);
@@ -1228,9 +1206,6 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
dentry, fid.unique,
vnode->fid.unique,
vnode->vfs_inode.i_generation);
- write_seqlock(&vnode->cb_lock);
- set_bit(AFS_VNODE_DELETED, &vnode->flags);
- write_sequnlock(&vnode->cb_lock);
goto not_found;
}
goto out_valid;
@@ -1245,7 +1220,7 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
default:
_debug("failed to iterate dir %pd: %d",
parent, ret);
- goto out_bad_parent;
+ goto not_found;
}

out_valid:
@@ -1256,16 +1231,9 @@ static int afs_d_revalidate(struct dentry *dentry, unsigned int flags)
_leave(" = 1 [valid]");
return 1;

- /* the dirent, if it exists, now points to a different vnode */
not_found:
- spin_lock(&dentry->d_lock);
- dentry->d_flags |= DCACHE_NFSFS_RENAMED;
- spin_unlock(&dentry->d_lock);
-
-out_bad_parent:
_debug("dropping dentry %pd2", dentry);
dput(parent);
-out_bad:
key_put(key);

_leave(" = 0 [bad]");


2021-09-08 17:47:18

by David Howells

[permalink] [raw]
Subject: [PATCH 1/6] afs: Fix missing put on afs_read objects and missing get on the key therein

The afs_read objects created by afs_req_issue_op() get leaked because
afs_alloc_read() returns a ref and then afs_fetch_data() gets its own ref
which is released when the operation completes, but the initial ref is
never released.

Fix this by discarding the initial ref at the end of afs_req_issue_op().

This leak also covered another bug whereby a ref isn't got on the key
attached to the read record by afs_req_issue_op(). This isn't a problem as
long as the afs_read req never goes away...

Fix this by calling key_get() in afs_req_issue_op().

This was found by the generic/074 test. It leaks a bunch of kmalloc-192
objects each time it is run, which can be observed by watching
/proc/slabinfo.

Fixes: f7605fa869cf ("afs: Fix leak of afs_read objects")
Reported-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163010394740.3035676.8516846193899793357.stgit@warthog.procyon.org.uk/
---

fs/afs/file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index db035ae2a134..6688fff14b0b 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -295,7 +295,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
fsreq->subreq = subreq;
fsreq->pos = subreq->start + subreq->transferred;
fsreq->len = subreq->len - subreq->transferred;
- fsreq->key = subreq->rreq->netfs_priv;
+ fsreq->key = key_get(subreq->rreq->netfs_priv);
fsreq->vnode = vnode;
fsreq->iter = &fsreq->def_iter;

@@ -304,6 +304,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
fsreq->pos, fsreq->len);

afs_fetch_data(fsreq->vnode, fsreq);
+ afs_put_read(fsreq);
}

static int afs_symlink_readpage(struct page *page)


2021-09-08 17:47:18

by David Howells

[permalink] [raw]
Subject: [PATCH 3/6] afs: Add missing vnode validation checks

afs_d_revalidate() should only be validating the directory entry it is
given and the directory to which that belongs; it shouldn't be validating
the inode/vnode to which that dentry points. Besides, validation need to
be done even if we don't call afs_d_revalidate() - which might be the case
if we're starting from a file descriptor.

In order for afs_d_revalidate() to be fixed, validation points must be
added in some other places. Certain directory operations, such as
afs_unlink(), already check this, but not all and not all file operations
either.

Note that the validation of a vnode not only checks to see if the
attributes we have are correct, but also gets a promise from the server to
notify us if that file gets changed by a third party.

Add the following checks:

- Check the vnode we're going to make a hard link to.
- Check the vnode we're going to move/rename.
- Check the vnode we're going to read from.
- Check the vnode we're going to write to.
- Check the vnode we're going to sync.
- Check the vnode we're going to make a mapped page writable for.

Some of these aren't strictly necessary as we're going to perform a server
operation that might get the attributes anyway from which we can determine
if something changed - though it might not get us a callback promise.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
---

fs/afs/dir.c | 11 +++++++++++
fs/afs/file.c | 16 +++++++++++++++-
fs/afs/write.c | 17 +++++++++++++++--
3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index ac829e63c570..a8e3ae55f1f9 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -1792,6 +1792,10 @@ static int afs_link(struct dentry *from, struct inode *dir,
goto error;
}

+ ret = afs_validate(vnode, op->key);
+ if (ret < 0)
+ goto error_op;
+
afs_op_set_vnode(op, 0, dvnode);
afs_op_set_vnode(op, 1, vnode);
op->file[0].dv_delta = 1;
@@ -1805,6 +1809,8 @@ static int afs_link(struct dentry *from, struct inode *dir,
op->create.reason = afs_edit_dir_for_link;
return afs_do_sync_operation(op);

+error_op:
+ afs_put_operation(op);
error:
d_drop(dentry);
_leave(" = %d", ret);
@@ -1989,6 +1995,11 @@ static int afs_rename(struct user_namespace *mnt_userns, struct inode *old_dir,
if (IS_ERR(op))
return PTR_ERR(op);

+ ret = afs_validate(vnode, op->key);
+ op->error = ret;
+ if (ret < 0)
+ goto error;
+
afs_op_set_vnode(op, 0, orig_dvnode);
afs_op_set_vnode(op, 1, new_dvnode); /* May be same as orig_dvnode */
op->file[0].dv_delta = 1;
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 6688fff14b0b..4c8d786b53e0 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -24,12 +24,13 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
static int afs_releasepage(struct page *page, gfp_t gfp_flags);

static void afs_readahead(struct readahead_control *ractl);
+static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter);

const struct file_operations afs_file_operations = {
.open = afs_open,
.release = afs_release,
.llseek = generic_file_llseek,
- .read_iter = generic_file_read_iter,
+ .read_iter = afs_file_read_iter,
.write_iter = afs_file_write,
.mmap = afs_file_mmap,
.splice_read = generic_file_splice_read,
@@ -503,3 +504,16 @@ static int afs_file_mmap(struct file *file, struct vm_area_struct *vma)
vma->vm_ops = &afs_vm_ops;
return ret;
}
+
+static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+ struct afs_vnode *vnode = AFS_FS_I(file_inode(iocb->ki_filp));
+ struct afs_file *af = iocb->ki_filp->private_data;
+ int ret;
+
+ ret = afs_validate(vnode, af->key);
+ if (ret < 0)
+ return ret;
+
+ return generic_file_read_iter(iocb, iter);
+}
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 66b235266893..32a764c24284 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -807,6 +807,7 @@ int afs_writepages(struct address_space *mapping,
ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from)
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(iocb->ki_filp));
+ struct afs_file *af = iocb->ki_filp->private_data;
ssize_t result;
size_t count = iov_iter_count(from);

@@ -822,6 +823,10 @@ ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from)
if (!count)
return 0;

+ result = afs_validate(vnode, af->key);
+ if (result < 0)
+ return result;
+
result = generic_file_write_iter(iocb, from);

_leave(" = %zd", result);
@@ -835,13 +840,18 @@ ssize_t afs_file_write(struct kiocb *iocb, struct iov_iter *from)
*/
int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct inode *inode = file_inode(file);
- struct afs_vnode *vnode = AFS_FS_I(inode);
+ struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
+ struct afs_file *af = file->private_data;
+ int ret;

_enter("{%llx:%llu},{n=%pD},%d",
vnode->fid.vid, vnode->fid.vnode, file,
datasync);

+ ret = afs_validate(vnode, af->key);
+ if (ret < 0)
+ return ret;
+
return file_write_and_wait_range(file, start, end);
}

@@ -855,11 +865,14 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
struct file *file = vmf->vma->vm_file;
struct inode *inode = file_inode(file);
struct afs_vnode *vnode = AFS_FS_I(inode);
+ struct afs_file *af = file->private_data;
unsigned long priv;
vm_fault_t ret = VM_FAULT_RETRY;

_enter("{{%llx:%llu}},{%lx}", vnode->fid.vid, vnode->fid.vnode, page->index);

+ afs_validate(vnode, af->key);
+
sb_start_pagefault(inode->i_sb);

/* Wait for the page to be written to the cache before we allow it to


2021-09-08 18:12:23

by David Howells

[permalink] [raw]
Subject: [PATCH 6/6] afs: Try to avoid taking RCU read lock when checking vnode validity

Try to avoid taking the RCU read lock when checking the validity of a
vnode's callback state. The only thing it's needed for is to pin the
parent volume's server list whilst we search it to find the record of the
server we're currently using to see if it has been reinitialised (ie. it
sent us a CB.InitCallBackState* RPC).

Do this by the following means:

(1) Keep an additional per-cell counter (fs_s_break) that's incremented
each time any of the fileservers in the cell reinitialises.

Since the new counter can be accessed without RCU from the vnode, we
can check that first - and only if it differs, get the RCU read lock
and check the volume's server list.

(2) Replace afs_get_s_break_rcu() with afs_check_server_good() which now
indicates whether the callback promise is still expected to be present
on the server. This does the checks as described in (1).

(3) Restructure afs_check_validity() to take account of the change in (2).

We can also get rid of the valid variable and just use the need_clear
variable with the addition of the afs_cb_break_no_promise reason.

(4) afs_check_validity() probably shouldn't be altering vnode->cb_v_break
and vnode->cb_s_break when it doesn't have cb_lock exclusively locked.

Move the change to vnode->cb_v_break to __afs_break_callback().

Delegate the change to vnode->cb_s_break to afs_select_fileserver()
and set vnode->cb_fs_s_break there also.

(5) afs_validate() no longer needs to get the RCU read lock around its call
to afs_check_validity() - and can skip the call entirely if we don't have
a promise.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
---

fs/afs/callback.c | 2 +
fs/afs/inode.c | 88 ++++++++++++++++++++++----------------------
fs/afs/internal.h | 2 +
fs/afs/rotate.c | 1 +
include/trace/events/afs.h | 8 +++-
5 files changed, 54 insertions(+), 47 deletions(-)

diff --git a/fs/afs/callback.c b/fs/afs/callback.c
index 1dd7543dbf9f..1b4d5809808d 100644
--- a/fs/afs/callback.c
+++ b/fs/afs/callback.c
@@ -60,6 +60,7 @@ void afs_init_callback_state(struct afs_server *server)
rcu_read_lock();
do {
server->cb_s_break++;
+ atomic_inc(&server->cell->fs_s_break);
if (!list_empty(&server->cell->fs_open_mmaps))
queue_work(system_unbound_wq, &server->initcb_work);

@@ -77,6 +78,7 @@ void __afs_break_callback(struct afs_vnode *vnode, enum afs_cb_break_reason reas
clear_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
if (test_and_clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) {
vnode->cb_break++;
+ vnode->cb_v_break = vnode->volume->cb_v_break;
afs_clear_permits(vnode);

if (vnode->lock_state == AFS_VNODE_LOCK_WAITING_FOR_CB)
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 80b6c8d967d5..126daf9969db 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -587,22 +587,32 @@ static void afs_zap_data(struct afs_vnode *vnode)
}

/*
- * Get the server reinit counter for a vnode's current server.
+ * Check to see if we have a server currently serving this volume and that it
+ * hasn't been reinitialised or dropped from the list.
*/
-static bool afs_get_s_break_rcu(struct afs_vnode *vnode, unsigned int *_s_break)
+static bool afs_check_server_good(struct afs_vnode *vnode)
{
- struct afs_server_list *slist = rcu_dereference(vnode->volume->servers);
+ struct afs_server_list *slist;
struct afs_server *server;
+ bool good;
int i;

+ if (vnode->cb_fs_s_break == atomic_read(&vnode->volume->cell->fs_s_break))
+ return true;
+
+ rcu_read_lock();
+
+ slist = rcu_dereference(vnode->volume->servers);
for (i = 0; i < slist->nr_servers; i++) {
server = slist->servers[i].server;
if (server == vnode->cb_server) {
- *_s_break = READ_ONCE(server->cb_s_break);
- return true;
+ good = (vnode->cb_s_break == server->cb_s_break);
+ rcu_read_unlock();
+ return good;
}
}

+ rcu_read_unlock();
return false;
}

@@ -611,57 +621,46 @@ static bool afs_get_s_break_rcu(struct afs_vnode *vnode, unsigned int *_s_break)
*/
bool afs_check_validity(struct afs_vnode *vnode)
{
- struct afs_volume *volume = vnode->volume;
enum afs_cb_break_reason need_clear = afs_cb_break_no_break;
time64_t now = ktime_get_real_seconds();
- bool valid;
- unsigned int cb_break, cb_s_break, cb_v_break;
+ unsigned int cb_break;
int seq = 0;

do {
read_seqbegin_or_lock(&vnode->cb_lock, &seq);
- cb_v_break = READ_ONCE(volume->cb_v_break);
cb_break = vnode->cb_break;

- if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags) &&
- afs_get_s_break_rcu(vnode, &cb_s_break)) {
- if (vnode->cb_s_break != cb_s_break ||
- vnode->cb_v_break != cb_v_break) {
- vnode->cb_s_break = cb_s_break;
- vnode->cb_v_break = cb_v_break;
- need_clear = afs_cb_break_for_vsbreak;
- valid = false;
- } else if (test_bit(AFS_VNODE_ZAP_DATA, &vnode->flags)) {
+ if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags)) {
+ if (vnode->cb_v_break != vnode->volume->cb_v_break)
+ need_clear = afs_cb_break_for_v_break;
+ else if (!afs_check_server_good(vnode))
+ need_clear = afs_cb_break_for_s_reinit;
+ else if (test_bit(AFS_VNODE_ZAP_DATA, &vnode->flags))
need_clear = afs_cb_break_for_zap;
- valid = false;
- } else if (vnode->cb_expires_at - 10 <= now) {
+ else if (vnode->cb_expires_at - 10 <= now)
need_clear = afs_cb_break_for_lapsed;
- valid = false;
- } else {
- valid = true;
- }
} else if (test_bit(AFS_VNODE_DELETED, &vnode->flags)) {
- valid = true;
+ ;
} else {
- vnode->cb_v_break = cb_v_break;
- valid = false;
+ need_clear = afs_cb_break_no_promise;
}

} while (need_seqretry(&vnode->cb_lock, seq));

done_seqretry(&vnode->cb_lock, seq);

- if (need_clear != afs_cb_break_no_break) {
- write_seqlock(&vnode->cb_lock);
- if (cb_break == vnode->cb_break)
- __afs_break_callback(vnode, need_clear);
- else
- trace_afs_cb_miss(&vnode->fid, need_clear);
- write_sequnlock(&vnode->cb_lock);
- valid = false;
- }
+ if (need_clear == afs_cb_break_no_break)
+ return true;

- return valid;
+ write_seqlock(&vnode->cb_lock);
+ if (need_clear == afs_cb_break_no_promise)
+ vnode->cb_v_break = vnode->volume->cb_v_break;
+ else if (cb_break == vnode->cb_break)
+ __afs_break_callback(vnode, need_clear);
+ else
+ trace_afs_cb_miss(&vnode->fid, need_clear);
+ write_sequnlock(&vnode->cb_lock);
+ return false;
}

/*
@@ -675,21 +674,20 @@ bool afs_check_validity(struct afs_vnode *vnode)
*/
int afs_validate(struct afs_vnode *vnode, struct key *key)
{
- bool valid;
int ret;

_enter("{v={%llx:%llu} fl=%lx},%x",
vnode->fid.vid, vnode->fid.vnode, vnode->flags,
key_serial(key));

- rcu_read_lock();
- valid = afs_check_validity(vnode);
- rcu_read_unlock();
-
- if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
- clear_nlink(&vnode->vfs_inode);
+ if (unlikely(test_bit(AFS_VNODE_DELETED, &vnode->flags))) {
+ if (vnode->vfs_inode.i_nlink)
+ clear_nlink(&vnode->vfs_inode);
+ goto valid;
+ }

- if (valid)
+ if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags) &&
+ afs_check_validity(vnode))
goto valid;

down_write(&vnode->validate_lock);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 0deeb76c67d0..c97618855b46 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -392,6 +392,7 @@ struct afs_cell {
seqlock_t fs_lock; /* For fs_servers */
struct rw_semaphore fs_open_mmaps_lock;
struct list_head fs_open_mmaps; /* List of vnodes that are mmapped */
+ atomic_t fs_s_break; /* Counter of CB.InitCallBackState messages */

/* VL server list. */
rwlock_t vl_servers_lock; /* Lock on vl_servers */
@@ -664,6 +665,7 @@ struct afs_vnode {
struct list_head cb_mmap_link; /* Link in cell->fs_open_mmaps */
void *cb_server; /* Server with callback/filelock */
atomic_t cb_nr_mmap; /* Number of mmaps */
+ unsigned int cb_fs_s_break; /* Mass server break counter (cell->fs_s_break) */
unsigned int cb_s_break; /* Mass break counter on ->server */
unsigned int cb_v_break; /* Mass break counter on ->volume */
unsigned int cb_break; /* Break counter on vnode */
diff --git a/fs/afs/rotate.c b/fs/afs/rotate.c
index d83f13c44b92..79e1a5f6701b 100644
--- a/fs/afs/rotate.c
+++ b/fs/afs/rotate.c
@@ -374,6 +374,7 @@ bool afs_select_fileserver(struct afs_operation *op)
if (vnode->cb_server != server) {
vnode->cb_server = server;
vnode->cb_s_break = server->cb_s_break;
+ vnode->cb_fs_s_break = atomic_read(&server->cell->fs_s_break);
vnode->cb_v_break = vnode->volume->cb_v_break;
clear_bit(AFS_VNODE_CB_PROMISED, &vnode->flags);
}
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index 9f73ed2cf061..bca73e8c8cde 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -306,11 +306,13 @@ enum afs_flock_operation {

enum afs_cb_break_reason {
afs_cb_break_no_break,
+ afs_cb_break_no_promise,
afs_cb_break_for_callback,
afs_cb_break_for_deleted,
afs_cb_break_for_lapsed,
+ afs_cb_break_for_s_reinit,
afs_cb_break_for_unlink,
- afs_cb_break_for_vsbreak,
+ afs_cb_break_for_v_break,
afs_cb_break_for_volume_callback,
afs_cb_break_for_zap,
};
@@ -602,11 +604,13 @@ enum afs_cb_break_reason {

#define afs_cb_break_reasons \
EM(afs_cb_break_no_break, "no-break") \
+ EM(afs_cb_break_no_promise, "no-promise") \
EM(afs_cb_break_for_callback, "break-cb") \
EM(afs_cb_break_for_deleted, "break-del") \
EM(afs_cb_break_for_lapsed, "break-lapsed") \
+ EM(afs_cb_break_for_s_reinit, "s-reinit") \
EM(afs_cb_break_for_unlink, "break-unlink") \
- EM(afs_cb_break_for_vsbreak, "break-vs") \
+ EM(afs_cb_break_for_v_break, "break-v") \
EM(afs_cb_break_for_volume_callback, "break-v-cb") \
E_(afs_cb_break_for_zap, "break-zap")



2021-09-09 07:53:58

by markus.suvanto

[permalink] [raw]
Subject: Re: [PATCH 0/6] afs: Fixes for 3rd party-induced data corruption

ke, 2021-09-08 kello 16:57 +0100, David Howells kirjoitti:
> Here are some fixes for AFS that can cause data corruption due to
> interaction with another client modifying data cached locally[1].
>
> (1) When d_revalidating a dentry, don't look at the inode to which it
> points. Only check the directory to which the dentry belongs. This
> was confusing things and causing the silly-rename cleanup code to
> remove the file now at the dentry of a file that got deleted.
>
> (2) Fix mmap data coherency. When a callback break is received that
> relates to a file that we have cached, the data content may have been
> changed (there are other reasons, such as the user's rights having
> been changed). However, we're checking it lazily, only on entry to
> the kernel, which doesn't happen if we have a writeable shared mapped
> page on that file.
>
> We make the kernel keep track of mmapped files and clear all PTEs
> mapping to that file as soon as the callback comes in by calling
> unmap_mapping_pages() (we don't necessarily want to zap the
> pagecache). This causes the kernel to be reentered when userspace
> tries to access the mmapped address range again - and at that point we
> can query the server and, if we need to, zap the page cache.
>
> Ideally, I would check each file at the point of notification, but
> that involves poking the server[*] - which is holding up final closure
> of the change it is making, waiting for all the clients it notified to
> reply. This could then deadlock against the server. Further,
> invalidating the pagecache might call ->launder_page(), which would
> try to write to the file, which would definitely deadlock. (AFS
> doesn't lease file access).
>
> [*] Checking to see if the file content has changed is a matter of
> comparing the current data version number, but we have to ask the
> server for that. We also need to get a new callback promise and
> we need to poke the server for that too.
>
> (3) Add some more points at which the inode is validated, since we're
> doing it lazily, notably in ->read_iter() and ->page_mkwrite(), but
> also when performing some directory operations.
>
> Ideally, checking in ->read_iter() would be done in some derivation of
> filemap_read(). If we're going to call the server to read the file,
> then we get the file status fetch as part of that.
>
> (4) The above is now causing us to make a lot more calls to afs_validate()
> to check the inode - and afs_validate() takes the RCU read lock each
> time to make a quick check (ie. afs_check_validity()). This is
> entirely for the purpose of checking cb_s_break to see if the server
> we're using reinitialised its list of callbacks - however this isn't a
> very common operation, so most of the time we're taking this
> needlessly.
>
> Add a new cell-wide counter to count the number of reinitialisations
> done by any server and check that - and only if that changes, take the
> RCU read lock and check the server list (the server list may change,
> but the cell a file is part of won't).
>
> (5) Don't update vnode->cb_s_break and ->cb_v_break inside the validity
> checking loop. The cb_lock is done with read_seqretry, so we might go
> round the loop a second time after resetting those values - and that
> could cause someone else checking validity to miss something (I
> think).
>
> Also included are patches for fixes for some bugs encountered whilst
> debugging this.
>
> (6) Fix a leak of afs_read objects and fix a leak of keys hidden by that.
>
> (7) Fix a leak of pages that couldn't be added to extend a writeback.
>
>
> The patches can be found here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=afs-fixes
>
> David
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217 [1]
>
> ---
> David Howells (6):
> afs: Fix missing put on afs_read objects and missing get on the key therein
> afs: Fix page leak
> afs: Add missing vnode validation checks
> afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation
> afs: Fix mmap coherency vs 3rd-party changes
> afs: Try to avoid taking RCU read lock when checking vnode validity
>
>
> fs/afs/callback.c | 44 ++++++++++++++++++-
> fs/afs/cell.c | 2 +
> fs/afs/dir.c | 57 ++++++++----------------
> fs/afs/file.c | 83 ++++++++++++++++++++++++++++++++++-
> fs/afs/inode.c | 88 +++++++++++++++++++-------------------
> fs/afs/internal.h | 10 +++++
> fs/afs/rotate.c | 1 +
> fs/afs/server.c | 2 +
> fs/afs/super.c | 1 +
> fs/afs/write.c | 27 ++++++++++--
> include/trace/events/afs.h | 8 +++-
> mm/memory.c | 1 +
> 12 files changed, 230 insertions(+), 94 deletions(-)
>
>



Tested-By: Markus Suvanto <[email protected]>



2021-09-10 21:12:37

by Marc Dionne

[permalink] [raw]
Subject: Re: [PATCH 1/6] afs: Fix missing put on afs_read objects and missing get on the key therein

On Wed, Sep 8, 2021 at 12:58 PM David Howells <[email protected]> wrote:
>
> The afs_read objects created by afs_req_issue_op() get leaked because
> afs_alloc_read() returns a ref and then afs_fetch_data() gets its own ref
> which is released when the operation completes, but the initial ref is
> never released.
>
> Fix this by discarding the initial ref at the end of afs_req_issue_op().
>
> This leak also covered another bug whereby a ref isn't got on the key
> attached to the read record by afs_req_issue_op(). This isn't a problem as
> long as the afs_read req never goes away...
>
> Fix this by calling key_get() in afs_req_issue_op().
>
> This was found by the generic/074 test. It leaks a bunch of kmalloc-192
> objects each time it is run, which can be observed by watching
> /proc/slabinfo.
>
> Fixes: f7605fa869cf ("afs: Fix leak of afs_read objects")
> Reported-by: Marc Dionne <[email protected]>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/163010394740.3035676.8516846193899793357.stgit@warthog.procyon.org.uk/
> ---
>
> fs/afs/file.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index db035ae2a134..6688fff14b0b 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -295,7 +295,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
> fsreq->subreq = subreq;
> fsreq->pos = subreq->start + subreq->transferred;
> fsreq->len = subreq->len - subreq->transferred;
> - fsreq->key = subreq->rreq->netfs_priv;
> + fsreq->key = key_get(subreq->rreq->netfs_priv);
> fsreq->vnode = vnode;
> fsreq->iter = &fsreq->def_iter;
>
> @@ -304,6 +304,7 @@ static void afs_req_issue_op(struct netfs_read_subrequest *subreq)
> fsreq->pos, fsreq->len);
>
> afs_fetch_data(fsreq->vnode, fsreq);
> + afs_put_read(fsreq);
> }
>
> static int afs_symlink_readpage(struct page *page)

Tested that it prevents the leak of about 49K kmalloc-192 objects for
a run of generic/074.

Reviewed-and-tested-by: Marc Dionne <[email protected]>

Marc

2021-09-10 21:20:50

by David Howells

[permalink] [raw]
Subject: [PATCH 7/6] afs: Fix corruption in reads at fpos 2G-4G from an OpenAFS server


AFS-3 has two data fetch RPC variants, FS.FetchData and FS.FetchData64, and
Linux's afs client switches between them when talking to a non-YFS server
if the read size, the file position or the sum of the two have the upper 32
bits set of the 64-bit value.

This is a problem, however, since the file position and length fields of
FS.FetchData are *signed* 32-bit values.

Fix this by capturing the capability bits obtained from the fileserver when
it's sent an FS.GetCapabilities RPC, rather than just discarding them, and
then picking out the VICED_CAPABILITY_64BITFILES flag. This can then be
used to decide whether to use FS.FetchData or FS.FetchData64 - and also
FS.StoreData or FS.StoreData64 - rather than using upper_32_bits() to
switch on the parameter values.

This capabilities flag could also be used to limit the maximum size of the
file, but all servers must be checked for that.

Note that the issue does not exist with FS.StoreData - that uses *unsigned*
32-bit values. It's also not a problem with Auristor servers as its
YFS.FetchData64 op uses unsigned 64-bit values.

This can be tested by cloning a git repo through an OpenAFS client to an
OpenAFS server and then doing "git status" on it from a Linux afs
client[1]. Provided the clone has a pack file that's in the 2G-4G range,
the git status will show errors like:

error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index
error: packfile .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack does not match index

This can be observed in the server's FileLog with something like the
following appearing:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
...
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

Note the file position of 18446744071815340032. This is the requested file
position sign-extended.

Fixes: b9b1f8d5930a ("AFS: write support fixes")
Reported-by: Markus Suvanto <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
Tested-by: Markus Suvanto <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c9 [1]
---
fs/afs/fs_probe.c | 8 +++++++-
fs/afs/fsclient.c | 31 ++++++++++++++++++++-----------
fs/afs/internal.h | 1 +
fs/afs/protocol_afs.h | 15 +++++++++++++++
fs/afs/protocol_yfs.h | 6 ++++++
5 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/fs/afs/fs_probe.c b/fs/afs/fs_probe.c
index e7e98ad63a91..c0031a3ab42f 100644
--- a/fs/afs/fs_probe.c
+++ b/fs/afs/fs_probe.c
@@ -9,6 +9,7 @@
#include <linux/slab.h>
#include "afs_fs.h"
#include "internal.h"
+#include "protocol_afs.h"
#include "protocol_yfs.h"

static unsigned int afs_fs_probe_fast_poll_interval = 30 * HZ;
@@ -102,7 +103,7 @@ void afs_fileserver_probe_result(struct afs_call *call)
struct afs_addr_list *alist = call->alist;
struct afs_server *server = call->server;
unsigned int index = call->addr_ix;
- unsigned int rtt_us = 0;
+ unsigned int rtt_us = 0, cap0;
int ret = call->error;

_enter("%pU,%u", &server->uuid, index);
@@ -159,6 +160,11 @@ void afs_fileserver_probe_result(struct afs_call *call)
clear_bit(AFS_SERVER_FL_IS_YFS, &server->flags);
alist->addrs[index].srx_service = call->service_id;
}
+ cap0 = ntohl(call->tmp);
+ if (cap0 & AFS3_VICED_CAPABILITY_64BITFILES)
+ set_bit(AFS_SERVER_FL_HAS_FS64, &server->flags);
+ else
+ clear_bit(AFS_SERVER_FL_HAS_FS64, &server->flags);
}

if (rxrpc_kernel_get_srtt(call->net->socket, call->rxcall, &rtt_us) &&
diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index dd3f45d906d2..4943413d9c5f 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -456,9 +456,7 @@ void afs_fs_fetch_data(struct afs_operation *op)
struct afs_read *req = op->fetch.req;
__be32 *bp;

- if (upper_32_bits(req->pos) ||
- upper_32_bits(req->len) ||
- upper_32_bits(req->pos + req->len))
+ if (test_bit(AFS_SERVER_FL_HAS_FS64, &op->server->flags))
return afs_fs_fetch_data64(op);

_enter("");
@@ -1113,9 +1111,7 @@ void afs_fs_store_data(struct afs_operation *op)
(unsigned long long)op->store.pos,
(unsigned long long)op->store.i_size);

- if (upper_32_bits(op->store.pos) ||
- upper_32_bits(op->store.size) ||
- upper_32_bits(op->store.i_size))
+ if (test_bit(AFS_SERVER_FL_HAS_FS64, &op->server->flags))
return afs_fs_store_data64(op);

call = afs_alloc_flat_call(op->net, &afs_RXFSStoreData,
@@ -1229,7 +1225,7 @@ static void afs_fs_setattr_size(struct afs_operation *op)
key_serial(op->key), vp->fid.vid, vp->fid.vnode);

ASSERT(attr->ia_valid & ATTR_SIZE);
- if (upper_32_bits(attr->ia_size))
+ if (test_bit(AFS_SERVER_FL_HAS_FS64, &op->server->flags))
return afs_fs_setattr_size64(op);

call = afs_alloc_flat_call(op->net, &afs_RXFSStoreData_as_Status,
@@ -1657,20 +1653,33 @@ static int afs_deliver_fs_get_capabilities(struct afs_call *call)
return ret;

count = ntohl(call->tmp);
-
call->count = count;
call->count2 = count;
- afs_extract_discard(call, count * sizeof(__be32));
+ if (count == 0) {
+ call->unmarshall = 4;
+ call->tmp = 0;
+ break;
+ }
+
+ /* Extract the first word of the capabilities to call->tmp */
+ afs_extract_to_tmp(call);
call->unmarshall++;
fallthrough;

- /* Extract capabilities words */
case 2:
ret = afs_extract_data(call, false);
if (ret < 0)
return ret;

- /* TODO: Examine capabilities */
+ afs_extract_discard(call, (count - 1) * sizeof(__be32));
+ call->unmarshall++;
+ fallthrough;
+
+ /* Extract remaining capabilities words */
+ case 3:
+ ret = afs_extract_data(call, false);
+ if (ret < 0)
+ return ret;

call->unmarshall++;
break;
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index c97618855b46..b0fe27ae60db 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -520,6 +520,7 @@ struct afs_server {
#define AFS_SERVER_FL_IS_YFS 16 /* Server is YFS not AFS */
#define AFS_SERVER_FL_NO_IBULK 17 /* Fileserver doesn't support FS.InlineBulkStatus */
#define AFS_SERVER_FL_NO_RM2 18 /* Fileserver doesn't support YFS.RemoveFile2 */
+#define AFS_SERVER_FL_HAS_FS64 19 /* Fileserver supports FS.{Fetch,Store}Data64 */
atomic_t ref; /* Object refcount */
atomic_t active; /* Active user count */
u32 addr_version; /* Address list version */
diff --git a/fs/afs/protocol_afs.h b/fs/afs/protocol_afs.h
new file mode 100644
index 000000000000..0c39358c8b70
--- /dev/null
+++ b/fs/afs/protocol_afs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* AFS protocol bits
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+
+#define AFSCAPABILITIESMAX 196 /* Maximum number of words in a capability set */
+
+/* AFS3 Fileserver capabilities word 0 */
+#define AFS3_VICED_CAPABILITY_ERRORTRANS 0x0001 /* Uses UAE errors */
+#define AFS3_VICED_CAPABILITY_64BITFILES 0x0002 /* FetchData64 & StoreData64 supported */
+#define AFS3_VICED_CAPABILITY_WRITELOCKACL 0x0004 /* Can lock a file even without lock perm */
+#define AFS3_VICED_CAPABILITY_SANEACLS 0x0008 /* ACLs reviewed for sanity - don't use */
diff --git a/fs/afs/protocol_yfs.h b/fs/afs/protocol_yfs.h
index b5bd03b1d3c7..e4cd89c44c46 100644
--- a/fs/afs/protocol_yfs.h
+++ b/fs/afs/protocol_yfs.h
@@ -168,3 +168,9 @@ enum yfs_lock_type {
yfs_LockMandatoryWrite = 0x101,
yfs_LockMandatoryExtend = 0x102,
};
+
+/* RXYFS Viced Capability Flags */
+#define YFS_VICED_CAPABILITY_ERRORTRANS 0x0001 /* Deprecated v0.195 */
+#define YFS_VICED_CAPABILITY_64BITFILES 0x0002 /* Deprecated v0.195 */
+#define YFS_VICED_CAPABILITY_WRITELOCKACL 0x0004 /* Can lock a file even without lock perm */
+#define YFS_VICED_CAPABILITY_SANEACLS 0x0008 /* Deprecated v0.195 */