2021-12-27 03:12:11

by Zhihao Cheng

Subject: [PATCH v6 00/15] Some bugfixes for ubi/ubifs

v1->v2:
1. Add new fix for ubifs, "ubifs: Fix to add refcount once page is set
private"
2. Update "ubifs: Rename whiteout atomically":
1) Move inode mode in create_whiteout()
2) Don't check O_SYNC for whiteout, because it inherits from the old_dir
3) Remove useless 'synced_i_size' assignment for whiteout, because
it's always zero.
4) Remove unused variable 'ui' in create_whiteout()
3. Update "ubifs: setflags: Make dirtied_ino_d 8 bytes aligned":
1) Align dirtied_ino_d with 8 bytes.

v2->v3:
1. Update "ubifs: Rename whiteout atomically":
1) Fix misspelling 'have already check the old dir inode' ->
'have already checked the old dir inode'
2) Fix misspelling "Whiteout don't have non-zero size" ->
"Whiteout have non-zero size"
2. Update "ubifs: Fix to add refcount once page is set private"
1) Fix commit message to explain the root cause.
3. Update "ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
fm->used_blocks>=2"
1) Add fastmap used pebs into 'ai' in the for-loop, rather than in
two steps (add pebs [pnum < UBI_FM_MAX_START], then add pebs
[pnum >= UBI_FM_MAX_START] into 'ai').

v3->v4:
1. Update "ubifs: Add missing iput if do_tmpfile() failed in rename whiteout":
1) Move whiteout cleanup into do_tmpfile() according to Sascha's advice
2. Add new fix for ubifs, "ubifs: ubifs_writepage: Mark page dirty after
writing inode failed"

v4->v5:
1. Add new fix for ubifs, "ubifs: ubifs_releasepage: Remove ubifs_assert(0)
to valid this process"

v5->v6:
1. Add new fix for ubi: "ubi: fastmap: Fix high cpu usage of ubi_bgt by
making sure wl_pool not empty"

Zhihao Cheng (15):
ubifs: rename_whiteout: Fix double free for whiteout_ui->data
ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode
comment
ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
ubifs: Rename whiteout atomically
ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
ubifs: Rectify space amount budget for mkdir/tmpfile operations
ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
ubifs: Fix to add refcount once page is set private
ubi: fastmap: Return error code if memory allocation fails in
add_aeb()
ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
fm->used_blocks>=2
ubifs: ubifs_writepage: Mark page dirty after writing inode failed
ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not
empty

drivers/mtd/ubi/fastmap-wl.c | 5 +-
drivers/mtd/ubi/fastmap.c | 63 ++++------
drivers/mtd/ubi/wl.h | 11 +-
fs/ubifs/dir.c | 235 +++++++++++++++++++++--------------
fs/ubifs/file.c | 45 ++++---
fs/ubifs/io.c | 34 ++++-
fs/ubifs/ioctl.c | 2 +-
fs/ubifs/journal.c | 52 ++++++--
fs/ubifs/ubifs.h | 2 +-
9 files changed, 284 insertions(+), 165 deletions(-)

--
2.31.1



2021-12-27 03:12:19

by Zhihao Cheng

Subject: [PATCH v6 04/15] ubifs: Add missing iput if do_tmpfile() failed in rename whiteout

The whiteout inode should be put when do_tmpfile() fails after the inode
has been initialized. Otherwise we get the following warning during umount:
UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.
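The leak can be sketched as a small, self-contained C model. This is not UBIFS code: `struct obj`, `obj_new()`, `obj_put()` and `tmpfile_sketch()` are hypothetical stand-ins for an inode, its allocation, iput() and do_tmpfile(). Once the reference has been handed out through `*whiteout`, a later failure must drop it, because the caller treats the error as "nothing was created" and never puts it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a VFS inode. */
struct obj {
	int refcount;
};

static int live_objects; /* allocated and not yet freed */

static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o) {
		o->refcount = 1;
		live_objects++;
	}
	return o;
}

/* Hypothetical iput(): drop one reference, free on the last one. */
static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		free(o);
		live_objects--;
	}
}

/*
 * Sketch of the do_tmpfile() error path. The reference is published to
 * the caller through *whiteout before a later step fails. 'put_on_error'
 * selects whether the error path drops it (the fix) or not (the leak).
 * Earlier failures (not modeled) would reach out_inode with
 * instantiated == false; the branch is kept to mirror the original shape.
 */
static int tmpfile_sketch(bool fail_late, bool put_on_error,
			  struct obj **whiteout)
{
	bool instantiated = false;
	struct obj *inode = obj_new();

	if (!inode)
		return -ENOMEM;

	*whiteout = inode;	/* caller now holds this reference */
	instantiated = true;

	if (fail_late)		/* e.g. the journal update fails */
		goto out_inode;
	return 0;

out_inode:
	if (!instantiated)
		obj_put(inode);
	else if (put_on_error)
		obj_put(*whiteout);	/* the iput() added by this patch */
	return -EIO;
}
```

With `put_on_error` false, a late failure leaves one object alive, which corresponds to the busy inode reported at umount.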

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <[email protected]>
Suggested-by: Sascha Hauer <[email protected]>
---
fs/ubifs/dir.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2735ad1affed..2cbc5f05f671 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -432,6 +432,8 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
make_bad_inode(inode);
if (!instantiated)
iput(inode);
+ else if (whiteout)
+ iput(*whiteout);
out_budg:
ubifs_release_budget(c, &req);
if (!instantiated)


2021-12-27 03:12:19

by Zhihao Cheng

Subject: [PATCH v6 03/15] ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment

Since commits 9ec64962afb1702f75b ("ubifs: Implement RENAME_EXCHANGE") and
9e0a1fff8db56eaaebb ("ubifs: Implement RENAME_WHITEOUT") were applied,
ubifs_rename() locks and changes 4 UBIFS inodes. Correct the comment for
ui_mutex in struct ubifs_inode accordingly.

Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/ubifs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index c38066ce9ab0..972e41daff01 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -372,7 +372,7 @@ struct ubifs_gced_idx_leb {
* @ui_mutex exists for two main reasons. At first it prevents inodes from
* being written back while UBIFS changing them, being in the middle of an VFS
* operation. This way UBIFS makes sure the inode fields are consistent. For
- * example, in 'ubifs_rename()' we change 3 inodes simultaneously, and
+ * example, in 'ubifs_rename()' we change 4 inodes simultaneously, and
* write-back must not write any of them before we have finished.
*
* The second reason is budgeting - UBIFS has to budget all operations. If an


2021-12-27 03:12:21

by Zhihao Cheng

Subject: [PATCH v6 06/15] ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work

'ui->dirty' is not protected by 'ui_mutex' in do_tmpfile(), so it may
race with ubifs_write_inode() (called from wb_workfn) when accessing or
updating 'ui->dirty'. As a result, the dirty space is released twice.

     open(O_TMPFILE)                          wb_workfn
do_tmpfile
  ubifs_budget_space(ino_req = { .dirtied_ino = 1})
  d_tmpfile // mark inode(tmpfile) dirty
  ubifs_jnl_update // without holding tmpfile's ui_mutex
    mark_inode_clean(ui)
      if (ui->dirty)
        ubifs_release_dirty_inode_budget(ui) // release first time
                                         ubifs_write_inode
                                           mutex_lock(&ui->ui_mutex)
                                           ubifs_release_dirty_inode_budget(ui)
                                           // release second time
                                           mutex_unlock(&ui->ui_mutex)
      ui->dirty = 0

Running xfstests generic/476 easily reproduces the following message
(see the reproducer in [Link]):

UBIFS error (ubi0:0 pid 2578): ubifs_assert_failed [ubifs]: UBIFS assert
failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:554
UBIFS warning (ubi0:0 pid 2578): ubifs_ro_mode [ubifs]: switched to
read-only mode, error -22
Workqueue: writeback wb_workfn (flush-ubifs_0_0)
Call Trace:
ubifs_ro_mode+0x54/0x60 [ubifs]
ubifs_assert_failed+0x4b/0x80 [ubifs]
ubifs_release_budget+0x468/0x5a0 [ubifs]
ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs]
ubifs_write_inode+0x121/0x1f0 [ubifs]
...
wb_workfn+0x283/0x7b0

Fix it by holding the tmpfile's ubifs inode lock (ui_mutex) during
ubifs_jnl_update(). A similar problem exists in whiteout renaming, but
the previous fix ("ubifs: Rename whiteout atomically") has already solved it.
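The interleaving above can be replayed deterministically in a single-threaded C sketch. It assumes a simplified model in which `struct ui_model`, `writeback()` and `tmpfile_race()` are hypothetical stand-ins (not UBIFS functions) for the inode state, ubifs_write_inode() and the do_tmpfile() journal path. Writeback is injected between the dirty check and the clear in mark_inode_clean(); when the tmpfile side holds the lock, writeback has to wait until the flag is already cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of one UBIFS inode's state. */
struct ui_model {
	bool dirty;	/* like ui->dirty */
	bool locked;	/* true while someone holds ui->ui_mutex */
	int released;	/* times the dirty-inode budget was released */
};

/*
 * Writeback side, shaped like ubifs_write_inode(): take ui_mutex, release
 * the budget only if the inode is still dirty, then clear the flag.
 * Returns false if it would have blocked because the mutex is held.
 */
static bool writeback(struct ui_model *ui)
{
	if (ui->locked)
		return false;	/* mutex_lock() would sleep here */
	ui->locked = true;
	if (ui->dirty) {
		ui->released++;
		ui->dirty = false;
	}
	ui->locked = false;
	return true;
}

/*
 * tmpfile side: ubifs_jnl_update() -> mark_inode_clean(), with writeback
 * injected between the dirty check and the clear. 'hold_lock' models the
 * fix (lock_2_inodes() around the journal update). Returns how many times
 * the budget was released.
 */
static int tmpfile_race(bool hold_lock)
{
	struct ui_model ui = { .dirty = true };	/* d_tmpfile() dirtied it */
	bool wb_done = false;

	if (hold_lock)
		ui.locked = true;
	if (ui.dirty) {
		ui.released++;			/* release first time */
		wb_done = writeback(&ui);	/* racing wb_workfn */
	}
	ui.dirty = false;
	if (hold_lock)
		ui.locked = false;
	if (!wb_done)
		writeback(&ui);		/* blocked writer now proceeds */
	return ui.released;
}
```

Without the lock the budget is released twice; with it, the delayed writeback finds a clean inode and releases nothing.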

Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214765
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/dir.c | 60 +++++++++++++++++++++++++-------------------------
1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index deaf2d5dba5b..ebdc9aa04cbb 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -397,6 +397,32 @@ static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
return ERR_PTR(err);
}

+/**
+ * lock_2_inodes - a wrapper for locking two UBIFS inodes.
+ * @inode1: first inode
+ * @inode2: second inode
+ *
+ * We do not implement any tricks to guarantee strict lock ordering, because
+ * VFS has already done it for us on the @i_mutex. So this is just a simple
+ * wrapper function.
+ */
+static void lock_2_inodes(struct inode *inode1, struct inode *inode2)
+{
+ mutex_lock_nested(&ubifs_inode(inode1)->ui_mutex, WB_MUTEX_1);
+ mutex_lock_nested(&ubifs_inode(inode2)->ui_mutex, WB_MUTEX_2);
+}
+
+/**
+ * unlock_2_inodes - a wrapper for unlocking two UBIFS inodes.
+ * @inode1: first inode
+ * @inode2: second inode
+ */
+static void unlock_2_inodes(struct inode *inode1, struct inode *inode2)
+{
+ mutex_unlock(&ubifs_inode(inode2)->ui_mutex);
+ mutex_unlock(&ubifs_inode(inode1)->ui_mutex);
+}
+
static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
struct dentry *dentry, umode_t mode)
{
@@ -404,7 +430,7 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
struct ubifs_info *c = dir->i_sb->s_fs_info;
struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1};
struct ubifs_budget_req ino_req = { .dirtied_ino = 1 };
- struct ubifs_inode *ui, *dir_ui = ubifs_inode(dir);
+ struct ubifs_inode *ui;
int err, instantiated = 0;
struct fscrypt_name nm;

@@ -452,18 +478,18 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
instantiated = 1;
mutex_unlock(&ui->ui_mutex);

- mutex_lock(&dir_ui->ui_mutex);
+ lock_2_inodes(dir, inode);
err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0);
if (err)
goto out_cancel;
- mutex_unlock(&dir_ui->ui_mutex);
+ unlock_2_inodes(dir, inode);

ubifs_release_budget(c, &req);

return 0;

out_cancel:
- mutex_unlock(&dir_ui->ui_mutex);
+ unlock_2_inodes(dir, inode);
out_inode:
make_bad_inode(inode);
if (!instantiated)
@@ -690,32 +716,6 @@ static int ubifs_dir_release(struct inode *dir, struct file *file)
return 0;
}

-/**
- * lock_2_inodes - a wrapper for locking two UBIFS inodes.
- * @inode1: first inode
- * @inode2: second inode
- *
- * We do not implement any tricks to guarantee strict lock ordering, because
- * VFS has already done it for us on the @i_mutex. So this is just a simple
- * wrapper function.
- */
-static void lock_2_inodes(struct inode *inode1, struct inode *inode2)
-{
- mutex_lock_nested(&ubifs_inode(inode1)->ui_mutex, WB_MUTEX_1);
- mutex_lock_nested(&ubifs_inode(inode2)->ui_mutex, WB_MUTEX_2);
-}
-
-/**
- * unlock_2_inodes - a wrapper for unlocking two UBIFS inodes.
- * @inode1: first inode
- * @inode2: second inode
- */
-static void unlock_2_inodes(struct inode *inode1, struct inode *inode2)
-{
- mutex_unlock(&ubifs_inode(inode2)->ui_mutex);
- mutex_unlock(&ubifs_inode(inode1)->ui_mutex);
-}
-
static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{


2021-12-27 03:12:19

by Zhihao Cheng

Subject: [PATCH v6 05/15] ubifs: Rename whiteout atomically

Currently, rename whiteout has 3 steps:
1. create a tmpfile (which associates the old dentry with the tmpfile
   inode) for the whiteout, and store the tmpfile to disk
2. link the whiteout, associating the whiteout inode with the old dentry
   again, and store the old dentry, old inode and new dentry on disk
3. write back the dirty whiteout inode to disk

A sudden power cut or an error (e.g. ENOSPC returned by the budget,
memory allocation failure) during the above steps may cause several problems:
Problem 1: ENOSPC returned by the whiteout space budget (before step 2):
           the old dentry disappears after the rename syscall, and the
           whiteout file cannot be found either.

     ls dir  // we get file, whiteout
     rename(dir/file, dir/whiteout, RENAME_WHITEOUT)
     ENOSPC = ubifs_budget_space(&wht_req)  // return
     ls dir  // empty (no file, no whiteout)

Problem 2: A power cut happens before step 3: the whiteout inode with
           'nlink=1' is not stored on disk, but the whiteout dentry (old
           dentry) is, so the whiteout file is lost on the next mount (we
           get "dead directory entry" after executing 'ls -l' on the
           whiteout file).

Now, we use the following 3 steps to finish a whiteout rename:
1. create an in-memory inode with 'nlink = 1' as the whiteout
2. ubifs_jnl_rename() (writes to disk, associating the old dentry with
   the whiteout inode and the new dentry with the old inode)
3. iput(whiteout)

Rely on ubifs_jnl_rename() to write the in-memory whiteout inode to disk
and finish the whiteout rename, which avoids the intermediate on-disk
state that a sudden power cut or error could otherwise leave behind.
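The difference between the two schemes can be sketched as a tiny power-cut model in C. It assumes that the single ubifs_jnl_rename() write is all-or-nothing, which is what this patch relies on; `struct flash_state`, `consistent()`, `old_scheme()` and `new_scheme()` are hypothetical, and the orphan tmpfile written in old step 1 is not modeled. `done` is the number of flash writes that completed before the cut.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a RENAME_WHITEOUT has left on flash. */
struct flash_state {
	bool whiteout_dent;	/* old dentry -> whiteout inode */
	bool whiteout_ino;	/* whiteout inode with nlink = 1 */
	bool new_dent;		/* new dentry -> renamed inode */
};

/* A dentry whose inode is missing is the "dead directory entry" case. */
static bool consistent(const struct flash_state *f)
{
	return !f->whiteout_dent || f->whiteout_ino;
}

/* Old scheme: three separate flash writes, 'done' of them completed. */
static struct flash_state old_scheme(int done)
{
	struct flash_state f = { false, false, false };

	/* Write 1: tmpfile inode with nlink = 0 (orphan, not modeled). */
	if (done >= 2) {	/* write 2: ubifs_jnl_rename() */
		f.whiteout_dent = true;
		f.new_dent = true;
	}
	if (done >= 3)		/* write 3: writeback of the whiteout inode */
		f.whiteout_ino = true;
	return f;
}

/* New scheme: one journal write carries the dentries and the whiteout. */
static struct flash_state new_scheme(int done)
{
	struct flash_state f = { false, false, false };

	if (done >= 1) {
		f.whiteout_dent = true;
		f.whiteout_ino = true;
		f.new_dent = true;
	}
	return f;
}
```

Under this model, a cut after write 2 of the old scheme is exactly Problem 2, while the new scheme is consistent whether the cut lands before or after its single write.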

Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/dir.c | 144 +++++++++++++++++++++++++++++----------------
fs/ubifs/journal.c | 52 +++++++++++++---
2 files changed, 136 insertions(+), 60 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 2cbc5f05f671..deaf2d5dba5b 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -349,8 +349,56 @@ static int ubifs_create(struct user_namespace *mnt_userns, struct inode *dir,
return err;
}

-static int do_tmpfile(struct inode *dir, struct dentry *dentry,
- umode_t mode, struct inode **whiteout)
+static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
+{
+ int err;
+ umode_t mode = S_IFCHR | WHITEOUT_MODE;
+ struct inode *inode;
+ struct ubifs_info *c = dir->i_sb->s_fs_info;
+ struct fscrypt_name nm;
+
+ /*
+ * Create an inode('nlink = 1') for whiteout without updating journal,
+ * let ubifs_jnl_rename() store it on flash to complete rename whiteout
+ * atomically.
+ */
+
+ dbg_gen("dent '%pd', mode %#hx in dir ino %lu",
+ dentry, mode, dir->i_ino);
+
+ err = fscrypt_setup_filename(dir, &dentry->d_name, 0, &nm);
+ if (err)
+ return ERR_PTR(err);
+
+ inode = ubifs_new_inode(c, dir, mode);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto out_free;
+ }
+
+ init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
+ ubifs_assert(c, inode->i_op == &ubifs_file_inode_operations);
+
+ err = ubifs_init_security(dir, inode, &dentry->d_name);
+ if (err)
+ goto out_inode;
+
+ /* The dir size is updated by do_rename. */
+ insert_inode_hash(inode);
+
+ return inode;
+
+out_inode:
+ make_bad_inode(inode);
+ iput(inode);
+out_free:
+ fscrypt_free_filename(&nm);
+ ubifs_err(c, "cannot create whiteout file, error %d", err);
+ return ERR_PTR(err);
+}
+
+static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
+ struct dentry *dentry, umode_t mode)
{
struct inode *inode;
struct ubifs_info *c = dir->i_sb->s_fs_info;
@@ -392,25 +440,13 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
}
ui = ubifs_inode(inode);

- if (whiteout) {
- init_special_inode(inode, inode->i_mode, WHITEOUT_DEV);
- ubifs_assert(c, inode->i_op == &ubifs_file_inode_operations);
- }
-
err = ubifs_init_security(dir, inode, &dentry->d_name);
if (err)
goto out_inode;

mutex_lock(&ui->ui_mutex);
insert_inode_hash(inode);
-
- if (whiteout) {
- mark_inode_dirty(inode);
- drop_nlink(inode);
- *whiteout = inode;
- } else {
- d_tmpfile(dentry, inode);
- }
+ d_tmpfile(dentry, inode);
ubifs_assert(c, ui->dirty);

instantiated = 1;
@@ -432,8 +468,6 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
make_bad_inode(inode);
if (!instantiated)
iput(inode);
- else if (whiteout)
- iput(*whiteout);
out_budg:
ubifs_release_budget(c, &req);
if (!instantiated)
@@ -443,12 +477,6 @@ static int do_tmpfile(struct inode *dir, struct dentry *dentry,
return err;
}

-static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
- struct dentry *dentry, umode_t mode)
-{
- return do_tmpfile(dir, dentry, mode, NULL);
-}
-
/**
* vfs_dent_type - get VFS directory entry type.
* @type: UBIFS directory entry type
@@ -1266,17 +1294,19 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
.dirtied_ino = 3 };
struct ubifs_budget_req ino_req = { .dirtied_ino = 1,
.dirtied_ino_d = ALIGN(old_inode_ui->data_len, 8) };
+ struct ubifs_budget_req wht_req;
struct timespec64 time;
unsigned int saved_nlink;
struct fscrypt_name old_nm, new_nm;

/*
- * Budget request settings: deletion direntry, new direntry, removing
- * the old inode, and changing old and new parent directory inodes.
+ * Budget request settings:
+ * req: deletion direntry, new direntry, removing the old inode,
+ * and changing old and new parent directory inodes.
*
- * However, this operation also marks the target inode as dirty and
- * does not write it, so we allocate budget for the target inode
- * separately.
+ * wht_req: new whiteout inode for RENAME_WHITEOUT.
+ *
+ * ino_req: marks the target inode as dirty and does not write it.
*/

dbg_gen("dent '%pd' ino %lu in dir ino %lu to dent '%pd' in dir ino %lu flags 0x%x",
@@ -1326,7 +1356,6 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,

if (flags & RENAME_WHITEOUT) {
union ubifs_dev_desc *dev = NULL;
- struct ubifs_budget_req wht_req;

dev = kmalloc(sizeof(union ubifs_dev_desc), GFP_NOFS);
if (!dev) {
@@ -1334,24 +1363,26 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
goto out_release;
}

- err = do_tmpfile(old_dir, old_dentry, S_IFCHR | WHITEOUT_MODE, &whiteout);
- if (err) {
+ /*
+ * The whiteout inode without dentry is pinned in memory,
+ * umount won't happen during rename process because we
+ * got parent dentry.
+ */
+ whiteout = create_whiteout(old_dir, old_dentry);
+ if (IS_ERR(whiteout)) {
+ err = PTR_ERR(whiteout);
kfree(dev);
goto out_release;
}

- spin_lock(&whiteout->i_lock);
- whiteout->i_state |= I_LINKABLE;
- spin_unlock(&whiteout->i_lock);
-
whiteout_ui = ubifs_inode(whiteout);
whiteout_ui->data = dev;
whiteout_ui->data_len = ubifs_encode_dev(dev, MKDEV(0, 0));
ubifs_assert(c, !whiteout_ui->dirty);

memset(&wht_req, 0, sizeof(struct ubifs_budget_req));
- wht_req.dirtied_ino = 1;
- wht_req.dirtied_ino_d = ALIGN(whiteout_ui->data_len, 8);
+ wht_req.new_ino = 1;
+ wht_req.new_ino_d = ALIGN(whiteout_ui->data_len, 8);
/*
* To avoid deadlock between space budget (holds ui_mutex and
* waits wb work) and writeback work(waits ui_mutex), do space
@@ -1359,6 +1390,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
*/
err = ubifs_budget_space(c, &wht_req);
if (err) {
+ /*
+ * Whiteout inode can not be written on flash by
+ * ubifs_jnl_write_inode(), because it's neither
+ * dirty nor zero-nlink.
+ */
iput(whiteout);
goto out_release;
}
@@ -1433,17 +1469,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
sync = IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir);
if (unlink && IS_SYNC(new_inode))
sync = 1;
- }
-
- if (whiteout) {
- inc_nlink(whiteout);
- mark_inode_dirty(whiteout);
-
- spin_lock(&whiteout->i_lock);
- whiteout->i_state &= ~I_LINKABLE;
- spin_unlock(&whiteout->i_lock);
-
- iput(whiteout);
+ /*
+ * S_SYNC flag of whiteout inherits from the old_dir, and we
+ * have already checked the old dir inode. So there is no need
+ * to check whiteout.
+ */
}

err = ubifs_jnl_rename(c, old_dir, old_inode, &old_nm, new_dir,
@@ -1454,6 +1484,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
ubifs_release_budget(c, &req);

+ if (whiteout) {
+ ubifs_release_budget(c, &wht_req);
+ iput(whiteout);
+ }
+
mutex_lock(&old_inode_ui->ui_mutex);
release = old_inode_ui->dirty;
mark_inode_dirty_sync(old_inode);
@@ -1462,11 +1497,16 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
if (release)
ubifs_release_budget(c, &ino_req);
if (IS_SYNC(old_inode))
- err = old_inode->i_sb->s_op->write_inode(old_inode, NULL);
+ /*
+ * Rename finished here. Although old inode cannot be updated
+ * on flash, old ctime is not a big problem, don't return err
+ * code to userspace.
+ */
+ old_inode->i_sb->s_op->write_inode(old_inode, NULL);

fscrypt_free_filename(&old_nm);
fscrypt_free_filename(&new_nm);
- return err;
+ return 0;

out_cancel:
if (unlink) {
@@ -1487,11 +1527,11 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
inc_nlink(old_dir);
}
}
+ unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
if (whiteout) {
- drop_nlink(whiteout);
+ ubifs_release_budget(c, &wht_req);
iput(whiteout);
}
- unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
out_release:
ubifs_release_budget(c, &ino_req);
ubifs_release_budget(c, &req);
diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 8ea680dba61e..75dab0ae3939 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -1207,9 +1207,9 @@ int ubifs_jnl_xrename(struct ubifs_info *c, const struct inode *fst_dir,
* @sync: non-zero if the write-buffer has to be synchronized
*
* This function implements the re-name operation which may involve writing up
- * to 4 inodes and 2 directory entries. It marks the written inodes as clean
- * and returns zero on success. In case of failure, a negative error code is
- * returned.
+ * to 4 inodes(new inode, whiteout inode, old and new parent directory inodes)
+ * and 2 directory entries. It marks the written inodes as clean and returns
+ * zero on success. In case of failure, a negative error code is returned.
*/
int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
const struct inode *old_inode,
@@ -1222,14 +1222,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
void *p;
union ubifs_key key;
struct ubifs_dent_node *dent, *dent2;
- int err, dlen1, dlen2, ilen, lnum, offs, len, orphan_added = 0;
+ int err, dlen1, dlen2, ilen, wlen, lnum, offs, len, orphan_added = 0;
int aligned_dlen1, aligned_dlen2, plen = UBIFS_INO_NODE_SZ;
int last_reference = !!(new_inode && new_inode->i_nlink == 0);
int move = (old_dir != new_dir);
- struct ubifs_inode *new_ui;
+ struct ubifs_inode *new_ui, *whiteout_ui;
u8 hash_old_dir[UBIFS_HASH_ARR_SZ];
u8 hash_new_dir[UBIFS_HASH_ARR_SZ];
u8 hash_new_inode[UBIFS_HASH_ARR_SZ];
+ u8 hash_whiteout_inode[UBIFS_HASH_ARR_SZ];
u8 hash_dent1[UBIFS_HASH_ARR_SZ];
u8 hash_dent2[UBIFS_HASH_ARR_SZ];

@@ -1249,9 +1250,20 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
} else
ilen = 0;

+ if (whiteout) {
+ whiteout_ui = ubifs_inode(whiteout);
+ ubifs_assert(c, mutex_is_locked(&whiteout_ui->ui_mutex));
+ ubifs_assert(c, whiteout->i_nlink == 1);
+ ubifs_assert(c, !whiteout_ui->dirty);
+ wlen = UBIFS_INO_NODE_SZ;
+ wlen += whiteout_ui->data_len;
+ } else
+ wlen = 0;
+
aligned_dlen1 = ALIGN(dlen1, 8);
aligned_dlen2 = ALIGN(dlen2, 8);
- len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) + ALIGN(plen, 8);
+ len = aligned_dlen1 + aligned_dlen2 + ALIGN(ilen, 8) +
+ ALIGN(wlen, 8) + ALIGN(plen, 8);
if (move)
len += plen;

@@ -1313,6 +1325,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
p += ALIGN(ilen, 8);
}

+ if (whiteout) {
+ pack_inode(c, p, whiteout, 0);
+ err = ubifs_node_calc_hash(c, p, hash_whiteout_inode);
+ if (err)
+ goto out_release;
+
+ p += ALIGN(wlen, 8);
+ }
+
if (!move) {
pack_inode(c, p, old_dir, 1);
err = ubifs_node_calc_hash(c, p, hash_old_dir);
@@ -1352,6 +1373,9 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
if (new_inode)
ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
new_inode->i_ino);
+ if (whiteout)
+ ubifs_wbuf_add_ino_nolock(&c->jheads[BASEHD].wbuf,
+ whiteout->i_ino);
}
release_head(c, BASEHD);

@@ -1368,8 +1392,6 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
err = ubifs_tnc_add_nm(c, &key, lnum, offs, dlen2, hash_dent2, old_nm);
if (err)
goto out_ro;
-
- ubifs_delete_orphan(c, whiteout->i_ino);
} else {
err = ubifs_add_dirt(c, lnum, dlen2);
if (err)
@@ -1390,6 +1412,15 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
offs += ALIGN(ilen, 8);
}

+ if (whiteout) {
+ ino_key_init(c, &key, whiteout->i_ino);
+ err = ubifs_tnc_add(c, &key, lnum, offs, wlen,
+ hash_whiteout_inode);
+ if (err)
+ goto out_ro;
+ offs += ALIGN(wlen, 8);
+ }
+
ino_key_init(c, &key, old_dir->i_ino);
err = ubifs_tnc_add(c, &key, lnum, offs, plen, hash_old_dir);
if (err)
@@ -1410,6 +1441,11 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
new_ui->synced_i_size = new_ui->ui_size;
spin_unlock(&new_ui->ui_lock);
}
+ /*
+ * No need to mark whiteout inode clean.
+ * Whiteout doesn't have non-zero size, no need to update
+ * synced_i_size for whiteout_ui.
+ */
mark_inode_clean(c, ubifs_inode(old_dir));
if (move)
mark_inode_clean(c, ubifs_inode(new_dir));


2021-12-27 03:12:22

by Zhihao Cheng

Subject: [PATCH v6 07/15] ubifs: Rectify space amount budget for mkdir/tmpfile operations

UBIFS must make sure the flash has enough space to store dirty data
(in-memory data that is newer than what is on disk); space budgeting is
designed to do exactly that. If the budget is smaller than what is
actually needed, 'make_reservation()' has to do more work with UBIFS
inodes locked (it returns -ENOSPC if no free space is left; sometimes
"cannot reserve xxx bytes in jhead xxx, error -28" appears in UBIFS
error messages), which may affect other syscalls.

A simple way to decide how much space a budget needs: check how much
space 'make_reservation()' requests in the ubifs_jnl_xxx() function used
by the corresponding operation.

It's better to report ENOSPC from ubifs_budget_space(), as early as possible.
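The rule of thumb above (budget at least what make_reservation() will ask for) can be illustrated with a small arithmetic sketch. The per-item byte costs and `struct budget_req_model` are made up for illustration and are not the real UBIFS budget calculation, which is more involved; only the shape of the comparison matters. The reservation side follows the budget comment in this patch: a new direntry, a new inode and the changed parent directory inode.

```c
#include <assert.h>

/* Illustrative per-item costs in bytes; NOT the real UBIFS values. */
enum { INODE_COST = 160, DENT_COST = 304 };

/* Hypothetical subset of struct ubifs_budget_req. */
struct budget_req_model {
	int new_ino;
	int new_dent;
	int dirtied_ino;
};

/* Bytes covered by a budget request under the illustrative costs. */
static int budget_bytes(const struct budget_req_model *r)
{
	return r->new_ino * INODE_COST + r->new_dent * DENT_COST +
	       r->dirtied_ino * INODE_COST;
}

/* Bytes the mkdir/tmpfile journal write needs: new direntry, new inode
 * and the updated parent directory inode. */
static int reservation_bytes(void)
{
	return DENT_COST + 2 * INODE_COST;
}
```

Without `.dirtied_ino = 1` the budget falls one inode short of the reservation, so the shortfall would only surface later inside make_reservation().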

Fixes: 474b93704f32163 ("ubifs: Implement O_TMPFILE")
Fixes: 1e51764a3c2ac05 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/dir.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index ebdc9aa04cbb..7ae48f273feb 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -428,15 +428,18 @@ static int ubifs_tmpfile(struct user_namespace *mnt_userns, struct inode *dir,
{
struct inode *inode;
struct ubifs_info *c = dir->i_sb->s_fs_info;
- struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1};
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+ .dirtied_ino = 1};
struct ubifs_budget_req ino_req = { .dirtied_ino = 1 };
struct ubifs_inode *ui;
int err, instantiated = 0;
struct fscrypt_name nm;

/*
- * Budget request settings: new dirty inode, new direntry,
- * budget for dirtied inode will be released via writeback.
+ * Budget request settings: new inode, new direntry, changing the
+ * parent directory inode.
+ * Allocate budget separately for new dirtied inode, the budget will
+ * be released via writeback.
*/

dbg_gen("dent '%pd', mode %#hx in dir ino %lu",
@@ -979,7 +982,8 @@ static int ubifs_mkdir(struct user_namespace *mnt_userns, struct inode *dir,
struct ubifs_inode *dir_ui = ubifs_inode(dir);
struct ubifs_info *c = dir->i_sb->s_fs_info;
int err, sz_change;
- struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1 };
+ struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
+ .dirtied_ino = 1};
struct fscrypt_name nm;

/*


2021-12-27 03:12:26

by Zhihao Cheng

Subject: [PATCH v6 08/15] ubifs: setflags: Make dirtied_ino_d 8 bytes aligned

Align 'ui->data_len' to 8 bytes before it is assigned to dirtied_ino_d.
Since commit 8871d84c8f8b0c6b ("ubifs: convert to fileattr") was applied,
'setflags()' only affects regular files and directories. Only xattr
inodes, symlink inodes and special inodes (pipe/char_dev/block_dev) have
a non-zero 'ui->data_len' field, so the assertion
'!(req->dirtied_ino_d & 7)' in ubifs_budget_space() cannot currently fail.
To avoid assertion failures as the code evolves (e.g. if setflags can
operate on special inodes), it's better to make dirtied_ino_d 8-byte
aligned; after all, the aligned size is still zero for regular files.
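A minimal sketch of the alignment and the check, assuming `ALIGN_UP()` as a local stand-in for the kernel's ALIGN() with a power-of-two alignment, and a hypothetical `dirtied_ino_d_ok()` helper that performs the same test as the '!(req->dirtied_ino_d & 7)' assertion.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two); a local
 * stand-in for the kernel's ALIGN() so the sketch is self-contained. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Same test as the assertion in ubifs_budget_space(). */
static int dirtied_ino_d_ok(int dirtied_ino_d)
{
	return !(dirtied_ino_d & 7);
}
```

An unaligned length such as 5 fails the check, while ALIGN_UP(5, 8) == 8 passes it, and a zero length (regular files) stays zero.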

Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index c6a863487780..71bcebe45f9c 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -108,7 +108,7 @@ static int setflags(struct inode *inode, int flags)
struct ubifs_inode *ui = ubifs_inode(inode);
struct ubifs_info *c = inode->i_sb->s_fs_info;
struct ubifs_budget_req req = { .dirtied_ino = 1,
- .dirtied_ino_d = ui->data_len };
+ .dirtied_ino_d = ALIGN(ui->data_len, 8) };

err = ubifs_budget_space(c, &req);
if (err)


2021-12-27 03:12:29

by Zhihao Cheng

Subject: [PATCH v6 10/15] ubifs: Fix to add refcount once page is set private

MM defines the rule [1] very clearly: once a page is set with the
PG_private flag, the refcount of that page should be incremented. Main
flows like pageout() and migrate_page() also assume there is one
additional page reference if page_has_private() returns true. If this
rule is not followed, we may hit a BUG in page migration:

page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
index:0xe2 pfn:0x14c12
aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
zone=1|lastcpupid=0x1fffff)
page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
------------[ cut here ]------------
kernel BUG at include/linux/page_ref.h:184!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
Call Trace:
ubifs_migrate_page+0x22/0xc0 [ubifs]
move_to_new_page+0xb4/0x600
migrate_pages+0x1523/0x1cc0
compact_zone+0x8c5/0x14b0
kcompactd+0x2bc/0x560
kthread+0x18c/0x1e0
ret_from_fork+0x1f/0x30

Before going further, let's clarify what the refcount means for a page
obtained from grab_cache_page_write_begin(). There are 2 situations:
Situation 1: refcount is 3, the page is created by __page_cache_alloc()
  TYPE_A - the write process is using this page
  TYPE_B - the page is assigned to a certain mapping by calling
           __add_to_page_cache_locked()
  TYPE_C - the page is added to the pagevec list of the current cpu by
           calling lru_cache_add()
Situation 2: refcount is 2, the page is obtained from the mapping's tree
  TYPE_B - the page has already been assigned to a certain mapping
  TYPE_A - the write process is using this page (by calling
           page_cache_get_speculative())
The filesystem releases one reference by calling put_page() in
xxx_write_end(); the released reference corresponds to TYPE_A (the write
task is using it). If any other process is using a page, the page
migration process skips the page by checking whether expected_page_refs()
equals the page refcount.

The BUG is caused by following process:
PA(cpu 0) kcompactd(cpu 1)
compact_zone
ubifs_write_begin
page_a = grab_cache_page_write_begin
add_to_page_cache_lru
lru_cache_add
pagevec_add // put page into cpu 0's pagevec
(refcnf = 3, for page creation process)
ubifs_write_end
SetPagePrivate(page_a) // doesn't increase page count !
unlock_page(page_a)
put_page(page_a) // refcnt = 2
[...]

PB(cpu 0)
filemap_read
filemap_get_pages
add_to_page_cache_lru
lru_cache_add
__pagevec_lru_add // traverse all pages in cpu 0's pagevec
__pagevec_lru_add_fn
SetPageLRU(page_a)
isolate_migratepages
isolate_migratepages_block
get_page_unless_zero(page_a)
// refcnt = 3
list_add(page_a, from_list)
migrate_pages(from_list)
__unmap_and_move
move_to_new_page
ubifs_migrate_page(page_a)
migrate_page_move_mapping
expected_page_refs get 3
(migration[1] + mapping[1] + private[1])
release_pages
put_page_testzero(page_a) // refcnt = 3
page_ref_freeze // refcnt = 0
page_ref_dec_and_test(0 - 1 = -1)
page_ref_unfreeze
VM_BUG_ON_PAGE(-1 != 0, page)

UBIFS doesn't increase the page refcount after setting the private flag,
which leads the page migration task to believe the page is not used by
any other process, so the page is migrated. This causes concurrent access
to the page refcount between put_page() called by another process (e.g.
a read process calling lru_cache_add()) and page_ref_unfreeze() called by
the migration task.

zhangjun previously tried to fix this problem [2] by recalculating the
page refcount in ubifs_migrate_page(). It's better to follow the MM rule
[1] because, as Kirill suggested in [2], otherwise we would need to check
all users of the page_has_private() helper. As f2fs does in [3], fix it
by taking/dropping a reference when setting/clearing private for a page.
BTW, according to [4], we set 'page->private' to 1 because ubifs simply
calls SetPagePrivate(). [5] provides common helpers to set/clear page
private, which ubifs can use following the example of iomap, afs, btrfs,
etc.
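The refcount rule can be sketched with a hypothetical `struct page_model`. `attach_private()` and `detach_private()` mirror what attach_page_private() and detach_page_private() do (a get_page()/put_page() around setting and clearing PG_private and page->private), and `expected_refs()` follows the count used in the trace above (migration[1] + mapping[1] + private[1]). None of these are kernel functions; this is a simplified model, not the MM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only the fields the rule needs. */
struct page_model {
	int refcount;
	bool private_flag;	/* PG_private */
	uintptr_t private_data;	/* page->private */
};

/* Like attach_page_private(): take a ref, store data, set PG_private. */
static void attach_private(struct page_model *p, uintptr_t data)
{
	p->refcount++;
	p->private_data = data;
	p->private_flag = true;
}

/* Like detach_page_private(): clear PG_private and drop the ref. */
static uintptr_t detach_private(struct page_model *p)
{
	uintptr_t data = p->private_data;

	if (!p->private_flag)
		return 0;
	p->private_flag = false;
	p->private_data = 0;
	p->refcount--;
	return data;
}

/* Migration's expectation: its own ref, the mapping's ref, and one more
 * if the page has private data. */
static int expected_refs(const struct page_model *p)
{
	return 1 + 1 + (p->private_flag ? 1 : 0);
}
```

A page that only has PG_private set (the old SetPagePrivate() behaviour) carries one reference less than migration expects, so any transient reference from another task can make the counts match by accident; with attach_private() the counts agree on their own.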

See [6] for a reproducer.

[1] https://lore.kernel.org/lkml/[email protected]
[2] https://www.spinics.net/lists/linux-mtd/msg04018.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
[4] https://lore.kernel.org/linux-f2fs-devel/[email protected]
[5] https://lore.kernel.org/all/[email protected]
[6] https://bugzilla.kernel.org/show_bug.cgi?id=214961

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/file.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 5cfa28cd00cd..6b45a037a047 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -570,7 +570,7 @@ static int ubifs_write_end(struct file *file, struct address_space *mapping,
}

if (!PagePrivate(page)) {
- SetPagePrivate(page);
+ attach_page_private(page, (void *)1);
atomic_long_inc(&c->dirty_pg_cnt);
__set_page_dirty_nobuffers(page);
}
@@ -947,7 +947,7 @@ static int do_writepage(struct page *page, int len)
release_existing_page_budget(c);

atomic_long_dec(&c->dirty_pg_cnt);
- ClearPagePrivate(page);
+ detach_page_private(page);
ClearPageChecked(page);

kunmap(page);
@@ -1304,7 +1304,7 @@ static void ubifs_invalidatepage(struct page *page, unsigned int offset,
release_existing_page_budget(c);

atomic_long_dec(&c->dirty_pg_cnt);
- ClearPagePrivate(page);
+ detach_page_private(page);
ClearPageChecked(page);
}

@@ -1471,8 +1471,8 @@ static int ubifs_migrate_page(struct address_space *mapping,
return rc;

if (PagePrivate(page)) {
- ClearPagePrivate(page);
- SetPagePrivate(newpage);
+ detach_page_private(page);
+ attach_page_private(newpage, (void *)1);
}

if (mode != MIGRATE_SYNC_NO_COPY)
@@ -1496,7 +1496,7 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
return 0;
ubifs_assert(c, PagePrivate(page));
ubifs_assert(c, 0);
- ClearPagePrivate(page);
+ detach_page_private(page);
ClearPageChecked(page);
return 1;
}
@@ -1567,7 +1567,7 @@ static vm_fault_t ubifs_vm_page_mkwrite(struct vm_fault *vmf)
else {
if (!PageChecked(page))
ubifs_convert_page_budget(c);
- SetPagePrivate(page);
+ attach_page_private(page, (void *)1);
atomic_long_inc(&c->dirty_pg_cnt);
__set_page_dirty_nobuffers(page);
}
--
2.31.1


2021-12-27 03:12:31

by Zhihao Cheng

Subject: [PATCH v6 11/15] ubi: fastmap: Return error code if memory allocation fails in add_aeb()

Abort fastmap scanning and return an error code if memory allocation
fails in add_aeb(). Otherwise UBI will end up with incorrect PEB
statistics after scanning.
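The error-propagation pattern that the patch applies to every add_aeb() call can be sketched in plain C. This is a hedged model: `struct aeb_model`, `add_aeb_model()` and `attach_model()` are hypothetical stand-ins, the `fail_at` argument exists only to simulate an allocation failure, and the cleanup under the `fail` label belongs to the model rather than to fastmap.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ubi_ainf_peb and an attach list. */
struct aeb_model {
	int pnum;
	int ec;
	struct aeb_model *next;
};

struct aeb_list_model {
	struct aeb_model *head;
	int count;
};

/* Like add_aeb(): returns 0 or -ENOMEM. 'fail_at' simulates a failing
 * allocation when the list already holds that many entries (-1: never). */
static int add_aeb_model(struct aeb_list_model *l, int pnum, int ec,
			 int fail_at)
{
	struct aeb_model *e = NULL;

	if (l->count != fail_at)
		e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->pnum = pnum;
	e->ec = ec;
	e->next = l->head;
	l->head = e;
	l->count++;
	return 0;
}

static void free_list_model(struct aeb_list_model *l)
{
	while (l->head) {
		struct aeb_model *next = l->head->next;

		free(l->head);
		l->head = next;
	}
	l->count = 0;
}

/* Abort on the first failure instead of silently skipping the PEB. */
static int attach_model(struct aeb_list_model *l, const int *pnums,
			const int *ecs, int n, int fail_at)
{
	int ret = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		ret = add_aeb_model(l, pnums[i], ecs[i], fail_at);
		if (ret)
			goto fail;
	}
	return 0;
fail:
	free_list_model(l);
	return ret;
}
```

Ignoring the return value, as the old code did, would leave the list one entry short while scanning carried on, which is exactly the kind of wrong PEB accounting the commit message describes.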

Fixes: dbb7d2a88d2a7b ("UBI: Add fastmap core")
Signed-off-by: Zhihao Cheng <[email protected]>
---
drivers/mtd/ubi/fastmap.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 022af59906aa..6b5f1ffd961b 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -468,7 +468,9 @@ static int scan_pool(struct ubi_device *ubi, struct ubi_attach_info *ai,
if (err == UBI_IO_FF_BITFLIPS)
scrub = 1;

- add_aeb(ai, free, pnum, ec, scrub);
+ ret = add_aeb(ai, free, pnum, ec, scrub);
+ if (ret)
+ goto out;
continue;
} else if (err == 0 || err == UBI_IO_BITFLIPS) {
dbg_bld("Found non empty PEB:%i in pool", pnum);
@@ -638,8 +640,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
if (fm_pos >= fm_size)
goto fail_bad;

- add_aeb(ai, &ai->free, be32_to_cpu(fmec->pnum),
- be32_to_cpu(fmec->ec), 0);
+ ret = add_aeb(ai, &ai->free, be32_to_cpu(fmec->pnum),
+ be32_to_cpu(fmec->ec), 0);
+ if (ret)
+ goto fail;
}

/* read EC values from used list */
@@ -649,8 +653,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
if (fm_pos >= fm_size)
goto fail_bad;

- add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
- be32_to_cpu(fmec->ec), 0);
+ ret = add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
+ be32_to_cpu(fmec->ec), 0);
+ if (ret)
+ goto fail;
}

/* read EC values from scrub list */
@@ -660,8 +666,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
if (fm_pos >= fm_size)
goto fail_bad;

- add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
- be32_to_cpu(fmec->ec), 1);
+ ret = add_aeb(ai, &used, be32_to_cpu(fmec->pnum),
+ be32_to_cpu(fmec->ec), 1);
+ if (ret)
+ goto fail;
}

/* read EC values from erase list */
@@ -671,8 +679,10 @@ static int ubi_attach_fastmap(struct ubi_device *ubi,
if (fm_pos >= fm_size)
goto fail_bad;

- add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
- be32_to_cpu(fmec->ec), 1);
+ ret = add_aeb(ai, &ai->erase, be32_to_cpu(fmec->pnum),
+ be32_to_cpu(fmec->ec), 1);
+ if (ret)
+ goto fail;
}

ai->mean_ec = div_u64(ai->ec_sum, ai->ec_count);
--
2.31.1


2021-12-27 03:12:34

by Zhihao Cheng

Subject: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

Fastmap PEBs (pnum >= UBI_FM_MAX_START) are not added into 'ai->fastmap'
while attaching the UBI device if 'fm->used_blocks' is 2 or more, which
may trigger a warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':

UBI assert failed in ubi_wl_init at 1878 (pid 2409)
Call Trace:
ubi_wl_init.cold+0xae/0x2af [ubi]
ubi_attach+0x1b0/0x780 [ubi]
ubi_init+0x23a/0x3ad [ubi]
load_module+0x22d2/0x2430

Reproducer:
ID="0x20,0x33,0x00,0x00" # 16M 16KB PEB, 512 page
modprobe nandsim id_bytes=$ID
modprobe ubi mtd="0,0" fm_autoconvert # Fastmap takes 2 pebs
rmmod ubi
modprobe ubi mtd="0,0" fm_autoconvert # Attach by fastmap

Add all used fastmap PEBs into the list 'ai->fastmap' to make sure they
are counted in 'found_pebs'.
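A hypothetical sketch of why 'found_pebs' came up short: the old code copied fastmap PEBs from the scan list, which (per the description above) only holds fastmap PEBs with pnum < UBI_FM_MAX_START, while the patched read loop adds every used fastmap block. The value 64 for UBI_FM_MAX_START and both helper functions are assumptions for illustration only.

```c
#include <assert.h>

/* Assumption: UBI_FM_MAX_START is 64, i.e. only the first 64 PEBs are
 * scanned for a fastmap anchor and end up in the scan list. */
#define UBI_FM_MAX_START 64

/* Old behaviour: fastmap PEBs were cloned from the scan list, so a used
 * fastmap block at pnum >= UBI_FM_MAX_START was never counted. */
static int count_fm_pebs_old(const int *fm_pnums, int used_blocks)
{
	int found = 0;

	for (int i = 0; i < used_blocks; i++)
		if (fm_pnums[i] < UBI_FM_MAX_START)
			found++;
	return found;
}

/* Patched behaviour: every used fastmap block is added in the read loop. */
static int count_fm_pebs_new(const int *fm_pnums, int used_blocks)
{
	int found = 0;

	for (int i = 0; i < used_blocks; i++) {
		assert(fm_pnums[i] >= 0); /* sanity check on the model input */
		found++;
	}
	return found;
}
```

For a two-block fastmap whose anchor sits at PEB 2 and whose second block sits at PEB 130, the old count is 1 and the new count is 2, matching the assertion mismatch shown above.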

Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
Signed-off-by: Zhihao Cheng <[email protected]>
---
drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
index 6b5f1ffd961b..01dcdd94c9d2 100644
--- a/drivers/mtd/ubi/fastmap.c
+++ b/drivers/mtd/ubi/fastmap.c
@@ -828,24 +828,6 @@ static int find_fm_anchor(struct ubi_attach_info *ai)
return ret;
}

-static struct ubi_ainf_peb *clone_aeb(struct ubi_attach_info *ai,
- struct ubi_ainf_peb *old)
-{
- struct ubi_ainf_peb *new;
-
- new = ubi_alloc_aeb(ai, old->pnum, old->ec);
- if (!new)
- return NULL;
-
- new->vol_id = old->vol_id;
- new->sqnum = old->sqnum;
- new->lnum = old->lnum;
- new->scrub = old->scrub;
- new->copy_flag = old->copy_flag;
-
- return new;
-}
-
/**
* ubi_scan_fastmap - scan the fastmap.
* @ubi: UBI device object
@@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
struct ubi_vid_hdr *vh;
struct ubi_ec_hdr *ech;
struct ubi_fastmap_layout *fm;
- struct ubi_ainf_peb *aeb;
int i, used_blocks, pnum, fm_anchor, ret = 0;
size_t fm_size;
__be32 crc, tmp_crc;
@@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
if (fm_anchor < 0)
return UBI_NO_FASTMAP;

- /* Copy all (possible) fastmap blocks into our new attach structure. */
- list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
- struct ubi_ainf_peb *new;
-
- new = clone_aeb(ai, aeb);
- if (!new)
- return -ENOMEM;
-
- list_add(&new->u.list, &ai->fastmap);
- }
-
down_write(&ubi->fm_protect);
memset(ubi->fm_buf, 0, ubi->fm_size);

@@ -1029,6 +999,11 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
"err: %i)", i, pnum, ret);
goto free_hdr;
}
+
+ /* Add all fastmap blocks into attach structure. */
+ ret = add_aeb(ai, &ai->fastmap, pnum, be64_to_cpu(ech->ec), 0);
+ if (ret)
+ goto free_hdr;
}

kfree(fmsb);
--
2.31.1


2021-12-27 03:12:36

by Zhihao Cheng

Subject: [PATCH v6 13/15] ubifs: ubifs_writepage: Mark page dirty after writing inode failed

There are two states for UBIFS pages being written:
1. Dirty, Private
2. Not Dirty, Not Private

There is a third possibility, which may be related to [1]: the page is
private but not dirty, caused by the following process:

PA
lock(page)
ubifs_write_end
attach_page_private // set Private
__set_page_dirty_nobuffers // set Dirty
unlock(page)

write_cache_pages
lock(page)
clear_page_dirty_for_io(page) // clear Dirty
ubifs_writepage
write_inode
// fail, goto out, following codes are not executed
// do_writepage
// set_page_writeback // set Writeback
// detach_page_private // clear Private
// end_page_writeback // clear Writeback
out:
unlock(page) // Private, Not Dirty

PB
ksys_fadvise64_64
generic_fadvise
invalidate_inode_page
// page is neither Dirty nor Writeback
invalidate_complete_page
// page_has_private is true
try_to_release_page
ubifs_releasepage
ubifs_assert(c, 0) !!!

Then we may get the following assertion failure:
UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]:
UBIFS assert failed: 0, in fs/ubifs/file.c:1499
UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]:
switched to read-only mode, error -22
CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty
Call Trace:
dump_stack+0x13/0x1b
ubifs_ro_mode+0x54/0x60 [ubifs]
ubifs_assert_failed+0x4b/0x80 [ubifs]
ubifs_releasepage+0x7e/0x1e0 [ubifs]
try_to_release_page+0x57/0xe0
invalidate_inode_page+0xfb/0x130
invalidate_mapping_pagevec+0x12/0x20
generic_fadvise+0x303/0x3c0
vfs_fadvise+0x35/0x40
ksys_fadvise64_64+0x4c/0xb0
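The page states above can be modelled with two flags. This is a simplified, hypothetical model (`struct page_state`, `writepage_model()`), not the ubifs_writepage() code; it only shows that returning on a write_inode() failure without redirtying leaves the page Private but Not Dirty, while redirtying (the out_redirty path in the diff below) restores the Dirty, Private state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the page flags UBIFS cares about here. */
struct page_state {
	bool dirty;
	bool private_set;
};

/* Models ubifs_writepage() when write_inode() may fail. 'redirty'
 * selects the patched out_redirty path. */
static int writepage_model(struct page_state *p, int write_inode_err,
			   bool redirty)
{
	assert(p->dirty && p->private_set); /* page was dirtied by write_end */
	p->dirty = false;                   /* clear_page_dirty_for_io() */
	if (write_inode_err) {
		if (redirty)
			p->dirty = true;    /* redirty_page_for_writepage() */
		return write_inode_err;
	}
	p->private_set = false;             /* do_writepage() detaches private */
	return 0;
}

/* UBIFS expects only Dirty+Private or Not Dirty+Not Private. */
static bool page_state_valid(const struct page_state *p)
{
	return p->dirty == p->private_set;
}
```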

See [2] for a reproducer.

[1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty
[2] https://bugzilla.kernel.org/show_bug.cgi?id=215357

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/file.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 6b45a037a047..7cc2abcb70ae 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1031,7 +1031,7 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
if (page->index >= synced_i_size >> PAGE_SHIFT) {
err = inode->i_sb->s_op->write_inode(inode, NULL);
if (err)
- goto out_unlock;
+ goto out_redirty;
/*
* The inode has been written, but the write-buffer has
* not been synchronized, so in case of an unclean
@@ -1059,11 +1059,17 @@ static int ubifs_writepage(struct page *page, struct writeback_control *wbc)
if (i_size > synced_i_size) {
err = inode->i_sb->s_op->write_inode(inode, NULL);
if (err)
- goto out_unlock;
+ goto out_redirty;
}

return do_writepage(page, len);
-
+out_redirty:
+ /*
+ * redirty_page_for_writepage() won't call ubifs_dirty_inode() because
+ * it passes I_DIRTY_PAGES flag while calling __mark_inode_dirty(), so
+ * there is no need to do space budget for dirty inode.
+ */
+ redirty_page_for_writepage(wbc, page);
out_unlock:
unlock_page(page);
return err;
--
2.31.1


2021-12-27 03:12:41

by Zhihao Cheng

Subject: [PATCH v6 09/15] ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()

ubifs_wbuf_write_nolock() may read 'buf' out of bounds in the following
process:

ubifs_wbuf_write_nolock():
aligned_len = ALIGN(len, 8); // Assume len = 4089, aligned_len = 4096
if (aligned_len <= wbuf->avail) ... // Not satisfied
if (wbuf->used) {
ubifs_leb_write() // Fill some data in avail wbuf
len -= wbuf->avail; // len is still not 8-bytes aligned
aligned_len -= wbuf->avail;
}
n = aligned_len >> c->max_write_shift;
if (n) {
n <<= c->max_write_shift;
err = ubifs_leb_write(c, wbuf->lnum, buf + written,
wbuf->offs, n);
// n > len: reads n - len (< 8) bytes out of bounds
}

This can be caught by KASAN:
=========================================================
BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0
Read of size 4 at addr ffff888105594ff8 by task kworker/u8:4/128
Workqueue: writeback wb_workfn (flush-ubifs_0_0)
Call Trace:
kasan_report.cold+0x81/0x165
nand_write_page_swecc+0xa9/0x160
ubifs_leb_write+0xf2/0x1b0 [ubifs]
ubifs_wbuf_write_nolock+0x421/0x12c0 [ubifs]
write_head+0xdc/0x1c0 [ubifs]
ubifs_jnl_write_inode+0x627/0x960 [ubifs]
wb_workfn+0x8af/0xb80

ubifs_wbuf_write_nolock() accepts a 'len' parameter that is not 8-byte
aligned: 'len' is the true length of 'buf' (which is allocated in
'ubifs_jnl_xxx', e.g. ubifs_jnl_write_inode()), so
ubifs_wbuf_write_nolock() must carefully limit how many bytes it reads
from 'buf' in order to write the LEB safely.
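The out-of-bounds arithmetic can be checked with a small model that counts how many bytes each version reads from 'buf'. It is a hedged sketch: it assumes the write-buffer fill step has already run (so 'len' is what remains to be written) and ignores wbuf offsets; `old_bytes_read()`, `new_bytes_read()` and `ALIGN8()` are hypothetical names, not UBIFS code.

```c
#include <assert.h>

/* Round up to a multiple of 8, like ALIGN(len, 8) in the kernel. */
#define ALIGN8(x) (((x) + 7) & ~7)

/* Old code: n whole max-write units are written straight from 'buf'. */
static int old_bytes_read(int len, int max_write_shift)
{
	int n = ALIGN8(len) >> max_write_shift;

	return n << max_write_shift;
}

/* Patched code: (n - 1) units straight from 'buf', then at most
 * min(len, unit) bytes copied into wbuf->buf; the rest is padding. */
static int new_bytes_read(int len, int max_write_shift)
{
	int n = ALIGN8(len) >> max_write_shift;
	int unit = 1 << max_write_shift;
	int remaining = len;
	int read = 0;

	if (!n)
		return 0;
	if (n > 1) {
		int m = (n - 1) << max_write_shift;

		read += m;
		remaining -= m;
	}
	read += remaining < unit ? remaining : unit;
	assert(read <= len); /* never reads past the end of 'buf' */
	return read;
}
```

With len = 4089 and max_write_shift = 12 (an assumed 4 KiB max write size, matching the 4096 in the trace), old_bytes_read() returns 4096, i.e. 7 bytes past the end of 'buf', while new_bytes_read() returns 4089.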

A reproducer is available at [Link].

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214785
Reported-by: Chengsong Ke <[email protected]>
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/io.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/ubifs/io.c b/fs/ubifs/io.c
index 00b61dba62b7..b019dd6f7fa0 100644
--- a/fs/ubifs/io.c
+++ b/fs/ubifs/io.c
@@ -833,16 +833,42 @@ int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
*/
n = aligned_len >> c->max_write_shift;
if (n) {
- n <<= c->max_write_shift;
+ int m = n - 1;
+
dbg_io("write %d bytes to LEB %d:%d", n, wbuf->lnum,
wbuf->offs);
- err = ubifs_leb_write(c, wbuf->lnum, buf + written,
- wbuf->offs, n);
+
+ if (m) {
+ /* '(n-1)<<c->max_write_shift < len' is always true. */
+ m <<= c->max_write_shift;
+ err = ubifs_leb_write(c, wbuf->lnum, buf + written,
+ wbuf->offs, m);
+ if (err)
+ goto out;
+ wbuf->offs += m;
+ aligned_len -= m;
+ len -= m;
+ written += m;
+ }
+
+ /*
+ * The non-written len of buf may be less than 'n' because
+ * parameter 'len' is not 8 bytes aligned, so here we read
+ * min(len, n) bytes from buf.
+ */
+ n = 1 << c->max_write_shift;
+ memcpy(wbuf->buf, buf + written, min(len, n));
+ if (n > len) {
+ ubifs_assert(c, n - len < 8);
+ ubifs_pad(c, wbuf->buf + len, n - len);
+ }
+
+ err = ubifs_leb_write(c, wbuf->lnum, wbuf->buf, wbuf->offs, n);
if (err)
goto out;
wbuf->offs += n;
aligned_len -= n;
- len -= n;
+ len -= min(len, n);
written += n;
}

--
2.31.1


2021-12-27 03:12:44

by Zhihao Cheng

Subject: [PATCH v6 14/15] ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process

There are two states for UBIFS pages being written:
1. Dirty, Private
2. Not Dirty, Not Private

In the normal process ubifs_releasepage() is never reached, because that
would require a page that is private but not dirty. Reproducer [1] shows
that such a page can exist (possibly related to [2]) through the
following process:

PA PB PC
lock(page)[PA]
ubifs_write_end
attach_page_private // set Private
__set_page_dirty_nobuffers // set Dirty
unlock(page)

write_cache_pages[PA]
lock(page)
clear_page_dirty_for_io(page) // clear Dirty
ubifs_writepage

do_truncation[PB]
truncate_setsize
i_size_write(inode, newsize) // newsize = 0

i_size = i_size_read(inode) // i_size = 0
end_index = i_size >> PAGE_SHIFT
if (page->index > end_index)
goto out // jump
out:
unlock(page) // Private, Not Dirty

generic_fadvise[PC]
lock(page)
invalidate_inode_page
try_to_release_page
ubifs_releasepage
ubifs_assert(c, 0)
// bad assertion!
unlock(page)
truncate_pagecache[PB]

Then we may get the following assertion failure:
UBIFS error (ubi0:0 pid 1683): ubifs_assert_failed [ubifs]:
UBIFS assert failed: 0, in fs/ubifs/file.c:1513
UBIFS warning (ubi0:0 pid 1683): ubifs_ro_mode [ubifs]:
switched to read-only mode, error -22
CPU: 2 PID: 1683 Comm: aa Not tainted 5.16.0-rc5-00184-g0bca5994cacc-dirty #308
Call Trace:
dump_stack+0x13/0x1b
ubifs_ro_mode+0x54/0x60 [ubifs]
ubifs_assert_failed+0x4b/0x80 [ubifs]
ubifs_releasepage+0x67/0x1d0 [ubifs]
try_to_release_page+0x57/0xe0
invalidate_inode_page+0xfb/0x130
__invalidate_mapping_pages+0xb9/0x280
invalidate_mapping_pagevec+0x12/0x20
generic_fadvise+0x303/0x3c0
ksys_fadvise64_64+0x4c/0xb0
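The budget bookkeeping that replaces ubifs_assert(c, 0) can be sketched as a toy model. `struct budget_model`, `struct page_flags` and `releasepage_model()` are hypothetical; the two counters merely stand in for release_new_page_budget() and release_existing_page_budget(), which are not reimplemented here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-filesystem counters. */
struct budget_model {
	long dirty_pg_cnt;
	int new_page_releases;      /* stands in for release_new_page_budget() */
	int existing_page_releases; /* stands in for release_existing_page_budget() */
};

/* Hypothetical page flags relevant to ubifs_releasepage(). */
struct page_flags {
	bool writeback;
	bool private_set;
	bool checked;
};

/* Mirrors the patched logic: refuse pages under writeback, otherwise
 * release the budget the page still holds and drop the private state. */
static int releasepage_model(struct budget_model *c, struct page_flags *p)
{
	if (p->writeback)
		return 0;
	assert(p->private_set);
	if (p->checked)
		c->new_page_releases++;
	else
		c->existing_page_releases++;
	c->dirty_pg_cnt--;
	p->private_set = false;
	p->checked = false;
	return 1;
}
```

The Checked flag decides which kind of budget the page still holds, so the model keeps the dirty page counter and the budget release in step instead of asserting.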

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215373
[2] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty

Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/file.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 7cc2abcb70ae..4bafcb80d29c 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1494,14 +1494,23 @@ static int ubifs_releasepage(struct page *page, gfp_t unused_gfp_flags)
struct inode *inode = page->mapping->host;
struct ubifs_info *c = inode->i_sb->s_fs_info;

- /*
- * An attempt to release a dirty page without budgeting for it - should
- * not happen.
- */
if (PageWriteback(page))
return 0;
+
+ /*
+ * Page is private but not dirty, weird? There is one condition
+ * making it happened. ubifs_writepage skipped the page because
+ * page index beyonds isize (for example. truncated by other
+ * process named A), then the page is invalidated by fadvise64
+ * syscall before being truncated by process A.
+ */
ubifs_assert(c, PagePrivate(page));
- ubifs_assert(c, 0);
+ if (PageChecked(page))
+ release_new_page_budget(c);
+ else
+ release_existing_page_budget(c);
+
+ atomic_long_dec(&c->dirty_pg_cnt);
detach_page_private(page);
ClearPageChecked(page);
return 1;
--
2.31.1


2021-12-27 03:12:48

by Zhihao Cheng

Subject: [PATCH v6 15/15] ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty

There are at least 6 PEBs reserved on a UBI device:
1. EBA_RESERVED_PEBS[1]
2. WL_RESERVED_PEBS[1]
3. UBI_LAYOUT_VOLUME_EBS[2]
4. MIN_FASTMAP_RESERVED_PEBS[2]
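Counting the reserved PEBs listed above gives the free-PEB arithmetic used in the next paragraph. All constants are the values given in this message; N_rsvd, N_free and N_pools are labels introduced here for readability.

```latex
\begin{aligned}
N_{\text{rsvd}} &= 1 + 1 + 2 + 2 = 6 \\
N_{\text{free}} &= \text{EBA\_RESERVED\_PEBS} + \text{WL\_RESERVED\_PEBS}
  + \text{MIN\_FASTMAP\_RESERVED\_PEBS} - \text{MIN\_FASTMAP\_TAKEN\_PEBS} \\
&= 1 + 1 + 2 - 1 = 3 \\
N_{\text{pools}} &= N_{\text{free}} - \text{FASTMAP\_ANCHOR\_PEBS}
  - \text{FASTMAP\_NEXT\_ANCHOR\_PEBS} = 3 - 1 - 1 = 1
\end{aligned}
```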

When all UBI volumes take all their PEBs, there are 3 (EBA_RESERVED_PEBS +
WL_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS - MIN_FASTMAP_TAKEN_PEBS[1])
free PEBs. Since commits f9c34bb529975fe ("ubi: Fix producing anchor PEBs")
and 4b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs considering wear
level rules") were applied, only 1 (3 - FASTMAP_ANCHOR_PEBS[1] -
FASTMAP_NEXT_ANCHOR_PEBS[1]) free PEB is left to fill pool and wl_pool;
after pool is filled, wl_pool is always empty. So UBI can get stuck in an
infinite loop:

ubi_thread system_wq
wear_leveling_worker <--------------------------------------------------
get_peb_for_wl |
// fm_wl_pool, used = size = 0 |
schedule_work(&ubi->fm_work) |
|
update_fastmap_work_fn |
ubi_update_fastmap |
ubi_refill_pools |
// ubi->free_count - ubi->beb_rsvd_pebs < 5 |
// wl_pool is not filled with any PEBs |
schedule_erase(old_fm_anchor) |
ubi_ensure_anchor_pebs |
__schedule_ubi_work(wear_leveling_worker) |
|
__erase_worker |
ensure_wear_leveling |
__schedule_ubi_work(wear_leveling_worker) --------------------------

This causes high CPU usage of ubi_bgt:
top - 12:10:42 up 5 min, 2 users, load average: 1.76, 0.68, 0.27
Tasks: 123 total, 3 running, 54 sleeping, 0 stopped, 0 zombie

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1589 root 20 0 0 0 0 R 45.0 0.0 0:38.86 ubi_bgt0d
319 root 20 0 0 0 0 I 15.2 0.0 0:15.29 kworker/0:3-eve
371 root 20 0 0 0 0 I 14.9 0.0 0:12.85 kworker/3:3-eve
20 root 20 0 0 0 0 I 11.3 0.0 0:05.33 kworker/1:0-eve
202 root 20 0 0 0 0 I 11.3 0.0 0:04.93 kworker/2:3-eve

fm_next_anchor always takes one free PEB in preparation for updating the
fastmap, and fm_anchor and fm_next_anchor are always allocated before
pool and wl_pool, so UBI can regard these two PEBs as reserved PEBs.

Fix it by:
1) Reserving 2 additional PEBs for fm_anchor and fm_next_anchor.
2) Stopping the refill of pool and wl_pool once the free PEB count drops
to beb_rsvd_pebs.
Then, there are at least 2 (EBA_RESERVED_PEBS + MIN_FASTMAP_RESERVED_PEBS -
MIN_FASTMAP_TAKEN_PEBS[1]) PEBs in pool and 1 (WL_RESERVED_PEBS) PEB in
wl_pool after ubi_refill_pools() is called with all erase works done.
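The effect of the new refill condition can be sketched with a simplified model of the ubi_refill_pools() loop changed by this patch. It is an assumption-laden model: `struct pool_model`, `may_take()` and `refill_model()` are hypothetical, only counts are tracked, and 'free_count <= 0' stands in for an empty free tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool model: only sizes are tracked. */
struct pool_model {
	int size;
	int max_size;
};

/* May one more free PEB be taken? 'patched' selects the new rule. */
static bool may_take(int free_count, int beb_rsvd_pebs, bool for_wl_pool,
		     bool patched)
{
	if (free_count <= 0) /* stands in for !ubi->free.rb_node */
		return false;
	if (patched)
		return free_count > beb_rsvd_pebs;
	/* Old rule: pool has no limit, wl_pool needs 5 spare PEBs. */
	return !for_wl_pool || free_count - beb_rsvd_pebs >= 5;
}

/* Simplified shape of the refill loop: alternate pool and wl_pool. */
static void refill_model(int *free_count, int beb_rsvd_pebs,
			 struct pool_model *pool, struct pool_model *wl_pool,
			 bool patched)
{
	for (;;) {
		int enough = 0;

		if (pool->size < pool->max_size) {
			if (!may_take(*free_count, beb_rsvd_pebs, false, patched))
				break;
			(*free_count)--;
			pool->size++;
		} else {
			enough++;
		}

		if (wl_pool->size < wl_pool->max_size) {
			if (!may_take(*free_count, beb_rsvd_pebs, true, patched))
				break;
			(*free_count)--;
			wl_pool->size++;
		} else {
			enough++;
		}

		if (enough == 2)
			break;
	}
	assert(*free_count >= 0);
}
```

With example values of 3 free PEBs and beb_rsvd_pebs = 1, the old rule puts one PEB into pool and leaves wl_pool empty, while the new rule puts one PEB into each pool and stops at the reserved count.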

This modification causes a compatibility problem with old UBI images. If
UBI volumes take the maximum number of PEBs on a given UBI device, there
are no available PEBs to satisfy the 2 newly reserved PEBs. PEBs reserved
for bad block handling are taken first, and if that is still not enough,
ENOSPC is returned from UBI initialization.

A reproducer is available at [Link].

Fixes: f9c34bb529975fe ("ubi: Fix producing anchor PEBs")
Fixes: 4b68bf9a69d22dd ("ubi: Select fastmap anchor PEBs ... rules")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215407
Signed-off-by: Zhihao Cheng <[email protected]>
---
drivers/mtd/ubi/fastmap-wl.c | 5 +++--
drivers/mtd/ubi/wl.h | 11 +++++++++--
2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c
index 28f55f9cf715..18f23e76fee9 100644
--- a/drivers/mtd/ubi/fastmap-wl.c
+++ b/drivers/mtd/ubi/fastmap-wl.c
@@ -134,7 +134,8 @@ void ubi_refill_pools(struct ubi_device *ubi)
for (;;) {
enough = 0;
if (pool->size < pool->max_size) {
- if (!ubi->free.rb_node)
+ if (!ubi->free.rb_node ||
+ (ubi->free_count <= ubi->beb_rsvd_pebs))
break;

e = wl_get_wle(ubi);
@@ -148,7 +149,7 @@ void ubi_refill_pools(struct ubi_device *ubi)

if (wl_pool->size < wl_pool->max_size) {
if (!ubi->free.rb_node ||
- (ubi->free_count - ubi->beb_rsvd_pebs < 5))
+ (ubi->free_count <= ubi->beb_rsvd_pebs))
break;

e = find_wl_entry(ubi, &ubi->free, WL_FREE_MAX_DIFF);
diff --git a/drivers/mtd/ubi/wl.h b/drivers/mtd/ubi/wl.h
index c93a53293786..e6c8f68d5ffb 100644
--- a/drivers/mtd/ubi/wl.h
+++ b/drivers/mtd/ubi/wl.h
@@ -2,14 +2,21 @@
#ifndef UBI_WL_H
#define UBI_WL_H
#ifdef CONFIG_MTD_UBI_FASTMAP
+
+/* Number of physical eraseblocks for fm anchor */
+#define FASTMAP_ANCHOR_PEBS 1
+/* Number of physical eraseblocks for next fm anchor */
+#define FASTMAP_NEXT_ANCHOR_PEBS 1
+
static void update_fastmap_work_fn(struct work_struct *wrk);
static struct ubi_wl_entry *find_anchor_wl_entry(struct rb_root *root);
static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi);
static void ubi_fastmap_close(struct ubi_device *ubi);
static inline void ubi_fastmap_init(struct ubi_device *ubi, int *count)
{
- /* Reserve enough LEBs to store two fastmaps. */
- *count += (ubi->fm_size / ubi->leb_size) * 2;
+ /* Reserve enough LEBs to store two fastmaps and anchor fm pebs */
+ *count += (ubi->fm_size / ubi->leb_size) * 2 +
+ FASTMAP_ANCHOR_PEBS + FASTMAP_NEXT_ANCHOR_PEBS;
INIT_WORK(&ubi->fm_work, update_fastmap_work_fn);
}
static struct ubi_wl_entry *may_reserve_for_fm(struct ubi_device *ubi,
--
2.31.1


2021-12-27 10:13:12

by Richard Weinberger

Subject: Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>, "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>,
> "mcoquelin stm32" <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>
> CC: "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Monday, 27 December 2021 04:22:31
> Subject: [PATCH v6 00/15] Some bugfixs for ubi/ubifs

> v1->v2:
> 1. Add new fix for ubifs, "ubifs: Fix to add refcount once page is set
> private"
> 2. Update "ubifs: Rename whiteout atomically":
> 1) Move inode mode in create_whiteout()
> 2) Don't check O_SYNC for whiteout, because it inherits from the old_dir
> 3) Remove useless 'synced_i_size ' assignment for whiteout, because
> it's always be zero.
> 4) Remove unused variable 'ui' in create_whiteout()
> 3. Update "ubifs: setflags: Make dirtied_ino_d 8 bytes aligned":
> 1) Align dirtied_ino_d with 8 bytes.
>
> v2->v3:
> 1. Update "ubifs: Rename whiteout atomically":
> 1) Fix misspelling 'have already check the old dir inode' ->
> 'have already checked the old dir inode'
> 2) Fix misspelling "Whiteout don't have non-zero size" ->
> "Whiteout have non-zero size"
> 2. Update "ubifs: Fix to add refcount once page is set private"
> 1) Fix commit message to explain the root cause.
> 3. Update "ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
> fm->used_blocks>=2"
> 1) Add fastmap used pebs into 'ai' in for-loop, rather than in
> two-steps(Add pebs [pnum<UBI_FM_MAX_START] then add pebs
> [pnum>=UBI_FM_MAX_START] into 'ai').
>
> v3->v4:
> 1. Update "ubifs: Add missing iput if do_tmpfile() failed in rename whiteout":
> 1) Move whiteout cleanup into do_tmpfile() according to Sascha's advice
> 2. Add new fix for ubifs, "ubifs: ubifs_writepage: Mark page dirty after
> writing inode failed"
>
> v4->v5:
> 1. Add new fix for ubifs, "ubifs: ubifs_releasepage: Remove ubifs_assert(0)
> to valid this process"
>
> v5->v6:
> 1. Add new fix for ubi: "ubi: fastmap: Fix high cpu usage of ubi_bgt by
> making sure wl_pool not empty"
>
> Zhihao Cheng (15):
> ubifs: rename_whiteout: Fix double free for whiteout_ui->data
> ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
> ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode
> comment
> ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
> ubifs: Rename whiteout atomically
> ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
> ubifs: Rectify space amount budget for mkdir/tmpfile operations
> ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
> ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
> ubifs: Fix to add refcount once page is set private
> ubi: fastmap: Return error code if memory allocation fails in
> add_aeb()
> ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when
> fm->used_blocks>=2
> ubifs: ubifs_writepage: Mark page dirty after writing inode failed
> ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
> ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not
> empty

Thanks a lot for all these fixes! I will start reviewing/picking them
as soon as the series is stable.
So please don't resend everything if you only add a new patch.

Thanks,
//richard

2021-12-27 12:19:20

by Zhihao Cheng

Subject: Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs

On 2021/12/27 18:13, Richard Weinberger wrote:

> Thanks a lot for all this fixes! I will start reviewing/picking them an
> as soon the series is stable.
> So please don't resend everything if you only add a new patch.
OK. New patches (if I find new bugs) will be sent independently. The v6
series is closed.

2021-12-27 13:00:06

by Richard Weinberger

Subject: Re: [PATCH v6 00/15] Some bugfixs for ubi/ubifs

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
>> Thanks a lot for all this fixes! I will start reviewing/picking them an
>> as soon the series is stable.
>> So please don't resend everything if you only add a new patch.
> OK. New patches(If I find new bugs) will be sent indepently. v6 series
> is closed.

Thanks! Your fixes are most appreciated!

Thanks,
//richard

2022-01-09 21:16:33

by Richard Weinberger

Subject: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>, "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>,
> "mcoquelin stm32" <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>
> CC: "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Monday, 27 December 2021 04:22:36
> Subject: [PATCH v6 05/15] ubifs: Rename whiteout atomically

> Currently, rename whiteout has 3 steps:
> 1. create tmpfile(which associates old dentry to tmpfile inode) for
> whiteout, and store tmpfile to disk
> 2. link whiteout, associate whiteout inode to old dentry again and
> store old dentry, old inode, new dentry on disk
> 3. writeback dirty whiteout inode to disk
>
> Suddenly power-cut or error occurring(eg. ENOSPC returned by budget,
> memory allocation failure) during above steps may cause kinds of problems:
> Problem 1: ENOSPC returned by whiteout space budget (before step 2),
> old dentry will disappear after rename syscall, whiteout file
> cannot be found either.
>
> ls dir // we get file, whiteout
> rename(dir/file, dir/whiteout, RENAME_WHITEOUT)
> ENOSPC = ubifs_budget_space(&wht_req) // return
> ls dir // empty (no file, no whiteout)
> Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
> is not stored on disk, whiteout dentry(old dentry) is written
> on disk, whiteout file is lost on next mount (We get "dead
> directory entry" after executing 'ls -l' on whiteout file).
>
> Now, we use following 3 steps to finish rename whiteout:
> 1. create an in-mem inode with 'nlink = 1' as whiteout
> 2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
> whiteout inode, associating new dentry with old inode)
> 3. iput(whiteout)
>
> Rely writing in-mem inode on disk by ubifs_jnl_rename() to finish rename
> whiteout, which avoids middle disk state caused by suddenly power-cut
> and error occurring.

How do you make sure the whiteout is never written to disk (by writeback) before ubifs_jnl_rename() linked
it? That's the reason why other filesystems use the tmpfile mechanism for whiteouts too.

Thanks,
//richard

2022-01-10 09:35:13

by Zhihao Cheng

Subject: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

Hi, Richard
>
> How do you make sure the the whiteout is never written to disk (by writeback) before ubifs_jnl_rename() linked
> it? That's the reason why other filesystems use the tmpfile mechanism for whiteouts too.
>

The whiteout inode is clean after creation in create_whiteout(), and it
can't be marked dirty until ubifs_jnl_rename() has finished. So I think
there is no chance for the whiteout to be written to disk. Also,
'ubifs_assert(c, !whiteout_ui->dirty)' in ubifs_jnl_rename() never failed
during my local stress tests. You may add some delays after whiteout
creation to verify that the whiteout won't be written back before
ubifs_jnl_rename().

BTW, I considered plan B (binding the whiteout to a negative dentry), but
we cannot get the parent dir's dentry in do_rename(), so I abandoned it.
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1334,7 +1335,8 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
goto out_release;
}

- err = do_tmpfile(old_dir, old_dentry, S_IFCHR | WHITEOUT_MODE, &whiteout);
+ // tmp_dentry = d_alloc(old_dir_dentry, &slash_name); // refer to vfs_tmpfile
+ err = do_tmpfile(old_dir, tmp_dentry, S_IFCHR | WHITEOUT_MODE, &whiteout);
if (err) {
kfree(dev);
goto out_release;

2022-01-10 10:14:25

by Richard Weinberger

Subject: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>
> CC: "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>, "mcoquelin stm32"
> <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>, "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Monday, 10 January 2022 10:35:02
> Subject: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

> Hi, Richard
>>
>> How do you make sure the the whiteout is never written to disk (by writeback)
>> before ubifs_jnl_rename() linked
>> it? That's the reason why other filesystems use the tmpfile mechanism for
>> whiteouts too.
>>
>
> The whiteout inode is clean after creation from create_whiteout(), and
> it can't be marked dirty until ubifs_jnl_rename() finished. So, I think
> there is no chance for whiteout being written on disk. Then,
> 'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename()
> during my local stress tests. You may add some delay executions after
> whiteout creation to make sure that whiteout won't be written back
> before ubifs_jnl_rename().

From the UBIFS point of view I fully agree with you. I'm just a little puzzled why
other filesystems use the tmpfile approach. My fear is that VFS can do things
to the inode we don't have in mind right now.

Thanks,
//richard

2022-01-10 20:58:14

by Richard Weinberger

Subject: Re: [PATCH v6 05/15] ubifs: Rename whiteout atomically

----- Original Message -----
>> The whiteout inode is clean after creation from create_whiteout(), and
>> it can't be marked dirty until ubifs_jnl_rename() finished. So, I think
>> there is no chance for whiteout being written on disk. Then,
>> 'ubifs_assert(c, !whiteout_ui->dirty)' never fails in ubifs_jnl_rename()
>> during my local stress tests. You may add some delay executions after
>> whiteout creation to make sure that whiteout won't be written back
>> before ubifs_jnl_rename().
>
> From UBIFS point of view I fully agree with you. I'm just a little puzzled why
> other filesystems use the tmpfile approach. My fear is that VFS can do things
> to the inode we don't have in mind right now.

After digging a bit into XFS, I'm sure your approach is okay.
So, UBIFS can do a whiteout without the help of tmpfiles. :-)

Thanks,
//richard
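A toy model of the ordering argument in this exchange, assuming (hypothetically) that writeback only ever picks up inodes whose dirty flag is set. The struct and function names are invented for illustration and are not the UBIFS API; the model only mirrors the claim that create_whiteout() leaves the inode clean and nothing dirties it before ubifs_jnl_rename() links it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified inode state; not struct ubifs_inode. */
struct toy_inode {
	bool dirty;    /* set when the inode needs writeback */
	bool linked;   /* set once the rename has linked the whiteout */
	bool on_flash; /* set once the inode has been written */
};

/* Models create_whiteout(): the new inode starts out clean. */
static void toy_create_whiteout(struct toy_inode *wh)
{
	wh->dirty = false;
	wh->linked = false;
	wh->on_flash = false;
}

/* Models a writeback pass under the stated assumption: clean inodes are skipped. */
static void toy_writeback(struct toy_inode *ino)
{
	if (!ino->dirty)
		return;
	ino->on_flash = true;
	ino->dirty = false;
}

/* Models ubifs_jnl_rename(): the whiteout must still be clean here,
 * and (in this toy) the journal writes it together with the rename. */
static void toy_jnl_rename(struct toy_inode *wh)
{
	assert(!wh->dirty); /* mirrors ubifs_assert(c, !whiteout_ui->dirty) */
	wh->linked = true;
	wh->on_flash = true;
}
```

A writeback pass between creation and rename leaves the whiteout off flash in this model. What the toy cannot capture is Richard's concern: some other VFS path setting the dirty flag in between.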

2022-01-10 23:23:52

by Richard Weinberger

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>, "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>,
> "mcoquelin stm32" <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>
> CC: "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Monday, 27 December 2021 04:22:43
> Subject: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Fastmap PEBs (pnum >= UBI_FM_MAX_START) won't be added into 'ai->fastmap'
> while attaching a UBI device if 'fm->used_blocks' is 2 or more, which
> may trigger a warning from 'ubi_assert(ubi->good_peb_count == found_pebs)':
>
> UBI assert failed in ubi_wl_init at 1878 (pid 2409)
> Call Trace:
> ubi_wl_init.cold+0xae/0x2af [ubi]
> ubi_attach+0x1b0/0x780 [ubi]
> ubi_init+0x23a/0x3ad [ubi]
> load_module+0x22d2/0x2430
>
> Reproduce:
> ID="0x20,0x33,0x00,0x00" # 16 MiB NAND, 16 KiB PEB, 512-byte page
> modprobe nandsim id_bytes=$ID
> modprobe ubi mtd="0,0" fm_autoconvert # Fastmap takes 2 pebs
> rmmod ubi
> modprobe ubi mtd="0,0" fm_autoconvert # Attach by fastmap
>
> Add all used fastmap pebs into list 'ai->fastmap' to make sure they can
> be counted into 'found_pebs'.
>
> Fixes: fdf10ed710c0aa ("ubi: Rework Fastmap attach base code")
> Signed-off-by: Zhihao Cheng <[email protected]>
> ---
> drivers/mtd/ubi/fastmap.c | 35 +++++------------------------------
> 1 file changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c
> index 6b5f1ffd961b..01dcdd94c9d2 100644
> --- a/drivers/mtd/ubi/fastmap.c
> +++ b/drivers/mtd/ubi/fastmap.c
> /**
> * ubi_scan_fastmap - scan the fastmap.
> * @ubi: UBI device object
> @@ -865,7 +847,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
> struct ubi_vid_hdr *vh;
> struct ubi_ec_hdr *ech;
> struct ubi_fastmap_layout *fm;
> - struct ubi_ainf_peb *aeb;
> int i, used_blocks, pnum, fm_anchor, ret = 0;
> size_t fm_size;
> __be32 crc, tmp_crc;
> @@ -875,17 +856,6 @@ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai,
> if (fm_anchor < 0)
> return UBI_NO_FASTMAP;
>
> - /* Copy all (possible) fastmap blocks into our new attach structure. */
> - list_for_each_entry(aeb, &scan_ai->fastmap, u.list) {
> - struct ubi_ainf_peb *new;
> -
> - new = clone_aeb(ai, aeb);
> - if (!new)
> - return -ENOMEM;
> -
> - list_add(&new->u.list, &ai->fastmap);
> - }
> -

scan_ai->fastmap may also contain old fastmap PEBs.
In the area below UBI_FM_MAX_START you can find outdated fastmap PEBs,
e.g. after a power cut.
That's why scan_ai->fastmap is copied into ai->fastmap.
Later, ubi_wl_init() erases these outdated PEBs.
So, you cannot remove this code.

But I fully agree with you that the fm->used_blocks > 1 case is not handled correctly.
I fear that if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1,
we need to fall back to scanning mode while attaching.

Thanks,
//richard
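A toy model of the accounting problem described in the commit message, assuming only what the message states: fastmap PEBs at or above UBI_FM_MAX_START never reach ai->fastmap, so found_pebs falls short of good_peb_count once the fastmap spans two or more PEBs. The helper names and the value 64 are hypothetical, and the sketch deliberately ignores the outdated-anchor handling Richard points out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for UBI_FM_MAX_START (see drivers/mtd/ubi/ubi-media.h). */
#define TOY_FM_MAX_START 64

/*
 * Count how many of the current fastmap's PEBs end up in ai->fastmap.
 * Unpatched: only PEBs below TOY_FM_MAX_START (the area scan_fast() looks at)
 * are copied over. Patched: every used fastmap PEB is added.
 */
static int toy_fastmap_pebs_in_ai(const int *fm_pnum, int used_blocks,
				  bool patched)
{
	int n = 0;

	for (int i = 0; i < used_blocks; i++)
		if (patched || fm_pnum[i] < TOY_FM_MAX_START)
			n++;
	return n;
}

/* Models the ubi_assert(ubi->good_peb_count == found_pebs) check. */
static bool toy_counts_match(int other_good_pebs, const int *fm_pnum,
			     int used_blocks, bool patched)
{
	int good_peb_count = other_good_pebs + used_blocks;
	int found_pebs = other_good_pebs +
			 toy_fastmap_pebs_in_ai(fm_pnum, used_blocks, patched);

	return good_peb_count == found_pebs;
}
```

With a single-PEB fastmap the anchor always lies below the start area limit, so both variants agree; the mismatch only appears when a data PEB lies beyond it, matching the "Fastmap takes 2 pebs" reproducer.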

2022-01-11 02:48:36

by Zhihao Cheng

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

Hi Richard,
> scan_ai->fastmap may also contain old fastmap PEBs.
> In the area below UBI_FM_MAX_START you can find outdated fastmap PEBs,
> e.g. after a power cut.
> That's why scan_ai->fastmap is copied into ai->fastmap.
> Later, ubi_wl_init() erases these outdated PEBs.
> So, you cannot remove this code.
I thought old fastmap PEBs (erased asynchronously by works queued in
ubi_update_fastmap()) would be counted as erase PEBs during the next
attach, because of the following code snippet in ubi_write_fastmap():
	list_for_each_entry(ubi_wrk, &ubi->works, list) {
		if (ubi_is_erase_work(ubi_wrk)) {
			wl_e = ubi_wrk->e;
			ubi_assert(wl_e);

			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);

			fec->pnum = cpu_to_be32(wl_e->pnum);
			set_seen(ubi, wl_e->pnum, seen_pebs);
			fec->ec = cpu_to_be32(wl_e->ec);

			erase_peb_count++;
			fm_pos += sizeof(*fec);
			ubi_assert(fm_pos <= ubi->fm_size);
		}
	}
	fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
A half-written fastmap will be detected during scanning, and UBI falls
back to full scanning. So I came up with two situations:
1. Power cut before the new fastmap is written: the old fastmap stays
intact until the next attach, and some free PEBs have been written with
new fastmap data. Luckily, the fastmap anchor PEB's VID header is written
first, so ubi_attach_fastmap() will report a bad fastmap during the next
attach.
2. Power cut after the new fastmap is written: the old fastmap PEBs will
be added to the 'ai->erase' list during the next attach.
Did I miss any other possible circumstances?
>
> But I fully agree with you that the fm->used_blocks > 1 case is not handled correctly.
> I fear that if scan_ai->fastmap contains old fastmap PEBs and fm->used_blocks is > 1,
> we need to fall back to scanning mode while attaching.
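The ubi_write_fastmap() snippet quoted in this message can be modelled in user space to show what ends up on flash: each pending erase work item becomes one big-endian (pnum, ec) record, and the number of records becomes erase_peb_count. This is a sketch with hypothetical types (toy_work and toy_fm_ec are not kernel structures); htonl() stands in for cpu_to_be32(), and the helper returns -1 on overflow where the kernel code relies on ubi_assert().

```c
#include <arpa/inet.h> /* htonl(), standing in for cpu_to_be32() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pending-work record; not struct ubi_work. */
struct toy_work {
	int is_erase;  /* nonzero for an erase work item */
	uint32_t pnum; /* PEB the work refers to */
	uint32_t ec;   /* erase counter of that PEB */
};

/* Record shaped like struct ubi_fm_ec: big-endian pnum and ec. */
struct toy_fm_ec {
	uint32_t pnum;
	uint32_t ec;
};

/*
 * Append one record per pending erase work to buf, advancing *pos, and
 * return how many were written (the value stored as erase_peb_count).
 * The bounds check happens before writing, unlike the kernel's assert
 * after the fact.
 */
static int toy_write_erase_list(const struct toy_work *works, size_t nworks,
				unsigned char *buf, size_t buf_size, size_t *pos)
{
	int erase_peb_count = 0;

	for (size_t i = 0; i < nworks; i++) {
		struct toy_fm_ec fec;

		if (!works[i].is_erase)
			continue;
		if (*pos + sizeof(fec) > buf_size)
			return -1;
		fec.pnum = htonl(works[i].pnum);
		fec.ec = htonl(works[i].ec);
		memcpy(buf + *pos, &fec, sizeof(fec));
		*pos += sizeof(fec);
		erase_peb_count++;
	}
	return erase_peb_count;
}
```

In this model, an old fastmap PEB only survives into the next attach as an erase entry if its erase work was still queued when the new fastmap was written; that is the guarantee Zhihao relies on and Richard questions.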

2022-01-11 07:27:51

by Richard Weinberger

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>
> CC: "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>, "mcoquelin stm32"
> <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>, "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Tuesday, 11 January 2022 03:48:24
> Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> scan_ai->fastmap may also contain old fastmap PEBs.
>> In the area below UBI_FM_MAX_START you can find outdated fastmap PEBs,
>> e.g. after a power cut.
>> That's why scan_ai->fastmap is copied into ai->fastmap.
>> Later, ubi_wl_init() erases these outdated PEBs.
>> So, you cannot remove this code.
> I thought old fastmap PEBs (erased asynchronously by works queued in
> ubi_update_fastmap()) would be counted as erase PEBs during the next
> attach, because of the following code snippet in ubi_write_fastmap():
>	list_for_each_entry(ubi_wrk, &ubi->works, list) {
>		if (ubi_is_erase_work(ubi_wrk)) {
>			wl_e = ubi_wrk->e;
>			ubi_assert(wl_e);
>
>			fec = (struct ubi_fm_ec *)(fm_raw + fm_pos);
>
>			fec->pnum = cpu_to_be32(wl_e->pnum);
>			set_seen(ubi, wl_e->pnum, seen_pebs);
>			fec->ec = cpu_to_be32(wl_e->ec);
>
>			erase_peb_count++;
>			fm_pos += sizeof(*fec);
>			ubi_assert(fm_pos <= ubi->fm_size);
>		}
>	}
>	fmh->erase_peb_count = cpu_to_be32(erase_peb_count);
> A half-written fastmap will be detected during scanning, and UBI falls
> back to full scanning. So I came up with two situations:
> 1. Power cut before the new fastmap is written: the old fastmap stays
> intact until the next attach, and some free PEBs have been written with
> new fastmap data. Luckily, the fastmap anchor PEB's VID header is written
> first, so ubi_attach_fastmap() will report a bad fastmap during the next
> attach.
> 2. Power cut after the new fastmap is written: the old fastmap PEBs will
> be added to the 'ai->erase' list during the next attach.
> Did I miss any other possible circumstances?

In ubi_wl_init() there is another corner case documented:
	/*
	 * The fastmap update code might not find a free PEB for
	 * writing the fastmap anchor to and then reuses the
	 * current fastmap anchor PEB. When this PEB gets erased
	 * and a power cut happens before it is written again we
	 * must make sure that the fastmap attach code doesn't
	 * find any outdated fastmap anchors, hence we erase the
	 * outdated fastmap anchor PEBs synchronously here.
	 */
	if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
		sync = true;

So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
starts to operate. With your change this is no longer satisfied.

Thanks,
//richard
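The per-PEB decision ubi_wl_init() makes for entries of ai->fastmap, as quoted in this thread, can be sketched as a pure function: PEBs of the current fastmap are kept, PEBs already present in the lookup table are skipped, and everything else is erased, synchronously if it is an outdated anchor. The struct, enum and ID values below are hypothetical stand-ins; only the order of the checks follows the kernel code quoted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for UBI_FM_SB_VOLUME_ID / UBI_FM_DATA_VOLUME_ID;
 * the real IDs have different values. */
enum { TOY_FM_SB_VOLUME_ID = 1, TOY_FM_DATA_VOLUME_ID = 2 };

enum toy_action { TOY_KEEP, TOY_SKIP, TOY_ERASE_ASYNC, TOY_ERASE_SYNC };

/* Hypothetical, simplified entry of ai->fastmap. */
struct toy_aeb {
	int vol_id;
	bool in_current_fm; /* stands in for ubi_find_fm_block() != NULL */
	bool in_lookuptbl;  /* stands in for ubi->lookuptbl[pnum] != NULL */
};

static enum toy_action toy_fastmap_peb_action(const struct toy_aeb *aeb)
{
	if (aeb->in_current_fm)
		return TOY_KEEP;        /* part of the fastmap just attached */
	if (aeb->in_lookuptbl)
		return TOY_SKIP;        /* already handled, e.g. via ai->erase */
	if (aeb->vol_id == TOY_FM_SB_VOLUME_ID)
		return TOY_ERASE_SYNC;  /* outdated anchor: erase before UBI runs */
	return TOY_ERASE_ASYNC;         /* outdated data PEB */
}
```

The TOY_SKIP branch is the case Zhihao later traces: an old anchor already queued through ai->erase is skipped here and therefore erased asynchronously, not synchronously.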

2022-01-11 11:44:55

by Zhihao Cheng

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

Hi Richard,
> In ubi_wl_init() there is another corner case documented:
> /*
> * The fastmap update code might not find a free PEB for
> * writing the fastmap anchor to and then reuses the
> * current fastmap anchor PEB. When this PEB gets erased
> * and a power cut happens before it is written again we
> * must make sure that the fastmap attach code doesn't
> * find any outdated fastmap anchors, hence we erase the
> * outdated fastmap anchor PEBs synchronously here.
> */
> if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
> sync = true;
>
> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
> starts to operate. With your change this is no longer satisfied
I think I understand the other case. But I'm still confused about why
outdated fastmap PEBs cannot be erased. When UBI gets to this point, it
means UBI was attached in **full scanning mode**, and scan_fast() returned
one of two values:
UBI_NO_FASTMAP: all PEBs from pnum UBI_FM_MAX_START onward are scanned,
ai is set to scan_ai, and in the end all fastmap PEBs are added to
'ai->fastmap'.
UBI_BAD_FASTMAP: all PEBs from pnum 0 onward are scanned, and all fastmap
PEBs are added to 'ai->fastmap'.

2022-01-11 11:57:12

by Richard Weinberger

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>
> CC: "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>, "mcoquelin stm32"
> <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>, "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Tuesday, 11 January 2022 12:44:49
> Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> Hi Richard,
>> In ubi_wl_init() there is another corner case documented:
>> /*
>> * The fastmap update code might not find a free PEB for
>> * writing the fastmap anchor to and then reuses the
>> * current fastmap anchor PEB. When this PEB gets erased
>> * and a power cut happens before it is written again we
>> * must make sure that the fastmap attach code doesn't
>> * find any outdated fastmap anchors, hence we erase the
>> * outdated fastmap anchor PEBs synchronously here.
>> */
>> if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
>> sync = true;
>>
>> So ubi_wl_init() makes sure that all old fastmap anchors get erased before UBI
>> starts to operate. With your change this is no longer satisfied
> I think I understand the other case. But I'm still confused about why
> outdated fastmap PEBs cannot be erased. When UBI gets to this point, it
> means UBI was attached in **full scanning mode**, and scan_fast() returned
> one of two values:
> UBI_NO_FASTMAP: all PEBs from pnum UBI_FM_MAX_START onward are scanned,
> ai is set to scan_ai, and in the end all fastmap PEBs are added to
> 'ai->fastmap'.
> UBI_BAD_FASTMAP: all PEBs from pnum 0 onward are scanned, and all fastmap
> PEBs are added to 'ai->fastmap'.

I'm afraid I'm unable to follow your thoughts.
ubi_wl_init() is called in both cases, with and without fastmap.
And ai->fastmap contains all anchor PEBs that scan_fast() found.
This can be the most recent but also outdated anchor PEBs.

Thanks,
//richard

2022-01-11 13:23:57

by Zhihao Cheng

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

On 2022/1/11 19:57, Richard Weinberger wrote:
> ubi_wl_init() is called in both cases, with and without fastmap.
I agree.

> And ai->fastmap contains all anchor PEBs that scan_fast() found.
> This can be the most recent but also outdated anchor PEBs.
Is there a case where outdated fastmap PEBs are neither counted in
'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
by fastmap?

	if (ubi->lookuptbl[aeb->pnum])
		continue;

	/*
	 * The fastmap update code might not find a free PEB for
	 * writing the fastmap anchor to and then reuses the
	 * current fastmap anchor PEB. When this PEB gets erased
	 * and a power cut happens before it is written again we
	 * must make sure that the fastmap attach code doesn't
	 * find any outdated fastmap anchors, hence we erase the
	 * outdated fastmap anchor PEBs synchronously here.
	 */
	if (aeb->vol_id == UBI_FM_SB_VOLUME_ID)
		sync = true;

I think the kernel only gets here if attaching by fastmap has failed:
	err = erase_aeb(ubi, aeb, sync);

2022-01-11 13:56:28

by Richard Weinberger

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>
> CC: "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>, "mcoquelin stm32"
> <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>, "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Tuesday, 11 January 2022 14:23:52
> Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

> On 2022/1/11 19:57, Richard Weinberger wrote:
>> ubi_wl_init() is called in both cases, with and without fastmap.
> I agree.
>
>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>> This can be the most recent but also outdated anchor PEBs.
> Is there a case where outdated fastmap PEBs are neither counted in
> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
> by fastmap?
>

[...]

> I think the kernel only gets here if attaching by fastmap has failed:
>	err = erase_aeb(ubi, aeb, sync);

Hmm, I think the paranoia check in fastmap.c is too strict these days:
	if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
		    ai->bad_peb_count - fm->used_blocks))
		goto fail_bad;

It does not account for ai->fastmap. So if ai->fastmap contains old anchor PEBs,
this check will trigger and force a fallback to scanning mode.
With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.

So I agree that this code path is wonky and can be cleaned up. But please be
extremely careful and give all your changes excessive testing with real workload
and power-cuts.

Thanks,
//richard
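One way to read Richard's remark about the paranoia check, sketched with hypothetical helpers rather than the real count_fastmap_pebs(): the existing comparison only sees the PEBs that count_fastmap_pebs() counts, so an outdated anchor sitting only on ai->fastmap makes the totals differ and forces a full scan. The relaxed variant below is an illustration of "accounting for ai->fastmap", assuming such stale PEBs can be counted separately; it is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the existing check: every good PEB outside the current fastmap
 * must be found in the lists that count_fastmap_pebs() walks. */
static bool toy_check_strict(int counted, int peb_count, int bad_peb_count,
			     int used_blocks)
{
	return counted == peb_count - bad_peb_count - used_blocks;
}

/*
 * Hypothetical relaxed variant: outdated fastmap PEBs that only sit on
 * ai->fastmap (stale_fm_pebs) are accounted for as well, so they no longer
 * force a fallback to scanning mode and could instead be erased later by
 * ubi_wl_init().
 */
static bool toy_check_with_fastmap_list(int counted, int stale_fm_pebs,
					int peb_count, int bad_peb_count,
					int used_blocks)
{
	return counted + stale_fm_pebs ==
	       peb_count - bad_peb_count - used_blocks;
}
```

For example, with 128 PEBs, no bad blocks, a 2-PEB fastmap and one stale anchor, only 125 PEBs are counted: the strict check fails while the relaxed one passes.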

2022-01-12 03:49:07

by Zhihao Cheng

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>> ubi_wl_init() is called in both cases, with and without fastmap.
>> I agree.
>>
>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>> This can be the most recent but also outdated anchor PEBs.
>> Is there a case where outdated fastmap PEBs are neither counted in
>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>> by fastmap?
>>
>
> [...]
>
>> I think the kernel only gets here if attaching by fastmap has failed:
>>	err = erase_aeb(ubi, aeb, sync);
>
> Hmm, I think the paranoia check in fastmap.c is too strict these days:
>	if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>		    ai->bad_peb_count - fm->used_blocks))
>		goto fail_bad;
>
> It does not account for ai->fastmap. So if ai->fastmap contains old anchor PEBs,
> this check will trigger and force a fallback to scanning mode.
> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
Forgive my stubbornness, but I think this strict check is good. Could you
show me a sequence that triggers this WARN_ON? A reproducer would be
nice.
I still insist on my point (with my fix applied): all outdated
fastmap PEBs are either added to 'ai->fastmap' (full scanning case) or counted
in 'fmhdr->erase_peb_count' (fast attach case).
>
> So I agree that this code path is wonky and can be cleaned up. But please be
> extremely careful and give all your changes excessive testing with real workload
> and power-cuts.

Let's list all power-cut cases during a fastmap update (I tried to
simulate some of them on nandsim). In theory, I think the WARN_ON check
is okay and all outdated fastmap PEBs can be erased during the next attach:

ubi_update_fastmap:
1565 for (i = 1; i < new_fm->used_blocks; i++) {
1570 if (!tmp_e) {
1571 if (old_fm && old_fm->e[i]) {
/* sync erase old fm data PEBs */
power-cut!!!
1585 } else {
/* async erase old fm data PEBs */
power-cut!!!
1595 }
1596 } else {
1599 if (old_fm && old_fm->e[i]) {
/* async erase old fm data PEBs */
power-cut!!!
1603 }
1604 }
1605 }
1621 if (old_fm) {
1623 if (!tmp_e) {
/* sync erase old fm anchor PEB */
power-cut!!!
1638 } else {
/* async erase old fm anchor PEB */
power-cut!!!
1644 }
1645 } else {
1658 }
1660 ret = ubi_write_fastmap(ubi, new_fm);

ubi_write_fastmap:
1324 ret = ubi_io_write_vid_hdr(ubi, new_fm->e[0]->pnum, avbuf);
power-cut!!!
1345 for (i = 1; i < new_fm->used_blocks; i++) {
1350 ret = ubi_io_write_vid_hdr(...);
power-cut!!!
1356 }

1358 for (i = 0; i < new_fm->used_blocks; i++) {
1359 ret = ubi_io_write_data(...),
power-cut!!!
1366 }

2022-01-14 23:04:36

by Richard Weinberger

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>
> CC: "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>, "mcoquelin stm32"
> <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>, "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Wednesday, 12 January 2022 04:46:28
> Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>>> ubi_wl_init() is called in both cases, with and without fastmap.
>>> I agree.
>>>
>>>> And ai->fastmap contains all anchor PEBs that scan_fast() found.
>>>> This can be the most recent but also outdated anchor PEBs.
>>> Is there a case where outdated fastmap PEBs are neither counted in
>>> 'fmhdr->erase_peb_count' nor scanned into 'ai->fastmap' after attaching
>>> by fastmap?
>>>
>>
>> [...]
>>
>>> I think the kernel only gets here if attaching by fastmap has failed:
>>>	err = erase_aeb(ubi, aeb, sync);
>>
>> Hmm, I think the paranoia check in fastmap.c is too strict these days:
>>	if (WARN_ON(count_fastmap_pebs(ai) != ubi->peb_count -
>>		    ai->bad_peb_count - fm->used_blocks))
>>		goto fail_bad;
>>
>> It does not account for ai->fastmap. So if ai->fastmap contains old anchor PEBs,
>> this check will trigger and force a fallback to scanning mode.
>> With this check fixed, ubi_wl_init() will erase all old PEBs from ai->fastmap.
> Forgive my stubbornness, but I think this strict check is good. Could you
> show me a sequence that triggers this WARN_ON? A reproducer would be
> nice.

You can trigger this by interrupting UBI.
E.g., when UBI writes a new fastmap to the NAND, it schedules the old fastmap
PEBs for erasure. PEB erasure is asynchronous in UBI, so it can be delayed
for a very long time.
While developing UBI fastmap and performing power-cut tests, I saw this often
on targets.

> I still insist on my point (with my fix applied): all outdated
> fastmap PEBs are either added to 'ai->fastmap' (full scanning case) or counted
> in 'fmhdr->erase_peb_count' (fast attach case).

Yes. But if you look into ubi_wl_init(), you see that fastmap anchor PEBs
get erased synchronously(!). The comment before the erasure explains why.
To complicate things, this code is currently unreachable because the WARN_ON()
is not right. It fails to count ai->fastmap.
So, when old fastmap PEBs are found, the counter does not match
and UBI falls back to full scanning while it could do an attach by fastmap.

Fastmap is full of corner cases that have been found by a massive amount of testing, sadly.

Thanks,
//richard

2022-01-16 01:03:43

by Richard Weinberger

Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

----- Original Message -----
> From: "chengzhihao1" <[email protected]>
> To: "richard" <[email protected]>
> CC: "Miquel Raynal" <[email protected]>, "Vignesh Raghavendra" <[email protected]>, "mcoquelin stm32"
> <[email protected]>, "kirill shutemov" <[email protected]>, "Sascha Hauer"
> <[email protected]>, "linux-mtd" <[email protected]>, "linux-kernel" <[email protected]>
> Sent: Saturday, 15 January 2022 09:46:07
> Subject: Re: [PATCH v6 12/15] ubi: fastmap: Add all fastmap pebs into 'ai->fastmap' when fm->used_blocks>=2

>>> Yes. But if you look into ubi_wl_init(), you see that fastmap anchor PEBs
>>> get erased synchronously(!). The comment before the erasure explains why.
>> About erasing the fastmap anchor PEB synchronously, I admit the current UBI
>> implementation cannot guarantee it, even with my fix applied. Wait, it
>> seems that UBI has never guaranteed it. Old fastmap PEBs could be erased
>> asynchronously, and they could be counted in 'fmh->erase_peb_count' even
>> in early UBI implementations, so the old fastmap anchor PEB will be
>> added to 'ai->erase' and be erased asynchronously during the next attach.
> During the next attach, old fastmap PEBs will be processed as follows:
> ubi_attach_fastmap -> add_aeb(ai, &ai->erase...)
> ubi_wl_init
>   list_for_each_entry_safe(aeb, tmp, &ai->erase)
>     erase_aeb // erase asynchronously
>       ubi->lookuptbl[e->pnum] = e
>   list_for_each_entry(aeb, &ai->fastmap, u.list)
>     e = ubi_find_fm_block(ubi, aeb->pnum)
>     if (e) {
>       ...
>     } else {
>       if (ubi->lookuptbl[aeb->pnum]) // old fastmap PEBs are assigned to 'ubi->lookuptbl'
>         continue;
>     }
>> But I feel it is not a problem: find_fm_anchor() can help us find
>> the right fastmap anchor PEB according to the sequence number.

FYI, I think I now understand our disagreement.
You assume that old Fastmap PEBs are *guaranteed* to be part of Fastmap's erase list.
That's okay and this is what Linux as of today does.

My point is that we need to be paranoid and check carefully for old Fastmap PEBs
which might be *not* on the erase list.
I saw such issues in the wild. These were caused by old and/or buggy Fastmap
implementations.
Keep in mind that Linux is not the only system that implements UBI (and fastmap).

So let me give the whole situation another thought on how to improve it.
I totally agree with you that currently there is a problem with fm->used_blocks > 1.
I'm just careful (maybe too careful) about changing Fastmap code.

Thanks,
//richard
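Zhihao's point that find_fm_anchor() picks the right anchor by sequence number can be sketched as follows. This is a simplified user-space model with hypothetical types, assuming selection of the highest sqnum among PEBs carrying the anchor volume ID; it only shows how the newest anchor wins and does not address Richard's concern about old fastmap PEBs that never made it onto the erase list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UBI_FM_SB_VOLUME_ID; the real value differs. */
#define TOY_FM_SB_VOLUME_ID 1

/* Hypothetical, simplified entry of ai->fastmap. */
struct toy_peb {
	int pnum;
	int vol_id;
	uint64_t sqnum; /* sequence number from the VID header */
};

/*
 * Return the pnum of the anchor with the highest sequence number, or -1
 * if there is no anchor at all. Non-anchor PEBs are ignored.
 */
static int toy_find_fm_anchor(const struct toy_peb *pebs, size_t n)
{
	int ret = -1;
	uint64_t max_sqnum = 0;

	for (size_t i = 0; i < n; i++) {
		if (pebs[i].vol_id != TOY_FM_SB_VOLUME_ID)
			continue;
		if (ret < 0 || pebs[i].sqnum > max_sqnum) {
			max_sqnum = pebs[i].sqnum;
			ret = pebs[i].pnum;
		}
	}
	return ret;
}
```

The selection is only as trustworthy as the assumption that every outdated anchor carries a lower sqnum than the current one, which is the kind of assumption Richard prefers to keep checking defensively.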