2024-04-10 07:46:55

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 0/9] ubifs: Fix a serious of inconsistent problems when powercut happens

UBIFS is tolerant of powercut, this patchset fix some inconsistent
problems when powercut happens.
Xfstests,mtd-utils and powercut tests[1] are passed(Enable chk* debugging)
after this patchset applied.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=218309

performance regression:

4GB nandsim 256KB PEB 4KB page

1. fio

direct=0
bs=4k
sync=0
iodepth=128
rw=write
iodepth_batch_complete=1
iodepth_batch=1
numjobs=1
directory=/root/temp
size=1G
runtime=20s
time_based=1
before after
1 thread seq write 412MiB/s 409MiB/s
1 thread random write 402MiB/s 397MiB/s
4 thread seq write 163MiB/s 164MiB/s
4 thread random write 162MiB/s 160MiB/s

2. fsmark
fs_mark -d /root/temp -s 10240 -n 1000
before after
7131 Files/sec 7090 Files/sec


Zhihao Cheng (9):
ubifs: Fix unattached xattr inode if powercut happens after deleting
ubifs: Don't add xattr inode into orphan area
Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error
path"
ubifs: Remove insert_dead_orphan from replaying orphan process
ubifs: Fix adding orphan entry twice for the same inode
ubifs: Move ui->data initialization after initializing security
ubifs: Fix space leak when powercut happens in linking tmpfile
ubifs: Fix unattached inode when powercut happens in creating
ubifs: dbg_orphan_check: Fix missed key type checking

fs/ubifs/dir.c | 91 +++++++++++++++-----------
fs/ubifs/journal.c | 14 +++-
fs/ubifs/orphan.c | 155 ++++++++-------------------------------------
fs/ubifs/ubifs.h | 7 +-
fs/ubifs/xattr.c | 2 +-
5 files changed, 92 insertions(+), 177 deletions(-)

--
2.39.2



2024-04-10 07:47:15

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 6/9] ubifs: Move ui->data initialization after initializing security

Host inode and its' xattr will be written on disk after initializing
security when creating symlink or dev, then the host inode and its
dentry will be written again in ubifs_jnl_update.
There is no need to write inode data in the security initialization
pass, just move the ui->data initialization after initializing
security.

Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/dir.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 551148de66cd..848988f2b7dc 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1102,16 +1102,18 @@ static int ubifs_mknod(struct mnt_idmap *idmap, struct inode *dir,
goto out_fname;
}

+ err = ubifs_init_security(dir, inode, &dentry->d_name);
+ if (err) {
+ kfree(dev);
+ goto out_inode;
+ }
+
init_special_inode(inode, inode->i_mode, rdev);
inode->i_size = ubifs_inode(inode)->ui_size = devlen;
ui = ubifs_inode(inode);
ui->data = dev;
ui->data_len = devlen;

- err = ubifs_init_security(dir, inode, &dentry->d_name);
- if (err)
- goto out_inode;
-
mutex_lock(&dir_ui->ui_mutex);
dir->i_size += sz_change;
dir_ui->ui_size = dir->i_size;
@@ -1184,6 +1186,10 @@ static int ubifs_symlink(struct mnt_idmap *idmap, struct inode *dir,
goto out_fname;
}

+ err = ubifs_init_security(dir, inode, &dentry->d_name);
+ if (err)
+ goto out_inode;
+
ui = ubifs_inode(inode);
ui->data = kmalloc(disk_link.len, GFP_NOFS);
if (!ui->data) {
@@ -1209,10 +1215,6 @@ static int ubifs_symlink(struct mnt_idmap *idmap, struct inode *dir,
ui->data_len = disk_link.len - 1;
inode->i_size = ubifs_inode(inode)->ui_size = disk_link.len - 1;

- err = ubifs_init_security(dir, inode, &dentry->d_name);
- if (err)
- goto out_inode;
-
mutex_lock(&dir_ui->ui_mutex);
dir->i_size += sz_change;
dir_ui->ui_size = dir->i_size;
--
2.39.2


2024-04-10 07:47:51

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 7/9] ubifs: Fix space leak when powercut happens in linking tmpfile

There is a potential space leak problem when powercut happens in linking
tmpfile, in which case, inode node (with nlink=0) and its' data nodes can
be found from tnc (on flash), but there are no dentries related to the
inode, so the file is invisible but takes free space. Detailed process is
shown as:
ubifs_tmpfile
ubifs_jnl_update // Add bud A into log area
ubifs_add_orphan // Add inode into orphan list

P1 P2
ubifs_link
ubifs_delete_orphan // Delete inode from orphan list, then inode won't
// be written into orphan area, there is no chance
// to delete inode by replaying orphan.
commit // bud A won't be replayed in next mounting
>> powercut <<
ubifs_jnl_update // Link inode to dentry

The root cause is that orphan entry deletion and journal writing(for link)
are interrupted by commit, which makes the two operations are not atomic.
Fix it by doing ubifs_delete_orphan under the protection of c->commit_sem
within ubifs_jnl_update. This is also a preparation to support all creating
new files by orphan inode.

v1 is https://lore.kernel.org/linux-mtd/[email protected]/

Fixes: 32fe905c17f0 ("ubifs: Fix O_TMPFILE corner case in ubifs_link()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208405
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/dir.c | 22 ++++++++--------------
fs/ubifs/journal.c | 6 +++++-
fs/ubifs/ubifs.h | 2 +-
fs/ubifs/xattr.c | 2 +-
4 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index 848988f2b7dc..fe16443243ab 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -325,7 +325,7 @@ static int ubifs_create(struct mnt_idmap *idmap, struct inode *dir,
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
if (err)
goto out_cancel;
mutex_unlock(&dir_ui->ui_mutex);
@@ -479,7 +479,7 @@ static int ubifs_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
mutex_unlock(&ui->ui_mutex);

lock_2_inodes(dir, inode);
- err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0, 0);
if (err)
goto out_cancel;
unlock_2_inodes(dir, inode);
@@ -760,10 +760,6 @@ static int ubifs_link(struct dentry *old_dentry, struct inode *dir,

lock_2_inodes(dir, inode);

- /* Handle O_TMPFILE corner case, it is allowed to link a O_TMPFILE. */
- if (inode->i_nlink == 0)
- ubifs_delete_orphan(c, inode->i_ino);
-
inc_nlink(inode);
ihold(inode);
inode_set_ctime_current(inode);
@@ -771,7 +767,7 @@ static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, inode->i_nlink == 1);
if (err)
goto out_cancel;
unlock_2_inodes(dir, inode);
@@ -785,8 +781,6 @@ static int ubifs_link(struct dentry *old_dentry, struct inode *dir,
dir->i_size -= sz_change;
dir_ui->ui_size = dir->i_size;
drop_nlink(inode);
- if (inode->i_nlink == 0)
- ubifs_add_orphan(c, inode->i_ino);
unlock_2_inodes(dir, inode);
ubifs_release_budget(c, &req);
iput(inode);
@@ -846,7 +840,7 @@ static int ubifs_unlink(struct inode *dir, struct dentry *dentry)
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0, 0);
if (err)
goto out_cancel;
unlock_2_inodes(dir, inode);
@@ -950,7 +944,7 @@ static int ubifs_rmdir(struct inode *dir, struct dentry *dentry)
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0, 0);
if (err)
goto out_cancel;
unlock_2_inodes(dir, inode);
@@ -1025,7 +1019,7 @@ static int ubifs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
if (err) {
ubifs_err(c, "cannot create directory, error %d", err);
goto out_cancel;
@@ -1119,7 +1113,7 @@ static int ubifs_mknod(struct mnt_idmap *idmap, struct inode *dir,
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
if (err)
goto out_cancel;
mutex_unlock(&dir_ui->ui_mutex);
@@ -1220,7 +1214,7 @@ static int ubifs_symlink(struct mnt_idmap *idmap, struct inode *dir,
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
if (err)
goto out_cancel;
mutex_unlock(&dir_ui->ui_mutex);
diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index f997a85bcdce..3178020ea3c1 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -643,6 +643,7 @@ static void set_dent_cookie(struct ubifs_info *c, struct ubifs_dent_node *dent)
* @inode: inode to update
* @deletion: indicates a directory entry deletion i.e unlink or rmdir
* @xent: non-zero if the directory entry is an extended attribute entry
+ * @delete_orphan: indicates an orphan entry deletion for @inode
*
* This function updates an inode by writing a directory entry (or extended
* attribute entry), the inode itself, and the parent directory inode (or the
@@ -664,7 +665,7 @@ static void set_dent_cookie(struct ubifs_info *c, struct ubifs_dent_node *dent)
*/
int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
const struct fscrypt_name *nm, const struct inode *inode,
- int deletion, int xent)
+ int deletion, int xent, int delete_orphan)
{
int err, dlen, ilen, len, lnum, ino_offs, dent_offs, orphan_added = 0;
int aligned_dlen, aligned_ilen, sync = IS_DIRSYNC(dir);
@@ -806,6 +807,9 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
if (err)
goto out_ro;

+ if (delete_orphan)
+ ubifs_delete_orphan(c, inode->i_ino);
+
finish_reservation(c);
spin_lock(&ui->ui_lock);
ui->synced_i_size = ui->ui_size;
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 3651a87e64b2..14d28c5456f4 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1800,7 +1800,7 @@ int ubifs_consolidate_log(struct ubifs_info *c);
/* journal.c */
int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
const struct fscrypt_name *nm, const struct inode *inode,
- int deletion, int xent);
+ int deletion, int xent, int delete_orphan);
int ubifs_jnl_write_data(struct ubifs_info *c, const struct inode *inode,
const union ubifs_key *key, const void *buf, int len);
int ubifs_jnl_write_inode(struct ubifs_info *c, const struct inode *inode);
diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index 0847db521984..f734588b224a 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -149,7 +149,7 @@ static int create_xattr(struct ubifs_info *c, struct inode *host,
if (strcmp(fname_name(nm), UBIFS_XATTR_NAME_ENCRYPTION_CONTEXT) == 0)
host_ui->flags |= UBIFS_CRYPT_FL;

- err = ubifs_jnl_update(c, host, nm, inode, 0, 1);
+ err = ubifs_jnl_update(c, host, nm, inode, 0, 1, 0);
if (err)
goto out_cancel;
ubifs_set_inode_flags(host);
--
2.39.2


2024-04-10 07:48:04

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 9/9] ubifs: dbg_orphan_check: Fix missed key type checking

When selinux/encryption is enabled, xattr entry node is added into TNC
before host inode when creating new file. So it is possible to find
xattr entry without host inode from TNC. Orphan debug checking is called
by ubifs_orphan_end_commit(), at that time, the commit semaphore is
already unlock, so the new creation won't be blocked.

Fixes: d7f0b70d30ff ("UBIFS: Add security.* XATTR support for the UBIFS")
Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto")
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/orphan.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index 37d206097112..fb957d963ba6 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -816,8 +816,12 @@ static int dbg_orphan_check(struct ubifs_info *c, struct ubifs_zbranch *zbr,

inum = key_inum(c, &zbr->key);
if (inum != ci->last_ino) {
- /* Lowest node type is the inode node, so it comes first */
- if (key_type(c, &zbr->key) != UBIFS_INO_KEY)
+ /*
+ * Lowest node type is the inode node or xattr entry(when
+ * selinux/encryption is enabled), so it comes first
+ */
+ if (key_type(c, &zbr->key) != UBIFS_INO_KEY &&
+ key_type(c, &zbr->key) != UBIFS_XENT_KEY)
ubifs_err(c, "found orphan node ino %lu, type %d",
(unsigned long)inum, key_type(c, &zbr->key));
ci->last_ino = inum;
--
2.39.2


2024-04-10 07:48:20

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 8/9] ubifs: Fix unattached inode when powercut happens in creating

For selinux or encryption scenarios, UBIFS could become inconsistent
while creating new files in powercut case. Encryption/selinux related
xattrs will be created before creating file dentry, which makes creation
process is not atomic, details are shown as:

Encryption case:
ubifs_create
ubifs_new_inode
fscrypt_set_context
ubifs_xattr_set
create_xattr
ubifs_jnl_update // Disk: xentry xinode inode(LAST_OF_NODE_GROUP)
>> power cut <<
ubifs_jnl_update // Disk: dentry inode parent_inode(LAST_OF_NODE_GROUP)

Selinux case:
ubifs_create
ubifs_new_inode
ubifs_init_security
security_inode_init_security
ubifs_xattr_set
create_xattr
ubifs_jnl_update // Disk: xentry xinode inode(LAST_OF_NODE_GROUP)
>> power cut <<
ubifs_jnl_update // Disk: dentry inode parent_inode(LAST_OF_NODE_GROUP)

Above process will make chk_fs failed in next mounting:
UBIFS error (ubi0:0 pid 7995): dbg_check_filesystem [ubifs]: inode 66
nlink is 1, but calculated nlink is 0

Fix it by allocating orphan inode for each non-xattr file creation, then
removing orphan list in journal writing process, which ensures that both
xattr and dentry be effective in atomic when powercut happens.

Fixes: d7f0b70d30ff ("UBIFS: Add security.* XATTR support for the UBIFS")
Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218309
Suggested-by: Zhang Yi <[email protected]>
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/dir.c | 59 +++++++++++++++++++++++++++++++---------------
fs/ubifs/journal.c | 14 +++++++----
fs/ubifs/ubifs.h | 4 ++--
3 files changed, 51 insertions(+), 26 deletions(-)

diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index fe16443243ab..c77ea57fe696 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -71,8 +71,13 @@ static int inherit_flags(const struct inode *dir, umode_t mode)
* @is_xattr: whether the inode is xattr inode
*
* This function finds an unused inode number, allocates new inode and
- * initializes it. Returns new inode in case of success and an error code in
- * case of failure.
+ * initializes it. Non-xattr new inode may be written with xattrs(selinux/
+ * encryption) before writing dentry, which could cause inconsistent problem
+ * when powercut happens between two operations. To deal with it, non-xattr
+ * new inode is initialized with zero-nlink and added into orphan list, caller
+ * should make sure that inode is relinked later, and make sure that orphan
+ * removing and journal writing into an committing atomic operation. Returns
+ * new inode in case of success and an error code in case of failure.
*/
struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir,
umode_t mode, bool is_xattr)
@@ -163,9 +168,25 @@ struct inode *ubifs_new_inode(struct ubifs_info *c, struct inode *dir,
ui->creat_sqnum = ++c->max_sqnum;
spin_unlock(&c->cnt_lock);

+ if (!is_xattr) {
+ set_nlink(inode, 0);
+ err = ubifs_add_orphan(c, inode->i_ino);
+ if (err) {
+ ubifs_err(c, "ubifs_add_orphan failed: %i", err);
+ goto out_iput;
+ }
+ down_read(&c->commit_sem);
+ ui->del_cmtno = c->cmt_no;
+ up_read(&c->commit_sem);
+ }
+
if (encrypted) {
err = fscrypt_set_context(inode, NULL);
if (err) {
+ if (!is_xattr) {
+ set_nlink(inode, 1);
+ ubifs_delete_orphan(c, inode->i_ino);
+ }
ubifs_err(c, "fscrypt_set_context failed: %i", err);
goto out_iput;
}
@@ -320,12 +341,13 @@ static int ubifs_create(struct mnt_idmap *idmap, struct inode *dir,
if (err)
goto out_inode;

+ set_nlink(inode, 1);
mutex_lock(&dir_ui->ui_mutex);
dir->i_size += sz_change;
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 1);
if (err)
goto out_cancel;
mutex_unlock(&dir_ui->ui_mutex);
@@ -340,8 +362,8 @@ static int ubifs_create(struct mnt_idmap *idmap, struct inode *dir,
dir->i_size -= sz_change;
dir_ui->ui_size = dir->i_size;
mutex_unlock(&dir_ui->ui_mutex);
+ set_nlink(inode, 0);
out_inode:
- make_bad_inode(inode);
iput(inode);
out_fname:
fscrypt_free_filename(&nm);
@@ -386,7 +408,6 @@ static struct inode *create_whiteout(struct inode *dir, struct dentry *dentry)
return inode;

out_inode:
- make_bad_inode(inode);
iput(inode);
out_free:
ubifs_err(c, "cannot create whiteout file, error %d", err);
@@ -470,6 +491,7 @@ static int ubifs_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
if (err)
goto out_inode;

+ set_nlink(inode, 1);
mutex_lock(&ui->ui_mutex);
insert_inode_hash(inode);
d_tmpfile(file, inode);
@@ -479,7 +501,7 @@ static int ubifs_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
mutex_unlock(&ui->ui_mutex);

lock_2_inodes(dir, inode);
- err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 1, 0, 1);
if (err)
goto out_cancel;
unlock_2_inodes(dir, inode);
@@ -492,7 +514,6 @@ static int ubifs_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
out_cancel:
unlock_2_inodes(dir, inode);
out_inode:
- make_bad_inode(inode);
if (!instantiated)
iput(inode);
out_budg:
@@ -1011,6 +1032,7 @@ static int ubifs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
if (err)
goto out_inode;

+ set_nlink(inode, 1);
mutex_lock(&dir_ui->ui_mutex);
insert_inode_hash(inode);
inc_nlink(inode);
@@ -1019,7 +1041,7 @@ static int ubifs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 1);
if (err) {
ubifs_err(c, "cannot create directory, error %d", err);
goto out_cancel;
@@ -1036,8 +1058,8 @@ static int ubifs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
dir_ui->ui_size = dir->i_size;
drop_nlink(dir);
mutex_unlock(&dir_ui->ui_mutex);
+ set_nlink(inode, 0);
out_inode:
- make_bad_inode(inode);
iput(inode);
out_fname:
fscrypt_free_filename(&nm);
@@ -1107,13 +1129,14 @@ static int ubifs_mknod(struct mnt_idmap *idmap, struct inode *dir,
ui = ubifs_inode(inode);
ui->data = dev;
ui->data_len = devlen;
+ set_nlink(inode, 1);

mutex_lock(&dir_ui->ui_mutex);
dir->i_size += sz_change;
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 1);
if (err)
goto out_cancel;
mutex_unlock(&dir_ui->ui_mutex);
@@ -1128,8 +1151,8 @@ static int ubifs_mknod(struct mnt_idmap *idmap, struct inode *dir,
dir->i_size -= sz_change;
dir_ui->ui_size = dir->i_size;
mutex_unlock(&dir_ui->ui_mutex);
+ set_nlink(inode, 0);
out_inode:
- make_bad_inode(inode);
iput(inode);
out_fname:
fscrypt_free_filename(&nm);
@@ -1208,13 +1231,14 @@ static int ubifs_symlink(struct mnt_idmap *idmap, struct inode *dir,
*/
ui->data_len = disk_link.len - 1;
inode->i_size = ubifs_inode(inode)->ui_size = disk_link.len - 1;
+ set_nlink(inode, 1);

mutex_lock(&dir_ui->ui_mutex);
dir->i_size += sz_change;
dir_ui->ui_size = dir->i_size;
inode_set_mtime_to_ts(dir,
inode_set_ctime_to_ts(dir, inode_get_ctime(inode)));
- err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 0);
+ err = ubifs_jnl_update(c, dir, &nm, inode, 0, 0, 1);
if (err)
goto out_cancel;
mutex_unlock(&dir_ui->ui_mutex);
@@ -1228,10 +1252,10 @@ static int ubifs_symlink(struct mnt_idmap *idmap, struct inode *dir,
dir->i_size -= sz_change;
dir_ui->ui_size = dir->i_size;
mutex_unlock(&dir_ui->ui_mutex);
+ set_nlink(inode, 0);
out_inode:
/* Free inode->i_link before inode is marked as bad. */
fscrypt_free_inode(inode);
- make_bad_inode(inode);
iput(inode);
out_fname:
fscrypt_free_filename(&nm);
@@ -1399,14 +1423,10 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
*/
err = ubifs_budget_space(c, &wht_req);
if (err) {
- /*
- * Whiteout inode can not be written on flash by
- * ubifs_jnl_write_inode(), because it's neither
- * dirty nor zero-nlink.
- */
iput(whiteout);
goto out_release;
}
+ set_nlink(whiteout, 1);

/* Add the old_dentry size to the old_dir size. */
old_sz -= CALC_DENT_SIZE(fname_len(&old_nm));
@@ -1485,7 +1505,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
}

err = ubifs_jnl_rename(c, old_dir, old_inode, &old_nm, new_dir,
- new_inode, &new_nm, whiteout, sync);
+ new_inode, &new_nm, whiteout, sync, !!whiteout);
if (err)
goto out_cancel;

@@ -1538,6 +1558,7 @@ static int do_rename(struct inode *old_dir, struct dentry *old_dentry,
unlock_4_inodes(old_dir, new_dir, new_inode, whiteout);
if (whiteout) {
ubifs_release_budget(c, &wht_req);
+ set_nlink(whiteout, 0);
iput(whiteout);
}
out_release:
diff --git a/fs/ubifs/journal.c b/fs/ubifs/journal.c
index 3178020ea3c1..4a35f9e8f668 100644
--- a/fs/ubifs/journal.c
+++ b/fs/ubifs/journal.c
@@ -643,7 +643,7 @@ static void set_dent_cookie(struct ubifs_info *c, struct ubifs_dent_node *dent)
* @inode: inode to update
* @deletion: indicates a directory entry deletion i.e unlink or rmdir
* @xent: non-zero if the directory entry is an extended attribute entry
- * @delete_orphan: indicates an orphan entry deletion for @inode
+ * @in_orphan: indicates whether the @inode is in orphan list
*
* This function updates an inode by writing a directory entry (or extended
* attribute entry), the inode itself, and the parent directory inode (or the
@@ -665,7 +665,7 @@ static void set_dent_cookie(struct ubifs_info *c, struct ubifs_dent_node *dent)
*/
int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
const struct fscrypt_name *nm, const struct inode *inode,
- int deletion, int xent, int delete_orphan)
+ int deletion, int xent, int in_orphan)
{
int err, dlen, ilen, len, lnum, ino_offs, dent_offs, orphan_added = 0;
int aligned_dlen, aligned_ilen, sync = IS_DIRSYNC(dir);
@@ -751,7 +751,7 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
if (err)
goto out_release;

- if (last_reference) {
+ if (last_reference && !in_orphan) {
err = ubifs_add_orphan(c, inode->i_ino);
if (err) {
release_head(c, BASEHD);
@@ -807,7 +807,7 @@ int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
if (err)
goto out_ro;

- if (delete_orphan)
+ if (in_orphan && inode->i_nlink)
ubifs_delete_orphan(c, inode->i_ino);

finish_reservation(c);
@@ -1340,6 +1340,7 @@ int ubifs_jnl_xrename(struct ubifs_info *c, const struct inode *fst_dir,
* @new_nm: new name of the new directory entry
* @whiteout: whiteout inode
* @sync: non-zero if the write-buffer has to be synchronized
+ * @delete_orphan: indicates an orphan entry deletion for @whiteout
*
* This function implements the re-name operation which may involve writing up
* to 4 inodes(new inode, whiteout inode, old and new parent directory inodes)
@@ -1352,7 +1353,7 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
const struct inode *new_dir,
const struct inode *new_inode,
const struct fscrypt_name *new_nm,
- const struct inode *whiteout, int sync)
+ const struct inode *whiteout, int sync, int delete_orphan)
{
void *p;
union ubifs_key key;
@@ -1569,6 +1570,9 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
goto out_ro;
}

+ if (delete_orphan)
+ ubifs_delete_orphan(c, whiteout->i_ino);
+
finish_reservation(c);
if (new_inode) {
mark_inode_clean(c, new_ui);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 14d28c5456f4..12216a159227 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1800,7 +1800,7 @@ int ubifs_consolidate_log(struct ubifs_info *c);
/* journal.c */
int ubifs_jnl_update(struct ubifs_info *c, const struct inode *dir,
const struct fscrypt_name *nm, const struct inode *inode,
- int deletion, int xent, int delete_orphan);
+ int deletion, int xent, int in_orphan);
int ubifs_jnl_write_data(struct ubifs_info *c, const struct inode *inode,
const union ubifs_key *key, const void *buf, int len);
int ubifs_jnl_write_inode(struct ubifs_info *c, const struct inode *inode);
@@ -1817,7 +1817,7 @@ int ubifs_jnl_rename(struct ubifs_info *c, const struct inode *old_dir,
const struct inode *new_dir,
const struct inode *new_inode,
const struct fscrypt_name *new_nm,
- const struct inode *whiteout, int sync);
+ const struct inode *whiteout, int sync, int delete_orphan);
int ubifs_jnl_truncate(struct ubifs_info *c, const struct inode *inode,
loff_t old_size, loff_t new_size);
int ubifs_jnl_delete_xattr(struct ubifs_info *c, const struct inode *host,
--
2.39.2


2024-04-10 07:51:23

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 2/9] ubifs: Don't add xattr inode into orphan area

Now, the entire inode with its' xattrs are removed while replaying
orphan nodes. There is no need to add xattr inodes into orphan area,
which is based on the fact that xattr entries won't be cleared from
disk before deleting xattr inodes, in another words, current logic
can make sure that xattr inode be deleted in any cases even UBIFS not
record xattr inode into orphan area.
Let's looking for possible paths that could clear xattr entries from
disk but leave the xattr inode on TNC:
1. unlink/tmpfile -> ubifs_jnl_update: inode(nlink=0) is written
into bud LEB and added into orphan list, then:
a. powercut: ubifs_tnc_remove_ino(xattr entry/inode can be found
from TNC and being deleted) is invoked in replaying journal.
b. commit + powercut: inode is written into orphan area, and
ubifs_tnc_remove_ino is invoked in replaying orphan nodes.
c. evicting + powercut: xattr inode(nlink=0) is written on disk,
xattr is removed from TNC, gc could clear xattr entries from
disk. ubifs_tnc_remove_ino will apply on inode and xattr inode
in replaying journal, so lost xattr entries will make no
influence.
d. evicting + commit + powercut: xattr inode/entry are removed from
index tree(on disk) by ubifs_jnl_write_inode, xattr inode is
cleared from orphan area by ubifs_jnl_write_inode + commit.
e. commit + evicting + powercut: inode is written into orphan area,
then equivalent to c.
2. remove xattr -> ubifs_jnl_delete_xattr: xattr entry(inum=0) and
xattr inode(nlink=0) is written into bud LEB, xattr entry/inode are
removed from TNC, then:
a. powercut: gc could clear xattr entries from disk, which won't
affect deleting xattr entry from TNC. ubifs_tnc_remove_ino will
apply on xattr inode in replaying journal, ubifs_tnc_remove_nm
will apply on xattr entry in replaying journal.
b. commit + powercut: xattr entry/inode are removed from index tree
(on disk).
Tracking xattr inode in orphan list is imported by commit 988bec41318f3f
("ubifs: orphan: Handle xattrs like files"), it aims to fix the similar
problem described in commit 7959cf3a7506d4a ("ubifs: journal: Handle
xattrs like files"). Actually, the problem only exist in journal case
but not the orphan case. So, we can remove the orphan tracking for xattr
inodes.

Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/orphan.c | 85 ++++++++---------------------------------------
fs/ubifs/ubifs.h | 3 --
2 files changed, 14 insertions(+), 74 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index ddeb125e6930..88fbf331ad8c 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -42,24 +42,30 @@

static int dbg_check_orphans(struct ubifs_info *c);

-static struct ubifs_orphan *orphan_add(struct ubifs_info *c, ino_t inum,
- struct ubifs_orphan *parent_orphan)
+/**
+ * ubifs_add_orphan - add an orphan.
+ * @c: UBIFS file-system description object
+ * @inum: orphan inode number
+ *
+ * Add an orphan. This function is called when an inodes link count drops to
+ * zero.
+ */
+int ubifs_add_orphan(struct ubifs_info *c, ino_t inum)
{
struct ubifs_orphan *orphan, *o;
struct rb_node **p, *parent = NULL;

orphan = kzalloc(sizeof(struct ubifs_orphan), GFP_NOFS);
if (!orphan)
- return ERR_PTR(-ENOMEM);
+ return -ENOMEM;
orphan->inum = inum;
orphan->new = 1;
- INIT_LIST_HEAD(&orphan->child_list);

spin_lock(&c->orphan_lock);
if (c->tot_orphans >= c->max_orphans) {
spin_unlock(&c->orphan_lock);
kfree(orphan);
- return ERR_PTR(-ENFILE);
+ return -ENFILE;
}
p = &c->orph_tree.rb_node;
while (*p) {
@@ -73,7 +79,7 @@ static struct ubifs_orphan *orphan_add(struct ubifs_info *c, ino_t inum,
ubifs_err(c, "orphaned twice");
spin_unlock(&c->orphan_lock);
kfree(orphan);
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
}
}
c->tot_orphans += 1;
@@ -83,14 +89,9 @@ static struct ubifs_orphan *orphan_add(struct ubifs_info *c, ino_t inum,
list_add_tail(&orphan->list, &c->orph_list);
list_add_tail(&orphan->new_list, &c->orph_new);

- if (parent_orphan) {
- list_add_tail(&orphan->child_list,
- &parent_orphan->child_list);
- }
-
spin_unlock(&c->orphan_lock);
dbg_gen("ino %lu", (unsigned long)inum);
- return orphan;
+ return 0;
}

static struct ubifs_orphan *lookup_orphan(struct ubifs_info *c, ino_t inum)
@@ -144,59 +145,6 @@ static void orphan_delete(struct ubifs_info *c, struct ubifs_orphan *orph)
__orphan_drop(c, orph);
}

-/**
- * ubifs_add_orphan - add an orphan.
- * @c: UBIFS file-system description object
- * @inum: orphan inode number
- *
- * Add an orphan. This function is called when an inodes link count drops to
- * zero.
- */
-int ubifs_add_orphan(struct ubifs_info *c, ino_t inum)
-{
- int err = 0;
- ino_t xattr_inum;
- union ubifs_key key;
- struct ubifs_dent_node *xent, *pxent = NULL;
- struct fscrypt_name nm = {0};
- struct ubifs_orphan *xattr_orphan;
- struct ubifs_orphan *orphan;
-
- orphan = orphan_add(c, inum, NULL);
- if (IS_ERR(orphan))
- return PTR_ERR(orphan);
-
- lowest_xent_key(c, &key, inum);
- while (1) {
- xent = ubifs_tnc_next_ent(c, &key, &nm);
- if (IS_ERR(xent)) {
- err = PTR_ERR(xent);
- if (err == -ENOENT)
- break;
- kfree(pxent);
- return err;
- }
-
- fname_name(&nm) = xent->name;
- fname_len(&nm) = le16_to_cpu(xent->nlen);
- xattr_inum = le64_to_cpu(xent->inum);
-
- xattr_orphan = orphan_add(c, xattr_inum, orphan);
- if (IS_ERR(xattr_orphan)) {
- kfree(pxent);
- kfree(xent);
- return PTR_ERR(xattr_orphan);
- }
-
- kfree(pxent);
- pxent = xent;
- key_read(c, &xent->key, &key);
- }
- kfree(pxent);
-
- return 0;
-}
-
/**
* ubifs_delete_orphan - delete an orphan.
* @c: UBIFS file-system description object
@@ -206,7 +154,7 @@ int ubifs_add_orphan(struct ubifs_info *c, ino_t inum)
*/
void ubifs_delete_orphan(struct ubifs_info *c, ino_t inum)
{
- struct ubifs_orphan *orph, *child_orph, *tmp_o;
+ struct ubifs_orphan *orph;

spin_lock(&c->orphan_lock);

@@ -219,11 +167,6 @@ void ubifs_delete_orphan(struct ubifs_info *c, ino_t inum)
return;
}

- list_for_each_entry_safe(child_orph, tmp_o, &orph->child_list, child_list) {
- list_del(&child_orph->child_list);
- orphan_delete(c, child_orph);
- }
-
orphan_delete(c, orph);

spin_unlock(&c->orphan_lock);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 7a7c62104f2b..3651a87e64b2 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -923,8 +923,6 @@ struct ubifs_budget_req {
* @rb: rb-tree node of rb-tree of orphans sorted by inode number
* @list: list head of list of orphans in order added
* @new_list: list head of list of orphans added since the last commit
- * @child_list: list of xattr children if this orphan hosts xattrs, list head
- * if this orphan is a xattr, not used otherwise.
* @cnext: next orphan to commit
* @dnext: next orphan to delete
* @inum: inode number
@@ -936,7 +934,6 @@ struct ubifs_orphan {
struct rb_node rb;
struct list_head list;
struct list_head new_list;
- struct list_head child_list;
struct ubifs_orphan *cnext;
struct ubifs_orphan *dnext;
ino_t inum;
--
2.39.2


2024-04-10 07:52:33

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 4/9] ubifs: Remove insert_dead_orphan from replaying orphan process

UBIFS will do commit at the end of mounting process(rw mode), dead
orphans(added by insert_dead_orphan in replaying orphan) are deleted
by ubifs_orphan_end_commit(). The only reason why dead orphans are
added into orphan list is that old orpans may be lost when powercut
happens in ubifs_orphan_end_commit():
ubifs_orphan_end_commit // TNC(updated by orphans) is not written yet
if (c->cmt_orphans != 0)
commit_orphans
consolidate // traverse orphan list
write_orph_nodes // rewrite all orphans by ubifs_leb_change
// If dead orphans are not in list, they will be lost when powercut
// happens, then TNC won't be updated by old orphans in next mounting.
Luckily, the condition 'c->cmt_orphans != 0' will never be true in
mounting process, there can't be new orphans added into orphan list
before mounting returned, but commit will be done at the end of mounting.

Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/orphan.c | 49 -----------------------------------------------
1 file changed, 49 deletions(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index 88fbf331ad8c..6e843e8fc3db 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -513,51 +513,6 @@ int ubifs_clear_orphans(struct ubifs_info *c)
return 0;
}

-/**
- * insert_dead_orphan - insert an orphan.
- * @c: UBIFS file-system description object
- * @inum: orphan inode number
- *
- * This function is a helper to the 'do_kill_orphans()' function. The orphan
- * must be kept until the next commit, so it is added to the rb-tree and the
- * deletion list.
- */
-static int insert_dead_orphan(struct ubifs_info *c, ino_t inum)
-{
- struct ubifs_orphan *orphan, *o;
- struct rb_node **p, *parent = NULL;
-
- orphan = kzalloc(sizeof(struct ubifs_orphan), GFP_KERNEL);
- if (!orphan)
- return -ENOMEM;
- orphan->inum = inum;
-
- p = &c->orph_tree.rb_node;
- while (*p) {
- parent = *p;
- o = rb_entry(parent, struct ubifs_orphan, rb);
- if (inum < o->inum)
- p = &(*p)->rb_left;
- else if (inum > o->inum)
- p = &(*p)->rb_right;
- else {
- /* Already added - no problem */
- kfree(orphan);
- return 0;
- }
- }
- c->tot_orphans += 1;
- rb_link_node(&orphan->rb, parent, p);
- rb_insert_color(&orphan->rb, &c->orph_tree);
- list_add_tail(&orphan->list, &c->orph_list);
- orphan->del = 1;
- orphan->dnext = c->orph_dnext;
- c->orph_dnext = orphan;
- dbg_mnt("ino %lu, new %d, tot %d", (unsigned long)inum,
- c->new_orphans, c->tot_orphans);
- return 0;
-}
-
/**
* do_kill_orphans - remove orphan inodes from the index.
* @c: UBIFS file-system description object
@@ -655,10 +610,6 @@ static int do_kill_orphans(struct ubifs_info *c, struct ubifs_scan_leb *sleb,
if (err)
goto out_ro;
}
-
- err = insert_dead_orphan(c, inum);
- if (err)
- goto out_free;
}

*last_cmt_no = cmt_no;
--
2.39.2


2024-04-10 07:52:57

by Zhihao Cheng

[permalink] [raw]
Subject: [PATCH 5/9] ubifs: Fix adding orphan entry twice for the same inode

The tmpfile could be added into orphan list twice, first time is
creation, the second time is removing after it is linked. The orphan
entry could be added twice for tmpfile if following sequence is
satisfied:

ubifs_tmpfile
ubifs_jnl_update
ubifs_add_orphan // first time to add orphan entry

P1 P2
ubifs_link do_commit
ubifs_orphan_start_commit
orphan->cmt = 1
ubifs_delete_orphan
orphan_delete
if (orph->cmt)
orph->del = 1; // orphan entry is not deleted from tree
return
ubifs_unlink
ubifs_jnl_update
ubifs_add_orphan
orphan_add // found old orphan entry, second time to add orphan entry
ubifs_err(c, "orphaned twice")
return -EINVAL // unlink failed!
ubifs_orphan_end_commit
erase_deleted // delete old orphan entry
rb_erase(&orphan->rb, &c->orph_tree)

Fix it by removing orphan entry from orphan tree in advance, rather than
remove it from orphan tree in committing process.

Fixes: 32fe905c17f0 ("ubifs: Fix O_TMPFILE corner case in ubifs_link()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218672
Signed-off-by: Zhihao Cheng <[email protected]>
---
fs/ubifs/orphan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ubifs/orphan.c b/fs/ubifs/orphan.c
index 6e843e8fc3db..37d206097112 100644
--- a/fs/ubifs/orphan.c
+++ b/fs/ubifs/orphan.c
@@ -136,6 +136,7 @@ static void orphan_delete(struct ubifs_info *c, struct ubifs_orphan *orph)

if (orph->cmt) {
orph->del = 1;
+ rb_erase(&orph->rb, &c->orph_tree);
orph->dnext = c->orph_dnext;
c->orph_dnext = orph;
dbg_gen("delete later ino %lu", (unsigned long)orph->inum);
@@ -461,7 +462,6 @@ static void erase_deleted(struct ubifs_info *c)
dnext = orphan->dnext;
ubifs_assert(c, !orphan->new);
ubifs_assert(c, orphan->del);
- rb_erase(&orphan->rb, &c->orph_tree);
list_del(&orphan->list);
c->tot_orphans -= 1;
dbg_gen("deleting orphan ino %lu", (unsigned long)orphan->inum);
--
2.39.2