2004-04-27 06:54:53

by Junfeng Yang

[permalink] [raw]
Subject: [CHECKER] Transcation is not fully aborted upon failure in JFS (jfs 2.4, kernel 2.4.19)

Hi,

We checked JFS filesystem on linux 2.4.19 recently and found 1 case that
looks like bugs.

When doing jfs_rename, if dtDelete fails, the transaction won't be fully
aborted (even if txAbort is called).

The symptoms are either we can read the new entry in the new dir by
getdents, but we can't actually open the file (always get -ENOENTs), or
after we do a sync, we see the new entry which is not supposed to be there
(since the transcation is aborted already). I couldn't figure out why
this is happening.

---------------------------------------------------------------------
[BUG] Transaction is not fully aborted even if txAbort is called when
dtDelete failed. dtDelete will return error if kmalloc fails at
'jfs_dtree.c:dtSearch:586 jfs_dtree.c:dtDelete:2066'

/*
* NAME: jfs_rename
*
* FUNCTION: rename a file or directory
*/
int jfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
...
/*
* Add new directory entry
*/
rc = dtSearch(new_dir, &new_dname, &ino, &btstack,
JFS_CREATE);
if (rc) {
jfs_err("jfs_rename didn't expect dtSearch to fail "
"w/rc = %d", rc);
goto out4;
}

ino = old_ip->i_ino;
rc = dtInsert(tid, new_dir, &new_dname, &ino, &btstack);
if (rc) {
jfs_err("jfs_rename: dtInsert failed w/rc = %d",
rc);
goto out4;
}

if (S_ISDIR(old_ip->i_mode))
new_dir->i_nlink++;
}
/*
* Remove old directory entry
*/

ino = old_ip->i_ino;
FAIL--> rc = dtDelete(tid, old_dir, &old_dname, &ino, JFS_REMOVE);
if (rc) {
jfs_err("jfs_rename did not expect dtDelete to return rc = %d",
rc);
ERROR--> txAbort(tid, 1); /* Marks Filesystem dirty */
goto out4;
}
...
}


2004-04-30 19:28:47

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [CHECKER] Transcation is not fully aborted upon failure in JFS (jfs 2.4, kernel 2.4.19)

On Tue, 2004-04-27 at 01:54, Junfeng Yang wrote:
> Hi,
>
> We checked JFS filesystem on linux 2.4.19 recently and found 1 case that
> looks like bugs.
>
> When doing jfs_rename, if dtDelete fails, the transaction won't be fully
> aborted (even if txAbort is called).

Looking back at an old version of the code, the original coder had the
kernel trap if dtDelete failed. It really would be a rare case for
dtDelete to fail here. I had changed the trap to the txAbort call to
make the unlikely failure a little more friendly. I recognized that
everything wouldn't be completely consistent, so I had txAbort mark the
superblock dirty, which will force a complete fsck run before the volume
can be mounted again.

I can probably improve on this and leave the system in better
condition. I'll treat this as a low priority bug.

Thanks for your work auditing the code and reporting the problem.
--
David Kleikamp
IBM Linux Technology Center

2004-04-30 22:23:51

by Junfeng Yang

[permalink] [raw]
Subject: [CHECKER] Return Error code gets treated as dir_table index, resulting losses of other dir entries (JFS2.4, kernel 2.4.19)


static function add_index can fail by return -EPERM (and it is declared to
return a unsigned 4-byte integer). This error gets ignored by the caller,
dtInsertEntry, which will treat the returned error code (u32)(-EPERM) as
an index to dir_table. This causes losses of directory entries in the
same parent directory.

static u32 add_index(tid_t tid, struct inode *ip, s64 bn, int slot)
{
...
if ((mp = get_index_page(ip, 0)) == 0) {
jfs_err("add_index: get_metapage failed!");
xtTruncate(tid, ip, 0, COMMIT_PWMAP);
Return --> return -EPERM;
...
}


static void dtInsertEntry(dtpage_t * p, int index, struct component_name * key,
ddata_t * data, struct dt_lock ** dtlock)
{
...
Error --> lh->index = cpu_to_le32(add_index(data->leaf.tid,
data->leaf.ip,
bn, index));
...
}


Here is a trace:

============ Filesystem Image Before System Call ===================
4 files, 1 dirs, 6 nodes
[0:D]
[1:F:1073741824]
[2:F:0]
[3:F:0]
[4:D]

create file 5 succeeded.
can't get dir entry Input/output error
ERROR: Filesystem images differ

============= Filesystem Image After System Call ===================
3 files, 0 dirs, 4 nodes
[0:D]
[1:F:1073741824]
[2:F:0]
[3:F:0]