From: Li Zefan Subject: [RFC][PATCH] vfs: always protect diretory file->fpos with inode mutex Date: Tue, 19 Feb 2013 09:22:40 +0800 Message-ID: <5122D3E0.6070800@huawei.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------090108010009060609020602" Cc: LKML , Ext4 Developers List , Jan Kara , "Theodore Ts'o" , Andrew Morton , , Wuqixuan , Al Viro , To: Return-path: Received: from szxga01-in.huawei.com ([119.145.14.64]:28924 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757911Ab3BSBXp (ORCPT ); Mon, 18 Feb 2013 20:23:45 -0500 Sender: linux-ext4-owner@vger.kernel.org List-ID: --------------090108010009060609020602 Content-Type: text/plain; charset="GB2312" Content-Transfer-Encoding: 7bit There's a long long-standing bug...As long as I don't know when it dates from. I've written and attached a simple program to reproduce this bug, and it can immediately trigger the bug in my box. It uses two threads, one keeps calling read(), and the other calling readdir(), both on the same directory fd. When I ran it on ext3 (can be replaced with ext2/ext4) which has _dir_index_ feature disabled, I got this: EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0 EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0 EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=993, inode=0, rec_len=0, name_len=0 EXT3-fs error (device loop1): ext3_readdir: bad entry in directory #34817: rec_len is smaller than minimal - offset=1009, inode=0, rec_len=0, name_len=0 ... If we configured errors=remount-ro, the filesystem will become read-only. SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count) { ... loff_t pos = file_pos_read(file); ret = vfs_read(file, buf, count, &pos); file_pos_write(file, pos); fput_light(file, fput_needed); ... } While readdir() is protected with i_mutex, f_pos can be changed without any locking in various read()/write() syscalls, which leads to this bug. What makes things worse is Andi removed i_mutex from generic_file_llseek, so you can trigger the same bug by replacing read() with lseek() in the test program. commit ef3d0fd27e90f67e35da516dafc1482c82939a60 Author: Andi Kleen Date: Thu Sep 15 16:06:48 2011 -0700 vfs: do (nearly) lockless generic_file_llseek I've tested ext3 with dir_index enabled and btrfs, nothing bad happened, but there should be some other vulnerabilities. For example, running the test program on /sys for a few minutes triggered this warning: [ 917.994600] ------------[ cut here ]------------ [ 917.994614] WARNING: at fs/sysfs/sysfs.h:195 sysfs_readdir+0x24c/0x260() [ 917.994621] Hardware name: Tecal RH2285 ... [ 917.994725] Pid: 8754, comm: a.out Not tainted 3.8.0-rc2-tj-0.7-default+ #69 [ 917.994731] Call Trace: [ 917.994736] [] ? sysfs_readdir+0x24c/0x260 [ 917.994743] [] ? sysfs_readdir+0x24c/0x260 [ 917.994752] [] warn_slowpath_common+0x7f/0xc0 [ 917.994759] [] warn_slowpath_null+0x1a/0x20 [ 917.994766] [] sysfs_readdir+0x24c/0x260 [ 917.994774] [] ? sys_ioctl+0x90/0x90 [ 917.994780] [] ? sys_ioctl+0x90/0x90 [ 917.994787] [] vfs_readdir+0xb1/0xd0 [ 917.994794] [] sys_getdents64+0x9b/0x110 [ 917.994803] [] system_call_fastpath+0x16/0x1b [ 917.994809] ---[ end trace 6efe15a65b89022a ]--- [ 917.994816] ida_remove called for id=13073 which is not allocated. We can fix this bug in each filesystem, but can't we just make sure i_mutex is acquired in lseek(), read(), write() and readdir() for directory file operations? (the patch is for demonstration only) diff --git a/fs/read_write.c b/fs/read_write.c index bb34af3..41f76e5 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -218,14 +218,25 @@ EXPORT_SYMBOL(default_llseek); loff_t vfs_llseek(struct file *file, loff_t offset, int whence) { + struct inode *inode = file->f_path.dentry->d_inode; loff_t (*fn)(struct file *, loff_t, int); + int ret; fn = no_llseek; if (file->f_mode & FMODE_LSEEK) { if (file->f_op && file->f_op->llseek) fn = file->f_op->llseek; } - return fn(file, offset, whence); + + if (S_ISDIR(inode->i_mode)) { + mutex_lock(&inode->i_mutex); + ret = fn(file, offset, whence); + mutex_unlock(&inode->i_mutex); + } else { + ret = fn(file, offset, whence); + } + + return ret; } EXPORT_SYMBOL(vfs_llseek); @@ -442,12 +453,32 @@ EXPORT_SYMBOL(vfs_write); static inline loff_t file_pos_read(struct file *file) { - return file->f_pos; + struct inode *inode = file->f_path.dentry->d_inode; + loff_t pos; + + if (S_ISDIR(inode->i_mode)) { + mutex_lock(&inode->i_mutex); + pos = file->f_pos; + mutex_unlock(&inode->i_mutex); + } else { + pos = file->f_pos; + } + + return pos; } static inline void file_pos_write(struct file *file, loff_t pos) { - file->f_pos = pos; + struct inode *inode = file->f_path.dentry->d_inode; + + if (S_ISDIR(inode->i_mode)) { + mutex_lock(&inode->i_mutex); + file->f_pos = pos; + file->f_version = 0; + mutex_unlock(&inode->i_mutex); + } else { + file->f_pos = pos; + } } SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count) --------------090108010009060609020602 Content-Type: text/plain; charset="gb18030"; name="test.c" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="test.c" I2luY2x1ZGUgPHVuaXN0ZC5oPg0KI2luY2x1ZGUgPHN0ZGlvLmg+DQojaW5jbHVkZSA8c3Rk bGliLmg+DQojaW5jbHVkZSA8c3RyaW5nLmg+DQojaW5jbHVkZSA8ZGlyZW50Lmg+DQojaW5j bHVkZSA8ZmNudGwuaD4NCiNpbmNsdWRlIDxzeXMvc3RhdC5oPg0KI2luY2x1ZGUgPHN5cy90 eXBlcy5oPg0KDQppbnQgbWFpbihpbnQgYXJnYywgY2hhciAqYXJndltdKQ0Kew0KCWludCBm ZDsNCglpbnQgcmV0Ow0KCURJUiAqZGlyOw0KCXN0cnVjdCBkaXJlbnQgKnB0cjsNCg0KCWlm IChhcmdjICE9IDIpDQoJCWVycngoMSwgIlBsZWFzZSBzcGVjaWZ5IGEgZGlyZWN0b3J5Iik7 DQoNCglkaXIgPSBvcGVuZGlyKGFyZ3ZbMV0pOw0KCWlmICghZGlyKQ0KCQllcnIoMSwgIkZh aWxlZCB0byBvcGVuIGRpcmVjdG9yeSAlcyIsIGFyZ3ZbMV0pOw0KDQoJZmQgPSBkaXJmZChk aXIpOw0KCWlmIChmZCA8IDApDQoJCWVycigxLCAiRmFpbGVkIHRvIGdldCBkaXJmZCIpOw0K DQoJcmV0ID0gZm9yaygpOw0KCWlmIChyZXQgPT0gMCkgew0KCQljaGFyIGJ1ZlsxMDBdOw0K DQoJCXdoaWxlICgxKQ0KCQkJcmVhZChmZCwgYnVmLCAxMDApOw0KCX0gZWxzZSB7DQoJCWlu dCByZXQyOw0KDQoJCXdoaWxlICgxKSB7DQoJCQlyZXQyID0gbHNlZWsoZmQsIDAsIFNFRUtf U0VUKTsNCgkJCWlmIChyZXQyIDwgLTEpDQoJCQkJZXJyKDEsICJzZWVrIGZhaWxlZCIpOw0K DQoJCQl3aGlsZSAocHRyID0gcmVhZGRpcihkaXIpKQ0KCQkJCTsNCgkJfQ0KCX0JDQoNCglj bG9zZWRpcihkaXIpOw0KCXJldHVybiAwOw0KfQ0K --------------090108010009060609020602--