2015-07-24 12:36:25

by Stephen Smalley

Subject: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

The shm implementation internally uses shmem or hugetlbfs inodes
for shm segments. As these inodes are never directly exposed to
userspace and only accessed through the shm operations which are
already hooked by security modules, mark the inodes with the
S_PRIVATE flag so that inode security initialization and permission
checking is skipped.
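A minimal userspace model of the short-circuit this enables (the struct, helper names, and return values below are illustrative stand-ins, not the real kernel definitions; only the bypass pattern is taken from the patch's rationale):

```c
#include <assert.h>
#include <stdbool.h>

/* S_PRIVATE marks an inode as fs-internal; 512 matches the value in
 * include/linux/fs.h of this era, but treat it as illustrative here. */
#define S_PRIVATE 512

struct inode {
	unsigned int i_flags;
};

static bool is_private(const struct inode *inode)
{
	return (inode->i_flags & S_PRIVATE) != 0;
}

/* Models the pattern in security_inode_permission(): private
 * (fs-internal) inodes skip the LSM hook entirely and report success;
 * everything else would run the hook (represented by a fixed
 * "denied" result in this sketch). */
static int model_inode_permission(const struct inode *inode)
{
	if (is_private(inode))
		return 0;	/* no LSM check for fs-internal inodes */
	return -13;		/* -EACCES stand-in: hook would run here */
}
```

With S_PRIVATE set on the shm segment's backing inode, the inode security initialization that triggered the trace below is never reached.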

This was motivated by the following lockdep warning:
Jul 22 14:36:40 fc23 kernel:
======================================================
Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking
dependency detected ]
Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
Tainted: G W
Jul 22 14:36:40 fc23 kernel:
-------------------------------------------------------
Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+++++.}, at:
[<ffffffff81385354>] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++++++}, at:
[<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in
reverse order) is:
Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++++++}:
Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [<ffffffff81217baa>] __might_fault+0x7a/0xa0
Jul 22 14:36:40 fc23 kernel: [<ffffffff81284a1e>] filldir+0x9e/0x130
Jul 22 14:36:40 fc23 kernel: [<ffffffffa019bb08>]
xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffffa019c5b4>]
xfs_readdir+0x1b4/0x330 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffffa019f38b>]
xfs_file_readdir+0x2b/0x30 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffff812847e7>] iterate_dir+0x97/0x130
Jul 22 14:36:40 fc23 kernel: [<ffffffff81284d21>] SyS_getdents+0x91/0x120
Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>]
entry_SYSCALL_64_fastpath+0x12/0x76
Jul 22 14:36:40 fc23 kernel: #012-> #2 (&xfs_dir_ilock_class){++++.+}:
Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [<ffffffff81101e97>]
down_read_nested+0x57/0xa0
Jul 22 14:36:40 fc23 kernel: [<ffffffffa01b0e57>]
xfs_ilock+0x167/0x350 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffffa01b10b8>]
xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffffa014799d>]
xfs_attr_get+0xbd/0x190 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffffa01c17ad>]
xfs_xattr_get+0x3d/0x70 [xfs]
Jul 22 14:36:40 fc23 kernel: [<ffffffff8129962f>]
generic_getxattr+0x4f/0x70
Jul 22 14:36:40 fc23 kernel: [<ffffffff8139ba52>]
inode_doinit_with_dentry+0x162/0x670
Jul 22 14:36:40 fc23 kernel: [<ffffffff8139cf69>]
sb_finish_set_opts+0xd9/0x230
Jul 22 14:36:40 fc23 kernel: [<ffffffff8139d66c>]
selinux_set_mnt_opts+0x35c/0x660
Jul 22 14:36:40 fc23 kernel: [<ffffffff8139ff97>]
superblock_doinit+0x77/0xf0
Jul 22 14:36:40 fc23 kernel: [<ffffffff813a0020>]
delayed_superblock_init+0x10/0x20
Jul 22 14:36:40 fc23 kernel: [<ffffffff81272d23>]
iterate_supers+0xb3/0x110
Jul 22 14:36:40 fc23 kernel: [<ffffffff813a4e5f>]
selinux_complete_init+0x2f/0x40
Jul 22 14:36:40 fc23 kernel: [<ffffffff813b47a3>]
security_load_policy+0x103/0x600
Jul 22 14:36:40 fc23 kernel: [<ffffffff813a6901>]
sel_write_load+0xc1/0x750
Jul 22 14:36:40 fc23 kernel: [<ffffffff8126e817>] __vfs_write+0x37/0x100
Jul 22 14:36:40 fc23 kernel: [<ffffffff8126f229>] vfs_write+0xa9/0x1a0
Jul 22 14:36:40 fc23 kernel: [<ffffffff8126ff48>] SyS_write+0x58/0xd0
Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>]
entry_SYSCALL_64_fastpath+0x12/0x76
Jul 22 14:36:40 fc23 kernel: #012-> #1 (&isec->lock){+.+.+.}:
Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [<ffffffff8186de8f>]
mutex_lock_nested+0x7f/0x3e0
Jul 22 14:36:40 fc23 kernel: [<ffffffff8139b9a9>]
inode_doinit_with_dentry+0xb9/0x670
Jul 22 14:36:40 fc23 kernel: [<ffffffff8139bf7c>]
selinux_d_instantiate+0x1c/0x20
Jul 22 14:36:40 fc23 kernel: [<ffffffff813955f6>]
security_d_instantiate+0x36/0x60
Jul 22 14:36:40 fc23 kernel: [<ffffffff81287c34>] d_instantiate+0x54/0x70
Jul 22 14:36:40 fc23 kernel: [<ffffffff8120111c>]
__shmem_file_setup+0xdc/0x240
Jul 22 14:36:40 fc23 kernel: [<ffffffff81201290>]
shmem_file_setup+0x10/0x20
Jul 22 14:36:40 fc23 kernel: [<ffffffff813856e0>] newseg+0x290/0x3a0
Jul 22 14:36:40 fc23 kernel: [<ffffffff8137e278>] ipcget+0x208/0x2d0
Jul 22 14:36:40 fc23 kernel: [<ffffffff81386074>] SyS_shmget+0x54/0x70
Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>]
entry_SYSCALL_64_fastpath+0x12/0x76
Jul 22 14:36:40 fc23 kernel: #012-> #0 (&ids->rwsem){+++++.}:
Jul 22 14:36:40 fc23 kernel: [<ffffffff81108df8>]
__lock_acquire+0x1a78/0x1d00
Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [<ffffffff8186efba>] down_write+0x5a/0xc0
Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [<ffffffff812203a5>] remove_vma+0x45/0x80
Jul 22 14:36:40 fc23 kernel: [<ffffffff81222a30>] do_munmap+0x2b0/0x460
Jul 22 14:36:40 fc23 kernel: [<ffffffff81386c25>] SyS_shmdt+0xb5/0x180
Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>]
entry_SYSCALL_64_fastpath+0x12/0x76
Jul 22 14:36:40 fc23 kernel: #012other info that might help us debug this:
Jul 22 14:36:40 fc23 kernel: Chain exists of:#012 &ids->rwsem -->
&xfs_dir_ilock_class --> &mm->mmap_sem
Jul 22 14:36:40 fc23 kernel: Possible unsafe locking scenario:
Jul 22 14:36:40 fc23 kernel: CPU0 CPU1
Jul 22 14:36:40 fc23 kernel: ---- ----
Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel:
lock(&xfs_dir_ilock_class);
Jul 22 14:36:40 fc23 kernel: lock(&mm->mmap_sem);
Jul 22 14:36:40 fc23 kernel: lock(&ids->rwsem);
Jul 22 14:36:40 fc23 kernel: #012 *** DEADLOCK ***
Jul 22 14:36:40 fc23 kernel: 1 lock held by httpd/1597:
Jul 22 14:36:40 fc23 kernel: #0: (&mm->mmap_sem){++++++}, at:
[<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: #012stack backtrace:
Jul 22 14:36:40 fc23 kernel: CPU: 7 PID: 1597 Comm: httpd Tainted: G
W 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
Jul 22 14:36:40 fc23 kernel: Hardware name: VMware, Inc. VMware
Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
05/20/2014
Jul 22 14:36:40 fc23 kernel: 0000000000000000 000000006cb6fe9d
ffff88019ff07c58 ffffffff81868175
Jul 22 14:36:40 fc23 kernel: 0000000000000000 ffffffff82aea390
ffff88019ff07ca8 ffffffff81105903
Jul 22 14:36:40 fc23 kernel: ffff88019ff07c78 ffff88019ff07d08
0000000000000001 ffff8800b75108f0
Jul 22 14:36:40 fc23 kernel: Call Trace:
Jul 22 14:36:40 fc23 kernel: [<ffffffff81868175>] dump_stack+0x4c/0x65
Jul 22 14:36:40 fc23 kernel: [<ffffffff81105903>] print_circular_bug+0x1e3/0x250
Jul 22 14:36:40 fc23 kernel: [<ffffffff81108df8>] __lock_acquire+0x1a78/0x1d00
Jul 22 14:36:40 fc23 kernel: [<ffffffff81220c33>] ? unlink_file_vma+0x33/0x60
Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270
Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [<ffffffff8186efba>] down_write+0x5a/0xc0
Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] ? shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [<ffffffff81385354>] shm_close+0x34/0x130
Jul 22 14:36:40 fc23 kernel: [<ffffffff812203a5>] remove_vma+0x45/0x80
Jul 22 14:36:40 fc23 kernel: [<ffffffff81222a30>] do_munmap+0x2b0/0x460
Jul 22 14:36:40 fc23 kernel: [<ffffffff81386bbb>] ? SyS_shmdt+0x4b/0x180
Jul 22 14:36:40 fc23 kernel: [<ffffffff81386c25>] SyS_shmdt+0xb5/0x180
Jul 22 14:36:40 fc23 kernel: [<ffffffff81871d2e>]
entry_SYSCALL_64_fastpath+0x12/0x76

Reported-by: Morten Stevens <[email protected]>
Signed-off-by: Stephen Smalley <[email protected]>
---
This version only differs in the patch description, which restores
the original lockdep trace from Morten Stevens. It was unfortunately
mangled in the prior version.

 fs/hugetlbfs/inode.c | 2 ++
 ipc/shm.c            | 2 +-
 mm/shmem.c           | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 0cf74df..973c24c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
 	inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
 	if (!inode)
 		goto out_dentry;
+	if (creat_flags == HUGETLB_SHMFS_INODE)
+		inode->i_flags |= S_PRIVATE;
 
 	file = ERR_PTR(-ENOMEM);
 	if (hugetlb_reserve_pages(inode, 0,
diff --git a/ipc/shm.c b/ipc/shm.c
index 06e5cf2..4aef24d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 		if ((shmflg & SHM_NORESERVE) &&
 				sysctl_overcommit_memory != OVERCOMMIT_NEVER)
 			acctflag = VM_NORESERVE;
-		file = shmem_file_setup(name, size, acctflag);
+		file = shmem_kernel_file_setup(name, size, acctflag);
 	}
 	error = PTR_ERR(file);
 	if (IS_ERR(file))
diff --git a/mm/shmem.c b/mm/shmem.c
index 4caf8ed..dbe0c1e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3363,8 +3363,8 @@ put_path:
  * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
  * kernel internal. There will be NO LSM permission checks against the
  * underlying inode. So users of this interface must do LSM checks at a
- * higher layer. The one user is the big_key implementation. LSM checks
- * are provided at the key level rather than the inode level.
+ * higher layer. The users are the big_key and shm implementations. LSM
+ * checks are provided at the key or shm level rather than the inode.
  * @name: name for dentry (to be seen in /proc/<pid>/maps
  * @size: size to be set for the file
  * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size
--
2.1.0


2015-07-24 22:11:19

by Paul Moore

Subject: Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

On Fri, Jul 24, 2015 at 8:34 AM, Stephen Smalley <[email protected]> wrote:
> The shm implementation internally uses shmem or hugetlbfs inodes
> for shm segments. As these inodes are never directly exposed to
> userspace and only accessed through the shm operations which are
> already hooked by security modules, mark the inodes with the
> S_PRIVATE flag so that inode security initialization and permission
> checking is skipped.
>
> This was motivated by the following lockdep warning:
> [lockdep trace snipped]
>
> Reported-by: Morten Stevens <[email protected]>
> Signed-off-by: Stephen Smalley <[email protected]>
> ---
> This version only differs in the patch description, which restores
> the original lockdep trace from Morten Stevens. It was unfortunately
> mangled in the prior version.
>
> fs/hugetlbfs/inode.c | 2 ++
> ipc/shm.c | 2 +-
> mm/shmem.c | 4 ++--
> 3 files changed, 5 insertions(+), 3 deletions(-)

The stuff below looked reasonable to me yesterday, still looks reasonable today.

Acked-by: Paul Moore <[email protected]>

> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 0cf74df..973c24c 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
> inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
> if (!inode)
> goto out_dentry;
> + if (creat_flags == HUGETLB_SHMFS_INODE)
> + inode->i_flags |= S_PRIVATE;
>
> file = ERR_PTR(-ENOMEM);
> if (hugetlb_reserve_pages(inode, 0,
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 06e5cf2..4aef24d 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> if ((shmflg & SHM_NORESERVE) &&
> sysctl_overcommit_memory != OVERCOMMIT_NEVER)
> acctflag = VM_NORESERVE;
> - file = shmem_file_setup(name, size, acctflag);
> + file = shmem_kernel_file_setup(name, size, acctflag);
> }
> error = PTR_ERR(file);
> if (IS_ERR(file))
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4caf8ed..dbe0c1e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3363,8 +3363,8 @@ put_path:
> * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
> * kernel internal. There will be NO LSM permission checks against the
> * underlying inode. So users of this interface must do LSM checks at a
> - * higher layer. The one user is the big_key implementation. LSM checks
> - * are provided at the key level rather than the inode level.
> + * higher layer. The users are the big_key and shm implementations. LSM
> + * checks are provided at the key or shm level rather than the inode.
> * @name: name for dentry (to be seen in /proc/<pid>/maps
> * @size: size to be set for the file
> * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size
> --
> 2.1.0
>



--
paul moore
http://www.paul-moore.com

2015-07-27 19:33:51

by Hugh Dickins

Subject: Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

On Fri, 24 Jul 2015, Stephen Smalley wrote:

> The shm implementation internally uses shmem or hugetlbfs inodes
> for shm segments. As these inodes are never directly exposed to
> userspace and only accessed through the shm operations which are
> already hooked by security modules, mark the inodes with the
> S_PRIVATE flag so that inode security initialization and permission
> checking is skipped.
>
> This was motivated by the following lockdep warning:
> [lockdep trace snipped]
>
> Reported-by: Morten Stevens <[email protected]>
> Signed-off-by: Stephen Smalley <[email protected]>

Acked-by: Hugh Dickins <[email protected]>
but with one reservation below...

> ---
> This version only differs in the patch description, which restores
> the original lockdep trace from Morten Stevens. It was unfortunately
> mangled in the prior version.
>
> fs/hugetlbfs/inode.c | 2 ++
> ipc/shm.c | 2 +-
> mm/shmem.c | 4 ++--
> 3 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 0cf74df..973c24c 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
> inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
> if (!inode)
> goto out_dentry;
> + if (creat_flags == HUGETLB_SHMFS_INODE)
> + inode->i_flags |= S_PRIVATE;

I wonder if you would do better just to set S_PRIVATE unconditionally
there.

hugetlb_file_setup() has two callsites, neither of which exposes an fd.
One of them is shm.c's newseg(), which is getting us into the lockdep
trouble that you're fixing here.

The other is mmap.c's mmap_pgoff(). Now I don't think that will ever
get into lockdep trouble (no mutex or rwsem has been taken at that
point), but might your change above introduce (perhaps now or perhaps
in future) an inconsistency between how SElinux checks are applied to
a SHM area, and how they are applied to a MAP_ANONYMOUS|MAP_HUGETLB
area, and how they are applied to a straight MAP_ANONYMOUS area?

I think your patch as it stands brings SHM into line with
MAP_ANONYMOUS, but leaves MAP_ANONYMOUS|MAP_HUGETLB going the old way.
Perhaps an anomaly would appear when mprotect() is used?

It's up to you: I think your patch is okay as is,
but I just wonder if it has a surprise in store for the future.
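The unconditional variant Hugh describes would look roughly like this, expressed as a sketch relative to the hunk quoted above (untested, context abbreviated):

```diff
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ hugetlb_file_setup()
 	inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
 	if (!inode)
 		goto out_dentry;
-	if (creat_flags == HUGETLB_SHMFS_INODE)
-		inode->i_flags |= S_PRIVATE;
+	inode->i_flags |= S_PRIVATE;
```

This would cover both callsites (newseg() and mmap_pgoff()), keeping SHM and MAP_ANONYMOUS|MAP_HUGETLB mappings consistent, at the cost of also skipping inode-level LSM checks for the mmap path.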

Hugh

>
> file = ERR_PTR(-ENOMEM);
> if (hugetlb_reserve_pages(inode, 0,
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 06e5cf2..4aef24d 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -545,7 +545,7 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
> if ((shmflg & SHM_NORESERVE) &&
> sysctl_overcommit_memory != OVERCOMMIT_NEVER)
> acctflag = VM_NORESERVE;
> - file = shmem_file_setup(name, size, acctflag);
> + file = shmem_kernel_file_setup(name, size, acctflag);
> }
> error = PTR_ERR(file);
> if (IS_ERR(file))
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4caf8ed..dbe0c1e 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3363,8 +3363,8 @@ put_path:
> * shmem_kernel_file_setup - get an unlinked file living in tmpfs which must be
> * kernel internal. There will be NO LSM permission checks against the
> * underlying inode. So users of this interface must do LSM checks at a
> - * higher layer. The one user is the big_key implementation. LSM checks
> - * are provided at the key level rather than the inode level.
> + * higher layer. The users are the big_key and shm implementations. LSM
> + * checks are provided at the key or shm level rather than the inode.
> * @name: name for dentry (to be seen in /proc/<pid>/maps
> * @size: size to be set for the file
> * @flags: VM_NORESERVE suppresses pre-accounting of the entire object size
> --
> 2.1.0
>
>

2015-07-27 21:08:25

by Stephen Smalley


Subject: Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

On 07/27/2015 03:32 PM, Hugh Dickins wrote:
> On Fri, 24 Jul 2015, Stephen Smalley wrote:
>
>> The shm implementation internally uses shmem or hugetlbfs inodes
>> for shm segments. As these inodes are never directly exposed to
>> userspace and only accessed through the shm operations which are
>> already hooked by security modules, mark the inodes with the
>> S_PRIVATE flag so that inode security initialization and permission
>> checking is skipped.
>>
>> This was motivated by the following lockdep warning:
>> [ lockdep trace snipped ]
>>
>> Reported-by: Morten Stevens <[email protected]>
>> Signed-off-by: Stephen Smalley <[email protected]>
>
> Acked-by: Hugh Dickins <[email protected]>
> but with one reservation below...
>
>> ---
>> This version only differs in the patch description, which restores
>> the original lockdep trace from Morten Stevens. It was unfortunately
>> mangled in the prior version.
>>
>> fs/hugetlbfs/inode.c | 2 ++
>> ipc/shm.c | 2 +-
>> mm/shmem.c | 4 ++--
>> 3 files changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>> index 0cf74df..973c24c 100644
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
>> inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
>> if (!inode)
>> goto out_dentry;
>> + if (creat_flags == HUGETLB_SHMFS_INODE)
>> + inode->i_flags |= S_PRIVATE;
>
> I wonder if you would do better just to set S_PRIVATE unconditionally
> there.
>
> hugetlb_file_setup() has two callsites, neither of which exposes an fd.
> One of them is shm.c's newseg(), which is getting us into the lockdep
> trouble that you're fixing here.
>
> The other is mmap.c's mmap_pgoff(). Now I don't think that will ever
> get into lockdep trouble (no mutex or rwsem has been taken at that
> point), but might your change above introduce (perhaps now or perhaps
> in future) an inconsistency between how SElinux checks are applied to
> a SHM area, and how they are applied to a MAP_ANONYMOUS|MAP_HUGETLB
> area, and how they are applied to a straight MAP_ANONYMOUS area?
>
> I think your patch as it stands brings SHM into line with
> MAP_ANONYMOUS, but leaves MAP_ANONYMOUS|MAP_HUGETLB going the old way.
> Perhaps an anomaly would appear when mprotect() is used?
>
> It's up to you: I think your patch is okay as is,
> but I just wonder if it has a surprise in store for the future.

That sounds reasonable, although there is the concern that
hugetlb_file_setup() might in the future be used for files that are
exposed as fds, unless we rename it to hugetlb_kernel_file_setup() or
similar to match shmem_kernel_file_setup(). Also, that should probably
be done as a separate change on top, since it isn't directly related to
ipc/shm or to fixing this lockdep report.

2015-07-27 21:13:50

by Hugh Dickins

Subject: Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

On Mon, 27 Jul 2015, Stephen Smalley wrote:
> On 07/27/2015 03:32 PM, Hugh Dickins wrote:
> > On Fri, 24 Jul 2015, Stephen Smalley wrote:
> >> --- a/fs/hugetlbfs/inode.c
> >> +++ b/fs/hugetlbfs/inode.c
> >> @@ -1010,6 +1010,8 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
> >> inode = hugetlbfs_get_inode(sb, NULL, S_IFREG | S_IRWXUGO, 0);
> >> if (!inode)
> >> goto out_dentry;
> >> + if (creat_flags == HUGETLB_SHMFS_INODE)
> >> + inode->i_flags |= S_PRIVATE;
> >
> > I wonder if you would do better just to set S_PRIVATE unconditionally
> > there.
> >
> > hugetlb_file_setup() has two callsites, neither of which exposes an fd.
> > One of them is shm.c's newseg(), which is getting us into the lockdep
> > trouble that you're fixing here.
> >
> > The other is mmap.c's mmap_pgoff(). Now I don't think that will ever
> > get into lockdep trouble (no mutex or rwsem has been taken at that
> > point), but might your change above introduce (perhaps now or perhaps
> > in future) an inconsistency between how SElinux checks are applied to
> > a SHM area, and how they are applied to a MAP_ANONYMOUS|MAP_HUGETLB
> > area, and how they are applied to a straight MAP_ANONYMOUS area?
> >
> > I think your patch as it stands brings SHM into line with
> > MAP_ANONYMOUS, but leaves MAP_ANONYMOUS|MAP_HUGETLB going the old way.
> > Perhaps an anomaly would appear when mprotect() is used?
> >
> > It's up to you: I think your patch is okay as is,
> > but I just wonder if it has a surprise in store for the future.
>
> That sounds reasonable, although there is the concern that
> hugetlb_file_setup() might be used in the future for files that are
> exposed as fds, unless we rename it to hugetlb_kernel_file_setup() or

Good idea.

> similar to match shmem_kernel_file_setup(). Also should probably be
> done as a separate change on top since it isn't directly related to
> ipc/shm or fixing this lockdep.

Fair enough.

2015-07-27 22:02:18

by Dave Chinner

Subject: Re: [PATCH v2] ipc: Use private shmem or hugetlbfs inodes for shm segments.

On Fri, Jul 24, 2015 at 08:34:35AM -0400, Stephen Smalley wrote:
> The shm implementation internally uses shmem or hugetlbfs inodes
> for shm segments. As these inodes are never directly exposed to
> userspace and only accessed through the shm operations which are
> already hooked by security modules, mark the inodes with the
> S_PRIVATE flag so that inode security initialization and permission
> checking is skipped.
>
> This was motivated by the following lockdep warning:
> Jul 22 14:36:40 fc23 kernel:
> ======================================================
> Jul 22 14:36:40 fc23 kernel: [ INFO: possible circular locking
> dependency detected ]
> Jul 22 14:36:40 fc23 kernel: 4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1
> Tainted: G W
> Jul 22 14:36:40 fc23 kernel:
> -------------------------------------------------------
> Jul 22 14:36:40 fc23 kernel: httpd/1597 is trying to acquire lock:
> Jul 22 14:36:40 fc23 kernel: (&ids->rwsem){+++++.}, at:
> [<ffffffff81385354>] shm_close+0x34/0x130
> Jul 22 14:36:40 fc23 kernel: #012but task is already holding lock:
> Jul 22 14:36:40 fc23 kernel: (&mm->mmap_sem){++++++}, at:
> [<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180
> Jul 22 14:36:40 fc23 kernel: #012which lock already depends on the new lock.
> Jul 22 14:36:40 fc23 kernel: #012the existing dependency chain (in
> reverse order) is:
> Jul 22 14:36:40 fc23 kernel: #012-> #3 (&mm->mmap_sem){++++++}:
> Jul 22 14:36:40 fc23 kernel: [<ffffffff81109a07>] lock_acquire+0xc7/0x270
> Jul 22 14:36:40 fc23 kernel: [<ffffffff81217baa>] __might_fault+0x7a/0xa0
> Jul 22 14:36:40 fc23 kernel: [<ffffffff81284a1e>] filldir+0x9e/0x130
> Jul 22 14:36:40 fc23 kernel: [<ffffffffa019bb08>]
> xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
> Jul 22 14:36:40 fc23 kernel: [<ffffffffa019c5b4>]
[....]

This was sent via git-send-email, which means that you've mangled
the line wrapping when you pasted the stack trace into the git
commit message. I strongly suggest that you trim the date/kernel
prefix from these traces, as it is unnecessary information and
makes them harder to read. i.e. the trace in the commit message
should look more like:

======================================================
[ INFO: possible circular locking dependency detected ]
4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W
-------------------------------------------------------
httpd/1597 is trying to acquire lock:
(&ids->rwsem){+++++.}, at: [<ffffffff81385354>] shm_close+0x34/0x130
#012but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff81386bbb>] SyS_shmdt+0x4b/0x180
#012which lock already depends on the new lock.
#012the existing dependency chain (in reverse order) is:
#012-> #3 (&mm->mmap_sem){++++++}:
[<ffffffff81109a07>] lock_acquire+0xc7/0x270
[<ffffffff81217baa>] __might_fault+0x7a/0xa0
[<ffffffff81284a1e>] filldir+0x9e/0x130
[<ffffffffa019bb08>] xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
[<ffffffffa019c5b4>] xfs_readdir+0x1b4/0x330 [xfs]
[<ffffffffa019f38b>] xfs_file_readdir+0x2b/0x30 [xfs]
[<ffffffff812847e7>] iterate_dir+0x97/0x130
[<ffffffff81284d21>] SyS_getdents+0x91/0x120
[<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76
#012-> #2 (&xfs_dir_ilock_class){++++.+}:
[<ffffffff81109a07>] lock_acquire+0xc7/0x270
[<ffffffff81101e97>] down_read_nested+0x57/0xa0
[<ffffffffa01b0e57>] xfs_ilock+0x167/0x350 [xfs]
[<ffffffffa01b10b8>] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
[<ffffffffa014799d>] xfs_attr_get+0xbd/0x190 [xfs]
[<ffffffffa01c17ad>] xfs_xattr_get+0x3d/0x70 [xfs]
[<ffffffff8129962f>] generic_getxattr+0x4f/0x70
[<ffffffff8139ba52>] inode_doinit_with_dentry+0x162/0x670
[<ffffffff8139cf69>] sb_finish_set_opts+0xd9/0x230
[<ffffffff8139d66c>] selinux_set_mnt_opts+0x35c/0x660
[<ffffffff8139ff97>] superblock_doinit+0x77/0xf0
[<ffffffff813a0020>] delayed_superblock_init+0x10/0x20
[<ffffffff81272d23>] iterate_supers+0xb3/0x110
[<ffffffff813a4e5f>] selinux_complete_init+0x2f/0x40
[<ffffffff813b47a3>] security_load_policy+0x103/0x600
[<ffffffff813a6901>] sel_write_load+0xc1/0x750
[<ffffffff8126e817>] __vfs_write+0x37/0x100
[<ffffffff8126f229>] vfs_write+0xa9/0x1a0
[<ffffffff8126ff48>] SyS_write+0x58/0xd0
[<ffffffff81871d2e>] entry_SYSCALL_64_fastpath+0x12/0x76
....

Cheers,

Dave.
--
Dave Chinner
[email protected]