From: Morten Stevens
To: Stephen Smalley
Cc: Stephen Smalley, Hugh Dickins, Prarit Bhargava, Morten Stevens,
    Eric Sandeen, Dave Chinner, Daniel Wagner, Linux Kernel, Eric Paris,
    linux-mm@kvack.org, selinux, Andrew Morton, Linus Torvalds
Subject: Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS
Date: Wed, 8 Jul 2015 22:37:01 +0200

2015-07-08 18:37 GMT+02:00 Stephen Smalley :
> On 07/08/2015 09:13 AM, Stephen Smalley wrote:
>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins wrote:
>>> It appears that, at some point last year, XFS made directory handling
>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>> but that has been so for many years.
>>>
>>> Since those few lockdep traces that I've seen all implicated selinux,
>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>
>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>> which cloned an inode in mmap(), but if so, I cannot locate them now.
>>
>> This causes a regression for SELinux (please, in the future, cc the
>> selinux list and Paul Moore on SELinux-related changes).  In
>> particular, this change disables SELinux checking of mprotect
>> PROT_EXEC on shared anonymous mappings, so we lose the ability to
>> control executable mappings.  That said, we are only getting that
>> check today as a side effect of our file execute check on the tmpfs
>> inode, whereas it would be better (and more consistent with the
>> mmap-time checks) to apply an execmem check in that case, in which
>> case we wouldn't care about the inode-based check.  However, I am
>> unclear on how to correctly detect that situation from
>> selinux_file_mprotect() -> file_map_prot_check(), because there we
>> have a non-NULL vma->vm_file and so treat it as a file execute check.
>> In contrast, when an anonymous shared mapping is created directly
>> with PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file() is
>> called with a NULL file, and therefore we end up applying an execmem
>> check.
>
> Also, can you provide the lockdep traces that motivated this change?
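Before the trace, a side note on the asymmetry Stephen describes above.
The sketch below is a simplified paraphrase of the v4.1-era
file_map_prot_check() routing in security/selinux/hooks.c, written from
memory and abbreviated; it is not the verbatim kernel source:

static int file_map_prot_check(struct file *file, unsigned long prot,
			       int shared)
{
	const struct cred *cred = current_cred();
	int rc = 0;

	if ((prot & PROT_EXEC) &&
	    (!file || (!shared && (prot & PROT_WRITE)))) {
		/* Fileless (anonymous) mapping made executable, or a
		 * writable private file mapping: process:execmem check. */
		rc = cred_has_perm(cred, cred, PROCESS__EXECMEM);
		if (rc)
			return rc;
	}

	if (file && (prot & PROT_EXEC)) {
		/* A backing file is present -- and at mprotect() time the
		 * kernel-internal shmem file behind a shared anonymous
		 * mapping counts -- so this is treated as a file:execute
		 * check against the inode's label.  Once that inode is
		 * marked S_PRIVATE, the check is skipped entirely. */
		rc = file_has_perm(cred, file, FILE__EXECUTE);
	}

	return rc;
}

Hence mmap(...PROT_EXEC...) with no file gets the execmem check, while a
later mprotect(PROT_EXEC) on the same mapping takes the file branch.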
Yes, here it is:

[ 28.177939] ======================================================
[ 28.177959] [ INFO: possible circular locking dependency detected ]
[ 28.177980] 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1 Tainted: G        W
[ 28.178002] -------------------------------------------------------
[ 28.178022] sshd/1764 is trying to acquire lock:
[ 28.178037]  (&isec->lock){+.+.+.}, at: [] inode_doinit_with_dentry+0xc5/0x6a0
[ 28.178078] but task is already holding lock:
[ 28.178097]  (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x8f/0xf0
[ 28.178131] which lock already depends on the new lock.
[ 28.178157] the existing dependency chain (in reverse order) is:
[ 28.178180] -> #2 (&mm->mmap_sem){++++++}:
[ 28.178201]        [] lock_acquire+0xc7/0x2a0
[ 28.178225]        [] might_fault+0x8c/0xb0
[ 28.178248]        [] filldir+0x9a/0x130
[ 28.178269]        [] xfs_dir2_block_getdents.isra.12+0x1a6/0x1d0 [xfs]
[ 28.178330]        [] xfs_readdir+0x1c4/0x360 [xfs]
[ 28.178368]        [] xfs_file_readdir+0x2b/0x30 [xfs]
[ 28.178404]        [] iterate_dir+0x9a/0x140
[ 28.178425]        [] SyS_getdents+0x91/0x120
[ 28.178447]        [] system_call_fastpath+0x12/0x76
[ 28.178471] -> #1 (&xfs_dir_ilock_class){++++.+}:
[ 28.178494]        [] lock_acquire+0xc7/0x2a0
[ 28.178515]        [] down_read_nested+0x57/0xa0
[ 28.178538]        [] xfs_ilock+0x171/0x390 [xfs]
[ 28.178579]        [] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
[ 28.178618]        [] xfs_attr_get+0xbd/0x1b0 [xfs]
[ 28.178651]        [] xfs_xattr_get+0x3d/0x80 [xfs]
[ 28.178688]        [] generic_getxattr+0x4f/0x70
[ 28.178711]        [] inode_doinit_with_dentry+0x172/0x6a0
[ 28.178737]        [] sb_finish_set_opts+0xdb/0x260
[ 28.178759]        [] selinux_set_mnt_opts+0x331/0x670
[ 28.178783]        [] superblock_doinit+0x77/0xf0
[ 28.178804]        [] delayed_superblock_init+0x10/0x20
[ 28.178849]        [] iterate_supers+0xba/0x120
[ 28.178872]        [] selinux_complete_init+0x33/0x40
[ 28.178897]        [] security_load_policy+0x103/0x640
[ 28.178920]        [] sel_write_load+0xb6/0x790
[ 28.179482]        [] __vfs_write+0x37/0x110
[ 28.180047]        [] vfs_write+0xa9/0x1c0
[ 28.180630]        [] SyS_write+0x5c/0xd0
[ 28.181168]        [] system_call_fastpath+0x12/0x76
[ 28.181740] -> #0 (&isec->lock){+.+.+.}:
[ 28.182808]        [] __lock_acquire+0x1b31/0x1e40
[ 28.183347]        [] lock_acquire+0xc7/0x2a0
[ 28.183897]        [] mutex_lock_nested+0x7d/0x460
[ 28.184427]        [] inode_doinit_with_dentry+0xc5/0x6a0
[ 28.184944]        [] selinux_d_instantiate+0x1c/0x20
[ 28.185470]        [] security_d_instantiate+0x1b/0x30
[ 28.185980]        [] d_instantiate+0x54/0x80
[ 28.186495]        [] __shmem_file_setup+0xdc/0x250
[ 28.186990]        [] shmem_zero_setup+0x28/0x70
[ 28.187500]        [] mmap_region+0x66c/0x680
[ 28.188006]        [] do_mmap_pgoff+0x323/0x410
[ 28.188500]        [] vm_mmap_pgoff+0xb0/0xf0
[ 28.189005]        [] SyS_mmap_pgoff+0x116/0x2b0
[ 28.189490]        [] SyS_mmap+0x1b/0x30
[ 28.189975]        [] system_call_fastpath+0x12/0x76
[ 28.190474] other info that might help us debug this:
[ 28.191901] Chain exists of:
  &isec->lock --> &xfs_dir_ilock_class --> &mm->mmap_sem
[ 28.193327]  Possible unsafe locking scenario:
[ 28.194297]        CPU0                    CPU1
[ 28.194774]        ----                    ----
[ 28.195254]   lock(&mm->mmap_sem);
[ 28.195709]                                lock(&xfs_dir_ilock_class);
[ 28.196174]                                lock(&mm->mmap_sem);
[ 28.196654]   lock(&isec->lock);
[ 28.197108]  *** DEADLOCK ***
[ 28.198451] 1 lock held by sshd/1764:
[ 28.198900]  #0:  (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x8f/0xf0
[ 28.199370] stack backtrace:
[ 28.200276] CPU: 2 PID: 1764 Comm: sshd Tainted: G        W       4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1
[ 28.200753] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 28.201246]  0000000000000000 00000000eda89a94 ffff8800a86a39c8 ffffffff81896375
[ 28.201771]  0000000000000000 ffffffff82a910d0 ffff8800a86a3a18 ffffffff8110fbd6
[ 28.202275]  0000000000000002 ffff8800a86a3a78 0000000000000001 ffff8800a897b008
[ 28.203099] Call Trace:
[ 28.204237]  [] dump_stack+0x4c/0x65
[ 28.205362]  [] print_circular_bug+0x206/0x280
[ 28.206502]  [] __lock_acquire+0x1b31/0x1e40
[ 28.207650]  [] lock_acquire+0xc7/0x2a0
[ 28.208758]  [] ? inode_doinit_with_dentry+0xc5/0x6a0
[ 28.209902]  [] mutex_lock_nested+0x7d/0x460
[ 28.211023]  [] ? inode_doinit_with_dentry+0xc5/0x6a0
[ 28.212162]  [] ? inode_doinit_with_dentry+0xc5/0x6a0
[ 28.213283]  [] ? native_sched_clock+0x2d/0xa0
[ 28.214403]  [] ? sched_clock+0x9/0x10
[ 28.215514]  [] inode_doinit_with_dentry+0xc5/0x6a0
[ 28.216656]  [] selinux_d_instantiate+0x1c/0x20
[ 28.217776]  [] security_d_instantiate+0x1b/0x30
[ 28.218902]  [] d_instantiate+0x54/0x80
[ 28.219992]  [] __shmem_file_setup+0xdc/0x250
[ 28.221112]  [] shmem_zero_setup+0x28/0x70
[ 28.222234]  [] mmap_region+0x66c/0x680
[ 28.223362]  [] do_mmap_pgoff+0x323/0x410
[ 28.224493]  [] ? vm_mmap_pgoff+0x8f/0xf0
[ 28.225643]  [] vm_mmap_pgoff+0xb0/0xf0
[ 28.226771]  [] SyS_mmap_pgoff+0x116/0x2b0
[ 28.227900]  [] ? SyS_fcntl+0x5de/0x760
[ 28.229042]  [] SyS_mmap+0x1b/0x30
[ 28.230156]  [] system_call_fastpath+0x12/0x76
[ 46.520367] Adjusting tsc more than 11% (5419175 vs 7179037)

Best regards,

Morten

>
>>
>>>
>>> Reported-and-tested-by: Prarit Bhargava
>>> Reported-by: Daniel Wagner
>>> Reported-by: Morten Stevens
>>> Signed-off-by: Hugh Dickins
>>> ---
>>>
>>>  mm/shmem.c | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> --- 4.1-rc7/mm/shmem.c	2015-04-26 19:16:31.352191298 -0700
>>> +++ linux/mm/shmem.c	2015-06-14 09:26:49.461120166 -0700
>>> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
>>>  	struct file *file;
>>>  	loff_t size = vma->vm_end - vma->vm_start;
>>>
>>> -	file = shmem_file_setup("dev/zero", size, vma->vm_flags);
>>> +	/*
>>> +	 * Cloning a new file under mmap_sem leads to a lock ordering conflict
>>> +	 * between XFS directory reading and selinux: since this file is only
>>> +	 * accessible to the user through its mapping, use S_PRIVATE flag to
>>> +	 * bypass file security, in the same way as shmem_kernel_file_setup().
>>> +	 */
>>> +	file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
>>>  	if (IS_ERR(file))
>>>  		return PTR_ERR(file);
>>>
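For anyone wanting to exercise the two paths discussed above, a minimal
userspace sketch follows (illustrative only; whether the PROT_EXEC
transitions are actually denied depends on the loaded SELinux policy,
and the 4096 assumes a 4K page size):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	/* A shared anonymous mapping is backed by a kernel-internal shmem
	 * file: this mmap() reaches shmem_zero_setup() with mmap_sem held,
	 * the #0 path in the lockdep trace above. */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	/* Making the mapping executable afterwards goes through
	 * selinux_file_mprotect() with a non-NULL vm_file: previously a
	 * file execute check against the tmpfs inode; with the inode
	 * marked S_PRIVATE, no SELinux check is applied at all -- the
	 * regression Stephen points out. */
	if (mprotect(p, 4096, PROT_EXEC) != 0)
		perror("mprotect");

	/* By contrast, requesting PROT_EXEC at mmap() time passes a NULL
	 * file to selinux_mmap_file(), so the execmem check still
	 * applies on this path. */
	void *q = mmap(NULL, 4096, PROT_READ | PROT_EXEC,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (q == MAP_FAILED)
		perror("mmap PROT_EXEC");

	return EXIT_SUCCESS;
}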