Date: Thu, 7 Mar 2013 18:25:45 +0100
From: Oleg Nesterov
To: Dave Jones, Linux Kernel
Cc: Alexander Viro, Li Zefan, Tejun Heo, cgroups@vger.kernel.org
Subject: Re: lockdep trace from prepare_bprm_creds
Message-ID: <20130307172545.GA10353@redhat.com>
In-Reply-To: <20130306223657.GA7392@redhat.com>

On 03/06, Dave Jones wrote:
>
> Looks like this happens when my fuzzer tries to look up garbage in /sys/fs/cgroup/freezer/
>
>     trinity -c execve -V /sys/fs/cgroup/freezer/
>
> will reproduce it very quickly.
>
> This isn't a new trace. I've seen it in the past from iknowthis also.
>
> 	Dave
>
> [ 943.971541] ======================================================
> [ 943.972451] [ INFO: possible circular locking dependency detected ]
> [ 943.973370] 3.9.0-rc1+ #69 Not tainted
> [ 943.973927] -------------------------------------------------------
> [ 943.974838] trinity-child0/1301 is trying to acquire lock:
> [ 943.975650] blocked: (&sb->s_type->i_mutex_key#9){+.+.+.}, instance: ffff880127ea1680, at: [] do_last+0x35c/0xe30
> [ 943.977522]
> but task is already holding lock:
> [ 943.978371] held: (&sig->cred_guard_mutex){+.+.+.}, instance: ffff880123937578, at: [] prepare_bprm_creds+0x36/0x80
> [ 943.980260]
> which lock already depends on the new lock.
> [ 943.981434]
> the existing dependency chain (in reverse order) is:
> [ 943.982499]
> -> #2 (&sig->cred_guard_mutex){+.+.+.}:
> [ 943.983280]        [] lock_acquire+0x92/0x1d0
> [ 943.984196]        [] mutex_lock_nested+0x73/0x3b0
> [ 943.985173]        [] attach_task_by_pid+0x122/0x8d0
> [ 943.986151]        [] cgroup_tasks_write+0x13/0x20
> [ 943.987127]        [] cgroup_file_write+0x130/0x2f0
> [ 943.988118]        [] vfs_write+0xaf/0x180
> [ 943.988985]        [] sys_write+0x55/0xa0
> [ 943.989853]        [] system_call_fastpath+0x16/0x1b
> [ 943.990853]
> -> #1 (cgroup_mutex){+.+.+.}:
> [ 943.991616]        [] lock_acquire+0x92/0x1d0
> [ 943.992527]        [] mutex_lock_nested+0x73/0x3b0
> [ 943.993492]        [] cgroup_mount+0x2e7/0x520
> [ 943.994423]        [] mount_fs+0x43/0x1b0
> [ 943.995275]        [] vfs_kern_mount+0x61/0x100
> [ 943.996220]        [] do_mount+0x211/0xa00
> [ 943.997103]        [] sys_mount+0x8e/0xe0
> [ 943.997965]        [] system_call_fastpath+0x16/0x1b
> [ 943.998972]
> -> #0 (&sb->s_type->i_mutex_key#9){+.+.+.}:
> [ 943.999886]        [] __lock_acquire+0x1b86/0x1c80
> [ 944.000864]        [] lock_acquire+0x92/0x1d0
> [ 944.001771]        [] mutex_lock_nested+0x73/0x3b0
> [ 944.002750]        [] do_last+0x35c/0xe30
> [ 944.003620]        [] path_openat+0xba/0x4f0
> [ 944.004517]        [] do_filp_open+0x41/0xa0
> [ 944.005427]        [] open_exec+0x53/0x130
> [ 944.006296]        [] do_execve_common.isra.26+0x31d/0x710
> [ 944.007373]        [] do_execve+0x18/0x20
> [ 944.008222]        [] sys_execve+0x3d/0x60
> [ 944.009093]        [] stub_execve+0x69/0xa0
> [ 944.009983]
> other info that might help us debug this:
>
> [ 944.011126] Chain exists of:
>   &sb->s_type->i_mutex_key#9 --> cgroup_mutex --> &sig->cred_guard_mutex
>
> [ 944.012745]  Possible unsafe locking scenario:
>
> [ 944.013617]        CPU0                    CPU1
> [ 944.014280]        ----                    ----
> [ 944.014942]   lock(&sig->cred_guard_mutex);
> [ 944.021332]                                lock(cgroup_mutex);
> [ 944.028094]                                lock(&sig->cred_guard_mutex);
> [ 944.035007]   lock(&sb->s_type->i_mutex_key#9);
> [ 944.041602]

And cgroup_mount() does i_mutex -> cgroup_mutex...

Add cc's.
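To make the inversion in that chain concrete: below is a minimal userspace
sketch, not kernel code, of the three paths lockdep is complaining about.
A/B/C stand in for cred_guard_mutex, cgroup_mutex and i_mutex, and the
function names are made up for illustration. Each path is fine on its own;
together they close the A -> C -> B -> A cycle.

/* Userspace analogy of the reported cycle, illustration only.
 * A = cred_guard_mutex, B = cgroup_mutex, C = i_mutex.
 * Build with: gcc -pthread cycle.c
 */
#include <pthread.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;	/* cred_guard_mutex */
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;	/* cgroup_mutex */
static pthread_mutex_t C = PTHREAD_MUTEX_INITIALIZER;	/* i_mutex */

static void *execve_path(void *arg)	/* do_execve: A -> C */
{
	pthread_mutex_lock(&A);		/* prepare_bprm_creds() */
	pthread_mutex_lock(&C);		/* open_exec() -> do_last() */
	pthread_mutex_unlock(&C);
	pthread_mutex_unlock(&A);
	return NULL;
}

static void *attach_path(void *arg)	/* "tasks" write: B -> A */
{
	pthread_mutex_lock(&B);		/* attach_task_by_pid() */
	pthread_mutex_lock(&A);		/* threadgroup_lock() */
	pthread_mutex_unlock(&A);
	pthread_mutex_unlock(&B);
	return NULL;
}

static void *mount_path(void *arg)	/* cgroup_mount: C -> B */
{
	pthread_mutex_lock(&C);		/* i_mutex on the mount path */
	pthread_mutex_lock(&B);		/* cgroup_mount() */
	pthread_mutex_unlock(&B);
	pthread_mutex_unlock(&C);
	return NULL;
}

int main(void)
{
	pthread_t t[3];
	void *(*paths[3])(void *) = { execve_path, attach_path, mount_path };

	for (int i = 0; i < 3; i++)
		pthread_create(&t[i], NULL, paths[i], NULL);
	for (int i = 0; i < 3; i++)
		pthread_join(t[i], NULL);
	return 0;
}

No single path takes locks in a wrong order by itself, which is why every
caller looks innocent; only the combination can deadlock, and that is all
lockdep is reporting here.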
I do not think we can move open_exec() outside of cred_guard_mutex. We can
change do_execve_common(), but binfmt->load_binary() does open() too.

And it is not easy to avoid ->cred_guard_mutex in threadgroup_lock(), we
can't change de_thread() to do threadgroup_change_begin/end...

Or perhaps we can? It doesn't need to sleep under ->group_rwsem, we only
need it around the ->group_leader change. Otherwise cgroup_attach_proc()
can rely on do_exit()->threadgroup_change_begin() ?

But perhaps someone can suggest another fix in cgroup.c.

Oleg.

--- x/fs/exec.c
+++ x/fs/exec.c
@@ -898,11 +898,13 @@ static int de_thread(struct task_struct *tsk)
 
 		sig->notify_count = -1;	/* for exit_notify() */
 		for (;;) {
+			threadgroup_change_begin(tsk);
 			write_lock_irq(&tasklist_lock);
 			if (likely(leader->exit_state))
 				break;
 			__set_current_state(TASK_KILLABLE);
 			write_unlock_irq(&tasklist_lock);
+			threadgroup_change_end(tsk);
 			schedule();
 			if (unlikely(__fatal_signal_pending(tsk)))
 				goto killed;
@@ -960,6 +962,7 @@ static int de_thread(struct task_struct *tsk)
 		if (unlikely(leader->ptrace))
 			__wake_up_parent(leader, leader->parent);
 		write_unlock_irq(&tasklist_lock);
+		threadgroup_change_end(tsk);
 		release_task(leader);
 	}
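In case it helps review: threadgroup_change_begin/end are just the read
side of ->group_rwsem, while threadgroup_lock() is the write side nested
inside ->cred_guard_mutex. Roughly, paraphrased from include/linux/sched.h
in this kernel rather than quoted verbatim, so please check your tree:

/* Paraphrase of the 3.9-era helpers, for review context only. */
static inline void threadgroup_change_begin(struct task_struct *tsk)
{
	down_read(&tsk->signal->group_rwsem);
}

static inline void threadgroup_change_end(struct task_struct *tsk)
{
	up_read(&tsk->signal->group_rwsem);
}

static inline void threadgroup_lock(struct task_struct *tsk)
{
	/* exec's de_thread() currently forces ->cred_guard_mutex first */
	mutex_lock(&tsk->signal->cred_guard_mutex);
	down_write(&tsk->signal->group_rwsem);
}

So the patch above only extends the read-side critical section to cover
the ->group_leader change, and drops it before schedule() so de_thread()
never sleeps under ->group_rwsem. If that is enough, threadgroup_lock()
could stop taking ->cred_guard_mutex, which would break the
cgroup_mutex -> cred_guard_mutex link in the chain above.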