Message-ID: <266e648a-c537-66bc-455b-37105567c942@canonical.com>
Date: Mon, 6 Jun 2022 12:19:36 -0700
Subject: Re: Linux 5.18-rc4
To: Linus Torvalds, "Eric W. Biederman"
Cc: Ammar Faizi, James Morris, LSM List, Linux Kernel Mailing List, Al Viro, Kees Cook, linux-fsdevel@vger.kernel.org, Linux-MM, gwml@vger.gnuweeb.org
References: <226cee6a-6ca1-b603-db08-8500cd8f77b7@gnuweeb.org> <87r1414y5v.fsf@email.froward.int.ebiederm.org>
From: John Johansen
Organization: Canonical
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/6/22 11:28, Linus Torvalds wrote:
> On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman wrote:
>> Has anyone looked into this lock ordering issues?
>
> The deadlock is
>
>>>> [78140.503821]        CPU0                    CPU1
>>>> [78140.503823]        ----                    ----
>>>> [78140.503824]   lock(&newf->file_lock);
>>>> [78140.503826]                                lock(&p->alloc_lock);
>>>> [78140.503828]                                lock(&newf->file_lock);
>>>> [78140.503830]   lock(&ctx->lock);
>
> and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show()
> in fs/proc/fd.c:
>
>         task_lock(task);
>         files = task->files;
>         if (files) {
>                 unsigned int fd = proc_fd(m->private);
>
>                 spin_lock(&files->file_lock);
>
> and that looks all normal.
>
> But the other chains look painful.
>
> I do see the IPC code doing ugly things, in particular I detest this code:
>
>         task_lock(current);
>         list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
>         task_unlock(current);
>
> where it is using the task lock to protect the shm_clist list. Nasty.
>
> And it's doing that inside the shm_ids.rwsem lock _and_ inside the
> shp->shm_perm.lock.
>
> So the IPC code has newseg() doing
>
>    shmget ->
>     ipcget():
>      down_write(ids->rwsem) ->
>        newseg():
>          ipc_addid gets perm->lock
>          task_lock(current)
>
> so you have
>
>   ids->rwsem -> perm->lock -> alloc_lock
>
> there.
>
> So now we have that
>
>    ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock
>
> when you put those sequences together.
>
> But I didn't figure out what the security subsystem angle is and how
> that then apparently mixes things up with execve.
>
> Yes, newseg() is doing that
>
>         error = security_shm_alloc(&shp->shm_perm);
>
> while holding rwsem, but I can't see how that matters. From the
> lockdep output, rwsem doesn't actually seem to be part of the whole
> sequence.
>
> It *looks* like we have
>
>    apparmour ctx->lock -->
>       radix_tree_preloads.lock -->
>          ipcperm->lock
>
> and apparently that's called under the file_lock somewhere, completing
> the circle.
>
> I guess the execve component is that
>
>   begin_new_exec ->
>     security_bprm_committing_creds ->
>       apparmor_bprm_committing_creds ->
>         aa_inherit_files ->
>           iterate_fd ->   *takes file_lock*
>             match_file ->
>               aa_file_perm ->
>                 update_file_ctx *takes ctx->lock*
>
> so that's how you get file_lock -> ctx->lock.
>
yes

> So you have:
>
>  SHMGET:
>     ipcperm->lock -> alloc_lock
>  /proc:
>     alloc_lock -> file_lock
>  apparmor_bprm_committing_creds:
>     file_lock -> ctx->lock
>
> and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.
>
yeah that is the part I got stuck on, before being pulled away from this

> I suspect that part is that both Apparmor and IPC use the idr local lock.
>
bingo, apparmor moved its secids allocation from a custom radix tree to idr in

  99cc45e48678 apparmor: Use an IDR to allocate apparmor secids

and ipc is using the idr for its id allocation as well

I can easily lift the secid() allocation out of the ctx->lock but that would
still leave it happening under the file_lock and not fix the problem. I think
the quick solution would be for apparmor to stop using idr, reverting back at
least temporarily to the custom radix tree.
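
For anyone following along, the ordering argument in this thread can be checked mechanically. What follows is a minimal userspace sketch, not kernel code and not how lockdep is implemented: it records each "held while taking" pair named above as a directed edge and runs a depth-first search for a cycle. The names lock_graph, add_order(), has_cycle() and add_thread_edges() are hypothetical and exist only for this illustration; the lock names are plain strings taken from the thread. The with_idr flag models, as an assumption, the effect of apparmor no longer allocating secids through the idr while ctx->lock is held.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of a lock-order graph (illustration only). */
enum { MAX_LOCKS = 16 };

struct lock_graph {
	const char *names[MAX_LOCKS];
	int nlocks;
	/* edge[a][b] != 0 means lock a was held while lock b was taken */
	unsigned char edge[MAX_LOCKS][MAX_LOCKS];
};

/* Return the index for a lock name, registering it on first use. */
int lock_id(struct lock_graph *g, const char *name)
{
	for (int i = 0; i < g->nlocks; i++)
		if (strcmp(g->names[i], name) == 0)
			return i;
	assert(g->nlocks < MAX_LOCKS);
	g->names[g->nlocks] = name;
	return g->nlocks++;
}

/* Record one ordering: 'held' is already held when 'taken' is acquired. */
void add_order(struct lock_graph *g, const char *held, const char *taken)
{
	int a = lock_id(g, held);
	int b = lock_id(g, taken);

	g->edge[a][b] = 1;
}

/* DFS node states: 0 = unvisited, 1 = on the current path, 2 = finished. */
int visit(const struct lock_graph *g, int n, unsigned char *state)
{
	state[n] = 1;
	for (int m = 0; m < g->nlocks; m++) {
		if (!g->edge[n][m])
			continue;
		if (state[m] == 1)
			return 1;	/* back edge: the ordering is circular */
		if (state[m] == 0 && visit(g, m, state))
			return 1;
	}
	state[n] = 2;
	return 0;
}

/* Return 1 if the recorded orderings contain a cycle, 0 otherwise. */
int has_cycle(const struct lock_graph *g)
{
	unsigned char state[MAX_LOCKS] = { 0 };

	for (int n = 0; n < g->nlocks; n++)
		if (state[n] == 0 && visit(g, n, state))
			return 1;
	return 0;
}

/* The orderings named in this thread. */
void add_thread_edges(struct lock_graph *g, int with_idr)
{
	/* SHMGET: newseg() takes task_lock() under perm->lock */
	add_order(g, "ipcperm->lock", "alloc_lock");
	/* /proc: seq_show() in fs/proc/fd.c */
	add_order(g, "alloc_lock", "file_lock");
	/* execve: iterate_fd() -> update_file_ctx() */
	add_order(g, "file_lock", "ctx->lock");
	/* apparmor secid allocation via idr under ctx->lock (assumed removable) */
	if (with_idr)
		add_order(g, "ctx->lock", "radix_tree_preloads.lock");
	/* ipc id allocation via idr */
	add_order(g, "radix_tree_preloads.lock", "ipcperm->lock");
}
```

With all five orderings recorded, has_cycle() reports a cycle; with the ctx->lock -> radix_tree_preloads.lock edge left out, it does not. That matches the reasoning above only within this toy model, which says nothing about other orderings that may exist in the kernel.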