Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp2259325rwp; Fri, 14 Jul 2023 03:30:32 -0700 (PDT) X-Google-Smtp-Source: APBJJlF0jDC2fXaT1Nn2f7+d5KhA5V/E0VgMHsQhflENoYpCniY2ghAi7b6aPkxpt1u0TP3ky1Wx X-Received: by 2002:a05:6870:41ca:b0:1b0:6cd8:1263 with SMTP id z10-20020a05687041ca00b001b06cd81263mr5093678oac.48.1689330631928; Fri, 14 Jul 2023 03:30:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689330631; cv=none; d=google.com; s=arc-20160816; b=dLdRjnXflqnBf3shCIv39aBzbuVcBH59JjlCybFHDnfeg7+thf7cQ7Ghc7T4zJuSh6 2jA4NoUrqiYyMHLOQVU9DWoelNykuAj6NZIEwGW5CCfP0pIbhoHU07sFrNDlf/nYDCjT J2NeZtBJcjNLzLsfK+G2/U+Vk/JxFskn70Q7XawwhELg7BfM4OhmOAhDVriZx8hsrdWb YxBaOTjDAISr9UYrkVqgAbyvy8pJRDho8xCWJ2mJ9/Yr3hCsVINfBWhpy8C8XgtReUQG Tm3OKF7XVsJMrrpn8LUjzs1LLAKazPQ4y7bjnY8Lmx02jLpsgaAjzEoKhcowvMeqW279 fUfw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=AkOjoC4FmhTVKe0qp3hI3DmNubc6Z9pqy2pZ6OiOc3E=; fh=55RZIZzLuhRrvB39jWdpKSgMfNLAAz4RKVVa+gvdeE8=; b=r1vGCynVyY9bKMewQZUBS1XuQrgVY7gqJEWtILMQJLyT4JXflbgiF3mrs3tX8qAu4X NBT3ORrazWCKFjVcc7r6wsKYtrG2M1BZaQD4XIPBxGuvmRz2vb4czGs4W/4xWFq5xxyF m0H02wgmwjPLFl+LwOX8wjzaLsZsEWj+e0pC9dnVFksfarhScxgy/AvBKsQr8J2Of1+h SH9mGyO9YeILYzL2WnhAL24pQXPI3RnvG4T0bPEIt6zlNLM0OB/gQKOLSuTZQYcMfyZ8 zhcyHwHleyRp0DzTMWDBKcINE2F0QJr750do51xz83XXU2iJRDBD1QCKvKod0Q9iAt6X xeog== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@asahilina.net header.s=default header.b=mAJmPi2o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=asahilina.net Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x186-20020a6386c3000000b0055c9d565335si1785251pgd.557.2023.07.14.03.30.19; Fri, 14 Jul 2023 03:30:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@asahilina.net header.s=default header.b=mAJmPi2o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=asahilina.net Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235891AbjGNJod (ORCPT + 99 others); Fri, 14 Jul 2023 05:44:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235865AbjGNJo3 (ORCPT ); Fri, 14 Jul 2023 05:44:29 -0400 Received: from mail.marcansoft.com (marcansoft.com [212.63.210.85]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D02C26AE; Fri, 14 Jul 2023 02:44:26 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) (Authenticated sender: lina@asahilina.net) by mail.marcansoft.com (Postfix) with ESMTPSA id 9FF985BC37; Fri, 14 Jul 2023 09:44:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=asahilina.net; s=default; t=1689327864; bh=2gCMoY94/ESU27ff3B7Nnjye7UV/M5FZXhxeO4nGMuc=; h=Date:Subject:To:Cc:References:From:In-Reply-To; b=mAJmPi2op48Hb4dON0tfHReK27/tgzVEWLOaJggy1nSyhAtKmrBJYJ+WgA8dn0/pb QUPW6CZCfJf+OxysqpTk8dRJ1sg7r8KuCnrcI95UuBIWSzCZmd1a3fyHQw8Gf1rWdS EJ8zpMxNZmWr/MhMPgo5BrsxyzLheRq8dK85Qrwf8PiBA0rDGPA6VvEFxdNvRuPkku hL5CdeJvRFW2C1opfSYtEg5ZgGoyowK2licKSBrrhEG2atsDzvmnmYvPaZ26eylRxH VfnL2B9licbkQO4SGdtMrB34oSks+S4N8G97+rLvFu1vZrhzGKB3FMBxz7NeS0/FYz q7NHxkvXS+Y+w== Message-ID: <5c7278b9-e63e-4ddf-1988-1e02087fb988@asahilina.net> Date: Fri, 14 Jul 2023 18:44:20 +0900 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 Subject: Re: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name Content-Language: en-US To: =?UTF-8?Q?Christian_K=c3=b6nig?= , Luben Tuikov , David Airlie , Daniel Vetter , Sumit Semwal Cc: Faith Ekstrand , Alyssa Rosenzweig , dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, asahi@lists.linux.dev References: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net> <20230714-drm-sched-fixes-v1-2-c567249709f7@asahilina.net> From: Asahi Lina In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 14/07/2023 17.43, Christian König wrote: > Am 14.07.23 um 10:21 schrieb Asahi Lina: >> A signaled scheduler fence can outlive its scheduler, since fences are >> independencly reference counted. Therefore, we can't reference the >> scheduler in the get_timeline_name() implementation. >> >> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared >> dma-bufs reference fences from GPU schedulers that no longer exist. >> >> Signed-off-by: Asahi Lina >> --- >> drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++- >> drivers/gpu/drm/scheduler/sched_fence.c | 4 +++- >> include/drm/gpu_scheduler.h | 5 +++++ >> 3 files changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >> index b2bbc8a68b30..17f35b0b005a 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -389,7 +389,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity) >> >> /* >> * Fence is from the same scheduler, only need to wait for >> - * it to be scheduled >> + * it to be scheduled. >> + * >> + * Note: s_fence->sched could have been freed and reallocated >> + * as another scheduler. This false positive case is okay, as if >> + * the old scheduler was freed all of its jobs must have >> + * signaled their completion fences. > > This is outright nonsense. As long as an entity for a scheduler exists > it is not allowed to free up this scheduler. > > So this function can't be called like this. > >> */ >> fence = dma_fence_get(&s_fence->scheduled); >> dma_fence_put(entity->dependency); >> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c >> index ef120475e7c6..06a0eebcca10 100644 >> --- a/drivers/gpu/drm/scheduler/sched_fence.c >> +++ b/drivers/gpu/drm/scheduler/sched_fence.c >> @@ -68,7 +68,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence) >> static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f) >> { >> struct drm_sched_fence *fence = to_drm_sched_fence(f); >> - return (const char *)fence->sched->name; >> + return (const char *)fence->sched_name; >> } >> >> static void drm_sched_fence_free_rcu(struct rcu_head *rcu) >> @@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence, >> unsigned seq; >> >> fence->sched = entity->rq->sched; >> + strlcpy(fence->sched_name, entity->rq->sched->name, >> + sizeof(fence->sched_name)); >> seq = atomic_inc_return(&entity->fence_seq); >> dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled, >> &fence->lock, entity->fence_context, seq); >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index e95b4837e5a3..4fa9523bd47d 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -305,6 +305,11 @@ struct drm_sched_fence { >> * @lock: the lock used by the scheduled and the finished fences. >> */ >> spinlock_t lock; >> + /** >> + * @sched_name: the name of the scheduler that owns this fence. We >> + * keep a copy here since fences can outlive their scheduler. >> + */ >> + char sched_name[16]; > > This just mitigates the problem, but doesn't fix it. Could you point out any remaining issues so we can fix them? Right now this absolutely *is* broken and this fixes the breakage I observed. If there are other bugs remaining, I'd like to know what they are so I can fix them. > The real issue is that the hw fence is kept around much longer than that. As far as I know, the whole point of scheduler fences is to decouple the hw fences from the consumers. I already talked with Daniel about this. The current behavior is broken. These fences can live forever. It is broken to require that they outlive the driver that produced them. > Additional to that I'm not willing to increase the scheduler fence size > like that just to decouple them from the scheduler. Did you read my explanation on the cover letter as to how this is just broken right now? We need to fix this. If you have a better suggestion I'll take it. Doing nothing is not an option. ~~ Lina