From: Linus Torvalds
Date: Thu, 6 Oct 2022 16:45:15 -0700
Subject: Re: [git pull] drm for 6.1-rc1
To: Dave Airlie
Cc: Alex Deucher, Christian König, Daniel Vetter, LKML, dri-devel
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie wrote:
>
> [ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
> [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]

As far as I can tell, that's the line

        struct drm_gpu_scheduler *sched = s_fence->sched;

where 's_fence' is NULL. The code is

   0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   5:   41 54                   push   %r12
   7:   55                      push   %rbp
   8:   53                      push   %rbx
   9:   48 89 fb                mov    %rdi,%rbx
   c:*  48 8b af 88 00 00 00    mov    0x88(%rdi),%rbp   <-- trapping instruction
  13:   f0 ff 8d f0 00 00 00    lock decl 0xf0(%rbp)
  1a:   48 8b 85 80 01 00 00    mov    0x180(%rbp),%rax

and that next 'lock decl' instruction would have been the

        atomic_dec(&sched->hw_rq_count);

at the top of drm_sched_job_done().

Now, as to *why* you'd have a NULL s_fence, it would seem that
drm_sched_job_cleanup() was called with an active job. Looking at that
code, it does

        if (kref_read(&job->s_fence->finished.refcount)) {
                /* drm_sched_job_arm() has been called */
                dma_fence_put(&job->s_fence->finished);
        ...

but then it does

        job->s_fence = NULL;

anyway, despite the job still being active. The logic of that kind of
"fake refcount" escapes me. The above looks fundamentally racy, not to
say pointless and wrong (a refcount is a _count_, not a flag, so there
could be multiple references to it, what says that you can just
decrement one of them and say "I'm done").

Now, _why_ any of that happens, I have no idea. I'm just looking at
the immediate "that pointer is NULL" thing, and reacting to what looks
like a completely bogus refcount pattern.

But that odd refcount pattern isn't new, so it's presumably some user
on the amd gpu side that changed.

The problem hasn't happened again for me, but that's not saying a lot,
since it was very random to begin with.

              Linus
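
To make the "count, not a flag" point concrete, here is a minimal
userspace sketch of the pattern being questioned. Everything in it is
a hypothetical stand-in (toy_fence, toy_job, toy_job_cleanup and the
demo helper are invented for illustration, and C11 atomics replace the
kernel's kref); it is not the scheduler's actual code. It only shows
that "refcount is nonzero, so drop one reference and clear the
pointer" leaves other holders with live references to an object the
job no longer points at.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
struct toy_fence {
    atomic_int refcount;
};

struct toy_job {
    struct toy_fence *s_fence;
};

/* Take one additional reference on the fence. */
static void toy_fence_get(struct toy_fence *f)
{
    atomic_fetch_add(&f->refcount, 1);
}

/* Drop one reference; returns true only if it was the last one. */
static bool toy_fence_put(struct toy_fence *f)
{
    int old = atomic_fetch_sub(&f->refcount, 1);

    assert(old > 0);
    return old == 1;
}

/*
 * Mirrors the shape of the questioned pattern: the count is read as if
 * it were a flag, exactly one reference is dropped, and the pointer is
 * cleared no matter how many references remain.
 */
static void toy_job_cleanup(struct toy_job *job)
{
    if (atomic_load(&job->s_fence->refcount) != 0)
        toy_fence_put(job->s_fence);
    job->s_fence = NULL;
}

/*
 * Builds a job holding one reference, adds 'extra_holders' more, runs
 * the cleanup, and returns how many references are still outstanding.
 */
static int demo_outstanding_after_cleanup(int extra_holders)
{
    struct toy_fence fence;
    struct toy_job job = { .s_fence = &fence };

    atomic_init(&fence.refcount, 1);    /* the job's own reference */
    for (int i = 0; i < extra_holders; i++)
        toy_fence_get(&fence);

    toy_job_cleanup(&job);
    assert(job.s_fence == NULL);        /* pointer is gone regardless */

    return atomic_load(&fence.refcount);
}
```

With no extra holders the helper returns 0, which is the only case
where clearing the pointer is harmless. With two extra holders it
returns 2: the fence is still referenced, yet anything that later
reaches it through the job finds NULL.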
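
On the fault address itself: the oops reports address 0x88 and the
trapping instruction loads from 0x88(%rdi), which is what a member
access through a NULL base pointer looks like. A small sketch of that
arithmetic follows, assuming a made-up struct (fake_sched_fence, with
opaque padding) whose 'sched' member sits at offset 0x88. Only that
offset is taken from the oops; the struct name, the padding and the
helper are hypothetical and say nothing about the real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: the padding stands in for whatever members
 * precede 'sched'. Only the 0x88 offset comes from the oops.
 */
struct fake_sched_fence {
    unsigned char earlier_fields[0x88];
    void *sched;
};

static_assert(offsetof(struct fake_sched_fence, sched) == 0x88,
              "sketch assumes 'sched' lives at offset 0x88");

/*
 * Address that a load of base->sched would touch, computed from the
 * base address and the member offset without dereferencing anything.
 */
static uintptr_t sched_load_address(uintptr_t base)
{
    return base + offsetof(struct fake_sched_fence, sched);
}
```

Calling sched_load_address(0) gives 0x88, matching the reported
address, which is why a small nonzero fault address usually points at
a NULL struct pointer plus a member offset rather than at a wild
pointer.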