From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf, michel.daenzer@mailbox.org, André Almeida
Subject: [PATCH v2 5/6] drm/amdgpu: Log IBs and ring name at coredump
Date: Thu, 13 Jul 2023 18:32:41 -0300
Message-ID: <20230713213242.680944-6-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

Log the IB addresses used by the hung job along with the stuck ring name.
Note that due to nested IBs, the IB that caused the reset may itself be at
an address that is not listed.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e1cc83a89d46..cfeaf93934fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1086,6 +1086,9 @@ struct amdgpu_coredump_info {
 	struct amdgpu_task_info	reset_task_info;
 	struct timespec64	reset_time;
 	bool			reset_vram_lost;
+	u64			*ibs;
+	u32			num_ibs;
+	char			ring_name[16];
 };
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 07546781b8b8..431ccc3d7857 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5008,12 +5008,24 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 				   coredump->adev->reset_dump_reg_value[i]);
 	}
 
+	if (coredump->num_ibs) {
+		drm_printf(&p, "IBs:\n");
+		for (i = 0; i < coredump->num_ibs; i++)
+			drm_printf(&p, "\t[%d] 0x%llx\n", i, coredump->ibs[i]);
+	}
+
+	if (coredump->ring_name[0] != '\0')
+		drm_printf(&p, "ring name: %s\n", coredump->ring_name);
+
 	return count - iter.remain;
 }
 
 static void amdgpu_devcoredump_free(void *data)
 {
-	kfree(data);
+	struct amdgpu_coredump_info *coredump = data;
+
+	kfree(coredump->ibs);
+	kfree(coredump);
 }
 
 static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
@@ -5021,6 +5033,8 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
 {
 	struct amdgpu_coredump_info *coredump;
 	struct drm_device *dev = adev_to_drm(adev);
+	struct amdgpu_job *job = reset_context->job;
+	int i;
 
 	coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
 
@@ -5038,6 +5052,21 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
 
 	coredump->adev = adev;
 
+	if (job && job->num_ibs) {
+		struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
+		u32 num_ibs = job->num_ibs;
+
+		coredump->ibs = kmalloc_array(num_ibs, sizeof(*coredump->ibs), GFP_NOWAIT);
+		if (coredump->ibs)
+			coredump->num_ibs = num_ibs;
+
+		for (i = 0; i < coredump->num_ibs; i++)
+			coredump->ibs[i] = job->ibs[i].gpu_addr;
+
+		if (ring)
+			strscpy(coredump->ring_name, ring->name, sizeof(coredump->ring_name));
+	}
+
 	ktime_get_ts64(&coredump->reset_time);
 
 	dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
-- 
2.41.0
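
For readers who want to try the snapshot-and-print flow outside the kernel, here is a minimal userspace sketch. It is an illustration under stated assumptions, not the driver code: `coredump_snapshot`, `snapshot_fill`, `snapshot_format`, `snapshot_free` and `buf_appendf` are hypothetical names, and `calloc`/`snprintf` stand in for the kernel's allocation, string-copy and `drm_printf` helpers. The snapshot is zeroed before use so that the free and format paths stay well defined when no IBs were captured.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RING_NAME_LEN 16

/* Hypothetical userspace stand-in for the fields the patch adds. */
struct coredump_snapshot {
	uint64_t *ibs;
	uint32_t num_ibs;
	char ring_name[RING_NAME_LEN];
};

/* Append formatted text at *off; on overflow drop the partial line. */
static int buf_appendf(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + *off, len - *off, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= len - *off) {
		buf[*off] = '\0';
		return -1;
	}
	*off += (size_t)n;
	return 0;
}

/* Copy the IB addresses and ring name; num_ibs stays 0 if allocation fails. */
void snapshot_fill(struct coredump_snapshot *snap, const uint64_t *ib_addrs,
		   uint32_t num_ibs, const char *ring_name)
{
	memset(snap, 0, sizeof(*snap));

	if (num_ibs) {
		/* Element size comes from the pointee, not the pointer. */
		snap->ibs = calloc(num_ibs, sizeof(*snap->ibs));
		if (snap->ibs) {
			memcpy(snap->ibs, ib_addrs, num_ibs * sizeof(*snap->ibs));
			snap->num_ibs = num_ibs;
		}
	}

	/* snprintf always NUL-terminates, truncating long names. */
	if (ring_name)
		snprintf(snap->ring_name, sizeof(snap->ring_name), "%s", ring_name);
}

/* Emit the same layout as the read callback; returns bytes written. */
size_t snapshot_format(const struct coredump_snapshot *snap, char *buf, size_t len)
{
	size_t off = 0;
	uint32_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	if (snap->num_ibs) {
		if (buf_appendf(buf, len, &off, "IBs:\n"))
			return off;
		for (i = 0; i < snap->num_ibs; i++)
			if (buf_appendf(buf, len, &off, "\t[%u] 0x%llx\n", i,
					(unsigned long long)snap->ibs[i]))
				return off;
	}

	if (snap->ring_name[0] != '\0')
		buf_appendf(buf, len, &off, "ring name: %s\n", snap->ring_name);

	return off;
}

/* Release the IB array and reset the snapshot to its empty state. */
void snapshot_free(struct coredump_snapshot *snap)
{
	free(snap->ibs);
	memset(snap, 0, sizeof(*snap));
}
```

Calling `snapshot_fill()` with the job's IB addresses and ring name, then `snapshot_format()` into a buffer, produces an "IBs:" list followed by a "ring name:" line, in the same layout the read callback prints.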