Received: by 2002:ab2:788f:0:b0:1ee:8f2e:70ae with SMTP id b15csp408691lqi; Thu, 7 Mar 2024 00:18:05 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVUz5RxydgVMFbaT2VHmURXbmRhbrQfrXkU0X+XD4YC+PJxlhl94sNEmLvkuJ57wctyRFmypbb3ZWmmpuhP15A9ifHCTisqGT1wLyGk7g== X-Google-Smtp-Source: AGHT+IHC7FSnBjHBlNJ11fTk3WYRQ6maRHKIqQkyr7orE1lA8/ADK5S5Vcw9vkbiPX41UOnDa9/L X-Received: by 2002:a17:90a:7402:b0:29b:164:ea82 with SMTP id a2-20020a17090a740200b0029b0164ea82mr15246490pjg.38.1709799484928; Thu, 07 Mar 2024 00:18:04 -0800 (PST) Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id z7-20020a17090a66c700b002995fa56919si1228324pjl.15.2024.03.07.00.18.04 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Mar 2024 00:18:04 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-95118-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=OxTdZTpj; arc=fail (signature failed); spf=pass (google.com: domain of linux-kernel+bounces-95118-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-95118-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 929972834A1 for ; Thu, 7 Mar 2024 08:17:22 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2904F63401; Thu, 7 Mar 2024 08:17:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="OxTdZTpj" Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2084.outbound.protection.outlook.com [40.107.244.84]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E645263128 for ; Thu, 7 Mar 2024 08:17:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.244.84 ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709799435; cv=fail; b=HUicTgbpq9xHmb7K/omD5cuQgfju0FRJC9jgtIfhD/qDTgJEZIoizJ8T67jXSRCDk099MHkRvvQFAKgvD8B3fraReeQygXSkDEExlr+EIEv5l2aHwctjESOhcbQiB6WtSinEIh89ygLwnxNXWrRSVS6qTye/4QS8R1CUlgcLYx0= ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709799435; c=relaxed/simple; bh=jhn7+J2wqKVnFx1+hN6W4nZDStoQzVJKVHKocg4dgeU=; h=Message-ID:Date:Subject:To:Cc:References:From:In-Reply-To: Content-Type:MIME-Version; b=RRMQGpKunqclip46uA/TnYdcKoN6aMreDhJv/Y5a0EF0kb5py6mIkT9SiL0NhMTIxxycyUqkYsGHARICga71yJ78rbA5Hib3G5JVP4EzhjD9OanI90b4MS12x/jrY/kvTN1v8g1aktK6JIksj5wuQxT6+vS94veZXKJWug/7tI8= ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=OxTdZTpj; arc=fail smtp.client-ip=40.107.244.84 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nc2Jj44QOVLgjjAG7Yj1kFs5zCufvNE8uUW8PqLOVz4gZENOHW3SUMVNqVyJ9mT0MkiechC8OZQpggDmZiVLZRtYOOHAbK0Cl2hjqFZb4405OpG4RCsvhGu61GgGqOhZpPOQGcOzLHZDmX1edYUWHLogTCLJgt6X+jKE6mAwHo4iTJBaa/s30ns4Dc2Y4S6lrFevx85CmNIJTMHWhUgYSowb2lAztV6ky414qoUhGBChu6K0QJrxEFg4ImaFNl08E6KsABNMJ4LOayCBSybaZFgYZ3fjtU34vefu/FQP4opyPN5ZEfp52IPg43I8hgwux2ABx/tFX9U8MJuOl9ooVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=RR50PsoOi2PbM5kM/PJCN1MOh9r22Kd8jvX3WPgp/5A=; b=Ft2U+zX1IUlT14EoODSi/jNpavJOTlfOBfaVgFhxqhvMMGO7q/ohy+qC1DS1b/EWJAtqJoNyUjEoj55F8UIwjoEXWz4CjkDWDfYQXxKo+TFWpxspsyXPLztM9wFuObZCOZ0Pj+QBXTfxH5+V3ZWEzU90JI0LtgC6ZZuNhlnB/1rMs33Zx0ToNxsw7V4LTbwd6yM+VKRpbqTXTa/MW08nr+GcXUW6ea8N46M65xUZmdK8LMuz/lp9LOwoEF0Xtq/TD3UDLyOiMykdhfmAN+B4PPKeCilZa/eaWcaUqYek76hWPBT8YHvav4VezZ2h5l0DXkPjNG7Z5LLZZEQPLz7inA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=RR50PsoOi2PbM5kM/PJCN1MOh9r22Kd8jvX3WPgp/5A=; b=OxTdZTpjarhH67resyydbhBXJ9cO+6IDeV3kj2/HNk+BnMyxZETyrm6q3ITm0TfSj7snUsmS9ekRMcy3I77B9KNqu/lrOeVeivwIJiKYVMTp2B926+22phfglhYqc4CJjbCmvI17of69MGd4MDcGA9RgkzEptLJcrWm1lucyLWg= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com; Received: from PH7PR12MB5685.namprd12.prod.outlook.com (2603:10b6:510:13c::22) by CY8PR12MB7414.namprd12.prod.outlook.com (2603:10b6:930:5e::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7339.39; Thu, 7 Mar 2024 08:17:11 +0000 Received: from PH7PR12MB5685.namprd12.prod.outlook.com ([fe80::f2b6:1034:76e8:f15a]) by PH7PR12MB5685.namprd12.prod.outlook.com ([fe80::f2b6:1034:76e8:f15a%6]) with mapi id 15.20.7362.024; Thu, 7 Mar 2024 08:17:11 +0000 Message-ID: Date: Thu, 7 Mar 2024 09:17:05 +0100 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] drm/amdgpu: add vm fault information to devcoredump Content-Language: en-US To: Sunil Khatri , Alex Deucher , Shashank Sharma Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Mukul Joshi , Arunpravin Paneer Selvam References: <20240306181937.3551648-1-sunil.khatri@amd.com> <20240306181937.3551648-2-sunil.khatri@amd.com> From: =?UTF-8?Q?Christian_K=C3=B6nig?= In-Reply-To: <20240306181937.3551648-2-sunil.khatri@amd.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-ClientProxiedBy: FR3P281CA0035.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:1c::8) To PH7PR12MB5685.namprd12.prod.outlook.com (2603:10b6:510:13c::22) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: PH7PR12MB5685:EE_|CY8PR12MB7414:EE_ X-MS-Office365-Filtering-Correlation-Id: e0c0fe2d-fd06-476b-3d98-08dc3e7efce2 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: ZIuUZ+fcXVrhDFYotfXvbvoMCrvoW/VnHbiJdtHFB96GupHdInP0d7EPt8Uj1XIEO8FsBGeOG6F0DHIW0t0klVhvymGd5wp6I/WOQQaRo1svxrfIsGVeL99qWecUmVbEX2AkQUloSBc9BTw7s9nO7JPR/eKcy14XZfBgoeZJ6XDUOoo/6/3gQtlwD8NRjjOzER8ZjC3IGnwRvNPp0nQ7ge6ebxf8c9pyw6uKc44lmIsif6e/RLCMIFJuy3dU76rx57+pAf0N8EIWc3aLSUMMRrWgR72Pnq2HW8zi1SsJlnoeBjJ9wv7R+zTPpk79atXBAzKe7dKKyektmJ6YluxwjUHV7hyfPtlPitrSPhakBT5Pb4K6CrWXxM890KQGv+OnOtK5S+2EkAFRxChKEcggYL58LlvAUGOCUMp4F2lbONXKmTKgqVxmboouRi9W7R8hrfthePuuthE8jVHyIST68jSQxKtNlS6cVc/l+//oCZrDKqNGPC+3iqtKUjgcmtRBZdPrncN7avem9mUW4hjiFgGY/7v0/6lnGNyPypHmfTOekCqKoHcDSLVmBe3qz08qNhE+92RgoEDvfyayNb6mTZgsHIM/EkJb9Ax//i81BAjZx3cB6wyiVidIY9vzlbkJ4NQUClVSMGep6iYInBg4N4Iki4Q1TIZQxHF88lWcsbo= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PH7PR12MB5685.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(376005);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?cEFtOWw2b0JMeWNiUGxlZTNkQjl2aU5ld3JSTE9oNGdWR05QWW9VTjBWdUg1?= =?utf-8?B?TzFzNTUvcmkxbDF5QVRFczJyTVRxVHJmUTNYQzVrZlByK212Q0FHQWVWQkEr?= =?utf-8?B?Z3dNSHVhdXlPUGZmUDZsYytlL2xQMnhKTmFySmp2cjZWVmxKU1I0UFZodmNl?= =?utf-8?B?alVpN0oyc3JCaTM4RTMydUNxb3NaZ1o0SzIrSTlaanpteW5iT2VIaklIQWg1?= =?utf-8?B?VXdZSEhETG1lTGFKSTlwNVBrU3NZYzlVS1Rid3V1ZUI2b2JkL2licmpmTGFy?= =?utf-8?B?eVc0V0FvVEdwWm5kakVFREt0V3h6M0dzbkI3eDlKOThWRHFjcXdydnIvK3VD?= =?utf-8?B?UVI0eEtnLzJ5SjlmNWl1aENIbUVoRGtYcG5nd05GS08rUFlOTE1uV3FSZ0Nj?= =?utf-8?B?NndvblZLbllUdElkZyt3YTR6V21BcHZlM3oxbEZ4REpPaE9jZzBwaml0Q1ky?= =?utf-8?B?M2dlSkRnTnVsZ055bTdYb2JKMWpvU08rSVVob0VtN2NuMk9GUllONHlGcE5X?= =?utf-8?B?SlFaQWdVNVZQanpEVzhTZVl2V1lYK1B4bk9sWjZhVVY1NnlqVVdMajY4WkZw?= =?utf-8?B?ZzdSaFFWeldDQmFLc3JzTFhyZHNNNUtaaFBpdjMxbC84VWFJVHpqVHByd0hV?= =?utf-8?B?b09kVFREc1JNZjgzanRWeG0xTWYzYzhzUGtuVjlsdmdVNXBUd3ZnUk1vZUNv?= =?utf-8?B?enZibGRYaThIeEQveWpXV0V0eVRjOGNpVHk2cTQwYnRGL2Z5aEVMRk9rRkQ5?= =?utf-8?B?NlV4SGVwTG5TNjE1T2RrRnE3TGhuRkliQlBSQ21DUkVkd28zcFppeW1LUDRq?= =?utf-8?B?emg2LzhOd3Jzb2JjNitnQTBVSzVxRVAvUzZ1Y0Rtcm5zdlJKQS9JaFNKS0V4?= =?utf-8?B?NDdlR2tmTjgrdlNiY1ZBNDBFeUM5bzRqQXZkeUxBVVl4djNRckJOYk1kZVhR?= =?utf-8?B?MzVUVEFRQXo2MEV5WENCeVlPSkJLUjZkN092UTZ5Umh0VEhjZTBiaGs1NUFx?= =?utf-8?B?eFFUSmZjVHRQTTZFYlBveEIwSWJ0SUE0U1ZaQmZ3V0QwcmtKajlxT2tWcjE0?= =?utf-8?B?K21aNm9teS9veWVPT2ZvTld4K0lIaXNmV0srMmc1U0s3T3JCM0ZHTVdmWm1W?= =?utf-8?B?NGx3NU83bHRBSzlqRGlJcDYxYWZvUjBDNldCNlQ1aGI5QkNmSmxnRXQ2TnpY?= =?utf-8?B?S0Q1akVzcEdMNmNIYXE0ZTlTL3hybjdPR3B1enpaODZ5Q1NsQmdXYzlNYlpX?= =?utf-8?B?RlFPaVJtbHhGYVVsZzE1MTVYZEZabVdYYzN0RGpMMjB2SDJBYk1UdC9Rc3BE?= =?utf-8?B?QzBuY29wZ2Y4ek9zOVFCdUY5emVYdWFGUHBydEpYcnROb21NdXZreHRIa0NH?= =?utf-8?B?bnlKNUUzekdpelpxbmkvNU94TWVhVjRINGlUaEJ5WG1DZml4bmpsUTJMUmpI?= =?utf-8?B?d1R5OGRsWlpyUWpmVUtrdXp0dFQyZUZLdHprUVpSVFNTR1p4a0VqeXE5Wmow?= =?utf-8?B?Mjk4MkF2YndrUG5BZ2l3a01PTDdKVkFmRUhYb2pReExRaUhqQ0FXRElRcHF3?= =?utf-8?B?bGw1enpQMHpBWTFaS3gwVFVjTUhzMkN4MWpXd093RFp1aHNZaFF6RGlmYXFo?= =?utf-8?B?NS9DbXN2U2hrWkhCUmh5TmorTEw3QzFxUENybmhYalBoOXVIMzlyOEF0a2NN?= =?utf-8?B?ME51ckYyRmQ5cmhXS3lVUVg0NlI4QUtwTVBmZWsvZUN4ZTVYZFpFTWhJam1w?= =?utf-8?B?dG5wKzZpbnJiOXFzNUpSdTAxT21SNkdhNThjd2x6aW9aZjJrWCtrdUdQN3dP?= =?utf-8?B?M1BNdllWMjlkamN2Vm1PWUhIeHdoRDlOaUFoOTJ1TFVEakozZ2s4OE5OWkV6?= =?utf-8?B?OTFyZ3VNSmplbWpSK25qSFZ1Zm9DSkczSTBrVkc3V0RWcllBM040cFdYM1RT?= =?utf-8?B?SU9OTzh1YnhaQ2tXVmZQOEJObnFDOHUrbW91Z1NKRzZRVnBxMDYyMytwVEZF?= =?utf-8?B?Qm5OemxCNmY3ckhhMVpGRE1ub0xQVVB5bUh3Z3lNSmZsNytBOGZldGpWUzBp?= =?utf-8?B?WHR6RWJoT1ozclFIYzM5cEg1WUkwd0VFbEwyWUZhUk8xWGRySExJaVovL1BG?= =?utf-8?Q?yDjMHkEhQH6DROh/Hk+RMoIKq?= X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-Network-Message-Id: e0c0fe2d-fd06-476b-3d98-08dc3e7efce2 X-MS-Exchange-CrossTenant-AuthSource: PH7PR12MB5685.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 07 Mar 2024 08:17:11.1579 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: pgpvHIEgRX+ubcS8LJ9tM06HGL1/cqP8csMOskXoI39eb29g0ju1GxB+PAXQSdhC X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY8PR12MB7414 Am 06.03.24 um 19:19 schrieb Sunil Khatri: > Add page fault information to the devcoredump. > > Output of devcoredump: > **** AMDGPU Device Coredump **** > version: 1 > kernel: 6.7.0-amd-staging-drm-next > module: amdgpu > time: 29.725011811 > process_name: soft_recovery_p PID: 1720 > > Ring timed out details > IP Type: 0 Ring Name: gfx_0.0.0 > > [gfxhub] Page fault observed for GPU family:143 > Faulty page starting at address 0x0000000000000000 > Protection fault status register:0x301031 > > VRAM is lost due to GPU reset! > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 15 ++++++++++++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 1 + > 2 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > index 147100c27c2d..d7fea6cdf2f9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c > @@ -203,8 +203,20 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count, > coredump->ring->name); > } > > + if (coredump->fault_info.status) { > + struct amdgpu_vm_fault_info *fault_info = &coredump->fault_info; > + > + drm_printf(&p, "\n[%s] Page fault observed for GPU family:%d\n", > + fault_info->vmhub ? "mmhub" : "gfxhub", > + coredump->adev->family); > + drm_printf(&p, "Faulty page starting at address 0x%016llx\n", > + fault_info->addr); > + drm_printf(&p, "Protection fault status register:0x%x\n", > + fault_info->status); > + } > + > if (coredump->reset_vram_lost) > - drm_printf(&p, "VRAM is lost due to GPU reset!\n"); > + drm_printf(&p, "\nVRAM is lost due to GPU reset!\n"); > if (coredump->adev->reset_info.num_regs) { > drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); > > @@ -253,6 +265,7 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, > if (job) { > s_job = &job->base; > coredump->ring = to_amdgpu_ring(s_job->sched); > + coredump->fault_info = job->vm->fault_info; That's illegal. The VM pointer might already be stale at this point. I think you need to add the fault info of the last fault globally in the VRAM manager or move this to the process info Shashank is working on. Regards, Christian. > } > > coredump->adev = adev; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h > index 60522963aaca..3197955264f9 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h > @@ -98,6 +98,7 @@ struct amdgpu_coredump_info { > struct timespec64 reset_time; > bool reset_vram_lost; > struct amdgpu_ring *ring; > + struct amdgpu_vm_fault_info fault_info; > }; > #endif >