From: Adrián Larumbe
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
 Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
Cc: kernel@collabora.com, Adrián Larumbe, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v3 5/7] drm/panthor: support job accounting
Date: Thu, 6 Jun 2024 01:49:57 +0100
Message-ID: <20240606005416.1172431-6-adrian.larumbe@collabora.com>
X-Mailer: git-send-email 2.45.1
In-Reply-To: <20240606005416.1172431-1-adrian.larumbe@collabora.com>
References: <20240606005416.1172431-1-adrian.larumbe@collabora.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A previous commit brought in a sysfs knob to control the driver's profiling
status. This changeset flags jobs as being profiled according to the driver's
global profiling status, and picks one of two call instruction arrays to
insert into the ring buffer. One of them includes FW logic to sample the
timestamp and cycle counter registers and write them into the job's syncobj,
and the other does not.

A profiled job's call sequence takes up two ring buffer slots, and this is
reflected when initialising the DRM scheduler for each queue, with a profiled
job contributing twice as many credits.

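For illustration only, the credit accounting described above can be sketched
as standalone C. This is a rough model under stated assumptions, not driver
code: the helper names queue_credit_limit(), job_credits() and
max_jobs_in_flight() are hypothetical and do not exist in panthor, and only
NUM_INSTRS_PER_SLOT mirrors a value from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the patched value of NUM_INSTRS_PER_SLOT in panthor_sched.c. */
#define NUM_INSTRS_PER_SLOT	16
#define SLOTSIZE		(NUM_INSTRS_PER_SLOT * sizeof(uint64_t))

/*
 * Hypothetical helper: credit limit of a queue, counted in 64-bit
 * instructions rather than in slots. Assumes the ring buffer size is a
 * whole number of slots.
 */
static inline uint32_t queue_credit_limit(size_t ringbuf_size)
{
	assert(ringbuf_size % SLOTSIZE == 0);
	return (uint32_t)(ringbuf_size / sizeof(uint64_t));
}

/*
 * Hypothetical helper: credits consumed by one job. A profiled job's call
 * sequence spans two slots, so it costs twice as much.
 */
static inline uint32_t job_credits(bool is_profiled)
{
	return is_profiled ? NUM_INSTRS_PER_SLOT * 2 : NUM_INSTRS_PER_SLOT;
}

/*
 * Hypothetical helper: how many jobs of one kind fit in the ring buffer
 * before the credit limit is reached.
 */
static inline uint32_t max_jobs_in_flight(size_t ringbuf_size, bool is_profiled)
{
	return queue_credit_limit(ringbuf_size) / job_credits(is_profiled);
}
```

Assuming a 4 KiB ring buffer, this model yields 512 credits, which is enough
for 32 unprofiled jobs or 16 profiled ones.
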
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_sched.c | 95 ++++++++++++++++++++++---
 1 file changed, 86 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index bbd20db40e7b..4fb6fc5c2314 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -93,7 +93,7 @@
 #define MIN_CSGS				3
 #define MAX_CSG_PRIO				0xf
 
-#define NUM_INSTRS_PER_SLOT			32
+#define NUM_INSTRS_PER_SLOT			16
 #define SLOTSIZE				(NUM_INSTRS_PER_SLOT * sizeof(u64))
 
 struct panthor_group;
@@ -807,6 +807,9 @@ struct panthor_job {
 
 	/** @done_fence: Fence signaled when the job is finished or cancelled. */
 	struct dma_fence *done_fence;
+
+	/** @is_profiled: Whether timestamp and cycle numbers were gathered for this job */
+	bool is_profiled;
 };
 
 static void
@@ -2865,7 +2868,8 @@ static void group_sync_upd_work(struct work_struct *work)
 	dma_fence_end_signalling(cookie);
 
 	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
-		update_fdinfo_stats(job);
+		if (job->is_profiled)
+			update_fdinfo_stats(job);
 		list_del_init(&job->node);
 		panthor_job_put(&job->base);
 	}
@@ -2884,6 +2888,8 @@ queue_run_job(struct drm_sched_job *sched_job)
 	u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf);
 	u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1);
 	u32 ringbuf_index = ringbuf_insert / (SLOTSIZE);
+	bool ringbuf_wraparound =
+		job->is_profiled && ((ringbuf_size/SLOTSIZE) == ringbuf_index + 1);
 	u64 addr_reg = ptdev->csif_info.cs_reg_count -
 		       ptdev->csif_info.unpreserved_cs_reg_count;
 	u64 val_reg = addr_reg + 2;
@@ -2893,12 +2899,51 @@ queue_run_job(struct drm_sched_job *sched_job)
 			job->queue_idx * sizeof(struct panthor_syncobj_64b);
 	u64 times_addr = panthor_kernel_bo_gpuva(group->syncobjs.bo) + queue->time_offset +
 			 (ringbuf_index * sizeof(struct panthor_job_times));
+	size_t call_insrt_size;
+	u64 *call_instrs;
 
 	u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0);
 	struct dma_fence *done_fence;
 	int ret;
 
-	u64 call_instrs[NUM_INSTRS_PER_SLOT] = {
+	u64 call_instrs_simple[NUM_INSTRS_PER_SLOT] = {
+		/* MOV32 rX+2, cs.latest_flush */
+		(2ull << 56) | (val_reg << 48) | job->call_info.latest_flush,
+
+		/* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */
+		(36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233,
+
+		/* MOV48 rX:rX+1, cs.start */
+		(1ull << 56) | (addr_reg << 48) | job->call_info.start,
+
+		/* MOV32 rX+2, cs.size */
+		(2ull << 56) | (val_reg << 48) | job->call_info.size,
+
+		/* WAIT(0) => waits for FLUSH_CACHE2 instruction */
+		(3ull << 56) | (1 << 16),
+
+		/* CALL rX:rX+1, rX+2 */
+		(32ull << 56) | (addr_reg << 40) | (val_reg << 32),
+
+		/* MOV48 rX:rX+1, sync_addr */
+		(1ull << 56) | (addr_reg << 48) | sync_addr,
+
+		/* MOV48 rX+2, #1 */
+		(1ull << 56) | (val_reg << 48) | 1,
+
+		/* WAIT(all) */
+		(3ull << 56) | (waitall_mask << 16),
+
+		/* SYNC_ADD64.system_scope.propage_err.nowait rX:rX+1, rX+2*/
+		(51ull << 56) | (0ull << 48) | (addr_reg << 40) | (val_reg << 32) | (0 << 16) | 1,
+
+		/* ERROR_BARRIER, so we can recover from faults at job
+		 * boundaries.
+		 */
+		(47ull << 56),
+	};
+
+	u64 call_instrs_profile[NUM_INSTRS_PER_SLOT*2] = {
 		/* MOV32 rX+2, cs.latest_flush */
 		(2ull << 56) | (val_reg << 48) | job->call_info.latest_flush,
 
@@ -2960,9 +3005,18 @@ queue_run_job(struct drm_sched_job *sched_job)
 	};
 
 	/* Need to be cacheline aligned to please the prefetcher. */
-	static_assert(sizeof(call_instrs) % 64 == 0,
+	static_assert(sizeof(call_instrs_simple) % 64 == 0 && sizeof(call_instrs_profile) % 64 == 0,
 		      "call_instrs is not aligned on a cacheline");
 
+	if (job->is_profiled) {
+		call_instrs = call_instrs_profile;
+		call_insrt_size = sizeof(call_instrs_profile);
+
+	} else {
+		call_instrs = call_instrs_simple;
+		call_insrt_size = sizeof(call_instrs_simple);
+	}
+
 	/* Stream size is zero, nothing to do => return a NULL fence and let
 	 * drm_sched signal the parent.
 	 */
@@ -2985,8 +3039,23 @@ queue_run_job(struct drm_sched_job *sched_job)
 				       queue->fence_ctx.id,
 				       atomic64_inc_return(&queue->fence_ctx.seqno));
 
-	memcpy(queue->ringbuf->kmap + ringbuf_insert,
-	       call_instrs, sizeof(call_instrs));
+	/*
+	 * Need to handle the wrap-around case when copying profiled instructions
+	 * from an odd-indexed slot. The reason this can happen is user space is
+	 * able to control the profiling status of the driver through a sysfs
+	 * knob, so this might lead to a timestamp and cycles-profiling call
+	 * instruction stream beginning at an odd-number slot. The GPU should
+	 * be able to gracefully handle this.
+	 */
+	if (!ringbuf_wraparound) {
+		memcpy(queue->ringbuf->kmap + ringbuf_insert,
+		       call_instrs, call_insrt_size);
+	} else {
+		memcpy(queue->ringbuf->kmap + ringbuf_insert,
+		       call_instrs, call_insrt_size/2);
+		memcpy(queue->ringbuf->kmap, call_instrs +
+		       NUM_INSTRS_PER_SLOT, call_insrt_size/2);
+	}
 
 	panthor_job_get(&job->base);
 	spin_lock(&queue->fence_ctx.lock);
@@ -2994,7 +3063,7 @@ queue_run_job(struct drm_sched_job *sched_job)
 	spin_unlock(&queue->fence_ctx.lock);
 
 	job->ringbuf.start = queue->iface.input->insert;
-	job->ringbuf.end = job->ringbuf.start + sizeof(call_instrs);
+	job->ringbuf.end = job->ringbuf.start + call_insrt_size;
 	job->ringbuf_idx = ringbuf_index;
 
 	/* Make sure the ring buffer is updated before the INSERT
@@ -3141,9 +3210,14 @@ group_create_queue(struct panthor_group *group,
 	queue->time_offset = group->syncobjs.job_times_offset +
 		(slots_so_far * sizeof(struct panthor_job_times));
 
+	/*
+	 * Credit limit argument tells us the total number of instructions
+	 * across all CS slots in the ringbuffer, with some jobs requiring
+	 * twice as many as others, depending on their profiling status.
+	 */
 	ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
 			     group->ptdev->scheduler->wq, 1,
-			     args->ringbuf_size / SLOTSIZE,
+			     args->ringbuf_size / sizeof(u64),
 			     0, msecs_to_jiffies(JOB_TIMEOUT_MS),
 			     group->ptdev->reset.wq,
 			     NULL, "panthor-queue", group->ptdev->base.dev);
@@ -3538,9 +3612,12 @@ panthor_job_create(struct panthor_file *pfile,
 		goto err_put_job;
 	}
 
+	job->is_profiled = pfile->ptdev->profile_mode;
+
 	ret = drm_sched_job_init(&job->base,
 				 &job->group->queues[job->queue_idx]->entity,
-				 1, job->group);
+				 job->is_profiled ? NUM_INSTRS_PER_SLOT * 2 :
+				 NUM_INSTRS_PER_SLOT, job->group);
 	if (ret)
 		goto err_put_job;
 
-- 
2.45.1