From: Adrián Larumbe
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie, Daniel Vetter
Cc: kernel@collabora.com, Adrián Larumbe, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/7] drm/panthor: introduce job cycle and timestamp accounting
Date: Thu, 6 Jun 2024 01:49:53 +0100
Message-ID: <20240606005416.1172431-2-adrian.larumbe@collabora.com>
In-Reply-To: <20240606005416.1172431-1-adrian.larumbe@collabora.com>
References: <20240606005416.1172431-1-adrian.larumbe@collabora.com>

Enable calculations of job submission times in clock cycles and wall
time. This is done by expanding the boilerplate command stream when running
a job to include instructions that sample said times right before and after
a user CS.

Those samples are stored in the queue's group's sync objects BO, right
after the sync objects themselves. Because the queues in a group might have
a different number of slots, one must keep track of the overall slot tally
when reckoning the offset of a queue's time sample structs, one for each
slot.

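The offset bookkeeping described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the two
structs are stand-ins that mirror the layout used in this patch (their
field sizes are assumed), and queue_time_offset() and syncobj_bo_size() are
hypothetical helper names, not functions from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_INSTRS_PER_SLOT 32
#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(uint64_t))

/* Stand-in for struct panthor_syncobj_64b (layout is an assumption). */
struct syncobj_64b {
	uint64_t seqno;
	uint32_t status;
	uint32_t pad;
};

/* Stand-in for struct panthor_job_times as introduced by this patch. */
struct job_times {
	struct { uint64_t before, after; } cycles;
	struct { uint64_t before, after; } time;
};

/*
 * Hypothetical helper: byte offset, inside the group's syncobjs BO, of the
 * first job_times sample that belongs to queue q. The samples start right
 * after the array of sync objects, and each preceding queue contributes
 * ringbuf_size / SLOTSIZE samples.
 */
static inline size_t queue_time_offset(const size_t *ringbuf_sizes,
				       size_t nqueues, size_t q)
{
	size_t slots_so_far = 0;

	assert(q < nqueues);
	for (size_t i = 0; i < q; i++)
		slots_so_far += ringbuf_sizes[i] / SLOTSIZE;

	return nqueues * sizeof(struct syncobj_64b) +
	       slots_so_far * sizeof(struct job_times);
}

/* Hypothetical helper: total BO size, sync objects plus all samples. */
static inline size_t syncobj_bo_size(const size_t *ringbuf_sizes,
				     size_t nqueues)
{
	size_t total_slots = 0;

	for (size_t i = 0; i < nqueues; i++)
		total_slots += ringbuf_sizes[i] / SLOTSIZE;

	return nqueues * sizeof(struct syncobj_64b) +
	       total_slots * sizeof(struct job_times);
}
```

With two queues whose ring buffers are 4096 and 8192 bytes, the first queue
owns 16 samples and the second 32, so the second queue's samples begin 16
job_times structs past the end of the sync object array.
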
NUM_INSTRS_PER_SLOT had to be increased to 32 because of adding new FW
instructions for storing and subtracting the cycle counter and timestamp
register, and it must always remain a power of two.

This commit is done in preparation for enabling DRM fdinfo support in the
Panthor driver, which depends on the numbers calculated herein.

Signed-off-by: Adrián Larumbe
Reviewed-by: Liviu Dudau
---
 drivers/gpu/drm/panthor/panthor_sched.c | 156 ++++++++++++++++++++----
 1 file changed, 132 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 79ffcbc41d78..62a67d6bd37a 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -93,6 +93,9 @@
 #define MIN_CSGS				3
 #define MAX_CSG_PRIO				0xf
 
+#define NUM_INSTRS_PER_SLOT			32
+#define SLOTSIZE				(NUM_INSTRS_PER_SLOT * sizeof(u64))
+
 struct panthor_group;
 
 /**
@@ -466,6 +469,9 @@ struct panthor_queue {
 		 */
 		struct list_head in_flight_jobs;
 	} fence_ctx;
+
+	/** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */
+	unsigned long time_offset;
 };
 
 /**
@@ -592,7 +598,17 @@ struct panthor_group {
 	 * One sync object per queue. The position of the sync object is
 	 * determined by the queue index.
 	 */
-	struct panthor_kernel_bo *syncobjs;
+
+	struct {
+		/** @bo: Kernel BO holding the sync objects. */
+		struct panthor_kernel_bo *bo;
+
+		/**
+		 * @job_times_offset: Beginning of panthor_job_times struct samples after
+		 * the group's array of sync objects.
+		 */
+		size_t job_times_offset;
+	} syncobjs;
 
 	/** @state: Group state. */
 	enum panthor_group_state state;
@@ -651,6 +667,18 @@ struct panthor_group {
 	struct list_head wait_node;
 };
 
+struct panthor_job_times {
+	struct {
+		u64 before;
+		u64 after;
+	} cycles;
+
+	struct {
+		u64 before;
+		u64 after;
+	} time;
+};
+
 /**
  * group_queue_work() - Queue a group work
  * @group: Group to queue the work for.
@@ -730,6 +758,9 @@ struct panthor_job {
 	/** @queue_idx: Index of the queue inside @group. */
 	u32 queue_idx;
 
+	/** @ringbuf_idx: Index of the ringbuffer inside @queue. */
+	u32 ringbuf_idx;
+
 	/** @call_info: Information about the userspace command stream call. */
 	struct {
 		/** @start: GPU address of the userspace command stream. */
@@ -844,7 +875,7 @@ static void group_release_work(struct work_struct *work)
 
 	panthor_kernel_bo_destroy(group->suspend_buf);
 	panthor_kernel_bo_destroy(group->protm_suspend_buf);
-	panthor_kernel_bo_destroy(group->syncobjs);
+	panthor_kernel_bo_destroy(group->syncobjs.bo);
 
 	panthor_vm_put(group->vm);
 	kfree(group);
@@ -1969,8 +2000,6 @@ tick_ctx_init(struct panthor_scheduler *sched,
 	}
 }
 
-#define NUM_INSTRS_PER_SLOT		16
-
 static void
 group_term_post_processing(struct panthor_group *group)
 {
@@ -2007,7 +2036,7 @@ group_term_post_processing(struct panthor_group *group)
 		spin_unlock(&queue->fence_ctx.lock);
 
 		/* Manually update the syncobj seqno to unblock waiters. */
-		syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj));
+		syncobj = group->syncobjs.bo->kmap + (i * sizeof(*syncobj));
 		syncobj->status = ~0;
 		syncobj->seqno = atomic64_read(&queue->fence_ctx.seqno);
 		sched_queue_work(group->ptdev->scheduler, sync_upd);
@@ -2780,7 +2809,7 @@ static void group_sync_upd_work(struct work_struct *work)
 		if (!queue)
 			continue;
 
-		syncobj = group->syncobjs->kmap + (queue_idx * sizeof(*syncobj));
+		syncobj = group->syncobjs.bo->kmap + (queue_idx * sizeof(*syncobj));
 
 		spin_lock(&queue->fence_ctx.lock);
 		list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) {
@@ -2815,11 +2844,17 @@ queue_run_job(struct drm_sched_job *sched_job)
 	struct panthor_scheduler *sched = ptdev->scheduler;
 	u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf);
 	u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1);
+	u32 ringbuf_index = ringbuf_insert / (SLOTSIZE);
 	u64 addr_reg = ptdev->csif_info.cs_reg_count -
 		       ptdev->csif_info.unpreserved_cs_reg_count;
 	u64 val_reg = addr_reg + 2;
-	u64 sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) +
-			job->queue_idx * sizeof(struct panthor_syncobj_64b);
+	u64 cycle_reg = addr_reg;
+	u64 time_reg = val_reg;
+	u64 sync_addr = panthor_kernel_bo_gpuva(group->syncobjs.bo) +
+		job->queue_idx * sizeof(struct panthor_syncobj_64b);
+	u64 times_addr = panthor_kernel_bo_gpuva(group->syncobjs.bo) + queue->time_offset +
+		(ringbuf_index * sizeof(struct panthor_job_times));
+
 	u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0);
 	struct dma_fence *done_fence;
 	int ret;
@@ -2831,6 +2866,18 @@ queue_run_job(struct drm_sched_job *sched_job)
 		/* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */
 		(36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233,
 
+		/* MOV48 rX:rX+1, cycles_offset */
+		(1ull << 56) | (cycle_reg << 48) | (times_addr + offsetof(struct panthor_job_times, cycles.before)),
+
+		/* MOV48 rX:rX+1, time_offset */
+		(1ull << 56) | (time_reg << 48) | (times_addr + offsetof(struct panthor_job_times, time.before)),
+
+		/* STORE_STATE cycles */
+		(40ull << 56) | (cycle_reg << 40) | (1ll << 32),
+
+		/* STORE_STATE timer */
+		(40ull << 56) | (time_reg << 40) | (0ll << 32),
+
 		/* MOV48 rX:rX+1, cs.start */
 		(1ull << 56) | (addr_reg << 48) | job->call_info.start,
 
@@ -2843,6 +2890,18 @@ queue_run_job(struct drm_sched_job *sched_job)
 		/* CALL rX:rX+1, rX+2 */
 		(32ull << 56) | (addr_reg << 40) | (val_reg << 32),
 
+		/* MOV48 rX:rX+1, cycles_offset */
+		(1ull << 56) | (cycle_reg << 48) | (times_addr + offsetof(struct panthor_job_times, cycles.after)),
+
+		/* MOV48 rX:rX+1, time_offset */
+		(1ull << 56) | (time_reg << 48) | (times_addr + offsetof(struct panthor_job_times, time.after)),
+
+		/* STORE_STATE cycles */
+		(40ull << 56) | (cycle_reg << 40) | (1ll << 32),
+
+		/* STORE_STATE timer */
+		(40ull << 56) | (time_reg << 40) | (0ll << 32),
+
 		/* MOV48 rX:rX+1, sync_addr */
 		(1ull << 56) | (addr_reg << 48) | sync_addr,
 
@@ -2897,6 +2956,7 @@ queue_run_job(struct drm_sched_job *sched_job)
 
 	job->ringbuf.start = queue->iface.input->insert;
 	job->ringbuf.end = job->ringbuf.start + sizeof(call_instrs);
+	job->ringbuf_idx = ringbuf_index;
 
 	/* Make sure the ring buffer is updated before the INSERT
 	 * register.
@@ -2987,7 +3047,8 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = {
 
 static struct panthor_queue *
 group_create_queue(struct panthor_group *group,
-		   const struct drm_panthor_queue_create *args)
+		   const struct drm_panthor_queue_create *args,
+		   unsigned int slots_so_far)
 {
 	struct drm_gpu_scheduler *drm_sched;
 	struct panthor_queue *queue;
@@ -3038,9 +3099,12 @@ group_create_queue(struct panthor_group *group,
 		goto err_free_queue;
 	}
 
+	queue->time_offset = group->syncobjs.job_times_offset +
+		(slots_so_far * sizeof(struct panthor_job_times));
+
 	ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
 			     group->ptdev->scheduler->wq, 1,
-			     args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)),
+			     args->ringbuf_size / SLOTSIZE,
 			     0, msecs_to_jiffies(JOB_TIMEOUT_MS),
 			     group->ptdev->reset.wq,
 			     NULL, "panthor-queue", group->ptdev->base.dev);
@@ -3068,7 +3132,9 @@ int panthor_group_create(struct panthor_file *pfile,
 	struct panthor_scheduler *sched = ptdev->scheduler;
 	struct panthor_fw_csg_iface *csg_iface = panthor_fw_get_csg_iface(ptdev, 0);
 	struct panthor_group *group = NULL;
+	unsigned int total_slots;
 	u32 gid, i, suspend_size;
+	size_t syncobj_bo_size;
 	int ret;
 
 	if (group_args->pad)
@@ -3134,33 +3200,75 @@ int panthor_group_create(struct panthor_file *pfile,
 		goto err_put_group;
 	}
 
-	group->syncobjs = panthor_kernel_bo_create(ptdev, group->vm,
-						   group_args->queues.count *
-						   sizeof(struct panthor_syncobj_64b),
-						   DRM_PANTHOR_BO_NO_MMAP,
-						   DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
-						   DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
-						   PANTHOR_VM_KERNEL_AUTO_VA);
-	if (IS_ERR(group->syncobjs)) {
-		ret = PTR_ERR(group->syncobjs);
+	/*
+	 * Need to add size for the panthor_job_times structs, as many as the sum
+	 * of the number of job slots for every single queue ringbuffer.
+	 */
+	for (i = 0, total_slots = 0; i < group_args->queues.count; i++)
+		total_slots += (queue_args[i].ringbuf_size / (SLOTSIZE));
+
+	syncobj_bo_size = (group_args->queues.count * sizeof(struct panthor_syncobj_64b))
+		+ (total_slots * sizeof(struct panthor_job_times));
+
+	/*
+	 * Memory layout of group's syncobjs BO
+	 * group->syncobjs.bo {
+	 *	struct panthor_syncobj_64b sync1;
+	 *	struct panthor_syncobj_64b sync2;
+	 *		...
+	 *		As many as group_args->queues.count
+	 *		...
+	 *	struct panthor_syncobj_64b syncn;
+	 *	struct panthor_job_times queue1_slot1
+	 *	struct panthor_job_times queue1_slot2
+	 *		...
+	 *		As many as queue[i].ringbuf_size / SLOTSIZE
+	 *		...
+	 *	struct panthor_job_times queue1_slotP
+	 *		...
+	 *		As many as group_args->queues.count
+	 *		...
+	 *	struct panthor_job_times queueN_slot1
+	 *	struct panthor_job_times queueN_slot2
+	 *		...
+	 *		As many as queue[n].ringbuf_size / SLOTSIZE
+	 *	struct panthor_job_times queueN_slotQ
+	 *
+	 *	Linearly, group->syncobjs.bo = {syncobj1,..,syncobjN,
+	 *	{queue1 = {js1,..,jsP},..,queueN = {js1,..,jsQ}}}
+	 * }
+	 *
+	 */
+
+	group->syncobjs.bo = panthor_kernel_bo_create(ptdev, group->vm,
+						      syncobj_bo_size,
+						      DRM_PANTHOR_BO_NO_MMAP,
+						      DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
+						      DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
+						      PANTHOR_VM_KERNEL_AUTO_VA);
+	if (IS_ERR(group->syncobjs.bo)) {
+		ret = PTR_ERR(group->syncobjs.bo);
 		goto err_put_group;
 	}
 
-	ret = panthor_kernel_bo_vmap(group->syncobjs);
+	ret = panthor_kernel_bo_vmap(group->syncobjs.bo);
 	if (ret)
 		goto err_put_group;
 
-	memset(group->syncobjs->kmap, 0,
-	       group_args->queues.count * sizeof(struct panthor_syncobj_64b));
+	memset(group->syncobjs.bo->kmap, 0, syncobj_bo_size);
+
+	group->syncobjs.job_times_offset =
+		group_args->queues.count * sizeof(struct panthor_syncobj_64b);
 
-	for (i = 0; i < group_args->queues.count; i++) {
-		group->queues[i] = group_create_queue(group, &queue_args[i]);
+	for (i = 0, total_slots = 0; i < group_args->queues.count; i++) {
+		group->queues[i] = group_create_queue(group, &queue_args[i], total_slots);
 		if (IS_ERR(group->queues[i])) {
 			ret = PTR_ERR(group->queues[i]);
 			group->queues[i] = NULL;
 			goto err_put_group;
 		}
 
+		total_slots += (queue_args[i].ringbuf_size / (SLOTSIZE));
 		group->queue_count++;
 	}
 
-- 
2.45.1
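
For readers following the slot arithmetic in queue_run_job(), the mapping
from the ring buffer insert pointer to a job_times slot can be sketched in
isolation. This is a user-space illustration only: ringbuf_slot_index() is a
hypothetical name, and the assumption that the ring buffer size is a power
of two is inferred from the masking done in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_INSTRS_PER_SLOT 32
#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(uint64_t))

/* The commit message requires the per-slot instruction count to stay a
 * power of two; checking it at compile time is an addition of this sketch. */
_Static_assert((NUM_INSTRS_PER_SLOT & (NUM_INSTRS_PER_SLOT - 1)) == 0,
	       "NUM_INSTRS_PER_SLOT must be a power of two");

/*
 * Hypothetical helper: wrap the monotonically growing insert pointer into
 * the ring buffer, then divide by the slot size to get the slot index that
 * selects the job_times sample for this job.
 */
static inline uint32_t ringbuf_slot_index(uint64_t insert, uint32_t ringbuf_size)
{
	/* Masking only works when ringbuf_size is a non-zero power of two. */
	assert(ringbuf_size != 0 && (ringbuf_size & (ringbuf_size - 1)) == 0);

	uint32_t wrapped = (uint32_t)(insert & (ringbuf_size - 1));

	return (uint32_t)(wrapped / SLOTSIZE);
}
```

Because each slot is SLOTSIZE bytes, a 4096-byte ring buffer yields 16
slots, and an insert pointer that has wrapped once maps back to the same
slot indices as on the first pass.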