From: Tomeu Vizoso
Date: Tue, 9 Apr 2019 17:56:05 +0200
Subject: Re: [PATCH v2 3/3] drm/panfrost: Add initial panfrost driver
To: Rob Herring
Cc: Steven Price, Neil Armstrong, Maxime Ripard, Robin Murphy,
    Will Deacon, linux-kernel@vger.kernel.org, dri-devel, David Airlie,
    Linux IOMMU, "moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE",
    "Marty E. Plummer", Sean Paul, Alyssa Rosenzweig

On Mon, 8 Apr 2019 at 23:04, Rob Herring wrote:
>
> On Fri, Apr 5, 2019 at 7:30 AM Steven Price wrote:
> >
> > On 01/04/2019 08:47, Rob Herring wrote:
> > > This adds the initial driver for panfrost which supports Arm Mali
> > > Midgard and Bifrost family of GPUs. Currently, only the T860 and
> > > T760 Midgard GPUs have been tested.
> >
> > [...]
> >
> > > +
> > > +	if (status & JOB_INT_MASK_ERR(j)) {
> > > +		job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
> > > +		job_write(pfdev, JS_COMMAND(j), JS_COMMAND_HARD_STOP_0);
> >
> > Hard-stopping an already completed job isn't likely to do very much :)
> > Also you are using the "_0" version which is only valid when "job chain
> > disambiguation" is present.
>
> Yeah, guess that can be removed.
>
> > I suspect in this case you should also be signalling the fence? At the
> > moment you rely on the GPU timeout recovering from the fault.
>
> I'll defer to Tomeu who wrote this (IIRC).

Yes, that would be an improvement.
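Something along these lines is what I have in mind (an untested sketch
only; it assumes the per-slot job pointer and the job's done_fence as in
the current patch, so the exact names may differ):

----8<----
	if (status & JOB_INT_MASK_ERR(j)) {
		/* Assumed: the job currently on slot j, as tracked by the driver */
		struct panfrost_job *job = pfdev->jobs[j];

		job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);

		/*
		 * Signal the fence with an error right away instead of
		 * relying on the GPU timeout to recover from the fault.
		 */
		dma_fence_set_error(job->done_fence, -EINVAL);
		dma_fence_signal(job->done_fence);
	}
----8<----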
> > One issue that I haven't got to the bottom of is that I can trigger a
> > lockdep splat:
> >
> > -----8<------
> > panfrost ffa30000.gpu: js fault, js=1, status=JOB_CONFIG_FAULT, head=0x0, tail=0x0
> > root@debian:~/ddk_panfrost# panfrost ffa30000.gpu: gpu sched timeout, js=1,
> > status=0x40, head=0x0, tail=0x0, sched_job=12a94ba6
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.0.0+ #32 Not tainted
> > ------------------------------------------------------
> > kworker/1:0/608 is trying to acquire lock:
> > 89b1e2d8 (&(&js->job_lock)->rlock){-.-.}, at: dma_fence_remove_callback+0x14/0x50
> >
> > but task is already holding lock:
> > a887e4b2 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x24/0x10c
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
> >        drm_sched_process_job+0x60/0x208
> >        dma_fence_signal+0x1dc/0x1fc
> >        panfrost_job_irq_handler+0x160/0x194
> >        __handle_irq_event_percpu+0x80/0x388
> >        handle_irq_event_percpu+0x24/0x78
> >        handle_irq_event+0x38/0x5c
> >        handle_fasteoi_irq+0xb4/0x128
> >        generic_handle_irq+0x18/0x28
> >        __handle_domain_irq+0xa0/0xb4
> >        gic_handle_irq+0x4c/0x78
> >        __irq_svc+0x70/0x98
> >        arch_cpu_idle+0x20/0x3c
> >        arch_cpu_idle+0x20/0x3c
> >        do_idle+0x11c/0x22c
> >        cpu_startup_entry+0x18/0x20
> >        start_kernel+0x398/0x420
> >
> > -> #0 (&(&js->job_lock)->rlock){-.-.}:
> >        _raw_spin_lock_irqsave+0x50/0x64
> >        dma_fence_remove_callback+0x14/0x50
> >        drm_sched_stop+0x5c/0x10c
> >        panfrost_job_timedout+0xd0/0x180
> >        drm_sched_job_timedout+0x34/0x5c
> >        process_one_work+0x2ac/0x6a4
> >        worker_thread+0x28c/0x3fc
> >        kthread+0x13c/0x158
> >        ret_from_fork+0x14/0x20
> >        (null)
> >
> > other info that might help us debug this:
> >
> >  Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&(&sched->job_list_lock)->rlock);
> >                                lock(&(&js->job_lock)->rlock);
> >                                lock(&(&sched->job_list_lock)->rlock);
> >   lock(&(&js->job_lock)->rlock);
> >
> >  *** DEADLOCK ***
> >
> > 3 locks held by kworker/1:0/608:
> >  #0: 9b350627 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1f8/0x6a4
> >  #1: a802aa2d ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at: process_one_work+0x1f8/0x6a4
> >  #2: a887e4b2 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x24/0x10c
> >
> > stack backtrace:
> > CPU: 1 PID: 608 Comm: kworker/1:0 Not tainted 5.0.0+ #32
> > Hardware name: Rockchip (Device Tree)
> > Workqueue: events drm_sched_job_timedout
> > [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> > [] (show_stack) from [] (dump_stack+0x9c/0xd4)
> > [] (dump_stack) from [] (print_circular_bug.constprop.15+0x1fc/0x2cc)
> > [] (print_circular_bug.constprop.15) from [] (__lock_acquire+0xe5c/0x167c)
> > [] (__lock_acquire) from [] (lock_acquire+0xc4/0x210)
> > [] (lock_acquire) from [] (_raw_spin_lock_irqsave+0x50/0x64)
> > [] (_raw_spin_lock_irqsave) from [] (dma_fence_remove_callback+0x14/0x50)
> > [] (dma_fence_remove_callback) from [] (drm_sched_stop+0x5c/0x10c)
> > [] (drm_sched_stop) from [] (panfrost_job_timedout+0xd0/0x180)
> > [] (panfrost_job_timedout) from [] (drm_sched_job_timedout+0x34/0x5c)
> > [] (drm_sched_job_timedout) from [] (process_one_work+0x2ac/0x6a4)
> > [] (process_one_work) from [] (worker_thread+0x28c/0x3fc)
> > [] (worker_thread) from [] (kthread+0x13c/0x158)
> > [] (kthread) from [] (ret_from_fork+0x14/0x20)
> > Exception stack(0xeebd7fb0 to 0xeebd7ff8)
> > 7fa0:                                     00000000 00000000 00000000 00000000
> > 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > ----8<----
> >
> > This is with the below simple reproducer:
> >
> > ----8<----
> > #include <stdio.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> >
> > #include <sys/ioctl.h>
> > #include "panfrost_drm.h"
> >
> > int main(int argc, char **argv)
> > {
> > 	int fd;
> >
> > 	if (argc == 2)
> > 		fd = open(argv[1], O_RDWR);
> > 	else
> > 		fd = open("/dev/dri/renderD128", O_RDWR);
> > 	if (fd == -1) {
> > 		perror("Failed to open");
> > 		return 0;
> > 	}
> >
> > 	struct drm_panfrost_submit submit = {
> > 		.jc = 0,
> > 	};
> > 	return ioctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit);
> > }
> > ----8<----
> >
> > Any ideas? I'm not an expert on DRM, so I got somewhat lost!
>
> Tomeu?

Ran out of time today, but will be able to look at it tomorrow.

Thanks!

Tomeu
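P.S. In case anyone else wants to try the reproducer: I build it with
panfrost_drm.h from this series copied next to the source, roughly like
this (the file name and the libdrm include path are just my local
setup, so adjust as needed):

  gcc -o panfrost-submit panfrost-submit.c -I/usr/include/libdrm
  ./panfrost-submit /dev/dri/renderD128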