Date: Mon, 27 Jun 2022 11:00:56 +0200
From: Mauro Carvalho Chehab (by way of Mauro Carvalho Chehab)
To: Tvrtko Ursulin
Cc: Andi Shyti, Mauro Carvalho Chehab, Chris Wilson, Fei Yang,
 Thomas Hellstrom, Bruce Chang, Daniel Vetter, Dave Airlie, David Airlie,
 Jani Nikula, John Harrison, Joonas Lahtinen, Matt Roper, Matthew Brost,
 Rodrigo Vivi, Tejas Upadhyay, Umesh Nerlige Ramappa,
 dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, Mika Kuoppala, Chris Wilson,
 stable@vger.kernel.org, Thomas Hellström
Subject: Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
Message-ID: <20220627110056.6dfa4f9b@maurocar-mobl2>
In-Reply-To: <160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com>
References: <5ee647f243a774927ec328bfca8212abc4957909.1655306128.git.mchehab@kernel.org>
 <160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com>

Hi Tvrtko,

On Fri,
24 Jun 2022 09:34:21 +0100
Tvrtko Ursulin wrote:

> On 23/06/2022 12:17, Andi Shyti wrote:
> > Hi Mauro,
> >
> > On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
> >> From: Chris Wilson
> >>
> >> Don't allow two engines to be reset in parallel, as they would both
> >> try to select a reset bit (and send requests to common registers)
> >> and wait on that register, at the same time. Serialize control of
> >> the reset requests/acks using the uncore->lock, which will also ensure
> >> that no other GT state changes at the same time as the actual reset.
> >>
> >> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> >>
> >> Reported-by: Mika Kuoppala
> >> Signed-off-by: Chris Wilson
> >> Cc: Mika Kuoppala
> >> Cc: Andi Shyti
> >> Cc: stable@vger.kernel.org
> >> Acked-by: Thomas Hellström
> >> Signed-off-by: Mauro Carvalho Chehab
> >
> > Reviewed-by: Andi Shyti
>
> Notice I had a bunch of questions and asks in this series so please do
> not merge until those are addressed.
>
> In this particular patch (and some others) for instance Fixes: tag, at
> least against that sha, shouldn't be there.

Hmm... I sent an answer to your points, but I can't see it at:

https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com/

Maybe it got lost somewhere, I dunno.

Yeah, indeed the Fixes: tag on patch 5/6 should be removed, as this patch
is not directly related to changeset 7938d61591d3. Yet, this one is
required for patch 6 to work.

The other patches in this series, though, modify the code
introduced by changeset 7938d61591d3.

Patch 2 is clearly a workaround needed for TLB cache invalidation to
work on some GPUs. So, while not related to Broadwell, they're also
fixing some TLB cache issues. So, IMO, it should keep the Fixes: tag.

I tried to port just the two serialize patches to drm-tip, in order to
solve the issues on Broadwell, but it didn't work, as the logic
inside the spinlock could end up calling schedule() with a spinlock held:

Jun 14 17:38:48 silver kernel: [   23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
Jun 14 17:38:48 silver kernel: [   23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
Jun 14 17:38:48 silver kernel: [   23.227818] preempt_count: 1, expected: 0
Jun 14 17:38:48 silver kernel: [   23.227819] RCU nest depth: 0, expected: 0
Jun 14 17:38:48 silver kernel: [   23.227820] 5 locks held by kworker/u8:1/37:
Jun 14 17:38:48 silver kernel: [   23.227822]  #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
Jun 14 17:38:48 silver kernel: [   23.227831]  #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
Jun 14 17:38:48 silver kernel: [   23.227837]  #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
Jun 14 17:38:48 silver kernel: [   23.228283]  #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
Jun 14 17:38:48 silver kernel: [   23.228663]  #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]

I didn't investigate the root cause, but it seems related to PM, so
patches 1 and 3 seem to be required for the serialization logic to
actually work.

So, I would keep the Fixes: tag mentioning changeset 7938d61591d3 on
patches 1, 2, 3 and 6.

Yet, IMO the entire series should be merged into -stable.

If that's OK with you and there are no additional issues to be addressed,
I'll submit a v2 of this series removing the Fixes: tag from patches 4 and 5.

Regards,
Mauro