Message-ID: <7e6a9a27-7286-7f21-7fec-b9832b93b10c@linux.intel.com>
Date: Wed, 29 Jun 2022 17:02:59 +0100
Subject: Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
To: Mauro Carvalho Chehab
Cc: Andi Shyti, Mauro Carvalho Chehab, Chris Wilson, Fei Yang,
 Thomas Hellstrom, Bruce Chang, Daniel Vetter, Dave Airlie, David Airlie,
 Jani Nikula, John Harrison, Joonas Lahtinen, Matt Roper, Matthew Brost,
 Rodrigo Vivi, Tejas Upadhyay, Umesh Nerlige Ramappa,
 dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, Mika Kuoppala, Chris Wilson,
 stable@vger.kernel.org, Thomas Hellström
References: <5ee647f243a774927ec328bfca8212abc4957909.1655306128.git.mchehab@kernel.org>
 <160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com>
 <20220627110056.6dfa4f9b@maurocar-mobl2>
 <20220629172955.64ffb5c3@maurocar-mobl2>
From: Tvrtko Ursulin
Organization: Intel Corporation UK Plc
In-Reply-To: <20220629172955.64ffb5c3@maurocar-mobl2>
X-Mailing-List: linux-kernel@vger.kernel.org

On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> On Tue, 28 Jun 2022 16:49:23 +0100
> Tvrtko Ursulin wrote:
>
>> .. which for me means a different patch 1, followed by patch 6 (moved
>> to be patch 2) would be ideal stable material.
>>
>> Then we have the current patch 2 which is open/unknown (to me at least).
>>
>> And the rest seem like optimisations which shouldn't be tagged as fixes.
>>
>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
>>
>> Could you please double check if what I am suggesting here is feasible
>> to implement and if it is just send those minimal patches out alone?
>
> Tested and porting just those 3 patches are enough to fix the Broadwell
> bug.
>
> So, I submitted a v2 of this series with just those. They all need to
> be backported to stable.

I would really like to give an even smaller fix a try. Something like
this, although it is not even compile tested:

commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
Author: Chris Wilson
Date:   Wed Jun 29 16:25:24 2022 +0100

    drm/i915/gt: Serialize TLB invalidates with GT resets

    Avoid trying to invalidate the TLB in the middle of performing an
    engine reset, as this may result in the reset timing out. Currently,
    the TLB invalidate is only serialised by its own mutex, forgoing the
    uncore lock, but we can take the uncore->lock as well to serialise
    the mmio access, thereby serialising with the GDRST.

    Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
    i915 selftest/hangcheck.

    Cc: stable@vger.kernel.org
    Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
    Reported-by: Mauro Carvalho Chehab
    Tested-by: Mauro Carvalho Chehab
    Reviewed-by: Mauro Carvalho Chehab
    Signed-off-by: Chris Wilson
    Cc: Tvrtko Ursulin
    Acked-by: Thomas Hellström
    Reviewed-by: Andi Shyti
    Signed-off-by: Mauro Carvalho Chehab
    Signed-off-by: Tvrtko Ursulin

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8da3314bb6bf..aaadd0b02043 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (!i915_mmio_reg_offset(rb.reg))
+			continue;
+
+		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+	}
+
+	spin_unlock_irq(&uncore->lock);
+
 	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
-		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,

If this works it would be the least painful to backport. The other
improvements can then be devoid of the fixes tag.

> I still think that other TLB patches are needed/desired upstream, but
> I'll submit them on a separate series. Let's fix the regression first ;-)

Yep, that's exactly right.

Regards,

Tvrtko
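
For anyone following along without the driver source to hand, the ordering the diff above is aiming for can be sketched in plain user-space C. This is only an illustration under loud assumptions: `fake_lock`, `fake_engine`, `fake_read` and `invalidate_all` are hypothetical stand-ins invented here, not i915 code, and the "lock" merely records whether it is held. The point is the shape: issue every invalidate write while holding the lock, drop it, then poll each engine for completion without the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALIDATE_BIT 0x1u

/* Hypothetical stand-in for uncore->lock: only tracks whether it is held. */
struct fake_lock {
	bool held;
};

/* Hypothetical engine: one "register" and the polls left until it clears. */
struct fake_engine {
	uint32_t reg;
	unsigned int polls_until_done;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Simulated register read: the bit self-clears once the countdown hits 0. */
static uint32_t fake_read(struct fake_engine *e)
{
	if (e->polls_until_done > 0 && --e->polls_until_done == 0)
		e->reg &= ~INVALIDATE_BIT;
	return e->reg;
}

/* Returns how many engines completed within max_polls reads each. */
static size_t invalidate_all(struct fake_lock *lock,
			     struct fake_engine *engines, size_t n,
			     unsigned int max_polls)
{
	size_t done = 0;

	/* Phase 1: issue all writes under the lock, as the first loop does. */
	fake_lock_acquire(lock);
	for (size_t i = 0; i < n; i++)
		engines[i].reg |= INVALIDATE_BIT;
	fake_lock_release(lock);

	/* Phase 2: wait for each engine with the lock already dropped. */
	for (size_t i = 0; i < n; i++) {
		assert(!lock->held);
		for (unsigned int p = 0; p < max_polls; p++) {
			if (!(fake_read(&engines[i]) & INVALIDATE_BIT)) {
				done++;
				break;
			}
		}
	}

	return done;
}
```

Keeping the wait outside the locked region means the lock is only held for the short burst of writes, so a slow or stuck engine in this model never extends the time the lock is held.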