Date: Fri, 27 May 2022 13:50:22 +0200
From: Mauro Carvalho Chehab
To: Tvrtko Ursulin
Cc: Mauro Carvalho Chehab, Andi Shyti, Daniel Vetter,
 Daniele Ceraolo Spurio, David Airlie, Jani Nikula, John Harrison,
 Joonas Lahtinen, Lucas De Marchi, Matt Roper, Matthew Auld,
 Rodrigo Vivi, dri-devel@lists.freedesktop.org,
 intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org,
 Tvrtko Ursulin, Sushma Venkatesh Reddy, Daniel Vetter, Dave Airlie,
 Jon Bloomfield, Jani Nikula, stable@vger.kernel.org
Subject: Re: [PATCH] drm/i915: don't flush TLB on GEN8
Message-ID: <20220527135022.0dd0891d@maurocar-mobl2>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 27 May 2022 11:55:42 +0100
Tvrtko Ursulin wrote:

> On 27/05/2022 10:09, Mauro Carvalho Chehab wrote:
> > i915 selftest hangcheck is causing the i915 driver timeouts, as
> > reported by Intel CI:
> >
> > http://gfx-ci.fi.intel.com/cibuglog-ng/issuefilterassoc/24297?query_key=42a999f48fa6ecce068bc8126c069be7c31153b4
> >
> > When such test runs, the only output is:
> >
> > [ 68.811639] i915: Performing live selftests with st_random_seed=0xe138eac7 st_timeout=500
> > [ 68.811792] i915: Running hangcheck
> > [ 68.811859] i915: Running intel_hangcheck_live_selftests/igt_hang_sanitycheck
> > [ 68.816910] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
> > [ 68.841597] i915: Running intel_hangcheck_live_selftests/igt_reset_nop
> > [ 69.346347] igt_reset_nop: 80 resets
> > [ 69.362695] i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
> > [ 69.863559] igt_reset_nop_engine(rcs0): 709 resets
> > [ 70.364924] igt_reset_nop_engine(bcs0): 903 resets
> > [ 70.866005] igt_reset_nop_engine(vcs0): 659 resets
> > [ 71.367934] igt_reset_nop_engine(vcs1): 549 resets
> > [ 71.869259] igt_reset_nop_engine(vecs0): 553 resets
> > [ 71.882592] i915: Running intel_hangcheck_live_selftests/igt_reset_idle_engine
> > [ 72.383554] rcs0: Completed 16605 idle resets
> > [ 72.884599] bcs0: Completed 18641 idle resets
> > [ 73.385592] vcs0: Completed 17517 idle resets
> > [ 73.886658] vcs1: Completed 15474 idle resets
> > [ 74.387600] vecs0: Completed 17983 idle resets
> > [ 74.387667] i915: Running intel_hangcheck_live_selftests/igt_reset_active_engine
> > [ 74.889017] rcs0: Completed 747 active resets
> > [ 75.174240] intel_engine_reset(bcs0) failed, err:-110
> > [ 75.174301] bcs0: Completed 525 active resets
> >
> > After that, the machine just silently hangs.
> >
> > The root cause is that the flush TLB logic is not working as
> > expected on GEN8.
> >
> > Tested on an Intel NUC5i7RYB with an i7-5557U Broadwell CPU.
> >
> > This patch partially reverts the logic by skipping GEN8 from
> > the TLB cache flush.
>
> Since I am pretty sure no such failures were spotted when merging the
> feature I assume the failure is sporadic and/or limited to some
> configurations? Do you have any details there? Because it is an
> important security issue we should not revert it lightly.

It occurs every time here:

https://intel-gfx-ci.01.org/tree/drm-tip/fi-bdw-5557u.html

It also happens every time on my own NUC5i7RYB when the TLB patch is
applied. Reverting it (or applying this fix) is enough for hangcheck
to pass.

I suspect that the TLB flush never happens there, causing ETIMEDOUT
(-110) in hangcheck.

It could indeed be limited to some specific setups; I don't know.
The only Gen8 machine I have access to is my own NUC, so I can't
test it elsewhere.

Regards,
Mauro