From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Oded Gabbay, Omer Shpigelman, Sasha Levin
Subject: [PATCH AUTOSEL 5.7 213/388] habanalabs: increase timeout during reset
Date: Wed, 17 Jun 2020 21:05:10 -0400
Message-Id: <20200618010805.600873-213-sashal@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20200618010805.600873-1-sashal@kernel.org>
References:
 <20200618010805.600873-1-sashal@kernel.org>
X-stable: review
X-Mailing-List: linux-kernel@vger.kernel.org

From: Oded Gabbay

[ Upstream commit 7a65ee046b2238e053f6ebb610e1a082cfc49490 ]

During training, the DL framework (e.g. TensorFlow) performs hundreds of
thousands of memory allocations and mappings. If the driver needs to
perform a hard reset during training, it kills the application and unmaps
all of those memory allocations.

Unfortunately, because of the large number of mappings, the driver cannot
finish unmapping within the current timeout (5 seconds). Therefore,
increase the timeout significantly, to 30 seconds, to avoid a situation
where the driver resets the device with active mappings, which can
sometimes cause a kernel bug.

Note that this does not mean the driver will always spend the full 30
seconds, because the reset thread checks once per second whether the
unmap operation is done.

Reviewed-by: Omer Shpigelman
Signed-off-by: Oded Gabbay
Signed-off-by: Sasha Levin
---
 drivers/misc/habanalabs/habanalabs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/habanalabs.h b/drivers/misc/habanalabs/habanalabs.h
index 31ebcf9458fe..a6dd8e6ca594 100644
--- a/drivers/misc/habanalabs/habanalabs.h
+++ b/drivers/misc/habanalabs/habanalabs.h
@@ -23,7 +23,7 @@
 #define HL_MMAP_CB_MASK (0x8000000000000000ull >> PAGE_SHIFT)
-#define HL_PENDING_RESET_PER_SEC 5
+#define HL_PENDING_RESET_PER_SEC 30
 #define HL_DEVICE_TIMEOUT_USEC 1000000 /* 1 s */
-- 
2.25.1
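
The commit message describes a bounded wait that polls once per second and returns as soon as the unmap is done, so the larger value only raises the upper bound. The following is a minimal userspace sketch of that pattern and is not the driver's actual reset code. Only HL_PENDING_RESET_PER_SEC and its two values come from the patch; wait_for_unmap(), the callback types, struct fake_teardown and simulate() are hypothetical names made up for illustration, and the one-second sleep is replaced by a counter so the example runs instantly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value taken from the patch: maximum number of one-second polls. */
#define HL_PENDING_RESET_PER_SEC 30

/* Hypothetical callback types; these are NOT part of the habanalabs driver. */
typedef bool (*unmap_done_fn)(void *ctx);
typedef void (*sleep_one_sec_fn)(void *ctx);

/*
 * Poll once per "second" until is_done() reports completion or max_polls
 * polls have been used. Returns the number of seconds waited, or -1 if
 * the work was still pending when the budget ran out.
 */
static int wait_for_unmap(unmap_done_fn is_done, sleep_one_sec_fn sleep_fn,
                          void *ctx, int max_polls)
{
    assert(is_done != NULL && sleep_fn != NULL && max_polls > 0);

    for (int waited = 0; waited < max_polls; waited++) {
        if (is_done(ctx))
            return waited;      /* early exit: budget not fully spent */
        sleep_fn(ctx);
    }
    /* One last check after the final sleep. */
    return is_done(ctx) ? max_polls : -1;
}

/* Simulated workload: teardown completes after `needed` seconds. */
struct fake_teardown {
    int elapsed;
    int needed;
};

static bool fake_done(void *ctx)
{
    const struct fake_teardown *t = ctx;
    return t->elapsed >= t->needed;
}

static void fake_sleep(void *ctx)
{
    struct fake_teardown *t = ctx;
    t->elapsed++;               /* stands in for a real one-second sleep */
}

/* Run the wait loop against a teardown that needs `needed_sec` seconds. */
static int simulate(int needed_sec, int max_polls)
{
    struct fake_teardown t = { .elapsed = 0, .needed = needed_sec };
    return wait_for_unmap(fake_done, fake_sleep, &t, max_polls);
}
```

With an illustrative teardown that needs 12 seconds, simulate(12, 5) gives up and returns -1, while simulate(12, HL_PENDING_RESET_PER_SEC) returns 12, well short of the 30-second ceiling. The 12-second figure is an assumption chosen for the example, not a measurement from the driver.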