Subject: Re: [PATCH] scsi/eh: fix hang adding ehandler wakeups after decrementing host_busy
From: Pavel Tikhomirov
To: Stuart Hayes
Cc: "James E . J . Bottomley", "Martin K . Petersen", Christoph Hellwig, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, Konstantin Khorenko, devel@openvz.org
Date: Wed, 22 Nov 2017 10:01:05 +0300
Message-ID: <394ef6b9-348c-3db8-0456-d46e49190a25@virtuozzo.com>
In-Reply-To: <173c0be5-e4ab-452d-e5f6-2852147eb040@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Great news that it works for you! Thanks a lot!

Pavel

On 11/22/2017 03:49 AM, Stuart Hayes wrote:
> My apologies... yes, your patch also fixes my issue. I was looking at the two new places from which you were calling scsi_eh_wakeup(), and didn't notice that you moved the spinlock in scsi_device_unbusy()... moving the spinlock in scsi_device_unbusy() should also fix the issue I'm seeing, given that scsi_eh_scmd_add() also uses the spinlock.
>
> I tested your patch on my issue, and it did indeed fix it.
>
> So you can add...
>
> Tested-by: Stuart Hayes
>
> Thanks
> Stuart
>
>
> On 11/21/2017 2:09 AM, Pavel Tikhomirov wrote:
>> My patch should fix your issue too; please see the explanation in my reply to your patch. Does your testing show that it doesn't?
>>
>> Thanks, Pavel.
>>
>> On 11/21/2017 09:10 AM, Stuart Hayes wrote:
>>> Pavel,
>>>
>>> It turns out that the error handler on our systems was not getting woken up for a different reason... I submitted a patch earlier today that fixes the issue I was seeing (I CCed you on the patch).
>>>
>>> Before I got my hands on the failing system and was able to root-cause it, I was pretty sure that your patch was going to fix our issue, because after I examined the code paths, I couldn't find any other reason that the error handler would not get woken up. I tried forcing the bug that your patch fixes to occur, by compiling in some mdelay()s in a key place or two in the scsi code, but it never failed for me that way. With my patch, several systems that previously failed in 10 minutes or less successfully ran for many days.
>>>
>>> Thanks,
>>> Stuart
>>>
>>> On 11/9/2017 8:54 AM, Pavel Tikhomirov wrote:
>>>>> Are there any issues with this patch (https://patchwork.kernel.org/patch/9938919/) that Pavel Tikhomirov submitted back in September? I am willing to help if there's anything I can do to help get it accepted.
>>>>
>>>> Hi Stuart, I asked James Bottomley about the patch status off-list, and it seems that the problem is that the patch lacks testing and review. I would highly appreciate a review from your side and from anyone else who wants to participate!
>>>>
>>>> And if you can confirm that the patch solves the problem in your environment with no side effects, please also add a "Tested-by:" tag.
>>>>
>>>> Thanks, Pavel
>>>>
>>>> On 09/05/2017 03:54 PM, Pavel Tikhomirov wrote:
>>>>> We have a problem with scsi EH on several of our nodes. Imagine the
>>>>> following order of execution of two threads:
>>>>>
>>>>> CPU1 scsi_eh_scmd_add              CPU2 scsi_host_queue_ready
>>>>> /* shost->host_busy == 1 initially */
>>>>>
>>>>>                                    if (shost->shost_state == SHOST_RECOVERY)
>>>>>                                        /* does not get here */
>>>>>                                        return 0;
>>>>>
>>>>> lock(shost->host_lock);
>>>>> shost->shost_state = SHOST_RECOVERY;
>>>>>
>>>>>                                    busy = shost->host_busy++;
>>>>>                                    /* host->can_queue == 1 initially, busy == 1
>>>>>                                     * - go to starved label */
>>>>>                                    lock(shost->host_lock) /* wait */
>>>>>
>>>>> shost->host_failed++;
>>>>> /* shost->host_busy == 2, shost->host_failed == 1 */
>>>>> call scsi_eh_wakeup(shost) {
>>>>>     if (host_busy == host_failed) {
>>>>>         /* does not get here */
>>>>>         wake_up_process(shost->ehandler)
>>>>>     }
>>>>> }
>>>>> unlock(shost->host_lock)
>>>>>
>>>>>                                    /* acquire lock */
>>>>>                                    shost->host_busy--;
>>>>>
>>>>> Finally, we never wake up scsi_error_handler, and all further
>>>>> incoming commands hang because we are in a never-ending recovery
>>>>> state with no one left to wake up the handler.
>>>>>
>>>>> So the scsi disk on this host becomes unresponsive and all bios on
>>>>> the node hang. (We trigger this problem when scsi cmnds to a DVD
>>>>> drive time out.)
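The interleaving above can be replayed deterministically. The following is a minimal single-threaded C sketch, not part of the patch: `struct model_host`, `model_eh_wakeup()` and `replay_unpatched_race()` are hypothetical names, plain `int`s stand in for the kernel's atomics, and `host_lock` is represented only by the order of the steps. Under those assumptions it ends in the reported state: the host is in recovery with host_busy == host_failed == 1, yet the error handler was never woken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the struct Scsi_Host
 * fields involved in the race. Not the kernel's definitions: plain ints
 * replace the atomics, and host_lock is implied by the step order. */
enum model_state { MODEL_RUNNING, MODEL_RECOVERY };

struct model_host {
	enum model_state shost_state;
	int host_busy;
	int host_failed;
	int can_queue;
	bool ehandler_woken;
};

/* Same condition as scsi_eh_wakeup(): wake the handler only once every
 * busy command has failed. */
static void model_eh_wakeup(struct model_host *h)
{
	if (h->host_busy == h->host_failed)
		h->ehandler_woken = true;
}

/* Replays the CPU1/CPU2 interleaving from the report, step by step. */
static struct model_host replay_unpatched_race(void)
{
	struct model_host h = {
		.shost_state = MODEL_RUNNING,
		.host_busy = 1, /* one command already in flight */
		.host_failed = 0,
		.can_queue = 1,
		.ehandler_woken = false,
	};
	int busy;

	/* CPU2, scsi_host_queue_ready(): host is not in recovery yet. */
	if (h.shost_state == MODEL_RECOVERY)
		return h; /* not taken in this interleaving */

	/* CPU1, scsi_eh_scmd_add() under host_lock: enter recovery. */
	h.shost_state = MODEL_RECOVERY;

	/* CPU2: claim a slot; the old value 1 >= can_queue, so back off. */
	busy = h.host_busy++;
	assert(busy >= h.can_queue); /* the "go to starved label" branch */

	/* CPU1: mark the in-flight command failed and try to wake the
	 * handler, but host_busy (2) != host_failed (1). */
	h.host_failed++;
	model_eh_wakeup(&h);

	/* CPU2: give the slot back; the unpatched path does not recheck. */
	h.host_busy--;

	return h;
}
```

Because the unpatched retreat path in scsi_host_queue_ready() only decrements host_busy, nothing re-evaluates the wakeup condition after the last step, which is the stuck recovery described above.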
>>>>>
>>>>> The main idea of the fix is to try to wake up the handler every time
>>>>> we decrement host_busy or increment host_failed (the latter is
>>>>> already OK).
>>>>>
>>>>> Now the very *last* of the busy threads to take host_lock after
>>>>> decrementing host_busy will see all write operations on the host's
>>>>> shost_state, host_busy and host_failed completed, thanks to the
>>>>> memory barriers implied by spin_lock/unlock, so once busy == failed
>>>>> we will trigger the wakeup in at least one thread. (That's why the
>>>>> recovery and failed checks are put under the lock.)
>>>>>
>>>>> Signed-off-by: Pavel Tikhomirov
>>>>> ---
>>>>>  drivers/scsi/scsi_lib.c | 21 +++++++++++++++++----
>>>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>>> index f6097b89d5d3..6c99221d60aa 100644
>>>>> --- a/drivers/scsi/scsi_lib.c
>>>>> +++ b/drivers/scsi/scsi_lib.c
>>>>> @@ -320,12 +320,11 @@ void scsi_device_unbusy(struct scsi_device *sdev)
>>>>>  	if (starget->can_queue > 0)
>>>>>  		atomic_dec(&starget->target_busy);
>>>>>  
>>>>> +	spin_lock_irqsave(shost->host_lock, flags);
>>>>>  	if (unlikely(scsi_host_in_recovery(shost) &&
>>>>> -		     (shost->host_failed || shost->host_eh_scheduled))) {
>>>>> -		spin_lock_irqsave(shost->host_lock, flags);
>>>>> +		     (shost->host_failed || shost->host_eh_scheduled)))
>>>>>  		scsi_eh_wakeup(shost);
>>>>> -		spin_unlock_irqrestore(shost->host_lock, flags);
>>>>> -	}
>>>>> +	spin_unlock_irqrestore(shost->host_lock, flags);
>>>>>  
>>>>>  	atomic_dec(&sdev->device_busy);
>>>>>  }
>>>>> @@ -1503,6 +1502,13 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
>>>>>  	spin_unlock_irq(shost->host_lock);
>>>>>  out_dec:
>>>>>  	atomic_dec(&shost->host_busy);
>>>>> +
>>>>> +	spin_lock_irq(shost->host_lock);
>>>>> +	if (unlikely(scsi_host_in_recovery(shost) &&
>>>>> +		     (shost->host_failed || shost->host_eh_scheduled)))
>>>>> +		scsi_eh_wakeup(shost);
>>>>> +	spin_unlock_irq(shost->host_lock);
>>>>> +
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> @@ -1964,6 +1970,13 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>>>  
>>>>>  out_dec_host_busy:
>>>>>  	atomic_dec(&shost->host_busy);
>>>>> +
>>>>> +	spin_lock_irq(shost->host_lock);
>>>>> +	if (unlikely(scsi_host_in_recovery(shost) &&
>>>>> +		     (shost->host_failed || shost->host_eh_scheduled)))
>>>>> +		scsi_eh_wakeup(shost);
>>>>> +	spin_unlock_irq(shost->host_lock);
>>>>> +
>>>>>  out_dec_target_busy:
>>>>>  	if (scsi_target(sdev)->can_queue > 0)
>>>>>  		atomic_dec(&scsi_target(sdev)->target_busy);
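To illustrate why rechecking after the decrement closes the window, the same interleaving can be replayed with the patched retreat path. This is again a hypothetical single-threaded model, not kernel code: `model_out_dec()` mirrors the lines the patch adds after `atomic_dec(&shost->host_busy)`, and only the order of events is modeled; the locking and memory-barrier argument from the changelog is taken as given rather than demonstrated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified model; plain ints stand in for the atomics and
 * host_lock is implied by the step order. Not the kernel's definitions. */
enum model_state { MODEL_RUNNING, MODEL_RECOVERY };

struct model_host {
	enum model_state shost_state;
	int host_busy;
	int host_failed;
	int host_eh_scheduled;
	bool ehandler_woken;
};

/* Same condition as scsi_eh_wakeup(). */
static void model_eh_wakeup(struct model_host *h)
{
	if (h->host_busy == h->host_failed)
		h->ehandler_woken = true;
}

/* Patched out_dec path: decrement host_busy, then (under host_lock in the
 * real patch) recheck the recovery state and try to wake the handler. */
static void model_out_dec(struct model_host *h)
{
	h->host_busy--;
	if (h->shost_state == MODEL_RECOVERY &&
	    (h->host_failed || h->host_eh_scheduled))
		model_eh_wakeup(h);
}

/* Replays the reported interleaving with the patched retreat path;
 * enter_recovery == false checks that the added code is a no-op when
 * the host never enters recovery. */
static struct model_host replay_patched(bool enter_recovery)
{
	struct model_host h = {
		.shost_state = MODEL_RUNNING,
		.host_busy = 1, /* one command already in flight */
		.host_failed = 0,
		.host_eh_scheduled = 0,
		.ehandler_woken = false,
	};

	/* CPU2 has passed the recovery check; CPU1 enters recovery. */
	if (enter_recovery)
		h.shost_state = MODEL_RECOVERY;

	/* CPU2 claims a slot and, with can_queue == 1, will back off. */
	h.host_busy++;

	/* CPU1 fails the in-flight command; 2 != 1, so no wakeup yet. */
	if (enter_recovery) {
		h.host_failed++;
		model_eh_wakeup(&h);
	}

	/* CPU2 gives the slot back and, with the patch, rechecks. */
	model_out_dec(&h);
	return h;
}
```

With recovery entered, the final check sees host_busy == host_failed == 1 and wakes the handler; without recovery, the added check does nothing.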