Date: Tue, 5 Dec 2023 16:58:25 +0200
From: Mathias Nyman
To: Yaxiong Tian, mathias.nyman@intel.com, gregkh@linuxfoundation.org
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, tianyaxiong@kylinos.cn
Subject: Re: [PATCH] usb:xhci: Avoid hub_event() stuck when xHC restore state timeout

On 4.12.2023 10.02, Yaxiong Tian wrote:
> From: Yaxiong Tian
>
> When xHC restore state times out, xhci_resume() returns -ETIMEDOUT

Out of curiosity, have you checked whether it is still
possible to revive your xHC controller here?

Instead of returning -ETIMEDOUT, try setting "reinit_xhc = true" and jump to
"if (reinit_xhc) {", where we reinitialize the xHC in xhci_resume() due to other
resume issues.

> instantly. After usb_hc_died() is called, it kicks hub_wq to run
> hub_event(), but the wq is frozen. When suspend ends, hub_event() really
> runs and gets stuck.
> Such as:
> [ 968.794016][ 2] [ T37] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 968.802969][ 2] [ T37] kworker/2:3 D 0 999 2 0x00000028
> [ 968.809579][ 2] [ T37] Workqueue: usb_hub_wq hub_event
> [ 968.814885][ 2] [ T37] Call trace:
> [ 968.818455][ 2] [ T37] __switch_to+0xd4/0x138
> [ 968.823067][ 2] [ T37] __schedule+0x2dc/0x6a0
> [ 968.827680][ 2] [ T37] schedule+0x34/0xb0
> [ 968.831947][ 2] [ T37] schedule_timeout+0x1e0/0x298
> [ 968.837079][ 2] [ T37] __wait_for_common+0xf0/0x208
> [ 968.842212][ 2] [ T37] wait_for_completion+0x1c/0x28
> [ 968.847432][ 2] [ T37] xhci_configure_endpoint+0x104/0x640
> [ 968.853173][ 2] [ T37] xhci_check_bandwidth+0x140/0x2e0
> [ 968.858652][ 2] [ T37] usb_hcd_alloc_bandwidth+0x1c8/0x348
> [ 968.864393][ 2] [ T37] usb_disable_device+0x198/0x260
> [ 968.869698][ 2] [ T37] usb_disconnect+0xdc/0x3a0
> [ 968.874571][ 2] [ T37] usb_disconnect+0xbc/0x3a0
> [ 968.879441][ 2] [ T37] hub_quiesce+0xa0/0x108
> [ 968.884053][ 2] [ T37] hub_event+0x4d4/0x1558
> [ 968.888664][ 2] [ T37] kretprobe_trampoline+0x0/0xc4
> [ 968.893884][ 2] [ T37] worker_thread+0x4c/0x488
> [ 968.898668][ 2] [ T37] kthread+0xf8/0x128
> [ 968.902933][ 2] [ T37] ret_from_fork+0x10/0x18
>
> The result is that you cannot suspend again, because the wq can't
> be frozen. It is also hard to reboot when some application has visited this
> piece.
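
For reference, the reinit_xhc suggestion made earlier in this reply can be
sketched as a small user-space model. This is only an illustration under
assumptions: struct mock_xhc and the mock_* helpers are hypothetical stand-ins,
not the driver's real types or functions. Only the reinit_xhc flag name and the
-ETIMEDOUT return value come from the discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller; NOT the kernel's struct xhci_hcd. */
struct mock_xhc {
    bool restore_times_out; /* simulate the restore-state wait timing out */
    bool reinit_works;      /* simulate whether a full reinit revives the xHC */
    bool running;           /* set once the (mock) controller is usable again */
};

/* Models waiting for restore state to finish: 0 on success, -ETIMEDOUT otherwise. */
static int mock_restore_state(const struct mock_xhc *xhc)
{
    return xhc->restore_times_out ? -ETIMEDOUT : 0;
}

/* Models a full reinitialization of the controller. */
static int mock_reinit(struct mock_xhc *xhc)
{
    if (!xhc->reinit_works)
        return -ENODEV;
    xhc->running = true;
    return 0;
}

/* Resume path that falls back to reinit instead of failing on a timeout. */
int mock_resume_with_reinit(struct mock_xhc *xhc)
{
    bool reinit_xhc = false;

    assert(xhc != NULL);

    if (mock_restore_state(xhc) == -ETIMEDOUT)
        reinit_xhc = true; /* instead of: return -ETIMEDOUT; */

    if (reinit_xhc)
        return mock_reinit(xhc);

    xhc->running = true;
    return 0;
}
```

In this model a restore timeout only becomes an error if the reinit attempt
fails too. Whether real hardware can be revived this way is exactly the open
question asked above.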
>
> The reason for the hang is that some code accessing xhci hardware
> is being called, but the xhci has a problem, or at least is not running.
> (In xhci_restore_registers(), the xhci loads op_regs. CMD_RUN is
> cleared in xhci_suspend().)
>

Nice catch and debugging work btw.

> So use the XHCI_STATE_DYING flag to immediately stop any code from
> touching hardware. hub_event() will complete. The usb_hc_died
> tasks will be completed and some sys interfaces will be removed.

The XHCI_STATE_DYING flag is currently only set in xhci_hc_died().
So when this flag is set we can assume that the command ring and pending URBs
are, or will be, cleaned up.

This would change with your patch.

We might need some other solution. Maybe do the
set_bit(HCD_FLAG_HW_ACCESSIBLE, &hcd->flags) after CNR (Controller Not Ready)
is successfully cleared and the controller is actually accessible.

We would then need to add checks to see if the controller is accessible before
queuing any commands to xHC hardware.

Thanks
Mathias
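
The alternative described above (mark the hardware accessible only once CNR has
cleared, and check that flag before queuing commands) could look roughly like
the user-space model below. Everything here is an assumption made for
illustration: struct mock_hcd, the mock_* functions, the polling budget and the
choice of -ENODEV are hypothetical and are not existing driver code. The
hw_accessible field merely stands in for HCD_FLAG_HW_ACCESSIBLE, and
MOCK_STS_CNR for the Controller Not Ready status bit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock of the Controller Not Ready status bit (assumed bit position). */
#define MOCK_STS_CNR (1u << 11)

/* Hypothetical stand-in for the host controller; NOT struct usb_hcd. */
struct mock_hcd {
    int polls_until_ready; /* status reads that still report CNR */
    bool hw_accessible;    /* stand-in for HCD_FLAG_HW_ACCESSIBLE */
    unsigned int queued;   /* number of commands accepted so far */
};

/* Models a status register read; CNR stays set for the first N reads. */
static uint32_t mock_read_status(struct mock_hcd *hcd)
{
    if (hcd->polls_until_ready > 0) {
        hcd->polls_until_ready--;
        return MOCK_STS_CNR;
    }
    return 0;
}

/* Resume: mark the hardware accessible only after CNR is seen cleared. */
int mock_resume_wait_ready(struct mock_hcd *hcd, int max_polls)
{
    hcd->hw_accessible = false;

    for (int i = 0; i < max_polls; i++) {
        if (!(mock_read_status(hcd) & MOCK_STS_CNR)) {
            hcd->hw_accessible = true;
            return 0;
        }
    }
    return -ETIMEDOUT; /* flag stays clear, so later commands are refused */
}

/* Command submission: refuse early instead of waiting on dead hardware. */
int mock_queue_command(struct mock_hcd *hcd)
{
    if (!hcd->hw_accessible)
        return -ENODEV;
    hcd->queued++;
    return 0;
}
```

With a check like this, a caller such as the bandwidth-check path in the trace
would get an error back immediately rather than sleeping in a wait for a
completion that never arrives.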