Received: by 2002:a05:6358:51dd:b0:131:369:b2a3 with SMTP id 29csp507995rwl; Wed, 9 Aug 2023 19:07:34 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHhjhDT3etzx+RWIrRjHfTZTuH27iiCcs87YVXvnPMKGfY+h2Cb7taiByqO76MFMFIdE6CL X-Received: by 2002:a17:90a:4a97:b0:268:e43a:dbfd with SMTP id f23-20020a17090a4a9700b00268e43adbfdmr1007984pjh.1.1691633254019; Wed, 09 Aug 2023 19:07:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691633254; cv=none; d=google.com; s=arc-20160816; b=Kk8tdMckQAnYJmIYpySQfQNXR2594NgcWQ3r7EqNRqyYeCd5PztypXJEBkS73euSAl zGVhODRwlDCH3w78U+xpRH3i/lly3ZkmTUiUUwvHf3RTuRuIG5lr95RyTIq1/7l3InGS Gg37+2leEE1kM7x/pacVEuemDdInXws9ohnlmHdyJj0/6LxoUr9UHCG7y0/oqpuJz3PN gaEh24he4eAy7Nc1sRlPe0GnvDImulcmTGP88RDW+9z2C3rs68uXv5KlWVWEJqSnD3bo hDPmD0+06xcHQV8Q8lLXHdQt9f67uVaXaAFoSEEg2RwsVsoTUlKoEbbNHfEgBLNmx3Da zpsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=FfKZ2zetRkqCUy727lxudP1qzXimUWIEO5T0FNFJM28=; fh=w+uK6ENXG0Q9F/R8uImdOWoqhi/UWaPoeCKMcCQjF6s=; b=CBPltviH7HJ8UVq0cD1qE/Mq+8t/XMzrHUhgfjuGt+YE6+dpVUENblNXCkhHeFSQF3 uq8slw1batCIKi3n3gTd8XM1hbKu90+Yp5Jc6crNZ1O+nqPwXitFJ0JxU6+gnKuge4WJ 22q/jEvYibD+l8uUYXpVRAOmIIsLZS8ZUXFnwXE2AGby7dGCp3oA/NJ1lyxe1cH6gkGx XPnmqonyi3MHkro3Ep6ia21eamBBzPQD0OubU2Dmh0q0BJb+Nw+ltunawEnAhnHdPTg/ L5bfVoTmvNOEqxIJyPURgEpeAsR4AXcdjxzRT3SnGX9jzJ5slFrZZxEw5McO1A3GIWli KmvQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n91-20020a17090a5ae400b00269781d3a2bsi490629pji.134.2023.08.09.19.07.21; Wed, 09 Aug 2023 19:07:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231668AbjHJBUk (ORCPT + 99 others); Wed, 9 Aug 2023 21:20:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50546 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229480AbjHJBUk (ORCPT ); Wed, 9 Aug 2023 21:20:40 -0400 Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2B88D10D8; Wed, 9 Aug 2023 18:20:39 -0700 (PDT) Received: from kwepemm600005.china.huawei.com (unknown [172.30.72.56]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4RLpx82lpvzcbZ9; Thu, 10 Aug 2023 09:19:24 +0800 (CST) Received: from [10.67.103.158] (10.67.103.158) by kwepemm600005.china.huawei.com (7.193.23.191) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Thu, 10 Aug 2023 09:20:36 +0800 Subject: Re: [PATCH] USB:bugfix a controller halt error To: Alan Stern , Oliver Neukum CC: , , References: <20230721100015.27124-1-liulongfang@huawei.com> <77a8ecb4-8099-1826-abd8-4f080d80b07d@huawei.com> <73b58ff7-2a0a-43f7-bda9-52b9437f5bc0@rowland.harvard.edu> From: liulongfang Message-ID: Date: Thu, 10 Aug 2023 09:20:36 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="gbk" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.67.103.158] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To kwepemm600005.china.huawei.com (7.193.23.191) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2023/7/27 23:57, Alan Stern wrote: > On Thu, Jul 27, 2023 at 05:31:41PM +0200, Oliver Neukum wrote: >> On 27.07.23 16:42, Alan Stern wrote: >>> On Thu, Jul 27, 2023 at 03:03:57PM +0800, liulongfang wrote: >>>> On 2023/7/26 22:20, Alan Stern wrote: >> >>>>> It seems to me that something along these lines must be necessary in >>>>> any case. Unless the bad memory is cleared somehow, it would never be >>>>> usable again. The kernel might deallocate it, then reallocate for >>>>> another purpose, and then crash when the new user tries to access it. >>>>> >>>>> In fact, this scenario could still happen even with your patch, which >>>>> means the patch doesn't really fix the problem. >> >> I suppose in theory you could have something like a bad blocks list >> just for RAM, but that would really hurt. You'd have to do something >> about every DMA operation in every driver in theory. >> >> Error handling would basically be an intentional memory leak. > > I started out thinking this way, but maybe that's not how it works. > Perhaps simply overwriting the part of memory that got the ECC error > would clear the error state. (This may depend on the kind of error, > one-time vs. permanent.) > > If that's the case, and if the memory buffer was deallocated without > being accessed and then later reallocated, things would be okay. The > routine that reallocated the buffer wouldn't try to read from it before > initializing it somehow. > >>>> This patch is only used to prevent data in the buffer from being accessed. >>>> As long as the data is not accessed, the kernel does not crash. >>> >>> I still don't understand. You haven't provided nearly enough >>> information. You should start by answering the questions that Oliver >>> asked. Then answer this question: >>> >>> The code you are concerned about is this: >>> >>> r = usb_control_msg(udev, usb_rcvaddr0pipe(), >>> USB_REQ_GET_DESCRIPTOR, USB_DIR_IN, >>> USB_DT_DEVICE << 8, 0, >>> buf, GET_DESCRIPTOR_BUFSIZE, >>> initial_descriptor_timeout); >>> switch (buf->bMaxPacketSize0) { >>> >>> You're worried that if an ECC memory error occurs during the >>> usb_control_msg transfer, the kernel will crash when the "switch" >>> statement tries to read the value of buf->bMaxPacketSize0. That's a >>> reasonable thing to worry about. >> >> Albeit unlikely. If the hardware and implementation are reasonable >> you'd return a specific error code from the HCD and clean up the >> RAM in your ecc driver. >> >> The fix for USB would then conceptually be something like >> >> retryio: >> r = usb_control_msg() >> if (r == -EMEMORYCORRUPTION) >> goto retryio; > > Yes, we could do this, but it's not necessary. Let's say that the HCD > returns -EMEMORYCORRUPTION and the ecc driver cleans up the RAM > (probably by resetting its contents to 0, but possibly leaving garbage > there instead). Then when the following code in hub_port_init() tests > buf->bMaxPacketSize0, it will see an invalid value and will retry the > transfer. > > Or, with low probability, it will see a valid but incorrect value. If > that happens then later transfers using ep0 will fail, causing the hub > driver to reiterate the outer loop in hub_port_connect(). Eventually > the device will be correctly initialized and enumerated. > > Alan Stern > OK, thanks. Longfang. > . >