Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp1175086rwd; Thu, 18 May 2023 08:50:42 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5Glbb/xBx8UtkL9XmUDV8RmMCJRW89vdqD/cjhmPfwbDi7Ddw93JipshsxNCW/Euphpdam X-Received: by 2002:a17:90b:b04:b0:246:9c75:351a with SMTP id bf4-20020a17090b0b0400b002469c75351amr3189896pjb.12.1684425041998; Thu, 18 May 2023 08:50:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684425041; cv=none; d=google.com; s=arc-20160816; b=oFC9X24VrHUTxXcQxVdGceKU1BbKkZvdE2Lw+OhrTgtDIpZDifwZmjHpBUZezd2KkA 8JGQ9KultyGl217FeixOJuArYve3sUGOMp5wV6n6JDE/X6P8hocTzwrKajor6kG8OPpP l8WfotQj5PMGskQuaaMzlo5zEv8AGB+mtSGLxENZ0F1gM2or0xBIkz0pB5c7j9/vqJJE ag1SDYMM0s3xgVgm2/VBUQsoJYs5iQ7454CfxgTd98OyHrZQdMv/W+Lh1A0ICQlbYe/m ACIP54N4zhWoC9BpBmemfeFs7g1RQuI+xSRG3xdHYUXRmg0GNvnhSaqatFgd8DyOwVTj C9/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature :dkim-filter; bh=4UYZ2IcmL1KklQiEztNjChNplhlI6pOuen+vw5gYh7c=; b=jI8Gn37KGqJDkwn+ep4Fjw3OhfdWj0LazD0xmwfZdYRRVzwtsv8zD4Iy1Hn+WDW3vX aKtZAX2Qpc25aAqlrRmb0QCWkxv9cf6eMHYlBTTXNI/KZsgNAi14OzTXAhD/VeWOzbkH NEigDroEWxsNtWBbLNwCkCb58ubolxcXUTI4As/eMQ9IbpnJ5PM5leNwM1zTgwPO5vgz cHMp7MAzN2mU+15pUcOZ/rtNHOWpM1I1wkwEHrjqcu+sgPLG5mJhRiwj6BsrpkRZzvZd dItwQtaakqgSn4R9Nm8BeLs2dqBKXi2gbpJG4zJiPddzp/CqgEl5M9u+EhcCZcEUw2qH WysA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ispras.ru header.s=default header.b=kx2tqMuC; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ispras.ru Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v6-20020a17090a4ec600b0024bd4a6e728si1895621pjl.92.2023.05.18.08.50.29; Thu, 18 May 2023 08:50:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ispras.ru header.s=default header.b=kx2tqMuC; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ispras.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231343AbjERPoe (ORCPT + 62 others); Thu, 18 May 2023 11:44:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230257AbjERPod (ORCPT ); Thu, 18 May 2023 11:44:33 -0400 Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3482993; Thu, 18 May 2023 08:44:32 -0700 (PDT) Received: from fpc (unknown [46.242.14.200]) by mail.ispras.ru (Postfix) with ESMTPSA id E431444C1015; Thu, 18 May 2023 15:44:28 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 mail.ispras.ru E431444C1015 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ispras.ru; s=default; t=1684424669; bh=4UYZ2IcmL1KklQiEztNjChNplhlI6pOuen+vw5gYh7c=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=kx2tqMuCqGe7yKU3/GeQZ06VhdriXeA/5nKMyfZ2/eRm8uC60sAgVtUeAfHXwW0bl uLmfY2sEIfZHLxCKrYXbet5anJxGPxhak2PVLtNFYkCynQ2+zmmprL3OScahQwfarS MwHq+3s0EmwsYfyqSfBy7O8XR0VN4ScmK87UtPuc= Date: Thu, 18 May 2023 18:44:24 +0300 From: Fedor Pchelkin To: Hillf Danton Cc: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= , Kalle Vallo , syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com, linux-wireless@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Alexey Khoroshilov , lvc-project@linuxtesting.org Subject: Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Message-ID: <20230518154424.62urbguy4rxetkty@fpc> References: <20230425192607.18015-1-pchelkin@ispras.ru> <20230425230708.2132-1-hdanton@sina.com> <20230426190206.ni2au5mpjc5oty67@fpc> <20230518102437.4443-1-hdanton@sina.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230518102437.4443-1-hdanton@sina.com> X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org On Thu, May 18, 2023 at 06:24:37PM +0800, Hillf Danton wrote: > Fedor Pchelkin writes: > > > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: > >> Given similar wait timeout[1], just taking lock on the waiter side is not > >> enough wrt fixing the race, because in case job done on the waker side, > >> waiter needs to wait again after timeout. > >> > > > > As I understand you correctly, you mean the case when a timeout occurs > > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has > > occurred on a waiter's side, it should return immediately and doesn't have > > to care in which state the callback has been at that moment. > > > > AFAICS, this is controlled properly with taking a wmi_lock on waiter and > > waker sides, and there is no data corruption. > > > > If a callback has not managed to do its work entirely (performing a > > completion and subsequently waking waiting thread is included here), then, > > well, it is considered a timeout, in my opinion. > > > > Your suggestion makes a wmi_cmd call to give a little more chance for the > > belated callback to complete (although timeout has actually expired). That > > is probably good, but increasing a timeout value makes that job, too. I > > don't think it makes any sense on real hardware. > > > > Or do you mean there is data corruption that is properly fixed with your patch? > > Given complete() not paired with wait_for_completion(), what is the > difference after this patch? The main thing in the patch is making ath9k_wmi_ctrl_rx() release wmi_lock after calling ath9k_wmi_rsp_callback() which does copying data into the shared wmi->cmd_rsp_buf buffer. Otherwise there can occur a data corrupting scenario outlined in the patch description (added it here, too). On Tue, 25 Apr 2023 22:26:06 +0300, Fedor Pchelkin wrote: > CPU0 CPU1 > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > ath9k_wmi_cmd_issue(...) > wait_for_completion_timeout(...) > --- > timeout > --- > /* the callback is being processed > * before last_seq_id became zero > */ > ath9k_wmi_ctrl_rx(...) > spin_lock_irqsave(...) > /* wmi->last_seq_id check here > * doesn't detect timeout yet > */ > spin_unlock_irqrestore(...) > /* last_seq_id is zeroed to > * indicate there was a timeout > */ > wmi->last_seq_id = 0 > mutex_unlock(&wmi->op_mutex) > return -ETIMEDOUT > > ath9k_wmi_cmd(...) > mutex_lock(&wmi->op_mutex) > /* the buffer is replaced with > * another one > */ > wmi->cmd_rsp_buf = rsp_buf > wmi->cmd_rsp_len = rsp_len > ath9k_wmi_cmd_issue(...) > spin_lock_irqsave(...) > spin_unlock_irqrestore(...) > wait_for_completion_timeout(...) > /* the continuation of the > * callback left after the first > * ath9k_wmi_cmd call > */ > ath9k_wmi_rsp_callback(...) > /* copying data designated > * to already timeouted > * WMI command into an > * inappropriate wmi_cmd_buf > */ > memcpy(...) > complete(&wmi->cmd_wait) > /* awakened by the bogus callback > * => invalid return result > */ > mutex_unlock(&wmi->op_mutex) > return 0 So before the patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx() wasn't helpful in case wmi->last_seq_id value was changed during ath9k_wmi_rsp_callback() execution because of the next ath9k_wmi_cmd() call. With the proposed patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx() accomplishes its job as: - the next ath9k_wmi_cmd call changes last_seq_id value under lock so it either waits for a belated ath9k_wmi_ctrl_rx() to finish or updates last_seq_id value so that the timeout check in ath9k_wmi_ctrl_rx() indicates that the waiter side has timeouted and we shouldn't further process the callback. - memcopying in ath9k_wmi_rsp_callback() is made to a valid place if the last_seq_id check was successful under the lock.