From: Ding Hui
To: Jason Gunthorpe
Cc: leon@kernel.org, wenglianfa@huawei.com, gustavoars@kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, Shifeng Li
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init
Date: Sat, 6 Jan 2024 10:12:17 +0800

On 2024/1/4 20:37, Jason Gunthorpe wrote:
> On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
>
>> The root cause is that mad_client and cm_client may init concurrently
>> when devices_rwsem write semaphore is downgraded in enable_device_and_get() like:
>
> That can't be true, the module loader infrastructure ensures those two
> things are sequential.
>

Please consider the sequence again and notice that:

1. We agree that the dependencies ensure that mad_client is registered
   before cm_client.

2. But mad_client.add() is not invoked in ib_register_client(), since there
   is no DEVICE_REGISTERED device at that time. Instead, it is deferred
   until the device driver (e.g. mlx5_core) initializes and reaches
   enable_device_and_get().

3. ib_cm and mlx5_core can be loaded concurrently. After
   enable_device_and_get() sets DEVICE_REGISTERED and calls
   downgrade_write(&devices_rwsem), there is a chance that cm_client.add()
   is invoked before mad_client.add().

T1(ib_core init)                  | T2(device driver init)            | T3(ib_cm init)
---------------------------------------------------------------------------------------------------
ib_register_client mad_client
  assign_client_id
    add clients CLIENT_REGISTERED
    (with clients_rwsem write)
  down_read(&devices_rwsem);
  xa_for_each_marked (&devices, DEVICE_REGISTERED)
    nop # no devices
  up_read(&devices_rwsem);

                                    ib_register_device
                                      enable_device_and_get
                                        down_write(&devices_rwsem);
                                        set DEVICE_REGISTERED
                                        downgrade_write(&devices_rwsem);

                                                                        ib_register_client cm_client
                                                                          assign_client_id
                                                                            add clients CLIENT_REGISTERED
                                                                            (with clients_rwsem write)
                                                                          down_read(&devices_rwsem);
                                                                          xa_for_each_marked (&devices, DEVICE_REGISTERED)
                                                                            add_client_context
                                                                              down_write(&device->client_data_rwsem);
                                                                              get CLIENT_DATA_REGISTERED
                                                                              downgrade_write(&device->client_data_rwsem);
                                                                              cm_client.add
                                                                                cm_add_one
                                                                                  ib_register_mad_agent
                                                                                    ib_get_mad_port
                                                                                      __ib_get_mad_port
                                                                                        return NULL!
                                                                              set CLIENT_DATA_REGISTERED
                                                                              up_read(&device->client_data_rwsem);
                                                                          up_read(&devices_rwsem);

                                        down_read(&clients_rwsem);
                                        xa_for_each_marked (&clients, CLIENT_REGISTERED)
                                          add_client_context [mad]
                                            mad_client.add
                                          add_client_context [cm]
                                            nop # already CLIENT_DATA_REGISTERED
                                        up_read(&clients_rwsem);
                                        up_read(&devices_rwsem);

> You are trying to say that the post-client fixup stuff will still see
> the DEVICE_REGISTERED before it reaches the clients_rwsem lock?
>
> That probably just says the clients_rwsem should be obtained before
> changing the DEVICE_STATE too :\
>

--
Thanks,
- Ding Hui