Date: Mon, 15 Jan 2024 09:47:07 -0400
From: Jason Gunthorpe
To: Ding Hui
Cc: leon@kernel.org, wenglianfa@huawei.com, gustavoars@kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, Shifeng Li, Shifeng Li
Subject: Re: [PATCH] RDMA/device: Fix a race between mad_client and cm_client init
Message-ID: <20240115134707.GZ50608@ziepe.ca>

On Sat, Jan 06, 2024 at 10:12:17AM +0800, Ding Hui wrote:
> On 2024/1/4 20:37, Jason Gunthorpe wrote:
> > On Thu, Jan 04, 2024 at 02:48:14PM +0800, Shifeng Li wrote:
> >
> > > The root cause is that mad_client and cm_client may be initialized concurrently
> > > when the devices_rwsem write semaphore is downgraded in enable_device_and_get(), like:
> >
> > That can't be true, the module loader infrastructure ensures those two
> > things are sequential.
> >
>
> Please consider the sequence again and notice that:
>
> 1. We agree that the module dependencies ensure that mad_client is registered
>    before cm_client.
> 2. But mad_client.add() is not invoked from ib_register_client(), since
>    there is no DEVICE_REGISTERED device at that time. Instead, it is
>    deferred until the device driver (e.g. mlx5_core) initializes and
>    reaches enable_device_and_get().
> 3. The ib_cm and mlx5_core modules can be loaded concurrently. After
>    DEVICE_REGISTERED is set and downgrade_write(&devices_rwsem) is called in
>    enable_device_and_get(), cm_client.add() can be invoked before
>    mad_client.add().
>
>
> T1 (ib_core init)         T2 (device driver init)   T3 (ib_cm init)
> ----------------------------------------------------------------------------------------------
> ib_register_client mad_client
>   assign_client_id
>   add clients CLIENT_REGISTERED
>   (with clients_rwsem write)
>   down_read(&devices_rwsem);
>   xa_for_each_marked (&devices, DEVICE_REGISTERED)
>     nop # no devices
>   up_read(&devices_rwsem);
>                           ib_register_device
>                             enable_device_and_get
>                               down_write(&devices_rwsem);
>                               set DEVICE_REGISTERED
>                               downgrade_write(&devices_rwsem);
>                                                     ib_register_client cm_client
>                                                       assign_client_id
>                                                       add clients CLIENT_REGISTERED
>                                                       (with clients_rwsem write)
>                                                       down_read(&devices_rwsem);
>                                                       xa_for_each_marked (&devices, DEVICE_REGISTERED)
>                                                         add_client_context
>                                                           down_write(&device->client_data_rwsem);
>                                                           get CLIENT_DATA_REGISTERED
>                                                           downgrade_write(&device->client_data_rwsem);
>                                                           cm_client.add
>                                                             cm_add_one
>                                                               ib_register_mad_agent
>                                                                 ib_get_mad_port
>                                                                   __ib_get_mad_port return NULL!
>                                                           set CLIENT_DATA_REGISTERED
>                                                           up_read(&device->client_data_rwsem);
>                                                       up_read(&devices_rwsem);
>                               down_read(&clients_rwsem);
>                               xa_for_each_marked (&clients, CLIENT_REGISTERED)
>                                 add_client_context [mad]
>                                   mad_client.add
>                                 add_client_context [cm]
>                                   nop # already CLIENT_DATA_REGISTERED
>                               up_read(&clients_rwsem);
>                               up_read(&devices_rwsem);

Take the draft I sent previously and use down_write(&devices_rwsem) in
ib_register_client().

Jason
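A minimal, single-threaded model of the interleaving above may help make the ordering argument concrete. It is a sketch under stated assumptions, not ib_core code: struct model, register_client(), device_add_clients() and cm_saw_null_mad_port() are hypothetical names, the locks are not modelled (each ordering is replayed by hand), and it assumes client ids follow registration order (mad = 0, cm = 1). The two "fixed" orderings assume that taking devices_rwsem for write in ib_register_client(), as suggested above, keeps T3's whole registration from landing between T2's downgrade_write() and its client loop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the mad/cm race.
 * None of these names are the real ib_core API; locks are not
 * modelled, each ordering is replayed by hand instead.
 * Assumption: client ids follow registration order (mad = 0, cm = 1).
 */
enum { MAD = 0, CM = 1, NCLIENTS = 2 };

struct model {
	bool device_registered;           /* DEVICE_REGISTERED mark on the device */
	bool client_registered[NCLIENTS]; /* CLIENT_REGISTERED mark per client */
	bool client_data[NCLIENTS];       /* CLIENT_DATA_REGISTERED per client */
	bool mad_port;                    /* set once mad's add() has run */
	bool cm_saw_null_port;            /* cm's add() ran with no mad port */
};

/* Stand-in for client->add(): cm needs the port that mad creates. */
static void client_add(struct model *m, int id)
{
	if (id == MAD)
		m->mad_port = true;
	else if (!m->mad_port)
		m->cm_saw_null_port = true; /* like __ib_get_mad_port() returning NULL */
}

/* Stand-in for add_client_context(): each client is added at most once. */
static void add_client_context(struct model *m, int id)
{
	assert(id >= 0 && id < NCLIENTS);
	if (m->client_data[id])
		return; /* already CLIENT_DATA_REGISTERED: nop */
	client_add(m, id);
	m->client_data[id] = true;
}

/* Stand-in for ib_register_client(): mark, then add to a registered device. */
static void register_client(struct model *m, int id)
{
	m->client_registered[id] = true;
	if (m->device_registered)
		add_client_context(m, id);
}

/* Stand-in for the client loop at the end of enable_device_and_get(). */
static void device_add_clients(struct model *m)
{
	for (int id = 0; id < NCLIENTS; id++)
		if (m->client_registered[id])
			add_client_context(m, id);
}

enum order { RACY, FIXED_CM_FIRST, FIXED_DEVICE_FIRST };

/* Replays one ordering; returns true if cm was added before mad's port existed. */
bool cm_saw_null_mad_port(enum order order)
{
	struct model m = { 0 };

	register_client(&m, MAD); /* T1: no device yet, mad's add() is deferred */
	switch (order) {
	case RACY:
		m.device_registered = true; /* T2: set DEVICE_REGISTERED, downgrade */
		register_client(&m, CM);    /* T3 runs before T2's client loop */
		device_add_clients(&m);     /* T2: adds mad, cm is already a nop */
		break;
	case FIXED_CM_FIRST:
		register_client(&m, CM);    /* T3 completes before T2 takes the lock */
		m.device_registered = true;
		device_add_clients(&m);     /* mad (id 0) is added before cm (id 1) */
		break;
	case FIXED_DEVICE_FIRST:
		m.device_registered = true;
		device_add_clients(&m);     /* only mad is registered at this point */
		register_client(&m, CM);    /* T3 runs after T2 dropped devices_rwsem */
		break;
	}
	return m.cm_saw_null_port;
}
```

In this model cm_saw_null_mad_port(RACY) returns true, while both FIXED_* orderings return false. It only illustrates why serializing client registration against the device-enable path removes the bad ordering; it says nothing about whether the actual patch is complete.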