Date: Wed, 30 May 2018 11:18:56 +0300
From: jackm
To: Leon Romanovsky
Cc: Hans Westgaard Ry, Doug Ledford, Jason Gunthorpe, Hakon Bugge,
 Daniel Jurgens, Parav Pandit, Pravin Shedge,
 linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] IB/mad: Use ID allocator routines to allocate agent number
Message-ID: <20180530111856.0000430f@dev.mellanox.co.il>
In-Reply-To: <20180529095445.GG3697@mtr-leonro.mtl.com>
References: <20180529073808.27735-1-hans.westgaard.ry@oracle.com>
 <20180529085459.GF3697@mtr-leonro.mtl.com>
 <20180529095445.GG3697@mtr-leonro.mtl.com>
Organization: Mellanox

On Tue, 29 May 2018 12:54:45 +0300
Leon Romanovsky wrote:

> On Tue, May 29, 2018 at 11:54:59AM +0300, Leon Romanovsky wrote:
> > On Tue, May 29, 2018 at 09:38:08AM +0200, Hans Westgaard Ry wrote:
> > > The agent TID is a 64 bit value split in two dwords. The least
> > > significant dword is the TID running counter. The most significant
> > > dword is the agent number. In the CX-3 shared port model, the mlx4
> > > driver uses the most significant byte of the agent number to
> > > store the slave number, making agent numbers greater and equal to
> > > 2^24 (3 bytes) unusable. The current codebase uses a variable
> > > which is incremented atomically for each new agent number giving
> > > too large agent numbers over time. The IDA set of functions are
> > > used instead of the simple counter approach. This allows re-use
> > > of agent numbers. A sysctl variable is also introduced, to
> > > control the max agent number.
> >
> > Why don't you simply limit this number per-driver? By default, any
> > variable is allowed and mlx4_ib will set something else.

It is much simpler to do things here -- the current allocation scheme
does have a wrap-around problem, so there is a good reason for using a
global allocator residing in the ib core.

> >
> > What is the advantage of having sysctl?
>
> Anyway, it doesn't pass smoke test.

REASON:
The start argument in ida_simple_get should be 1, not 0:

+	ib_mad_client_id = ida_simple_get(&ib_mad_client_ids,
+					  1,
+					  ib_mad_sysctl_client_id_max,
+					  GFP_KERNEL);

The agent ID 0 is interpreted as -no agent-, is used by snoop agents,
and is never allocated. The first allocated agent id is 1:

static atomic_t ib_mad_client_id_min = ATOMIC_INIT(0);
...
	mad_agent_priv->agent.hi_tid = atomic_inc_return(&ib_mad_client_id);

If this is fixed, we do not get the panic.

However, please note Jason's feedback and my suggestion -- this
ida-based interface needs to avoid immediate re-use of freed ids.

-Jack

>
> [ 126.428407] RPC: Unregistered rdma transport module.
> [ 126.428513] RPC: Unregistered rdma backchannel transport module.
> [ 194.664081] IPv6: ADDRCONF(NETDEV_UP): ib0: link is not ready
> [ 209.068702] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> [ 209.068858] PGD 80000000341cf067 P4D 80000000341cf067 PUD 34188067 PMD 0
> [ 209.068941] Oops: 0002 [#1] SMP PTI
> [ 209.069006] Modules linked in: netconsole nfsv3 nfs fscache mlx4_ib(-) mlx4_en mlx4_core devlink ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core dm_mirror dm_region_hash dm_log dm_mod nfsd pcspkr i2c_piix4 auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ata_generic cirrus drm_kms_helper pata_acpi syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm e1000 virtio_console i2c_core serio_raw ata_piix floppy [last unloaded: mlxfw]
> [ 209.069312] CPU: 4 PID: 11048 Comm: modprobe Not tainted 4.17.0-rc7-2018-05-29_11-04-56_Hans_Westgaard_Ry__hans_westga #1
> [ 209.069413] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
> [ 209.069486] RIP: 0010:_raw_spin_lock_irqsave+0x1e/0x40
> [ 209.069536] RSP: 0018:ffffc90000b4fd70 EFLAGS: 00010046
> [ 209.069591] RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffea0004d7ed00
> [ 209.069653] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000070
> [ 209.069717] RBP: 0000000000000000 R08: ffff88013446fc00 R09: 000000018010000f
> [ 209.069778] R10: 0000000000000001 R11: ffff88013446fc00 R12: 0000000000000070
> [ 209.069849] R13: 0000000000000202 R14: 0000000000000000 R15: 0000000000000000
> [ 209.069915] FS: 00007fc34caf7740(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> [ 209.069987] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 209.070043] CR2: 0000000000000070 CR3: 000000008853e000 CR4: 00000000000006e0
> [ 209.070128] Call Trace:
> [ 209.070189] ib_unregister_mad_agent+0x2d/0x540 [ib_core]
> [ 209.070260] ? __slab_free+0x9a/0x2d0
> [ 209.070332] ib_agent_port_close+0xad/0xf0 [ib_core]
> [ 209.070396] ib_mad_remove_device+0x59/0xb0 [ib_core]
> [ 209.070466] ib_unregister_device+0xd4/0x180 [ib_core]
> [ 209.070537] mlx4_ib_remove+0x67/0x1f0 [mlx4_ib]
> [ 209.070594] mlx4_remove_device+0x93/0xa0 [mlx4_core]
> [ 209.070648] mlx4_unregister_interface+0x37/0x90 [mlx4_core]
> [ 209.070705] mlx4_ib_cleanup+0xc/0x4db [mlx4_ib]
> [ 209.072113] __x64_sys_delete_module+0x15b/0x260
> [ 209.073567] ? exit_to_usermode_loop+0x7f/0x95
> [ 209.074945] do_syscall_64+0x48/0x100
> [ 209.076448] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 209.077799] RIP: 0033:0x7fc34bfe36b7
> [ 209.079122] RSP: 002b:00007ffc8ffa98b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> [ 209.080500] RAX: ffffffffffffffda RBX: 00000000013455c0 RCX: 00007fc34bfe36b7
> [ 209.081875] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000000001345628
> [ 209.083265] RBP: 0000000000000000 R08: 00007fc34c2a8060 R09: 00007fc34c053a40
> [ 209.084655] R10: 00007ffc8ffa9640 R11: 0000000000000206 R12: 0000000000000000
> [ 209.086028] R13: 0000000000000001 R14: 0000000001345628 R15: 0000000000000000
> [ 209.087416] Code: 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 58 0f 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 0f b1 17 85 c0 75 05 48 89 d8 5b c3 89 c6 e8 1e c9 81 ff eb
> [ 209.090262] RIP: _raw_spin_lock_irqsave+0x1e/0x40 RSP: ffffc90000b4fd70
> [ 209.091720] CR2: 0000000000000070
> [ 209.093137] ---[ end trace 7b8a6faa27868861 ]---
> [ 209.094546] Kernel panic - not syncing: Fatal exception
> [ 209.096910] Kernel Offset: disabled
> [ 209.098291] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Thanks
>
> >
> > Thanks
> >
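
For readers following the thread, here is a minimal user-space sketch of
the TID layout described in the quoted patch description: agent number in
the most significant dword, running counter in the least significant
dword, and (in the shared port model) the slave number in the top byte of
the agent dword. All names below are hypothetical and chosen for
illustration only; this is not the ib_core or mlx4 code. It assumes that
agent number 0 stays reserved and that usable agent numbers are below 2^24.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, named for this sketch only. */
#define AGENT_ID_FIRST 1u          /* 0 means "no agent" and is never handed out */
#define AGENT_ID_LIMIT (1u << 24)  /* exclusive bound: the top byte must stay free */

/* Build a 64-bit TID: agent number in the high dword, counter in the low dword. */
static uint64_t make_tid(uint32_t agent, uint32_t counter)
{
    return ((uint64_t)agent << 32) | counter;
}

/* Extract the high dword (agent number, possibly tagged with a slave number). */
static uint32_t tid_agent(uint64_t tid)
{
    return (uint32_t)(tid >> 32);
}

/* Extract the low dword (running counter). */
static uint32_t tid_counter(uint64_t tid)
{
    return (uint32_t)tid;
}

/* Store a slave number in the most significant byte of the agent dword. */
static uint64_t tag_slave(uint64_t tid, uint8_t slave)
{
    uint32_t agent = tid_agent(tid);

    /* An agent number of 2^24 or more would collide with the slave byte. */
    assert(agent < AGENT_ID_LIMIT);
    return make_tid(((uint32_t)slave << 24) | agent, tid_counter(tid));
}

/* Read back the slave number from the top byte of the agent dword. */
static uint8_t tid_slave(uint64_t tid)
{
    return (uint8_t)(tid_agent(tid) >> 24);
}

/* An agent number is usable only if it is non-zero and fits in 3 bytes. */
static int agent_id_usable(uint32_t agent)
{
    return agent >= AGENT_ID_FIRST && agent < AGENT_ID_LIMIT;
}
```

With this layout, an allocator whose range starts at 0 could hand out
the reserved value, and one without an upper bound below 2^24 could
eventually produce agent numbers that overlap the slave byte.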