Subject: Re: RFC rdma cgroup
From: Haggai Eran
To: Parav Pandit
CC: Tejun Heo, Doug Ledford, "Hefty, Sean",
    "linux-rdma@vger.kernel.org", "cgroups@vger.kernel.org", Liran Liss,
    "linux-kernel@vger.kernel.org", "lizefan@huawei.com", Johannes Weiner,
    Jonathan Corbet, "james.l.morris@oracle.com", "serge@hallyn.com",
    Or Gerlitz, Matan Barak, "raindel@mellanox.com",
    "akpm@linux-foundation.org", "linux-security-module@vger.kernel.org",
    Jason Gunthorpe
Date: Wed, 4 Nov 2015 13:58:27 +0200
Message-ID: <5639F2E3.8090101@mellanox.com>
References: <563233D7.90808@mellanox.com> <56376889.2080908@mellanox.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/11/2015 21:11, Parav Pandit wrote:
> So it looks like below,
> #cat rdma.resources.verbs.list
> Output:
> mlx4_0 uctx ah pd cq mr mw srq qp flow
> mlx4_1 uctx ah pd cq mr mw srq qp flow rss_wq

What happens if you set a limit of rss_wq to mlx4_0 in this example?
Would it fail? I think it would be simpler for administrators if they
can configure every resource supported by uverbs. If a resource is not
supported by a specific device, you can never go over the limit anyway.

> #cat rdma.resources.hw.list
> hfi1 hw_qp hw_mr sw_pd
> (This particular one is hypothical example, I haven't actually coded
> this, unlike uverbs which is real).

Sounds fine to me. We will need to be careful to make sure that driver
maintainers don't break backward compatibility with this interface.

>> I guess there aren't a lot of options when the resources can belong to
>> multiple cgroups. So after migrating, new resources will belong to the
>> new cgroup or the old one?
> Resource always belongs to the cgroup in which its created, regardless
> of process migration.
> Again, its owned at the css level instead of cgroup. Therefore
> original cgroup can also be deleted but internal reference to data
> structure and that is freed and last rdma resource is freed.

Okay.

>>> For applications that doesn't use RDMA-CM, query_device and query_port
>>> will filter out the GID entries based on the network namespace in
>>> which caller process is running.
>> This could work well for RoCE, as each entry in the GID table is
>> associated with a net device and a network namespace. However, in
>> InfiniBand, the GID table isn't directly related to the network
>> namespace.
>> As for the P_Keys, you could deduce the set of P_Keys of a
>> namespace by the set of IPoIB netdevs in the network namespace, but
>> InfiniBand is designed to also work without IPoIB, so I don't think it's
>> a good idea.
> Got it. Yeah, this code can be under if(device_type RoCE).

IIRC there's a core capability for the new GID table code that contains
namespace, so you can use that.

>> I think it would be better to allow each cgroup to limit the pkeys and
>> gids its processes can use.
>
> o.k. So the use case is P_Key? So I believe requirement would similar
> to device cgroup.
> Where set of GID table entries are configured as white list entries.
> and when they are queried or used during create_ah or modify_qp, its
> compared against the white list (or in other words as ACL).
> If they are found in ACL, they are reported in query_device or in
> create_ah, modify_qp. If not they those calls are failed with
> appropriate status?
> Does this look ok?

Yes, that sounds good to me.

> Can we address requirement as additional feature just after first path?
> Tejun had some other idea on this kind of requirement, and I need to
> discuss with him.

Of course. I think there's use for the RDMA cgroup even without a pkey
or GID ACL, just to make sure one application doesn't hog hardware
resources.

>>> One of the idea I was considering is: to create virtual RDMA device
>>> mapped to physical device.
>>> And configure GID count limit via configfs for each such device.
>> You could probably achieve what you want by creating a virtual RDMA
>> device and use the device cgroup to limit access to it, but it sounds to
>> me like an overkill.
>
> Actually not much. Basically this virtual RDMA device points to the
> struct device of the physical device itself.
> So only overhead is linking this structure to native device structure
> and passing most of the calls to native ib_device with thin filter
> layer in control path.
> post_send/recv/poll_cq will directly go native device and same performance.

Still, I think we already have code that wraps ib_device calls for
userspace, which is the ib_uverbs module. There's no need for an extra
layer.

Regards,
Haggai
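
Editorial illustration, not part of the proposed patches: the accounting semantics discussed in this message can be modelled with a small userspace toy. It is only a sketch under stated assumptions. The names RdmaCgroup, Process, Resource, LimitExceeded and Unsupported are hypothetical and do not correspond to kernel symbols. The device table is copied from the rdma.resources.verbs.list output quoted in this message. The model assumes that a limit may be set for any resource name, that a resource stays charged to the cgroup it was created in, and that a migrated process charges new resources to its new cgroup.

```python
from collections import defaultdict

# Resource types per device, copied from the listing quoted in the message.
SUPPORTED = {
    "mlx4_0": {"uctx", "ah", "pd", "cq", "mr", "mw", "srq", "qp", "flow"},
    "mlx4_1": {"uctx", "ah", "pd", "cq", "mr", "mw", "srq", "qp", "flow",
               "rss_wq"},
}


class LimitExceeded(Exception):
    """The owning cgroup is already at its limit for this resource."""


class Unsupported(Exception):
    """The device does not expose this resource type at all."""


class RdmaCgroup:
    """Hypothetical stand-in for one cgroup's RDMA accounting state."""

    def __init__(self, name):
        self.name = name
        self.limits = {}               # (device, kind) -> maximum count
        self.usage = defaultdict(int)  # (device, kind) -> current count

    def set_limit(self, device, kind, maximum):
        # Accepted for any resource name, including ones the device lacks:
        # such a limit can never be reached, so there is nothing to reject.
        self.limits[(device, kind)] = maximum


class Resource:
    """One allocated object; remembers the cgroup that was charged."""

    def __init__(self, owner, device, kind):
        self.owner, self.device, self.kind = owner, device, kind

    def free(self):
        # Uncharge the creating cgroup, not the process's current one.
        self.owner.usage[(self.device, self.kind)] -= 1


class Process:
    def __init__(self, cgroup):
        self.cgroup = cgroup  # reassigning this attribute models migration

    def create(self, device, kind):
        if kind not in SUPPORTED[device]:
            raise Unsupported(f"{device} has no {kind}")
        owner, key = self.cgroup, (device, kind)
        limit = owner.limits.get(key)
        if limit is not None and owner.usage[key] >= limit:
            raise LimitExceeded(f"{owner.name}: {device} {kind} limit {limit}")
        owner.usage[key] += 1
        return Resource(owner, device, kind)


if __name__ == "__main__":
    old, new = RdmaCgroup("old"), RdmaCgroup("new")
    old.set_limit("mlx4_0", "rss_wq", 1)  # harmless: mlx4_0 has no rss_wq
    proc = Process(old)
    qp = proc.create("mlx4_0", "qp")      # charged to "old"
    proc.cgroup = new                     # migrate the process
    proc.create("mlx4_0", "qp")           # charged to "new"
    print(dict(old.usage), dict(new.usage))
```

In this toy, the reference each Resource holds to its owner plays a role loosely analogous to the css-level ownership described in the quoted text: the accounting object stays reachable until the last resource charged to it is freed.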