Date: Wed, 2 Mar 2016 12:39:49 -0500
From: Tejun Heo
To: Parav Pandit
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	lizefan@huawei.com, hannes@cmpxchg.org, dledford@redhat.com,
	liranl@mellanox.com, sean.hefty@intel.com,
	jgunthorpe@obsidianresearch.com, haggaie@mellanox.com,
	corbet@lwn.net, james.l.morris@oracle.com, serge@hallyn.com,
	ogerlitz@mellanox.com, matanb@mellanox.com, raindel@mellanox.com,
	akpm@linux-foundation.org, linux-security-module@vger.kernel.org
Subject: Re: [PATCHv9 1/3] rdmacg: Added rdma cgroup controller
Message-ID: <20160302173949.GG29826@mtj.duckdns.org>
References: <1456859137-13646-1-git-send-email-pandit.parav@gmail.com>
	<1456859137-13646-2-git-send-email-pandit.parav@gmail.com>
In-Reply-To: <1456859137-13646-2-git-send-email-pandit.parav@gmail.com>

Hello,

On Wed, Mar 02, 2016 at 12:35:35AM +0530, Parav Pandit wrote:
> diff --git a/include/linux/cgroup_rdma.h b/include/linux/cgroup_rdma.h
> new file mode 100644
> index 0000000..2da3d6c
> --- /dev/null
> +++ b/include/linux/cgroup_rdma.h
> @@ -0,0 +1,50 @@
> +#ifndef _CGROUP_RDMA_H
> +#define _CGROUP_RDMA_H
> +
> +#include
> +
> +#ifdef CONFIG_CGROUP_RDMA
> +
> +struct rdma_cgroup {
> +	struct cgroup_subsys_state css;
> +
> +	spinlock_t rpool_list_lock;	/* protects resource pool list */
> +	struct list_head rpool_head;	/* head to keep track of all resource
> +					 * pools that belongs to this cgroup.
> +					 */

I think we usually don't tail-wing these comments.

> +};
> +
> +struct rdmacg_pool_info {
> +	const char **resource_name_table;
> +	int table_len;

Align the fields consistently? I've said this multiple times now, but
please make the available resources constant and document them in
Documentation/cgroup-v2.txt.

> +};
> +
> +struct rdmacg_device {
> +	struct rdmacg_pool_info pool_info;
> +	struct list_head rdmacg_list;
> +	struct list_head rpool_head;
> +	/* protects resource pool list */
> +	spinlock_t rpool_lock;
> +	char *name;
> +};
> +
> +/* APIs for RDMA/IB stack to publish when a device wants to
> + * participate in resource accounting
> + */

Please follow CodingStyle. Wasn't this pointed out a couple of times
already?

> +enum rdmacg_file_type {
> +	RDMACG_RESOURCE_MAX,
> +	RDMACG_RESOURCE_STAT,
> +};

Heh, usually not a good sign. If you're using this to call into a
common function and then switch out on the file type, just switch out
at the method level and factor out the common part into helpers.
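
IOW, something like the following - completely untested, and
css_rdmacg() and the other names here are made up just to illustrate
the shape:

  static struct rdma_cgroup *css_rdmacg(struct cgroup_subsys_state *css)
  {
	return container_of(css, struct rdma_cgroup, css);
  }

  /* the common part lives in one helper... */
  static int rdmacg_print_resources(struct seq_file *sf,
				    struct rdma_cgroup *cg, bool show_usage)
  {
	/* walk @cg's rpools and print ->usage or ->max per resource */
	return 0;
  }

  /* ...and each file gets its own method, no file-type switch needed */
  static int rdmacg_max_show(struct seq_file *sf, void *v)
  {
	return rdmacg_print_resources(sf, css_rdmacg(seq_css(sf)), false);
  }

  static int rdmacg_stat_show(struct seq_file *sf, void *v)
  {
	return rdmacg_print_resources(sf, css_rdmacg(seq_css(sf)), true);
  }

  static struct cftype rdmacg_files[] = {
	{ .name = "max", .seq_show = rdmacg_max_show },
	{ .name = "current", .seq_show = rdmacg_stat_show },
	{ }	/* terminate */
  };

That way the enum and the switch disappear entirely.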

> +/* resource tracker per resource for rdma cgroup */
> +struct rdmacg_resource {
> +	int max;
> +	int usage;
> +};

Align the fields?

> +/**

The above indicates a kerneldoc comment, which this isn't.

> + * resource pool object which represents, per cgroup, per device
> + * resources. There are multiple instance
> + * of this object per cgroup, therefore it cannot be embedded within
> + * rdma_cgroup structure. It is maintained as list.

Consistent paragraph fill?

> + */
> +struct rdmacg_resource_pool {
> +	struct list_head cg_list;
> +	struct list_head dev_list;
> +
> +	struct rdmacg_device *device;
> +	struct rdmacg_resource *resources;
> +	struct rdma_cgroup *cg;	/* owner cg used during device cleanup */
> +
> +	int refcnt;		/* count active user tasks of this pool */
> +	int num_max_cnt;	/* total number counts which are set to max */
> +};

Formatting.

> +static inline void set_resource_limit(struct rdmacg_resource_pool *rpool,
> +				      int index, int new_max)

Is inline necessary? The compiler can figure these out on its own.

> +static struct rdmacg_resource_pool*
                                      ^ space
> +find_cg_rpool_locked(struct rdma_cgroup *cg,
> +		     struct rdmacg_device *device)

...

> +static void uncharge_cg_resource(struct rdma_cgroup *cg,
> +				 struct rdmacg_device *device,
> +				 int index, int num)
> +{
...
> +	rpool->refcnt--;
> +	if (rpool->refcnt == 0 && rpool->num_max_cnt == pool_info->table_len) {

If the caller charges 2 and then uncharges 1 two times, the refcnt
underflows, no? Why not just track how many usages are zero instead?
(Sketch of what I mean below the sig.)

...

> +void rdmacg_uncharge(struct rdma_cgroup *cg,
> +		     struct rdmacg_device *device,
> +		     int index, int num)
> +{
> +	struct rdma_cgroup *p;
> +
> +	for (p = cg; p; p = parent_rdmacg(p))
> +		uncharge_cg_resource(p, device, index, num);
> +
> +	css_put(&cg->css);
> +}
> +EXPORT_SYMBOL(rdmacg_uncharge);

So, I suppose the code is trying to save on lock contention with
fine-grained locking; however, as the code is currently structured,
it's just increasing the number of lock operations and it'd very
likely be cheaper to simply use a single lock. If you're walking up
the tree grabbing a lock at each level, per-node locking doesn't save
you anything. Also, it introduces conditions where charges are
spuriously denied when there are racing requestors. If scalability
becomes an issue, the right way to address it is adding a percpu
front cache.

> +void rdmacg_query_limit(struct rdmacg_device *device,
> +			int *limits)
> +{
> +	struct rdma_cgroup *cg, *p;
> +	struct rdmacg_resource_pool *rpool;
> +	struct rdmacg_pool_info *pool_info;
> +	int i;
> +
> +	cg = task_rdmacg(current);
> +	pool_info = &device->pool_info;

Nothing seems to prevent @cg from going away if this races with
@current being migrated to a different cgroup. Have you run this with
lockdep and RCU debugging enabled? This should have triggered a
warning.

...

> +	for (p = cg; p; p = parent_rdmacg(p)) {
> +		spin_lock(&cg->rpool_list_lock);
> +		rpool = find_cg_rpool_locked(cg, device);
> +		if (rpool) {
> +			for (i = 0; i < pool_info->table_len; i++)
> +				limits[i] = min_t(int, limits[i],
> +					rpool->resources[i].max);

So, complexity-wise, things like the above are pretty bad, and the
pattern is used in hot paths. I suppose there can only be a handful
of devices per system?

Thanks.

-- 
tejun
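
Here's roughly what I mean by tracking how many usages are zero -
again completely untested and the names are made up:

  /* in struct rdmacg_resource_pool, instead of refcnt: */
	int num_used;	/* resources whose ->usage is non-zero */

  static void rpool_inc_usage(struct rdmacg_resource_pool *rpool,
			      int index, int num)
  {
	struct rdmacg_resource *res = &rpool->resources[index];

	if (!res->usage)
		rpool->num_used++;	/* 0 -> non-zero transition */
	res->usage += num;
  }

  static void rpool_dec_usage(struct rdmacg_resource_pool *rpool,
			      int index, int num)
  {
	struct rdmacg_resource *res = &rpool->resources[index];

	WARN_ON_ONCE(res->usage < num);
	res->usage -= num;
	if (!res->usage)
		rpool->num_used--;	/* non-zero -> 0 transition */
  }

The pool is then a candidate for destruction when ->num_used is zero
and all limits are back at the default, and partial uncharges can't
underflow anything.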