Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757505AbcCURf6 (ORCPT ); Mon, 21 Mar 2016 13:35:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53919 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756804AbcCURf4 (ORCPT ); Mon, 21 Mar 2016 13:35:56 -0400 From: Bandan Das To: Tejun Heo Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, mst@redhat.com, jiangshanlai@gmail.com, RAPOPORT@il.ibm.com Subject: Re: [RFC PATCH 0/4] cgroup aware workqueues References: <1458339291-4093-1-git-send-email-bsd@redhat.com> <20160320181038.GQ20028@mtj.duckdns.org> Date: Mon, 21 Mar 2016 13:35:54 -0400 In-Reply-To: <20160320181038.GQ20028@mtj.duckdns.org> (Tejun Heo's message of "Sun, 20 Mar 2016 14:10:38 -0400") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3087 Lines: 69 Tejun Heo writes: > Hello, > > On Fri, Mar 18, 2016 at 06:14:47PM -0400, Bandan Das wrote: >> These changes don't populate the "numa awareness" fields/attrs and >> unlike unbounded numa worker pools, cgroup worker pools are created >> on demand. Every work request could potentially have a new cgroup > > Hmmm... I don't get it. Why would this be exclusive with numa > support? Can't cgroup be just another attribute in addition to numa? Yes, I think it can. I am not certain what would be a good representation of the cgroup information; maybe, all cgroups could be represented by just a simple bitmap just like numa attrs ? The other thing that was on my mind is what happens when there's no intersection between the cgroups of a task and the numa locality. For example, if node 1 with cpus 0,1,2,3 is local to task A but it's cgroups want to attach to cpus 4-5, then who wins in this case ? Or a simple logic would be to always attach to cgroups as the last step. >> aware pool created for it based on the combination of cgroups it's attached >> to. However, workqueues themselves are incognizant of the actual cgroups - >> they rely on the cgroups provided helper functions either for 1. a match >> of all the cgroups or 2. to attach a worker thread to all cgroups of >> a userspace task. We do maintain a list of cgroup aware pools so that >> when a new request comes in and a suitable worker pool needs to be >> found, we search the list first before creating a new one. A worker >> pool also stores a a list of all "task owners" - a list of processes >> that we are serving currently. > > Why is this separate from the normal lookup mechanism? Can't it be > hashed together? > >> Todo: >> What about bounded workqueues ? > > I don't think it'd matter. This is only interesting for work items > which may consume a significant amount of resources, which shouldn't > be served by per-cpu workers anyway. Ok. >> What happens when cgroups of a running process changes ? > > Existing work items will be served with the old association. New work > items will be served with the new association. This is consistent > with how other attributes are handled too. In the current implementation, the cgroup info is fetched just once when alloc_workqueue is called. So, there's no way of knowing if the cgroups changed. Maybe I should rethink this too. >> Better performance numbers ? (although the onese above don't look bad) > > Where is performance regression coming from? Why is there *any* > performance penalty? I am still investigating this but creating more worker threads could be one. /* do we need to manage? */ if (unlikely(!may_start_working(pool)) && manage_workers(worker)) goto recheck; Since all work gets queued to the default pwq in this implementation, we do end up creating workers in the middle of a run. > Thanks.