From: Bandan Das <bsd@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, mst@redhat.com,
        jiangshanlai@gmail.com, RAPOPORT@il.ibm.com
Subject: Re: [RFC PATCH 0/4] cgroup aware workqueues
References: <1458339291-4093-1-git-send-email-bsd@redhat.com>
	<20160320181038.GQ20028@mtj.duckdns.org>
Date: Mon, 21 Mar 2016 13:35:54 -0400
In-Reply-To: <20160320181038.GQ20028@mtj.duckdns.org> (Tejun Heo's message of
	"Sun, 20 Mar 2016 14:10:38 -0400")
Message-ID: <jpg60wfd98l.fsf@linux.bootlegged.copy>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3087
Lines: 69

Tejun Heo <tj@kernel.org> writes:

> Hello,
>
> On Fri, Mar 18, 2016 at 06:14:47PM -0400, Bandan Das wrote:
>> These changes don't populate the "numa awareness" fields/attrs and
>> unlike unbounded numa worker pools, cgroup worker pools are created
>> on demand. Every work request could potentially have a new cgroup
>
> Hmmm... I don't get it.  Why would this be exclusive with numa
> support?  Can't cgroup be just another attribute in addition to numa?

Yes, I think it can. I am not certain what would be a good representation
of the cgroup information; maybe, all cgroups could be represented by just a
simple bitmap just like numa attrs ? The other thing that was on my mind is
what happens when there's no intersection between the cgroups
of a task and the numa locality. For example, if node 1 with cpus 0,1,2,3 is local to
task A but it's cgroups want to attach to cpus 4-5, then who wins in this case ?
Or a simple logic would be to always attach to cgroups as the last step.


>> aware pool created for it based on the combination of cgroups it's attached
>> to. However, workqueues themselves are incognizant of the actual cgroups -
>> they rely on the cgroups provided helper functions either for 1. a match
>> of all the cgroups or 2. to attach a worker thread to all cgroups of
>> a userspace task. We do maintain a list of cgroup aware pools so that
>> when a new request comes in and a suitable worker pool needs to be
>> found, we search the list first before creating a new one. A worker
>> pool also stores a a list of all "task owners" - a list of processes
>> that we are serving currently.
>
> Why is this separate from the normal lookup mechanism?  Can't it be
> hashed together?
>
>> Todo:
>> What about bounded workqueues ?
>
> I don't think it'd matter.  This is only interesting for work items
> which may consume a significant amount of resources, which shouldn't
> be served by per-cpu workers anyway.

Ok.

>> What happens when cgroups of a running process changes ?
>
> Existing work items will be served with the old association.  New work
> items will be served with the new association.  This is consistent
> with how other attributes are handled too.

In the current implementation, the cgroup info is fetched just once
when alloc_workqueue is called. So, there's no way of knowing if the
cgroups changed. Maybe I should rethink this too.

>> Better performance numbers ? (although the onese above don't look bad)
>
> Where is performance regression coming from?  Why is there *any*
> performance penalty?

I am still investigating this but creating more worker threads could
be one.

/* do  we need to manage? */                                                                                                                                                                                
        if (unlikely(!may_start_working(pool)) && manage_workers(worker))
           goto recheck;                         

Since all work gets queued to the default pwq in this implementation,
we do end up creating workers in the middle of a run.

> Thanks.