Date: Wed, 5 Jul 2017 17:34:39 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>, x86@kernel.org,
        linux-kernel@vger.kernel.org, hpa@zytor.com, ravi.v.shankar@intel.com,
        vikas.shivappa@intel.com, tony.luck@intel.com, fenghua.yu@intel.com,
        andi.kleen@intel.com
Subject: Re: [PATCH 08/21] x86/intel_rdt/cqm: Add RMID(Resource monitoring
 ID) management
Message-ID: <20170705153439.xudhew5wpq3liivf@hirez.programming.kicks-ass.net>
References: <1498503368-20173-1-git-send-email-vikas.shivappa@linux.intel.com>
 <1498503368-20173-9-git-send-email-vikas.shivappa@linux.intel.com>
 <alpine.DEB.2.20.1707021119300.2296@nanos>
 <alpine.DEB.2.20.1707030954330.2188@nanos>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.DEB.2.20.1707030954330.2188@nanos>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3541
Lines: 116

On Mon, Jul 03, 2017 at 11:55:37AM +0200, Thomas Gleixner wrote:

> 
> 	if (static_branch_likely(&rdt_mon_enable_key)) {
> 		if (unlikely(current->rmid)) {
> 			newstate.rmid = current->rmid;
> 			__set_bit(newstate.rmid, this_cpu_ptr(rmid_bitmap));

Non-atomic op

> 		}
> 	}
> 
> Now in rmid_free() we can collect that information:
> 
> 	cpumask_clear(&tmpmask);
> 	cpumask_clear(rmid_entry->mask);
> 
> 	cpus_read_lock();
> 	for_each_online_cpu(cpu) {
> 		if (test_and_clear_bit(rmid, per_cpu_ptr(rmid_bitmap, cpu)))

Atomic op, on the same bitmap that the non-atomic __set_bit() above
writes; the two can race.

> 			cpumask_set_cpu(cpu, tmpmask);
> 	}
> 
> 	for_each_domain(d, resource) {
> 		cpu = cpumask_any_and(d->cpu_mask, tmpmask);
> 		if (cpu < nr_cpu_ids)
> 			cpumask_set_cpu(cpu, rmid_entry->mask);
> 	}
> 
> 	list_add(&rmid_entry->list, &limbo_list);
> 
> 	for_each_cpu(cpu, rmid_entry->mask)
> 		schedule_delayed_work_on(cpu, rmid_work);
> 	cpus_read_unlock();
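
To spell out the problem with mixing the two: __set_bit() is a plain
load/modify/store, so a remote test_and_clear_bit() that lands between
the load and the store gets overwritten. A minimal userspace sketch of
that interleaving, with C11 atomics standing in for the kernel bitops
(sketch_test_and_clear_bit() and replay_lost_clear() are made-up names,
and RMIDs 3 and 5 are arbitrary):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for one word of a per-CPU rmid_bitmap (assumption). */
static _Atomic unsigned long rmid_word;

/* Atomic read-modify-write, modelled on test_and_clear_bit(). */
static bool sketch_test_and_clear_bit(unsigned int nr,
				      _Atomic unsigned long *word)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/*
 * Replays one bad interleaving on a single thread and returns the
 * final bitmap word. CPU A does a non-atomic set of bit 5 (split into
 * its load and store halves), CPU B frees RMID 3 in between.
 */
static unsigned long replay_lost_clear(void)
{
	unsigned long tmp;

	/* RMID 3 ran on this CPU earlier, so its bit is set. */
	atomic_store(&rmid_word, 1UL << 3);

	/* CPU A: first half of the non-atomic set, a plain load. */
	tmp = atomic_load_explicit(&rmid_word, memory_order_relaxed);

	/* CPU B: the free path atomically tests and clears bit 3. */
	sketch_test_and_clear_bit(3, &rmid_word);

	/* CPU A: second half, a plain store of the stale value. */
	atomic_store_explicit(&rmid_word, tmp | (1UL << 5),
			      memory_order_relaxed);

	return atomic_load(&rmid_word);
}
```

replay_lost_clear() returns 0x28: bit 5 is set as intended, but bit 3 is
back even though the free path already saw and cleared it. Using the
atomic set_bit() on the switch side would close that window, at the
price of an atomic op in the context switch path.
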
> 
> The work function:
> 
>     	bool resched = false;
> 
>     	list_for_each_entry(rme, limbo_list,...) {
> 		if (!cpumask_test_cpu(cpu, rme->mask))
> 			continue;
> 
> 		if (!rmid_is_reusable(rme)) {
> 			resched = true;
> 			continue;
> 		}
> 
> 		cpumask_clear_cpu(cpu, rme->mask);
> 		if (!cpumask_empty(rme->mask))
> 			continue;
> 
> 		/* Ready for reuse */
> 		list_del(&rme->list);
> 		list_add(&rme->list, &free_list);
> 	}	
> 
> The alloc function then becomes:
> 
> 	if (list_empty(&free_list))
> 		return list_empty(&limbo_list) ? -ENOSPC : -EBUSY;
> 
> The switch_to() covers the task rmids. The per cpu default rmids can be
> marked at the point where they are installed on a CPU in the per cpu
> rmid_bitmap. The free path is the same for per task and per cpu.
> 
> Another thing which needs some thought is the CPU hotplug code. We need to
> make sure that pending work which is scheduled on an outgoing CPU is moved
> in the offline callback to a still online CPU of the same domain and not
> moved to some random CPU by the workqueue hotplug code.

Just flush the workqueue for that CPU? That's what the workqueue core
_should_ do in any case. And that also covers the case where @cpu is the
last in the set of CPUs we could run on.

> There is another subtle issue. Assume a RMID is freed. The limbo stuff is
> scheduled on all domains which have online CPUs.
> 
> Now the last CPU of a domain goes offline before the threshold for clearing
> the domain CPU bit in the rme->mask is reached.
> 
> So we have two options here:
> 
>    1) Clear the bit unconditionally when the last CPU of a domain goes
>       offline.

Arguably this. This is cache-level stuff, which means this is the last
CPU of a cache, so just explicitly kill the _entire_ cache and
insta-mark everything good again; WBINVD ftw.

>    2) Arm a timer which clears the bit after a grace period
> 
> #1 The RMID might become available for reuse right away because all other
>    domains have not used it or have cleared their bits already.
>    
>    If one of the CPUs of that domain comes online again and is associated
>    to that reused RMID again, then the counter content might still contain
>    leftovers from the previous usage.

Not if we kill the cache on offline -- also, if all CPUs have been
offline, it's not too weird to expect something like a package idle state
to have happened and shot down the caches anyway.
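
Roughly what option 1 looks like as bookkeeping, as a userspace model
only. Plain 32-bit masks stand in for cpumasks, and the struct, its
fields and sketch_cpu_offline() are all hypothetical names. It assumes
the entry is already on the limbo list and that the cache has been
flushed by the time the last CPU of the domain is gone:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a limbo entry; not the kernel structure. */
struct sketch_rmid_entry {
	uint32_t limbo_mask;	/* one pending CPU per cache domain */
	bool on_free_list;
};

/*
 * Model of the offline callback for one limbo entry.
 * domain_mask: all CPUs sharing the outgoing CPU's cache.
 * online_mask: CPUs online before this call.
 * Returns the online mask with @cpu removed.
 */
static uint32_t sketch_cpu_offline(struct sketch_rmid_entry *rme,
				   unsigned int cpu,
				   uint32_t domain_mask,
				   uint32_t online_mask)
{
	uint32_t siblings;
	unsigned int sib;

	online_mask &= ~(UINT32_C(1) << cpu);
	siblings = online_mask & domain_mask;

	if (!siblings) {
		/*
		 * Last CPU of the cache: the cache is assumed flushed,
		 * so clear the domain's bit unconditionally.
		 */
		rme->limbo_mask &= ~domain_mask;
	} else if (rme->limbo_mask & (UINT32_C(1) << cpu)) {
		/* Hand the pending check to the lowest online sibling. */
		for (sib = 0; !(siblings & (UINT32_C(1) << sib)); sib++)
			;
		rme->limbo_mask &= ~(UINT32_C(1) << cpu);
		rme->limbo_mask |= UINT32_C(1) << sib;
	}

	/* No domain left to wait for: ready for reuse. */
	if (!rme->limbo_mask)
		rme->on_free_list = true;

	return online_mask;
}
```

With two domains (CPUs 0-3 and 4-7), an entry pending on CPUs 1 and 4,
and CPUs 1, 4 and 5 online: taking CPU 1 down drops domain 0 from the
mask right away, taking CPU 4 down moves the pending bit to CPU 5, and
taking CPU 5 down empties the mask and frees the entry.
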

> #2 Prevents #1 but has its own issues vs. serialization and coordination
>    with CPU hotplug.
> 
> I'd say we go for #1 as the simplest solution, document it and if really
> the need arises revisit it later.
> 
> Thanks,
> 
> 	tglx