Date: Tue, 27 Dec 2016 12:21:44 -0800 (PST)
From: Shivappa Vikas <vikas.shivappa@intel.com>
To: Andi Kleen <andi@firstfloor.org>
cc: Shivappa Vikas <vikas.shivappa@intel.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Vikas Shivappa <vikas.shivappa@linux.intel.com>,
        linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de,
        ravi.v.shankar@intel.com, tony.luck@intel.com, fenghua.yu@intel.com,
        davidcc@google.com, eranian@google.com, hpa@zytor.com
Subject: Re: [PATCH 01/14] x86/cqm: Intel Resource Monitoring Documentation
In-Reply-To: <87vau5gn1w.fsf@firstfloor.org>
Message-ID: <alpine.DEB.2.10.1612271206460.5815@vshiva-Udesk>
References: <1481929988-31569-1-git-send-email-vikas.shivappa@linux.intel.com> <1481929988-31569-2-git-send-email-vikas.shivappa@linux.intel.com> <20161223123228.GQ3107@twins.programming.kicks-ass.net> <alpine.DEB.2.10.1612231126590.32409@vshiva-Udesk>
 <20161223203318.GU3107@twins.programming.kicks-ass.net> <alpine.DEB.2.10.1612241747170.32409@vshiva-Udesk> <87vau5gn1w.fsf@firstfloor.org>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2320
Lines: 65


On Tue, 27 Dec 2016, Andi Kleen wrote:

> Shivappa Vikas <vikas.shivappa@intel.com> writes:
>>
>> Ok , looks like the interface  is the problem. Will try to fix
>> this. We are just trying to have a light weight monitoring
>> option so that its reasonable to monitor for a
>> very long time (like lifetime of process etc). Mainly to not have all
>> the perf scheduling overhead.
>
> That seems like an odd reason to define a completely new user interface.
> This is to avoid one MSR write for a RMID change per context switch
> in/out cgroup or is it other code too?
>
> Is there some number you can put to the overhead?
> Or is there some other overhead other than the MSR write
> you're concerned about?

Yes, it seems the interface of having a file is odd, as even Peterz thinks.

It's actually the perf overhead we are trying to avoid.

The MSR writes (which are really driver/cqm overhead, not perf overhead) we
try to optimize by keeping a per-cpu cache, grouping the RMIDs, and having a
common write for RMID/CLOSID, etc.
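
To illustrate the per-cpu cache idea, here is a rough user-space sketch (not
the actual driver code). It assumes RMID and CLOSID are written together in
one MSR write (on the hardware both fields live in IA32_PQR_ASSOC) and skips
the write when the cached value already matches. The names pqr_state,
pqr_cache, pqr_update and fake_wrmsr, and the NR_CPUS value, are made up for
the example.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4  /* illustrative value, not the kernel's */

/* Hypothetical per-CPU cache of the last RMID/CLOSID written to the MSR. */
struct pqr_state {
	uint32_t rmid;
	uint32_t closid;
};

static struct pqr_state pqr_cache[NR_CPUS];
static unsigned long msr_writes;  /* counts simulated MSR writes */

/* Stand-in for a real wrmsr(); it only counts how often it is called. */
static void fake_wrmsr(uint32_t lo, uint32_t hi)
{
	(void)lo;
	(void)hi;
	msr_writes++;
}

/*
 * Called on context switch: one combined write covers both RMID and
 * CLOSID, and it is skipped when neither field changed on this CPU.
 */
static void pqr_update(unsigned int cpu, uint32_t rmid, uint32_t closid)
{
	struct pqr_state *s;

	assert(cpu < NR_CPUS);
	s = &pqr_cache[cpu];

	if (s->rmid == rmid && s->closid == closid)
		return;  /* cache hit: no MSR write needed */

	s->rmid = rmid;
	s->closid = closid;
	fake_wrmsr(rmid, closid);
}
```

With this, switching between two tasks in the same monitoring group on one
cpu costs only a compare, not an MSR write.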

The perf overhead I was thinking of, at least, was the one during context
switch, which is the more constant overhead (event creation is just a
one-time cost).

I was trying to see whether an alternative like this would work:
1. The user requests continuous monitoring via the perf attr in open.
2. The driver allocates the task/cgroup RMID and stores the RMID in the
   cgroup or task_struct.
3. The driver turns off the event, hence no perf ctx switch overhead? We
   don't need any of the perf hook calls for start/stop/add. (I am still
   finding out if this route works, basically whether turning off the event
   leaves minimal overhead for the event and no start/stop/add calls for it.)
4. During switch_to the driver still writes the RMID MSR, so we still
   monitor.
5. read -> calls the driver -> the driver just returns the count by reading
   the RMID.
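
To make the steps above more concrete, here is a rough user-space simulation
of the flow. It is only a sketch of the idea, not driver code: struct task
stands in for task_struct, and monitor_open, switch_to_hook, hw_account,
monitor_read, MAX_RMID and the counter array are all hypothetical names. The
"hardware" is simulated by charging work to whatever RMID is currently
active.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RMID 8  /* illustrative; real hardware reports its own limit */

struct task {
	uint32_t rmid;  /* stand-in for an RMID stored in task_struct */
};

static uint64_t rmid_count[MAX_RMID];  /* simulated per-RMID counter */
static uint32_t active_rmid;           /* simulated RMID MSR of one CPU */
static uint32_t next_rmid = 1;         /* RMID 0 is treated as "unmonitored" */

/*
 * Steps 1-3: open with the continuous-monitor attribute allocates an
 * RMID and stores it in the task; no perf event stays scheduled.
 */
static int monitor_open(struct task *t)
{
	if (next_rmid >= MAX_RMID)
		return -1;  /* out of RMIDs */
	t->rmid = next_rmid++;
	return 0;
}

/* Step 4: the only per-switch work is writing the next task's RMID. */
static void switch_to_hook(const struct task *next)
{
	active_rmid = next->rmid;
}

/* Simulated hardware: activity is charged to the active RMID. */
static void hw_account(uint64_t units)
{
	rmid_count[active_rmid] += units;
}

/* Step 5: read just returns the count associated with the task's RMID. */
static uint64_t monitor_read(const struct task *t)
{
	return rmid_count[t->rmid];
}
```

The point is that after open, the context-switch path only touches the RMID
(step 4), and read (step 5) does not depend on any perf start/stop/add
callbacks having run.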

>
> Do you have an ftrace or better PT trace with the overhead before-after?
>
> Perhaps some optimization could be done in the code to make it faster,
> then the new interface wouldn't be needed.
>
> FWIW there are some pending changes to context switch that will
> eliminate at least one common MSR write [1]. If that was fixed
> you could do the RMID MSR write "for free"

I see, that's good to know.

Thanks,
Vikas

>
> -Andi
>
> [1] https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/fsgsbase
>
>