Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751243AbVLPBAk (ORCPT ); Thu, 15 Dec 2005 20:00:40 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751244AbVLPBAk (ORCPT ); Thu, 15 Dec 2005 20:00:40 -0500 Received: from mtagate2.de.ibm.com ([195.212.29.151]:58558 "EHLO mtagate2.de.ibm.com") by vger.kernel.org with ESMTP id S1751243AbVLPBAj (ORCPT ); Thu, 15 Dec 2005 20:00:39 -0500 Message-ID: <43A211A9.2060807@de.ibm.com> Date: Fri, 16 Dec 2005 02:00:25 +0100 From: Martin Peschke User-Agent: Mozilla Thunderbird 1.0.7 (Windows/20050923) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Andi Kleen CC: akpm@osdl.org, linux-kernel@vger.kernel.org Subject: Re: [patch 5/6] statistics infrastructure References: <43A04C83.2070002@de.ibm.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6521 Lines: 146 Andi Kleen wrote: >Locks and indirect function calls? > >It seems very wrong to me to make such heavy weight statistic functions. >Most likely you will disturb the performance whatever is being counted >badly. > > Well, that's a tradeoff between flexibility/function and performance. I don't have any reliable numbers ready at hand. At least, doing I/O to my SCSI devices with enabled statistics didn't feel bad. Here is the rational for both the indirect function call and the lock: If a statistic is disabled (that's the default) neither is locking done nor is a function called indirectly. So far so good, I would say. If a statistic is enabled the lock for this entity is grabbed and one indirect function call is done. Anything else is inlined. I use granular per-interface (per-entity, for example per-LUN or per-HBA) locking. The indirect function call allows customization of the ways data processing is done for particular statistics. For example, one could deflate a histogram of latencies into a counter providing the total of latency measurements, that is the total of requests observed; or inflate a statistic the other way around if required. Another example: one can make a statistic gather data for recurring periods of time, like megabytes per seconds instead of just the total amount of bytes transferred, or like queue utilization per whatever-unit-of-time instead of just an overall utilization. A statistic that feeds on request sizes can be setup to provide the following "views": - number of requests observed (counter) - number of requests per unit of time (history based on a counter) - number of bytes transfered (counter) - number of bytes transfered per unit of time = transfer rate (history based on a counter) - traffic pattern (histogram for descrete request sizes or for ranges of request sizes) - raw measurement data gathered - etc. As a device driver programmer I might pick a "view" a user is interested in. My pick might miss by a mile. I simply don't know for sure beforehand. The indirect function call could be a replaced by a switch statement. Not sure whether this is less critical and more acceptable than indirect function calls. Might be architucture dependent. We can get rid of the indirect function call (or an alternative switch statement) if the vote is against this level of flexibility. Then it would be solely up to the exploiter to define once and for all whether a particular sort of data is shown as a simple counter, a histogram, a fill level indicator, this history-type statistic thing in a ringbuffer etc. This might be fine for a considerable number of cases. The lock is there to avoid trouble with concurrent updates to a statistic. If per-CPU data was used, concurrent updates are fine as long as they are done on different CPUs. Precautions for concurrent updates to the same per-CPU is still needed, though. The current interface allows to use the lock this way: lock(stat_x->interface); statistics_inc_nolock(stat_x, y); statistics_inc_nolock(stat_m, n); statistics_add_nolock(stat_a, e, f); unlock(stat_x->interface); Because we hold the same lock when creating output for users, coherency of several statistics of a single entity can be achieved if statistic updates are done within one critical section as shown above. The lock is also used to make sure that updates to a statistic don't happen while the setup of a statistic is changed by users. If we get rid of the indirect function call, some of these setup changes go away, anyway. Other cases, like statistic resets or inflating a 5-counter histogram to a 25-counter histogram, don't go away. If I can figure out how to reallocate, say, an array of counters for a histogram without holding a lock while updates happen... Maybe I could temporarily turn of a statistic. >Take a look at many other subsystems - they do per CPU counters etc. >to make this all fast. > > I am looking into per CPU data. But, is this really required for _all_ statistics? I see that it makes sense to have per CPU optimizations for very critical components, like parts of VM. But there are still a lot of do-it-yourself type statistics around that use an atomic_t, for example, without implementing it per CPU. Then, I am not sure yet whether per CPU data is feasible for histograms and other more complex statistics. I have got to find out. I tried to write the code in a way that allows to add other statistic type, like counters, histograms and so on, with moderate effort. Maybe I can use the internal interface to plug in some disclipline based on per CPU counters... >But it's still unclear why it would need such an heavyweight infrastructure. >Normally it's not that bad to reimplemented on the fly. Maybe some >common code can be refactored out of that, but probably not too much. > >[... lots of other code snipped ... ] > >Looks all very very overdesigned to me. How about you just start >with a minimum specification and describe what you want to do? > > As a device driver programmer I don't want to reinvent the wheel when coding statistics. I would prefer to use a few and easy to use library functions. I don't want to worry about getting my personal wheel being functional. I'd rather use my time to worry about which kind of data is really needed and which is not. I'd like to provide a tool that can be customized at run time to a certain degree because it might not be acceptable for customers to install private kernels in order to get tuned statistics. As a device driver programmer I can make an educated guess, at best, about certain parameters that impact the processing of statistic data. Users might know better whether they need to focus on latencies from 2 ms up to 64 ms or from 100 ms up to 500 ms, because this kind of decisions depends on the environment to be measured, e.g. devices attached. In a device driver, I don't want to spent much thought about the statistic's user interface. Particularly not if the statistic is a little bit more complex than a simple counter. Would be really nice to have a user interface that looks the same for all exploiters, i.e. to have common output formats for counters, histograms, fill level indicators etc. Martin - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/