2004-04-29 08:44:38

by Shailabh Nagar

Subject: [RFC] Revised CKRM release

The Class-based Resource Management project is happy to release the
first bits of a working prototype following a major revision of its
interface and internal organization.

The basic concepts and motivation of CKRM remain the same as described
in the overview at http://ckrm.sf.net. Privileged users can define
classes consisting of groups of kernel objects (currently tasks and
sockets) and specify shares for these classes. Resource controllers,
which are independent of each other, can regulate and monitor the
resources consumed by classes; e.g., the CPU controller regulates the
CPU time received by a class. Optional classification engines,
implemented as kernel modules, can assist in the automatic
classification of the kernel objects (tasks/sockets currently) into
classes.

New in this release are the following:

1) A filesystem-based user interface, proposed by Rik van Riel, to
replace the system call interface in the previous prototype (a short
usage sketch follows this list).

2) A hierarchy of classes can now be created so that a class (created
per user, say) can subdivide its share allocation among its child
classes (created one per application type), independently of its peer
classes (other users).

3) A newly introduced notion of a classtype, which defines what kind
of kernel objects are grouped into a class for regulation and
monitoring. Grouping tasks, via the taskclass classtype, is the most
commonly expected use. The prototype also implements the socketclass
classtype, useful for controlling groups of sockets.

4) Resource controllers are now explicitly associated with a
classtype. The CPU, memory, and I/O controllers (not yet implemented)
will operate on taskclasses, while the multiple accept queue controller
operates on socketclasses.

5) A functional socketaq network controller which regulates the number
of accepted TCP connections for groups of listening sockets.
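
To make the new interface concrete, here is a minimal user-space
sketch of creating a class hierarchy and setting shares through the
filesystem interface. The mount point, directory layout, file names
("shares", "target") and the share string format are assumptions made
for illustration only; the API document referenced below describes the
real interface.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write a short string to one of the (hypothetical) rcfs control files. */
static int write_str(const char *path, const char *val)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        perror(path);
        return -1;
    }
    if (write(fd, val, strlen(val)) < 0)
        perror(path);
    close(fd);
    return 0;
}

int main(void)
{
    char pid[32];

    /* Creating a directory under the taskclass classtype creates a class. */
    mkdir("/rcfs/taskclass/gold", 0755);

    /* A child class subdivides its parent's share, independently of peers. */
    mkdir("/rcfs/taskclass/gold/dbserver", 0755);

    /* Hypothetical share string: half of the parent's CPU allocation. */
    write_str("/rcfs/taskclass/gold/dbserver/shares",
              "res=cpu,guarantee=50,limit=100");

    /* Hypothetical manual classification: move this task into the class. */
    snprintf(pid, sizeof(pid), "%d", getpid());
    write_str("/rcfs/taskclass/gold/dbserver/target", pid);

    return 0;
}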

The newly implemented features have been described at some length in a
document posted on lkml a while back and available at

http://ckrm.sourceforge.net/CKRMmergedAPI-d6.txt

A revised description and update of the project webpages is in
progress. The patches will be posted individually and are described
below. They are also available on http://ckrm.sf.net.

Comments/feedback welcome. If this looks interesting, please consider
joining the [email protected] mailing list.


-- Hubertus Franke, Shailabh Nagar, Chandra Seetharaman, Vivek Kashyap



CKRM Patches overview
---------------------

All patches against 2.6.5.

00-core.ckrm-E12.patch:

Core code of ckrm which glues the interface (rcfs), resource
controllers (rc's) and classification engines (ce's) into the
framework.

01-rcfs.ckrm-E7.patch:

Resource control filesystem (rcfs) forming the user interface to CKRM.

02-taskclass.ckrm-E12.patch:

Creates the taskclass classtype for use by resource controllers which
operate on groups of tasks. The CPU, memory and I/O resource
controllers will operate on taskclasses when their rewrite/port to the
new API is complete. The patch includes the rcfs interface to
taskclasses.

03-numtasks.ckrm-E12.patch:

A simple resource controller that limits the number of tasks that can
be forked within a taskclass. Implemented mainly to serve as a
prototype for resource controller writers. Modifications to
kernel/exit.c and kernel/fork.c, which should strictly be part of this
patch, are included in 00-core.ckrm-E12.patch.
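
To give a feel for the kind of check such a controller performs, below
is a small, self-contained user-space model of a per-class task limit
that is charged at fork time and released at exit time. The names and
the atomic-counter scheme are purely illustrative and are not the
internals of the actual patch.

#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>

struct numtasks_class {
    atomic_int cur;   /* tasks currently charged to this class */
    int limit;        /* maximum number of tasks allowed in the class */
};

/* Called at fork: charge the class, or fail if the limit is exceeded. */
static int numtasks_fork(struct numtasks_class *cls)
{
    if (atomic_fetch_add(&cls->cur, 1) + 1 > cls->limit) {
        atomic_fetch_sub(&cls->cur, 1);  /* undo the charge */
        return -EAGAIN;                  /* caller would fail the fork */
    }
    return 0;
}

/* Called at exit: release the charge taken at fork time. */
static void numtasks_exit(struct numtasks_class *cls)
{
    atomic_fetch_sub(&cls->cur, 1);
}

int main(void)
{
    struct numtasks_class bronze = { .cur = 0, .limit = 2 };

    for (int i = 0; i < 4; i++)
        printf("fork %d -> %s\n", i,
               numtasks_fork(&bronze) == 0 ? "allowed" : "denied");

    numtasks_exit(&bronze);  /* one task exits, freeing a slot */
    printf("after exit -> %s\n",
           numtasks_fork(&bronze) == 0 ? "allowed" : "denied");
    return 0;
}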


04-socketclass.ckrm-E12.patch:

Creates the socketclass classtype, along with its rcfs interface, for
use by resource controllers which operate on groups of sockets.

05-socketaq.ckrm-E12.patch:

A resource controller that controls the number of accepted TCP
connections. It is CKRM's first real controller. Changes include
modifications to the TCP stack to implement multiple accept queues.
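
As a conceptual model only (not the socketaq implementation), the
sketch below shows per-class accept queues drained in weighted
round-robin order, which is the general idea behind giving one group
of listening sockets preferential accept service. The weights and data
structures are made up for illustration.

#include <stdio.h>

#define NCLASS 2

struct accept_queue {
    int pending;  /* connections waiting in this class's queue */
    int weight;   /* accepts granted to this class per round */
};

/* Drain the queues: each pass hands out up to 'weight' accepts per class. */
static void drain(struct accept_queue q[NCLASS])
{
    int left = 1;

    while (left) {
        left = 0;
        for (int c = 0; c < NCLASS; c++) {
            int take = q[c].pending < q[c].weight ? q[c].pending : q[c].weight;

            q[c].pending -= take;
            left += q[c].pending;
            if (take)
                printf("class %d: accepted %d connection(s)\n", c, take);
        }
    }
}

int main(void)
{
    /* Class 0 ("gold") receives three accepts for every one of class 1. */
    struct accept_queue q[NCLASS] = { { 7, 3 }, { 7, 1 } };

    drain(q);
    return 0;
}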


rbce.ckrm-E12:

Two classification engines (CEs) to assist in the automatic
classification of tasks and sockets. The first, rbce, implements a
rule-based classification engine which is generic enough for most
users. The second, crbce, is a variant of rbce which additionally
delivers information on significant kernel events (where a task/socket
could get reclassified) to userspace and reports per-process wait
times for CPU, memory, I/O, etc. Such information can be used by
user-level tools to reclassify tasks to new classes, change class
shares, etc.
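
To make the rule-based idea concrete, the sketch below matches a
task's attributes against an ordered rule list and returns the class
named by the first rule that matches. The attribute set, wildcard
convention and rule syntax are hypothetical and are not rbce's actual
rule language.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct rule {
    uid_t uid;          /* (uid_t)-1 acts as a wildcard */
    const char *cmd;    /* NULL acts as a wildcard */
    const char *class;  /* class assigned on a match */
};

static const struct rule rules[] = {
    { 1001,      NULL,      "gold"    },  /* everything run by uid 1001 */
    { (uid_t)-1, "backup",  "bronze"  },  /* any user's backup jobs */
    { (uid_t)-1, NULL,      "default" },  /* catch-all rule */
};

/* Return the class of the first rule whose attributes all match. */
static const char *classify(uid_t uid, const char *cmd)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        const struct rule *r = &rules[i];

        if (r->uid != (uid_t)-1 && r->uid != uid)
            continue;
        if (r->cmd && strcmp(r->cmd, cmd) != 0)
            continue;
        return r->class;
    }
    return "default";  /* unreachable here thanks to the catch-all rule */
}

int main(void)
{
    printf("%s\n", classify(1001, "httpd"));   /* gold */
    printf("%s\n", classify(500,  "backup"));  /* bronze */
    printf("%s\n", classify(500,  "bash"));    /* default */
    return 0;
}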


2004-04-30 16:41:21

by Christoph Hellwig

Subject: Re: [RFC] Revised CKRM release

> The basic concepts and motivation of CKRM remain the same as described
> in the overview at http://ckrm.sf.net. Privileged users can define
> classes consisting of groups of kernel objects (currently tasks and
> sockets) and specify shares for these classes. Resource controllers,
> which are independent of each other, can regulate and monitor the
> resources consumed by classes e.g the CPU controller will control the
> CPU time received by a class etc. Optional classification engines,
> implemented as kernel modules, can assist in the automatic
> classification of the kernel objects (tasks/sockets currently) into
> classes.

I'd still love to see practical problems this thing is solving. It's
a few thousand lines of code, not written to linux style guidelines,
sometimes particularly obfuscated with callbacks all over the place.

I'd hate to see this in the kernel unless there's a very strong need
for it and no way to solve it at a nicer layer of abstraction, e.g.
userland virtual machines ala uml/umlinux.

2004-04-30 18:43:18

by Shailabh Nagar

Subject: Re: [RFC] Revised CKRM release

Hi Christoph,


Christoph Hellwig wrote:
>>The basic concepts and motivation of CKRM remain the same as described
>>in the overview at http://ckrm.sf.net. Privileged users can define
>>classes consisting of groups of kernel objects (currently tasks and
>>sockets) and specify shares for these classes. Resource controllers,
>>which are independent of each other, can regulate and monitor the
>>resources consumed by classes e.g the CPU controller will control the
>>CPU time received by a class etc. Optional classification engines,
>>implemented as kernel modules, can assist in the automatic
>>classification of the kernel objects (tasks/sockets currently) into
>>classes.
>
>
> I'd still love to see practical problems this thing is solving.

We'd outlined three scenarios in our OLS'03 presentation
http://ckrm.sourceforge.net/documentation/ckrm-ols03-presentation.pdf

a) An application server serving multiple requests from customers of
varying importance. The app server dynamically spawns processes to
handle each customer's request. We need to group all the processes
that are currently serving a low-importance "bronze" customer and
ensure they don't take resources away from the group serving a "gold"
customer. Important criterion: don't assume the spawned processes
always serve the same customer (or customer type), i.e., retain the
flexibility for them to hold a "gold" priority for some time and then
revert to a "bronze" status.
--> needs processes to be classified into groups and regulated
based on some app-specific rules which cannot be predicted by the
kernel in advance.


b) Desktop user doing a combination of activities with different
priorities (for him/her), say:
compiling (lower) + listening to music (higher)
--> needs music player to be given a higher share of all
resources (cpu, mem, io) than the compile.
disk backup (very low) + checking email (higher)
--> needs io requests for email to be given a higher "share"

c) Multiple User-Mode Linux instances running on a box (for virtual
hosting). Each UML instance, serving a different type of consumer (say,
paying vs. non-paying), needs a different level of service.
--> need to define groups of processes which are spawned by the
same uml instance

Besides, we have

d) department servers: multiple users logging in. Limit each
user/login to a fixed share of cpu/mem/io.
--> need to define groups of processes with same uid/gid or sharing
the same tty....

e) monitor how much load is being seen by a related group of
applications on a Linux box (perhaps to decide whether they're better
hosted on another box).
--> needs to group processes by application group (even when the
command names are arbitrary), and should accommodate short-lived apps, etc.

f) tcp connection requests for an http server are coming from sites
with varying importance to the httpd owner. Serve some sites
preferentially.
--> needs incoming tcp connections to be accepted at differential
rates for groups of listening sockets formed using source ip/port.



> It's a few thousand lines of code, not written to linux style guidelines,

Guilty as charged :-( We will work to fix that until all are happy :-)

> sometimes particularly obsfucated with callbacks all over the place.

Not guilty! The callbacks all over the place are what keep the various
components independent: the resource controllers (which are/will be
patches over the kernel schedulers), the classification engine module
(which assists in automatic classification of processes/sockets into
groups using rules, but is completely optional), and any code for new
kinds of groupings (other than tasks and sockets) that may be found
useful to control as a set in the future.

This independence is a feature - it allows the controller code that is
deemed acceptable to the corresponding scheduler maintainer to be
integrated without being dependent on acceptance of other scheduler
modifications.

Of course, the core and user interface (rcfs) have to be included, but
they're not that large (subjective biased opinion of course, but
seriously, if there are suggestions on how we can make it even leaner,
we're open to ideas).



> I'd hate to see this in the kernel unless there's a very strong need
> for it and no way to solve it at a nicer layer of abstraction, e.g.
> userland virtual machines ala uml/umlinux.
>

Trying to achieve the same goals using abstractions built on top of
process-centric rlimits will not work for examples like a) or e).

Also, if we want to regulate resource consumption by groups of sockets
or other types of kernel objects, the wheel would need to be reinvented.

We believe that CKRM addresses both of the above concerns.







2004-04-30 19:04:15

by Rik van Riel

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

On Fri, 30 Apr 2004, Christoph Hellwig wrote:

> I'd hate to see this in the kernel unless there's a very strong need
> for it and no way to solve it at a nicer layer of abstraction, e.g.
> userland virtual machines ala uml/umlinux.

User Mode Linux could definitely be an option for implementing
resource management, provided that the overhead can be kept
low enough.

For these purposes, "low enough" could be as much as 30%
overhead, since that would still allow people to grow the
utilisation of their server from a typical 10-20% to as
much as 40-50%.

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-04-30 19:17:32

by Shailabh Nagar

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

Rik van Riel wrote:
> On Fri, 30 Apr 2004, Christoph Hellwig wrote:
>
>
>>I'd hate to see this in the kernel unless there's a very strong need
>>for it and no way to solve it at a nicer layer of abstraction, e.g.
>>userland virtual machines ala uml/umlinux.
>
>
> User Mode Linux could definitely be an option for implementing
> resource management, provided that the overhead can be kept
> low enough.

....and provided the groups of processes that are sought to be
regulated as a unit are relatively static.


> For these purposes, "low enough" could be as much as 30%
> overhead, since that would still allow people to grow the
> utilisation of their server from a typical 10-20% to as
> much as 40-50%.
>

In "overhead", I presume you're including the overhead of running as
many uml instances as the expected number of classes, not just the
slowdown of applications because they're running under a uml instance
(instead of running natively)?

I think UML is justified more from a fault-containment point of view
(where overheads are a lower priority) than from a performance
isolation viewpoint.

In any case, a 30% overhead would send a large batch of higher-end
server admins running to get a stick to beat you with :-)




2004-04-30 19:32:33

by Rik van Riel

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

On Fri, 30 Apr 2004, Shailabh Nagar wrote:
> Rik van Riel wrote:

> > User Mode Linux could definitely be an option for implementing
> > resource management, provided that the overhead can be kept
> > low enough.
>
> ....and provided the groups of processes that are sought to be
> regulated as a unit are relatively static.

Good point, I hadn't thought of that one.

It works for most of the workloads I had in mind, but
you're right that it's not good enough for eg. the
university shell server.

> > For these purposes, "low enough" could be as much as 30%
> > overhead, since that would still allow people to grow the
> > utilisation of their server from a typical 10-20% to as
> > much as 40-50%.
>
> In overhead, I presume you're including the overhead of running as
> many uml instances as expected number of classes. Not just the
> slowdown of applications because they're running under a uml instance
> (instead of running native) ?
>
> I think UML is justified more from a fault-containment point of view
> (where overheads are a lower priority) than from a performance
> isolation viewpoint.
>
> In any case, a 30% overhead would send a large batch of higher-end
> server admins running to get a stick to beat you with :-)

True enough, but from my pov the flip side is that
merging the CKRM memory resource enforcement module
has the potential of undoing lots of the performance
tuning that was done to the VM in 2.6.

That could result in bad performance even for the
people who aren't using workload management at all...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-04-30 19:48:21

by Shailabh Nagar

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

Rik van Riel wrote:
> On Fri, 30 Apr 2004, Christoph Hellwig wrote:
>
>
>>I'd hate to see this in the kernel unless there's a very strong need
>>for it and no way to solve it at a nicer layer of abstraction, e.g.
>>userland virtual machines ala uml/umlinux.
>
>
> User Mode Linux could definitely be an option for implementing
> resource management, provided that the overhead can be kept
> low enough.
>
> For these purposes, "low enough" could be as much as 30%
> overhead, since that would still allow people to grow the
> utilisation of their server from a typical 10-20% to as
> much as 40-50%.
>


http://www.cl.cam.ac.uk/Research/SRG/netos/xen/performance.html

has some numbers comparing native Linux to UML (and against the Xen
virtual machine monitor), but it's on a 2.4 kernel.

Jeff, do you have any numbers for UML overhead in 2.6 ?

-- Shailabh

2004-04-30 20:15:45

by Shailabh Nagar

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

Rik van Riel wrote:
> On Fri, 30 Apr 2004, Shailabh Nagar wrote:
>
>>Rik van Riel wrote:
>
>
>>>User Mode Linux could definitely be an option for implementing
>>>resource management, provided that the overhead can be kept
>>>low enough.
>>
>>....and provided the groups of processes that are sought to be
>>regulated as a unit are relatively static.
>
>
> Good point, I hadn't thought of that one.
>
> It works for most of the workloads I had in mind, but
> you're right that it's not good enough for eg. the
> university shell server.
>
>
>>>For these purposes, "low enough" could be as much as 30%
>>>overhead, since that would still allow people to grow the
>>>utilisation of their server from a typical 10-20% to as
>>>much as 40-50%.
>>
>>In overhead, I presume you're including the overhead of running as
>>many uml instances as expected number of classes. Not just the
>>slowdown of applications because they're running under a uml instance
>>(instead of running native) ?
>>
>>I think UML is justified more from a fault-containment point of view
>>(where overheads are a lower priority) than from a performance
>>isolation viewpoint.
>>
>>In any case, a 30% overhead would send a large batch of higher-end
>>server admins running to get a stick to beat you with :-)
>
>
> True enough, but from my pov the flip side is that
> merging the CKRM memory resource enforcement module
> has the potential of undoing lots of the performance
> tuning that was done to the VM in 2.6.


Agreed - CKRM's memory controller logic needs major rework for it to
be acceptable....but I'm sure you can do something about it, Rik ! :-)

The cpu and I/O controllers will also have to be reworked since we now
have the hierarchical class requirement as well as lower and upper
bounds for shares.

>
> That could result in bad performance even for the
> people who aren't using workload management at all...

Even with the earlier logic, the hope was that if people are not using
workload management at all, then the only overhead they would see
would be the extra indirection into "find next class to schedule" (in
any controller) since there would be only one default class in the
system. In the CPU case, this overhead had been shown to be as low as
1-2%, but memory overhead had not been measured.

Keeping overheads low (or zero) for those who don't care to use CKRM
functionality is a high-priority design goal. Keeping overhead
proportional to the number of classes (with more significant
degradation if the number of hierarchy levels increases) comes next.


Also, will the 2.6 VM improvements continue to work as designed if
multiple UML instances are running, each replicating a large memory
user (like, say, a JVM or a database server)? Take the application
server serving a number of different customers: if we have to
replicate the app server for each customer class (one on each UML
instance), the app server's memory needs would get added to the
equation n times and the benefits of the 2.6 VM tuning might be lost.

-- Shailabh

2004-04-30 21:35:59

by Jeff Dike

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

[email protected] said:
> Jeff, do you have any numbers for UML overhead in 2.6 ?

It obviously depends on the workload, but for "normal" things, like kernel
builds and web serving, it's generally in the 20-30% range. That can be
reduced, since I haven't spent too much time on tuning. I'm aiming for the
teens, and I don't think that'll be too hard.

Jeff

2004-04-30 22:03:55

by Jeff Dike

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

[email protected] said:
> In overhead, I presume you're including the overhead of running as
> many uml instances as expected number of classes. Not just the
> slowdown of applications because they're running under a uml instance
> (instead of running native) ?

My next major UML project is going to be porting it back into the kernel. By
this, I mean calling the internal system calls rather than the libc wrappers.
Doing this will give you an object that can be built directly into the kernel,
or insmodded, and when it's started, will be an in-kernel UML instance.

People look at this as an overhead reduction thing. It will do that, and open
up more opportunities for reducing overhead later, but I'm doing it to make
UML something of a virtualization toolkit. I'm envisioning that an in-kernel
UML can be stripped down to just the VM system, and processes inside it will
be confined to using a maximum amount of memory in total, but unrestricted in
every other way. Or it could be a VM system plus scheduler, in which case
their access to CPU would be controlled as well as memory. Basically, my
long-term goal is for UML to allow containment of any combination of
resources.

Longer-term than that, I would like the in-kernel vs userspace containment
choice to be independent of everything else, so you'd be able to decide how
you want to confine your processes, and then decide whether the container
should be in userspace or inside the kernel.

[email protected] said:
> > ....and provided the groups of processes that are sought to be
> > regulated as a unit are relatively static.
>
> Good point, I hadn't thought of that one.

I'm starting to. With the in-kernel UML stuff I describe above, it doesn't
seem too hard to move a process from the host scheduler to a UML scheduler,
for example. Moving a process into a confined memory pool looks harder, but
I can see how it might be done. The UML would trade pages with the host,
getting the process's memory in exchange for giving up free pages. So, it
looks possible that you could migrate processes around between containers.

[email protected] said:
> Also, will the 2.6 VM improvements continue to work as designed if
> multiple UML instances are running, each replicating a large memory
> user (like say a JVM or a database server) ? Taking the application
> server serving a number of different customers. If we have to
> replicate the app server for each customer class (one on each UML
> instance), the app server's memory needs would get added to the
> equation n times and the benefits of 2.6 VM tuning might be lost.

Well, you get to share text, and data is split n ways instead of being n
chunks in one server, so to a first approximation, it looks like a wash
to me.

If the different customer classes are using the same data, then you might
get some duplication, although it might be possible to eliminate it.

Jeff

2004-04-30 23:43:38

by Herbert Poetzl

Subject: Re: [ckrm-tech] Re: [RFC] Revised CKRM release

On Fri, Apr 30, 2004 at 06:17:39PM -0400, Jeff Dike wrote:
> [email protected] said:
> > Jeff, do you have any numbers for UML overhead in 2.6 ?
>
> It obviously depends on the workload, but for "normal" things, like kernel
> builds and web serving, it's generally in the 20-30% range. That can be
> reduced, since I haven't spent too much time on tuning. I'm aiming for the
> teens, and I don't think that'll be too hard.

hmm, just wanted to mention that linux-vserver has
around 0% overhead and often even improves
performance due to resource sharing ...

basically it's a soft partitioning concept based on
'Security Contexts' which allow the creation of many
independent Virtual Private Servers (VPS), which
act simultaneously on one box at full speed, sharing
the available hardware resources.

see http://linux-vserver.org for details ...

best,
Herbert

PS: UML and Linux-VServer play together nicely ...
