--
This is the core patch set for CKRM, review comments almost all
applied (there are a few we are still working on, mostly cosmetic).
However, this set has been extensively regression tested on IA32,
x86-64/EM64T, and PPC64, with various CKRM CONFIG options on and
off and both regression tests and ckrm's functional tests.
I believe this set is ready for additional testing in -mm. We
have an additional 4 patch sets that will follow this (classification
engines, memory controller, IO controller, updated network controller).
Continued comments are welcome; once we have patches for the last
of the cleanups, we are hoping we'll have sufficient testing to be
able to push this towards mainline.
gerrit
gerrit wrote:
> This is the core patch set for CKRM
Welcome.
Newcomers to CKRM might want to start reading these patches with "[patch
8/8] CKRM: Documentation". Starting with patch 0/8 or 1/8 will be
difficult, at least if you're as dim-witted as I am.
Even the documentation included in patch 8/8 is missing the motivation
and context essential to understanding this patch set. It might have
helped if the Introduction text at http://ckrm.sourceforge.net/ had been
included in some form, as part of patch 0/8. I'm just a little penguin
here (lkml), but from what I can tell by watching how things work,
you're going to have to "make the case" -- explain what this is, how
it's put together, and why it's needed. This is a sizable patch, in
lines of code, in hooks in critical places, and in amount of "new
concepts." I presume (unless you've managed to bribe or blackmail some
big penguin) you're going to have to convince some others that this is
worth having. I for one am a CKRM skeptic, so won't be much help to you
in that quest. Good luck.
I don't see any performance numbers, either on small systems, or
scalability on large systems. Certainly this patch does not fall under
the "obviously no performance impact" exclusion.
Here's a combined diffstat showing how much code is added by these
patches, and where. Some of the patches have individual diffstats; some
don't seem to.
 Documentation/ckrm/TODO          |  17
 Documentation/ckrm/ckrm_basics   |  66 ++
 Documentation/ckrm/core_usage    |  72 +++
 Documentation/ckrm/crbce         |  33 +
 Documentation/ckrm/installation  |  70 +++
 Documentation/ckrm/rbce_basics   |  67 ++
 Documentation/ckrm/rbce_usage    |  98 ++++
 fs/Makefile                      |   1
 fs/exec.c                        |   2
 fs/proc/array.c                  |  18
 fs/proc/base.c                   |  17
 fs/proc/internal.h               |   1
 fs/rcfs/Makefile                 |   9
 fs/rcfs/dir.c                    | 220 +++++++++
 fs/rcfs/inode.c                  | 160 ++++++
 fs/rcfs/magic.c                  | 517 ++++++++++++++++++++++
 fs/rcfs/rootdir.c                | 220 +++++++++
 fs/rcfs/socket_fs.c              | 280 ++++++++++++
 fs/rcfs/super.c                  | 291 ++++++++++++
 fs/rcfs/tc_magic.c               |  93 ++++
 include/linux/ckrm_ce.h          |  95 ++++
 include/linux/ckrm_events.h      | 230 +++++++++-
 include/linux/ckrm_net.h         |  42 +
 include/linux/ckrm_rc.h          | 345 +++++++++++++++
 include/linux/ckrm_tc.h          |  46 ++
 include/linux/ckrm_tsk.h         |  35 +
 include/linux/rcfs.h             | 116 ++++-
 include/linux/sched.h            | 105 ++++
 include/linux/taskdelays.h       |  35 +
 include/net/sock.h               |   3
 include/net/tcp.h                |   4
 init/Kconfig                     |  68 ++
 init/main.c                      |   2
 kernel/Makefile                  |   1
 kernel/ckrm/Makefile             |  14
 kernel/ckrm/ckrm.c               | 892 +++++++++++++++++++++++++++++++++++++++
 kernel/ckrm/ckrm_events.c        |  86 +++
 kernel/ckrm/ckrm_numtasks.c      | 522 ++++++++++++++++++++++
 kernel/ckrm/ckrm_numtasks_stub.c |  53 ++
 kernel/ckrm/ckrm_sockc.c         | 559 ++++++++++++++++++++++++
 kernel/ckrm/ckrm_tc.c            | 745 ++++++++++++++++++++++++++++++++
 kernel/ckrm/ckrmutils.c          | 188 ++++++++
 kernel/exit.c                    |   3
 kernel/fork.c                    |  12
 kernel/sched.c                   |  20
 kernel/sys.c                     |  11
 mm/memory.c                      |  10
 net/ipv4/tcp_ipv4.c              |   5
 48 files changed, 6460 insertions(+), 39 deletions(-)
A couple of nits:
1) Instead of disabling routines with #defines:
#define numtasks_put_ref(core_class) do {} while (0)
one can do it with static inlines, preserving more compiler
checking.
2) I take it that the following constitutes the 'documentation'
for what is in /proc/<pid>/delay. Perhaps I missed something.
+ res = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
+ (unsigned int) get_delay(task,runs),
+ (uint64_t) get_delay(task,runcpu_total),
+ (uint64_t) get_delay(task,waitcpu_total),
+ (unsigned int) get_delay(task,num_iowaits),
+ (uint64_t) get_delay(task,iowait_total),
+ (unsigned int) get_delay(task,num_memwaits),
+ (uint64_t) get_delay(task,mem_iowait_total)
3) Typo in init/Kconfig "atleast":
If you say Y here, enable the Resource Class File System and atleast
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401
On Tue, 29 Mar 2005 22:05:30 PST, Paul Jackson wrote:
> gerrit wrote:
> > This is the core patch set for CKRM
>
> Welcome.
Hi Paul.
> Newcomers to CKRM might want to start reading these patches with "[patch
> 8/8] CKRM: Documentation". Starting with patch 0/8 or 1/8 will be
> difficult, at least if you're as dim-witted as I am.
>
> Even the documentation included in patch 8/8 is missing the motivation
> and context essential to understanding this patch set. It might have
> helped if the Introduction text at http://ckrm.sourceforge.net/ had been
> included in some form, as part of patch 0/8. I'm just a little penguin
> here (lkml), but from what I can tell by watching how things work,
> you're going to have to "make the case" -- explain what this is, how
> it's put together, and why it's needed. This is a sizable patch, in
> lines of code, in hooks in critical places, and in amount of "new
> concepts." I presume (unless you've managed to bribe or blackmail some
> big penguin) you're going to have to convince some others that this is
> worth having. I for one am a CKRM skeptic, so won't be much help to you
> in that quest. Good luck.
Good point on including the pointer to the web site. As you probably
noticed, there is a history of the design, papers presented, etc.
Also, Jonathan Corbet did a nice write up from the discussion at the
2004 Kernel summit which is archived here: http://lwn.net/Articles/94573/
which may be of use.
The OLS and LinuxTag papers are archived at the site that you pointed
to and there will be a tutorial on configuring, using and writing
controllers for CKRM at OLS this year. You may also want to see the
previous postings of this code to LKML for more background.
In short, CKRM provides very basic desktop to server workload management
capabilities similar to those provided by most of the old fashioned
operating systems. The code provides a fairly simple mechanism for
adding controllers for any resource type. It is currently widely
deployed by PlanetLab and is part of Novell/SuSE's distro, and
the capabilities are requested by a fair number of Linux users and
customers.
> I don't see any performance numbers, either on small systems, or
> scalability on large systems. Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.
Fair point. We have been running some of the smaller benchmarks but
have not yet had a chance to do any kind of performance comparison
based on the current code. However, when configured out, it will
have zero impact. We plan to do some performance analysis of the code
with CONFIG_CKRM set to y but no rules configured in the
very near future.
> A couple of nits:
>
> 1) Instead of disabling routines with #defines:
> #define numtasks_put_ref(core_class) do {} while (0)
> one can do it with static inlines, preserving more compiler
> checking.
Yeah - that works well in some cases but it turns out to not do so
well when an argument to a function refers to a structure element
which is not configured in. In that case, the compiler emits a
reference to an undefined structure value in the case of the static
inline, where otherwise the entire set of code is pre-processed
away. I think we've gone through the code and used the correct
balance of static inlines and #define constructs as appropriate.
If we've missed any, I'm more than willing to accept a patch to
correct a specific instance.
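To illustrate the trade-off described above, here is a minimal sketch. The names (CONFIG_DEMO_NUMTASKS, struct demo_class, demo_put_ref, demo_set_ref) are invented for this example and are not taken from the CKRM patches. It shows a stub that works as a static inline, and one that has to stay a macro because its argument names a structure member that is configured out.

```c
/* Hypothetical class structure: numtasks_ref only exists when the
 * (made-up) option CONFIG_DEMO_NUMTASKS is defined. */
struct demo_class {
    int id;
#ifdef CONFIG_DEMO_NUMTASKS
    int numtasks_ref;
#endif
};

#ifdef CONFIG_DEMO_NUMTASKS
static inline void demo_put_ref(struct demo_class *cls) { cls->numtasks_ref--; }
static inline void demo_set_ref(int *ref, int val) { *ref = val; }
#else
/* Static inline stub: the argument is still type-checked by the compiler. */
static inline void demo_put_ref(struct demo_class *cls) { (void)cls; }
/* Macro stub: the arguments are discarded by the preprocessor, so a caller
 * may name a member that does not exist in this configuration. */
#define demo_set_ref(ref, val) do { } while (0)
#endif

static inline int demo_release(struct demo_class *cls)
{
    demo_put_ref(cls);                    /* fine with either kind of stub */
    demo_set_ref(&cls->numtasks_ref, 0);  /* only compiles as a macro when the option is off */
    return cls->id;
}
```

If demo_set_ref were turned into a static inline stub in the #else branch, the call in demo_release would fail to compile without -DCONFIG_DEMO_NUMTASKS, because numtasks_ref would be an undefined member.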
> 2) I take it that the following constitutes the 'documentation'
> for what is in /proc/<pid>/delay. Perhaps I missed something.
>
> + res = sprintf(buffer,"%u %llu %llu %u %llu %u %llu\n",
> + (unsigned int) get_delay(task,runs),
> + (uint64_t) get_delay(task,runcpu_total),
> + (uint64_t) get_delay(task,waitcpu_total),
> + (unsigned int) get_delay(task,num_iowaits),
> + (uint64_t) get_delay(task,iowait_total),
> + (unsigned int) get_delay(task,num_memwaits),
> + (uint64_t) get_delay(task,mem_iowait_total)
The code is the documentation? :)
There is probably some documentation on /proc/<pid>/ in general and
we'll see if we can get it updated appropriately. Vivek?
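Until that documentation exists, the field order can at least be read off the sprintf() call quoted above. The following user-space sketch parses one such line; the struct and function names are invented here, the field names simply mirror the get_delay() arguments, and the units of the totals are not stated in the thread, so none are assumed.

```c
#include <stdio.h>

/* Field names mirror the get_delay() arguments in the quoted patch.
 * Units are not documented in the thread and are left unspecified. */
struct delay_sample {
    unsigned int runs;
    unsigned long long runcpu_total;
    unsigned long long waitcpu_total;
    unsigned int num_iowaits;
    unsigned long long iowait_total;
    unsigned int num_memwaits;
    unsigned long long mem_iowait_total;
};

/* Parse one line in the "%u %llu %llu %u %llu %u %llu" layout.
 * Returns 0 on success, -1 if fewer than seven fields were found. */
static int parse_delay_line(const char *line, struct delay_sample *d)
{
    int n = sscanf(line, "%u %llu %llu %u %llu %u %llu",
                   &d->runs, &d->runcpu_total, &d->waitcpu_total,
                   &d->num_iowaits, &d->iowait_total,
                   &d->num_memwaits, &d->mem_iowait_total);
    return n == 7 ? 0 : -1;
}
```

A caller would read the line from /proc/<pid>/delay with fgets() and pass it to parse_delay_line().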
> 3) Typo in init/Kconfig "atleast":
>
> If you say Y here, enable the Resource Class File System and atleast
Got it - thanks! Someone liked the new word "atleast" - at least
three occurrences removed.
Oh - and uniformly updated diffstats - I probably missed some when
I was playing with quilt originally.
gerrit
On Tue, 2005-03-29 at 23:03 -0800, Gerrit Huizenga wrote:
> The code provides a fairly simple mechanism for adding controllers for
> any resource type
Last time I saw the memory controller, it was 3000 lines. Doesn't seem
too simple to me. :)
Can you post some of the additional controllers that you've been working
on to the appropriate mailing lists, like linux-mm? If the subject
experts get a good look at the controllers, it's quite possible that
some comments will cascade back to the core, don't you think?
-- Dave
On Tue, 29 Mar 2005 22:05:30 -0800,
Paul Jackson <[email protected]> wrote:
> worth having. I for one am a CKRM skeptic, so won't be much help to you
> in that quest. Good luck.
>
> I don't see any performance numbers, either on small systems, or
> scalability on large systems. Certainly this patch does not fall under
> the "obviously no performance impact" exclusion.
I'm one of those people who also think that CKRM tries to do too many things, and
although my opinion doesn't count for a lot, I'll try to explain myself anyway :)
One of the things I personally don't like about CKRM is how it handles "CPU resources".
The goal of CKRM seems to be "control what % of the CPU a process can get", but the
number of concepts created to achieve that is too large and too complex. For the
"CPU resources", I think that there are much simpler and better solutions. For example,
instead of what CKRM proposes I propose a simpler concept: "attaching" GIDs to a
niceness level.
Say we "attach" group foo to nice level -5. All users who belong to group foo will have
permission to renice themselves to nice -5. If instead group foo has been
attached at nice level 15, all processes from users who belong to foo will run at 15,
and they won't be able to renice themselves even to the default priority (0).
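One possible reading of the rule just described, as a user-space sketch only: the group's attached level acts as a floor on the nice value its members may use. The function name and the clamping to the usual -20..19 range are assumptions made for this illustration, not part of any posted patch.

```c
#define NICE_MIN (-20)
#define NICE_MAX 19

/* Hypothetical helper: returns the nice value a task ends up with when it
 * asks for `requested` and its group is attached at `group_floor`.
 * A task may never go below (i.e. get a better priority than) the floor. */
static int effective_nice(int requested, int group_floor)
{
    int nice = requested < group_floor ? group_floor : requested;

    if (nice < NICE_MIN)
        nice = NICE_MIN;
    if (nice > NICE_MAX)
        nice = NICE_MAX;
    return nice;
}
```

With a floor of -5, a request for -10 is limited to -5 while the default 0 is left alone; with a floor of 15, even the default 0 is raised to 15.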
This should be very easy to implement and, what's more important, it'd probably have
zero performance impact at runtime - CKRM touches hot paths in the scheduler;
I think this would just touch a few non-critical places - because we'd just use an existing
concept.
Sure, this can't guarantee that a group will get exactly 57% of the CPU reserved, but I
think that such a level of detail is unnecessary - instead we let the kernel use its
standard internal mechanisms to do the dirty job, based on the distinction between
standard nice levels. (And we could get that level of detail just by modifying the
scheduler algorithm and adding a range of -50...0...50 nice levels ;)
For the CPU resources, we already have nice levels. The existing algorithms can already
handle priorities with them. CKRM's alternative seems to be to add a second scheduling
algorithm which, in super-hot paths like the ones in sched.c, will probably have a
performance impact. In my very humble opinion, I think we should reuse existing UNIX
concepts and combine them to achieve some of the goals CKRM tries to achieve in
a much simpler (unixy ;) way.
On Wed, 30 Mar 2005 22:55:05 +0200, Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <[email protected]> wrote:
>
>
> > worth having. I for one am a CKRM skeptic, so won't be much help to you
> > in that quest. Good luck.
> >
> > I don't see any performance numbers, either on small systems, or
> > scalability on large systems. Certainly this patch does not fall under
> > the "obviously no performance impact" exclusion.
>
> I'm one of those people who also think that CKRM tries to do too many things, and
> although my opinion doesn't count for a lot, I'll try to explain myself anyway :)
>
> One of the things I personally don't like about CKRM is how it handles "CPU resources".
> The goal of CKRM seems to be "control what % of the CPU a process can get", but the
> number of concepts created to achieve that is too large and too complex. For the
> "CPU resources", I think that there are much simpler and better solutions. For example,
> instead of what CKRM proposes I propose a simpler concept: "attaching" GIDs to a
> niceness level.
Well, the current code and the stacked up patch sets don't currently
include a CPU resource controller, although the SuSE distro version does.
We've pulled back on that for the time being since the scheduler has
been under so much revision lately. However, resource utilization at the
priority level does not allow you to say "OpenOffice can have up to 30%
of my CPU, my email client is guaranteed to get at least 5%, and Firefox +
Java apps get no more than 50% of my machine, and my CD player gets 10%".
Niceness levels provide none of that level of resource control. Also,
GIDs have no utility on a desktop machine, other than to separate
possibly background tasks like updatedb vs. all my real time apps.
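To make the difference from a single nice value concrete, the desktop example above can be written down as per-application guarantees ("at least") and limits ("up to"). This is a sketch with invented type and function names, not the share format used by CKRM, and treating the CD player's 10% as a guarantee is an assumption.

```c
#include <stddef.h>

/* Hypothetical per-application CPU share: a guaranteed minimum and an
 * upper limit, both in percent of the machine. */
struct cpu_share {
    const char *name;
    int guarantee_pct;
    int limit_pct;
};

static const struct cpu_share demo_shares[] = {
    { "openoffice",    0,  30 },  /* up to 30% */
    { "email",         5, 100 },  /* at least 5% */
    { "firefox+java",  0,  50 },  /* no more than 50% */
    { "cdplayer",     10, 100 },  /* assumed: 10% as a guarantee */
};

/* Returns 1 if every entry is sane and the guarantees can all be met
 * at the same time (their sum does not exceed 100%), 0 otherwise. */
static int shares_are_consistent(const struct cpu_share *s, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].guarantee_pct < 0 || s[i].limit_pct > 100 ||
            s[i].guarantee_pct > s[i].limit_pct)
            return 0;
        total += s[i].guarantee_pct;
    }
    return total <= 100;
}
```

Neither column can be expressed with one nice level per task, which is the point being made above.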
> Say we "attach" group foo to nice level -5. All users who belong to group foo will have
> permission to renice themselves to nice -5. If instead group foo has been
> attached at nice level 15, all processes from users who belong to foo will run at 15,
> and they won't be able to renice themselves even to the default priority (0).
Again, great for multiuser systems if you just want people to be prioritized
as opposed to work. But more often on larger multiuser systems, you want various
services to have priorities. For instance, a web server may be allowed some
rate of incoming connections or some amount of CPU bandwidth; a database may
have memory limits, CPU limits (or be allowed "at least" some percentage, possibly
also being limited from taking over the entire machine), and IO limits in terms
of the amount of disk traffic. These limits may allow various clients or web servers
to make progress without getting drowned out by some large server which
wants to consume 100% of cpu or all of available memory.
> This should be very easy to implement and, what's more important, it'd probably have
> zero performance impact at runtime - CKRM touches hot paths in the scheduler;
> I think this would just touch a few non-critical places - because we'd just use an existing
> concept.
Not currently in the patches being brought forward to LKML.
> Sure, this can't guarantee that a group will get exactly 57% of the CPU reserved, but I
> think that such a level of detail is unnecessary - instead we let the kernel use its
> standard internal mechanisms to do the dirty job, based on the distinction between
> standard nice levels. (And we could get that level of detail just by modifying the
> scheduler algorithm and adding a range of -50...0...50 nice levels ;)
Also, with various implementations of the scheduler, the nice levels have been
either studiously ignored or sometimes at the other extreme there has been a
more clear stairstepping of nice levels. Relying on predictability here based
on the current algorithm is not a great formula for success, nor does it address
the needs of most desktop or server users in any simple/easy to use way.
> For the CPU resources, we already have nice levels. The existing algorithms can already
> handle priorities with them. CKRM's alternative seems to be to add a second scheduling
> algorithm which, in super-hot paths like the ones in sched.c, will probably have a
> performance impact. In my very humble opinion, I think we should reuse existing UNIX
> concepts and combine them to achieve some of the goals CKRM tries to achieve in
> a much simpler (unixy ;) way.
I'd love to see patches which could be validated by folks like the PlanetLab
folks, for instance. I don't believe it is possible to get the level of machine
partitioning/virtualization that CKRM provides with this overly simple prioritization
scheme.
gerrit
Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <[email protected]> wrote:
>
>
>
>>worth having. I for one am a CKRM skeptic, so won't be much help to you
>>in that quest. Good luck.
>>
>>I don't see any performance numbers, either on small systems, or
>>scalability on large systems. Certainly this patch does not fall under
>>the "obviously no performance impact" exclusion.
>
>
> I'm one of those people who also think that CKRM tries to do too many things, and
> although my opinion doesn't count for a lot, I'll try to explain myself anyway :)
>
> One of the things I personally don't like about CKRM is how it handles "CPU resources".
> The goal of CKRM seems to be "control what % of the CPU a process can get", but the
> number of concepts created to achieve that is too large and too complex.
Certainly there's scope for improvement in the implementation of the CPU
controller but the solution you propose works by redefining the problem.
> For the
> "CPU resources", I think that there are much simpler and better solutions. For example,
> instead of what CKRM proposes I propose a simpler concept: "attaching" GIDs to a
> niceness level.
Doing performance isolation at the granularity of users and groups may
be useful but is not enough for workload management needs. There, it is
essential that a grouping of processes that is a) flexible and b) dynamic be
controllable in its resource consumption as an aggregate. Tying that
grouping to user/groups will not suffice.
CKRM's definition of class can be made equivalent to a user or group but
not vice versa. Hence the more generic classes are being used, rather
than reusing groups/users.
Also, our earlier prototype for the CPU controller had shown a
0.14-0.63us overhead which remained constant with increasing number of
processes. While we don't have measurements for later versions, the
overhead figures are by no means unacceptably high if one values the
additional generality of CKRM's class (over groups/users).
>
> Say we "attach" group foo to nice level -5. All users who belong to group foo will have
> permission to renice themselves to nice -5. If instead group foo has been
> attached at nice level 15, all processes from users who belong to foo will run at 15,
> and they won't be able to renice themselves even to the default priority (0).
>
> This should be very easy to implement and, what's more important, it'd probably have
> zero performance impact at runtime - CKRM touches hot paths in the scheduler;
> I think this would just touch a few non-critical places - because we'd just use an existing
> concept.
> Sure, this can't guarantee that a group will get exactly 57% of the CPU reserved, but I
> think that such a level of detail is unnecessary
For desktop users, perhaps. For server workload management, this level
of detail is necessary. As stated earlier, CKRM's design satisfies both.
> - instead we let the kernel use its
> standard internal mechanisms to do the dirty job, based on the distinction between
> standard nice levels. (And we could get that level of detail just by modifying the
> scheduler algorithm and adding a range of -50...0...50 nice levels ;)
>
> For the CPU resources, we already have nice levels. The existing algorithms can already
> handle priorities with them. CKRM's alternative seems to be to add a second scheduling
> algorithm which, in super-hot paths like the ones in sched.c, will probably have a
> performance impact. In my very humble opinion, I think we should reuse existing UNIX
> concepts and combine them to achieve some of the goals CKRM tries to achieve in
> a much simpler (unixy ;) way.
Not that other Unix's design decisions should influence Linux but every
other enterprise UNIX has some equivalent of CKRM's classes available.
So the design is far from being non-unixy :-)
-- Shailabh
On Wed, 30 Mar 2005 13:29:53 -0800,
Gerrit Huizenga <[email protected]> wrote:
> been under so much revision lately. However, resource utilization at the
> priority level does not allow you to say "OpenOffice can have up to 30%
> of my CPU, my email client is guaranteed to get at least 5%, and Firefox +
> Java apps get no more than 50% of my machine, and my CD player gets 10%".
> Niceness levels provide none of that level of resource control. Also,
Users can launch tasks and renice them to the lowest priority levels..., with the highest
priority being set by the administrator... I've always thought it's GNOME/KDE's fault for launching
the apps at the same nice level as the panel and the window manager. Admittedly,
my "design" wouldn't achieve such fine-grained control, no - I'd argue that not
many people need that, but then I shouldn't tell people what they need (and anyway
the previous proposal would be so powerful for its simplicity that it might be worth doing
anyway).
> I'd love to see patches which could be validated by folks like the PlanetLab
> folks, for instance. I don't believe it is possible to get the level of machine
> partitioning/virtualization that CKRM provides with this overly simple prioritization
> scheme.
I realize that CKRM provides much broader functionality; the alternative I was proposing
was just for CPU resources (and would probably work well for IO bandwidth with CFQ).
I realize that things like "partitioning memory resources" are a whole different problem.
But I certainly think that CKRM is far too complex - the docs I've read spend all their time
describing things like classes, class inheritance, the classification engine, resource
schedulers, resource scheduler configuration and so on. I must admit I've not read too
much about CKRM - I had to stop because I couldn't understand it; everything is far too
complex for my little mind, and I'm saying this because I bet I'm not the only one here
who can't understand it either.....
On Wed, Mar 30, 2005 at 10:55:05PM +0200, Diego Calleja wrote:
> On Tue, 29 Mar 2005 22:05:30 -0800,
> Paul Jackson <[email protected]> wrote:
>
>
> > worth having. I for one am a CKRM skeptic, so won't be much help to you
> > in that quest. Good luck.
> >
> > I don't see any performance numbers, either on small systems, or
> > scalability on large systems. Certainly this patch does not fall under
> > the "obviously no performance impact" exclusion.
>
> I'm one of those people who also think that CKRM tries to do too many things, and
> although my opinion doesn't count for a lot, I'll try to explain myself anyway :)
>
> One of the things I personally don't like about CKRM is how it handles "CPU resources".
> The goal of CKRM seems to be "control what % of the CPU a process can get", but the
> number of concepts created to achieve that is too large and too complex. For the
> "CPU resources", I think that there are much simpler and better solutions. For example,
> instead of what CKRM proposes I propose a simpler concept: "attaching" GIDs to a
> niceness level.
>
> Say we "attach" group foo to nice level -5. All users who belong to group foo will have
> permission to renice themselves to nice -5. If instead group foo has been
> attached at nice level 15, all processes from users who belong to foo will run at 15,
> and they won't be able to renice themselves even to the default priority (0).
>
> This should be very easy to implement and, what's more important, it'd probably have
> zero performance impact at runtime - CKRM touches hot paths in the scheduler;
> I think this would just touch a few non-critical places - because we'd just use an existing
> concept.
Your design is a nice and simple way to take priority-based scheduling
to the next level.
Whereas what CKRM provides is resource management and monitoring, which
is more than prioritizing groups of users for scheduling.
It allows one to manage/monitor different groups of applications (that
are related or unrelated).
With CKRM, you can provide resource control support for features
like UML and virtual servers to make them more controllable
domains (term domain used loosely) in terms of resource management.
>
> Sure, this can't guarantee that a group will get exactly 57% of the CPU reserved, but I
> think that such a level of detail is unnecessary - instead we let the kernel use its
> standard internal mechanisms to do the dirty job, based on the distinction between
> standard nice levels. (And we could get that level of detail just by modifying the
> scheduler algorithm and adding a range of -50...0...50 nice levels ;)
>
> For the CPU resources, we already have nice levels. The existing algorithms can already
> handle priorities with them. CKRM's alternative seems to be to add a second scheduling
> algorithm which, in super-hot paths like the ones in sched.c, will probably have a
One clarification: CKRM is the infrastructure. What you are referring to
is the CPU controller (which is a module to manage the CPU resource), which
can be replaced by a simplistic one (like the one you propose) or turned
off if needed.
That is one of the advantages the architecture provides: it removes
resource-specific details from the core functionality CKRM provides, so that it
remains flexible (in choosing the resources you want to control) and
easily expandable (to support additional resources).
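The split between core and controller described above can be pictured as a table of callbacks that the core dispatches through. The sketch below is illustrative only: every name in it (demo_res_ctlr, demo_register_ctlr and so on) is invented and does not reproduce the actual CKRM interfaces.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CTLRS 4

/* Hypothetical controller interface: one callback table per resource. */
struct demo_res_ctlr {
    const char *name;                           /* e.g. "cpu", "mem" */
    int (*set_share)(int class_id, int share);  /* returns 0 on success */
};

static const struct demo_res_ctlr *demo_ctlrs[DEMO_MAX_CTLRS];

/* Core side: plug a controller into the first free slot. */
static int demo_register_ctlr(const struct demo_res_ctlr *c)
{
    for (size_t i = 0; i < DEMO_MAX_CTLRS; i++) {
        if (demo_ctlrs[i] == NULL) {
            demo_ctlrs[i] = c;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Core side: remove a controller by name, i.e. turn it off. */
static int demo_unregister_ctlr(const char *name)
{
    for (size_t i = 0; i < DEMO_MAX_CTLRS; i++) {
        if (demo_ctlrs[i] && strcmp(demo_ctlrs[i]->name, name) == 0) {
            demo_ctlrs[i] = NULL;
            return 0;
        }
    }
    return -1; /* no such controller */
}

/* Core side: forward a request to whichever controller handles `name`.
 * The core itself knows nothing about CPU, memory or IO. */
static int demo_set_share(const char *name, int class_id, int share)
{
    for (size_t i = 0; i < DEMO_MAX_CTLRS; i++) {
        if (demo_ctlrs[i] && strcmp(demo_ctlrs[i]->name, name) == 0)
            return demo_ctlrs[i]->set_share(class_id, share);
    }
    return -1; /* resource not controlled */
}

/* Controller side: a trivial CPU controller that only records the last
 * requested share. A simpler or more elaborate one could be swapped in. */
static int demo_cpu_last_share = -1;

static int demo_cpu_set_share(int class_id, int share)
{
    (void)class_id;
    demo_cpu_last_share = share;
    return 0;
}

static const struct demo_res_ctlr demo_cpu_ctlr = { "cpu", demo_cpu_set_share };
```

Replacing the CPU controller then amounts to registering a different callback table under the same name; the dispatch code does not change.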
> performance impact. In my very humble opinion, I think we should reuse existing UNIX
> concepts and combine them to achieve some of the goals CKRM tries to achieve in
> a much simpler (unixy ;) way.
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------
Diego wrote:
> I bet I'm not the only one here
> who can't understand it either.....
You're not alone.
See an email thread entitled:
Classes: 1) what are they, 2) what is their name?
http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191
on the [email protected] email list between Aug 14 and Aug
27, 2004, where I did my best to encourage the CKRM project to address
this problem. To no avail.
Apparently, to some of the smartest amongst us, who got to hear
live presentations describing CKRM, it makes sense and is worthy
of serious consideration.
For myself, of more ordinary intelligence and working just from the
documentation and an occasional glance at the code, it has been a
difficult proposal to understand, with a rather large patch requiring
some non-trivial kernel hooks.
A question for the CKRM developers:
What middleware packages, outside the kernel, exist or are
in the works that will rely on CKRM?
CKRM (like another project near and dear to me, cpusets)
strikes me as a "middleware foundation" facility, intended
to provide the essential kernel support required for some
serious enterprise software. So perhaps in addition to
asking what end-users (of a combined kernel-middleware
platform) exist, we should also be asking who will be
directly using CKRM - directly layering middleware on top
of it.
The details don't matter much and may have to remain
obscured in the competitive fog. But the presence of
multiple groups lobbying for the same kernel infrastructure,
as an apparent basis for competing middleware products,
would I think weigh in CKRM's favor.
My impression, which may not align with how the CKRM developers view
things, is that CKRM is descended from what have been called fair-share
schedulers. The following comes from the above email thread.
No doubt the CKRM experts are already familiar with these, but for the
possible benefit of other readers:
UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883
SHARE II -- A User Administration and Resource Control System for UNIX
http://www.c-side.com/c/papers/lisa-91.html
Solaris Resource Manager White Paper
http://wwws.sun.com/software/resourcemgr/wp-mixed/
ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm
A Fair Share Scheduler, J. Kay and P. Lauder
Communications of the ACM, January 1988, Volume 31, Number 1, pp 44-55.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401
On Wed, 30 Mar 2005 17:32:32 PST, Paul Jackson wrote:
> A question for the CKRM developers:
>
> What middleware packages, outside the kernel, exist or are
> in the works that will rely on CKRM?
Primarily, CKRM classes can be instantiated today by simple
echoes into the /rcfs filesystem. There isn't a big need for
a complex middleware package to set up and use CKRM.
However, there are some tools under way to provide a small CLI
to help with the administration for those who want it. There
are also some pretty minimal rc scripts under way, driven by a
simple config file, to ensure that classes are configured at boot
time and/or saved and restored across reboots.
> CKRM (like another project near and dear to me, cpusets)
> strikes me as a "middleware foundation" facility, intended
> to provide the essential kernel support required for some
> serious enterprise software. So perhaps in addition to
> asking what end-users (of a combined kernel-middleware
> platform) exist, we should also be asking who will be
> directly using CKRM - directly layering middleware on top
> of it.
I'm sure you could plug this into some existing workload management
tools - lots of companies have them for managing other OS's. Getting
them to manage Linux with CKRM should be pretty simple for any of
them if you really want that sort of thing.
> The details don't matter much and may have to remain
> obscured in the competitive fog. But the presence of
> multiple groups lobbying for the same kernel infrastructure,
> as an apparent basis for competing middleware products,
> would I think weigh in CKRM's favor.
> My impression, which may not align with how the CKRM developers view
> things, is that CKRM is descended from what have been called fair-share
> schedulers. The following comes from the above email thread.
CKRM is about ways of managing kernel resources - CPU would just be
one of these. Fairshare scheduling is similar in some respects to
what a scheduler might need to do for such a capability. But that
isn't part of the code being put forward now or the set that is
getting finalized on ckrm-tech for mainline right now. Definitely
useful, but a bit more challenging for getting a mainline mergeable
version.
BTW, one of your comments was that the word "class" was confusing.
This may stem from the fact that there have been two approaches
with the word "class" in them in CKRM.
The first was that a class would be a set of resource upper/lower limits
such as CPU, memory, number of tasks, getrlimit style resource limits,
IO bandwidth, network connections, etc. that would be applied to some
set of tasks.
At last year's kernel summit, Linus suggested that classes should
be unique to each resource, e.g. a task could be a member of a
memory class, mem-A; a CPU resource class cpu-B, an IO resource
class io-C. So, now a class is specific to a resource and a task
is effectively a member of a number of distinct and otherwise
independent resource classes.
The current code embodies the second definition of class, which
provides some more useful independence of resources (they don't all
need to tie into a common class infrastructure, which made the code
a little more intertangled).
With the current core code, a task is put into a particular resource
class simply by echoes in the corresponding rcfs directory structure
for that resource.
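
As a rough illustration only (this is not CKRM code, and the type and
field names are made up for the example), the per-resource notion of a
class can be modeled as a task holding one independent membership per
resource type:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ResourceClass:
    """A class that belongs to exactly one resource type (hypothetical model)."""
    resource: str  # e.g. "mem", "cpu", "io"
    name: str      # e.g. "mem-A"


@dataclass
class Task:
    pid: int
    # One independent membership per resource type, keyed by resource name.
    classes: Dict[str, ResourceClass] = field(default_factory=dict)

    def assign(self, rc: ResourceClass) -> None:
        # Joining a class for one resource leaves the other memberships alone.
        self.classes[rc.resource] = rc


task = Task(pid=1234)
task.assign(ResourceClass("mem", "mem-A"))
task.assign(ResourceClass("cpu", "cpu-B"))
task.assign(ResourceClass("io", "io-C"))

# Moving the task to another CPU class does not touch its mem or io class.
task.assign(ResourceClass("cpu", "cpu-D"))
```

The point of the sketch is only the independence: changing the CPU
class of a task says nothing about its memory or IO class.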
A forthcoming updated patch provides both a simple and a more
interesting classification engine, which let you specify rules
about which processes are associated with which resource classes.
E.g. all tasks with a particular uid can be put in the
"oracle_mem_pig" class or all tasks with a particular gid may be
put into the "video" scheduler class. The classification engine allows
for some more complex rules which are applied at task creation
time, or at a few other points such as a change of real or effective
uid/gid.
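
A minimal sketch of that kind of rule matching, in Python rather than
kernel code. Everything in it is an assumption made for illustration:
the Rule and TaskInfo types, the uid/gid values, and the "first
matching rule per resource wins" policy are not taken from the
classification engine patches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TaskInfo:
    pid: int
    uid: int
    gid: int


@dataclass(frozen=True)
class Rule:
    resource: str                        # resource the rule applies to
    matches: Callable[[TaskInfo], bool]  # predicate over task attributes
    class_name: str                      # class assigned on a match


def classify(task: TaskInfo, rules: List[Rule]) -> Dict[str, str]:
    """Return {resource: class_name}; first matching rule per resource wins."""
    result: Dict[str, str] = {}
    for rule in rules:
        if rule.resource not in result and rule.matches(task):
            result[rule.resource] = rule.class_name
    return result


ORACLE_UID = 501  # hypothetical uid
VIDEO_GID = 44    # hypothetical gid

RULES = [
    Rule("mem", lambda t: t.uid == ORACLE_UID, "oracle_mem_pig"),
    Rule("cpu", lambda t: t.gid == VIDEO_GID, "video"),
]

# Classification at task creation time ...
at_fork = classify(TaskInfo(pid=4321, uid=ORACLE_UID, gid=100), RULES)

# ... and again after the task changes its uid/gid.
after_setuid = classify(TaskInfo(pid=4321, uid=1000, gid=VIDEO_GID), RULES)
```

Re-running the same rules at a uid/gid change is what moves the task
from one set of classes to another in this model.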
In some respects, this provides for a *very* lightweight form of
virtualization, by restricting a working set of tasks to a limited
set of resources, without the hard boundaries of a UML or Xen style
virtual machine. This also allows protection for some workloads
in the face of bursty traffic or workloads which are otherwise content
to consume your entire machine, to the exclusion of all other activities
on the machine.
gerrit
Paul Jackson wrote:
> Diego wrote:
>
>>I bet I'm not the only one here
>>who can't understand it either.....
>
>
> You're not alone.
>
> See an email thread entitled:
>
> Classes: 1) what are they, 2) what is their name?
> http://sourceforge.net/mailarchive/forum.php?thread_id=5328162&forum_id=35191
>
> on the [email protected] email list between Aug 14 and Aug
> 27, 2004, where I did my best to encourage the CKRM project to address
> this problem. To no avail.
That is not really a fair categorization of the thread. Hubertus and I
did try to explain what CKRM classes are. As the last parts of the
thread show, it was the choice of names that you disagreed with.
> Apparently, to some of the smartest amongst us, who got to hear
> live presentations describing CKRM, it makes sense and is worthy
> of serious consideration.
Except for the Kernel Summit talk (slides of which were very brief),
you have access to the very same presentations on the ckrm website.
> For myself, of more ordinary intelligence and working just from the
> documentation and an occasional glance at the code, it has been a
> difficult proposal to understand, with a rather large patch requiring
> some non-trivial kernel hooks.
Have you read Section 2 of the paper at
http://ckrm.sourceforge.net/downloads/ckrm-ols04-paper.pdf ?
There the terms class, classtype, resource controllers and
classification engine have all been explained. If you continue to have
trouble understanding what these mean, we'd be happy to go over it once
more. Perhaps we should try a twiki type site or come up with a specific
set of doubts that need to be addressed.
> A question for the CKRM developers:
>
> What middleware packages, outside the kernel, exist or are
> in the works that will rely on CKRM?
>
> CKRM (like another project near and dear to me, cpusets)
> strikes me as a "middleware foundation" facility, intended
> to provide the essential kernel support required for some
> serious enterprise software. So perhaps in addition to
> asking what end-users (of a combined kernel-middleware
> platform) exist, we should also be asking who will be
> directly using CKRM - directly layering middleware on top
> of it.
>
> The details don't matter much and may have to remain
> obscured in the competitive fog. But the presence of
> multiple groups lobbying for the same kernel infrastructure,
> as an apparent basis for competing middleware products,
> would I think weigh in CKRM's favor.
Undoubtedly so. However, workload management middleware developers don't
seem to have a history of actively participating in LKML for useful
features, so it's left to the likes of us to determine what *would* be
useful and then go build it if it makes sense and is acceptable to the
community.
> My impression, which may not align with how the CKRM developers view
> things, is that CKRM is descended from what have been called fair-share
> schedulers. The following comes from the above email thread.
Doing fair-share scheduling is indeed the ultimate goal of CKRM. But
using that characterization *alone* will not, in my opinion, be
sufficient to explain what classes, classtypes, etc. are.
> No doubt the CKRM experts are already familiar with these, but for the
> possible benefit of other readers:
>
> UNICOS Resource Administration - Chapter 4. Fair-share Scheduler
> http://oscinfo.osc.edu:8080/dynaweb/all/004-2302-001/@Generic__BookTextView/22883
>
> SHARE II -- A User Administration and Resource Control System for UNIX
> http://www.c-side.com/c/papers/lisa-91.html
>
> Solaris Resource Manager White Paper
> http://wwws.sun.com/software/resourcemgr/wp-mixed/
>
> ON THE PERFORMANCE IMPACT OF FAIR SHARE SCHEDULING
> http://www.cs.umb.edu/~eb/goalmode/cmg2000final.htm
>
> A Fair Share Scheduler, J. Kay and P. Lauder
> Communications of the ACM, January 1988, Volume 31, Number 1, pp 44-55.
Thanks for the links. Yes, some of these are useful in understanding the
utility of fair-share scheduling and may even help in creating better
"controllers" in CKRM-speak.
-- Shailabh