We are from the Open Grid Scheduler, which is the official Open Source Grid Engine. Open Grid Scheduler/
Grid Engine ( http://gridscheduler.sourceforge.net ) is used by many compute farms & HPC sites for job scheduling.
In the next release, we are using cgroups to define a Job Container interface for batch jobs:
http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
However, not only us, but others have found that the memcg controller does not cause sbrk(2) or mmap(2) to
return error when the cgroup is under high memory pressure. Further, when the amount of free memory is
really low, the Linux Kernel OOM killer picks something and kills it.
http://www.spinics.net/lists/cgroups/msg02622.html
We also would like to see if it is technically possible for the Virtual Memory Manager to interact with the
memory controller properly and give us the semantics of setrlimit(2). So basically if the current address
space usage exceeds the "memory.memsw.limit_in_bytes" limit defined by the administrator, then the
memory allocation system calls (example: mmap(2), sbrk(2), etc) will return error such that the OOM
killer is not invoked.
Thanks in advance.
-Ron
On Thu 07-06-12 18:19:07, Ron Chen wrote:
[...]
> However, not only us, but others have found that the memcg controller
> does not cause sbrk(2) or mmap(2) to return error when the cgroup is
> under high memory pressure.
Yes, because the memory controller tracks allocated memory (at page
granularity) rather than address space. So the memory is accounted when
it is faulted in, not when it is mapped.
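
The difference between mapping and faulting can be observed from user space. The following is only a rough sketch, assuming a Linux system with /proc mounted; the function names are hypothetical and it measures the process's resident set size rather than any memcg counter. It maps anonymous memory, then writes to each page, and reports how much the resident size grew at each step.

```python
import mmap
import resource

PAGE = resource.getpagesize()


def resident_bytes():
    """Return this process's resident set size, read from /proc/self/statm."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the resident set size, in pages.
        return int(f.read().split()[1]) * PAGE


def map_then_touch(size):
    """Map `size` bytes of anonymous memory, then write to every page.

    Returns (growth after mmap, growth after touching), both in bytes.
    """
    before = resident_bytes()
    region = mmap.mmap(-1, size,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    after_map = resident_bytes()       # address space grew, resident memory did not
    for offset in range(0, size, PAGE):
        region[offset] = 1             # first write to a page triggers the fault
    after_touch = resident_bytes()     # pages are now actually allocated
    region.close()
    return after_map - before, after_touch - before
```

Calling `map_then_touch(32 * 1024 * 1024)` should show almost no growth after the mmap call and roughly the full size after the pages are written, which is the point at which a page-based controller would charge the memory.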
> Further, when the amount of free memory is really low, the Linux
> Kernel OOM killer picks something and kills it.
Yes, this is the result of the design when the memory is tracked during
page faults.
> http://www.spinics.net/lists/cgroups/msg02622.html
>
>
> We also would like to see if it is technically possible for the
> Virtual Memory Manager to interact with the memory controller
> properly and give us the semantics of setrlimit(2).
What prevents you from using setrlimit from inside the group?
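
For reference, a minimal sketch of what setrlimit(2) with RLIMIT_AS gives a process, assuming Linux with /proc mounted. The helper name, the headroom value and the request size are hypothetical, chosen only for illustration. It lowers the soft address-space limit to slightly above the current usage, attempts a mapping, and restores the old limit afterwards.

```python
import errno
import mmap
import resource


def try_map_under_limit(headroom, request):
    """Try to map `request` bytes with RLIMIT_AS set to current usage + `headroom`.

    Returns 0 if the mapping succeeded, otherwise the errno of the failure
    (errno.ENOMEM is expected when the request does not fit under the limit).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    with open("/proc/self/statm") as f:
        # First field of statm is the total virtual size, in pages.
        vm_bytes = int(f.read().split()[0]) * resource.getpagesize()
    new_soft = vm_bytes + headroom
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    try:
        region = mmap.mmap(-1, request,
                           flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE)
    except OSError as exc:
        return exc.errno                # the allocation call itself reports the error
    else:
        region.close()
        return 0
    finally:
        # Only the soft limit was changed, so it can be raised back.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

With 64 MiB of headroom, a 1 GiB request such as `try_map_under_limit(64 << 20, 1 << 30)` should return `errno.ENOMEM` without any page being touched. Note that the limit is per process and counts address space, so it is not a group-wide limit like memory.memsw.limit_in_bytes.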
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
(2012/06/08 10:19), Ron Chen wrote:
> We are from the Open Grid Scheduler, which is the official Open Source Grid Engine. Open Grid Scheduler/
> Grid Engine ( http://gridscheduler.sourceforge.net ) is used by many compute farms & HPC sites for job scheduling.
>
> In the next release, we are using cgroups to define a Job Container interface for batch jobs:
>
> http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html
>
>
> However, not only us, but others have found that the memcg controller does not cause sbrk(2) or mmap(2) to
> return error when the cgroup is under high memory pressure. Further, when the amount of free memory is
> really low, the Linux Kernel OOM killer picks something and kills it.
>
> http://www.spinics.net/lists/cgroups/msg02622.html
>
>
> We also would like to see if it is technically possible for the Virtual Memory Manager to interact with the
> memory controller properly and give us the semantics of setrlimit(2). So basically if the current address
> space usage exceeds the "memory.memsw.limit_in_bytes" limit defined by the administrator, then the
> memory allocation system calls (example: mmap(2), sbrk(2), etc) will return error such that the OOM
> killer is not invoked.
>
It's not implemented yet. It was proposed before and patches were posted, but
in the end they were not merged.
IIRC, there were some implementation problems, but the biggest reason for the
rejection was that the author couldn't convince us there was a real use case.
If you have real use case and want a new feature on memory cgroup, please CC
[email protected], [email protected]
Someone (including me) may be able to cook a patch for a future Linux kernel if you
have real use cases.
BTW, you can stop the memory-cgroup-level OOM killer via the memory.oom_control file.
But you cannot stop the system-level OOM killer; there are no knobs for that.
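
A minimal sketch of using that file, assuming a cgroup v1 memory hierarchy. The function names are hypothetical, and the group directory is whatever path the memory controller is mounted at on the system. Writing 1 to memory.oom_control disables the OOM killer for that group; reading the file returns "name value" lines such as oom_kill_disable and under_oom.

```python
import os


def parse_oom_control(text):
    """Parse the 'name value' lines read from memory.oom_control into a dict."""
    status = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            status[key] = int(value)
    return status


def disable_memcg_oom_killer(cgroup_dir):
    """Disable the memcg-level OOM killer for the group at `cgroup_dir`.

    `cgroup_dir` is the group's directory in the memory controller mount
    (an assumption; the location depends on how cgroups are mounted).
    """
    path = os.path.join(cgroup_dir, "memory.oom_control")
    with open(path, "w") as f:
        f.write("1")
```

With the killer disabled, tasks in the group that hit the limit are not killed; as far as I understand they wait until memory is freed or the limit is raised, so something still has to watch under_oom and react.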
Thanks,
-Kame