2012-06-19 19:04:27

by Alexey Vlasov

Subject: Attaching a process to cgroups

Hi.

Is there a way to speed up attaching a pid to a cgroup?
The problem is that writing the pid to the tasks file takes strangely long:

22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
22:28:00.916578 close(3) = 0 <0.000006>

For comparison, here is the same test writing the pid to a file placed on tmpfs:

22:24:41.892562 open("/tmp/w_test-l24-apache1_4bdf3d13/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000010>
22:24:41.892597 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
22:24:41.892631 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5685b6f000 <0.000006>
22:24:41.892664 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
22:24:41.892701 lseek(3, 6, SEEK_SET) = 6 <0.000004>
22:24:41.892738 write(3, "25966\n", 6) = 6 <0.000008>
22:24:41.892767 close(3) = 0 <0.000005>

Here the write completes immediately.
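
The comparison in the two traces can be repeated without strace. The following is a minimal sketch, assuming Python 3 on the test machine; `timed_append` is a hypothetical helper name, and it uses the same open flags that appear in the traces. It times only the write() call, which is where the 0.128 s shows up in the cgroup case.

```python
import os
import time


def timed_append(path: str, data: bytes) -> float:
    """Append data to path with one write() call; return the seconds that call took."""
    # Same flags as in the strace output: O_WRONLY|O_CREAT|O_APPEND, mode 0666.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o666)
    try:
        start = time.perf_counter()
        os.write(fd, data)
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Calling it once with the cgroup `tasks` path and once with a file on tmpfs, passing something like `b"16317\n"` as data, should show the same gap as the traces if the delay is in the kernel's handling of the write rather than in the tool doing the writing.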

--
BRGDS. Alexey Vlasov.


2012-06-20 03:36:39

by Daisuke Nishimura

Subject: Re: Attaching a process to cgroups

Hi.

What does "cat /sys/fs/cgroup/.../apache/memory.move_charge_at_immigrate" show?

If it shows a non-zero value, you can make the pid attachment faster by writing "0" to
memory.move_charge_at_immigrate before attaching the process.
But note that if you disable the feature, the current memory usage of the process is not
moved to the new cgroup.
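
As a rough sketch of that sequence, assuming Python 3 and a cgroup v1 memory hierarchy: the helper name `attach_without_charge_move` is made up for illustration, and the caller supplies the cgroup directory. It writes "0" to memory.move_charge_at_immigrate first and only then writes the pid to tasks.

```python
import os


def attach_without_charge_move(cgroup_dir: str, pid: int) -> None:
    """Disable charge moving for cgroup_dir, then attach pid to it."""
    # Step 1: turn off move_charge_at_immigrate so the attach does not
    # have to move the process's existing memory charges.
    knob = os.path.join(cgroup_dir, "memory.move_charge_at_immigrate")
    with open(knob, "w") as f:
        f.write("0\n")

    # Step 2: attach the process by writing its pid to the tasks file.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(f"{pid}\n")
```

The trade-off stays as described above: memory the process already uses remains charged to its old cgroup.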

Thanks,
Daisuke Nishimura.

On Tue, 19 Jun 2012 22:58:56 +0400
Alexey Vlasov <[email protected]> wrote:

> Hi.
>
> Is there a way to speed up attaching a pid to a cgroup?
> The problem is that writing the pid to the tasks file takes strangely long:
>
> 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
> 22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
> 22:28:00.916578 close(3) = 0 <0.000006>
>
> For comparison, here is the same test writing the pid to a file placed on tmpfs:
>
> 22:24:41.892562 open("/tmp/w_test-l24-apache1_4bdf3d13/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000010>
> 22:24:41.892597 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
> 22:24:41.892631 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5685b6f000 <0.000006>
> 22:24:41.892664 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
> 22:24:41.892701 lseek(3, 6, SEEK_SET) = 6 <0.000004>
> 22:24:41.892738 write(3, "25966\n", 6) = 6 <0.000008>
> 22:24:41.892767 close(3) = 0 <0.000005>
>
> Here the write completes immediately.
>
> --
> BRGDS. Alexey Vlasov.

2012-06-20 11:08:59

by Alexey Vlasov

Subject: Re: Attaching a process to cgroups

On Wed, Jun 20, 2012 at 12:34:31PM +0900, Daisuke Nishimura wrote:
>
> What does "cat /sys/fs/cgroup/.../apache/memory.move_charge_at_immigrate" show?

Here I have zero.

By the way, the delay is not specific to the memory controller; it also
shows up in others, cpu and blkio for example.

--
BRGDS. Alexey Vlasov.

2012-06-20 12:28:37

by Mike Galbraith

Subject: Re: Attaching a process to cgroups

On Tue, 2012-06-19 at 22:58 +0400, Alexey Vlasov wrote:
> Hi.
>
> Is there a way to speed up attaching a pid to a cgroup?
> The problem is that writing the pid to the tasks file takes strangely long:
>
> 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
> 22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
> 22:28:00.916578 close(3) = 0 <0.000006>


kernel/cgroup.c::cgroup_attach_task()
{
...
synchronize_rcu();
...
}

2012-06-21 07:54:41

by Alexey Vlasov

Subject: Re: Attaching a process to cgroups

On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
>
> kernel/cgroup.c::cgroup_attach_task()
> {
> ...
> synchronize_rcu();
> ...
> }

So nothing can be done here? (I mean if only I knew how to fix it I
wouldn't ask about it ;)

2012-06-21 08:23:09

by Mike Galbraith

Subject: Re: Attaching a process to cgroups

On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> >
> > kernel/cgroup.c::cgroup_attach_task()
> > {
> > ...
> > synchronize_rcu();
> > ...
> > }
>
> So nothing can be done here? (I mean if only I knew how to fix it I
> wouldn't ask about it ;)

Sure, kill the obnoxious thing, it's sitting right in the middle of the
userspace interface.

I banged on it a while back (wrt explosive android patches), extracted
RCU from the userspace interface. It seemed to work great, much faster,
couldn't make it explode. I wouldn't bet anything I wasn't willing to
immediately part with that the result was really really safe though ;-)

-Mike

2012-06-21 08:27:01

by Mike Galbraith

Subject: Re: Attaching a process to cgroups

On Thu, 2012-06-21 at 10:23 +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > >
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > synchronize_rcu();
> > > ...
> > > }
> >
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
>
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.

Um, lest anyone misunderstand, no, I don't mean by whacking instances of
synchronize_rcu() without further ado :)

-Mike

2012-06-26 18:17:30

by Paul E. McKenney

Subject: Re: Attaching a process to cgroups

On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > >
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > synchronize_rcu();
> > > ...
> > > }
> >
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
>
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.
>
> I banged on it a while back (wrt explosive android patches), extracted
> RCU from the userspace interface. It seemed to work great, much faster,
> couldn't make it explode. I wouldn't bet anything I wasn't willing to
> immediately part with that the result was really really safe though ;-)

Or replace it with synchronize_rcu_expedited(). You can "get lucky"
for quite some time removing synchronize_rcu() calls!

Thanx, Paul

2012-06-27 07:23:37

by Mike Galbraith

Subject: Re: Attaching a process to cgroups

On Tue, 2012-06-26 at 11:06 -0700, Paul E. McKenney wrote:
> On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > >
> > > > kernel/cgroup.c::cgroup_attach_task()
> > > > {
> > > > ...
> > > > synchronize_rcu();
> > > > ...
> > > > }
> > >
> > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > wouldn't ask about it ;)
> >
> > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > userspace interface.
> >
> > I banged on it a while back (wrt explosive android patches), extracted
> > RCU from the userspace interface. It seemed to work great, much faster,
> > couldn't make it explode. I wouldn't bet anything I wasn't willing to
> > immediately part with that the result was really really safe though ;-)
>
> Or replace it with synchronize_rcu_expedited(). You can "get lucky"
> for quite some time removing synchronize_rcu() calls!

s/remove/replace, but yup. A company that wanted to use the android
patches plus my tinkering showed a fix they needed on top to close a
race discovered in their testing. So yeah, even when all seems fine,
extracting synchronize_rcu() may expose evils you couldn't encounter
before, and didn't happen to encounter afterward.

-Mike

2012-06-27 17:14:28

by Paul E. McKenney

Subject: Re: Attaching a process to cgroups

On Wed, Jun 27, 2012 at 09:23:31AM +0200, Mike Galbraith wrote:
> On Tue, 2012-06-26 at 11:06 -0700, Paul E. McKenney wrote:
> > On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > > >
> > > > > kernel/cgroup.c::cgroup_attach_task()
> > > > > {
> > > > > ...
> > > > > synchronize_rcu();
> > > > > ...
> > > > > }
> > > >
> > > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > > wouldn't ask about it ;)
> > >
> > > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > > userspace interface.
> > >
> > > I banged on it a while back (wrt explosive android patches), extracted
> > > RCU from the userspace interface. It seemed to work great, much faster,
> > > couldn't make it explode. I wouldn't bet anything I wasn't willing to
> > > immediately part with that the result was really really safe though ;-)
> >
> > Or replace it with synchronize_rcu_expedited(). You can "get lucky"
> > for quite some time removing synchronize_rcu() calls!
>
> s/remove/replace, but yup. A company that wanted to use the android
> patches plus my tinkering showed a fix they needed on top to close a
> race discovered in their testing. So yeah, even when all seems fine,
> extracting synchronize_rcu() may expose evils you couldn't encounter
> before, and didn't happen to encounter afterward.

I really did mean "remove". Removing a synchronize_rcu() does result
in a race, but often an extremely low-probability race. So you can
remove a synchronize_rcu() and get lucky for a long time, but sooner
or later, something will explode.

Thanx, Paul

2012-06-28 02:41:06

by Mike Galbraith

Subject: Re: Attaching a process to cgroups

On Wed, 2012-06-27 at 10:10 -0700, Paul E. McKenney wrote:

> I really did mean "remove". Removing a synchronize_rcu() does result
> in a race, but often an extremely low-probability race. So you can
> remove a synchronize_rcu() and get lucky for a long time, but sooner
> or later, something will explode.

Ah. Android patches did "replace", and it still didn't take long at all
for bad luck to happen.. so "remove" for that particular instance would
probably result in "sooner" variety explosions :)

-Mike

2012-08-08 16:42:26

by Alexey Vlasov

Subject: Re: Attaching a process to cgroups

On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > Hanging on read():
> >
> > # strace -ttT cat /proc/cgroups
> >
> > 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> > 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> > 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> > 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> > 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>

In short, I changed it to synchronize_rcu_expedited(), and all the
delays are gone, both when writing and when reading cgroup files.

--
BRGDS. Alexey Vlasov.

2012-08-08 16:51:40

by Paul E. McKenney

Subject: Re: Attaching a process to cgroups

On Wed, Aug 08, 2012 at 08:40:33PM +0400, Alexey Vlasov wrote:
> On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > > Hanging on read():
> > >
> > > # strace -ttT cat /proc/cgroups
> > >
> > > 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> > > 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> > > 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> > > 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> > > 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>
>
> In short, I changed it to synchronize_rcu_expedited(), and all the
> delays are gone, both when writing and when reading cgroup files.

Is the writing and reading from cgroups something that your workload
does all the time, or is it something that happens only on occasional
updates to your cgroup configuration?

Thanx, Paul

2012-08-10 09:56:08

by Alexey Vlasov

Subject: Re: Attaching a process to cgroups

On Wed, Aug 08, 2012 at 09:51:29AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 08, 2012 at 08:40:33PM +0400, Alexey Vlasov wrote:
> >
> > In short, I changed it to synchronize_rcu_expedited(), and all the
> > delays are gone, both when writing and when reading cgroup files.
>
> Is the writing and reading from cgroups something that your workload
> does all the time, or is it something that happens only on occasional
> updates to your cgroup configuration?

There was always some delay on writing. It reproduces easily: create
some 1000 groups (maybe 1 group is enough, I didn't actually check) and
write a pid to the tasks file of a group. I described it in my first
message.
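
A sketch of that reproduction, assuming Python 3 and a writable cgroup v1 hierarchy passed in as `root`: the function name `time_attach_in_groups` and the `repro_<n>` group names are invented for this example. It creates the groups, writes the pid to each group's tasks file, and returns how long each write() took.

```python
import os
import time
from typing import List


def time_attach_in_groups(root: str, count: int, pid: int) -> List[float]:
    """Create `count` groups under root, attach pid to each, return write times in seconds."""
    durations = []
    for i in range(count):
        # "repro_<i>" is a hypothetical naming scheme for the test groups.
        group = os.path.join(root, f"repro_{i}")
        os.makedirs(group, exist_ok=True)

        fd = os.open(os.path.join(group, "tasks"),
                     os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o666)
        try:
            # Time only the write(), which is where the delay was observed.
            start = time.perf_counter()
            os.write(fd, f"{pid}\n".encode())
            durations.append(time.perf_counter() - start)
        finally:
            os.close(fd)
    return durations
```

Running it with a count of 1 and then 1000 and comparing `max(durations)` would answer the open question of whether the number of groups matters at all.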

Delays on reading appeared once processes started rotating actively
through the tasks files, and possibly also because of counter updates
(cpuacct.stat, memory.stat) across the cgroups hierarchy. The load
average (LA) grew from 10 to 500, and all the programs that read cgroup
files in /proc (htop, for example) practically stopped working.
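
The read-side stall can be measured the same way as the earlier "strace -ttT cat /proc/cgroups" run. This is a minimal sketch, assuming Python 3; `timed_read` is a hypothetical helper, and the 32768-byte buffer simply mirrors the read() size in that trace.

```python
import time
from typing import Tuple


def timed_read(path: str, bufsize: int = 32768) -> Tuple[bytes, float]:
    """Do one unbuffered read() on path; return the data and the seconds it took."""
    # buffering=0 gives a raw file object, so f.read() maps to a single read() call.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        data = f.read(bufsize)
        elapsed = time.perf_counter() - start
    return data, elapsed
```

Sampling `timed_read("/proc/cgroups")` periodically while tasks are being moved between groups should show whether the read times track the attach activity.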

--
BRGDS. Alexey Vlasov.