2005-11-01 18:59:09

by JaniD++

Subject: cpuset - question

Hello, list

I have tried this:

[root@dy-xeon-1 dev]# mount -t cpuset none /dev/cpuset
[root@dy-xeon-1 dev]# cd /dev/cpuset
[root@dy-xeon-1 cpuset]# mkdir cpus_0
[root@dy-xeon-1 cpuset]# cd cpus_0
[root@dy-xeon-1 cpus_0]# /bin/echo 0 > cpus
[root@dy-xeon-1 cpus_0]# /bin/echo 1 > mems
/bin/echo: write error: Numerical result out of range
[root@dy-xeon-1 cpus_0]# echo 1 >mems
[root@dy-xeon-1 cpus_0]# cat mems

[root@dy-xeon-1 cpus_0]# /bin/echo $$ > tasks
/bin/echo: write error: No space left on device
[root@dy-xeon-1 cpus_0]# echo $$ >tasks
[root@dy-xeon-1 cpus_0]# cat tasks
[root@dy-xeon-1 cpus_0]#

Google and the man pages can't help.
What can I do?

Thanks

Janos


2005-11-01 20:36:58

by Paul Jackson

Subject: Re: cpuset - question

JaniD++ wrote:
> [root@dy-xeon-1 cpus_0]# /bin/echo 1 > mems
> /bin/echo: write error: Numerical result out of range
> [root@dy-xeon-1 cpus_0]# echo 1 >mems
> [root@dy-xeon-1 cpus_0]# cat mems
>
> [root@dy-xeon-1 cpus_0]# /bin/echo $$ > tasks
> /bin/echo: write error: No space left on device

I'm guessing you are on a multi-processor, with a single
memory node, not a NUMA system with multiple memory nodes.

Or, at least, your kernel was compiled for that (with the
CONFIG_NUMA option disabled).

The first echo above failed because you tried to set bit 1
in mems, but only bit 0 is valid (only one memory node).

The second echo failed too, but your shell's builtin echo
(like that of most shells) didn't display the error.

The 'cat mems' command showed that mems was not yet set,
which is indeed the case.

The third and final echo above, into 'tasks' failed because
you can't attach a task to a cpuset that has no memory specified.

If you had done '/bin/echo 0 > mems', it would have worked
much better.
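
For reference, the corrected order of operations could be sketched roughly as below. This is only an illustration: make_cpuset is a hypothetical helper name, not part of any cpuset tooling, and it assumes the cpuset file system is already mounted at the directory passed as the first argument.

```shell
#!/bin/sh
# Sketch: create one cpuset and attach a task, in the order the kernel expects.
# make_cpuset is a hypothetical helper; ROOT is wherever the cpuset fs is mounted.
# usage: make_cpuset ROOT NAME CPUS MEMS PID
make_cpuset() (
    root=$1 name=$2 cpus=$3 mems=$4 pid=$5
    dir="$root/$name"
    mkdir -p "$dir" || exit 1
    # Use /bin/echo rather than the shell builtin so a rejected write is reported.
    /bin/echo "$cpus" > "$dir/cpus" || exit 1
    # mems must hold a valid memory node before any task can be attached.
    /bin/echo "$mems" > "$dir/mems" || exit 1
    /bin/echo "$pid" > "$dir/tasks" || exit 1
)
```

On the machine from the original report this would be called as `make_cpuset /dev/cpuset cpus_0 0 0 $$`, that is, CPU 0 and memory node 0. The function stops at the first write that fails, so the error printed by /bin/echo points at the step that was rejected.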

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-01 21:32:52

by JaniD++

Subject: Re: cpuset - question

Hello,

----- Original Message -----
From: "Paul Jackson" <[email protected]>
To: "JaniD++" <[email protected]>
Cc: <[email protected]>
Sent: Tuesday, November 01, 2005 9:36 PM
Subject: Re: cpuset - question


> JaniD++ wrote:
> > [root@dy-xeon-1 cpus_0]# /bin/echo 1 > mems
> > /bin/echo: write error: Numerical result out of range
> > [root@dy-xeon-1 cpus_0]# echo 1 >mems
> > [root@dy-xeon-1 cpus_0]# cat mems
> >
> > [root@dy-xeon-1 cpus_0]# /bin/echo $$ > tasks
> > /bin/echo: write error: No space left on device
>
> I'm guessing you are on a multi-processor, with a single
> memory node, not a NUMA system with multiple memory nodes.
>
> Or, at least, your kernel was compiled for that (with the
> CONFIG_NUMA option disabled).

Yes, this option is disabled.
This is a dual-Xeon server with HT.
With HT it looks like 4 CPUs.

With this config, do I need the NUMA option enabled to use cpusets?

I only need to move my 4 gnbd-client processes into 4 cpusets, but I don't want
to touch the memory.
Is this possible, or do I need the CONFIG_NUMA=y option?

Thanks

Janos


2005-11-01 21:50:40

by Bill Davidsen

Subject: Re: cpuset - question

JaniD++ wrote:
> Hello, list
>
> I have tried this:
>
> [root@dy-xeon-1 dev]# mount -t cpuset none /dev/cpuset
> [root@dy-xeon-1 dev]# cd /dev/cpuset
> [root@dy-xeon-1 cpuset]# mkdir cpus_0
> [root@dy-xeon-1 cpuset]# cd cpus_0
> [root@dy-xeon-1 cpus_0]# /bin/echo 0 > cpus
> [root@dy-xeon-1 cpus_0]# /bin/echo 1 > mems
> /bin/echo: write error: Numerical result out of range
> [root@dy-xeon-1 cpus_0]# echo 1 >mems
> [root@dy-xeon-1 cpus_0]# cat mems
>
> [root@dy-xeon-1 cpus_0]# /bin/echo $$ > tasks
> /bin/echo: write error: No space left on device
> [root@dy-xeon-1 cpus_0]# echo $$ >tasks
> [root@dy-xeon-1 cpus_0]# cat tasks
> [root@dy-xeon-1 cpus_0]#
>
> The google, and man pages cant help.
> What can i do?

Start by telling us what kernel and patches you run, what config option
you used, etc. Oh, and what you're trying to do...
--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-11-01 22:24:09

by Paul Jackson

Subject: Re: cpuset - question

> In this config i need NUMA option enabled to use cpusets?

No - if you got this far, cpusets are working for you.

You would need NUMA if you had multiple memory nodes.

I'm guessing you have one collection of RAM modules,
all equally distant from the processors, which is not
a NUMA (Non-Uniform-Memory-Architecture) system.

I only mentioned NUMA because if you did have NUMA
hardware, then you would need to CONFIG it into
your kernel to make full use of your multiple memory
nodes. I doubt that applies to you.

It looks like you have multiple (4 logical with HT)
CPUs, numbered 0, 1, 2, and 3, and one Memory Node,
numbered 0.

Cpusets should work for you - just "echo 0", not "echo 1"
into the "mems" files. Your one and only Memory Node
is numbered "0", not "1".

Actually, make that "/bin/echo", not "echo", so you can
see the error messages.
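
For the four-process case, a rough sketch of one cpuset per logical CPU, all on memory node 0, might look like this. The helper name pin_each and the cpu_N directory names are made up for illustration; the PIDs are whatever your four client processes happen to be.

```shell
#!/bin/sh
# Sketch: give each PID its own single-CPU cpuset, all on memory node 0.
# pin_each is a hypothetical helper; the cpu_N directory names are arbitrary.
# usage: pin_each ROOT PID...
pin_each() (
    root=$1
    shift
    cpu=0
    for pid in "$@"; do
        dir="$root/cpu_$cpu"
        mkdir -p "$dir" || exit 1
        # /bin/echo, not the builtin, so any rejected write shows an error.
        /bin/echo "$cpu" > "$dir/cpus"  || exit 1
        /bin/echo 0      > "$dir/mems"  || exit 1   # the one and only memory node
        /bin/echo "$pid" > "$dir/tasks" || exit 1
        cpu=$((cpu + 1))
    done
)
```

Called as `pin_each /dev/cpuset PID1 PID2 PID3 PID4`, the first PID lands on CPU 0, the second on CPU 1, and so on. Memory placement is left alone, since every cpuset names the same node 0.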

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-01 22:26:28

by Paul Jackson

Subject: Re: cpuset - question

Bill wrote:
> Start by telling us what kernel and patches you run, what config option
> you used, etc. Oh, and what you're trying to do...

That's ok, Bill. I've spent the last two years reading
those sorts of echo error messages from munging /dev/cpuset
files on an almost daily basis.

I'm pretty sure I can see what's going on with JaniD++'s
situation without needing more background information.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-01 22:38:33

by JaniD++

Subject: Re: cpuset - question

Thanks Paul Jackson, Bill Davidsen!

Yes, with echo 0 > mems, this works fine! :-)

But the documentation is a little small for cpusets...

Thanks

Janos

2005-11-01 23:36:33

by Paul Jackson

Subject: Re: cpuset - question

> But the documentation is a little small for cpusets...

Yes, it is.

I actually have a fairly complete document, but it contains
both the user library support and the kernel support information.
The user library has not yet been open sourced by SGI, but I
anticipate that it will be, at which time the full documentation
will become available.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-02 10:36:55

by Daniel J Blueman

Subject: Re: cpuset - question

Janos,

You can see what valid memory nodes are available from the top-level
cpuset directory:

# cat /dev/cpuset/mems
0 1 2 3

If you were running on a NUMA-capable system, you'd also want to
ensure that page interleaving is disabled in the BIOS/pre-boot
firmware.
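
A small sketch of checking a node number against that file before writing it into a child cpuset follows. It assumes the file may use a list format with commas and ranges (for example 0-3) as well as plain space-separated numbers; expand_list and node_is_valid are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: expand a cpuset-style list such as "0-3,8" into "0 1 2 3 8".
# Accepts commas or spaces between entries; both helpers are hypothetical.
expand_list() (
    out=
    IFS=', '
    for part in $1; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # a range like 0-3
            *)   lo=$part; hi=$part ;;             # a single number
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    echo "$out"
)

# node_is_valid NODE MEMS_FILE: succeed if NODE is listed in MEMS_FILE.
node_is_valid() (
    for n in $(expand_list "$(cat "$2")"); do
        [ "$n" = "$1" ] && exit 0
    done
    exit 1
)
```

For example, `node_is_valid 1 /dev/cpuset/mems || echo "node 1 does not exist"` would have flagged the original problem before the write was attempted.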
___
Daniel J Blueman

2005-11-02 16:26:48

by Randy Dunlap

Subject: Re: cpuset - question

On Wed, 2 Nov 2005, Daniel J Blueman wrote:

> Janos,
>
> You can see what valid memory nodes are available from the top-level
> cpuset directory:
>
> # cat /dev/cpuset/mems
> 0 1 2 3
>
> If you were to be running on a NUMA-capable system, you'd also want to
> ensure page interleaving was disabled in the BIOS/pre-boot firmware
> too.
> ___

Just for info, why is this in /dev at all, instead of, say,
/sys ??

--
~Randy

2005-11-02 17:35:04

by Daniel J Blueman

Subject: Re: cpuset - question

I'm not sure of the true answer; it is likely that CPUSETS was
designed in the 2.4 timeframe and compatibility was preferred over the
clean sysfs interface.

I've CC'd the authors.

Dan

___
Daniel J Blueman

2005-11-02 18:28:42

by Paul Jackson

Subject: Re: cpuset - question

Randy asked:
> Just for info, why is this in /dev at all, instead of, say,
> /sys ??

Daniel added:
> I'm not sure of the true answer; it is likely that CPUSETS was
> designed in the 2.4 timeframe and compatibility was preferred over the
> clean sysfs interface.

No .. cpusets was a fresh design for Linux 2.6. The two primary
authors were Simon Derr of Bull and myself of SGI. So far as I
know, Bull did not have Linux 2.4 precedents. SGI had both Linux
2.4 precedents and Irix precedents. I chose not to propose either
of these SGI precedent APIs for the Linux mainline kernel.

Simon proposed the primary interface for /dev/cpuset, and I gladly
joined him as his design was superior. Simon had this file system
mounted under /proc, and Christoph Hellwig (our primary reviewer -
thanks!) objected, recommending /dev/cpuset as the mount point instead.

In Christoph's own words on May 13, 2004:

- don't mount the filesystem in procfs. the whole point of a new
fs is to move away from the procfs mess! /dev/cpuset/ sounds like
a saner mtpnt.

In any case, there are two aspects to this question. Should the
cpuset hierarchy be a separate virtual file system of its own, or part
of the sysfs file system? Then, if it is separate, where should it
be mounted?

The separate file system for the cpuset hierarchy has been a
clear success, in my (no doubt biased) view. It has its own rules
appropriate for the hierarchical cpu and node sets it is managing.
Even if we were starting this work now, I would enthusiastically
advocate having it as its own, separate file system.

Given that, the mount point becomes rather secondary in my view.

Christoph's proposal of /dev/cpuset still seems reasonable and
adequate today.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401

2005-11-02 18:48:24

by Randy Dunlap

Subject: Re: cpuset - question

On Wed, 2 Nov 2005, Paul Jackson wrote:

> Randy asked:
> > Just for info, why is this in /dev at all, instead of, say,
> > /sys ??
>
> Daniel added:
> > I'm not sure of the true answer; it is likely that CPUSETS was
> > designed in the 2.4 timeframe and compatibility was preferred over the
> > clean sysfs interface.
>
> No .. cpusets was a fresh design for Linux 2.6. The two primary
> authors were Simon Derr of Bull and myself of SGI. So far as I
> know, Bull did not have Linux 2.4 precedents. SGI had both Linux
> 2.4 precedents and Irix precedents. I chose not to propose either
> of these SGI precedent API's for the Linux mainline kernel.
>
> Simon proposed the primary interface for the /dev/cpuset, and I gladly
> joined him as his design was superior. Simon had this file system
> mounted under /proc, and Christoph Hellwig (our primary reviewer -
> thanks!) objected, recommending /dev/cpuset as the mount point instead.
>
> In Christoph's own words on May 13, 2004:
>
> - don't mount the filesystem in procfs. the whole point of a new
> fs is to move away from the procfs mess! /dev/cpuset/ sounds like
> a saner mtpnt.
>
> In any case, there are two aspects to this question. Should the
> cpuset hierarchy be a separate virtual file system of its own, or part
> of the sysfs file system? Then, if it is separate, where should it
> be mounted.

OK. My question was intended more to say "why /dev ?".
I don't mind cpusets using its own mini-fs.

> The separate file system for the cpuset hierarchy has been a
> clear success, in my (no doubt biased) view. It has its own rules
> appropriate for the hierarchical cpu and node sets it is managing.
> Even if we were starting this work now, I would enthusiastically
> advocate having it as its own, separate file system.

OK, no problem.

> Given that, the mount point becomes rather secondary in my view.
>
> Christoph's proposal of /dev/cpuset, still seems reasonable and
> adequate today.

I guess it's too late to care about FHS?

Apparently FHS needs to be updated to know about /sys and other
virtual filesystems (other than /proc, which it knows about).
However, using /dev for a virtual fs location makes no sense
to me. /dev is defined as a place for device (or "special")
files. Nothing about virtual filesystems (unless we want to
call virtual filesystems "special files," but I think that
"special" in this sense has its own special meaning).

Anyway, we seem to be in a mode of adding more features and doing
less general review (except for Andrew), with the review happening
after the fact (the merge) instead of when it should happen.
And yes, I'm in that boat too. I try, but there aren't
enough hours in a day (and I don't want more hours/day).

Thanks for the explanation & history.

--
~Randy

2005-11-03 08:20:32

by Simon Derr

Subject: Re: cpuset - question

On Wed, 2 Nov 2005, Paul Jackson wrote:

> Randy asked:
> > Just for info, why is this in /dev at all, instead of, say,
> > /sys ??
>
> Daniel added:
> > I'm not sure of the true answer; it is likely that CPUSETS was
> > designed in the 2.4 timeframe and compatibility was preferred over the
> > clean sysfs interface.
>
> No .. cpusets was a fresh design for Linux 2.6. The two primary
> authors were Simon Derr of Bull and myself of SGI. So far as I
> know, Bull did not have Linux 2.4 precedents. SGI had both Linux
> 2.4 precedents and Irix precedents. I chose not to propose either
> of these SGI precedent API's for the Linux mainline kernel.
>
> Simon proposed the primary interface for the /dev/cpuset, and I gladly
> joined him as his design was superior. Simon had this file system
> mounted under /proc, and Christoph Hellwig (our primary reviewer -
> thanks!) objected, recommending /dev/cpuset as the mount point instead.
>
> In Christoph's own words on May 13, 2004:
>
> - don't mount the filesystem in procfs. the whole point of a new
> fs is to move away from the procfs mess! /dev/cpuset/ sounds like
> a saner mtpnt.
>
> In any case, there are two aspects to this question. Should the
> cpuset hierarchy be a separate virtual file system of its own, or part
> of the sysfs file system? Then, if it is separate, where should it
> be mounted.
>

There were also a few technical reasons.

The first was the desire to create cpusets with 'mkdir my_cpuset'.
But this was not a sufficient reason to have a new filesystem, so after my
first version of the cpuset patch I reworked it to use sysfs.

However then I ran into a wall: sysfs does not support files larger than a
page. And this was a showstopper as the size of the `tasks' file can be
large.

So I had to drop sysfs.
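
To put a rough number on the one-page limit, here is a back-of-the-envelope sketch. It assumes one PID per line, a 4096-byte page, and PIDs of at most five digits; all three are assumptions to check on a real system (for instance with `getconf PAGESIZE` and /proc/sys/kernel/pid_max), and pids_per_page is a made-up helper name.

```shell
#!/bin/sh
# Sketch: how many PID lines fit in a single page-sized file?
# Assumes one PID per line, each PID at most DIGITS characters wide.
# usage: pids_per_page PAGE_SIZE DIGITS
pids_per_page() (
    page_size=$1 digits=$2
    echo $(( page_size / (digits + 1) ))   # +1 for the trailing newline
)
```

Under those assumptions `pids_per_page 4096 5` gives 682, so a cpuset holding more tasks than that could not list them all in a one-page file.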

Simon.

2005-11-03 08:26:23

by Sylvain Jeaugey

Subject: Re: cpuset - question

To come back to Randy's original question ...

Cpusets are not - in my view - designed to display the NUMA architecture.
/sys already does this very well (example from a 16-way machine):
$ ls /sys/devices/system/node/node*
/sys/devices/system/node/node0:
cpu0 cpu1 cpu2 cpu3 cpumap distance meminfo numastat

/sys/devices/system/node/node1:
cpu4 cpu5 cpu6 cpu7 cpumap distance meminfo numastat

/sys/devices/system/node/node2:
cpu10 cpu11 cpu8 cpu9 cpumap distance meminfo numastat

/sys/devices/system/node/node3:
cpu12 cpu13 cpu14 cpu15 cpumap distance meminfo numastat

I think sysfs remains the best way to view your NUMA nodes.
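
If you want the same information in a script, something along these lines could work. It is only a sketch: list_node_cpus is a hypothetical helper, and it assumes the node directories contain one cpuN entry per CPU, as in the listing above.

```shell
#!/bin/sh
# Sketch: print "nodeN: cpuA cpuB ..." for each NUMA node directory.
# list_node_cpus is a hypothetical helper; the base directory is overridable.
# usage: list_node_cpus [BASE_DIR]
list_node_cpus() (
    base=${1:-/sys/devices/system/node}
    for node in "$base"/node[0-9]*; do
        [ -d "$node" ] || continue
        cpus=
        # cpu[0-9]* matches cpu0, cpu12, ... but not cpumap.
        for cpu in "$node"/cpu[0-9]*; do
            [ -e "$cpu" ] || continue
            cpus="$cpus${cpus:+ }${cpu##*/}"
        done
        echo "${node##*/}: $cpus"
    done
)
```

Run without arguments it reads /sys/devices/system/node; entries come out in the shell's glob order, so cpu10 sorts before cpu8 just as in the ls output.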

Sylvain
