2006-11-02 14:39:30

by Jan Kara

Subject: [RFC] Defragmentation interface

Hi,

from the thread after my patch implementing ext3 online
defragmentation I found out that probably the only (and definitely the
biggest) issue is the interface. Some want it common enough so that
we can profit from common tools for several filesystems; others object
that some applications, e.g. a defragmenter, need to know something about
ext3 internals to work reasonably well. Moreover, ioctl() is ugly and has
some compatibility issues; on the other hand, ext2meta is too low-level and
fs-specific, and it would be hard to make any reasonable application
crash-safe...
So in this email I try to propose some interface which should hopefully
address most of the concerns. The type of the interface is sysfs like
(idea taken from ext2meta) - that has a few advantages:
- no 32/64-bit compatibility issues
- easily extensible
- generally nice ;)

Each filesystem willing to support this interface implements a special
filesystem (e.g. ext3meta, XFSmeta, ...) and admin/defrag-tool mounts it
to some directory. There are parts of this interface which should be
common for all filesystems (so that tools don't have to care about
the particular filesystem and still get some useful results); other parts
are fs-specific. Here is the basic structure I propose:

meta/features
- bitmap of features supported by the interface (ext2/3-like) so that
the tool can verify whether it understands the interface and doesn't
mess with it otherwise
meta/allocation/free_blocks
- RO file - if you read from fpos F, you'll get a list of extents
describing areas with free blocks (as many as fit into the supplied
buffer) starting from block F. Fpos of your file descriptor is
shifted to the first unreported free block.
meta/super/blocksize
- filesystem block size
meta/super/id
- filesystem ID (for paranoid tools to verify that they are really
accessing the right meta-filesystem)
meta/nodes/<ident>
- this should be a directory containing things specific for a fs-object
with identification <ident>. In case of ext3 these would be inode
numbers, I guess this should be plausible also for XFS and others
but I'm open to suggestions...
- directory contains the following:
alloc_goal
- block number with current allocation goal
data/extents
- if you read from this file, you get a list of extents describing
data blocks (and holes) of the file. The listing starts at logical
block fpos. Fpos is shifted to the first unreported data block.
data/alloc
- you write there a number L and the fs allocates L blocks to the file
(preferably from alloc_goal) starting from file-block fpos. Fpos
is shifted after the last block allocated in this call.
data/reloc
- you write there <ident> and relocation of data happens as follows:
All blocks that are allocated both in the original file and <ident>
are relocated to <ident>. Write returns number of relocated
blocks.
metadata/
- this directory is fs-specific, contains fs block pointers and
similar. Here I describe what I'd like to have for ext3.
metadata/alloc
- you write there a number Level and the fs allocates an indirect block
to a file for logical block fpos at level Level of the indirect
tree. Parent indirect block must be already allocated.
metadata/reloc
- you write two numbers <ident> Level and an indirect block for
logical offset fpos at level Level will be swapped with
corresponding indirect block of <ident>.
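
To illustrate, a defragmenter would drive this roughly as follows (only
a sketch - the ASCII record format, the inode number and the variable
values are made up for the example):

unsigned long long F = 0, goal_block;
unsigned long lblock, nblocks;
char buf[4096];
int fd;

/* find free space near the file */
fd = open("meta/allocation/free_blocks", O_RDONLY);
lseek(fd, F, SEEK_SET);           /* start scanning at block F */
read(fd, buf, sizeof(buf) - 1);   /* returns as many extents as fit */
/* ... parse the extent list, pick a target area ... */
close(fd);

/* tell the fs where to allocate */
fd = open("meta/nodes/1234/alloc_goal", O_WRONLY);
snprintf(buf, sizeof(buf), "%llu\n", goal_block);
write(fd, buf, strlen(buf));
close(fd);

/* allocate nblocks at file block lblock, preferably at the goal */
fd = open("meta/nodes/1234/data/alloc", O_WRONLY);
lseek(fd, lblock, SEEK_SET);
snprintf(buf, sizeof(buf), "%lu\n", nblocks);
write(fd, buf, strlen(buf));
close(fd);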

This is all that is needed for my purposes. Any comments welcome.

Honza

--
Jan Kara <[email protected]>
SuSE CR Labs


2006-11-02 22:59:53

by David Chinner

Subject: Re: [RFC] Defragmentation interface

On Thu, Nov 02, 2006 at 03:39:29PM +0100, Jan Kara wrote:
> Hi,
>
> from the thread after my patch implementing ext3 online
> defragmentation I found out that probably the only (and definitely the
> biggest) issue is the interface. Some want it common enough so that
> we can profit from common tools for several filesystems; others object
> that some applications, e.g. a defragmenter, need to know something about
> ext3 internals to work reasonably well. Moreover, ioctl() is ugly and has
> some compatibility issues; on the other hand, ext2meta is too low-level and
> fs-specific, and it would be hard to make any reasonable application
> crash-safe...
> So in this email I try to propose some interface which should hopefully
> address most of the concerns. The type of the interface is sysfs like
> (idea taken from ext2meta) - that has a few advantages:
> - no 32/64-bit compatibility issues
> - easily extensible
> - generally nice ;)

- complex
- over-engineered
- little common code between filesystems

BTW, does use of sysfs mean ASCII encoding of all the data
passing between kernel and userspace?

> Each filesystem willing to support this interface implements a special
> filesystem (e.g. ext3meta, XFSmeta, ...) and admin/defrag-tool mounts it
> to some directory.

- not useful for wider audiences like applications that would like
to direct allocation

> There are parts of this interface which should be
> common for all filesystems (so that tools don't have to care about
> the particular filesystem and still get some useful results); other parts
> are fs-specific. Here is the basic structure I propose:
>
> meta/features
> - bitmap of features supported by the interface (ext2/3-like) so that
> the tool can verify whether it understands the interface and doesn't
> mess with it otherwise

- grow very large, very quickly if it has to support all the
different quirks of different filesystems.

> meta/allocation/free_blocks
> - RO file - if you read from fpos F, you'll get a list of extents
> describing areas with free blocks (as many as fit into the supplied
> buffer) starting from block F. Fpos of your file descriptor is
> shifted to the first unreported free block.

- linear search properties == Bad. (think fs sizes of hundreds of
terabytes - XFS is already deployed with filesystems of this size)
- cannot use smart requests like give me free blocks near X,
in AG Y or Z, etc.
- some filesystems have more than one data area - e.g. XFS has the
realtime volume.
- every time you fail an allocation, you need to reread this file.

> meta/super/blocksize
> - filesystem block size

ioctl(FIGETBSZ).
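
That is:

int bsz;
ioctl(fd, FIGETBSZ, &bsz);   /* filesystem block size in bytes */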

Also:

- some filesystems can use different block sizes for different
structures (e.g. XFS directory blocks can be larger than the fsb)
- stripe unit and stripe width need to be exposed so the defrag tool
can make correct placement decisions.
- extent size hints, etc.

Hence this will require the super/ directory to be extensible
in a filesystem-specific way.

> meta/super/id
> - filesystem ID (for paranoid tools to verify that they are really
> accessing the right meta-filesystem)

- UUID, please.

> meta/nodes/<ident>
> - this should be a directory containing things specific for a fs-object
> with identification <ident>. In case of ext3 these would be inode
> numbers, I guess this should be plausible also for XFS and others
> but I'm open to suggestions...
> - directory contains the following:
> alloc_goal
> - block number with current allocation goal

The kernel has to store this across syscalls until you write into
data/alloc? That sounds dangerous...

> data/extents
> - if you read from this file, you get a list of extents describing
> data blocks (and holes) of the file. The listing starts at logical
> block fpos. Fpos is shifted to the first unreported data block.

ioctl(FIBMAP)
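
That is:

int blk = lblock;          /* in: logical block number */
ioctl(fd, FIBMAP, &blk);   /* out: physical block number (0 == hole) */

(It needs CAP_SYS_RAWIO, but so would any defragmenter.)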

> data/alloc
> - you write there a number L and the fs allocates L blocks to the file
> (preferably from alloc_goal) starting from file-block fpos. Fpos
> is shifted after the last block allocated in this call.

You seek to the position you want (in blocks or bytes?), then write
a number into the file (in blocks or bytes)? That's messy compared
to a function call with an offset and length in it....

> data/reloc
> - you write there <ident> and relocation of data happens as follows:
> All blocks that are allocated both in the original file and <ident>
> are relocated to <ident>. Write returns number of relocated
> blocks.

You can only relocate to a new inode (which in XFS will change
the inode number)? What happens if there are blocks in duplicate
offsets in both inodes? What happens if all the blocks aren't
relocated - how do you handle this?

Let me get this straight - the interface you propose for
moving data about is:

read and process extents into an internal structure
find range where you want to relocate
find free space you want to relocate into
write desired block to alloc_goal
seek to allocation offset in data/alloc
write length into data/alloc
allocate new inode
write new inode number into data/reloc to relocate blocks

What I proposed:

ioctl(src, FIBMAP, ...);
/* find range to relocate */
open(tmp, O_CREATE);
funlink(tmp);
fs_get_free_list(src, policy, list);
/* select free extent to use */
fs_allocate_space(tmp, list[X], off, len);
fs_move_data(src, tmp, off, len);
close(tmp);
close(src);

So the process is pretty close to the same except the interface I
proposed does not change the location of the inode holding the data.
The major difference is that one implementation requires 3 new
generically useful syscalls, and the other requires every filesystem
to implement a metadata filesystem and require root privileges
to use.
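
For concreteness, the prototypes I have in mind are roughly (a sketch,
not a settled ABI - "struct free_extent" is a placeholder for whatever
extent descriptor we end up with):

int fs_get_free_list(int fd, struct policy *policy,
                     struct free_extent *list);
int fs_allocate_space(int fd, struct free_extent *ext,
                      off_t off, size_t len);
int fs_move_data(int src_fd, int tmp_fd, off_t off, size_t len);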

> metadata/
> - this directory is fs-specific, contains fs block pointers and
> similar. Here I describe what I'd like to have for ext3.

Nothing really useful for XFS here unless we start talking
about btree defragmentation and attribute fork optimisation,
etc. We really don't need a sysfs interface for this, just
an additional fs_move_metadata() type of call....

hmmm - how do you support objects in the filesystem not attached
to inodes (e.g. the freespace and inode btrees in XFS)? What sort
of interface would they use?

> This is all that is needed for my purposes. Any comments welcome.

Then your purpose is explicitly data defragmentation? If that is
the case, I still fail to see any need for a new metadata fs for
every filesystem to support this.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-03 14:30:30

by Jan Kara

Subject: Re: [RFC] Defragmentation interface

Hello,

Thanks for your comments.

> > from the thread after my patch implementing ext3 online
> > defragmentation I found out that probably the only (and definitely the
> > biggest) issue is the interface. Some want it common enough so that
> > we can profit from common tools for several filesystems; others object
> > that some applications, e.g. a defragmenter, need to know something about
> > ext3 internals to work reasonably well. Moreover, ioctl() is ugly and has
> > some compatibility issues; on the other hand, ext2meta is too low-level and
> > fs-specific, and it would be hard to make any reasonable application
> > crash-safe...
> > So in this email I try to propose some interface which should hopefully
> > address most of the concerns. The type of the interface is sysfs like
> > (idea taken from ext2meta) - that has a few advantages:
> > - no 32/64-bit compatibility issues
> > - easily extensible
> > - generally nice ;)
>
> - complex
> - over-engineered
> - little common code between filesystems
The first two may be true, but actually I don't think you'll have much
common code among filesystems anyway, whatever interface you choose.

> BTW, does use of sysfs mean ASCII encoding of all the data
> passing between kernel and userspace?
Not necessarily, but mostly yes. At least I intend to have all the
files I have proposed in ASCII.

> > Each filesystem willing to support this interface implements a special
> > filesystem (e.g. ext3meta, XFSmeta, ...) and admin/defrag-tool mounts it
> > to some directory.
>
> - not useful for wider audiences like applications that would like
> to direct allocation
Why not? A simple tool could stat a file, get its inode number, and put
some number in alloc_goal...

> > There are parts of this interface which should be
> > common for all filesystems (so that tools don't have to care about
> > the particular filesystem and still get some useful results); other parts
> > are fs-specific. Here is the basic structure I propose:
> >
> > meta/features
> > - bitmap of features supported by the interface (ext2/3-like) so that
> > the tool can verify whether it understands the interface and doesn't
> > mess with it otherwise
>
> - grow very large, very quickly if it has to support all the
> different quirks of different filesystems.
Yes, that may be a problem...

> > meta/allocation/free_blocks
> > - RO file - if you read from fpos F, you'll get a list of extents
> > describing areas with free blocks (as many as fit into the supplied
> > buffer) starting from block F. Fpos of your file descriptor is
> > shifted to the first unreported free block.
>
> - linear search properties == Bad. (think fs sizes of hundreds of
> terabytes - XFS is already deployed with filesystems of this size)
OK, so what do you propose? You want a syscall find_free_blocks() and
my idea of it was that it would do basically the same thing as my
interface.

> - cannot use smart requests like give me free blocks near X,
> in AG Y or Z, etc.
It supports "give me free block after block X". I agree that more
complicated requests may sometimes be useful, but I believe a syscall
interface for them would be even worse.

> - some filesystems have more than one data area - e.g. XFS has the
> realtime volume.
Interesting, I didn't know that. But anything that wants to mess with
volumes has to know that it uses XFS anyway, so this handling should
probably be fs-specific...

> - every time you fail an allocation, you need to reread this file.
Yes, that's the most serious disadvantage I see. Do you see any way
out of it in any interface?

> > meta/super/blocksize
> > - filesystem block size
>
> ioctl(FIGETBSZ).
I know, but it can also be in the interface...

> Also:
>
> - some filesystems can use different block sizes for different
> structures (e.g. XFS directory blocks can be larger than the fsb)
The block size was meant as an allocation unit size. So basically it
really was just another interface to FIGETBSZ.

> - stripe unit and stripe width need to be exposed so the defrag tool
> can make correct placement decisions.
fs-specific thing...

> - extent size hints, etc.
Umm, I don't understand what you mean by this.

> Hence this will require the super/ directory to be extensible
> in a filesystem-specific way.
Definitely. My mistake; I did not say that.

> > meta/super/id
> > - filesystem ID (for paranoid tools to verify that they are really
> > accessing the right meta-filesystem)
>
> - UUID, please.
Yes, I meant UUID.

> > meta/nodes/<ident>
> > - this should be a directory containing things specific for a fs-object
> > with identification <ident>. In case of ext3 these would be inode
> > numbers, I guess this should be plausible also for XFS and others
> > but I'm open to suggestions...
> > - directory contains the following:
> > alloc_goal
> > - block number with current allocation goal
>
> The kernel has to store this across syscalls until you write into
> data/alloc? That sounds dangerous...
This is persistent until the kernel decides to remove the inode from
memory. So while you have the file open, you are guaranteed that the
kernel keeps the information.

> > data/extents
> > - if you read from this file, you get a list of extents describing
> > data blocks (and holes) of the file. The listing starts at logical
> > block fpos. Fpos is shifted to the first unreported data block.
>
> ioctl(FIBMAP)
Yes. Only data/extents is a bit more efficient and it fits the
interface nicely.

> > data/alloc
> > - you write there a number L and the fs allocates L blocks to the file
> > (preferably from alloc_goal) starting from file-block fpos. Fpos
> > is shifted after the last block allocated in this call.
>
> You seek to the position you want (in blocks or bytes?), then write
> a number into the file (in blocks or bytes)? That's messy compared
> to a function call with an offset and length in it....
I meant that everything is in blocks. On the other hand we may well
define it in bytes. I don't have a strong opinion.

> > data/reloc
> > - you write there <ident> and relocation of data happens as follows:
> > All blocks that are allocated both in the original file and <ident>
> > are relocated to <ident>. Write returns number of relocated
> > blocks.
>
> You can only relocate to a new inode (which in XFS will change
> the inode number)? What happens if there are blocks in duplicate
> offsets in both inodes? What happens if all the blocks aren't
> relocated - how do you handle this?
The inode does not change. Only the block pointers are changed. Let <orig> be
original inode and <blocks> the temporary inode. If block at offset O is
allocated in both <orig> and <blocks>, then we copy data for the block
from <orig> to <blocks> and swap block pointers to the block of <orig>
and <blocks>.
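
In pseudocode (the helper names are invented):

/* for each logical block O allocated in both files */
for_each_common_block(orig, blocks, O) {
        copy_block_data(orig, blocks, O);     /* data -> <blocks>'s block */
        swap_block_pointers(orig, blocks, O); /* <orig> keeps its inode and
                                               * now points at the new block */
}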

> Let me get this straight - the interface you propose for
> moving data about is:
>
> read and process extents into an internal structure
> find range where you want to relocate
> find free space you want to relocate into
> write desired block to alloc_goal
> seek to allocation offset in data/alloc
> write length into data/alloc
> allocate new inode
> write new inode number into data/reloc to relocate blocks
>
> What I proposed:
>
> ioctl(src, FIBMAP, ...);
> /* find range to relocate */
> open(tmp, O_CREATE);
> funlink(tmp);
> fs_get_free_list(src, policy, list);
> /* select free extent to use */
> fs_allocate_space(tmp, list[X], off, len);
> fs_move_data(src, tmp, off, len);
> close(tmp);
> close(src);
>
> So the process is pretty close to the same except the interface I
> proposed does not change the location of the inode holding the data.
Yes, what we propose is almost exactly the same in effect (the
inode move is a misunderstanding; it does not happen in my case either).

> The major difference is that one implementation requires 3 new
> generically useful syscalls, and the other requires every filesystem
> to implement a metadata filesystem and require root privileges
> to use.
Yes. IMO the complexity of implementation is almost the same in the
syscall case and in my sysfs case. What the syscall would do is just some
basic checks before redirecting everything into a fs-specific call anyway...
In sysfs you just hook the same fs-specific routines to the files I
describe. Regarding privileges, I don't believe non-root (or a user
without the proper capability) should be allowed to do these operations. I
can imagine all kinds of DoS attacks using these interfaces (e.g.
forcing the fs into worst-cases of file placement etc...)

> > metadata/
> > - this directory is fs-specific, contains fs block pointers and
> > similar. Here I describe what I'd like to have for ext3.
>
> Nothing really useful for XFS here unless we start talking
> about btree defragmentation and attribute fork optimisation,
> etc. We really don't need a sysfs interface for this, just
> an additional fs_move_metadata() type of call....
Either a new syscall or new files in metafs - I find the second nicer
;).

> hmmm - how do you support objects in the filesystem not attached
> to inodes (e.g. the freespace and inode btrees in XFS)? What sort
> of interface would they use?
You could have fs-specific hooks manipulating your B-tree..

> > This is all that is needed for my purposes. Any comments welcome.
>
> Then your purpose is explicitly data defragmentation? If that is
> the case, I still fail to see any need for a new metadata fs for
> every filesystem to support this.
What I want is to implement defrag for ext3. For that I need some new
interfaces so I'm trying to design them in such a way that further
extension for other needs is possible. That's all. Now if the interface
has some common parts for several filesystems, then making a userspace
tool work for all of them should be easier. So I don't require anybody
to implement it. Just if it's implemented, a userspace tool can work for
it too...
Bye
Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2006-11-03 14:57:40

by Dave Kleikamp

Subject: Re: [RFC] Defragmentation interface

On Fri, 2006-11-03 at 09:59 +1100, David Chinner wrote:

> Let me get this straight - the interface you propose for
> moving data about is:
>
> read and process extents into an internal structure
> find range where you want to relocate
> find free space you want to relocate into
> write desired block to alloc_goal
> seek to allocation offset in data/alloc
> write length into data/alloc
> allocate new inode
> write new inode number into data/reloc to relocate blocks
>
> What I proposed:
>
> ioctl(src, FIBMAP, ...);
> /* find range to relocate */
> open(tmp, O_CREATE);
> funlink(tmp);
> fs_get_free_list(src, policy, list);
> /* select free extent to use */
> fs_allocate_space(tmp, list[X], off, len);
> fs_move_data(src, tmp, off, len);
> close(tmp);
> close(src);
>
> So the process is pretty close to the same except the interface I
> proposed does not change the location of the inode holding the data.
> The major difference is that one implementation requires 3 new
> generically useful syscalls, and the other requires every filesystem
> to implement a metadata filesystem and require root privileges
> to use.

I agree with Dave here. The metadata filesystem will require a lot of
overhead (and a lot of code) both in the kernel and in user-space. The
only benefit I see is that it can be easily extended. This may be
useful for debugging and prototyping, but I don't like it as a solution
for adding a permanent interface.

Shaggy
--
David Kleikamp
IBM Linux Technology Center

2006-11-03 19:22:49

by Andreas Dilger

Subject: Re: [RFC] Defragmentation interface

On Nov 03, 2006 15:30 +0100, Jan Kara wrote:
> > - stripe unit and stripe width need to be exposed so the defrag tool
> > can make correct placement decisions.
> fs-specific thing...

I think this is not just XFS-specific. It is very desirable to align
large IO to the RAID stripe so that if you write stripe_width bytes
you don't do 2 expensive read-modify-write steps, but rather a single
write operation. Also RAID controllers have (large) internal cache
"lines", and having aligned reads can help noticeably; and if the array
is doing RAID parity checking, alignment avoids the overhead of doing
this for two stripes instead of one.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2006-11-03 19:38:01

by Jan Kara

Subject: Re: [RFC] Defragmentation interface

> On Nov 03, 2006 15:30 +0100, Jan Kara wrote:
> > > - stripe unit and stripe width need to be exposed so the defrag tool
> > > can make correct placement decisions.
> > fs-specific thing...
>
> I think this is not just XFS-specific. It is very desirable to align
> large IO to the RAID stripe so that if you write stripe_width bytes
> you don't do 2 expensive read-modify-write steps, but rather a single
> write operation. Also RAID controllers have (large) internal cache
> "lines", and having aligned reads can help noticeably; and if the array
> is doing RAID parity checking, alignment avoids the overhead of doing
> this for two stripes instead of one.
I see, thanks for the information. On the other hand this is more an
underlying block device feature than a fs thing, isn't it? So does it
make sense to pass this information via the fs interface?

Honza

2006-11-06 02:54:27

by David Chinner

Subject: Re: [RFC] Defragmentation interface

On Fri, Nov 03, 2006 at 03:30:30PM +0100, Jan Kara wrote:
> > > So in this email I try to propose some interface which should hopefully
> > > address most of the concerns. The type of the interface is sysfs like
> > > (idea taken from ext2meta) - that has a few advantages:
> > > - no 32/64-bit compatibility issues
> > > - easily extensible
> > > - generally nice ;)
> >
> > - complex
> > - over-engineered
> > - little common code between filesystems
> The first two may be but actually I don't think you'll have too much
> common code among fs anyway whatever interface you choose.
>
> > BTW, does use of sysfs mean ASCII encoding of all the data
> > passing between kernel and userspace?
> Not necessarily, but mostly yes. At least I intend to have all the
> files I have proposed in ASCII.

Ok - that's how you're looking to avoid 32/64bit compatibility issues?
It will make the interface quite verbose, though, and entail significant
encoding and decoding costs....

> > > Each filesystem willing to support this interface implements a special
> > > filesystem (e.g. ext3meta, XFSmeta, ...) and admin/defrag-tool mounts it
> > > to some directory.
> >
> > - not useful for wider audiences like applications that would like
> > to direct allocation
> Why not? A simple tool could stat a file, get its inode number, and put
> some number in alloc_goal...

- Root permissions.
- multiple files need to be opened, read, written, closed
- high overhead of searching for free blocks in the area you want
- difficult to control alloc_goal with multi-threaded programs
- potential for each filesystem to have different meta structures....

> > > There are parts of this interface which should be
> > > common for all filesystems (so that tools don't have to care about
> > > the particular filesystem and still get some useful results); other parts
> > > are fs-specific. Here is the basic structure I propose:
> > >
> > > meta/features
> > > - bitmap of features supported by the interface (ext2/3-like) so that
> > > the tool can verify whether it understands the interface and doesn't
> > > mess with it otherwise
> >
> > - grow very large, very quickly if it has to support all the
> > different quirks of different filesystems.
> Yes, that may be a problem...
>
> > > meta/allocation/free_blocks
> > > - RO file - if you read from fpos F, you'll get a list of extents
> > > describing areas with free blocks (as many as fit into the supplied
> > > buffer) starting from block F. Fpos of your file descriptor is
> > > shifted to the first unreported free block.
> >
> > - linear search properties == Bad. (think fs sizes of hundreds of
> > terabytes - XFS is already deployed with filesystems of this size)
> OK, so what do you propose? You want a syscall find_free_blocks() and
> my idea of it was that it would do basically the same thing as my
> interface.

Using the above interface I guess you'd have to seek and read
until you found records with block numbers near to what you'd
require. It is effectively:

find_free_blocks(fd, policy, &list, nblocks)

struct policy {
__u64 version;
__u64 blkno;
__u64 len;
__u64 group;
__u64 policy;
__u64 fallback_policy;
};

#define ALLOC_POLICY_EXACT_LEN (1ULL << 0)
#define ALLOC_POLICY_EXACT_BLOCK (1ULL << 1)
#define ALLOC_POLICY_EXACT_GROUP (1ULL << 2)
#define ALLOC_POLICY_MIN_LEN (1ULL << 3)
#define ALLOC_POLICY_NEAR_BLOCK (1ULL << 4)
#define ALLOC_POLICY_NEAR_GROUP (1ULL << 5)
#define ALLOC_POLICY_NEXT_BLOCK (1ULL << 6)
#define ALLOC_POLICY_NEXT_GROUP (1ULL << 7)

The sysfs interface you propose is effectively:

memset(&policy, 0, sizeof(policy));
policy.policy = ALLOC_POLICY_NEXT_BLOCK;
do {
find_free_blocks(fd, &policy, &list, nblocks);
/* process free block list */
.....
/* get next blocks */
policy.blkno = list[nblocks - 1].blkno;
} while (policy.blkno != EOF);

However, this can be optimised for a given search where
the location is known beforehand to:

memset(&policy, 0, sizeof(policy));
policy.policy = ALLOC_POLICY_NEAR_BLOCK;
policy.blkno = X;
find_free_blocks(fd, &policy, &list, nblocks);

If you then chose to allocate from this list and it fails, you
simply redo the above.

With the sysfs interface, if you want to find a single contiguous
run of blocks, you'd probably just have to read the entire file and
search it for the pattern of blocks you want. With XFS, we already
have this information indexed in btrees, so we don't want to
have to read the entire btree just to find something we could find
with a single btree lookup. i.e.:

memset(&policy, 0, sizeof(policy));
policy.policy = ALLOC_POLICY_EXACT_LEN;
policy.len = X;
find_free_blocks(fd, &policy, &list, nblocks);

Or indeed, something close to the block we want, of size
big enough:

memset(&policy, 0, sizeof(policy));
policy.policy = ALLOC_POLICY_MIN_LEN | ALLOC_POLICY_NEAR_BLOCK;
policy.blkno = X;
policy.len = Y;
find_free_blocks(fd, &policy, &list, nblocks);

And so on. The advantage of this is the filesystem is free
to search for the blocks in any manner it chooses, rather than
having a fixed, linear seek/read interface to searches.

> > - cannot use smart requests like give me free blocks near X,
> > in AG Y or Z, etc.
> It supports "give me free block after block X". I agree that more
> complicated requests may sometimes be useful, but I believe a syscall
> interface for them would be even worse.

Right. More complicated requests are something that we need to
support in XFS in the short-medium term. We _need_ an interface to
XFS that allows complex, compound allocation policies to be
accessible from userspace - and this is not just for defrag
programs.

I think a set of well defined allocation primitives suits a syscall
interface far better than a per-filesystem sysfs interface.

> > - some filesystems have more than one data area - e.g. XFS has the
> > realtime volume.
> Interesting, I didn't know that. But anything that wants to mess with
> volumes has to know that it uses XFS anyway, so this handling should
> probably be fs-specific...

It's a flag on the inode (i.e. an extended inode attribute) that
indicates where the data lies for that inode. Once again, this can
be handled implicitly by the syscall interface because the
filesystem is aware of this flag and should return blocks associated
with the inode's data device...

> > - every time you fail an allocation, you need to reread this file.
> Yes, that's the most serious disadvantage I see. Do you see any way
> out of it in any interface?

I haven't really thought about solutions for this interface - the
syscall interface doesn't have this problem because of the way you
can specify where you want free blocks from....

> > > meta/super/blocksize
> > > - filesystem block size
> >
> > ioctl(FIGETBSZ).
> I know, but it can also be in the interface...
>
> > Also:
> >
> > - some filesystems can use different block sizes for different
> > structures (e.g. XFS directory blocks can be larger than the fsb)
> The block size was meant as an allocation unit size. So basically it
> really was just another interface to FIGETBSZ.

That's still a problem - XFS doesn't always use the filesystem block
size as its allocation unit.....

> > - extent size hints, etc.
> Umm, I don't understand what you mean by this.

.... because we have per-inode extent size allocation hints. That
is, the allocator will always try to allocate extsize bytes (and
extsize aligned) extents for any file with this hint. If it can't
get a chunk large enough for this, then ENOSPC....

> > - stripe unit and stripe width need to be exposed so the defrag tool
> > can make correct placement decisions.
> fs-specific thing...

As Andreas said, this isn't fs-specific. XFS takes sunit and swidth
as mkfs parameters so it can align both metadata and data optimally
for RAID devices. Other filesystems have different methods of
specifying this (ext2/3/4 use -E stride-size for this), but it would
need to be exposed in some way....

> > > meta/nodes/<ident>
> > > - this should be a directory containing things specific for a fs-object
> > > with identification <ident>. In case of ext3 these would be inode
> > > numbers, I guess this should be plausible also for XFS and others
> > > but I'm open to suggestions...
> > > - directory contains the following:
> > > alloc_goal
> > > - block number with current allocation goal
> >
> > The kernel has to store this across syscalls until you write into
> > data/alloc? That sounds dangerous...
> This is persistent until the kernel decides to remove the inode from
> memory. So while you have the file open, you are guaranteed that the
> kernel keeps the information.

But the inode hangs around long after the file is closed. How
do you guarantee that this gets cleared when it needs to be?

I just don't like the principle of this interface when we are
talking about moving data around online - it's inherently unsafe
when you consider multi-threaded or -process access to an inode.

> > > data/reloc
> > > - you write there <ident> and relocation of data happens as follows:
> > > All blocks that are allocated both in the original file and <ident>
> > > are relocated to <ident>. Write returns number of relocated
> > > blocks.
> >
> > You can only relocate to a new inode (which in XFS will change
> > the inode number)? What happens if there are blocks in duplicate
> > offsets in both inodes? What happens if all the blocks aren't
> > relocated - how do you handle this?
> The inode does not change. Only the block pointers are changed. Let <orig> be
> original inode and <blocks> the temporary inode. If block at offset O is
> allocated in both <orig> and <blocks>, then we copy data for the block
> from <orig> to <blocks> and swap block pointers to the block of <orig>
> and <blocks>.

OK, understood - I was a bit confused about the "original file and
<ident> are relocated to <ident>" bit. Thanks for the clarification.

> > The major difference is that one implementation requires 3 new
> > generically useful syscalls, and the other requires every filesystem
> > to implement a metadata filesystem and require root privileges
> > to use.
> Yes. IMO the complexity of implementation is almost the same in the
> syscall case and in my sysfs case. What the syscall would do is just some
> basic checks before redirecting everything into a fs-specific call anyway...

Sure, but you don't need to implement a new filesystem in every
filesystem to support it....

> In sysfs you just hook the same fs-specific routines to the files I
> > describe. Regarding privileges, I don't believe non-root (or a user
> > without the proper capability) should be allowed to do these operations.

Why not? As long as the user has permissions to write to the
filesystem and has quota left, they can create files however
they want.

> I
> can imagine all kinds of DoS attacks using these interfaces (e.g.
> > forcing the fs into worst-cases of file placement etc...)

They could only do that to files they have write access to. IOWs,
if they screw up their own files, let them. If they have root,
then it doesn't matter what interface we provide, it can be used
to do this.

And if you're really paranoid, with a generic syscall interface
we can introduce a "(no)useralloc" mount option that specifically
prevents this interface from being used on a given filesystem...

> > hmmm - how do you support objects in the filesystem not attached
> > to inodes (e.g. the freespace and inode btrees in XFS)? What sort
> > of interface would they use?
> You could have fs-specific hooks manipulating your B-tree..

Yes, I realise that - my question is how do you think that they
should be enumerated in the metafs hierarchy? What standard would apply?

> > > This is all that is needed for my purposes. Any comments welcome.
> >
> > Then your purpose is explicitly data defragmentation? If that is
> > the case, I still fail to see any need for a new metadata fs for
> > every filesystem to support this.
> What I want is to implement defrag for ext3. For that I need some new
> interfaces so I'm trying to design them in such a way that further
> extension for other needs is possible.

Understood. However, I'm looking past the immediate problem and
trying to find a common set of fileystem independent features that
will serve us well for the next few years. Allocation policies
and data relocation are just some of the issues that _all_
filesystems are going to have to face in the near future.

It is far easier to tell the application dev to "use this allocation
interface because you know exactly what you want" than to try to
develop filesystem heuristics to detect their pathological workload
and try to do something smart in the filesystem to stop the problem
from occurring.

Hence I'd like to have a common, well defined interface thought out
in advance rather than having to get applications to explicitly
support one filesystem or another.

[ Simple example: posix_fallocate() syscall implementation, rather
than having to get applications to detect libxfs at build time and
use xfsctl() instead of posix_fallocate() to get a fast, efficient
preallocation method. ]
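
(i.e. the application just calls:

posix_fallocate(fd, offset, len);

and each filesystem does whatever is most efficient underneath.)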

> That's all. Now if the interface
> has some common parts for several filesystems, then making a userspace
> tool work for all of them should be easier. So I don't require anybody
> to implement it. Just if it's implemented, a userspace tool can work for
> it too...

Hmmm - that sounds like you have already decided that this is the
interface that you are going to implement for ext3. ....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2006-11-06 17:44:58

by Jan Kara

Subject: Re: [RFC] Defragmentation interface

> On Fri, Nov 03, 2006 at 03:30:30PM +0100, Jan Kara wrote:
> > > BTW, does use of sysfs mean ASCII encoding of all the data
> > > passing between kernel and userspace?
> > Not necessarily, but mostly yes. At least I intend to have all the
> > files I have proposed in ASCII.
>
> Ok - that's how you're looking to avoid 32/64bit compatibility issues?
Yes.

> It will make the interface quite verbose, though, and entail significant
> encoding and decoding costs....
It would be verbose. On the other hand for most things it should not
matter (not too much data goes through the interface and it's not too
performance critical).

> > > > meta/allocation/free_blocks
> > > > - RO file - if you read from fpos F, you'll get a list of extents
> > > > describing areas with free blocks (as many as fit into the supplied
> > > > buffer) starting from block F. Fpos of your file descriptor is
> > > > shifted to the first unreported free block.
> > >
> > > - linear search properties == Bad. (think fs sizes of hundreds of
> > > terabytes - XFS is already deployed with filesystems of this size)
> > OK, so what do you propose? You want a syscall find_free_blocks() and
> > my idea of it was that it would do basically the same thing as my
<snip>

> Right. More complicated requests are something that we need to
> support in XFS in the short-medium term. We _need_ an interface to
> XFS that allows complex, compound allocation policies to be
> accessible from userspace - and this is not just for defrag
> programs.
>
> I think a set of well defined allocation primitives suits a syscall
> interface far better than a per-filesystem sysfs interface.
I'm only afraid of one thing: Once you define a syscall it's hard to
change anything and for this kind of thing I'm not sure we are able to
tell what we'll need in two years... That is basically my main
concern with implementing this interface as a syscall.

> > > - every time you fail an allocation, you need to reread this file.
> > Yes, that's the most serious disadvantage I see. Do you see any way
> > out of it in any interface?
>
> I haven't really thought about solutions for this interface - the
> syscall interface doesn't have this problem because of the way you
> can specify where you want free blocks from....
But that does not solve the problem of having to repeat the search,
does it? Only with the syscall interface can the filesystem possibly
search for free blocks more efficiently..

> > > - stripe unit and stripe width need to be exposed so the defrag tool
> > > can make correct placement decisions.
> > fs-specific thing...
>
> As Andreas said, this isn't fs-specific. XFS takes sunit and swidth
> as mkfs parameters so it can align both metadata and data optimally
> for RAID devices. Other filesystems have different methods of
> specifying this (ext2/3/4 use -E stride-size for this), but it would
> need to be exposed in some way....
I see. But then shouldn't we expose it regardless of the interface
(sysfs/syscall) we choose so that userspace can take it into account
when picking where to allocate?

> > > > meta/nodes/<ident>
> > > > - this should be a directory containing things specific for a fs-object
> > > > with identification <ident>. In case of ext3 these would be inode
> > > > numbers, I guess this should be plausible also for XFS and others
> > > > but I'm open to suggestions...
> > > > - directory contains the following:
> > > > alloc_goal
> > > > - block number with current allocation goal
> > >
> > > The kernel has to store this across syscalls until you write into
> > > data/alloc? That sounds dangerous...
> > This is persistent until the kernel decides to remove the inode from
> > memory. So while you have the file open, you are guaranteed that the
> > kernel keeps the information.
>
> But the inode hangs around long after the file is closed. How
> do you guarantee that this gets cleared when it needs to be?
It gets cleared (or rewritten) as soon as alloc_goal is used for
allocation or when the inode gets removed from memory. Ext3 currently has
such a thing (settable via ioctl()) and it seems to work reasonably well.

> I just don't like the principle of this interface when we are
> talking about moving data around online - it's inherently unsafe
> when you consider multi-threaded or -process access to an inode.
Yes, we certainly have to make sure we don't do something destructive
in such a case. On the other hand, if several processes try to guide
allocation in the same file, the results are uncertain and that's IMHO OK.

> > > The major difference is that one implementation requires 3 new
> > > generically useful syscalls, and the other requires every filesystem
> > > to implement a metadata filesystem and require root privileges
> > > to use.
> > Yes. IMO the complexity of implementation is almost the same in the
> > syscall case and in my sysfs case. What the syscall would do is just some
> > basic checks before redirecting everything into a fs-specific call anyway...
>
> Sure, but you don't need to implement a new filesystem in every
> filesystem to support it....
But the cost of this "meta filesystem implementation" is just something
like having a file metafs.c that contains read_super() in which it
sets up those metafs files/directories and their handling functions. So
I imagine that setting up most of the files should be like:
create_metafs_file("super/uuid", RW, foo_return_uuid, foo_set_uuid)

Where create_metafs_file() is some generic VFS helper. So I think that
the sysfs interface has its problems, but implementation complexity is not
one of them..
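
E.g. something like this (all the signatures are invented just to
illustrate; RO/RW stand for mode constants):

/* generic VFS helper - invented signature */
int create_metafs_file(const char *path, int mode,
                       ssize_t (*show)(struct inode *, char *, size_t),
                       ssize_t (*store)(struct inode *, const char *, size_t));

/* ext3's metafs read_super() then becomes mostly a list of these */
create_metafs_file("super/uuid", RW, foo_return_uuid, foo_set_uuid);
create_metafs_file("super/blocksize", RO, foo_return_blocksize, NULL);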

> > In sysfs you just hook the same fs-specific routines to the files I
> > > describe. Regarding privileges, I don't believe non-root (or a user
> > > without the proper capability) should be allowed to do these operations.
>
> Why not? As long as the user has permissions to write to the
> filesystem and has quota left, they can create files however
> they want.
>
> > I
> > can imagine all kinds of DoS attacks using these interfaces (e.g.
> > > forcing the fs into worst-cases of file placement etc...)
>
> They could only do that to files they have write access to. IOWs,
> if they screw up their own files, let them. If they have root,
> then it doesn't matter what interface we provide, it can be used
> to do this.
But by cleverly choosing blocks to allocate, you can for example badly
fragment free space and thereby make sure that access for others
will be slow too. Also, making the extent tree grow really large (because
you force each extent to have one block) and then burning CPU cycles in
the kernel by forcing it to do various tree operations on it is also not
a pleasant thing.

> And if you're really paranoid, with a generic syscall interface
> we can introduce a "(no)useralloc" mount option that specifically
> prevents this interface from being used on a given filesystem...
Of course that's possible. I don't count myself among the paranoid, but
certainly I would not allow users to guide allocation on my server
because of the above reasons ;).

> > > hmmm - how do you support objects in the filesystem not attached
> > > to inodes (e.g. the freespace and inode btrees in XFS)? What sort
> > > of interface would they use?
> > You could have fs-specific hooks manipulating your B-tree..
>
> Yes, I realise that - my question is how do you think that they
> should be enumerated in the metafs hierarchy? What standard would apply?
Honestly, I don't know. But I believe a sensible enumeration could be
found.

> > > > This is all that is needed for my purposes. Any comments welcome.
> > >
> > > Then your purpose is explicitly data defragmentation? If that is
> > > the case, I still fail to see any need for a new metadata fs for
> > > every filesystem to support this.
> > What I want is to implement defrag for ext3. For that I need some new
> > interfaces so I'm trying to design them in such a way that further
> > extension for other needs is possible.
>
> Understood. However, I'm looking past the immediate problem and
> trying to find a common set of filesystem independent features that
> will serve us well for the next few years. Allocation policies
> and data relocation are just some of the issues that _all_
> filesystems are going to have to face in the near future.
>
> It is far easier to tell the application dev to "use this allocation
> interface because you know exactly what you want" than to try to
> develop filesystem heuristics to detect their pathological workload
> and try to do something smart in the filesystem to stop the problem
> from occurring.
>
> Hence I'd like to have a common, well defined interface thought out
> in advance rather than having to get applications to explicitly
> support one filesystem or another.
Yes, I think we agree in this matter.

> [ Simple example: posix_fallocate() syscall implementation, rather
> than having to get applications to detect libxfs at build time and
> use xfsctl() instead of posix_fallocate() to get a fast, efficient
> preallocation method. ]
>
> > That's all. Now if the interface
> > has some common parts for several filesystems, then making a userspace
> > tool work for all of them should be easier. So I don't require anybody
> > to implement it. Just if it's implemented, a userspace tool can work for
> > it too...
>
> Hmmm - that sounds like you have already decided that this is the
> interface that you are going to implement for ext3. ....
No, I have not decided yet. And actually, as I've got feedback mostly
from you and that was negative, I'll probably also try the syscall
approach and see who won't like that one ;)

Bye
Honza

2006-11-07 03:03:16

by David Chinner

Subject: Re: [RFC] Defragmentation interface

On Mon, Nov 06, 2006 at 06:44:58PM +0100, Jan Kara wrote:
> > On Fri, Nov 03, 2006 at 03:30:30PM +0100, Jan Kara wrote:
> > > > BTW, does use of sysfs mean ASCII encoding of all the data
> > > > passing between kernel and userspace?
> > > Not necessarily, but mostly yes. At least I intend to have all the
> > > files I have proposed in ASCII.
> >
> > Ok - that's how you're looking to avoid 32/64bit compatibility issues?
> Yes.
>
> > It will make the interface quite verbose, though, and entail significant
> > encoding and decoding costs....
> It would be verbose. On the other hand for most things it should not
> matter (not too much data goes through the interface and it's not too
> performance critical).

Except when you have a filesystem with fragmented free space.
That could be a lot of information (think searching through
terabytes of fragmented free space before finding a contiguous
block big enough).....

> > Right. More complicated requests are something that we need to
> > support in XFS in the short-medium term. We _need_ an interface to
> > XFS that allows complex, compound allocation policies to be
> > accessible from userspace - and this is not just for defrag
> > programs.
> >
> > I think a set of well defined allocation primitives suits a syscall
> > interface far better than a per-filesystem sysfs interface.
> I'm only afraid of one thing: Once you define a syscall it's hard to
> change anything and for this kind of thing I'm not sure we are able to
> tell what we'll need in two years... That is basically my main
> concern with implementing this interface as a syscall.

True, but there's only so many ways you can ask for free space to be
found or allocated. And there's a version number in the policy
structure, so the interface is extensible. Also, for the limited scope
of the interfaces I don't see much change being needed over time,
but the change that is needed can be handled by the structure
version.

For the sysfs interface, it needs to be very flexible and extensible
because of the amount of filesystem specific information it can
expose and would need to expose to be usable by multiple filesystems
in a generic manner...

Hence I think different criteria apply here - the syscalls implement
single mechanisms, whereas the sysfs interface allows deeper, more
intrusive delving into filesystem internals while at the same time
providing mechanisms for modification of the filesystem. If we are
going to provide simple mechanisms to do certain operations, then
I'd prefer to see it done as specific, targeted syscalls rather
than buried within a multi-purpose, cross-filesystem sysfs
interface.

> > > > - every time you fail an allocation, you need to reread this file.
> > > Yes, that's the most serious disadvantage I see. Do you see any way
> > > out of it in any interface?
> >
> > I haven't really thought about solutions for this interface - the
> > syscall interface doesn't have this problem because of the way you
> > can specify where you want free blocks from....
> But that does not solve the problem of having to repeat the search,
> does it? Only with the syscall interface can the filesystem possibly
> search for free blocks more efficiently..

Right - the repeat search is not a problem because the overhead is
far lower with the syscall interface.

> > > > - stripe unit and stripe width need to be exposed so the defrag tool
> > > > can make correct placement decisions.
> > > fs-specific thing...
> >
> > As Andreas said, this isn't fs-specific. XFS takes sunit and swidth
> > as mkfs parameters so it can align both metadata and data optimally
> > for RAID devices. Other filesystems have different methods of
> > specifying this (ext2/3/4 use -E stride-size for this), but it would
> > need to be exposed in some way....
> I see. But then shouldn't we expose it regardless of the interface
> (sysfs/syscall) we choose so that userspace can take it into account
> when picking where to allocate?

Yes. In terms of the syscall interface, it is simple to do without
having to tell the application about alignment:

#define POLICY_ALLOC_ALIGN_SUNIT
#define POLICY_ALLOC_ALIGN_SWIDTH

Now the kernel only returns blocks that are correctly aligned. If the
filesystem can't find any aligned blocks, set the fallback policy to
do the same search but without the alignment restriction.....
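
e.g., reusing the policy structure from the earlier sketch:

memset(&policy, 0, sizeof(policy));
policy.policy = POLICY_ALLOC_ALIGN_SWIDTH | ALLOC_POLICY_MIN_LEN;
policy.len = X;                                /* want at least X blocks */
policy.fallback_policy = ALLOC_POLICY_MIN_LEN; /* same search, unaligned */
find_free_blocks(fd, &policy, &list, nblocks);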

So, the interface doesn't need to expose the actual values, just a
method to support aligned allocations. Once again, leverage the
smarts the existing filesystem allocator already has rather than
requiring alignment calculations to be done by the application.

Come to think of it, it would probably be better to do aligned
allocation by default and to have flags to turn off alignment.
Either way, the application doesn't need to know what the alignment
restrictions are with the syscall interface - it's just another
policy decision....

> > > > > meta/nodes/<ident>
> > > > > - this should be a directory containing things specific for a fs-object
> > > > > with identification <ident>. In case of ext3 these would be inode
> > > > > numbers, I guess this should be plausible also for XFS and others
> > > > > but I'm open to suggestions...
> > > > > - directory contains the following:
> > > > > alloc_goal
> > > > > - block number with current allocation goal
> > > >
> > > > The kernel has to store this across syscalls until you write into
> > > > data/alloc? That sounds dangerous...
> > > This is persistent until the kernel decides to remove the inode from
> > > memory. So while you have the file open, you are guaranteed that the
> > > kernel keeps the information.
> >
> > But the inode hangs around long after the file is closed. How
> > do you guarantee that this gets cleared when it needs to be?
> It gets cleared (or rewritten) as soon as alloc_goal is used for
> allocation or when the inode gets removed from memory. Ext3 currently has
> such a thing (settable via ioctl()) and it seems to work reasonably well.

So alloc_goal is a one-shot? That still doesn't work in
multi-threaded or multi-process workloads - you've got no guarantee
that your allocation actually used the alloc_goal you set and so you
might get back an extent that is nowhere near what you wanted....

> > I just don't like the principle of this interface when we are
> > talking about moving data around online - it's inherently unsafe
> when you consider multi-threaded or -process access to an inode.
> Yes, we certainly have to make sure we don't do something destructive
> in such a case. On the other hand, if several processes try to guide
> allocation in the same file, the results are uncertain and that's IMHO OK.

IMO, parallel allocation to the one file is a use-case that any new
interface must support. Right now we use tricks in XFS like
speculative pre-allocation, extent size hints, very large I/Os, etc
to minimise the fragmentation that occurs when multiple processes
write to the one file. This is one of the workloads that causes us
fragmentation problems.

The problem is that these mitigation techniques are all reactive and
need to be set up once a problem has been observed. IOWs, we've got
to notice the problem before we can take action to fix it.

If the application knows that the entire file will be filled
eventually, it can do much smarter things like allocate blocks in
the file such that when all the holes get filled we end up with a
contiguous file. That requires safe, multithreaded access to the
interface but it would avoid the need for admin intervention at
every location that the application is deployed like we currently
have to do...
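
e.g. with the syscalls sketched earlier, writer i of N could claim its
own region of the shared file up front (a sketch):

/* writer i fills bytes [i * chunk, (i + 1) * chunk) */
fs_get_free_list(fd, &policy, &list);
fs_allocate_space(fd, list[0], i * chunk, chunk);
/* ... later, fill the preallocated region with normal writes ... */
pwrite(fd, buf, chunk, (off_t)i * chunk);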

> > > > The major difference is that one implementation requires 3 new
> > > > generically useful syscalls, and the other requires every filesystem
> > > > to implement a metadata filesystem and require root privileges
> > > > to use.
> > > Yes. IMO the complexity of implementation is almost the same in the
> > > syscall case and in my sysfs case. What the syscall would do is just some
> > > basic checks before redirecting everything into a fs-specific call anyway...
> >
> > Sure, but you don't need to implement a new filesystem in every
> > filesystem to support it....
> But the cost of this "meta filesystem implementation" is just something
> like having a file metafs.c that contains read_super() in which it
> sets up those metafs files/directories and their handling functions. So
> I imagine that setting up most of the files should be like:
> create_metafs_file("super/uuid", RW, foo_return_uuid, foo_set_uuid)
>
> Where create_metafs_file() is some generic VFS helper. So I think that
> the sysfs interface has its problems, but implementation complexity is not
> one of them..

True - you can wrap some of the functionality in generic helpers,
but every object that needs to be decoded, discovered or modified
requires file system specific code.

Hmm - out of curiosity - how do you populate the metafs with
objects (say inodes)? i.e. if you want to control the allocation of
inode 325, how does the sysfs directory get populated with the meta
directory for inode 325? Is it dynamic?

> > > In sysfs you just hook the same fs-specific routines to the files I
> > > describe. Regarding privileges, I don't believe non-root (or a user
> > > without the proper capability) should be allowed to do these operations.
> >
> > Why not? As long as the user has permissions to write to the
> > filesystem and has quota left, they can create files however
> > they want.
> >
> > > I
> > > can imagine all kinds of DoS attacks using these interfaces (e.g.
> > > forcing the fs into worst-cases of file placement etc...)
> >
> > They could only do that to files they have write access to. IOWs,
> > if they screw up their own files, let them. If they have root,
> > then it doesn't matter what interface we provide, it can be used
> > to do this.
> But by cleverly choosing blocks to allocate, you can for example badly
> fragment free space and thereby make sure that access for others
> will be slow too.

Sure, but I can intentionally fragment free space on any filesystem
with mkdir, dd and rm....

> Also, making the extent tree grow really large (because you
> force each extent to have one block) and then burning CPU cycles in the
> kernel by forcing it to do various tree operations on it is also not
> a pleasant thing.

dd is all I need for this one - large sparse file, write single
bytes into random offsets.

What I'm saying is that these potential issues already exist and
anyone can stuff up a filesystem right now with simple tools and
no special permissions. AFAICT, adding an interface to direct allocation
doesn't introduce any new problems.

> > And if you're really paranoid, with a generic syscall interface
> > we can introduce a "(no)useralloc" mount option that specifically
> > prevents this interface from being used on a given filesystem...
> Of course that's possible. I don't count myself among the paranoid, but
> certainly I would not allow users to guide allocation on my server
> because of the above reasons ;).

Your choice ;)

> > > That's all. Now if the interface
> > > has some common parts for several filesystems, then making userspace
> > > tool work for all of them should be easier. So I don't require anybody
> > > to implement it. Just if it's implemented, userspace tool can work for
> > > it too...
> >
> > Hmmm - that sounds like you have already decided that this is the
> > interface that you are going to implement for ext3. ....
> No, I have not decided yet.

Sorry - I was jumping to conclusions.

> And actually, as I've got feedback mostly
> from you and that was negative, I'll probably also try the syscall
> approach and see who won't like that one ;)

Ok, sounds good.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group