2006-08-01 04:24:28

by David Masover

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Timothy Webster wrote:
> Different users have different needs.

I'm having trouble thinking of users who need an FS that doesn't need a
repacker.

The disk error problem, though, you're right -- most users will have to
get bitten by this, hard, at least once, or they'll never get the
importance of it. But it'd be nice if the lesson isn't too hard, and if
we can actually recover most of their files.

Still, I can see most people who are aware of this problem using RAID,
backups, and not caring if their filesystem tolerates bad hardware.

> The problem I see is managing disk errors.

I see this much the same way. If your disk has errors, you should be
getting a new disk. If you can't do that, you can run a mirrored RAID
-- even on SATA, you should be able to hotswap it.

Even for a home/desktop user, disks are cheap, and getting cheaper all
the time. All you have to do is run the mean time between failure
numbers by them, and ask them if their backup is enough.
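
(As a rough illustration of "running the mean time between failure
numbers", here is a minimal sketch that treats drive failures as a
constant-rate process; the MTBF figure is made up purely for
illustration and is not a claim about any real drive:)

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* Vendor-quoted MTBF, purely illustrative. */
	const double mtbf_hours = 500000.0;
	const double hours_per_year = 24.0 * 365.0;

	/* With a constant failure rate, the chance of at least one failure
	 * in a year is roughly 1 - exp(-hours_per_year / MTBF). */
	double p_fail_year = 1.0 - exp(-hours_per_year / mtbf_hours);

	printf("~%.1f%% chance of a failure per drive per year\n",
	       100.0 * p_fail_year);
	return 0;
}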

> And perhaps a
> really good clustering filesystem for markets that
> require NO downtime.

Thing is, a cluster is about the only FS I can imagine that could
reasonably require (and MAYBE provide) absolutely no downtime.
Everything else, the more you say it requires no downtime, the more I
say it requires redundancy.

Am I missing any more obvious examples where you can't have enough
redundancy, but you can't have downtime either?


2006-08-01 04:43:38

by David Lang

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

On Mon, 31 Jul 2006, David Masover wrote:

>> And perhaps a
>> really good clustering filesystem for markets that
>> require NO downtime.
>
> Thing is, a cluster is about the only FS I can imagine that could reasonably
> require (and MAYBE provide) absolutely no downtime. Everything else, the more
> you say it requires no downtime, the more I say it requires redundancy.
>
> Am I missing any more obvious examples where you can't have enough
> redundancy, but you can't have downtime either?

just because you have redundancy doesn't mean that your data is idle enough for
you to run a repacker with your spare cycles. to run a repacker you need a time
when the chunk of the filesystem that you are repacking is not being accessed or
written to. it doesn't matter if that data lives on one disk or 9 disks all
mirroring the same data, you can't just break off 1 of the copies and repack
that because by the time you finish it won't match the live drives anymore.

database servers have a repacker (vacuum), and they are under tremendous
pressure from their users to avoid having to use it because of the performance
hit that it generates. (the theory in the past is exactly what was presented in
this thread, make things run faster most of the time and accept the performance
hit when you repack). the trend seems to be for a repacker thread that runs
continuously, causing a small impact all the time (that can be calculated into
the capacity planning) instead of a large impact once in a while.
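
(As a rough illustration of that "small impact all the time" idea, here is a
minimal userspace sketch of a repacker that throttles itself to a fixed I/O
budget. The chunk size, the budget, and the repack_one_chunk() helper are
all invented for illustration; this is not code from any real filesystem or
database.)

#define _POSIX_C_SOURCE 199309L
#include <time.h>

#define CHUNK_BYTES      (4L * 1024 * 1024)   /* repack 4 MiB at a time */
#define BUDGET_BYTES_SEC (8L * 1024 * 1024)   /* spend at most ~8 MiB/s repacking */

/* Stand-in for "find the next fragmented region and rewrite it contiguously". */
static long repack_one_chunk(void)
{
	/* A real repacker would read the chunk, relocate it, and update
	 * the metadata pointing at it. */
	return CHUNK_BYTES;		/* bytes of I/O actually issued */
}

int main(void)
{
	for (;;) {
		long moved = repack_one_chunk();

		/* Sleep long enough that the average repacking I/O stays
		 * within the budget, so the cost becomes a small, constant
		 * overhead that can be folded into capacity planning. */
		double seconds = (double)moved / BUDGET_BYTES_SEC;
		struct timespec ts;

		ts.tv_sec  = (time_t)seconds;
		ts.tv_nsec = (long)((seconds - (double)ts.tv_sec) * 1e9);
		nanosleep(&ts, NULL);
	}
	return 0;
}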

the other thing they are seeing as new people start using them is that the
newbies don't realize they need to do something as archaic as running a repacker
periodically; as a result they let things devolve down to where performance is
really bad without understanding why.

David Lang

2006-08-01 04:57:19

by David Masover

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

David Lang wrote:
> On Mon, 31 Jul 2006, David Masover wrote:
>
>>> And perhaps a
>>> really good clustering filesystem for markets that
>>> require NO downtime.
>>
>> Thing is, a cluster is about the only FS I can imagine that could
>> reasonably require (and MAYBE provide) absolutely no downtime.
>> Everything else, the more you say it requires no downtime, the more I
>> say it requires redundancy.
>>
>> Am I missing any more obvious examples where you can't have enough
>> redundancy, but you can't have downtime either?
>
> just because you have redundancy doesn't mean that your data is idle
> enough for you to run a repacker with your spare cycles.

Then you don't have redundancy, at least not for reliability. In that
case, you have redundancy for speed.

> to run a
> repacker you need a time when the chunk of the filesystem that you are
> repacking is not being accessed or written to.

Reasonably, yes. But it will be an online repacker, so it will be
somewhat tolerant of this.

> it doesn't matter if that
> data lives on one disk or 9 disks all mirroring the same data, you can't
> just break off 1 of the copies and repack that because by the time you
> finish it won't match the live drives anymore.

Aha. That really depends on how you're doing the mirroring.

If you're doing it at the block level, then no, it won't work. But if
you're doing it at the filesystem level (a cluster-based FS, or
something that layers on top of an FS), or (most likely) the
database/application level, then when you come back up, the new data is
just pulled in from the logs as if it had been written to the FS.

The only example I can think of that I've actually used and seen working
is MySQL tables, but that already covers a huge number of websites.
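
(A minimal sketch of that kind of log replay, assuming an application-level
log with monotonically increasing sequence numbers; the structures and the
catch_up() helper are invented for illustration and are not MySQL's actual
replication code:)

#include <stdio.h>

struct log_entry {
	unsigned long lsn;	/* monotonically increasing sequence number */
	const char   *change;	/* description of the write */
};

static void apply(const struct log_entry *e)
{
	printf("applying lsn %lu: %s\n", e->lsn, e->change);
}

/* Bring a copy up to date by replaying everything newer than what it has. */
static unsigned long catch_up(unsigned long last_applied,
			      const struct log_entry *log, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (log[i].lsn > last_applied) {
			apply(&log[i]);
			last_applied = log[i].lsn;
		}
	}
	return last_applied;
}

int main(void)
{
	const struct log_entry log[] = {
		{ 1, "INSERT row 42" },
		{ 2, "UPDATE row 42" },
		{ 3, "DELETE row 17" },
	};
	/* This copy went offline after applying LSN 1 -- say, to be repacked. */
	unsigned long last = 1;

	last = catch_up(last, log, 3);
	printf("copy is now caught up to lsn %lu\n", last);
	return 0;
}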

> database servers have a repacker (vacuum), and they are under tremendous
> pressure from their users to avoid having to use it because of the
> performance hit that it generates. (the theory in the past is exactly
> what was presented in this thread, make things run faster most of the
> time and accept the performance hit when you repack). the trend seems to
> be for a repacker thread that runs continuously, causing a small impact
> all the time (that can be calculated into the capacity planning) instead
> of a large impact once in a while.

Hmm, if that could be done right, it wouldn't be so bad -- if you get
twice the performance but have to repack for 2 hours at the end of the
week, the repacker is a win, right? So if you could spread those 2 hours
out over the week -- a bit over 1% of a 168-hour week -- in theory you'd
still be pretty close to twice the performance.

But that is fairly difficult to do, and may be more difficult to do well
than to implement, say, a Reiser4 plugin that operates at about the
level of rsync, but on every file modification.

> the other thing they are seeing as new people start using them is that
> the newbies don't realize they need to do something as archaic as running
> a repacker periodically; as a result they let things devolve down to where
> performance is really bad without understanding why.

Yikes. But then, that may be a failure of distro maintainers for not
throwing it in cron for them.

I had a similar problem with MySQL. I turned on binary logging so I
could do database replication, but I didn't realize I had to actually
delete the logs. I now have a daily cron job that wipes out everything
except the last day's logs. It could probably be modified pretty easily
to run hourly, if I needed to.

Moral of the story? Maybe there's something to this "continuous
repacker" idea, but don't ruin a good thing for the rest of us because
of newbies.

2006-08-01 06:49:01

by Theodore Ts'o

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

On Mon, Jul 31, 2006 at 09:41:02PM -0700, David Lang wrote:
> just because you have redundancy doesn't mean that your data is idle enough
> for you to run a repacker with your spare cycles. to run a repacker you
> need a time when the chunk of the filesystem that you are repacking is not
> being accessed or written to. it doesn't matter if that data lives on one
> disk or 9 disks all mirroring the same data, you can't just break off 1 of
> the copies and repack that because by the time you finish it won't match
> the live drives anymore.
>
> database servers have a repacker (vacuum), and they are under tremendous
> pressure from their users to avoid having to use it because of the
> performance hit that it generates. (the theory in the past is exactly what
> was presented in this thread, make things run faster most of the time and
> accept the performance hit when you repack). the trend seems to be for a
> repacker thread that runs continuously, causing a small impact all the time
> (that can be calculated into the capacity planning) instead of a large
> impact once in a while.

Ah, but as soon as the repacker thread runs continuously, then you
lose all or most of the claimed advantage of "wandering logs".
Specifically, the claim of the "wandering log" is that you don't have
to write your data twice --- once to the log, and once to the final
location on disk (whereas with ext3 you end up having to do double
writes). But if the repacker is running continuously, you end up
doing double writes anyway, as the repacker moves things from a
location that is convenient for the log, to a location which is
efficient for reading. Worse yet, if the repacker is moving disk
blocks or objects which are no longer in cache, it may end up having
to read objects in before writing them to a final location on disk.
So instead of a write-write overhead, you end up with a
write-read-write overhead.

But of course, people tend to disable the repacker when doing
benchmarks because they're trying to play the "my filesystem/database
has bigger performance numbers than yours" game....

- Ted

2006-08-01 07:25:00

by Avi Kivity

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Theodore Tso wrote:
>
> Ah, but as soon as the repacker thread runs continuously, then you
> lose all or most of the claimed advantage of "wandering logs".
> Specifically, the claim of the "wandering log" is that you don't have
> to write your data twice --- once to the log, and once to the final
> location on disk (whereas with ext3 you end up having to do double
> writes). But if the repacker is running continuously, you end up
> doing double writes anyway, as the repacker moves things from a
> location that is convenient for the log, to a location which is
> efficient for reading. Worse yet, if the repacker is moving disk
> blocks or objects which are no longer in cache, it may end up having
> to read objects in before writing them to a final location on disk.
> So instead of a write-write overhead, you end up with a
> write-read-write overhead.
>

There's no reason to repack *all* of the data. Many workloads write and
delete whole files, so file data should be contiguous. The repacker
would only need to move metadata and small files.
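
(A minimal sketch of such a selection policy, where only metadata, small
files, and fragmented items get queued for relocation; the structure and
the thresholds are invented for illustration:)

#include <stdbool.h>
#include <stdio.h>

struct item {
	const char   *name;
	unsigned long bytes;	/* total size */
	unsigned int  extents;	/* number of separate on-disk pieces */
	bool          is_meta;	/* metadata node rather than file data */
};

#define SMALL_FILE_BYTES (64UL * 1024)

/* Small files and metadata tend to be scattered in otherwise free space;
 * large files written in one go are usually already contiguous. */
static bool needs_repack(const struct item *it)
{
	if (it->is_meta || it->bytes < SMALL_FILE_BYTES)
		return true;
	return it->extents > 1;	/* large but fragmented */
}

int main(void)
{
	const struct item items[] = {
		{ "tree node", 4096,      1,  true  },
		{ "README",    2048,      1,  false },
		{ "video.iso", 700000000, 1,  false },
		{ "logfile",   90000000,  12, false },
	};
	unsigned int i;

	for (i = 0; i < sizeof(items) / sizeof(items[0]); i++)
		printf("%-10s %s\n", items[i].name,
		       needs_repack(&items[i]) ? "repack" : "leave in place");
	return 0;
}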

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2006-08-01 09:25:22

by Matthias Andree

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

On Tue, 01 Aug 2006, Avi Kivity wrote:

> There's no reason to repack *all* of the data. Many workloads write and
> delete whole files, so file data should be contiguous. The repacker
> would only need to move metadata and small files.

Move small files? What for?

Even if it is "only" moving metadata, it is not different from what ext3
or xfs are doing today (rewriting metadata from the intent log or block
journal to the final location).

UFS with soft updates from the BSD world looks pretty good at avoiding
unnecessary writes (at the expense of a long-running but low-priority
background fsck after a crash, which is however easy on the I/O as of
recent FreeBSD versions). That was their main point against
logging/journaling, BTW, but they are porting XFS as well to serve those
who need instant, complete recovery.

--
Matthias Andree

2006-08-01 09:38:38

by Avi Kivity

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Matthias Andree wrote:
>
> On Tue, 01 Aug 2006, Avi Kivity wrote:
>
> > There's no reason to repack *all* of the data. Many workloads write and
> > delete whole files, so file data should be contiguous. The repacker
> > would only need to move metadata and small files.
>
> Move small files? What for?
>

WAFL-style filesystems like contiguous space, so if small files are
scattered in otherwise free space, the repacker should move them out to
free that space.

> Even if it is "only" moving metadata, it is not different from what ext3
> or xfs are doing today (rewriting metadata from the intent log or block
> journal to the final location).
>

There is no need to repack all metadata; only that which helps in
creating free space.

For example: if you untar a source tree you'd get mixed metadata and
small file data packed together, but there's no need to repack that data.


--
error compiling committee.c: too many arguments to function

2006-08-01 09:41:16

by Hans Reiser

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Theodore Tso wrote:

>On Mon, Jul 31, 2006 at 09:41:02PM -0700, David Lang wrote:
>
>
>>just because you have redundancy doesn't mean that your data is idle enough
>>for you to run a repacker with your spare cycles. to run a repacker you
>>need a time when the chunk of the filesystem that you are repacking is not
>>being accessed or written to. it doesn't matter if that data lives on one
>>disk or 9 disks all mirroring the same data, you can't just break off 1 of
>>the copies and repack that because by the time you finish it won't match
>>the live drives anymore.
>>
>>database servers have a repacker (vacuum), and they are under tremendous
>>pressure from their users to avoid having to use it because of the
>>performance hit that it generates. (the theory in the past is exactly what
>>was presented in this thread, make things run faster most of the time and
>>accept the performance hit when you repack). the trend seems to be for a
>>repacker thread that runs continuously, causing a small impact all the time
>>(that can be calculated into the capacity planning) instead of a large
>>impact once in a while.
>>
>>
>
>Ah, but as soon as the repacker thread runs continuously, then you
>lose all or most of the claimed advantage of "wandering logs".
>
>
Wandering logs is a term specific to reiser4, and I think you are making
a more general remark.

You are missing the implications of the oft-cited statistic that 80% of
files rarely or never move. You are also missing the implications of the
repacker being able to do much larger I/Os than a workload of random,
tiny I/Os hitting a filesystem that has to perform allocations on the
fly.

>Specifically, the claim of the "wandering log" is that you don't have
>to write your data twice --- once to the log, and once to the final
>location on disk (whereas with ext3 you end up having to do double
>writes). But if the repacker is running continuously, you end up
>doing double writes anyway, as the repacker moves things from a
>location that is convenient for the log, to a location which is
>efficient for reading. Worse yet, if the repacker is moving disk
>blocks or objects which are no longer in cache, it may end up having
>to read objects in before writing them to a final location on disk.
>So instead of a write-write overhead, you end up with a
>write-read-write overhead.
>
>But of course, people tend to disable the repacker when doing
>benchmarks because they're trying to play the "my filesystem/database
>has bigger performance numbers than yours" game....
>
>
When the repacker is done, we will, just for you, run one of our
benchmarks the morning after the repacker has run (and reference this
email) ;-).... that was what you wanted us to do to address your
concern, yes? ;-)

> - Ted
>
>
>
>

2006-08-01 11:22:35

by Jan Engelhardt

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

>
>Wandering logs is a term specific to reiser4, and I think you are making
>a more general remark.

So, what is UDF's "wandering" log then?



Jan Engelhardt
--

2006-08-01 17:03:10

by David Masover

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Theodore Tso wrote:

> Ah, but as soon as the repacker thread runs continuously, then you
> lose all or most of the claimed advantage of "wandering logs".
[...]
> So instead of a write-write overhead, you end up with a
> write-read-write overhead.

This would tend to suggest that the repacker should not run constantly,
but also that while it's running, performance could still be almost as
good as ext3's.

> But of course, people tend to disable the repacker when doing
> benchmarks because they're trying to play the "my filesystem/database
> has bigger performance numbers than yours" game....

So you run your own benchmarks, I'll run mine... Benchmarks for
everyone! I'd especially like to see what performance is like with the
repacker not running, and during the repack. If performance during a
repack is comparable to ext3, I think we win, although we have to amend
that statement to "My filesystem/database has the same or bigger
performance numbers than yours."

2006-08-01 17:57:12

by Hans Reiser

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Jan Engelhardt wrote:

>>Wandering logs is a term specific to reiser4, and I think you are making
>>a more general remark.
>>
>>
>
>So, what is UDF's "wandering" log then?
>
>
>
>Jan Engelhardt
>
>
I have no idea, when did they introduce it?