2006-08-01 04:24:05

by David Lang

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

On Mon, 31 Jul 2006, David Masover wrote:

> Oh, I'm curious -- do hard drives ever carry enough battery/capacitance to
> cover their caches? It doesn't seem like it would be that hard/expensive,
> and if it is done that way, then I think it's valid to leave them on. You
> could just say that other filesystems aren't taking as much advantage of
> newer drive features as Reiser :P

There are no drives that have the ability to flush their cache after they lose
power.

Now, that being said, /. had a story within the last couple of days about hard
drive manufacturers adding flash to their hard drives. They may be aiming to add
some non-volatile cache capability to their drives, although I didn't think that
flash writes were that fast (needed if you dump the cache to flash when you
lose power), or that easy on power (given that you would first lose power),
and flash has limited write cycles (which matters if you always use the cache).

I've heard of too many fancy-sounding drive technologies that never hit the market;
I'll wait until they are actually available before I start counting on them for
anything (let alone design/run a filesystem that requires them :-)

External battery-backed cache is readily available, either on high-end RAID
controllers or as separate RAM drives (and in RAID array boxes), but nothing on
individual drives.

David Lang


2006-08-01 04:32:33

by David Masover

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

David Lang wrote:
> On Mon, 31 Jul 2006, David Masover wrote:
>
>> Oh, I'm curious -- do hard drives ever carry enough
>> battery/capacitance to cover their caches? It doesn't seem like it
>> would be that hard/expensive, and if it is done that way, then I think
>> it's valid to leave them on. You could just say that other
>> filesystems aren't taking as much advantage of newer drive features as
>> Reiser :P
>
> There are no drives that have the ability to flush their cache after
> they lose power.

Aha, so back to the usual argument: UPS! It takes a fraction of a
second to flush that cache.

> Now, that being said, /. had a story within the last couple of days
> about hard drive manufacturers adding flash to their hard drives. They
> may be aiming to add some non-volatile cache capability to their drives,
> although I didn't think that flash writes were that fast (needed if you
> dump the cache to flash when you lose power), or that easy on power
> (given that you would first lose power), and flash has limited write
> cycles (which matters if you always use the cache).

But, the point of flash was not to replace the RAM cache, but to be
another level. That is, you have your Flash which may be as fast as the
disk, maybe faster, maybe less, and you have maybe a gig worth of it.
Even the bloatiest of OSes aren't really all that big -- my OS X came
installed, with all kinds of apps I'll never use, in less than 10 gigs.

And I think this story was a while ago (a dupe? Not surprising), and the
point of the Flash is that as long as your read/write cache doesn't run
out, and you're still in that 1 gig of Flash, you're a bit safer than
the RAM cache, and you can also leave the disk off, as in, spun down.
Parked.
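A minimal Python sketch of the "flash as another level" idea, assuming a hypothetical `ParkedDiskWriteBuffer` that stands in for the drive's flash: writes are absorbed into a non-volatile buffer while the platters stay parked, and the disk is spun up only when the buffer would overflow. The class name, the byte-count capacity and the "flush everything at once" policy are illustrative assumptions, not how any actual drive firmware works.

```python
class ParkedDiskWriteBuffer:
    """Hypothetical model of a flash write buffer in front of a parked disk."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.pending: dict[int, bytes] = {}   # block number -> data held in flash
        self.platters: dict[int, bytes] = {}  # data that has reached the spinning media
        self.spin_ups = 0                     # how often the disk had to wake up

    def _used(self) -> int:
        return sum(len(data) for data in self.pending.values())

    def write(self, block: int, data: bytes) -> None:
        # Rewriting a block that is still buffered replaces it in place.
        old_len = len(self.pending.get(block, b""))
        if self._used() - old_len + len(data) > self.capacity:
            self.flush()  # buffer would overflow: wake the disk first
        self.pending[block] = data

    def flush(self) -> None:
        if not self.pending:
            return
        self.spin_ups += 1
        # Write in block order so the drive can stream the data sequentially.
        for block in sorted(self.pending):
            self.platters[block] = self.pending[block]
        self.pending.clear()
```

Repeated rewrites of the same block cost no extra spin-ups in this model, which is the property that makes lazy writes attractive on a laptop.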

Very useful for a laptop -- I used to do this in Linux by using Reiser4,
setting the disk to spin down, and letting lazy writes do their thing,
but I didn't have enough RAM, and there's always the possibility of
losing data. But leaving the disk off is nice, because in the event of
sudden motion, it's safer that way. Besides, most hardware gets
designed for That Other OS, which doesn't support any kind of Laptop
Mode, so it's nice to be able to enforce this at a hardware level, in a
safe way.

> I've heard of too many fancy-sounding drive technologies that never hit
> the market; I'll wait until they are actually available before I start
> counting on them for anything (let alone design/run a filesystem that
> requires them :-)

Or even remember their names.

> External battery-backed cache is readily available, either on high-end
> RAID controllers or as separate RAM drives (and in RAID array boxes),
> but nothing on individual drives.

Ah. Curses.

UPS, then. If you have enough time, you could even do a Software
Suspend first -- that way, when power comes back on, you boot back up,
and if it's done quickly enough, connections won't even be dropped...

2006-08-01 04:56:30

by David Lang

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

On Mon, 31 Jul 2006, David Masover wrote:

> David Lang wrote:
>> On Mon, 31 Jul 2006, David Masover wrote:
>>
>>> Oh, I'm curious -- do hard drives ever carry enough battery/capacitance to
>>> cover their caches? It doesn't seem like it would be that hard/expensive,
>>> and if it is done that way, then I think it's valid to leave them on. You
>>> could just say that other filesystems aren't taking as much advantage of
>>> newer drive features as Reiser :P
>>
>> There are no drives that have the ability to flush their cache after they
>> lose power.
>
> Aha, so back to the usual argument: UPS! It takes a fraction of a second to
> flush that cache.

which does absolutely no good if someone trips over the power cord, the fuse
blows in the power supply, someone yanks the drive out of the hot-swap bay, etc.

>> Now, that being said, /. had a story within the last couple of days about
>> hard drive manufacturers adding flash to their hard drives. They may be
>> aiming to add some non-volatile cache capability to their drives, although
>> I didn't think that flash writes were that fast (needed if you dump the
>> cache to flash when you lose power), or that easy on power (given that you
>> would first lose power), and flash has limited write cycles (which matters
>> if you always use the cache).
>
> But, the point of flash was not to replace the RAM cache, but to be another
> level. That is, you have your Flash which may be as fast as the disk, maybe
> faster, maybe less, and you have maybe a gig worth of it. Even the bloatiest
> of OSes aren't really all that big -- my OS X came installed, with all kinds
> of apps I'll never use, in less than 10 gigs.
>
> And I think this story was a while ago (a dupe? Not surprising), and the
> point of the Flash is that as long as your read/write cache doesn't run out,
> and you're still in that 1 gig of Flash, you're a bit safer than the RAM
> cache, and you can also leave the disk off, as in, spun down. Parked.

As I understand it, flash reads are fast (RAM speeds), but writes are pretty slow
(comparable to or worse than spinning media).

Writing to a RAM cache but having a flash drive behind it doesn't gain you any
protection, and I don't think you need it for reads.


>> External battery-backed cache is readily available, either on high-end RAID
>> controllers or as separate RAM drives (and in RAID array boxes), but
>> nothing on individual drives.
>
> Ah. Curses.
>
> UPS, then. If you have enough time, you could even do a Software Suspend
> first -- that way, when power comes back on, you boot back up, and if it's
> done quickly enough, connections won't even be dropped...

Remember, it can take 90 W of power to run your CPU, 100+ W to run your video card,
plus everything else. Even a few seconds of power for this is a very significant
amount of energy storage.

However, I did get a pointer recently to a company making super-high-capacity
caps, up to 2600 F (F, not uF!) in a cylinder 138 mm tall and 57 mm in diameter,
but it only handles 2.7 V (they have modules that handle higher voltages available):
http://www.maxwell.com/ultracapacitors/index.html
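To put rough numbers on the load and the capacitor, a short Python sketch: load energy is power times time, and a capacitor holds E = 0.5 * C * V^2; if the electronics stop working below some minimum voltage, only 0.5 * C * (V1^2 - V2^2) is usable. The 10-second window and the half-voltage cutoff are assumptions chosen for illustration, "everything else" is left out of the load, and converter losses (a 2.7 V cell would need boosting to feed 12 V parts) are ignored.

```python
def load_energy_j(load_watts: float, seconds: float) -> float:
    """Energy drawn by a constant load over a time window."""
    return load_watts * seconds


def capacitor_energy_j(c_farads: float, v_start: float, v_end: float = 0.0) -> float:
    """Energy released discharging a capacitor from v_start down to v_end."""
    return 0.5 * c_farads * (v_start ** 2 - v_end ** 2)


# CPU (90 W) + video card (100 W) only; the rest of the system is ignored.
needed = load_energy_j(90 + 100, 10)          # assumed 10 s window
stored = capacitor_energy_j(2600, 2.7)        # full discharge of one 2600 F cell
usable = capacitor_energy_j(2600, 2.7, 1.35)  # assumed cutoff at half voltage

print(f"needed {needed:.0f} J, stored {stored:.0f} J, usable {usable:.0f} J")
```

On these assumptions a single cell stores roughly 9.5 kJ against about 1.9 kJ needed, so the limiting factors are more likely cost, size and the voltage conversion than raw capacity.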

However, I don't see these as being standard equipment in systems or on drives
anytime soon.

David Lang

2006-08-01 05:59:27

by David Masover

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

David Lang wrote:
> On Mon, 31 Jul 2006, David Masover wrote:

>> Aha, so back to the usual argument: UPS! It takes a fraction of a
>> second to flush that cache.
>
> which does absolutely no good if someone trips over the power cord, the
> fuse blows in the power supply, someone yanks the drive out of the
> hot-swap bay, etc.

Power supply fuse... Yeah, it happens. Drives die, too. This seems
fairly uncommon. And dear God, please tell me anyone smart enough to
set up a UPS would also be smart enough to make tripping over the power
cord rare or impossible.

My box has a cable that runs down behind a desk, between the desk and
the wall. Power strip is on the floor, where a UPS will be when I get
around to buying one. If someone kicks any cable, it would be where the
UPS hits the wall -- but that's also behind the same desk.


> As I understand it, flash reads are fast (RAM speeds), but writes are
> pretty slow (comparable to or worse than spinning media).
>
> Writing to a RAM cache but having a flash drive behind it doesn't gain
> you any protection, and I don't think you need it for reads.

It does gain you protection if you're not using the RAM cache, if you're
that paranoid. I don't know if it's cheaper than RAM, but more read
cache is always better. And losing power seems a lot less likely than
crashing, especially on a Windows laptop, so it does make sense as a
product. And a laptop, having a battery, will give you a good bit of
warning before it dies. My Powerbook syncs and goes into Sleep mode
when it runs low on power (~1%/5mins)

>>> External battery-backed cache is readily available, either on
>>> high-end RAID controllers or as separate RAM drives (and in RAID
>>> array boxes), but nothing on individual drives.
>>
>> Ah. Curses.
>>
>> UPS, then. If you have enough time, you could even do a Software
>> Suspend first -- that way, when power comes back on, you boot back up,
>> and if it's done quickly enough, connections won't even be dropped...
>
> Remember, it can take 90 W of power to run your CPU, 100+ W to run your
> video card, plus everything else. Even a few seconds of power for this
> is a very significant amount of energy storage.

Suspend2 can take about 10-20 seconds. It should be possible to work
out the maximum amount of time it can take.

Anyway, according to a quick Google search, my CPU is more like 70W.
Video card isn't required on a server, but you may be right on mine. I
haven't looked at UPSes lately, though. I need about 3 seconds for a
sync, maybe 10 for a suspend, so to be safe I can say for sure I'd be
down in about 30 seconds.

So, another Google search, and while you can get a cheap UPS for
anywhere from $10 to $100, the sweet spot seems to be a little over $200.

$229, and it's 865W, supposedly for 3.7 minutes. Here's a review:

"This is a great product. It powers an AMD 64 3200+ with beefy (6800GT)
graphics card, 21" CRT monitor, secondary 19" CRT, a linux server, a 15"
CRT, Cisco 2800XL switch, Linksys WRTG54GS, cable modem, speakers, and
many other things. The software says I will get 9 minutes runtime with
all of that hooked up, realistically it's about 4 minutes."

This was the lowest time reported. Most of the other reviews say at
least 15 minutes, sometimes 30 minutes, with fairly high-end computers
listed (and monitors, sometimes two computers/monitors), but nowhere
near as much stuff as this guy has.
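A rough way to sanity-check runtime claims like these, sketched in Python under a loud assumption: treat the rated figure (865 W for 3.7 minutes) as a fixed amount of stored energy and scale runtime inversely with load. Real UPS runtime curves are not linear (battery chemistry tends to give more than this at light loads, while inverter overhead gives less), so the manufacturer's runtime chart should win over this estimate; the 300 W load in the example is hypothetical.

```python
def rated_energy_j(rated_watts: float, rated_minutes: float) -> float:
    """Energy implied by a UPS rating of rated_watts for rated_minutes."""
    return rated_watts * rated_minutes * 60


def linear_runtime_minutes(rated_watts: float, rated_minutes: float,
                           load_watts: float) -> float:
    """First-order runtime estimate: the same energy spread over a smaller load."""
    return rated_energy_j(rated_watts, rated_minutes) / load_watts / 60


# The $229 unit quoted above, driving a hypothetical 300 W desktop.
print(f"{linear_runtime_minutes(865, 3.7, 300):.1f} minutes")
```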

I checked most of these for Linux support, and UPSes in general seem
well supported. So yes, the box will shut off automatically. On a
network, it shouldn't be too hard to get one box to shut off all the rest.

It's a lot of money, even at the low end, but when you're already
spending a pile of money on a new computer, keep power in mind. And
really, even 11 minutes would be fine, but 40 minutes of power is quite
a lot compared to less than a minute of time taken to shut down normally
-- not even suspend, but a normal shut down. I'd be tempted to try to
ride it out for the first 20 minutes, see if power comes back up...
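The "ride it out, then shut down" idea can be written as a small decision function. This is a hypothetical Python sketch, not the logic of any particular UPS daemon (tools such as apcupsd and Network UPS Tools have their own configuration for this): keep running on battery until either the ride-out period expires or the remaining runtime gets too close to what a clean shutdown needs. The parameter names and the 60-second default margin are assumptions.

```python
def should_start_shutdown(
    seconds_on_battery: float,
    remaining_runtime_s: float,
    shutdown_duration_s: float,
    ride_out_s: float,
    safety_margin_s: float = 60.0,
) -> bool:
    """Decide whether to begin a clean shutdown while running on UPS power."""
    # Hard limit: never let the battery drop below what a shutdown needs.
    if remaining_runtime_s <= shutdown_duration_s + safety_margin_s:
        return True
    # Soft limit: give mains power a chance to return for ride_out_s seconds.
    return seconds_on_battery >= ride_out_s


# 20-minute ride-out, one-minute shutdown, 40 minutes of battery at the start.
for elapsed in (0, 600, 1200):
    print(elapsed, should_start_shutdown(elapsed, 2400 - elapsed, 60, 1200))
```

Checking the hard limit first means a badly undersized UPS still triggers a shutdown immediately instead of waiting out the full ride-out period.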

> However, I did get a pointer recently to a company making
> super-high-capacity caps, up to 2600 F (F, not uF!) in a cylinder
> 138 mm tall and 57 mm in diameter, but it only handles 2.7 V
> (they have modules that handle higher voltages available):
> http://www.maxwell.com/ultracapacitors/index.html
>
> However, I don't see these as being standard equipment in systems or on
> drives anytime soon.

This seems to be a whole different approach -- more along the lines of
something that goes in the drive, which would be cool...

2006-08-01 23:52:09

by Ian Stirling

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

David Masover wrote:
> David Lang wrote:
>
>> On Mon, 31 Jul 2006, David Masover wrote:
>>
>>> Oh, I'm curious -- do hard drives ever carry enough
>>> battery/capacitance to cover their caches? It doesn't seem like it
>>> would be that hard/expensive, and if it is done that way, then I
>>> think it's valid to leave them on. You could just say that other
>>> filesystems aren't taking as much advantage of newer drive features
>>> as Reiser :P
>>
>>
>> There are no drives that have the ability to flush their cache after
>> they lose power.
>
>
> Aha, so back to the usual argument: UPS! It takes a fraction of a
> second to flush that cache.

You probably don't actually want to flush the cache, but rather to write
to a journal.
16M of cache, split into 32000 writes to single sectors spread over
the disk, could well take several minutes to write. Slapping it onto
a journal would take well under 0.2 seconds.
That's a non-trivial amount of storage, though: 3 J or so, 40 mF at 12 V,
a moderately large/expensive capacitor.
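Ian's figures can be reproduced with a few lines of Python. The 512-byte sector size matches the "32000 writes" figure; the 10 ms per random write and the 100 MB/s sequential journal rate are assumptions, and the journal time is quite sensitive to that last number. The capacitor energy is the usual E = 0.5 * C * V^2.

```python
SECTOR_BYTES = 512
CACHE_BYTES = 16 * 1024 * 1024  # "16M of cache"


def random_flush_seconds(cache_bytes: int, ms_per_write: float) -> float:
    """Every dirty sector lands somewhere different: one seek per sector."""
    return (cache_bytes // SECTOR_BYTES) * ms_per_write / 1000


def journal_flush_seconds(cache_bytes: int, mb_per_s: float) -> float:
    """The whole cache is streamed to one contiguous journal area."""
    return cache_bytes / (mb_per_s * 1_000_000)


def capacitor_energy_j(c_farads: float, volts: float) -> float:
    return 0.5 * c_farads * volts ** 2


print(CACHE_BYTES // SECTOR_BYTES, "single-sector writes")
print(f"random:  {random_flush_seconds(CACHE_BYTES, 10.0):.0f} s")    # assumed 10 ms/write
print(f"journal: {journal_flush_seconds(CACHE_BYTES, 100.0):.3f} s")  # assumed 100 MB/s
print(f"40 mF at 12 V: {capacitor_energy_j(0.040, 12):.2f} J")
```

The gap of three orders of magnitude between the random and journal cases is what makes dumping to a journal, rather than flushing in place, the only realistic option for a capacitor-backed cache.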

And if you've got to spin the drive up, you've just added another
order of magnitude.

You can see why a flash backup of the write cache may be nicer.
You can do it if the disk isn't spinning.
It uses moderately less energy - and at a much lower rate, which
means the power supply can be _much_ cheaper. I'd guess it's the
difference between under $2 and $10.
And if you can use it as a lazy write cache for laptops - things
just got better battery life wise too.

2006-08-02 02:29:08

by Kyle Moffett

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

On Aug 01, 2006, at 19:50:49, Ian Stirling wrote:
> You probably don't actually want to flush the cache, but rather to write
> to a journal. 16M of cache, split into 32000 writes to single
> sectors spread over the disk, could well take several minutes to
> write. Slapping it onto a journal would take well under 0.2 seconds.
> That's a non-trivial amount of storage, though: 3 J or so, 40 mF at
> 12 V, a moderately large/expensive capacitor.

IMHO the best alternative for a situation like that is a storage
controller with a battery-backed cache and a hunk of flash NVRAM for
when the power shuts off (just in case you run out of battery), as
well as a separate 1GB battery-backed PCI ramdisk for an external
journal device (likewise equipped with flash NVRAM). It doesn't take
much power at all to write a gig of stuff to a small flash chip
(think about your digital camera, which runs off a couple of AAs), so
with a fair-sized on-board battery pack you could easily transfer its
data to NVRAM and still have power left to back up data in RAM for 12
hours or so. That way bootup is fast (no reading 1GB of data from
NVRAM) but there's no risk of data loss.
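The policy Kyle describes can be sketched as a small decision function: stay on battery with the data in RAM while there is charge to spare, but hold back enough energy to copy everything to flash before the battery runs out. The function names, the idea of a fixed "dump reserve", and the numbers in the example are hypothetical; real controllers will differ.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"                # mains power present
    HOLD_IN_RAM = "hold_in_ram"      # on battery, keep data in battery-backed RAM
    DUMP_TO_FLASH = "dump_to_flash"  # battery nearly spent, copy RAM to flash NVRAM


def dump_reserve_fraction(cache_bytes: float, flash_bytes_per_s: float,
                          dump_power_w: float, battery_energy_j: float) -> float:
    """Fraction of the battery that must be kept back to finish one flash dump."""
    dump_seconds = cache_bytes / flash_bytes_per_s
    return dump_seconds * dump_power_w / battery_energy_j


def backup_action(mains_ok: bool, battery_fraction: float,
                  reserve_fraction: float) -> Action:
    if mains_ok:
        return Action.NORMAL
    if battery_fraction > reserve_fraction:
        return Action.HOLD_IN_RAM
    return Action.DUMP_TO_FLASH


# Hypothetical card: 1 GB of RAM, 16 MB/s flash writes at 2 W, 10 kJ battery.
reserve = dump_reserve_fraction(1e9, 16e6, 2.0, 10_000.0)
print(f"reserve {reserve:.2%}", backup_action(False, 0.5, reserve))
```

In practice the reserve would need a generous margin on top of this, since battery capacity fades with age and temperature.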

Cheers,
Kyle Moffett

2006-08-02 03:52:09

by David Masover

Subject: Re: Solaris ZFS on Linux [Was: Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Ian Stirling wrote:
> David Masover wrote:
>> David Lang wrote:
>>
>>> On Mon, 31 Jul 2006, David Masover wrote:
>>>
>>>> Oh, I'm curious -- do hard drives ever carry enough
>>>> battery/capacitance to cover their caches? It doesn't seem like it
>>>> would be that hard/expensive, and if it is done that way, then I
>>>> think it's valid to leave them on. You could just say that other
>>>> filesystems aren't taking as much advantage of newer drive features
>>>> as Reiser :P
>>>
>>>
>>> There are no drives that have the ability to flush their cache after
>>> they lose power.
>>
>>
>> Aha, so back to the usual argument: UPS! It takes a fraction of a
>> second to flush that cache.
>
> You probably don't actually want to flush the cache, but rather to write
> to a journal.
> 16M of cache, split into 32000 writes to single sectors spread over
> the disk, could well take several minutes to write. Slapping it onto
> a journal would take well under 0.2 seconds.
> That's a non-trivial amount of storage, though: 3 J or so, 40 mF at 12 V,
> a moderately large/expensive capacitor.

Before we get ahead of ourselves, remember: ~$200 buys you a huge
amount of battery storage. We're talking several minutes for several
boxes, at the very least -- more like 10 minutes.

But yes, a journal or a software suspend.

2006-08-02 14:28:25

by Krzysztof Halasa

Subject: Re: Solaris ZFS on Linux

Kyle Moffett writes:

> IMHO the best alternative for a situation like that is a storage
> controller with a battery-backed cache and a hunk of flash NVRAM for
> when the power shuts off (just in case you run out of battery), as
> well as a separate 1GB battery-backed PCI ramdisk for an external
> journal device (likewise equipped with flash NVRAM). It doesn't take
> much power at all to write a gig of stuff to a small flash chip
> (think about your digital camera, which runs off a couple of AAs), so
> with a fair-sized on-board battery pack you could easily transfer its
> data to NVRAM and still have power left to back up data in RAM for 12
> hours or so. That way bootup is fast (no reading 1GB of data from
> NVRAM) but there's no risk of data loss.

Not sure: reading flash is fast, but writing is quite slow.
A digital camera can consume a set of two or four 2500 mAh AA cells
for a fraction of 1 GB (of course, only part of the power goes
to flash).
--
Krzysztof Halasa

2006-08-02 18:13:31

by Ian Stirling

Subject: Re: Solaris ZFS on Linux

Krzysztof Halasa wrote:
Kyle Moffett writes:
>
>
>>IMHO the best alternative for a situation like that is a storage
>>controller with a battery-backed cache and a hunk of flash NVRAM for
>>when the power shuts off (just in case you run out of battery), as
>>well as a separate 1GB battery-backed PCI ramdisk for an external
>>journal device (likewise equipped with flash NVRAM). It doesn't take


> Not sure: reading flash is fast, but writing is quite slow.
> A digital camera can consume a set of two or four 2500 mAh AA cells
> for a fraction of 1 GB (of course, only part of the power goes
> to flash).

Yeah, that's why I said in the original message that it's not
especially lower in energy: the energy is used at a lower rate,
so it is much cheaper to supply.
According to the datasheet at
http://www.samsung.com/products/semiconductor/NORFlash/256Mbit/K8A5615EBA/K8A5615EBA.htm
programming the 32 Mbyte chip takes about 30 mW for 120 s, or
3.5 J or so.
For a gigabyte, that's roughly 100 J, a fairly substantial amount of energy.
However, it's at a low rate, so it's not _too_ expensive to supply.
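The same estimate in Python, using the datasheet figures quoted in the message (about 30 mW for about 120 s to program a 32 Mbyte chip). Scaling linearly to a binary gigabyte gives a little over 100 J; the linear scaling is an assumption made here for simplicity.

```python
CHIP_BYTES = 256 * 1024 * 1024 // 8  # 256 Mbit = 32 Mbyte
PROGRAM_POWER_W = 0.030              # ~30 mW, from the datasheet figure above
PROGRAM_TIME_S = 120.0               # ~120 s to program the whole chip


def program_energy_j(n_bytes: int) -> float:
    """Energy to program n_bytes, scaling the per-chip figure linearly."""
    per_chip_j = PROGRAM_POWER_W * PROGRAM_TIME_S
    return per_chip_j * n_bytes / CHIP_BYTES


print(f"one chip: {program_energy_j(CHIP_BYTES):.1f} J")
print(f"1 GiB:    {program_energy_j(1024 ** 3):.0f} J")
```

The total energy is larger than the roughly 3 J disk-journal figure earlier in the thread, but that journal write has to happen in well under 0.2 s, which means a supply delivering at least 15 W for a moment, versus tens of milliwatts per chip here.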

2006-08-03 02:20:37

by Wil Reichert

Subject: Re: Solaris ZFS on Linux

On 8/2/06, Krzysztof Halasa wrote:
> Kyle Moffett writes:
>
> > IMHO the best alternative for a situation like that is a storage
> > controller with a battery-backed cache and a hunk of flash NVRAM for
> > when the power shuts off (just in case you run out of battery), as
> > well as a separate 1GB battery-backed PCI ramdisk for an external
> > journal device (likewise equipped with flash NVRAM). It doesn't take
> > much power at all to write a gig of stuff to a small flash chip
> > (Think about your digital camera which runs off a couple AA's), so
> > with a fair-sized on-board battery pack you could easily transfer its
> > data to NVRAM and still have power left to back up data in RAM for 12
> > hours or so. That way bootup is fast (no reading 1GB of data from
> > NVRAM) but there's no risk of data loss.
>
> Not sure: reading flash is fast, but writing is quite slow.
> A digital camera can consume a set of two or four 2500 mAh AA cells
> for a fraction of 1 GB (of course, only part of the power goes
> to flash).

Seeks are fast, throughput is terrible, power is minimal:

http://techreport.com/reviews/2006q3/supertalent-flashide/index.x?pg=1

Wil

2006-08-03 09:35:20

by Helge Hafting

Subject: Re: Solaris ZFS on Linux

On Wed, Aug 02, 2006 at 07:20:25PM -0700, Wil Reichert wrote:
> On 8/2/06, Krzysztof Halasa wrote:
> >Kyle Moffett writes:
> >
> >> IMHO the best alternative for a situation like that is a storage
> >> controller with a battery-backed cache and a hunk of flash NVRAM for
> >> when the power shuts off (just in case you run out of battery), as
> >> well as a separate 1GB battery-backed PCI ramdisk for an external
> >> journal device (likewise equipped with flash NVRAM). It doesn't take
> >> much power at all to write a gig of stuff to a small flash chip
> >> (Think about your digital camera which runs off a couple AA's), so
> >> with a fair-sized on-board battery pack you could easily transfer its
> >> data to NVRAM and still have power left to back up data in RAM for 12
> >> hours or so. That way bootup is fast (no reading 1GB of data from
> >> NVRAM) but there's no risk of data loss.
> >
> >Not sure: reading flash is fast, but writing is quite slow.
> >A digital camera can consume a set of two or four 2500 mAh AA cells
> >for a fraction of 1 GB (of course, only part of the power goes
> >to flash).
>
> Seeks are fast, throughput is terrible, power is minimal:
>
> http://techreport.com/reviews/2006q3/supertalent-flashide/index.x?pg=1
>
That particular flash drive had terrible throughput.

But there are other alternatives. I use a Kingston 4GB
CompactFlash card as a disk, and it reads 22MB/s, according to
specs and tests with hdparm. And it writes 16MB/s.
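What those rates would mean for the cache-dump scenarios discussed earlier in the thread, as a quick Python calculation. Decimal megabytes are assumed for the card's rates, the sizes (a 16 MB drive cache, a 1 GB journal device) are the ones mentioned earlier, and sustained rates on a real card may differ from the headline figures.

```python
def transfer_seconds(n_bytes: float, mb_per_s: float) -> float:
    """Time to move n_bytes at a sustained rate given in decimal MB/s."""
    return n_bytes / (mb_per_s * 1e6)


READ_MB_S, WRITE_MB_S = 22, 16  # the CompactFlash card's read/write rates

print(f"dump 16 MB cache: {transfer_seconds(16e6, WRITE_MB_S):.1f} s")
print(f"dump 1 GB:        {transfer_seconds(1e9, WRITE_MB_S):.1f} s")
print(f"read back 1 GB:   {transfer_seconds(1e9, READ_MB_S):.1f} s")
```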

It's much better than the sorry thing in that test, with about the same
read speed as their worst platter-based hard disk. And of course it still has
the nice seek times of non-rotating media. :-)

Helge Hafting