I have a kind of dumb question. I keep hearing that "USB Flash Memory"
or "Compact Flash Cards" and family have "a limited number of writes"
and will eventually wear out. Recommendations like "DO NOT PUT A SWAP
FILE ON USB MEMORY" have come out of this. In fact, quoting
Documentation/laptop-mode.txt:
* If you're worried about your data, you might want to consider using
a USB memory stick or something like that as a "working area". (Be
aware though that flash memory can only handle a limited number of
writes, and overuse may wear out your memory stick pretty quickly.
Do _not_ use journalling filesystems on flash memory sticks.)
The question I have is, is this really significant? I have heard quoted
that flash memory typically handles something like 3x10^18 writes; and
that compact flash cards, USB drives, SD cards, and family typically
have integrated control chipsets that include wear-leveling algorithms
(built-in flash like in an iPaq does not; hence jffs2). Should we
really care that in about 95 billion years the thing will wear out
(assuming we write its entire capacity once a second)?
I call FUD.
John Richard Moser wrote:
>
> The question I have is, is this really significant? I have heard quoted
> that flash memory typically handles something like 3x10^18 writes;
That's like, uh, 13 orders of magnitudes out...
David Vrabel
David Vrabel wrote:
> John Richard Moser wrote:
>> The question I have is, is this really significant? I have heard quoted
>> that flash memory typically handles something like 3x10^18 writes;
>
> That's like, uh, 13 orders of magnitudes out...
>
Yeah, I did more searching; it looks like that was a massive overstatement.
There was one company that did claim to have developed flash memory
with that endurance (I think it was 3.8x10^18), but it looks like typical
drives are 1.0x10^6 with an on-chip wear-leveling algorithm. Assuming
the drive is like 256 megs with 64k blocks, that's still 129 years at
one write per second.
Bigger drives of course level over a larger area, and lifetime increases
linearly. My 512M drive should last 260 years in that scheme; a 4 gig
iPod Nano would last 2080 years; and a 30GiB flash-based hard disk in a
Samsung laptop, with a single control chip doing the wear leveling over
multiple NAND chips, would last 15600 years.
In theory anyway, and assuming one write to one block per second on
average for the duration. (Obviously the iPod nano sustains far less
than that and will last hundreds of thousands of years in normal usage.)
> David Vrabel
>
On Tuesday 21 March 2006 11:14, David Vrabel wrote:
> John Richard Moser wrote:
> > The question I have is, is this really significant? I have heard quoted
> > that flash memory typically handles something like 3x10^18 writes;
>
> That's like, uh, 13 orders of magnitudes out...
Flash drives are cheap anyway. I saw a 256MB one for $10 at Circuit City. A
PNY one. After rebates... but still, you can't beat that.
>
> David Vrabel
--
--hackmiester
On Tuesday 21 March 2006 08:01, John Richard Moser wrote:
> I have a kind of dumb question. I keep hearing that "USB Flash Memory"
> or "Compact Flash Cards" and family have "a limited number of writes"
> and will eventually wear out. Recommendations like "DO NOT PUT A SWAP
> FILE ON USB MEMORY" have come out of this. In fact, quoting
> Documentation/laptop-mode.txt:
>
> * If you're worried about your data, you might want to consider using
> a USB memory stick or something like that as a "working area". (Be
> aware though that flash memory can only handle a limited number of
> writes, and overuse may wear out your memory stick pretty quickly.
> Do _not_ use journalling filesystems on flash memory sticks.)
>
> The question I have is, is this really significant? I have heard quoted
> that flash memory typically handles something like 3x10^18 writes; and
> that compact flash cards, USB drives, SD cards, and family typically
> have integrated control chipsets that include wear-leveling algorithms
> (built-in flash like in an iPaq does not; hence jffs2). Should we
> really care that in about 95 billion years the thing will wear out
> (assuming we write its entire capacity once a second)?
>
> I call FUD.
Search for a thread on LKML having to do with enabling "sync" on removable
media, especially VFAT media. If you are copying a large file, and the FAT
on the device is being updated with every block, you can literally fry your
device in a matter of minutes, because the FAT is always in the same spot,
thus it is always overwriting the same spot.
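To put rough numbers on that, here is a back-of-the-envelope sketch. The 700 MB file, 32 KB clusters and 10,000-cycle blocks are illustrative assumptions, with every FAT update hitting the flash (as on a sync mount) and no wear leveling at all:

/* Back-of-the-envelope FAT-area wear during one large copy.
 * All figures below are illustrative assumptions, not measurements. */
#include <stdio.h>

int main(void)
{
    const double file_mb    = 700.0;     /* size of the file being copied */
    const double cluster_kb = 32.0;      /* FAT cluster size              */
    const double endurance  = 10000.0;   /* erase cycles per flash block  */

    /* One FAT update per allocated cluster; with a sync mount and no
     * wear leveling they all land on the same few erase blocks. */
    double fat_updates = file_mb * 1024.0 / cluster_kb;

    printf("FAT updates for one copy: %.0f\n", fat_updates);
    printf("Copies until a %.0f-cycle block is exhausted: %.2f\n",
           endurance, endurance / fat_updates);
    return 0;
}

With those assumed numbers the FAT region blows past its rated cycles before a single copy finishes, which is exactly the failure mode being described.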
j----- k-----
--
Joshua Kugler PGP Key: http://pgp.mit.edu/
CDE System Administrator ID 0xDB26D7CE
http://distance.uaf.edu/
John Richard Moser wrote:
> David Vrabel wrote:
>
>>John Richard Moser wrote:
>>
>>>The question I have is, is this really significant? I have heard quoted
>>>that flash memory typically handles something like 3x10^18 writes;
>>
>>That's like, uh, 13 orders of magnitudes out...
>
> Yeah, I did more searching; it looks like that was a massive overstatement.
> There was one company that did claim to have developed flash memory
> with that endurance (I think it was 3.8x10^18), but it looks like typical
> drives are 1.0x10^6 with an on-chip wear-leveling algorithm.
That is still high. Modern flash drives will do 100,000 writes for SLC
(single-level cells) or 10,000 writes for MLC (multi-level cells) [1].
> Assuming
> the drive is like 256 megs with 64k blocks, that's still 129 years at
> one write per second.
This is also assuming _perfect_ wear leveling. There are real-world
drives with crappy (or even buggy) wear levelling. I've seen CF cards
die with much less writing than this.
Even then, with just 10,000 writes, this is already reduced to 1.29
years, assuming 64KB/sec average writing.
If you take into consideration that you can actually write 6 Mbytes/sec
on a modern CF card, you can fry a 256MB card in just 5 days if you
write continuously.
--
Paulo Marques - http://www.grupopie.com
Pointy-Haired Boss: I don't see anything that could stand in our way.
Dilbert: Sanity? Reality? The laws of physics?
[1] check out: http://www.kingston.com/products/DMTechGuide.pdf
Joshua Kugler wrote:
> On Tuesday 21 March 2006 08:01, John Richard Moser wrote:
>> I have a kind of dumb question. I keep hearing that "USB Flash Memory"
>> or "Compact Flash Cards" and family have "a limited number of writes"
>> and will eventually wear out. Recommendations like "DO NOT PUT A SWAP
>> FILE ON USB MEMORY" have come out of this. In fact, quoting
>> Documentation/laptop-mode.txt:
>>
>> * If you're worried about your data, you might want to consider using
>> a USB memory stick or something like that as a "working area". (Be
>> aware though that flash memory can only handle a limited number of
>> writes, and overuse may wear out your memory stick pretty quickly.
>> Do _not_ use journalling filesystems on flash memory sticks.)
>>
>> The question I have is, is this really significant? I have heard quoted
>> that flash memory typically handles something like 3x10^18 writes; and
>> that compact flash cards, USB drives, SD cards, and family typically
>> have integrated control chipsets that include wear-leveling algorithms
>> (built-in flash like in an iPaq does not; hence jffs2). Should we
>> really care that in about 95 billion years the thing will wear out
>> (assuming we write its entire capacity once a second)?
>>
>> I call FUD.
>
> Search for a thread on LKML having to do with enabling "sync" on removable
> media, especially VFAT media. If you are copying a large file, and the FAT
> on the device is being updated with every block, you can literally fry your
> device in a matter of minutes, because the FAT is always in the same spot,
> thus it is always overwriting the same spot.
>
I've run with 'sync'; it makes the removable device operate at a blazing
1.2KB/s transfer rate instead of 13MB/s. I actually tried to `dd
if=/dev/zero of=/media/usbdisk bs=64k` to zero out 150 megs of free
space, but gave up after about half an hour. This was a test to see the
speed of the drive under sync/nosync modes.
I thought these things had wear leveling on the control chips, seriously.
"USB mass storage controller - implements the USB host controller and
provides a seamless linear interface to block-oriented serial flash
devices while hiding the complexities of block-orientation, block
erasure, and wear balancing. The controller contains a small RISC
microprocessor and a small amount of on-chip ROM and RAM. (item 2 in the
diagram)"
^^^ From a diagram of a USB flash drive on Wikipedia.
http://en.wikipedia.org/wiki/USB_flash_drive
These drives seem to be rated for millions of writes:
"In normal use, mid-range flash drives currently on the market will
support several million cycles, although write operations will gradually
slow as the device ages."
Although there have been cases...
"A few cheaper USB flash drives have been found to use unsuitable flash
memory chips labelled as 'ROM USE ONLY' - these are intended for tasks
such as Flash BIOS for Routers rather than for continual rewrite use,
and fail after a very small number of cycles. [6]"
At which point we obviously know we shouldn't be doing what we're doing
with these things in the first place!
With proper wear balancing, a million writes across a drive should last
quite a while. Roughly:
years = ((C / (64 * 1024)) * 10^6) / (W * 60s/min * 60min/hr * 24hr/day * 365.25day/yr)
C = capacity in bytes, W = blocks written per second. This gives you how
many years the drive lasts while writing W 64k blocks per second.
For a 256M flash drive this should be 130 years; a 4GiB iPod nano should
last 2080 years under this abuse (1 write to 1 block per 1 second); and
a 32GiB "flash hard disk" should last like 16600 years. That does, of
course, assume wear-leveling takes the entire storage area into account
instead of localizing to a single small area.
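For the curious, a small stand-alone sketch of that arithmetic (perfect wear leveling assumed; the endurance and write-rate figures are just the ones tossed around in this thread, not vendor data):

/* Rough flash lifetime under ideal wear leveling:
 *   years = (capacity / block_size) * cycles_per_block
 *           / (block_writes_per_sec * seconds_per_year)
 * Endurance and write-rate figures are illustrative, not vendor data. */
#include <stdio.h>

static double lifetime_years(double capacity, double block,
                             double cycles, double block_writes_per_sec)
{
    const double seconds_per_year = 60.0 * 60.0 * 24.0 * 365.25;
    return (capacity / block) * cycles /
           (block_writes_per_sec * seconds_per_year);
}

int main(void)
{
    const double MiB = 1024.0 * 1024.0, block = 64.0 * 1024.0;

    printf("256 MiB, 1e6 cycles, 1 block/s: %.0f years\n",
           lifetime_years(256 * MiB, block, 1e6, 1.0));
    printf("256 MiB, 1e4 cycles, 1 block/s: %.2f years\n",
           lifetime_years(256 * MiB, block, 1e4, 1.0));
    printf("256 MiB, 1e4 cycles, 6 MB/s:    %.4f years\n",
           lifetime_years(256 * MiB, block, 1e4, 6.0e6 / block));
    return 0;
}

This reproduces the ~130 years, ~1.3 years and ~5 days figures quoted earlier in the thread.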
> j----- k-----
>
John Richard Moser wrote:
> I have a kind of dumb question. I keep hearing that "USB Flash Memory"
> or "Compact Flash Cards" and family have "a limited number of writes"
> and will eventually wear out. Recommendations like "DO NOT PUT A SWAP
> FILE ON USB MEMORY" have come out of this. In fact, quoting
> Documentation/laptop-mode.txt:
>
> * If you're worried about your data, you might want to consider using
> a USB memory stick or something like that as a "working area". (Be
> aware though that flash memory can only handle a limited number of
> writes, and overuse may wear out your memory stick pretty quickly.
> Do _not_ use journalling filesystems on flash memory sticks.)
I thought that journaling filesystems happen to overwrite exactly the same
place (where the journal is) many times... Am I mistaken?
So the effect is like what we had with floppies (many years ago), where sector
0 and the other sectors holding the FAT structure were overused and started
giving errors - so the only solution was to throw away the floppy.
Hard disks had the same problem, but they have algorithms to relocate bad
clusters.
So do these "leveling algorithms" refer to the same? Relocating bad cells?
If not, you can see how a journaling system can fry a CF card quickly.
>
> The question I have is, is this really significant? I have heard quoted
> that flash memory typically handles something like 3x10^18 writes; and
> that compact flash cards, USB drives, SD cards, and family typically
> have integrated control chipsets that include wear-leveling algorithms
> (built-in flash like in an iPaq does not; hence jffs2). Should we
> really care that in about 95 billion years the thing will wear out
> (assuming we write its entire capacity once a second)?
>
> I call FUD.
3x10^18 is a bit of an overstatement, IMHO. I don't have a reference handy.
Kalin.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|
> I thought that journaling filesystems happen to overwrite exactly the same
> place (where the journal is) many times... Am I mistaken?
Sort of. The journal is much larger than a single block, and used
circularly, but it does get on average more traffic than the rest of
the disk.
> So the effect is like what we had with floppies (many years ago), where sector
> 0 and the other sectors holding the FAT structure were overused and started
> giving errors - so the only solution was to throw away the floppy.
>
> Hard disks had the same problem, but they have algorithms to relocate bad
> clusters.
Actually, hard drives shouldn't have the same problem. The key difference
is that, on a floppy, the head touches the media, causing wear wherever
it hangs out. The head is very smooth, so it's minor, but it happens.
On a hard drive, the head never touches the media. There is no wear.
The magnetic writing happens across a very small air gap, and nobody's
ever found a wearout mechanism for the magnetizing part of things, so you
should be able to overwrite a single sector every rotation of the drive
(120 times a second) for the lifetime of the drive (years).
> So do these "leveling algorithms" refer to the same? Relocating bad cells?
> If not, you can see how a journaling system can fry a CF card quickly.
They do that, but they're also cleverer. Since flash media have no
seek time, there's no reason that the sector number as specified by the
computer needs to have anything to do with the physical location of the
bits on the flash, and they don't. If you write to the same sector over
and over, a flash memory device will do the write to different parts of
the memory chip every time.
This is called "wear leveling" - trying to use all of the chip an equal
amount. You'll see in high-density NAND flash systems there are actually
528 bytes per "sector". The extra 16 bytes are used to record:
- Whether this sector is still good or not
- The number of times this sector has been written
- The logical sector number of the data here
On startup, the first thing a thumb drive or CompactFlash card does is
read the extra 16 bytes from every sector and build a translation map,
so when a request for a given logical sector comes in, it knows where
to find it. Note that there are more sectors on the ROM than on the
hard drive it emulates, so there's always some spare space.
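A toy model of that startup scan, for illustration only - the spare-area layout, field sizes and sector counts below are guesses, not any particular controller's:

/* Toy model of the startup scan described above: walk every physical
 * sector's spare bytes and build a logical-to-physical map.  The spare
 * layout, sector counts and field sizes are invented for illustration. */
#include <stdint.h>
#include <stdio.h>

#define PHYS_SECTORS 4096u    /* physical 512+16 byte sectors on the chip */
#define LOG_SECTORS  3900u    /* fewer logical sectors -> built-in spares */
#define UNMAPPED     0xFFFFFFFFu

struct spare {                /* stand-in for the "extra 16 bytes" */
    uint8_t  good;            /* sector still usable?                    */
    uint32_t erase_count;     /* how many times it has been cycled       */
    uint32_t logical;         /* logical sector stored here, or UNMAPPED */
};

static struct spare nand_spare[PHYS_SECTORS];   /* simulated spare areas */
static uint32_t log_to_phys[LOG_SECTORS];

static void build_translation_map(void)
{
    uint32_t i;

    for (i = 0; i < LOG_SECTORS; i++)
        log_to_phys[i] = UNMAPPED;
    /* "read the extra bytes from every sector" and record what lives where */
    for (i = 0; i < PHYS_SECTORS; i++)
        if (nand_spare[i].good && nand_spare[i].logical < LOG_SECTORS)
            log_to_phys[nand_spare[i].logical] = i;
}

int main(void)
{
    uint32_t i;

    for (i = 0; i < PHYS_SECTORS; i++) {        /* fake a programmed chip */
        nand_spare[i].good = 1;
        nand_spare[i].logical = (i < LOG_SECTORS) ? i : UNMAPPED;
    }
    build_translation_map();
    printf("logical sector 1234 lives at physical sector %u\n",
           (unsigned)log_to_phys[1234]);
    return 0;
}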
Further, when writing, occasionally a sector that's just sitting there
not bothering anyone is copied, so that the flash it occupied can be
used by faster-changing data.
(This is all complicated by the fact that, while you can *write* one
528-byte sector at a time, you can only write to an erased sector,
and erases operate in bigger chunks, often about 8K. Thus, it's not
physically possible to overwrite a 512-byte sector in place, even if
you wanted to! So the controller is continuously picking an 8K chunk
to re-use, copying any live data to a new chunk, and erasing it so
it's available. For highest speed, you want to pick chunks to recycle
that are mostly dead stale data, but for wear-leveling, you want to pick
chunks that have a low cycle count. So you do a mix.)
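A sketch of what such a "mix" could look like (the 1-in-16 ratio and the per-chunk bookkeeping are invented for illustration, not taken from any real controller):

/* Sketch of the "do a mix" chunk-selection policy described above:
 * usually recycle the chunk with the most stale data (cheap to reclaim),
 * but occasionally recycle the least-worn chunk so long-lived data gets
 * moved and its blocks rejoin the rotation.  Ratio and fields are made up. */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct chunk {
    uint32_t erase_count;    /* wear so far                */
    uint32_t live_sectors;   /* sectors still holding data */
};

static size_t pick_chunk_to_recycle(const struct chunk *c, size_t n,
                                    uint32_t tick)
{
    size_t best = 0, i;

    if (tick % 16 == 0) {                   /* occasional wear-driven pick */
        for (i = 1; i < n; i++)
            if (c[i].erase_count < c[best].erase_count)
                best = i;
    } else {                                /* usual garbage-driven pick */
        for (i = 1; i < n; i++)
            if (c[i].live_sectors < c[best].live_sectors)
                best = i;
    }
    return best;
}

int main(void)
{
    struct chunk chunks[4] = {
        { .erase_count = 900, .live_sectors = 2 },
        { .erase_count = 100, .live_sectors = 15 },
        { .erase_count = 500, .live_sectors = 0 },
        { .erase_count = 950, .live_sectors = 16 },
    };

    printf("garbage-driven pick: chunk %zu\n",
           pick_chunk_to_recycle(chunks, 4, 1));
    printf("wear-driven pick:    chunk %zu\n",
           pick_chunk_to_recycle(chunks, 4, 16));
    return 0;
}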
See http://www.st.com/stonline/products/literature/ds/10058.htm and
the various pages and papers linked to it for more detailed info.
(Not necessarily the best-written or easiest to understand, but it's
straight from a ROM manufacturer.)
As for the average lifetime, typical specs are either 10K, 100K or 1
million writes per sector. Basically, low-, normal- and high-endurance.
Low-endurance is used for program memory, where you might reflash it a
few times in the field, but aren't going to be using it continuously.
100K writes is the standard for data memory.
The denser the memory is, the lower the numbers tend to be. But you
also have a bigger pool to spread the writes across. Some folks use
multi-level cell memories, where instead of writing just 0 or 1, they add
1/3 and 2/3 values. That fits twice as many bits in, but wears out faster
as it takes less degradation of the cell to read back the wrong value.
With perfect wear leveling, a 1 GB flash memory can thus have 100 TB
written to it before bad sectors start becoming a problem. (And if you
allowed more spare sectors to start with, you would have more time.
One reason to integrate this into the file system and not emulate a
fixed-size disk.)
Assuming 10 MB/sec write speed (typical for a USB thumb drive) that
would require 10^7 seconds (115 days) of continuous full-speed writing.
So yes, a thumb drive isn't the best choice for on-line transaction
processing. But with less than 24x7 usage, it'll last many years.
Note that, assuming a decent wear-leveling algorithm (admittedly, a big
IF for some of the cheaper stuff out there!) it doesn't matter which
sectors you write that 100 TB to. It could be all the boot sector,
or sequential overwrites of the whole drive.
[email protected] wrote:
> On startup, the first thing a thumb drive or CompactFlash card does is
> read the extra 16 bytes from every sector and build a translation map,
> so when a request for a given logical sector comes in, it knows where
> to find it. Note that there are more sectors on the ROM than on the
> hard drive it emulates, so there's always some spare space.
Hello,
Some time ago I tried to find any documentation about CF internals, but
failed. It sounds like you might be able to point me to some - could you?
--
Best Regards,
Artem B. Bityutskiy,
St.-Petersburg, Russia.
> Some time ago I tried to find any documentation about CF internals, but
> failed. It sounds like you might be able to point me to some - could you?
Sorry, I don't have anything in particular, just bits I've picked up
talking to CF manufacturers.
Basically, a CF card is a flash ROM array attached to a little
microcontroller with an IDE interface. The large manufacturers generally
have custom controllers.
A basic block diagram is on page 3 (physical page 19) of the
CompactFlash spec, available from
http://www.compactflash.org/cfspc3_0.pdf
[email protected] wrote:
> Sorry, I don't have anything in particular, just bits I've picked up
> talking to CF manufacturers.
>
> Basically, a CF card is a flash ROM array attached to a little
> microcontroller with an IDE interface. The large manufacturers generally
> have custom controllers.
>
I'm actually interested in:
1. CF wear-levelling algorithms: how good or bad are they?
2. How does CF implement block mapping? Does it store the mapping table
on-flash or in memory, does it build it by scanning, and how scalable are
those algorithms?
3. Does CF handle bad erase blocks transparently when new
bad eraseblocks appear?
4. How tolerant is CF to power-offs?
5. Is there a garbage collector in CF, and how clever/stupid is it?
etc.
I've heard CF does not have good characteristics in the above-mentioned
aspects, but still, it would be interesting to know the details. I'm not
going to use CF cards, but as I'm working with flash, I'm just interested.
It'd help me explain to people why it is bad to use CF for more serious
applications than those just storing pictures.
--
Best Regards,
Artem B. Bityutskiy,
St.-Petersburg, Russia.
On Sun, Mar 26, 2006 at 08:36:48PM +0400, Artem B. Bityutskiy wrote:
> I'm actually interested in:
>
> 1. CF wear-levelling algorithms: how good or bad are they?
Depends on the maker.
> 2. How does CF implement block mapping? Does it store the mapping table
> on-flash or in memory, does it build it by scanning, and how scalable are
> those algorithms?
Well, the map has to be stored in flash or other non-volatile memory.
> 3. Does CF handle bad erase blocks transparently when new
> bad eraseblocks appear?
No idea, but it is almost certainly also vendor-specific.
> 4. How tolerant is CF to power-offs?
I have seen some where a power-off in the middle of a write would leave
the card dead (it left it with a partially updated block map). On
others nothing happened (well, you lose the write in progress, of course,
just as a hard disk would).
> 5. Is there a garbage collector in CF, and how clever/stupid is it?
That is vendor-specific. It depends on how they did it. Different
generations from a given company may also differ in behaviour. I
imagine some parts of it are patented by some of the companies involved
in flash card making.
> I've heard CF does not have good characteristics in the above-mentioned
> aspects, but still, it would be interesting to know the details. I'm not
> going to use CF cards, but as I'm working with flash, I'm just interested.
> It'd help me explain to people why it is bad to use CF for more serious
> applications than those just storing pictures.
The wear leveling is not a part of the CF spec, so saying anything about
CF in general just doesn't make much sense. It all depends on the
controller in the CF you are using.
Len Sorensen
On Mon, 27 Mar 2006, Lennart Sorensen wrote:
> On Sun, Mar 26, 2006 at 08:36:48PM +0400, Artem B. Bityutskiy wrote:
> [...]
>> I've heard CF does not have good characteristics in the above-mentioned
>> aspects, but still, it would be interesting to know the details. [...]
>
> The wear leveling is not a part of the CF spec, so saying anything about
> CF in general just doesn't make much sense. It all depends on the
> controller in the CF you are using.
>
> Len Sorensen
CompactFlash(tm) like SanDisk(tm) has very good R/W characteristics.
It consists of a connector that exactly emulates an IDE drive connector
in miniature, an interface controller that emulates and responds to
most IDE commands, plus a method of performing reads and writes using
static RAM buffers and permanent storage in NVRAM.
The algorithms used are proprietary. However, the techniques used
are designed to put all new data into permanent storage before
updating a block location table in static RAM. This table is built
when a reset is sent to the device and dynamically as long as there
is power to the chip. Destruction of this table will not lose any
data, because it is built upon power-up from block relocation
information saved in NVRAM. Note that the actual block size is
usually 64k, not the 512 bytes of a 'sector'. Apparently, some
of the data-space on each block is used for relocation and
logical-to-physical mapping.
Experimental data show that it is not possible to 'destroy' the
chip by interrupting a write as previously reported by others.
In fact, one of the destroyed devices was recovered by writing
all the sectors in the device as in:
`dd if=/dev/zero of=/dev/hdb bs=1M count=122`.
Note that there __is__ a problem that may become a "gotcha" if
you intend to RAW copy devices, one to another, for production.
The reported size (number of sectors) is different between
devices of the same type and manufacturer! Apparently, the size
gets set when the device is tested.
If you intend to make a bootable disk out of one of these, you
need to make a partition that is smaller than the smallest size
you are likely to encounter, on a 63-sector (virtual head) boundary.
Otherwise, at least with FAT file-systems, the device will boot
but file-system software won't be able to find the directory!
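A trivial sketch of that sizing rule (the reported sector counts below are made up; real ones would come from the identify data of your own batch of cards):

/* Sketch of the sizing rule above: given the sector counts reported by a
 * batch of "identical" cards, pick a partition size no larger than the
 * smallest one, rounded down to a 63-sector boundary.  Counts are made up. */
#include <stdio.h>

int main(void)
{
    unsigned cards[] = { 250368, 250880, 249856, 250608 };  /* reported LBA counts */
    unsigned smallest = cards[0];
    unsigned i, usable;

    for (i = 1; i < sizeof(cards) / sizeof(cards[0]); i++)
        if (cards[i] < smallest)
            smallest = cards[i];

    usable = (smallest / 63) * 63;   /* round down to a 63-sector boundary */
    printf("smallest card: %u sectors, safe partition: %u sectors (%.1f MB)\n",
           smallest, usable, usable * 512.0 / 1e6);
    return 0;
}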
I have had very good results with these devices. My embedded
software mounts the root file-system R/O with a ram-disk mounted
on /tmp and /var. However, any parameter changes made by
the customer require a mount change to R/W, then a change
back to R/O. We thought that this was necessary. However,
we have a two year old system that writes hourly data to R/W
logs without any problems whatsoever. Basically, finite life
is a fact, but you are unlikely to encounter it as a problem
with real-world systems.
There are some 'Camera only' CompactFlash devices out there
such as "RITZ BIG PRINT DIGITAL FILM" made by Lexar Media.
The problem with this is that it's not 5-volt tolerant.
I've found that if you plug this into an IDE converter/connector,
it gets very hot and hangs the whole IDE I/O subsystem. It
didn't burn out anything, however, and I was able to use
it subsequently in my camera which uses only about 3 volts.
So, before you actually purchase something for production
stock, make sure that it works in your hardware.
Cheers,
Dick Johnson
On Mon, Mar 27, 2006 at 12:44:50PM -0500, linux-os (Dick Johnson) wrote:
> Experimental data show that it is not possible to 'destroy' the
> chip by interrupting a write as previously reported by others.
> In fact, one of the destroyed devices was recovered by writing
> all the sectors in the device as in:
> `dd if=/dev/zero of=/dev/hdb bs=1M count=122`.
I have a destroyed card here, and I tried doing that. A rep from
SanDisk told me that yes, that model/generation of SanDisk could
get into a state where the device was simply impossible to
access because of corruption during a write. He also said the card
would have to be sent back to the factory to have the table reset.
Newer generations were going to fix that so it didn't happen again.
> Note that there __is__ a problem that may become a "gotcha" if
> you intend to RAW copy devices, one to another, for production.
> The reported size (number of sectors) is different between
> devices of the same type and manufacturer! Apparently, the size
> gets set when the device is tested.
Yeah, I load cards by partitioning, mkfs'ing, and extracting data.
Different manufacturers almost never have the exact same size.
Len Sorensen
[email protected] wrote:
> On a hard drive, the head never touches the media. There is no wear.
> The magnetic writing happens across a very small air gap, and nobody's
> ever found a wearout mechanism for the magnetizing part of things, so you
> should be able to overwrite a single sector every rotation of the drive
> (120 times a second) for the lifetime of the drive (years).
Not for disk. When we were running early MULTICS on the mighty GE-645,
paging was to a "firehose drum" which wrote either mainly or exclusively
into one part of core memory, creating enough heat (according to the
FEs) to cause very low MTBF on that box of memory.
--
Bill Davidsen <[email protected]>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one errors occurs during
wildcard (glob) expansion.
"linux-os \(Dick Johnson\)" <[email protected]> writes:
[...]
> CompactFlash(tm) like SanDisk(tm) has very good R/W characteristics.
Try to write 512-byte sectors in random order, and I'm sure write
characteristics won't be that good.
> It consists of a connector that exactly emulates an IDE drive connector
> in miniature, an interface controller that emulates and responds to
> most IDE commands, plus a method of performing reads and writes using
> static RAM buffers and permanent storage in NVRAM.
Are you sure they do have NVRAM? What kind of NVRAM? Do they have backup
battery inside to keep NVRAM alive?
[...]
> Note that the actual block size is usually 64k, not the 512 bytes of a
> 'sector'. Apparently, some of the data-space on each block is used for
> relocation and logical-to-physical mapping.
Wrong. AFAIK, first disks had FLASH with 512b blocks, then next
generation had 16K blocks, and currently most of cards have 128K
blocks. Besides, each page of a block (64 pages * 2K for 128K block) has
additional "system" area of 64 bytes. One thing that is in the system
area is bad block indicator (2 bytes) to mark some blocks as bad on
factory, and the rest could be used by application[1] the same way the
rest of the page is used. So physical block size is in fact 64 * (2048 +
64) = 135168 bytes.
Due to FLASH properties, it's a must to have ECC protection of the data
on FLASH, and AFAIK 22-bits ECC is stored for every 256 bytes of data,
so part of that extra memory on each page is apparently used for ECC
storage taking about 24 bytes out of those 64. I have no idea how the
rest of extra memory is used though.
BTW, the actual block size could be rather easily found from outside, --
just compare random access write speed against sequential write speed
using different number of 512b sectors as a write unit. Increase number
of sectors in a write unit until you get a jump in random access write
performance, -- that will give you the number of sectors in the block.
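A rough user-space sketch of that probe might look like this (the device path is a placeholder, O_SYNC keeps the page cache out of the way, and of course the test itself burns write cycles on the card):

/* Probe for the jump described above: time random-offset writes of a
 * given unit size against sequential ones, doubling the unit each round.
 * /dev/sdX is a placeholder and the sizes are guesses. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define TEST_AREA (64 * 1024 * 1024)   /* region of the card to exercise */
#define WRITES    64                   /* writes per measurement */

static double time_writes(int fd, size_t unit, int random_offsets)
{
    static char buf[256 * 1024];       /* largest unit we try */
    struct timespec t0, t1;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < WRITES; i++) {
        off_t off = random_offsets
            ? (off_t)(rand() % (TEST_AREA / unit)) * (off_t)unit
            : (off_t)i * (off_t)unit;
        if (pwrite(fd, buf, unit, off) != (ssize_t)unit)
            perror("pwrite");
    }
    fsync(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/sdX";   /* placeholder */
    int fd = open(dev, O_WRONLY | O_SYNC);
    size_t unit;

    if (fd < 0) {
        perror(dev);
        return 1;
    }
    for (unit = 512; unit <= 256 * 1024; unit *= 2)
        printf("%7zu bytes: seq %.2fs  random %.2fs\n", unit,
               time_writes(fd, unit, 0), time_writes(fd, unit, 1));
    close(fd);
    return 0;
}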
[1] By application here I mean the code that works inside the CF card
and deals with the FLASH directly. This memory is invisible from outside
of CF card.
-- Sergei.
On 3/28/06, Sergei Organov <[email protected]> wrote:
> "linux-os \(Dick Johnson\)" <[email protected]> writes:
> > Note that the actual block size is usually 64k, not the 512 bytes of a
> > 'sector'. Apparently, some of the data-space on each block is used for
> > relocation and logical-to-physical mapping.
>
> Wrong. AFAIK, first disks had FLASH with 512b blocks, then next
> generation had 16K blocks, and currently most of cards have 128K
> blocks. Besides, each page of a block (64 pages * 2K for 128K block) has
> additional "system" area of 64 bytes. One thing that is in the system
> area is bad block indicator (2 bytes) to mark some blocks as bad on
> factory, and the rest could be used by application[1] the same way the
> rest of the page is used. So physical block size is in fact 64 * (2048 +
> 64) = 135168 bytes.
Doesn't this depend on if we are talking about NOR or NAND memory? It
looks like you are describing some kind of NAND memory. Also I guess
it varies with manufacturer.
When it comes to CF the internal block size doesn't really matter
because the CF controller will hide it for you. The controller will
perform some kind of mapping between the 512-byte-based IDE interface
and its internal sector size. This, together with wear levelling.
The quality of the wear levelling will probably vary, but I guess even
the most primitive brands have something to cope with the fact that
the blocks containing the FAT are often rewritten on FAT filesystems.
/ magnus
"Magnus Damm" <[email protected]> writes:
> On 3/28/06, Sergei Organov <[email protected]> wrote:
>> "linux-os \(Dick Johnson\)" <[email protected]> writes:
>> > Note that the actual block size is usually 64k, not the 512 bytes of a
>> > 'sector'. Apparently, some of the data-space on each block is used for
>> > relocation and logical-to-physical mapping.
>>
>> Wrong. AFAIK, first disks had FLASH with 512b blocks, then next
>> generation had 16K blocks, and currently most of cards have 128K
>> blocks. Besides, each page of a block (64 pages * 2K for 128K block) has
>> additional "system" area of 64 bytes. One thing that is in the system
>> area is bad block indicator (2 bytes) to mark some blocks as bad on
>> factory, and the rest could be used by application[1] the same way the
>> rest of the page is used. So physical block size is in fact 64 * (2048 +
>> 64) = 135168 bytes.
>
> Doesn't this depend on if we are talking about NOR or NAND memory? It
> looks like you are describing some kind of NAND memory. Also I guess
> it varies with manufacturer.
Yes, I talk about NAND FLASH as I've never seen CF cards based on NOR FLASH,
-- NOR FLASH write times and capacities are just too poor, I think.
> When it comes to CF the internal block size doesn't really matter
> because the CF controller will hide it for you. The controller will
> perform some kind of mapping between the 512-byte-based IDE interface
> and its internal sector size. This, together with wear levelling.
Yes, it will, but it can't entirely hide internal block size from you,
-- just compare write times for random access against write times for
sequential access. Old SanDisk CF cards based on NAND FLASH with 512b
blocks had these times roughly the same, and all recent CF cards that
I've tested have very big difference.
-- Sergei.
On Mon, 27 Mar 2006, Sergei Organov wrote:
> "linux-os \(Dick Johnson\)" <[email protected]> writes:
> [...]
>> CompactFlash(tm) like SanDisk(tm) has very good R/W characteristics.
>
> Try to write 512-byte sectors in random order, and I'm sure write
> characteristics won't be that good.
>
>> It consists of a connector that exactly emulates an IDE drive connector
>> in miniature, an interface controller that emulates and responds to
>> most IDE commands, plus a method of performing reads and writes using
>> static RAM buffers and permanent storage in NVRAM.
>
> Are you sure they do have NVRAM? What kind of NVRAM? Do they have backup
> battery inside to keep NVRAM alive?
>
NVRAM means [N]on-[V]olatile-[RAM]. Any of many types, currently NAND flash.
No battery required.
> [...]
>
>> Note that the actual block size is usually 64k, not the 512 bytes of a
>> 'sector'. Apparently, some of the data-space on each block is used for
>> relocation and logical-to-physical mapping.
>
> Wrong. AFAIK, first disks had FLASH with 512b blocks, then next
> generation had 16K blocks, and currently most of cards have 128K
> blocks. Besides, each page of a block (64 pages * 2K for 128K block) has
> additional "system" area of 64 bytes. One thing that is in the system
> area is bad block indicator (2 bytes) to mark some blocks as bad on
> factory, and the rest could be used by application[1] the same way the
> rest of the page is used. So physical block size is in fact 64 * (2048 +
> 64) = 135168 bytes.
>
> Due to FLASH properties, it's a must to have ECC protection of the data
> on FLASH, and AFAIK 22-bits ECC is stored for every 256 bytes of data,
> so part of that extra memory on each page is apparently used for ECC
> storage taking about 24 bytes out of those 64. I have no idea how the
> rest of extra memory is used though.
>
Huh? There is no ECC anywhere nor is it required. The flash RAM is
the same kind of flash used in re-writable BIOS, etc. It requires
that an entire page be erased (all bits set high) because the
write only writes zeros. The write-procedure is a byte-at-a-time
and results in a perfect copy being written for each byte. This
procedure is hidden in devices that emulate hard-disks. The
immediate read/writes are cached in internal static RAM and
an ASIC manages everything so that the device looks like an
IDE drive.
> BTW, the actual block size could be rather easily found from outside, --
> just compare random access write speed against sequential write speed
> using different number of 512b sectors as a write unit. Increase number
> of sectors in a write unit until you get a jump in random access write
> performance, -- that will give you the number of sectors in the block.
>
Huh? The major time is the erase before the physical write, the entire
physical page needs to be erased. That's why there is static-RAM buffering.
It is quite unlikely that you will find a page size using any such
method.
> [1] By application here I mean the code that works inside the CF card
> and deals with the FLASH directly. This memory is invisible from outside
> of CF card.
>
> -- Sergei.
>
Cheers,
Dick Johnson
"linux-os \(Dick Johnson\)" <[email protected]> writes:
> On Mon, 27 Mar 2006, Sergei Organov wrote:
>
>> "linux-os \(Dick Johnson\)" <[email protected]> writes:
>> [...]
>>> CompactFlash(tm) like SanDisk(tm) has very good R/W characteristics.
>>
>> Try to write 512-byte sectors in random order, and I'm sure write
>> characteristics won't be that good.
>>
>>> It consists of a connector that exactly emulates an IDE drive connector
>>> in miniature, an interface controller that emulates and responds to
>>> most IDE commands, plus a method of performing reads and writes using
>>> static RAM buffers and permanent storage in NVRAM.
>>
>> Are you sure they do have NVRAM? What kind of NVRAM? Do they have backup
>> battery inside to keep NVRAM alive?
>>
>
> NVRAM means [N]on-[V]olatile-[RAM]. Any of many types, currently NAND flash.
> No battery required.
But NAND FLASH, while it is NV(Non-Volatile) *is not* RAM (Random Access
Memory), sorry. So it seems there is no NVRAM inside CFs, right?
-- Sergei.
"linux-os \(Dick Johnson\)" <[email protected]> writes:
> On Mon, 27 Mar 2006, Sergei Organov wrote:
[...]
> Huh? There is no ECC anywhere nor is it required.
Really?! Read *any* specification of NAND FLASH then.
> The flash RAM is the same kind of flash used in re-writable BIOS, etc.
No, it is not. NAND FLASH is used in CFs while NOR FLASH is used in
BIOSes.
> It requires that an entire page be erased (all bits set high) because
> the write only writes zeros. The write-procedure is a byte-at-a-time
> and results in a perfect copy being written for each byte. This
> procedure is hidden in devices that emulate hard-disks. The immediate
> read/writes are cached in internal static RAM and an ASIC manages
> everything so that the device looks like an IDE drive.
What FLASH technology do you think CF cards are based on? It seems you
think it's NOR FLASH, right? I believe you are wrong.
>> BTW, the actual block size could be rather easily found from outside, --
>> just compare random access write speed against sequential write speed
>> using different number of 512b sectors as a write unit. Increase number
>> of sectors in a write unit until you get a jump in random access write
>> performance, -- that will give you the number of sectors in the block.
>>
>
> Huh? The major time is the erase before the physical write, the entire
> physical page needs to be erased. That's why there is static-RAM buffering.
> It is quite unlikely that you will find a page size using any such
> method.
Once again, you seem to assume NOR FLASH, and AFAIK that's not the
case. For a NAND FLASH block (128KB), erase time is on the order of 2ms, and
write time is about 20ms.
-- Sergei.
>> Note that the actual block size is usually 64k, not the 512 bytes of a
>> 'sector'. Apparently, some of the data-space on each block is used for
>> relocation and logical-to-physical mapping.
> Wrong. AFAIK, first disks had FLASH with 512b blocks, then next
> generation had 16K blocks, and currently most of cards have 128K
> blocks. Besides, each page of a block (64 pages * 2K for 128K block) has
> additional "system" area of 64 bytes. One thing that is in the system
> area is bad block indicator (2 bytes) to mark some blocks as bad on
> factory, and the rest could be used by application[1] the same way the
> rest of the page is used. So physical block size is in fact 64 * (2048 +
> 64) = 135168 bytes.
Er, I think you know what you're talking about, but some people reading
this might be confused by the Flash-ROM-specific meaning of the word
"block" here.
In NAND Flash terminology, a PAGE is the unit of write. This was
originally 256+8 bytes, which quickly got bumped to 512+16 bytes.
This is called a "small page" device.
"large page" devices have 2048+64 byte pages.
E.g. the 2 Gbyte device at
http://www.samsung.com/Products/Semiconductor/NANDFlash/SLC_LargeBlock/16Gbit/K9WAG08U1M/K9WAG08U1M.htm
Now, in a flash device, "writing" is changing selected bits from 1 to 0.
"Erasing" is changing a large chunk of bits to 1.
In some NOR devices, you can perform an almost unlimited number of writes,
limited only by the fact that each one has to change at least one 1 bit
to a 0 or there's no point.
Due to the multiplexing scheme used in high-density NAND flash devices,
even the non-programmed cells are exposed to a fraction of the programming
voltage and there are very low limits on the number of write cycles to
a page before it has to be erased again. Exceeding that can cause some
unwanted bits to change from 1 to 0. Typically, however, it is enough
to write each 512-byte portion of a page independently.
Now, erasing is done in larger units called BLOCKs. This is more
variable, but a power of two multiple of the page size. 32 to 64 pages
(16 k for small page/32-page blocks to 128K for large page with 64-page
blocks) is a typical quantity. You can only erase a block at a time.
So you really only need to keep track of wear leveling at the erase
block level.
But any file system (or disk emulation layer) works by keeping
a free block at all times: picking a block to re-use, copying the
still-needed data to the free block, and then erasing the block just
emptied. This leaves you with one completely free block, and hopefully
a bit extra, as the chosen block was not completely in use.
How you choose the blocks to re-use is the heart of wear leveling.
For best efficiency, you want to re-use blocks that have the least
still-needed (non-garbage) contents, but sometimes you have to copy a
completely full block that's been sitting there not bothering anyone,
just because it has been sitting there and it's less worn out than the
"busy" blocks.
In fact, such data should be copied to the most-worn blocks, since
it will "give them a rest", while the least-worn blocks available
should be used for new data. Since "most data dies young", it's
likely that the new data block will have to be copied for garbage
collection purposes. It can get very complex.
Oh, and just for the curious, let me explain the terms NAND and NOR flash.
The classic N-channel MOS transistor has three terminals: a source,
a drain, and a gate. When the gate is raised to a positive voltage,
it attracts electrons to the opposite side of the oxide insulator, and
forms a conductive channel connecting the source and drain. Since they're
negatively charged electrons, it's an N channel.
A positive voltage relative to what? Relative to the source and the drain,
of course, since that's where the electrons have to be attracted from!
In many discrete transistors, there's a difference between the source
and the drain, but the standard MOS IC process makes them completely
symmetrical.
So an N-channel MOS transistor is good for connecting things to low
voltages, but as the voltage gets closer to the gate voltage, the
resistance increases until it cuts off. (Complementary MOS,
or CMOS, technology combines this with a P-channel transistor
with exactly the opposite properties to build devices that pass
high and low voltages well.)
But still, you can build a good inverter out of an N-channel transistor
by pulling the output up through a resistor or current source, and then,
if the input to the transistor's gate goes high, the transistor (with
its other end connected to ground) pulls the output low.
Now, if you connect multiple transistors to the same output line
in parallel, any one of them turning on can pull it low. This is a
NOR gate. NOR memory is built of rows and columns, and each
column is a thousand-input NOR gate. Each input is a row, and
then you select the column to find the bit you're after.
There are two ways to turn this into memory.
In dynamic RAM, each transistor is connected not to ground, but
to a capacitor. When you turn it on, the capacitor might or
might not pull the output low.
In programmable "read-only" memory, each transistor is connected
to ground, but is made with a special "floating gate" that holds
a small charge. This charge has the effect of altering the
threshold voltage for the transistor. If the bit is programmed,
the floating gate has a positive charge on it, and it "helps"
the main gate, so the voltage on the main gate needed to pull the
output low is reduced. If the bit is erased, there's no charge,
and the voltage needed on the main gate to pull the output low is
higher.
To read a bit from the ROM, you feed a carefully chosen intermediate
voltage down the row line, and the column line is pulled low (or not)
depending on the programmed value.
Now, when you have an itty-bitty transistor trying to pull down a
great big long (high-capacitance) column line, the actual effect it has
is quite small. It'll get there eventually, but like pushing a car,
it takes a while to get started. There's a special circuit known as a
"sense amplifier" that senses the very small effect it has and amplifies
it to get the data out in a reasonable number of nanoseconds.
Okay, so that's NOR memory. Note that you have to connect three
wires to each transistor: a row line, a column line, and ground.
That takes a certain amount of space.
Now, go back to our original NMOS inverter circuit, with a pull-up
resistor and a pull-down transistor. Suppose you added a second
transistor, not in parallel with the first, but in series.
Then you'd have to turn on (supply a high gate voltage to) BOTH
transistors to pull the output low. This is a NAND gate.
You can put a long chain of transistors together, and they all have to
be on to pull the output low. You can't quite put thousands together,
because a transistor isn't as good a conductor as a wire, but you can
hook 16 together easily enough.
Now, take those 16, add a 17th transistor to hook the group up to a
column line, and NOR together a pile of those groups.
You can read any transistor in a group by turning the other 15 (and
the 17th) on all the way, and giving the selected transistor the
halfway voltage to see how it's programmed.
Each group needs a single column line and a single ground, and each
transistor needs a row line to drive its gate, but the sources and drains
are simply connected to adjacent transistors.
This reduction of the wires per transistor from 3 to 1 + 3/16, and the
fact that the 1 is the gate, which is already a metal wire in a MOS
(metal-oxide-semiconductor) transistor allows a REALLY DENSE packing
of the transistors, which is how NAND flash can fit more storage
cells in than NOR flash.
BUT... even though you have to program the groups all together you
can't read them all at once. Reading a group is a 16-step operation.
You have to assign them adjacent addresses or programming would make
no sense, but you can't read them all at the same time.
Thus, NAND flash is slower to read, but denser than NOR flash.
The fact that manufacturing defects are allowed in NAND flash
allows further density increases.
> Due to FLASH properties, it's a must to have ECC protection of the data
> on FLASH, and AFAIK 22-bits ECC is stored for every 256 bytes of data,
> so part of that extra memory on each page is apparently used for ECC
> storage taking about 24 bytes out of those 64. I have no idea how the
> rest of extra memory is used though.
The "classic" ECC design is a simple Hamming code over 256 bytes =
2^11 bits. Assign each bit an 11-bit address, and for each of those
11 bits, compute two parity bits - one over the 1024 data bits that have
that address bit zero, and one over the 1024 that have that address bit 1.
If you have a single-bit error, one of each pair of parity bits will
be wrong, with the combination giving the location of the erroneous bit.
A double-bit error will leave some parity bit pairs both right or both
wrong.
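A toy version of that scheme, for illustration (the parities are kept as two unpacked 16-bit words rather than the 3-byte spare-area layout real NAND ECC uses):

/* Toy version of the classic 256-byte NAND ECC described above: for each
 * of the 11 bit-address lines, keep one parity bit over the data bits
 * whose address has that line clear and one over those that have it set. */
#include <stdint.h>
#include <stdio.h>

struct ecc256 {
    uint16_t parity_lo;   /* bit k: parity of data bits with address bit k = 0 */
    uint16_t parity_hi;   /* bit k: parity of data bits with address bit k = 1 */
};

static struct ecc256 ecc_compute(const uint8_t data[256])
{
    struct ecc256 e = { 0, 0 };
    unsigned addr, k;

    for (addr = 0; addr < 2048; addr++) {
        if (!((data[addr >> 3] >> (addr & 7)) & 1))
            continue;                    /* only 1-bits contribute */
        for (k = 0; k < 11; k++) {
            if (addr & (1u << k))
                e.parity_hi ^= (uint16_t)(1u << k);
            else
                e.parity_lo ^= (uint16_t)(1u << k);
        }
    }
    return e;
}

/* Compare stored vs. freshly computed ECC: -1 = no visible error,
 * >= 0 = bit address of a single-bit error, -2 = uncorrectable. */
static int ecc_locate_error(struct ecc256 stored, struct ecc256 fresh)
{
    uint16_t dlo = stored.parity_lo ^ fresh.parity_lo;
    uint16_t dhi = stored.parity_hi ^ fresh.parity_hi;

    if (!dlo && !dhi)
        return -1;
    if ((uint16_t)(dlo ^ dhi) == 0x7FF)  /* exactly one of each pair flipped */
        return dhi;                      /* the "hi" half spells the address */
    return -2;
}

int main(void)
{
    uint8_t page[256] = { 0 };
    struct ecc256 good, bad;

    page[10] = 0xA5;
    good = ecc_compute(page);
    page[100] ^= 1 << 3;                 /* flip bit address 100*8+3 = 803 */
    bad = ecc_compute(page);
    printf("flipped bit located at %d (expected 803)\n",
           ecc_locate_error(good, bad));
    return 0;
}

A single flipped bit disturbs exactly one parity bit of each pair, so the two difference words are bit-complements of each other and the "hi" word spells out the bad bit's address, as described above.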
But more recent designs use 4-byte-correcting Reed-Solomon codes.
For an example, see the ST72681 data sheet at
http://www.st.com/stonline/books/ascii/docs/11352.htm
This computes 10 ECC bytes per 512 bytes of data and can correct up to 4
errors, or correct 3 and detect up to 5, or any other combination where
detect >= correct and correct + detect adds up to 8.
[email protected] writes:
>>> Note that the actual block size is usually 64k, not the 512 bytes of a
>>> 'sector'. Apparently, some of the data-space on each block is used for
>>> relocation and logical-to-physical mapping.
>
>> Wrong. AFAIK, first disks had FLASH with 512b blocks, then next
>> generation had 16K blocks, and currently most of cards have 128K
>> blocks. Besides, each page of a block (64 pages * 2K for 128K block) has
>> additional "system" area of 64 bytes. One thing that is in the system
>> area is bad block indicator (2 bytes) to mark some blocks as bad on
>> factory, and the rest could be used by application[1] the same way the
>> rest of the page is used. So physical block size is in fact 64 * (2048 +
>> 64) = 135168 bytes.
>
> Er, I think you know what you're talking about, but some people reading
> this might be confused by the Flash-ROM-specific meaning of the word
> "block" here.
Yes, it's NAND-FLASH specific indeed, and block here means the unit of
erasure, while page is a unit of write, as you carefully describe
below.
> In NAND Flash terminology, a PAGE is the unit of write. This was
> originally 256+8 bytes,
Yes, with 2 pages per block, AFAIK, thus the block was 512 bytes and was
equal to the sector size that is used in the interface of CF card. I
indeed used a few of them and while their average write time was much
worse than those with current technology, it was *much* more predictable
due to the block size not exceeding interface sector size.
> which quickly got bumped to 512+16 bytes.
Yes, with 32 pages per block, thus the block became 16K.
> This is called a "small page" device. "large page" devices have
> 2048+64 byte pages.
Yes, with 64 pages per block (typically?), thus the block became 128K.
BTW, it's a pity the information about physical block size is not
accessible anywhere in CF interface.
> E.g. the 2 Gbyte device
> at
> http://www.samsung.com/Products/Semiconductor/NANDFlash/SLC_LargeBlock/16Gbit/K9WAG08U1M/K9WAG08U1M.htm
>
> Now, in a flash device, "writing" is changing selected bits from 1 to 0.
> "Erasing" is changing a large chunk of bits to 1.
>
> In some NOR devices, you can perform an almost unlimited number of writes,
> limited only by the fact that each one has to change at least one 1 bit
> to a 0 or there's no point.
>
> Due to the multiplexing scheme used in high-density NAND flash devices,
> even the non-programmed cells are exposed to a fraction of the programming
> voltage and there are very low limits on the number of write cycles to
> a page before it has to be erased again. Exceeding that can cause some
> unwanted bits to change from 1 to 0. Typically, however, it is enough
> to write each 512-byte portion of a page independently.
Well, I'm not sure. The Toshiba and Samsung NANDs I've read manuals for
seem to limit number of writes to a single page before block erase, --
is 512-byte portion some implementation detail I'm not aware of?
> Now, erasing is done in larger units called BLOCKs. This is more
> variable, but a power of two multiple of the page size. 32 to 64 pages
> (16 k for small page/32-page blocks to 128K for large page with 64-page
> blocks) is a typical quantity. You can only erase a block at a time.
Typical? Are you aware of a "large page" NAND FLASH with different
number of pages per block? It's not just curiosity, it's indeed
important for me to know if there are CF cards in the market with
physical block size != 128K.
[... skip interesting and nicely put details of NAND technology ...]
-- Sergei.
>> Due to the multiplexing scheme used in high-density NAND flash devices,
>> even the non-programmed cells are exposed to a fraction of the programming
>> voltage and there are very low limits on the number of write cycles to
>> a page before it has to be erased again. Exceeding that can cause some
>> unwanted bits to change from 1 to 0. Typically, however, it is enough
>> to write each 512-byte portion of a page independently.
> Well, I'm not sure. The Toshiba and Samsung NANDs I've read manuals for
> seem to limit number of writes to a single page before block erase, --
> is 512-byte portion some implementation detail I'm not aware of?
No. I just meant that I generally see "you may program each 2K page a
maximum of 4 times before performing an erase cycle", and I assume the
spec came from 2048/512 = 4, so you can program each 512-byte sector
separately. I would assume if the page size were changed again, they'd
try to keep that property.
E.g. from the Samsung K9K8G08U1A/K9F4G08U0A data sheet (p. 34):
PAGE PROGRAM
The device is programmed basically on a page basis, but it does
allow multiple partial page programming of a word or consecutive
bytes up to 2,112, in a single page program cycle. The number
of consecutive partial page programming operations within the
same page without an intervening erase operation must not exceed
4 times for a single page. The addressing should be done in
sequential order in a block.
[...]
The internal write verify detects only errors for "1"s that are
not successfully programmed to "0"s."
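So splitting a large page into four 512+16-byte chunks stays within
that limit. A rough, untested sketch of how a driver might do it
(nand_program_page() is a made-up placeholder for the real page-program
command sequence):

#include <stdint.h>
#include <string.h>

#define DATA_PER_PAGE   2048u
#define SPARE_PER_PAGE    64u
#define PAGE_BYTES      (DATA_PER_PAGE + SPARE_PER_PAGE)   /* 2,112 */
#define SECTOR_DATA      512u
#define SECTOR_SPARE      16u   /* 64 / 4 */

/* Hypothetical primitive: one program cycle over the whole 2,112-byte
 * page.  Programming only clears bits (1 -> 0); 0xFF bytes leave the
 * corresponding cells untouched. */
int nand_program_page(unsigned block, unsigned page,
                      const uint8_t buf[PAGE_BYTES]);

/* Write one 512-byte "sector" plus its 16-byte slice of the spare area
 * as a single partial program.  Doing this once for each of sectors
 * 0..3 uses exactly the 4 partial programs per page that the data
 * sheet allows before the block must be erased again. */
int write_sector(unsigned block, unsigned page, unsigned sector,
                 const uint8_t data[SECTOR_DATA],
                 const uint8_t spare[SECTOR_SPARE])
{
    uint8_t buf[PAGE_BYTES];

    memset(buf, 0xFF, sizeof(buf));
    memcpy(buf + sector * SECTOR_DATA, data, SECTOR_DATA);
    memcpy(buf + DATA_PER_PAGE + sector * SECTOR_SPARE, spare, SECTOR_SPARE);

    return nand_program_page(block, page, buf);
}

Calling write_sector() for sectors 0-3 of a freshly erased page uses
exactly the four allowed program cycles; anything beyond that needs a
block erase first.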
>> Now, erasing is done in larger units called BLOCKs. This is more
>> variable, but a power of two multiple of the page size. 32 to 64 pages
>> (16 k for small page/32-page blocks to 128K for large page with 64-page
>> blocks) is a typical quantity. You can only erase a block at a time.
> Typical? Are you aware of a "large page" NAND FLASH with different
> number of pages per block? It's not just curiosity, it's indeed
> important for me to know if there are CF cards in the market with
> physical block size != 128K.
No, I'm not aware of any violations of that rule; I just hadn't looked
hard enough to verify that it was a rule, but I had seen the device ID
bits that allow a wide range to be specified.
The "more variable" statement is really based on NOR flash experience,
where it truly does vary all over the map.
> [... skip interesting and nicely put details of NAND technology ...]
Hopefully this makes descriptions like the start of the Samsung data
sheet more comprehensible:
Product Information
The K9F4G08U0A is a 4,224 Mbit (4,429,185,024 bit) memory
organized as 262,144 rows (pages) by 2,112x8 columns. Spare
64x8 columns are located from column address of 2,048-2,111.
A 2,112-byte data register is connected to memory cell arrays
accommodating data transfer between the I/O buffers and memory
during page read and page program operations. The memory array
is made up of 32 cells that are serially connected to form a
NAND structure. Each of the 32 cells resides in a different page.
A block consists of two NAND structured strings. A NAND structure
consists of 32 cells. Total 1,081,344 NAND cells reside in
a block. The program and read operations are executed on a page
basis, while the erase operation is executed on a block basis.
The memory array consists of 4,096 separately erasable 128K-byte
blocks. It indicates that the bit-by-bit erase operation is
prohibited on the K9F4G08U0A.
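A quick cross-check of those numbers, counting each SLC cell as one bit
(C11 static assertions, so this is compile-time arithmetic only):

#include <assert.h>

/* Geometry of the K9F4G08U0A as quoted above. */
#define PAGES           262144u   /* rows                  */
#define PAGE_BYTES        2112u   /* 2,048 data + 64 spare */
#define PAGES_PER_BLOCK     64u

static_assert((unsigned long long)PAGES * PAGE_BYTES * 8 == 4429185024ull,
              "4,224 Mbit = 4,429,185,024 bits in total");
static_assert(PAGES / PAGES_PER_BLOCK == 4096u,
              "4,096 separately erasable blocks");
static_assert(PAGE_BYTES * 8u * PAGES_PER_BLOCK == 1081344u,
              "1,081,344 cells (bits) per block");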
[email protected] writes:
>>> Due to the multiplexing scheme used in high-density NAND flash devices,
>>> even the non-programmed cells are exposed to a fraction of the programming
>>> voltage and there are very low limits on the number of write cycles to
>>> a page before it has to be erased again. Exceeding that can cause some
>>> unwanted bits to change from 1 to 0. Typically, however, it is enough
>>> to write each 512-byte portion of a page independently.
>
>> Well, I'm not sure. The Toshiba and Samsung NANDs I've read manuals for
>> seem to limit number of writes to a single page before block erase, --
>> is 512-byte portion some implementation detail I'm not aware of?
>
> No. I just meant that I generally see "you may program each 2K page a
> maximum of 4 times before performing an erase cycle", and I assume the
> spec came from 2048/512 = 4, so you can program each 512-byte sector
> separately.
I've a file system implementation that writes up to 3 times to the first
3 bytes of the first page of a block (clearing more and more bits every
time), and it seems to work in practice, so maybe this number (4) came
from another source? Alternatively, it works by accident and then I need
to reconsider the design.
-- Sergei.
>>>> Due to the multiplexing scheme used in high-density NAND flash devices,
>>>> even the non-programmed cells are exposed to a fraction of the programming
>>>> voltage and there are very low limits on the number of write cycles to
>>>> a page before it has to be erased again. Exceeding that can cause some
>>>> unwanted bits to change from 1 to 0. Typically, however, it is enough
>>>> to write each 512-byte portion of a page independently.
>>>
>>> Well, I'm not sure. The Toshiba and Samsung NANDs I've read manuals for
>>> seem to limit number of writes to a single page before block erase, --
>>> is 512-byte portion some implementation detail I'm not aware of?
>>
>> No. I just meant that I generally see "you may program each 2K page a
>> maximum of 4 times before performing an erase cycle", and I assume the
>> spec came from 2048/512 = 4, so you can program each 512-byte sector
>> separately.
>
> I've a file system implementation that writes up to 3 times to the first
> 3 bytes of the first page of a block (clearing more and more bits every
> time), and it seems to work in practice, so maybe this number (4) came
> from another source? Alternatively, it works by accident and then I need
> to reconsider the design.
No, I'm sorry, I was still unclear. The spec is 4 writes per page.
I believe that the REASON for this spec was so that people could write
512+16 bytes at a time just like they did with small-block devices and
it would work.
But I do not believe there is any limitation on the pattern you may use,
so your system should work fine.
What confuses me is that I thought I said (quoted above; paraphrasing
here) "there is a very low limit on the number of times you may write
to a page. That limit is large enough that you can do pagesize/512 =
2048/512 = 4 separate 512-byte writes." I didn't intend to imply that
that was the ONLY legal pattern.
But from your comments, I'm getting the impression that you think I did
say that was the only legal pattern. If that impression is correct,
I'm not sure how you read that into my statements.
(I wonder if the actual limit is the number of writes per BLOCK, and
they just expressed it as writes per page. I don't know enough about
the programming circuitry to know what's exposed to what voltages.
If the physics implied it, it would be useful flexibility for file system
design.)
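To illustrate why a progressive-clearing scheme like yours is fine in
principle: programming behaves like a bitwise AND with what's already
in the page, so any sequence of values in which each step only clears
additional bits is just another partial program. A made-up example of
such a marker byte -- not your actual encoding, obviously:

#include <stdint.h>

/* Made-up example: a one-byte block-state marker that is advanced by
 * clearing one more bit at each step.  Each transition only turns 1s
 * into 0s, so each update is an ordinary partial program of the same
 * page - no erase needed until the block is recycled. */
enum block_state {
    STATE_ERASED    = 0xFF,   /* fresh from erase       */
    STATE_ALLOCATED = 0xFE,   /* 1st write: clear bit 0 */
    STATE_WRITTEN   = 0xFC,   /* 2nd write: clear bit 1 */
    STATE_OBSOLETE  = 0xF8,   /* 3rd write: clear bit 2 */
};

/* What the cells end up holding after programming 'value' on top of
 * 'current': NAND programming can only clear bits, i.e. it behaves
 * like a bitwise AND. */
static inline uint8_t after_program(uint8_t current, uint8_t value)
{
    return current & value;
}

Each transition is one partial program, so a marker like this uses at
most three of the four allowed program cycles per page.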
[email protected] writes:
>>>>> Due to the multiplexing scheme used in high-density NAND flash devices,
>>>>> even the non-programmed cells are exposed to a fraction of the programming
>>>>> voltage and there are very low limits on the number of write cycles to
>>>>> a page before it has to be erased again. Exceeding that can cause some
>>>>> unwanted bits to change from 1 to 0. Typically, however, it is enough
>>>>> to write each 512-byte portion of a page independently.
>>>>
>>>> Well, I'm not sure. The Toshiba and Samsung NANDs I've read manuals for
>>>> seem to limit number of writes to a single page before block erase, --
>>>> is 512-byte portion some implementation detail I'm not aware of?
>>>
>>> No. I just meant that I generally see "you may program each 2K page a
>>> maximum of 4 times before performing an erase cycle", and I assume the
>>> spec came from 2048/512 = 4, so you can program each 512-byte sector
>>> separately.
>>
>> I've a file system implementation that writes up to 3 times to the first
>> 3 bytes of the first page of a block (clearing more and more bits every
>> time), and it seems to work in practice, so maybe this number (4) came
>> from another source? Alternatively, it works by accident and then I need
>> to reconsider the design.
>
> No, I'm sorry, I was still unclear. The spec is 4 writes per page.
> I believe that the REASON for this spec was so that people could write
> 512+16 bytes at a time just like they did with small-block devices and
> it would work.
Ah, now (due to your explanation below) I see! But then how do you
explain that those old "small page" devices do support multiple writes
to a single page?
> But I do not believe there is any limitation on the pattern you may use,
> so your system should work fine.
>
> What confuses me is that I thought I said (quoted above; paraphrasing
> here) "there is a very low limit on the number of times you may write
> to a page. That limit is large enough that you can do pagesize/512 =
> 2048/512 = 4 separate 512-byte writes." I didn't intend to imply that
> that was the ONLY legal pattern.
>
> But from your comments, I'm getting the impression that you think I did
> say that was the only legal pattern. If that impression is correct,
> I'm not sure how you read that into my statements.
Well, after your last explanation I'm not sure myself how I've read that
into your statements ;)
> (I wonder if the actual limit is the number of writes per BLOCK, and
> they just expressed it as writes per page. I don't know enough about
> the programming circuitry to know what's exposed to what voltages.
> If the physics implied it, it would be useful flexibility for file system
> design.)
I doubt the block is relevant here, otherwise there would be no reason
to introduce "page" as the write unit in the first place. It seems that
the write voltage is applied to the entire page, and a bit that is 1 in
the data still leaks some charge (less than 1/4 of what a 0 bit does).
I think the block is mentioned in this context only because it's
impossible to erase a single page.
-- Sergei.