2007-08-03 05:05:07

by Manu Abraham

[permalink] [raw]
Subject: Re: Distributed storage.

On 7/31/07, Evgeniy Polyakov <[email protected]> wrote:

> TODO list currently includes following main items:
> * redundancy algorithm (drop me a request of your own, but it is highly
> unlikley that Reed-Solomon based will ever be used - it is too slow
> for distributed RAID, I consider WEAVER codes)


LDPC codes[1][2] have been replacing Turbo code[3] with regards to
communication links and we have been seeing that transition. (maybe
helpful, came to mind seeing the mention of Turbo code) Don't know how
weaver compares to LDPC, though found some comparisons [4][5] But
looking at fault tolerance figures, i guess Weaver is much better.

[1] http://www.ldpc-codes.com/
[2] http://portal.acm.org/citation.cfm?id=1240497
[3] http://en.wikipedia.org/wiki/Turbo_code
[4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
[5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf


2007-08-03 10:44:30

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Distributed storage.

Hi.

On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([email protected]) wrote:
> On 7/31/07, Evgeniy Polyakov <[email protected]> wrote:
>
> > TODO list currently includes following main items:
> > * redundancy algorithm (drop me a request of your own, but it is highly
> > unlikley that Reed-Solomon based will ever be used - it is too slow
> > for distributed RAID, I consider WEAVER codes)
>
>
> LDPC codes[1][2] have been replacing Turbo code[3] with regards to
> communication links and we have been seeing that transition. (maybe
> helpful, came to mind seeing the mention of Turbo code) Don't know how
> weaver compares to LDPC, though found some comparisons [4][5] But
> looking at fault tolerance figures, i guess Weaver is much better.
>
> [1] http://www.ldpc-codes.com/
> [2] http://portal.acm.org/citation.cfm?id=1240497
> [3] http://en.wikipedia.org/wiki/Turbo_code
> [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
> [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Great thanks for this links, I will definitely study them.

--
Evgeniy Polyakov

2007-08-04 02:58:05

by David Dillow

[permalink] [raw]
Subject: Re: Distributed storage.

On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote:
> On 7/31/07, Evgeniy Polyakov <[email protected]> wrote:
>
> > TODO list currently includes following main items:
> > * redundancy algorithm (drop me a request of your own, but it is highly
> > unlikley that Reed-Solomon based will ever be used - it is too slow
> > for distributed RAID, I consider WEAVER codes)
>
>
> LDPC codes[1][2] have been replacing Turbo code[3] with regards to
> communication links and we have been seeing that transition. (maybe
> helpful, came to mind seeing the mention of Turbo code) Don't know how
> weaver compares to LDPC, though found some comparisons [4][5] But
> looking at fault tolerance figures, i guess Weaver is much better.
>
> [1] http://www.ldpc-codes.com/
> [2] http://portal.acm.org/citation.cfm?id=1240497
> [3] http://en.wikipedia.org/wiki/Turbo_code
> [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
> [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

Searching Google for Dr. Plank's work at the University of TN turns up
some analysis of using LDPC codes in storage systems.

http://www.google.com/search?hl=en&q=plank+ldpc&btnG=Google+Search

Patents are an issue to watch out for around the use of Tornado/Raptor
codes. I've not researched it, but I believe there be dragons there.

2007-08-04 03:45:05

by Manu Abraham

[permalink] [raw]
Subject: Re: Distributed storage.

On 8/4/07, Dave Dillow <[email protected]> wrote:
> On Fri, 2007-08-03 at 09:04 +0400, Manu Abraham wrote:
> > On 7/31/07, Evgeniy Polyakov <[email protected]> wrote:
> >
> > > TODO list currently includes following main items:
> > > * redundancy algorithm (drop me a request of your own, but it is highly
> > > unlikley that Reed-Solomon based will ever be used - it is too slow
> > > for distributed RAID, I consider WEAVER codes)
> >
> >
> > LDPC codes[1][2] have been replacing Turbo code[3] with regards to
> > communication links and we have been seeing that transition. (maybe
> > helpful, came to mind seeing the mention of Turbo code) Don't know how
> > weaver compares to LDPC, though found some comparisons [4][5] But
> > looking at fault tolerance figures, i guess Weaver is much better.
> >
> > [1] http://www.ldpc-codes.com/
> > [2] http://portal.acm.org/citation.cfm?id=1240497
> > [3] http://en.wikipedia.org/wiki/Turbo_code
> > [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
> > [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf
>
> Searching Google for Dr. Plank's work at the University of TN turns up
> some analysis of using LDPC codes in storage systems.
>
> http://www.google.com/search?hl=en&q=plank+ldpc&btnG=Google+Search
>
> Patents are an issue to watch out for around the use of Tornado/Raptor
> codes. I've not researched it, but I believe there be dragons there.
>

We don't use the code in the driver straight away [2] (in the case
that i mentioned), since that happens in the hardware (demodulator
chip) [1], but we have an interface for selecting the code-rate [2]
(LDPC/BCH) for DVB-S2 and the new papers for DVB-T2 looks geared that
the base decision is to use LDPC.

Though i now see a patent application for it [3]. Not sure whether it
is a registered patent, i am under an agreement of Non-Disclosure with
STM. Will ask the relevant person there, whether they have it
registered. (Most probably they may have it registered).

There are a few people from STM on LK, if not they can possibly
confirm whether the patent is regsitered or not.

[1] http://www2.dac.com/data2/42nd/42acceptedpapers.nsf/0c4c09c6ffa905c487256b7b007afb72/998f93e4b29e99fa87256fc400714617/$FILE/33_1.pdf

[2] http://linuxtv.org/hg/~manu/stb0899-c5/file/760cb230695c/linux/include/linux/dvb/frontend.h

[3] http://www.freepatentsonline.com/20060206779.html
http://www.freepatentsonline.com/20060206778.html

2007-08-04 17:04:09

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Distributed storage.

On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([email protected]) wrote:
> On 7/31/07, Evgeniy Polyakov <[email protected]> wrote:
>
> > TODO list currently includes following main items:
> > * redundancy algorithm (drop me a request of your own, but it is highly
> > unlikley that Reed-Solomon based will ever be used - it is too slow
> > for distributed RAID, I consider WEAVER codes)
>
>
> LDPC codes[1][2] have been replacing Turbo code[3] with regards to
> communication links and we have been seeing that transition. (maybe
> helpful, came to mind seeing the mention of Turbo code) Don't know how
> weaver compares to LDPC, though found some comparisons [4][5] But
> looking at fault tolerance figures, i guess Weaver is much better.
>
> [1] http://www.ldpc-codes.com/
> [2] http://portal.acm.org/citation.cfm?id=1240497
> [3] http://en.wikipedia.org/wiki/Turbo_code
> [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
> [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

LDPC codes require to solve N order matrix over finite field - exactly
the reason I do not want to use Reed-Solomon codes even with optimized
non-Vandermonde matrix. I will investigate LDPC further though.
Turbo codes are like flow cipher compared to RS codes being block
ciphers. Transport media is reliable in data storages, otherwise they
would not even exist.

--
Evgeniy Polyakov

2007-08-28 17:19:54

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: Distributed storage.

On Fri, Aug 03, 2007 at 09:04:51AM +0400, Manu Abraham ([email protected]) wrote:
> On 7/31/07, Evgeniy Polyakov <[email protected]> wrote:
>
> > TODO list currently includes following main items:
> > * redundancy algorithm (drop me a request of your own, but it is highly
> > unlikley that Reed-Solomon based will ever be used - it is too slow
> > for distributed RAID, I consider WEAVER codes)
>
>
> LDPC codes[1][2] have been replacing Turbo code[3] with regards to
> communication links and we have been seeing that transition. (maybe
> helpful, came to mind seeing the mention of Turbo code) Don't know how
> weaver compares to LDPC, though found some comparisons [4][5] But
> looking at fault tolerance figures, i guess Weaver is much better.
>
> [1] http://www.ldpc-codes.com/
> [2] http://portal.acm.org/citation.cfm?id=1240497
> [3] http://en.wikipedia.org/wiki/Turbo_code
> [4] http://domino.research.ibm.com/library/cyberdig.nsf/papers/BD559022A190D41C85257212006CEC11/$File/rj10391.pdf
> [5] http://hplabs.hp.com/personal/Jay_Wylie/publications/wylie_dsn2007.pdf

I've studied and implemented LDPC encoder/decoder (hard decoding belief
propagation algo only though) in userspace and found that any such
probabilistic codes generally are not suitable for redundant or
distributed data storages, because of its per-bit nature and probabilistic
error recovery.
Interested reader can find similar to Dr. Plank's iteractive decoding
presentation and some of my analysis about codes and all sources at
project homepage and in blog:

http://tservice.net.ru/~s0mbre/old/?section=projects&item=ldpc

So I consider weaver codes, as a superior decision for distributed
storages.

--
Evgeniy Polyakov