2012-11-20 22:39:41

by Ezequiel Garcia

Subject: [RFC/PATCH 0/1] ubi: Add ubiblock driver

Hi,

I'm happy to announce we finally have a read/write ubiblock implementation!

What follows are some historical notes and some implementation hints.
Feel free to comment on anything, ask questions or provide feedback.
You can even fire some flames if you like, my asbestos suit works just fine ;-)

* Some history

It appears this is a long-wanted feature. There's a
thread [1] from January 2008 where Artem gives some
hints on implementation.

A year ago, some work was done by David Wagner to implement
a read-only ubi block device (called ubiblk).
His latest patch seems to be this one from September 2011 [2].
Unfortunately, his implementation never got merged.

I've taken his implementation as a starting point (courtesy of
free-electrons!), and borrowed as much as I could from mtdblock,
mtd_blkdev and gluebi to make the code as consistent as possible.

* About this ubiblock driver

Because ubiblk is unpronounceable (unless you're Russian ;-)
and to match mtdblock naming, I've chosen the simpler name
'ubiblock'.

Also, I've decided to have block devices created automatically for
each ubi volume present.
This matches gluebi's behavior of automatically creating an mtd device
per ubi volume, and saves us the trouble of a usertool.

The latter is the most important reason: a new usertool means added
complexity for the user and yet more crap to maintain.

Besides, each ubiblock is fairly cheap since it's based on workqueues
and not on threads.
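
To give a rough idea (this is just a sketch with hypothetical names,
not the actual patch), an idle ubiblock is little more than a small
struct whose work item only gets queued when I/O actually arrives:

struct ubiblock {
	struct ubi_volume_desc *desc;	/* underlying UBI volume */
	struct gendisk *gd;
	struct request_queue *rq;
	struct work_struct work;	/* queued only when I/O arrives */
};

static void ubiblock_do_work(struct work_struct *work)
{
	struct ubiblock *dev = container_of(work, struct ubiblock, work);

	/* fetch requests from dev->rq and serve them here */
}

No per-device kthread means no idle thread cost per volume.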
I don't know how many ubi volumes a user typically creates, but I
expect there won't be too many.

* Read/write support

Yes, this implementation supports read/write access.
It's expected to work fairly well because the request queue at the
block elevator is supposed to order block transfers to be space-effective.
In other words, reads and writes are expected to get ordered so they
point to the same LEB (see Artem's hint at [1]).

To help this and reduce access to the UBI volume, a 1-LEB-sized
write-back cache has been implemented (similar to the one in mtdblock.c).

Every read and every write goes through this cache, and the write-back
is only performed when a request arrives for a different LEB or when
the device is released, i.e. when the last file handle is closed.
The cache is 1 LEB in size, vmalloc'ed at open() and freed at release().
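
The core of the cache logic looks more or less like this (a simplified
sketch with hypothetical names; error handling omitted):

struct ubiblock_cache {
	void *buf;	/* vmalloc'ed at open(), 1 LEB in size */
	int leb_num;	/* LEB currently cached, -1 if none */
	bool dirty;	/* written to, needs write-back */
};

static int ubiblock_fill_cache(struct ubiblock *dev, int leb)
{
	struct ubiblock_cache *c = &dev->cache;

	if (c->leb_num == leb)
		return 0;	/* cache hit: keep serving from memory */

	/* Crossing a LEB boundary: write the cached LEB back first */
	if (c->dirty) {
		ubi_leb_change(dev->desc, c->leb_num, c->buf,
			       dev->leb_size);
		c->dirty = false;
	}

	c->leb_num = leb;
	return ubi_read(dev->desc, leb, c->buf, 0, dev->leb_size);
}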

* But the bad news is...

I don't own raw flash devices to test this on.
(Well, I do have a few but they're not supported upstream
and they run on an ancient kernel.)

I've tested this in a qemu environment, creating UBI volumes on top
of nandsim devices. No problems so far, but testing is still ongoing.

I expect to have some flash boards to test on in the near future,
but for now, please consider this experimental stuff.

Feel free to test this on real hardware and report any bugs.
I'll be more than happy to fix them.

* The future

** Implement debugfs files to report statistics on cache behavior.
This would help us decide what works best (see the sketch after this list).

** Compare different block elevators, and compare a request-queue
based implementation (the current one) against a do-it-yourself
make_request oriented one (also sketched below).
The latter is used in the ramdisk (brd.c) and loop (loop.c) drivers,
though I don't expect it to perform better.
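
For the debugfs part, I'm thinking of something as simple as a couple
of counters (a rough sketch, with hypothetical names):

struct dentry *dir = debugfs_create_dir("ubiblock0", NULL);

debugfs_create_u32("cache_hits", 0444, dir, &dev->cache_hits);
debugfs_create_u32("cache_misses", 0444, dir, &dev->cache_misses);

And for the comparison, the do-it-yourself alternative would bypass
the elevator entirely and handle bios directly, something like (again,
just a sketch):

static void ubiblock_make_request(struct request_queue *q,
				  struct bio *bio)
{
	/* handle bio segments directly; no elevator merging,
	 * so the 1-LEB cache would have to do all the work */
	bio_endio(bio, 0);
}

/* at init: blk_queue_make_request(q, ubiblock_make_request); */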

This patch is based on today's l2-mtd branch, but it will probably
apply to any recent branch.

Thanks,

Ezequiel

[1] http://lists.infradead.org/pipermail/linux-mtd/2008-January/020381.html
[2] http://lwn.net/Articles/452944/


2012-11-21 10:00:49

by Thomas Petazzoni

Subject: Re: [RFC/PATCH 0/1] ubi: Add ubiblock driver

Dear Ezequiel Garcia,

On Tue, 20 Nov 2012 19:39:38 -0300, Ezequiel Garcia wrote:

> * Read/write support
>
> Yes, this implementation supports read/write access.

While I think the original ubiblk that was read-only made sense to
allow the usage of read-only filesystems like squashfs, I am not sure a
read/write ubiblock is useful.

Using a standard block read/write filesystem on top of ubiblock is going
to cause damage to your flash. Even though UBI does wear-leveling, your
standard block read/write filesystem will think it has 512-byte blocks
below it, and will do a crazy number of writes to small blocks. Even
though you have a one-LEB cache, it is going to be defeated quite
thoroughly by the small random I/O of the read/write filesystem.

I am not sure letting people use read/write block filesystems on top of
flashes, even through UBI, is a good idea.

Thomas
--
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

2012-11-21 10:42:26

by Ezequiel Garcia

Subject: Re: [RFC/PATCH 0/1] ubi: Add ubiblock driver

Hi Thomas,

On Wed, Nov 21, 2012 at 7:00 AM, Thomas Petazzoni
<[email protected]> wrote:
> Dear Ezequiel Garcia,
>
> On Tue, 20 Nov 2012 19:39:38 -0300, Ezequiel Garcia wrote:
>
>> * Read/write support
>>
>> Yes, this implementation supports read/write access.
>
> While I think the original ubiblk that was read-only made sense to
> allow the usage of read-only filesystems like squashfs, I am not sure a
> read/write ubiblock is useful.
>
> Using a standard block read/write filesystem on top of ubiblock is going
> to cause damage to your flash. Even though UBI does wear-leveling, your
> standard block read/write filesystem will think it has 512-byte blocks
> below it, and will do a crazy number of writes to small blocks. Even
> though you have a one-LEB cache, it is going to be defeated quite
> thoroughly by the small random I/O of the read/write filesystem.
>

Well, I was hoping for the opposite to happen,
and for the 1-LEB cache to be able to absorb
the multiple writes from filesystems.

My line of reasoning is as follows.
As we all know, LEBs are much, much bigger than regular disk blocks:
typically 128 KiB.

Now, filesystems won't care at all about wear levelling
and thus will carelessly perform lots of reads/writes at any disk sector.

Because the block elevator will try to minimize seek time, it will try
to order block requests to be contiguous. Since LEBs are much bigger
than disk sectors, this ordering will mostly result in the same LEB
being addressed.

Only when a read or write arrives at a different LEB than the one in
cache will ubiblock write the cache back to the volume.
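
To put numbers on it: assuming a 128 KiB LEB and 512-byte sectors,
128 * 1024 / 512 = 256 consecutive sectors fall into the same LEB,
and the sector-to-LEB mapping is trivial (a sketch, hypothetical names):

static int ubiblock_sector_to_leb(struct ubiblock *dev, sector_t sec)
{
	/* divide by sectors per LEB: leb_size >> 9 */
	sector_div(sec, dev->leb_size >> 9);
	return sec;
}

So the elevator only has to keep requests roughly contiguous for the
cache to stay on one LEB most of the time.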

My **very limited** testing with ext2 showed things behaved more
or less like this.
Next time, I'll post some benchmarks and numbers.

Of course, there's a possibility you are right and ubiblock write support
is completely useless.

Thanks for the review,

Ezequiel

2012-11-30 11:07:32

by Artem Bityutskiy

Subject: Re: [RFC/PATCH 0/1] ubi: Add ubiblock driver

Hi, even without a review I can say that overall this sounds good, thanks!

On Tue, 2012-11-20 at 19:39 -0300, Ezequiel Garcia wrote:
> Also, I've decided to have block devices created automatically for
> each ubi volume present.
> This matches gluebi's behavior of automatically creating an mtd device
> per ubi volume, and saves us the trouble of a usertool.
>
> The latter is the most important reason: a new usertool means added
> complexity for the user and yet more crap to maintain.

> I don't know how many ubi volumes a user typically creates, but I
> expect there won't be too many.

I think I saw something like 8-10 in some people's reports.

> * Read/write support
>
> Yes, this implementation supports read/write access.
> It's expected to work fairly well because the request queue at the
> block elevator is supposed to order block transfers to be space-effective.
> In other words, reads and writes are expected to get ordered so they
> point to the same LEB (see Artem's hint at [1]).
>
> To help this and reduce access to the UBI volume, a 1-LEB-sized
> write-back cache has been implemented (similar to the one in mtdblock.c).
>
> Every read and every write goes through this cache, and the write-back
> is only performed when a request arrives for a different LEB or when
> the device is released, i.e. when the last file handle is closed.

Sounds good, but you should make sure you flush the cache when the
file-system syncs a file. You can consider this a disk cache.
File-systems usually send I/O barriers when the disk cache has to be
flushed; I guess this is what you should do as well.
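
Something along these lines, I guess (just a sketch, and the
ubiblock_flush_cache()/ubiblock_xfer_request() helpers are made-up
names):

static void ubiblock_request(struct request_queue *q)
{
	struct request *req;

	while ((req = blk_fetch_request(q)) != NULL) {
		struct ubiblock *dev = req->rq_disk->private_data;
		int err;

		if (req->cmd_flags & REQ_FLUSH)
			err = ubiblock_flush_cache(dev); /* write back LEB */
		else
			err = ubiblock_xfer_request(dev, req);

		__blk_end_request_all(req, err);
	}
}

/* at init time, advertise the volatile cache to the block layer: */
/* blk_queue_flush(dev->rq, REQ_FLUSH); */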

> The cache is 1 LEB in size, vmalloc'ed at open() and freed at release().

Is it per-block device? Then I am not sure it is a good idea to
automatically create them for every volume...

--
Best Regards,
Artem Bityutskiy



2012-11-30 11:25:00

by Artem Bityutskiy

Subject: Re: [RFC/PATCH 0/1] ubi: Add ubiblock driver

On Wed, 2012-11-21 at 11:00 +0100, Thomas Petazzoni wrote:
> While I think the original ubiblk that was read-only made sense to
> allow the usage of read-only filesystems like squashfs, I am not sure
> a read/write ubiblock is useful.
>
> Using a standard block read/write filesystem on top of ubiblock is going
> to cause damage to your flash. Even though UBI does wear-leveling, your
> standard block read/write filesystem will think it has 512-byte blocks
> below it, and will do a crazy number of writes to small blocks. Even
> though you have a one-LEB cache, it is going to be defeated quite
> thoroughly by the small random I/O of the read/write filesystem.

Well, in practice normal file-systems do 4K-aligned I/O, without crazy
things, and try to do I/O sequentially.
>
> I am not sure letting people use read/write block filesystems on top
> of flashes, even through UBI, is a good idea.

Why not?

--
Best Regards,
Artem Bityutskiy



2012-11-30 20:48:40

by Ezequiel Garcia

Subject: Re: [RFC/PATCH 0/1] ubi: Add ubiblock driver

Hi Artem,

Thanks for taking the time to answer.

On Fri, Nov 30, 2012 at 8:08 AM, Artem Bityutskiy <[email protected]> wrote:
>
>> I don't know how many ubi volumes a user typically creates, but I
>> expect there won't be too many.
>
> I think I saw something like 8-10 in some peoples' reports.
>

Mmm, that's more than I expected.

[...]
>
>> The cache is 1 LEB in size, vmalloc'ed at open() and freed at release().
>
> Is it per-block device?

Yes. But notice the vmalloc'ed cache is freed on release(),
so an unused ubiblock won't have one allocated.

> Then I am not sure it is a good idea to
> automatically create them for every volume...
>

Given that ubiblock is workqueue-based, I believe there isn't any
performance penalty in creating many instances.

Regarding memory footprint: we should consider how much it costs
to create a ubiblock device, plus the underlying gendisk, request
queue, etc.

If people are partitioning their ubi devices into 8-10 volumes,
then I'm not too sure we want to create a block device per volume
automatically. Not because of the cache (again, an unused ubiblock
won't allocate one) but because of the overall unnecessary bloat.

The main idea behind auto-creation is that, IMHO, the ubi ecosystem
already has its own rather large set of userspace tools, and I'm
against adding yet another one.

I'm gonna keep playing with this and come up with the
long-promised numbers.

Ezequiel