My goal is to provide Amazon S3 or Google Cloud Storage as a block
device. I would like to leverage the libraries that exist for both
systems by servicing requests via a user space program.
I found 2 LKML threads that talk about a "userspace block device":

2005-11-09: http://article.gmane.org/gmane.linux.kernel/346883
2009-07-27: http://article.gmane.org/gmane.linux.kernel/869784
The first thread resulted in Michael Clark suggesting his kernel
module: https://github.com/michaeljclark/userblk The second
essentially resulted in "use nbd". Mr. Clark's module is now over 10
years old, and nbd seems like a bit of a Rube Goldberg solution.
Does the kernel now support a facility to service bio requests via
user space? If not, what would be the best approach to take? Update
Mr. Clark's code? Or is there a newer and more efficient facility for
kernel <-> user space communication and data transfer?
Thanks...
Bill-
On Mon, May 18, 2015 at 2:01 PM, Bill Speirs <[email protected]> wrote:
> My goal is to provide Amazon S3 or Google Cloud Storage as a block
> device. I would like to leverage the libraries that exist for both
> systems by servicing requests via a user space program.
>
> I found 2 LKML threads that talk about a "userspace block device":
>
> 2005-11-09: http://article.gmane.org/gmane.linux.kernel/346883
> 2009-07-27: http://article.gmane.org/gmane.linux.kernel/869784
>
> The first thread resulted in Michael Clark suggesting his kernel
> module: https://github.com/michaeljclark/userblk The second
> essentially resulted in "use nbd". Mr. Clark's module is now over 10
> years old, and nbd seems like a bit of a Rube Goldberg solution.
I wrote the busybox and toybox nbd clients, and have a todo list item
to write an nbd server for toybox. I believe there's also an nbd
server in qemu. I haven't found any decent documentation on the
protocol yet, but what specifically makes you describe it as Rube
Goldberg?
Rob
On Tue, May 19, 2015 at 1:34 AM, Rob Landley <[email protected]> wrote:
> On Mon, May 18, 2015 at 2:01 PM, Bill Speirs <[email protected]> wrote:
>> My goal is to provide Amazon S3 or Google Cloud Storage as a block
>> device. I would like to leverage the libraries that exist for both
>> systems by servicing requests via a user space program.
>> ... nbd seems like a bit of a Rube Goldberg solution.
>
> I wrote the busybox and toybox nbd clients, and have a todo list item
> to write an nbd server for toybox. I believe there's also an nbd
> server in qemu. I haven't found any decent documentation on the
> protocol yet, but what specifically makes you describe it as Rube
> Goldberg?
My understanding of using nbd is:
- Write an nbd-server that is essentially a gateway between nbd and
S3/Google. For each nbd request, I translate it into the appropriate
S3/Google request and respond appropriately.
- I'd run the above server on the machine on some port.
- I'd run a client on the same server using 127.0.0.1 and the above
port, providing the nbd block device.
- Go drink a beer as I rack up a huge bill with Amazon or Google
Seems a bit much to run a client & server on the same machine with
socket overhead, etc. In looking at the code for your nbd-client
(https://github.com/landley/toybox/blob/master/toys/other/nbd_client.c)
I'm wondering if I couldn't just pass one end of a socketpair (a pipe
is one-way, so it wouldn't do) instead of a TCP socket in the
ioctl(nbd, NBD_SET_SOCK, sock) step, then have the same proc (or a
fork) servicing the other end so it's all in a single process/codebase.
Thoughts on this approach?
That said, clearly my bottleneck in all of this will be the
communication with S3/Google, and using something like dm-cache would
make it appear fast for most requests. So maybe my Rube Goldberg
comment was over the top.
Thank you for the pointers and the feedback!
Bill-
> - Write an nbd-server that is essentially a gateway between nbd and
> S3/Google. For each nbd request, I translate it into the appropriate
> S3/Google request and respond appropriately.
> - I'd run the above server on the machine on some port.
> - I'd run a client on the same server using 127.0.0.1 and the above
> port, providing the nbd block device.
> - Go drink a beer as I rack up a huge bill with Amazon or Google
And you probably would because the block layer will see a lot of I/O
requests that you would really want to process locally, as well as stuff
caused by working at the block not file level (like readaheads).
You also can't deal with coherency this way - e.g. sharing the virtual disk
between two systems because the file system code isn't expecting other
clients to modify the disk under it.
Rather than nbd you could also look at drbd or some similar kind of
setup where you keep the entire filestore locally and write back changes
to the remote copy. As you can never share the filestore when mounted you
can cache it pretty aggressively.
Alan
On Tue, May 19, 2015 at 11:19 AM, One Thousand Gnomes
<[email protected]> wrote:
>> ... rack up a huge bill with Amazon or Google
>
> And you probably would because the block layer will see a lot of I/O
> requests that you would really want to process locally, as well as stuff
> caused by working at the block not file level (like readaheads).
>
> You also can't deal with coherency this way - e.g. sharing the virtual disk
> between two systems because the file system code isn't expecting other
> clients to modify the disk under it.
>
> Rather than nbd you could also look at drbd or some similar kind of
> setup where you keep the entire filestore locally and write back changes
> to the remote copy. As you can never share the filestore when mounted you
> can cache it pretty aggressively.
What kinds of things could I process locally? I was thinking I could
keep a bitmap of "sectors" that have never been written to, then just
return zeroed-out sectors for those. What else could I do? Thoughts?
I'm not looking to share the filesystem, just never have to buy a
bigger disk again and get pseudo-backup along with it (I realize
things in my cache would be lost if my house burned to the ground).
drbd isn't really what I'm looking for, because I don't want to have
to buy a disk that's large enough to fit everything. Just a small fast
SSD (or RAM disk) to cache commonly used files, then spill-over to the
cloud for everything else. In theory, I would have a /home that is
"infinite", and fairly fast for things that are cached.
Thanks for the thoughts/points!
Bill-