2005-11-09 07:27:46

by Jeff Garzik

Subject: userspace block driver?


Has anybody put any thought towards how a userspace block driver would work?

Consider a block device implemented via an SSL network connection. I
don't want to put SSL in the kernel, which means the only other
alternative is to pass data to/from a userspace daemon.

Anybody have any favorite methods? [similar to] mmap'd packet socket?
ramfs?

TIA,

Jeff






2005-11-09 07:37:13

by NeilBrown

Subject: Re: userspace block driver?

On Wednesday November 9, [email protected] wrote:
>
> Has anybody put any thought towards how a userspace block driver
> would work?

Isn't this what enbd does?
http://www.it.uc3m.es/~ptb/nbd/

NeilBrown

2005-11-09 07:46:19

by Jeff Garzik

Subject: Re: userspace block driver?

Neil Brown wrote:
> On Wednesday November 9, [email protected] wrote:
>
>>Has anybody put any thought towards how a userspace block driver
>>would work?
>
>
> Isn't this what enbd does?
> http://www.it.uc3m.es/~ptb/nbd/

Is there something there relevant for modern kernels? I would sure hope
I could come up with something more lightweight than that.

Jeff



2005-11-09 07:50:20

by Chris Wright

Subject: Re: userspace block driver?

* Jeff Garzik ([email protected]) wrote:
> Consider a block device implemented via an SSL network connection. I
> don't want to put SSL in the kernel, which means the only other
> alternative is to pass data to/from a userspace daemon.

Sounds a bit like stunnel + nbd.

thanks,
-chris

2005-11-09 07:54:04

by Jens Axboe

Subject: Re: userspace block driver?

On Wed, Nov 09 2005, Jeff Garzik wrote:
> Neil Brown wrote:
> >On Wednesday November 9, [email protected] wrote:
> >
> >>Has anybody put any thought towards how a userspace block driver
> >>would work?
> >
> >
> >Isn't this what enbd does?
> > http://www.it.uc3m.es/~ptb/nbd/
>
> Is there something there relevant for modern kernels? I would sure hope
> I could come up with something more lightweight than that.

I was going to say drbd, but then you did say more lightweight :-)

Is nbd completely screwed these days?

--
Jens Axboe

2005-11-09 08:01:48

by Jeff Garzik

Subject: Re: userspace block driver?

Jens Axboe wrote:
> On Wed, Nov 09 2005, Jeff Garzik wrote:
>
>>Neil Brown wrote:
>>
>>>On Wednesday November 9, [email protected] wrote:
>>>
>>>
>>>>Has anybody put any thought towards how a userspace block driver
>>>>would work?
>>>
>>>
>>>Isn't this what enbd does?
>>> http://www.it.uc3m.es/~ptb/nbd/
>>
>>Is there something there relevant for modern kernels? I would sure hope
>>I could come up with something more lightweight than that.
>
>
> I was going to say drbd, but then you did say more lightweight :-)
>
> Is nbd completely screwed these days?

nbd does more than I want.

_All_ that is needed is flipping requests <somehow> to/from userspace.
nbd messes directly with sockets and such, which I don't want. It does
way too much, hardcodes way too much.

loop is a closer model to a generic userspace block device than nbd, I
think.

Though, answering your question directly, I do get the impression that
in-kernel nbd has been left behind in favor of drbd and enbd, out in the
few places where nbd-ish solutions are used.

Jeff



2005-11-09 08:14:16

by Oliver Neukum

Subject: Re: userspace block driver?



On Wed, 9 Nov 2005, Jeff Garzik wrote:

>
> Has anybody put any thought towards how a userspace block driver would work?
>
> Consider a block device implemented via an SSL network connection. I don't
> want to put SSL in the kernel, which means the only other alternative is to
> pass data to/from a userspace daemon.

I am afraid this is impossible without some heavy infrastructure work.
You will almost inevitably deadlock. Yes, you can mlock() your driver, but
that still will not tell the kernel that GFP_KERNEL must be replaced with
GFP_NOIO if it is triggered by syscalls you are doing.
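
For reference, the mlock() step mentioned above can be sketched from a
Python daemon with ctypes. This is only an illustration: the MCL_* values
are assumed from the generic Linux headers (some architectures use
different numbers), and the function names are made up for this sketch.
As noted above, pinning memory this way does nothing about GFP_KERNEL
allocations made on the daemon's behalf inside syscalls.

```python
import ctypes

# Assumed values from Linux <asm-generic/mman.h>; some architectures differ.
MCL_CURRENT = 1   # lock pages that are mapped right now
MCL_FUTURE = 2    # also lock pages mapped later

# Load the C library already linked into the interpreter.
_libc = ctypes.CDLL(None, use_errno=True)


def lock_all_memory():
    """Pin current and future pages; return 0 or the errno from mlockall()."""
    if _libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0:
        return 0
    return ctypes.get_errno()


def unlock_all_memory():
    """Undo lock_all_memory(); harmless if nothing was locked."""
    _libc.munlockall()
```

A non-zero return typically means the process lacks the privilege or the
RLIMIT_MEMLOCK headroom to lock its address space.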

Regards
Oliver

2005-11-09 08:37:07

by Avi Kivity

Subject: Re: userspace block driver?

Oliver Neukum wrote:

>On Wed, 9 Nov 2005, Jeff Garzik wrote:
>
>
>
>>Has anybody put any thought towards how a userspace block driver would work?
>>
>>Consider a block device implemented via an SSL network connection. I don't
>>want to put SSL in the kernel, which means the only other alternative is to
>>pass data to/from a userspace daemon.
>>
>>
>
>I am afraid this is impossible without some heavy infrastructure work.
>You will almost inevitably deadlock. Yes, you can mlock() your driver, but
>that still will not tell the kernel that GFP_KERNEL must be replaced with
>GFP_NOIO if it is triggered by syscalls you are doing.
>
>
>
A simple patch can help here (in addition to mlockall()):

http://www.ussg.iu.edu/hypermail/linux/kernel/0407.3/0297.html

You might want to increase the free memory target as well.

2005-11-09 10:12:52

by Lars Marowsky-Bree

Subject: Re: userspace block driver?

On 2005-11-09T08:54:56, Jens Axboe <[email protected]> wrote:

> > >>Has anybody put any thought towards how a userspace block driver
                                                ^^^^^^^^^
> > >>would work?
>
> I was going to say drbd, but then you did say more lightweight :-)
                     ^^^^

drbd is implemented all in-kernel, too.

The deadlock scenarios with running block IO through user-space are
still somewhat unsolved, though. Not that block over network IO (in
particular TCP) is much better even when implemented in-kernel...


Sincerely,
Lars Marowsky-Brée <[email protected]>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"

2005-11-09 12:42:38

by Paulo Marques

Subject: Re: userspace block driver?

Jeff Garzik wrote:
> Jens Axboe wrote:
>> On Wed, Nov 09 2005, Jeff Garzik wrote:
>>> Neil Brown wrote:
>>>> On Wednesday November 9, [email protected] wrote:
>>>>
>>>>> Has anybody put any thought towards how a userspace block driver
>>>>> would work?
>>>>
>>>> Isn't this what enbd does? http://www.it.uc3m.es/~ptb/nbd/
>>>
>>> Is there something there relevant for modern kernels? I would sure
>>> hope I could come up with something more lightweight than that.
>>
>> I was going to say drbd, but then you did say more lightweight :-)
>[...]
>
> loop is a closer model to a generic userspace block device than nbd, I
> think.

That got me thinking... theoretically we should be able to do a FUSE
server that served a single file that could be used by a loopback
device, couldn't we?

IIRC, Miklos Szeredi tried hard to avoid the deadlock scenarios that nbd
suffers from in FUSE, but I don't know if it would stand being called by
the loopback device.

If it works, it should be extremely simple to do the server. Just check
the FUSE hello world server example:

http://fuse.sourceforge.net/helloworld.html

I've CC'ed Miklos Szeredi to see if he can shed some light on the
loopback <-> FUSE combination...

--
Paulo Marques - http://www.grupopie.com

The rule is perfect: in all matters of opinion our
adversaries are insane.
Mark Twain

2005-11-09 12:55:43

by Miklos Szeredi

Subject: Re: userspace block driver?

>
> That got me thinking... theoretically we should be able to do a FUSE
> server that served a single file that could be used by a loopback
> device, couldn't we?
>
> IIRC, Miklos Szeredi tried hard to avoid the deadlock scenarios that nbd
> suffers from in FUSE, but I don't know if it would stand being called by
> the loopback device.
>
> If it works, it should be extremely simple to do the server. Just check
> the FUSE hello world server example:
>
> http://fuse.sourceforge.net/helloworld.html
>
> I've CC'ed Miklos Szeredi to see if he can shed some light on the
> loopback <-> FUSE combination...

Loopback works fine on FUSE filesystems (unless 'direct_io' mount
option is used).

Miklos

2005-11-09 13:32:36

by Jonathan Corbet

Subject: Re: userspace block driver?

> Has anybody put any thought towards how a userspace block driver would work?

I know Peter Chubb and the Gelato folks have. Some info is at
http://www.gelato.unsw.edu.au/IA64wiki/UserLevelDrivers. Mostly it says
that the block interface "needs to be cleaned up," and I think it has
been in that state for years. Still, it might be a starting place.

jon

2005-11-09 14:06:14

by Miklos Szeredi

Subject: Re: userspace block driver?

>
> That got me thinking... theoretically we should be able to do a FUSE
> server that served a single file that could be used by a loopback
> device, couldn't we?
>
> IIRC, Miklos Szeredi tried hard to avoid the deadlock scenarios that nbd
> suffers from in FUSE, but I don't know if it would stand being called by
> the loopback device.

N.B. though FUSE itself is free of deadlocks, as soon as you put
something on top of it which has asynchronous page writeback it will
not be safe anymore.

So the issues with NBD will still be there if you loopback mount a
fuse file.

Miklos

2005-11-09 20:56:20

by Guennadi Liakhovetski

Subject: Re: userspace block driver?

On Wed, 9 Nov 2005, Jeff Garzik wrote:

>
> Has anybody put any thought towards how a userspace block driver would work?
>
> Consider a block device implemented via an SSL network connection. I don't
> want to put SSL in the kernel, which means the only other alternative is to
> pass data to/from a userspace daemon.
>
> Anybody have any favorite methods? [similar to] mmap'd packet socket? ramfs?

Heh, thanks, Jeff, for bringing this subject up again, hasn't been that
long ago

http://marc.theaimsgroup.com/?l=linux-kernel&m=113140332009208&w=2

, which was indeed asked with nbd in mind. To remind you and others in

http://marc.theaimsgroup.com/?t=111524157800004&r=1&w=2
http://marc.theaimsgroup.com/?t=111706463800001&r=1&w=2

I played a bit with extending nbd to map block devices to the client
system more transparently, which means, as James Bottomley explained,
basically supporting REQ_BLOCK_PC requests. He also suggested not to use
ioctls on both sides, which is where I stopped. I can understand how to
avoid implementing ioctl in the nbd driver and intercepting REQ_BLOCK_PC
requests instead, but on the server side? Assume you get the request
object on the client, send it to the server, and then? Even if there
existed a generic interface to block devices, allowing requests to be
injected directly from user space into the block queue, wouldn't the same problems with
endianness, 32/64 bit stay? The advantage, perhaps, would be that the
request structure is standard, so, the conversion would be universal?
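
One common way to take endianness and 32/64-bit differences out of the
framing itself is a fixed-width header in network byte order. The sketch
below is hypothetical: the field list, the magic value and the function
names are invented for illustration and are not the nbd or REQ_BLOCK_PC
layout. It does not address command payloads whose internal layout is
device specific, which would still need per-command conversion.

```python
import struct

# Hypothetical wire header (not the nbd or REQ_BLOCK_PC layout):
#   u32 magic, u32 opcode, u64 sector, u32 length, u64 handle
# "!" selects network (big-endian) byte order with standard sizes and no
# padding, so the bytes are identical on 32-bit and 64-bit hosts.
HEADER = struct.Struct("!IIQIQ")
MAGIC = 0x55424C4B  # arbitrary value chosen for this sketch

OP_READ, OP_WRITE = 0, 1


def pack_request(opcode, sector, length, handle):
    """Encode one request header as fixed-width big-endian bytes."""
    return HEADER.pack(MAGIC, opcode, sector, length, handle)


def unpack_request(raw):
    """Decode a header produced by pack_request(); reject a wrong magic."""
    magic, opcode, sector, length, handle = HEADER.unpack(raw)
    if magic != MAGIC:
        raise ValueError("bad magic")
    return opcode, sector, length, handle
```

Both ends call the same pair of functions, so neither side depends on
the other's native word size or byte order.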

So, my problem is - how to send a generic request to a device (disk /
cdrom / loop / sg / st / ...) from the user space? Hence my recent
question...

Thanks
Guennadi
---
Guennadi Liakhovetski

2005-11-10 13:14:53

by Pavel Machek

Subject: Re: userspace block driver?

Hi!

> Has anybody put any thought towards how a userspace block driver would work?
>
> Consider a block device implemented via an SSL network connection. I
> don't want to put SSL in the kernel, which means the only other
> alternative is to pass data to/from a userspace daemon.
>
> Anybody have any favorite methods? [similar to] mmap'd packet socket?
> ramfs?

drivers/block/nbd?
Pavel

--
Thanks, Sharp!

2005-11-11 00:13:31

by Andrew Morton

Subject: Re: userspace block driver?

Miklos Szeredi <[email protected]> wrote:
>
> N.B. though FUSE itself is free of deadlocks, as soon as you put
> something on top of it which has asynchronous page writeback it will
> not be safe anymore.

Why? What goes wrong?

2005-11-11 06:00:57

by Miklos Szeredi

Subject: Re: userspace block driver?

> >
> > N.B. though FUSE itself is free of deadlocks, as soon as you put
> > something on top of it which has asynchronous page writeback it will
> > not be safe anymore.
>
> Why? What goes wrong?

Filesystem daemon can't use GFP_NOIO, and can't set PF_MEMALLOC. Even
if it could, there's the problem with reply packets from network,
which are not even handled in kernel yet (?).

FUSE sidesteps the issue, by doing writes synchronously and not
allowing shared-writable mappings, hence never dirtying any pages.

The sync write is actually not so bad, the filesystem daemon can do
its own buffering safely (it's swappable memory), or do async writes
over the network (letting TCP handle the buffering).
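
A rough sketch of the daemon-side buffering described above: each write
is acknowledged as soon as it sits in ordinary process memory, and the
slow backend is updated later. The class and method names are
hypothetical and are not part of the FUSE API; DictBackend exists only
to exercise the sketch.

```python
class WriteBehindStore:
    """Acknowledge writes from memory, push them to the backend later."""

    def __init__(self, backend, block_size=4096):
        self.backend = backend        # object with read_block/write_block
        self.block_size = block_size
        self.dirty = {}               # block number -> bytes awaiting flush

    def write_block(self, blkno, data):
        if len(data) != self.block_size:
            raise ValueError("partial blocks are not handled in this sketch")
        # Returning here lets the caller treat the write as complete.
        self.dirty[blkno] = bytes(data)

    def read_block(self, blkno):
        # Buffered data is newer than whatever the backend holds.
        if blkno in self.dirty:
            return self.dirty[blkno]
        return self.backend.read_block(blkno)

    def flush(self):
        """Write every buffered block to the backend, lowest block first."""
        for blkno in sorted(self.dirty):
            self.backend.write_block(blkno, self.dirty[blkno])
        self.dirty.clear()


class DictBackend:
    """Stand-in for a slow remote store, used only to exercise the sketch."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}

    def write_block(self, blkno, data):
        self.blocks[blkno] = bytes(data)

    def read_block(self, blkno):
        # Unwritten blocks read back as zeros.
        return self.blocks.get(blkno, bytes(self.block_size))
```

The buffer lives in normal daemon memory, so the kernel never holds
dirty pages that depend on the daemon making progress.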

Miklos

2005-11-12 23:41:38

by Guennadi Liakhovetski

Subject: Re: userspace block driver?

> On Wed, 9 Nov 2005, Jeff Garzik wrote:
>
> >
> > Has anybody put any thought towards how a userspace block driver would work?
> >
> > Consider a block device implemented via an SSL network connection. I don't
> > want to put SSL in the kernel, which means the only other alternative is to
> > pass data to/from a userspace daemon.
> >
> > Anybody have any favorite methods? [similar to] mmap'd packet socket? ramfs?

Hm, how about a simple trick:

you write a "remote resource access" filesystem and do

mount -t remote_resourse -o access_control.conf 192.168.1.1 /mnt/server1

then you "just" write a library to overload open, close, read, write,
ioctl,... do

export LD_PRELOAD=remote_libc.so

and then

mount /mnt/server1/hda1 /usr/local

In /mnt/server1/ you could have interesting things like

mouse0
dsp0

or even

network0
node0/cpu0
node0/ram0

Simple, isn't it? :-)

Thanks
Guennadi
---
Guennadi Liakhovetski

2005-11-16 03:14:21

by Michael Clark

Subject: Re: userspace block driver?

Jeff Garzik wrote:

>
> Has anybody put any thought towards how a userspace block driver would
> work?
>
> Consider a block device implemented via an SSL network connection. I
> don't want to put SSL in the kernel, which means the only other
> alternative is to pass data to/from a userspace daemon.
>
> Anybody have any favorite methods? [similar to] mmap'd packet socket?
> ramfs?


Here is a user block device I've been using for my own userspace block
device drivers for some years now:

http://gort.metaparadigm.com/userblk/

This code is 2.6 based. 2.6 has made it much more reliable to do this
due to the separation of writeback into the 'pdflush' thread lessening
the likelihood of deadlocks. I've had very good results with this code
under quite heavy memory pressure (you need a carefully written
userspace which mlocks itself into memory and avoids doing certain things).

I wrote it because the nbd and enbd implementations didn't provide a
nice and/or simple interface for a local userspace daemon.

enbd was closer to what I needed but when I looked at it I thought its
blocking on an ioctl was an ugly design - plus it was overcomplicated
with enbd specific features.

I chose to use a kernel <-> user comms model based on Alan Cox's psdev
with a char device using read and write and a mmap area for the block
request data (potentially allowing me to implement zero copy in the
future by mapping the bio into the user address space).
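
The split described above (small request records over read/write, bulk
data in a shared mmap area) can be sketched as follows. Everything here
is an assumption made for illustration: the record layouts, the opcodes
and the serve_one() helper are invented, a datagram socket stands in for
the char device, and the real 'ub' interface is not described in this
message.

```python
import mmap
import socket
import struct

# Hypothetical request/reply records; not the real 'ub' format.
REQ = struct.Struct("=IIQI")    # id, opcode, byte offset, length
REPLY = struct.Struct("=Ii")    # id, status (0 or a negative errno)
OP_READ, OP_WRITE = 0, 1


def serve_one(ctl, area, disk):
    """Handle one request: the header arrives on ctl, the payload is in area.

    ctl  - datagram socket standing in for the char device
    area - mmap object standing in for the shared data area
    disk - bytearray standing in for the backing store
    """
    req_id, op, offset, length = REQ.unpack(ctl.recv(REQ.size))
    status = 0
    if offset + length > len(disk) or length > len(area):
        status = -5                                  # -EIO: out of range
    elif op == OP_READ:
        area[:length] = disk[offset:offset + length]   # store -> shared area
    elif op == OP_WRITE:
        disk[offset:offset + length] = area[:length]   # shared area -> store
    else:
        status = -22                                 # -EINVAL: unknown opcode
    ctl.send(REPLY.pack(req_id, status))
    return status
```

Only the 20-byte header and 8-byte reply cross the control channel; the
block data stays in the shared area.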

It is named 'ub' as it was written way before the USB driver although as
I hadn't published my work no one was aware of this. I should come up
with a new name. Perhaps 'bdu'?

I'm actually using the code in a production environment on a lab of 25
Linux machines with a hybrid network block device / local disk caching
implementation in userspace (to make netboot less disk intensive -
similar to how Apple's netboot system works). The userspace
implementation also does local COW onto disk (to avoid the need for a
stateless readonly type linux system which is hard to achieve). I have
source for this available if anyone is interested. The link above only
contains a simple userspace example implementation.

The 'metaboot' system which uses 'ub' does a lot of smart things such as
'commit' and 'rollback' of block deltas and has a cache coherency
protocol to only invalidate changed blocks on clients' local caches so I
can upgrade a bunch of quasi netboot/cache machines with minimal network
traffic (much more scalable than normal fatclient netbooting).
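
To illustrate the block-delta idea in general terms, here is a minimal
copy-on-write overlay with commit and rollback. It is a sketch under
assumptions, not the 'metaboot' code: the class, its methods and the
in-memory base image are all invented for this example. Both operations
return the list of changed block numbers, which is the kind of
information a cache-invalidation protocol could send to clients.

```python
class CowOverlay:
    """Copy-on-write layer over a base image (illustrative sketch only)."""

    def __init__(self, base, block_size=512):
        self.base = base              # bytes-like image; writes never touch it
        self.block_size = block_size
        self.delta = {}               # block number -> locally written block

    def read_block(self, blkno):
        # Local changes win over the base image.
        if blkno in self.delta:
            return self.delta[blkno]
        start = blkno * self.block_size
        return bytes(self.base[start:start + self.block_size])

    def write_block(self, blkno, data):
        if len(data) != self.block_size:
            raise ValueError("whole blocks only in this sketch")
        self.delta[blkno] = bytes(data)

    def rollback(self):
        """Discard local changes; return the block numbers that had changed."""
        changed = sorted(self.delta)
        self.delta.clear()
        return changed

    def commit(self):
        """Fold the delta into a new base image; return the changed blocks."""
        image = bytearray(self.base)
        for blkno, data in self.delta.items():
            start = blkno * self.block_size
            image[start:start + self.block_size] = data
        self.base = bytes(image)
        return self.rollback()
```

Only the blocks named in the returned list differ from the previous
base, so everything else in a client cache can stay valid.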

I can put up a webpage with links to CVS, etc. if anyone is interested.

~mc