2008-08-24 09:00:22

by jassi brar

[permalink] [raw]
Subject: An idea .... with code

Hi,

Lately a question has been bugging me: Why do we keep complicated(specific ioctls to set up and set free) LOOP driver to emulate a data file as block-device, for it to be mounted/formated.

Wouldn't it be nice if we didnt have to specify '-o loop' or use losetup -d etc?

With inspiration from UMS-Gadget and LOOP driver itself, I implemented a proof of concept: A kernel module which creats a node in /sys/devices/ and un/loads files to be emulated as block devices. The interface doesn't implement any new ioctls or a syscall.

To load a file all we need to do is:-

$ echo +filename > /sys/devices/virblk/manage //NOTE the - sign

To unload a file:-

$ echo -filename > /sys/devices/virblk/manage //NOTE the - sign


The module creates one thread per node and alloc/free the data structures in the runtime thereby neither limiting the max number of files that could be emulated nor hogging space when not necessary.


/*************************/

[root@localhost jmod]# dd if=/dev/zero of=image bs=1024 count=102400

[root@localhost jmod]# echo +image > /sys/devices/virblk/manage
[root@localhost jmod]# cat /sys/devices/virblk/manage
/home/jassi/jmod/image' as 'v0blk'

[root@localhost jmod]# dmesg -c
File(image) is being emulated as block device v0blk
v0blk: unknown partition table

[root@localhost jmod]# fdisk /dev/v0blk
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x360df22f.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-100, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-100, default 100): 50

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (51-100, default 51):
Using default value 51
Last cylinder or +size or +sizeM or +sizeK (51-100, default 100):
Using default value 100

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

[root@localhost jmod]# mkfs.ext2 /dev/v0blk1
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
12824 inodes, 51184 blocks
2559 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=52428800
7 block groups
8192 blocks per group, 8192 fragments per group
1832 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 36 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

[root@localhost jmod]# mkfs.ext2 /dev/v0blk2
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
12824 inodes, 51200 blocks
2560 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=52428800
7 block groups
8192 blocks per group, 8192 fragments per group
1832 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

[root@localhost jmod]# dmesg -c
v0blk: v0blk1 v0blk2
v0blk: v0blk1 v0blk2

[root@localhost jmod]# echo -image > /sys/devices/virblk/manage
[root@localhost jmod]# dmesg -c
Killed thread 'vblk0_thread'

[root@localhost jmod]# echo +image > /sys/devices/virblk/manage
[root@localhost jmod]# dmesg
File(image) is being emulated as block device v0blk
v0blk: v0blk1 v0blk2

[root@localhost jmod]# parted /dev/v0blk
GNU Parted 1.8.8
Using /dev/v0blk
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: Unknown (unknown)
Disk /dev/v0blk: 105MB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
1 16.4kB 52.4MB 52.4MB primary ext2
2 52.4MB 105MB 52.4MB primary ext2

(parted) q
[root@localhost jmod]#

/**************************************/


The module source code is attached too:-

--------------------------8<-------------------------------

/*
* Kernel module to emulate a user space file as a block device transparently.
*
* Copyright (C) 2008 Jaswinder Singh Brar <jassi_singh_brar AT yahoo DOT com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/

#include <linux/module.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/init.h>
#include <linux/hdreg.h>
#include <linux/blkdev.h>
#include <linux/kthread.h>

#define VBD_MAJOR 121 /* Just some available */
#define VBD_NAME "vblk"

#define NUM_SPC_BITS 5 /* == 32 */
#define NUM_HEADS_BITS 6 /* == 64 */
#define SECT_SIZE_BITS 9 /* == 512 */
#define NUM_MINORS 16

#define REGBLKDEV (1<<0)
#define REGPDEV (1<<1)
#define CRTDEVFL (1<<2)

/*
* Various parameters could be configured at module load time or 'echo +' command.
*/
struct vir_blk_dev{
int index;
int heads_bits;
int cylinders;
int sectspercyl_bits;
int sect_size_bits;
int use_count;
int md_change;
int ro;
loff_t sects;
struct file *filp;
struct gendisk *gd;
struct request_queue *rq;
struct bio *vbd_biotail;
struct bio *vbd_bio;
struct list_head vbd_list;
struct task_struct *vbd_thread;
struct semaphore sem;
spinlock_t vbd_lock;
wait_queue_head_t vbd_event;
};

static LIST_HEAD(vbd_head);

static struct device vir_dev = {
.bus_id = "virblk",
};

static int data_xfer(struct vir_blk_dev *vbd, struct page *page, unsigned int len,
unsigned int off, int wr, sector_t sector)
{
ssize_t ret;
loff_t pos = sector * (1<<(vbd->sect_size_bits));
void *buf = kmap(page) + off;

if(sector + (len >> vbd->sect_size_bits) > get_capacity(vbd->gd)){
printk("Trying to move past the end!\n");
kunmap(page);
return -EIO;
}

if(wr)
ret = vbd->filp->f_op->write(vbd->filp, buf, len, &pos);
else
ret = vbd->filp->f_op->read(vbd->filp, buf, len, &pos);

kunmap(page);

if(likely(ret == len))
return 0;

if(ret >= 0)
ret = -EIO;

return ret;
}

static void handle_bio(struct vir_blk_dev *vbd, struct bio *bio)
{
int i, ret = -EIO;
struct bio_vec *bvec;
sector_t sector;

sector = bio->bi_sector;
bio_for_each_segment(bvec, bio, i){

ret = data_xfer(vbd, bvec->bv_page, bvec->bv_len, bvec->bv_offset,
bio_data_dir(bio) == WRITE, sector);
if(ret){
break;
}
sector += (bvec->bv_len >> vbd->sect_size_bits);
}

bio_endio(bio, ret);
}

static int vbd_make_request(struct request_queue *rq, struct bio *bio)
{
struct vir_blk_dev *vbd = (struct vir_blk_dev *)rq->queuedata;

spin_lock_irq(&vbd->vbd_lock);

if(vbd->vbd_biotail != NULL){
vbd->vbd_biotail->bi_next = bio;
vbd->vbd_biotail = bio;
}else{
vbd->vbd_bio = vbd->vbd_biotail = bio;
}

wake_up(&vbd->vbd_event);
spin_unlock_irq(&vbd->vbd_lock);
return 0;
}

static int jmod_open(struct inode *inode, struct file *filp)
{
struct vir_blk_dev *vbd = (struct vir_blk_dev *)inode->i_bdev->bd_disk->private_data;

down(&vbd->sem);
filp->private_data = (void *)vbd;
vbd->use_count++;
up(&vbd->sem);

return 0;
}

static int jmod_release(struct inode *inode, struct file *filp)
{
struct vir_blk_dev *vbd = (struct vir_blk_dev *)inode->i_bdev->bd_disk->private_data;

down(&vbd->sem);
vbd->filp->f_op->fsync(vbd->filp, vbd->filp->f_path.dentry, 1);
vbd->use_count--;
up(&vbd->sem);

return 0;
}

static int jmod_getgeo(struct block_device *bdev, struct hd_geometry *gm)
{
struct vir_blk_dev *vbd = bdev->bd_disk->private_data;

down(&vbd->sem);
gm->heads = 1<<(vbd->heads_bits);
gm->sectors = 1<<(vbd->sectspercyl_bits);
gm->cylinders = get_capacity(bdev->bd_disk) >> (vbd->heads_bits + vbd->sectspercyl_bits);
up(&vbd->sem);

return 0;
}

static int jmod_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
return 0;
}


static int jmod_mediachanged(struct gendisk *gd)
{
struct vir_blk_dev *vbd = gd->private_data;

return vbd->md_change;
}

static int jmod_revalidatedisk(struct gendisk *gd)
{
struct vir_blk_dev *vbd = gd->private_data;

down(&vbd->sem);
if(vbd->md_change)
vbd->md_change = 0;
up(&vbd->sem);

return 0;
}

static struct block_device_operations vbd_fops = {
.owner = THIS_MODULE,
.open = jmod_open,
.release = jmod_release,
.ioctl = jmod_ioctl,
.getgeo = jmod_getgeo,
.media_changed = jmod_mediachanged,
.revalidate_disk = jmod_revalidatedisk,
};

/*
* ind : First minor number of the disk
* */
static struct vir_blk_dev *vblk_alloc(int ind)
{
struct gendisk *disk;
struct vir_blk_dev *vbd;

vbd = kzalloc(sizeof(*vbd), GFP_KERNEL);
if (vbd == NULL)
goto out;

vbd->filp = NULL;
vbd->vbd_thread = NULL;
vbd->use_count = 0;
vbd->md_change = 1;
vbd->index = ind;
vbd->vbd_bio = NULL;
vbd->vbd_biotail = NULL;

vbd->rq = blk_alloc_queue(GFP_KERNEL);
if(vbd->rq == NULL){
goto out;
}
blk_queue_make_request(vbd->rq, vbd_make_request);
blk_queue_bounce_limit(vbd->rq, BLK_BOUNCE_ANY);

disk = vbd->gd = alloc_disk(NUM_MINORS);
if(disk == NULL){
goto out;
}
disk->major = VBD_MAJOR;
disk->first_minor = ind*NUM_MINORS;
disk->fops = &vbd_fops;
disk->private_data = (void *)vbd;
vbd->rq->queuedata = (void *)vbd;
disk->queue = vbd->rq;
disk->flags |= GENHD_FL_REMOVABLE;
sprintf(disk->disk_name, "v%dblk", ind);
init_waitqueue_head(&vbd->vbd_event);

return vbd;

out:
if(vbd && vbd->rq){
blk_cleanup_queue(vbd->rq);
}
if(vbd){
kfree(vbd);
}
return NULL;
}

static struct file* open_backing_file(const char *filename, int *ro, loff_t *size)
{
struct file *filp = NULL;
struct inode *inode = NULL;

*ro = 0;
*size = 0;

filp = filp_open(filename, O_RDWR | O_LARGEFILE, 0);
if (-EROFS == PTR_ERR(filp))
*ro = 1;
if (*ro)
filp = filp_open(filename, O_RDONLY | O_LARGEFILE, 0);
if (IS_ERR(filp)) {
printk("unable to open backing file: %s\n", filename);
return NULL;
}

if(!(filp->f_mode & FMODE_WRITE))
*ro = 1;

if(filp->f_path.dentry)
inode = filp->f_path.dentry->d_inode;

if(inode && S_ISBLK(inode->i_mode)){
if (bdev_read_only(inode->i_bdev))
*ro = 1;
}else if(!inode || !S_ISREG(inode->i_mode)){
printk("invalid file type: %s\n", filename);
filp_close(filp, current->files);
return NULL;
}

if (!filp->f_op || !(filp->f_op->read || filp->f_op->aio_read)) {
printk("file not readable: %s\n", filename);
filp_close(filp, current->files);
return NULL;
}
if (!(filp->f_op->write || filp->f_op->aio_write))
*ro = 1;

*size = i_size_read(inode->i_mapping->host);
if (*size < 0) {
printk("unable to find file size: %s\n", filename);
filp_close(filp, current->files);
return NULL;
}

get_file(filp);

filp_close(filp, current->files);
return filp;
}

static int io_handler(void *data)
{
struct vir_blk_dev* vbd = (struct vir_blk_dev *)data;
struct bio *bio;

allow_signal(SIGINT);
allow_signal(SIGTERM);
allow_signal(SIGKILL);
allow_signal(SIGUSR1);

set_user_nice(current, -20);
set_fs(get_ds());

while(!kthread_should_stop()){

down(&vbd->sem);
while(vbd->vbd_bio != NULL){
spin_lock_irq(&vbd->vbd_lock);
bio = vbd->vbd_bio;
if(bio == vbd->vbd_biotail){
vbd->vbd_biotail = NULL;
}
vbd->vbd_bio = bio->bi_next;
bio->bi_next = NULL;
spin_unlock_irq(&vbd->vbd_lock);

handle_bio(vbd, bio);
}
up(&vbd->sem);

wait_event_interruptible(vbd->vbd_event, (vbd->vbd_bio!=NULL) || kthread_should_stop());
}

down(&vbd->sem);

/* Flush any pending bio's */
while(vbd->vbd_bio != NULL){
spin_lock_irq(&vbd->vbd_lock);

bio = vbd->vbd_bio;
if(bio == vbd->vbd_biotail)
vbd->vbd_biotail = NULL;

vbd->vbd_bio = bio->bi_next;
bio->bi_next = NULL;

spin_unlock_irq(&vbd->vbd_lock);

bio_endio(bio, -EIO);
}

if(vbd->gd)
del_gendisk(vbd->gd);

if(vbd->rq)
blk_cleanup_queue(vbd->rq);

if(vbd->gd)
put_disk(vbd->gd);

vbd->vbd_thread = NULL;

up(&vbd->sem);

return 0;
}

static struct vir_blk_dev* create_vbd(int ind, const char *filename)
{
int ro = 0;
loff_t size = 0;
struct file* fl;
struct vir_blk_dev *vbd = NULL;

fl = open_backing_file(filename, &ro, &size);
if(fl == NULL)
goto retnull;

/* Check if the file is already being emulated */
list_for_each_entry(vbd, &vbd_head, vbd_list){
if(vbd->filp->f_path.dentry->d_name.hash == fl->f_path.dentry->d_name.hash){
printk("File already being emulated\n");
return NULL;
}
}

vbd = vblk_alloc(ind);
if(vbd == NULL){
printk(KERN_ALERT "Couldn't create anymore VBD!\n");
goto retnull;
}

vbd->filp = fl;
vbd->ro = ro;

vbd->sectspercyl_bits = NUM_SPC_BITS;
vbd->heads_bits = NUM_HEADS_BITS;
vbd->sect_size_bits = SECT_SIZE_BITS;
vbd->sects = size >> (vbd->sect_size_bits);

set_capacity(vbd->gd, vbd->sects);
vbd->cylinders = get_capacity(vbd->gd) >> (vbd->heads_bits + vbd->sectspercyl_bits);
spin_lock_init(&vbd->vbd_lock);
init_MUTEX(&vbd->sem);

/* Create a thread to handle Read/Write */
vbd->vbd_thread = kthread_create(io_handler, vbd, "vblk%d_thread", vbd->index);
if(IS_ERR(vbd->vbd_thread)){
printk(KERN_ALERT "Unable to create thread for %s!\n", vbd->gd->disk_name);
goto retnull;
}
wake_up_process(vbd->vbd_thread);
printk("File(%s) is being emulated as block device %s\n", filename, vbd->gd->disk_name);
add_disk(vbd->gd);

return vbd;

retnull:
if(vbd != NULL){
if(vbd->rq != NULL)
blk_cleanup_queue(vbd->rq);

if(vbd->gd != NULL)
put_disk(vbd->gd);

kfree(vbd);
}
return NULL;
}

static void pdev_release(struct device *dev)
{
return;
}

static int detach_vbd(const char *filename)
{
struct file *filp;
struct vir_blk_dev *next, *vbd;

filp = filp_open(filename, O_RDONLY | O_LARGEFILE, 0);
if(IS_ERR(filp))
return 0;

list_for_each_entry_safe(vbd, next, &vbd_head, vbd_list){
if(vbd->filp->f_path.dentry->d_name.hash == filp->f_path.dentry->d_name.hash){
if(vbd->use_count != 0){
printk("Can't unload busy device(%s)\n", vbd->gd->disk_name);
filp_close(filp, current->files);
return 0;
}
kthread_stop(vbd->vbd_thread);
while(vbd->vbd_thread != NULL)
schedule();
printk("Killed thread 'vblk%d_thread'\n", vbd->index);
list_del(&vbd->vbd_list);
kfree(vbd);
filp_close(filp, current->files);
return 1;
}
}
printk("File(%s) not being emulated\n", filename);
filp_close(filp, current->files);
return 0;
}

static int attach_vbd(char *filename)
{
int ind, i;
struct vir_blk_dev *vbd = NULL;

/* Find an available Ind */
ind = i = 0;
do{
i = ind;
list_for_each_entry(vbd, &vbd_head, vbd_list){
if(vbd->index == ind){
ind++;
break;
}
}
}while(ind != i);

vbd = create_vbd(ind, filename);
if(vbd != NULL){
list_add_tail(&vbd->vbd_list, &vbd_head);
return 1;
}else{
printk("Unable to load(%s)\n", filename);
return 0;
}
}

static ssize_t nodes_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
{
if(count>0 && buf[count-1] == '\n')
((char *) buf)[count-1] = 0;

if(buf[0] == '-' && detach_vbd(buf+1))
return count;
else if(buf[0] == '+' && attach_vbd(buf+1))
return count;
else
return -EINVAL;
}

static ssize_t nodes_show(struct device *dev, struct device_attribute *attr, char *buf)
{
ssize_t ret = 0;
struct vir_blk_dev *vbd;
char *fname;

list_for_each_entry(vbd, &vbd_head, vbd_list){
down(&vbd->sem);
fname = d_path(&vbd->filp->f_path, buf, PAGE_SIZE - 1);

if(IS_ERR(fname)){
ret = PTR_ERR(fname);
up(&vbd->sem);
break;
}
ret += sprintf(buf+ret, "%s' as '%s'\n", fname, vbd->gd->disk_name);
up(&vbd->sem);
}

return ret;
}

static DEVICE_ATTR(manage, S_IRUGO|S_IWUSR, nodes_show, nodes_store);

static int __init vb_init(void)
{
int ret = 0;
int op_done = 0;

ret = register_blkdev(VBD_MAJOR, VBD_NAME);
if(ret){
printk(KERN_NOTICE "Could not register_blkdev!\n");
ret = -EIO;
goto cleanup;
}else{
op_done |= REGBLKDEV;
}

vir_dev.release = pdev_release;
if(device_register(&vir_dev)){
ret = -EIO;
printk(KERN_NOTICE "Could not device_register!\n");
goto cleanup;
}else{
op_done |= REGPDEV;
}

if(device_create_file(&vir_dev, &dev_attr_manage)){
ret = -EIO;
printk(KERN_NOTICE "Could not device_create_file!\n");
goto cleanup;
}else{
op_done |= CRTDEVFL;
}

return ret;

cleanup:
if(op_done & CRTDEVFL){
device_remove_file(&vir_dev, &dev_attr_manage);
op_done &= ~CRTDEVFL;
}

if(op_done & REGPDEV){
device_unregister(&vir_dev);
op_done &= ~REGPDEV;
}

if(op_done & REGBLKDEV){
unregister_blkdev(VBD_MAJOR, VBD_NAME);
op_done &= ~REGBLKDEV;
}

return ret;
}

static void __exit vb_exit(void)
{
struct vir_blk_dev *vbd, *next;

list_for_each_entry_safe(vbd, next, &vbd_head, vbd_list){
kthread_stop(vbd->vbd_thread);
while(vbd->vbd_thread != NULL)
schedule();
printk("Killed thread 'vblk%d_thread'\n", vbd->index);
list_del(&vbd->vbd_list);
kfree(vbd);
}

device_remove_file(&vir_dev, &dev_attr_manage);
device_unregister(&vir_dev);

unregister_blkdev(VBD_MAJOR, VBD_NAME);
}

module_init(vb_init);
module_exit(vb_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("A Block Device Emulator");
--------------------------8<--------------------------


If i am not missing why this module shudn't be preferred, i would like to improve upon the code and try to contribute it back.

Of course, please CC the replies to me too.

Regards,
Jassi



2008-08-25 12:22:28

by Jochen Voß

[permalink] [raw]
Subject: Re: An idea .... with code

Hi,

I like this approach: it is nice that you don't require a special tool
like losetup with it, and that virblk can handle partitioned disks is
a nice bonus.

Some minor remarks below.
[ disclaimer: I am not an expert in this and might talk nonsense ]

2008/8/24 jassi brar <[email protected]>:
> [root@localhost jmod]# cat /sys/devices/virblk/manage
> /home/jassi/jmod/image' as 'v0blk'

Shouldn't this output be replaced by some format which is easier to
parse? This would make it easier for tools like mount to
automatically allocated virblk devices.

> /*
> * Kernel module to emulate a user space file as a block device transparently.
> *
> * Copyright (C) 2008 Jaswinder Singh Brar <jassi_singh_brar AT yahoo DOT com>
> *
> * This program is free software; you can redistribute it and/or
> * modify it under the terms of the GNU General Public License
> * as published by the Free Software Foundation; either version 2
> * of the License, or (at your option) any later version.
> *
> * This program is distributed in the hope that it will be useful,
> * but WITHOUT ANY WARRANTY; without even the implied warranty of
> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> * GNU General Public License for more details.
> *
> * You should have received a copy of the GNU General Public License
> * along with this program; if not, write to the Free Software
> * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> */
>
> #include <linux/module.h>
> #include <linux/init.h>
> #include <linux/moduleparam.h>
> #include <linux/init.h>
> #include <linux/hdreg.h>
> #include <linux/blkdev.h>
> #include <linux/kthread.h>
>
> #define VBD_MAJOR 121 /* Just some available */
> #define VBD_NAME "vblk"
>
> #define NUM_SPC_BITS 5 /* == 32 */
> #define NUM_HEADS_BITS 6 /* == 64 */
> #define SECT_SIZE_BITS 9 /* == 512 */
> #define NUM_MINORS 16
>
> #define REGBLKDEV (1<<0)
> #define REGPDEV (1<<1)
> #define CRTDEVFL (1<<2)
>
> /*
> * Various parameters could be configured at module load time or 'echo +' command.
> */
> struct vir_blk_dev{
> int index;
> int heads_bits;
> int cylinders;
> int sectspercyl_bits;
> int sect_size_bits;
> int use_count;
> int md_change;
> int ro;
> loff_t sects;
> struct file *filp;
> struct gendisk *gd;
> struct request_queue *rq;
> struct bio *vbd_biotail;
> struct bio *vbd_bio;
> struct list_head vbd_list;
> struct task_struct *vbd_thread;
> struct semaphore sem;
> spinlock_t vbd_lock;
> wait_queue_head_t vbd_event;
> };
>
> static LIST_HEAD(vbd_head);
>
> static struct device vir_dev = {
> .bus_id = "virblk",
> };
>
> static int data_xfer(struct vir_blk_dev *vbd, struct page *page, unsigned int len,
> unsigned int off, int wr, sector_t sector)

strange indentation

> {
> ssize_t ret;
> loff_t pos = sector * (1<<(vbd->sect_size_bits));
> void *buf = kmap(page) + off;
>
> if(sector + (len >> vbd->sect_size_bits) > get_capacity(vbd->gd)){

Kernel coding style seems to prefer spaces after "if" and before "{"
(more instances below).

> printk("Trying to move past the end!\n");
> kunmap(page);
> return -EIO;
> }
>
> if(wr)
> ret = vbd->filp->f_op->write(vbd->filp, buf, len, &pos);
> else
> ret = vbd->filp->f_op->read(vbd->filp, buf, len, &pos);
>
> kunmap(page);
>
> if(likely(ret == len))
> return 0;
>
> if(ret >= 0)
> ret = -EIO;
>
> return ret;
> }
>
> static void handle_bio(struct vir_blk_dev *vbd, struct bio *bio)
> {
> int i, ret = -EIO;
> struct bio_vec *bvec;
> sector_t sector;
>
> sector = bio->bi_sector;
> bio_for_each_segment(bvec, bio, i){
>
> ret = data_xfer(vbd, bvec->bv_page, bvec->bv_len, bvec->bv_offset,
> bio_data_dir(bio) == WRITE, sector);

strange indentation

> if(ret){
> break;

unnecessary brackets

> }
> sector += (bvec->bv_len >> vbd->sect_size_bits);
> }
>
> bio_endio(bio, ret);
> }
>
> static int vbd_make_request(struct request_queue *rq, struct bio *bio)
> {
> struct vir_blk_dev *vbd = (struct vir_blk_dev *)rq->queuedata;
>
> spin_lock_irq(&vbd->vbd_lock);
>
> if(vbd->vbd_biotail != NULL){
> vbd->vbd_biotail->bi_next = bio;
> vbd->vbd_biotail = bio;
> }else{
> vbd->vbd_bio = vbd->vbd_biotail = bio;
> }
>
> wake_up(&vbd->vbd_event);
> spin_unlock_irq(&vbd->vbd_lock);
> return 0;
> }
>
> static int jmod_open(struct inode *inode, struct file *filp)
> {
> struct vir_blk_dev *vbd = (struct vir_blk_dev *)inode->i_bdev->bd_disk->private_data;
>
> down(&vbd->sem);
> filp->private_data = (void *)vbd;
> vbd->use_count++;
> up(&vbd->sem);
>
> return 0;
> }
>
> static int jmod_release(struct inode *inode, struct file *filp)
> {
> struct vir_blk_dev *vbd = (struct vir_blk_dev *)inode->i_bdev->bd_disk->private_data;
>
> down(&vbd->sem);
> vbd->filp->f_op->fsync(vbd->filp, vbd->filp->f_path.dentry, 1);
> vbd->use_count--;
> up(&vbd->sem);
>
> return 0;
> }
>
> static int jmod_getgeo(struct block_device *bdev, struct hd_geometry *gm)
> {
> struct vir_blk_dev *vbd = bdev->bd_disk->private_data;
>
> down(&vbd->sem);
> gm->heads = 1<<(vbd->heads_bits);
> gm->sectors = 1<<(vbd->sectspercyl_bits);
> gm->cylinders = get_capacity(bdev->bd_disk) >> (vbd->heads_bits + vbd->sectspercyl_bits);
> up(&vbd->sem);

Is locking required here?

>
> return 0;
> }
>
> static int jmod_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
> {
> return 0;
> }

Probably you can just omit this method?

>
>
> static int jmod_mediachanged(struct gendisk *gd)
> {
> struct vir_blk_dev *vbd = gd->private_data;
>
> return vbd->md_change;
> }
>
> static int jmod_revalidatedisk(struct gendisk *gd)
> {
> struct vir_blk_dev *vbd = gd->private_data;
>
> down(&vbd->sem);
> if(vbd->md_change)
> vbd->md_change = 0;

Maybe just set "vbd->md_change = 0" unconditionally?

> up(&vbd->sem);
>
> return 0;
> }
>
> static struct block_device_operations vbd_fops = {
> .owner = THIS_MODULE,
> .open = jmod_open,
> .release = jmod_release,
> .ioctl = jmod_ioctl,
> .getgeo = jmod_getgeo,
> .media_changed = jmod_mediachanged,
> .revalidate_disk = jmod_revalidatedisk,
> };
>
> /*
> * ind : First minor number of the disk
> * */
> static struct vir_blk_dev *vblk_alloc(int ind)
> {
> struct gendisk *disk;
> struct vir_blk_dev *vbd;
>
> vbd = kzalloc(sizeof(*vbd), GFP_KERNEL);
> if (vbd == NULL)
> goto out;
>
> vbd->filp = NULL;
> vbd->vbd_thread = NULL;
> vbd->use_count = 0;
> vbd->md_change = 1;
> vbd->index = ind;
> vbd->vbd_bio = NULL;
> vbd->vbd_biotail = NULL;
>
> vbd->rq = blk_alloc_queue(GFP_KERNEL);
> if(vbd->rq == NULL){
> goto out;
> }
> blk_queue_make_request(vbd->rq, vbd_make_request);
> blk_queue_bounce_limit(vbd->rq, BLK_BOUNCE_ANY);
>
> disk = vbd->gd = alloc_disk(NUM_MINORS);
> if(disk == NULL){
> goto out;
> }
> disk->major = VBD_MAJOR;
> disk->first_minor = ind*NUM_MINORS;
> disk->fops = &vbd_fops;
> disk->private_data = (void *)vbd;
> vbd->rq->queuedata = (void *)vbd;
> disk->queue = vbd->rq;
> disk->flags |= GENHD_FL_REMOVABLE;

inconsistent indentation in this block

> sprintf(disk->disk_name, "v%dblk", ind);

Maybe snprintf?

> init_waitqueue_head(&vbd->vbd_event);
>
> return vbd;
>
> out:
> if(vbd && vbd->rq){
> blk_cleanup_queue(vbd->rq);
> }
> if(vbd){
> kfree(vbd);
> }

The "if(vbd)" is not neede, kfree can deal with NULL pointers.

> return NULL;
> }
>
> static struct file* open_backing_file(const char *filename, int *ro, loff_t *size)
> {
> struct file *filp = NULL;
> struct inode *inode = NULL;
>
> *ro = 0;
> *size = 0;
>
> filp = filp_open(filename, O_RDWR | O_LARGEFILE, 0);
> if (-EROFS == PTR_ERR(filp))
> *ro = 1;
> if (*ro)
> filp = filp_open(filename, O_RDONLY | O_LARGEFILE, 0);
> if (IS_ERR(filp)) {
> printk("unable to open backing file: %s\n", filename);
> return NULL;
> }
>
> if(!(filp->f_mode & FMODE_WRITE))
> *ro = 1;
>
> if(filp->f_path.dentry)
> inode = filp->f_path.dentry->d_inode;
>
> if(inode && S_ISBLK(inode->i_mode)){
> if (bdev_read_only(inode->i_bdev))
> *ro = 1;
> }else if(!inode || !S_ISREG(inode->i_mode)){
> printk("invalid file type: %s\n", filename);
> filp_close(filp, current->files);
> return NULL;
> }
>
> if (!filp->f_op || !(filp->f_op->read || filp->f_op->aio_read)) {
> printk("file not readable: %s\n", filename);
> filp_close(filp, current->files);
> return NULL;
> }
> if (!(filp->f_op->write || filp->f_op->aio_write))
> *ro = 1;
>
> *size = i_size_read(inode->i_mapping->host);
> if (*size < 0) {
> printk("unable to find file size: %s\n", filename);
> filp_close(filp, current->files);
> return NULL;
> }
>
> get_file(filp);
>
> filp_close(filp, current->files);
> return filp;
> }
>
> static int io_handler(void *data)
> {
> struct vir_blk_dev* vbd = (struct vir_blk_dev *)data;
> struct bio *bio;
>
> allow_signal(SIGINT);
> allow_signal(SIGTERM);
> allow_signal(SIGKILL);
> allow_signal(SIGUSR1);
>
> set_user_nice(current, -20);
> set_fs(get_ds());
>
> while(!kthread_should_stop()){
>
> down(&vbd->sem);
> while(vbd->vbd_bio != NULL){
> spin_lock_irq(&vbd->vbd_lock);
> bio = vbd->vbd_bio;
> if(bio == vbd->vbd_biotail){
> vbd->vbd_biotail = NULL;
> }

unneeded brackets

> vbd->vbd_bio = bio->bi_next;
> bio->bi_next = NULL;
> spin_unlock_irq(&vbd->vbd_lock);
>
> handle_bio(vbd, bio);
> }
> up(&vbd->sem);
>
> wait_event_interruptible(vbd->vbd_event, (vbd->vbd_bio!=NULL) || kthread_should_stop());

Maybe break long lines into two (in this case after the comma)?

> }
>
> down(&vbd->sem);
>
> /* Flush any pending bio's */
> while(vbd->vbd_bio != NULL){
> spin_lock_irq(&vbd->vbd_lock);
>
> bio = vbd->vbd_bio;
> if(bio == vbd->vbd_biotail)
> vbd->vbd_biotail = NULL;
>
> vbd->vbd_bio = bio->bi_next;
> bio->bi_next = NULL;
>
> spin_unlock_irq(&vbd->vbd_lock);
>
> bio_endio(bio, -EIO);
> }
>
> if(vbd->gd)
> del_gendisk(vbd->gd);
>
> if(vbd->rq)
> blk_cleanup_queue(vbd->rq);
>
> if(vbd->gd)
> put_disk(vbd->gd);
>
> vbd->vbd_thread = NULL;
>
> up(&vbd->sem);
>
> return 0;
> }
>
> static struct vir_blk_dev* create_vbd(int ind, const char *filename)
> {
> int ro = 0;
> loff_t size = 0;
> struct file* fl;
> struct vir_blk_dev *vbd = NULL;
>
> fl = open_backing_file(filename, &ro, &size);
> if(fl == NULL)
> goto retnull;
>
> /* Check if the file is already being emulated */
> list_for_each_entry(vbd, &vbd_head, vbd_list){
> if(vbd->filp->f_path.dentry->d_name.hash == fl->f_path.dentry->d_name.hash){
> printk("File already being emulated\n");

Use KERN_ERR?

> return NULL;
> }
> }
>
> vbd = vblk_alloc(ind);
> if(vbd == NULL){
> printk(KERN_ALERT "Couldn't create anymore VBD!\n");

Is KERN_ALERT ("action must be taken immediately") appropriate here?

> goto retnull;
> }
>
> vbd->filp = fl;
> vbd->ro = ro;
>
> vbd->sectspercyl_bits = NUM_SPC_BITS;
> vbd->heads_bits = NUM_HEADS_BITS;
> vbd->sect_size_bits = SECT_SIZE_BITS;
> vbd->sects = size >> (vbd->sect_size_bits);
>
> set_capacity(vbd->gd, vbd->sects);
> vbd->cylinders = get_capacity(vbd->gd) >> (vbd->heads_bits + vbd->sectspercyl_bits);
> spin_lock_init(&vbd->vbd_lock);
> init_MUTEX(&vbd->sem);
>
> /* Create a thread to handle Read/Write */
> vbd->vbd_thread = kthread_create(io_handler, vbd, "vblk%d_thread", vbd->index);
> if(IS_ERR(vbd->vbd_thread)){
> printk(KERN_ALERT "Unable to create thread for %s!\n", vbd->gd->disk_name);
> goto retnull;
> }
> wake_up_process(vbd->vbd_thread);
> printk("File(%s) is being emulated as block device %s\n", filename, vbd->gd->disk_name);

Use KERN_INFO?

> add_disk(vbd->gd);
>
> return vbd;
>
> retnull:
> if(vbd != NULL){
> if(vbd->rq != NULL)
> blk_cleanup_queue(vbd->rq);
>
> if(vbd->gd != NULL)
> put_disk(vbd->gd);
>
> kfree(vbd);
> }
> return NULL;
> }
>
> static void pdev_release(struct device *dev)
> {
> return;
> }
>
> static int detach_vbd(const char *filename)
> {
> struct file *filp;
> struct vir_blk_dev *next, *vbd;
>
> filp = filp_open(filename, O_RDONLY | O_LARGEFILE, 0);
> if(IS_ERR(filp))
> return 0;
>
> list_for_each_entry_safe(vbd, next, &vbd_head, vbd_list){
> if(vbd->filp->f_path.dentry->d_name.hash == filp->f_path.dentry->d_name.hash){
> if(vbd->use_count != 0){
> printk("Can't unload busy device(%s)\n", vbd->gd->disk_name);

KERN_ERR?

> filp_close(filp, current->files);
> return 0;
> }
> kthread_stop(vbd->vbd_thread);
> while(vbd->vbd_thread != NULL)
> schedule();
> printk("Killed thread 'vblk%d_thread'\n", vbd->index);

KERN_DEBUG?

> list_del(&vbd->vbd_list);

Is some kind of locking needed here?

> kfree(vbd);
> filp_close(filp, current->files);
> return 1;
> }
> }
> printk("File(%s) not being emulated\n", filename);

KERN_DEBUG?

> filp_close(filp, current->files);
> return 0;
> }
>
> static int attach_vbd(char *filename)
> {
> int ind, i;
> struct vir_blk_dev *vbd = NULL;
>
> /* Find an available Ind */
> ind = i = 0;
> do{
> i = ind;
> list_for_each_entry(vbd, &vbd_head, vbd_list){

Can the list change while we are iterating over it? If so, do wee
need locking here?

> if(vbd->index == ind){
> ind++;
> break;
> }
> }
> }while(ind != i);
>
> vbd = create_vbd(ind, filename);
> if(vbd != NULL){
> list_add_tail(&vbd->vbd_list, &vbd_head);
> return 1;
> }else{
> printk("Unable to load(%s)\n", filename);

KERN_ERR?

> return 0;
> }
> }
>
> static ssize_t nodes_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)

Maybe break the long line?

> {
> if(count>0 && buf[count-1] == '\n')
> ((char *) buf)[count-1] = 0;
>
> if(buf[0] == '-' && detach_vbd(buf+1))
> return count;
> else if(buf[0] == '+' && attach_vbd(buf+1))
> return count;
> else
> return -EINVAL;
> }

Maybe return more informative error codes here, e.g. -EBUSY if
detach_vdb fails due to a busy device?

>
> static ssize_t nodes_show(struct device *dev, struct device_attribute *attr, char *buf)
> {
> ssize_t ret = 0;
> struct vir_blk_dev *vbd;
> char *fname;
>
> list_for_each_entry(vbd, &vbd_head, vbd_list){

Can the list change while we are iterating over it? If so, do we need
to take the lock outside the loop?

> down(&vbd->sem);
> fname = d_path(&vbd->filp->f_path, buf, PAGE_SIZE - 1);
>
> if(IS_ERR(fname)){
> ret = PTR_ERR(fname);
> up(&vbd->sem);
> break;
> }
> ret += sprintf(buf+ret, "%s' as '%s'\n", fname, vbd->gd->disk_name);

Can buf overflow?

> up(&vbd->sem);
> }
>
> return ret;
> }
>
> static DEVICE_ATTR(manage, S_IRUGO|S_IWUSR, nodes_show, nodes_store);
>
> static int __init vb_init(void)
> {
> int ret = 0;
> int op_done = 0;
>
> ret = register_blkdev(VBD_MAJOR, VBD_NAME);
> if(ret){
> printk(KERN_NOTICE "Could not register_blkdev!\n");
> ret = -EIO;
> goto cleanup;

The example code in "Linux Device Drivers", third edition, p.469 uses
KERN_WARNING and returns -EBUSY in this case. No idea whether this is
important.

> }else{
> op_done |= REGBLKDEV;
> }
>
> vir_dev.release = pdev_release;

Should this be in the initialiser alongside .bus_id?

> if(device_register(&vir_dev)){
> ret = -EIO;
> printk(KERN_NOTICE "Could not device_register!\n");
> goto cleanup;
> }else{
> op_done |= REGPDEV;
> }
>
> if(device_create_file(&vir_dev, &dev_attr_manage)){
> ret = -EIO;
> printk(KERN_NOTICE "Could not device_create_file!\n");
> goto cleanup;
> }else{
> op_done |= CRTDEVFL;
> }
>
> return ret;
>
> cleanup:
> if(op_done & CRTDEVFL){
> device_remove_file(&vir_dev, &dev_attr_manage);

Can this ever be called?

> op_done &= ~CRTDEVFL;
> }
>
> if(op_done & REGPDEV){
> device_unregister(&vir_dev);
> op_done &= ~REGPDEV;
> }
>
> if(op_done & REGBLKDEV){
> unregister_blkdev(VBD_MAJOR, VBD_NAME);
> op_done &= ~REGBLKDEV;
> }
>
> return ret;
> }
>
> static void __exit vb_exit(void)
> {
> struct vir_blk_dev *vbd, *next;
>
> list_for_each_entry_safe(vbd, next, &vbd_head, vbd_list){
> kthread_stop(vbd->vbd_thread);
> while(vbd->vbd_thread != NULL)
> schedule();
> printk("Killed thread 'vblk%d_thread'\n", vbd->index);

KERN_INFO?

> list_del(&vbd->vbd_list);
> kfree(vbd);
> }
>
> device_remove_file(&vir_dev, &dev_attr_manage);
> device_unregister(&vir_dev);
>
> unregister_blkdev(VBD_MAJOR, VBD_NAME);
> }
>
> module_init(vb_init);
> module_exit(vb_exit);
>
> MODULE_LICENSE("GPL");
> MODULE_DESCRIPTION("A Block Device Emulator");

I hope this helps,
Jochen
--
http://seehuhn.de/

2008-08-25 20:54:16

by Marcin Slusarz

[permalink] [raw]
Subject: Re: An idea .... with code

On Mon, Aug 25, 2008 at 01:22:16PM +0100, =?ISO-8859-1?Q?Jochen_Vo=DF_ wrote:
> >
> > static LIST_HEAD(vbd_head);
> >
> > static struct device vir_dev = {
> > .bus_id = "virblk",
> > };
> >
> > static int data_xfer(struct vir_blk_dev *vbd, struct page *page, unsigned int len,
> > unsigned int off, int wr, sector_t sector)
>
> strange indentation
>
> > {
> > ssize_t ret;
> > loff_t pos = sector * (1<<(vbd->sect_size_bits));
> > void *buf = kmap(page) + off;
> >
> > if(sector + (len >> vbd->sect_size_bits) > get_capacity(vbd->gd)){
>
> Kernel coding style seems to prefer spaces after "if" and before "{"
> (more instances below).

It's better to check this patch with scripts/checkpatch.pl - it will tell you
about all style issues.

> > init_MUTEX(&vbd->sem);

You seem to use this semaphore as a mutex - why don't you convert it to mutex then?
Semaphores should be used only when they are absolutely necessary.

Marcin

2008-08-26 02:34:01

by Bill Davidsen

[permalink] [raw]
Subject: Re: An idea .... with code

jassi brar wrote:
> Hi,
>
> Lately a question has been bugging me: Why do we keep complicated(specific ioctls to set up and set free) LOOP driver to emulate a data file as block-device, for it to be mounted/formated.
>
> Wouldn't it be nice if we didnt have to specify '-o loop' or use losetup -d etc?
>
> With inspiration from UMS-Gadget and LOOP driver itself, I implemented a proof of concept: A kernel module which creats a node in /sys/devices/ and un/loads files to be emulated as block devices. The interface doesn't implement any new ioctls or a syscall.
>
This seems like a really interesting idea, but will it interface with
crypto applications like LUKS? Crypto seems to be a vital feature for
anything related to devices and/or filesystems if an approach is going
to be more than a niche slution.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2008-08-26 08:24:42

by Andi Kleen

[permalink] [raw]
Subject: Re: An idea .... with code

jassi brar <[email protected]> writes:

> Lately a question has been bugging me: Why do we keep complicated(specific ioctls to set up and set free)

Can you please expand a bit why you think losetup is that complicated
and what the problem is with it?

AFAIK you're essentially just moving a minimal version of losetup
(with missing features like no offsets etc.) into the kernel and
frankly I fail to see the beauty in that. Or rather if you start with
losetup, why stop at mount, modprobe, ifconfig, mkfs, fsck, ls[1], ...?

For me it seems more that most of the file system based command
interfaces (/proc/mtrr comes to mind) are quite hard to use and
I prefer a proper command line tool with a manpage and --help
and a real parser any day.

-Andi

[1] I'm sure someone could come up with some scheme to do ls using sysfs
and you could find someone on this list who said "cool" :)

2008-08-26 10:43:34

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: An idea .... with code

On Tue, Aug 26, 2008 at 10:24:30AM +0200, Andi Kleen wrote:
> jassi brar <[email protected]> writes:
>
> > Lately a question has been bugging me: Why do we keep complicated(specific ioctls to set up and set free)
>
> Can you please expand a bit why you think losetup is that complicated
> and what the problem is with it?
>
> AFAIK you're essentially just moving a minimal version of losetup
> (with missing features like no offsets etc.) into the kernel and
> frankly I fail to see the beauty in that. Or rather if you start with
> losetup, why stop at mount, modprobe, ifconfig, mkfs, fsck, ls[1], ...?
>
> For me it seems more that most of the file system based command
> interfaces (/proc/mtrr comes to mind) are quite hard to use

Name four.

> and I prefer a proper command line tool with a manpage and --help
> and a real parser any day.

Go write mtrrctl(1) and a manpage. You'll need it in case of
ioctl-based interface anyway.

2008-08-26 11:05:59

by Andi Kleen

[permalink] [raw]
Subject: Re: An idea .... with code

> > For me it seems more that most of the file system based command
> > interfaces (/proc/mtrr comes to mind) are quite hard to use
>
> Name four.

What do you mean?

>
> > and I prefer a proper command line tool with a manpage and --help
> > and a real parser any day.
>
> Go write mtrrctl(1) and a manpage. You'll need it in case of
> ioctl-based interface anyway.

That's actually a good idea.

-Andi

2008-08-27 00:44:21

by jassi brar

[permalink] [raw]
Subject: Re: An idea .... with code

Hi Jochen, Marcin and Bill,

> Some minor remarks below.
> It's better to check this patch with scripts/checkpatch.pl
Thanks for the feedback. I admit the code is not near perfection, but then
i think that comes after the community finds the idea any worth.

> will it interface with crypto applications like LUKS?
It should work with LUKS, as it transparently emulates a file as a block device.
Please suggest a test case(webpage?) to verify it against LUKS, as i don't know much about it.

As i said the code was just as a proof of concept hastily put together. I heard Linux folks hate Ideas without code :)
Btw, i personally believe not everyone with an idea could code well and not every good programmer is as well creative.

Regards,
-Jassi


2008-08-27 02:28:11

by jassi brar

[permalink] [raw]
Subject: Re: An idea .... with code

Hi Andi,

Before starting i would like to point out the 'philosophy' behind the idea.

Exceptions(special cases) make for entropy and hence complexity. Generalization keeps uniformity and hences Order and Ease.
We can't escape 'entropy' when emulating something that it is actually not, nevertheless we can minimize it by introducing as least and as localized changes as possible.
My idea congregates only determining changes at the lowest level(driver), unlike the respectable LOOP driver which hardcodes limits and adds to the entropy(new ioctls, special utilities etc). I am sure you, being a vetran, understand the point well.

I am not for discarding features from the system, but for implementing them in a way that is lesser intrusive.

> Can you please expand a bit why you think losetup is that complicated
> and what the problem is with it?
Sir, i don't object to losetup as such. It is a utility made(specifically?) for LOOP devices: which implements unnecessary gears and switches to operate.
I object only to what could be done without using ioctls and not to features that losetup provides for devices other than Loop(are there any?)

> AFAIK you're essentially just moving a minimal version of losetup
> (with missing features like no offsets etc.) into the kernel
Its simpler than that :)
All i do is implement non-ioctl method to load/unload a file as a (emulated)block device. And alloc/free resources for them on need basis rather than limiting them at driver-module load-time.

Further about some features of LOOP drivers like encryption: i think they are better done at filesystem level.
I believe any operation that could be done on a real block device shud be possible on an emulated one. And exactly that.
If there is any need to do some stuff(offsets, encryption etc) then that cud be done on a real block and we would surely already have utilities for doing that on real devices(and hences on a transparent emulated device).
I am unable to think of something practical that can _only_ be done using the 'offset feature' of losetup on /dev/loop AND which can not be accomplished for (emulated or not)block devices.
The point is to only make least intrusive and most generic code, for making easy the inheritance of standard system-wide features, utilities and usages.

LOOP driver was a brilliant idea for the early days of Linux. The days when one could relatively easily define new syscalls/ioctls.

> Or rather if you start with losetup, why stop at mount, modprobe, ifconfig, mkfs, fsck, ls[1], ...?
I am not starting with any application.
I start with removing redundancy in a driver(LOOP), if by doing that i make obsolete an application... am for it.

>[1] I'm sure someone could come up with some scheme to do ls using sysfs
> and you could find someone on this list who said "cool" :)
It would indeed be cool if the one doesn't make use of 'cat' 'grep' et al and the method is more efficient.
Otherwise it would just be four motobikes tugging a car :)

Regards,
Jassi



2008-08-27 07:24:19

by jassi brar

[permalink] [raw]
Subject: Re: An idea .... with code

Hi Andi,

Before starting i would like to point out the 'philosophy' behind the idea.

Exceptions(special cases) make for entropy and hence complexity. Generalization keeps uniformity and hences Order and Ease.
We can't escape 'entropy' when emulating something that it is actually not, nevertheless we can minimize it by introducing as least and as localized changes as possible.
My idea congregates only determining changes at the lowest level(driver), unlike the respectable LOOP driver which hardcodes limits and adds to the entropy(new ioctls, special utilities etc). I am sure you, being a vetran, understand the point well.

I am not for discarding features from the system, but for implementing them in a way that is lesser intrusive.

> Can you please expand a bit why you think losetup is that complicated
> and what the problem is with it?
Sir, i don't object to losetup as such. It is a utility made(specifically?) for LOOP devices: which implements unnecessary gears and switches to operate.
I object only to what could be done without using ioctls and not to features that losetup provides for devices other than Loop(are there any?)

> AFAIK you're essentially just moving a minimal version of losetup
> (with missing features like no offsets etc.) into the kernel
Its simpler than that :)
All i do is implement non-ioctl method to load/unload a file as a (emulated)block device. And alloc/free resources for them on need basis rather than limiting them at driver-module load-time.

Further about some features of LOOP drivers like encryption: i think they are better done at filesystem level.
I believe any operation that could be done on a real block device shud be possible on an emulated one. And exactly that.
If there is any need to do some stuff(offsets, encryption etc) then that cud be done on a real block and we would surely already have utilities for doing that on real devices(and hences on a transparent emulated device).
I am unable to think of something practical that can _only_ be done using the 'offset feature' of losetup on /dev/loop AND which can not be accomplished for (emulated or not)block devices.
The point is to only make least intrusive and most generic code, for making easy the inheritance of standard system-wide features, utilities and usages.

LOOP driver was a brilliant idea for the early days of Linux. The days when one could relatively easily define new syscalls/ioctls.

> Or rather if you start with losetup, why stop at mount, modprobe, ifconfig, mkfs, fsck, ls[1], ...?
I am not starting with any application.
I start with removing redundancy in a driver(LOOP), if by doing that i make obsolete an application... am for it.

>[1] I'm sure someone could come up with some scheme to do ls using sysfs
> and you could find someone on this list who said "cool" :)
It would indeed be cool if the one doesn't make use of 'cat' 'grep' et al and the method is more efficient.
Otherwise it would just be four motobikes tugging a car :)

Regards,
Jassi


2008-08-27 07:45:31

by Andi Kleen

[permalink] [raw]
Subject: Re: An idea .... with code

On Wed, Aug 27, 2008 at 12:24:07AM -0700, jassi brar wrote:
> Exceptions(special cases) make for entropy and hence complexity. Generalization keeps uniformity and hences Order and Ease.

I fail to see what your patch generalizes? AFAIK it just adds a new
more narrow (less features than the old one) interface to create loop devices.

> I am not for discarding features from the system, but for implementing them in a way that is lesser intrusive.

But you're adding more code which is more intrusive?

> > Can you please expand a bit why you think losetup is that complicated
> > and what the problem is with it?
> Sir, i don't object to losetup as such. It is a utility made(specifically?) for LOOP devices: which implements unnecessary gears and switches to operate.
> I object only to what could be done without using ioctls and not to features that losetup provides for devices other than Loop(are there any?)

Your goal is to replace all ioctls with sysfs files?

If it's that then I'm going on the book as saying that's a bad idea, especially
for this case. While ioctls have their problems they work quite well for many
things. I don't see any particular reason why ioctls should not be used
to configure loop devices.

-Andi

2008-08-27 09:57:41

by David Newall

[permalink] [raw]
Subject: Re: An idea .... with code

Andi Kleen wrote:
> I fail to see what your patch generalizes? AFAIK it just adds a new
> more narrow (less features than the old one) interface to create loop devices.
>

AFAYK it adds at least one new feature: partition tables.

2008-08-27 09:59:29

by Andi Kleen

[permalink] [raw]
Subject: Re: An idea .... with code

On Wed, Aug 27, 2008 at 07:27:26PM +0930, David Newall wrote:
> Andi Kleen wrote:
> > I fail to see what your patch generalizes? AFAIK it just adds a new
> > more narrow (less features than the old one) interface to create loop devices.
> >
>
> AFAYK it adds at least one new feature: partition tables.

kpartx does those for loop devices already.

-Andi

--
[email protected]

2008-08-27 12:38:57

by jassi brar

[permalink] [raw]
Subject: Re: An idea .... with code

--- On Wed, 8/27/08, Andi Kleen <[email protected]> wrote:

> I fail to see what your patch generalizes? AFAIK it just
> adds a new
> more narrow (less features than the old one) interface to
> create loop devices.
Btw, my code is not a patch to the loop driver, its an altogether new module. May i daresay, it aims squarely at drivers/block/loop.c and mean to replace it altogether.
My module generalizes in the way that it doesn't add or make use of any ioctl. It doesn't even export a variable and makes uses only of what other subsystems provide for(block, sysfs, vfs).
As far as features are concerned, please suggest me what could be done with /dev/loop0 and not for /dev/vblk?


> But you're adding more code which is more intrusive?
To be exact, i mean to _replace_ driver/block/loop.c, and hence _remove_ all the loop specific ioctls and max# limitations, with drivers/block/vblk.c :D



> Your goal is to replace all ioctls with sysfs files?

Please do have a look at the code.

I add only one sysfs interface (manage), which when read returns the status of all the files being emulated and when written updates(add/remove) emulation of a file.
The interface could be made more useful by echo'ing in other parameters along with filename for example:
echo +[r/w]+[sects]+[cyls]+[heads]+[filename] > /sys/devices/virblk/manage
for specifyinf readonly, cylinders, heads, sectsize for the file to be emulated.


> If it's that then I'm going on the book as saying
> that's a bad idea, especially
> for this case. While ioctls have their problems they work
> quite well for many
> things. I don't see any particular reason why ioctls
> should not be used
> to configure loop devices.
I don't intend to revolt against the concept of ioctls. Being an embedded linux engineer I do understand the importance of ioctls: i declare one every other day for configuring custom h/w: Graphics and Multimedia esp so.
As for loop devices, i think we can make do without'em.

Regards,
-Jassi


2008-08-27 12:45:19

by Andi Kleen

[permalink] [raw]
Subject: Re: An idea .... with code

> As far as features are concerned, please suggest me what could be done with /dev/loop0 and not for /dev/vblk?

Serious question? cryptoloop, offsets are the big ticket items,
i'm sure there are more.

-Andi

2008-08-27 14:49:46

by Kasper Sandberg

[permalink] [raw]
Subject: Re: An idea .... with code

On Wed, 2008-08-27 at 14:47 +0200, Andi Kleen wrote:
> > As far as features are concerned, please suggest me what could be done with /dev/loop0 and not for /dev/vblk?
>
> Serious question? cryptoloop, offsets are the big ticket items,
> i'm sure there are more.

As far as cryptoloop goes, isnt that obsoleted anyway? is there anyone
that doesent use dm-crypt by now?

as for offsets, isnt this a logical thing to do with devicemapper
aswell?

ofcourse loop/losetup cannot just be removed, theres probably LOTS of
scripts around which are being used.

>
> -Andi
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2008-08-27 14:59:58

by Andi Kleen

[permalink] [raw]
Subject: Re: An idea .... with code

On Wed, Aug 27, 2008 at 04:49:35PM +0200, Kasper Sandberg wrote:
> On Wed, 2008-08-27 at 14:47 +0200, Andi Kleen wrote:
> > > As far as features are concerned, please suggest me what could be done with /dev/loop0 and not for /dev/vblk?
> >
> > Serious question? cryptoloop, offsets are the big ticket items,
> > i'm sure there are more.
>
> As far as cryptoloop goes, isnt that obsoleted anyway?

It's like saying: FAT -- isn't that obsolete by now?
As long as there are crypto loop files around it's not obsolete.

> is there anyone
> that doesent use dm-crypt by now?

Wrong question. You lose.

-Andi

2008-08-28 01:24:17

by jassi brar

[permalink] [raw]
Subject: Re: An idea .... with code

--- On Wed, 8/27/08, Andi Kleen <[email protected]> wrote:
> On Wed, Aug 27, 2008 at 04:49:35PM +0200, Kasper Sandberg
> wrote:
> > As far as cryptoloop goes, isnt that obsoleted anyway?
>
> It's like saying: FAT -- isn't that obsolete by
> now?
> As long as there are crypto loop files around it's not
> obsolete.

Cryptoloop? No, my code isn't the 'same' as loop.c and hence can't provide for cryptoloop. I lose, you win :)
Cryptoloop was designed so b'coz 'loop' feature was available, not the other way round. Had we not had '-o loop', developers would have devised mechanism for encrypting block devices rather than exploiting the -o loop hack. In retrospect, it seems Cryptoloop was a bad design based upon a hack.
Encryption? Yes, my code does provide a way for it. It supports better dm-crypt/LUKS off the shelf.

Loop.c sure provides more 'features' but the point is that we can make do without them, which only adds to the complexity. A simpler and more generic driver could do us equal good.

> > is there anyone
> > that doesent use dm-crypt by now?
> Wrong question. You lose.
I see one mail from Andrew Morton suggesting removal of cryptoloop altogether in order to ramp up its obsoletion! I am only providing a more-transparent-way of using dm-crypt.

I assume the concede Kasper's point on 'offset' feature of loop.c.

Anyways, seems we are moving from discussion to debate. You've got the whole picture and now its purely your call to approve of it or not.

Better still, you could suggest how to make it better: we are for sure gonna drop cryptoloop for dm-crypt. What wud we do with the then-unnecessary crypt support of loop.c?

Or maybe, for the time being, it would be better to modify Loop.c to allow for runtime alloc/free of as many resources as needed and provide for sysfs interface to add/remove 'loop' rather than ioctls?

I had an idea and i cooked up a code. I wanted to put it somewhere for someone to stumble upon the idea sometime. I am done.

Best Regards,
-Jassi



2008-08-28 10:08:42

by Kasper Sandberg

[permalink] [raw]
Subject: Re: An idea .... with code

On Wed, 2008-08-27 at 17:02 +0200, Andi Kleen wrote:
> On Wed, Aug 27, 2008 at 04:49:35PM +0200, Kasper Sandberg wrote:
> > On Wed, 2008-08-27 at 14:47 +0200, Andi Kleen wrote:
> > > > As far as features are concerned, please suggest me what could be done with /dev/loop0 and not for /dev/vblk?
> > >
> > > Serious question? cryptoloop, offsets are the big ticket items,
> > > i'm sure there are more.
> >
> > As far as cryptoloop goes, isnt that obsoleted anyway?
>
> It's like saying: FAT -- isn't that obsolete by now?
> As long as there are crypto loop files around it's not obsolete.
I disagree, as i recall, i stopped using cryptoloop and went directly to
dm-crypt with compatibility for my already existing files?

>
> > is there anyone
> > that doesent use dm-crypt by now?
>
> Wrong question. You lose.
In the case that i am correct above, it would appear that i did not
loose, given that the newer stuff has backwards compatibility (which i
am fairly certain it does)

also do not forget that i did say its unlikely that loop be removed due
to its heavily "established" status.

>
> -Andi
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2008-08-30 02:43:44

by Bill Davidsen

[permalink] [raw]
Subject: Re: An idea .... with code

Kasper Sandberg wrote:
> On Wed, 2008-08-27 at 14:47 +0200, Andi Kleen wrote:
>>> As far as features are concerned, please suggest me what could be done with /dev/loop0 and not for /dev/vblk?
>> Serious question? cryptoloop, offsets are the big ticket items,
>> i'm sure there are more.
>
> As far as cryptoloop goes, isnt that obsoleted anyway? is there anyone
> that doesent use dm-crypt by now?

I, for one, have never been able to figure out how to use dm-crypt with
cryptoloop files. If I had, for instance, a file and knew that if I
started at a certain offset using a certain encryption with a certain
key, and I wanted to mount my part of it, I have never persuaded
dm-crypt to do that.

All proposed solutions have started with copy or convert, neither of
which are practical with this data.
>
> as for offsets, isnt this a logical thing to do with devicemapper
> aswell?
>
> ofcourse loop/losetup cannot just be removed, theres probably LOTS of
> scripts around which are being used.
>

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot