2002-08-27 04:44:24

by Andre Bonin

Subject: Loop devices under NTFS

Hello,
I don't know if this is kernel-related.

I'm trying to mount my Red Hat ISOs off an NTFS mount, but I get a
strange error. Here is the exact command I am entering:

"mount -o loop -r -t iso9660 /mnt/win_d/Source/Iso/Red\ Hat\
7.3/valhalla-i386-disc1.iso /mnt/rh7.3/cd1"

It gives me an error
"ioctl: LOOP_SET_FD: Invalid argument"

A quick grep found that "Invalid argument" comes from:
'acsi.c:322: { 0x24, "Invalid argument" }'

This might sound silly but I can't really seem to track it down.
Yes loop devices are enabled in my kernel.

It might interest you that I am running an SMP system (with the AMD
controller). But I don't think that should affect anything at such a
high level... You never know.

Anyone have any ideas?

Thanks!


*******************************
Andre Bonin
Student in Software Engineering
Lakehead University
Thunder Bay, Ontario
Canada
*******************************



2002-08-27 09:13:18

by Anton Altaparmakov

Subject: Re: Loop devices under NTFS

At 05:48 27/08/02, Andre Bonin wrote:
>I don't know if this is kernel-related.

It is.

>I'm trying to mount my Red Hat ISOs off an NTFS mount, but I get a strange
>error. Here is the exact command I am entering:
>
>"mount -o loop -r -t iso9660 /mnt/win_d/Source/Iso/Red\ Hat\
>7.3/valhalla-i386-disc1.iso /mnt/rh7.3/cd1"
>
>It gives me an error
>"ioctl: LOOP_SET_FD: Invalid argument"
>
>A quick grep found that "Invalid argument" comes from:
>'acsi.c:322: { 0x24, "Invalid argument" }'

Your grep was too quick...

>Anyone have any ideas?

Yes, you forgot to tell us which kernel version you are using! And even
more importantly, which NTFS driver version?

If you are using 2.4.x then you are likely to have the old ntfs driver
which does NOT support mounting of loopback devices.

If you are using 2.5.x then you have the new driver and what you did ought
to have worked without problems. Also, if you are using 2.4.18/19/20-pre
you could be using the new driver by installing our patches - you can get
the patch for 2.4.19 from:
http://linux-ntfs.sf.net/downloads.html
or you can clone/pull from our bitkeeper repository which is currently at
2.4.20-pre4 or so:
http://linux-ntfs.bkbits.net/ntfs-2.4

So check your kernel & ntfs versions: if your ntfs driver version is 1.*
then you know you have the old driver, and if you have 2.* then you know
you have the new driver.

Once again the old driver will not work. The new one should work just fine.

Best regards,

Anton


--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2002-08-27 12:36:43

by Adam J. Richter

Subject: Re: Loop devices under NTFS

At 05:48 27/08/02, Andre Bonin wrote:
>"mount -o loop -r -t iso9660 /mnt/win_d/Source/Iso/Red\ Hat\
>7.3/valhalla-i386-disc1.iso /mnt/rh7.3/cd1"
>
>It gives me an error
>"ioctl: LOOP_SET_FD: Invalid argument"

I believe that NTFS does not provide
address_space_operations->{prepare,commit}_write for plain files, so
the version of loop.c in stock 2.5.31 will not work.

I posted an update for loop.c on August 15th:
( http://marc.theaimsgroup.com/?l=linux-kernel&m=102941520919910&w=2 ).
That update included a hack (originally conceived by Andrew Morton and
originally implemented by Jari Ruusu) to use file_operations->{read,write}
for mounting files on file systems that do not provide
{prepare,commit}_write, although this involves an extra data copy if
a transformation (such as encryption) is selected.
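The selection Adam describes (prefer the page-cache path when prepare/commit_write exist, otherwise fall back to file_operations->{read,write}, otherwise refuse the bind) can be sketched in user space. This is a minimal illustration: the struct layouts, names, and return values below are made up for the sketch and are not the real 2.5 loop.c API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures involved. */
struct address_space_operations {
    int (*prepare_write)(void *page, unsigned from, unsigned to);
    int (*commit_write)(void *page, unsigned from, unsigned to);
};

struct file_operations {
    long (*read)(void *file, char *buf, size_t len);
    long (*write)(void *file, const char *buf, size_t len);
};

struct file {
    const struct address_space_operations *a_ops;
    const struct file_operations *f_op;
};

enum transfer_mode { LO_AOPS, LO_FILEOPS, LO_UNSUPPORTED };

/* The selection logic the thread describes: prefer the page-cache path,
 * fall back to file_operations->{read,write} (the Morton/Ruusu hack),
 * and refuse the file otherwise. */
enum transfer_mode loop_pick_transfer_mode(const struct file *f)
{
    if (f->a_ops && f->a_ops->prepare_write && f->a_ops->commit_write)
        return LO_AOPS;       /* zero extra copy; transform in the page */
    if (f->f_op && f->f_op->read && f->f_op->write)
        return LO_FILEOPS;    /* works, but costs an extra data copy */
    return LO_UNSUPPORTED;    /* LOOP_SET_FD fails: "Invalid argument" */
}

/* Dummy implementations so the sketch can be exercised. */
static int demo_prepare(void *page, unsigned from, unsigned to)
{ (void)page; (void)from; (void)to; return 0; }
static int demo_commit(void *page, unsigned from, unsigned to)
{ (void)page; (void)from; (void)to; return 0; }
static long demo_read(void *file, char *buf, size_t len)
{ (void)file; (void)buf; return (long)len; }
static long demo_write(void *file, const char *buf, size_t len)
{ (void)file; (void)buf; return (long)len; }

static const struct address_space_operations demo_aops = { demo_prepare, demo_commit };
static const struct file_operations demo_fops = { demo_read, demo_write };
```

In this picture, the old 2.4 ntfs driver is the LO_UNSUPPORTED case, which would explain the original poster's "Invalid argument" error.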

I submitted the loop.c update to Linus at least a week ago, but
I have not seen it appear in
ftp://ftp.kernel.org/pub/linux/kernel/people/dwmw2/bk-2.5.

Side note:

There are only a few file systems that provide writable files
without aops->{prepare,commit}_write. I think they are just tmpfs,
ntfs and intermezzo. If all file systems that provided writable files
could be expected to provide {prepare,commit}_write, I could eliminate
the file_ops->{read,write} code from loop.c. I wish I understood if
there really is a need for a file system to be able to provide a
writable random access file (as opposed to /dev/tty) that does not
have aops->{prepare,commit}_write. I would be interested in knowing
if there is anything preventing implementation of
{prepare,commit}_write in ntfs.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."




2002-08-27 12:42:19

by Anton Altaparmakov

Subject: Re: Loop devices under NTFS

On Tue, 27 Aug 2002, Adam J. Richter wrote:
> At 05:48 27/08/02, Andre Bonin wrote:
> >"mount -o loop -r -t iso9660 /mnt/win_d/Source/Iso/Red\ Hat\
> >7.3/valhalla-i386-disc1.iso /mnt/rh7.3/cd1"
> >
> >It gives me an error
> >"ioctl: LOOP_SET_FD: Invalid argument"
>
> I believe that NTFS does not provide
> address_space_operations->{prepare,commit}_write for plain files, so
> the version of loop.c in stock 2.5.31 will not work.

He is mounting read-only, which does not require prepare,commit_write, and
it works fine. I have done it myself, both in 2.4 and 2.5.

Further, the NTFS updates I posted a few days ago actually implement
prepare,commit write, so they make loop work completely.

> I posted an update for loop.c on August 15th:
> ( http://marc.theaimsgroup.com/?l=linux-kernel&m=102941520919910&w=2 ).
> That update included a hack (originally conceived by Andrew Morton and
> originally implemented by Jari Ruusu) to use file_operations->{read,write}
> for mounting files on file systems that do not provide
> {prepare,commit}_write, although this involves an extra data copy if
> a transformation (such as encryption) is selected.
>
> I submitted the loop.c update to Linus at least a week ago, but
> I have not seen it appear in
> ftp://ftp.kernel.org/pub/linux/kernel/people/dwmw2/bk-2.5.
>
> Side note:
>
> There are only a few file systems that provide writable files
> without aops->{prepare,commit}_write. I think they are just tmpfs,
> ntfs and intermezzo. If all file systems that provided writable files

NTFS doesn't provide write access at all. My posted updates implement
write in the new driver and they do it using prepare,commit write.

> could be expected to provide {prepare,commit}_write, I could eliminate
> the file_ops->{read,write} code from loop.c. I wish I understood if
> there really is a need for a file system to be able to provide a
> writable random access file (as opposed to /dev/tty) that does not
> have aops->{prepare,commit}_write. I would be interested in knowing
> if there is anything preventing implementation of
> {prepare,commit}_write in ntfs.

No, and it is in fact implemented already. As soon as Linus pulls from my
bk repository (http://linux-ntfs.bkbits.net/ntfs-2.5) the stock kernel
will have prepare,commit write in ntfs.

For the old ntfs driver still present in 2.4.x, there are no address space
operations at all, the page cache is in fact not used at all, so that
prevents the loop driver from working at all...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2002-08-27 13:11:17

by Christoph Hellwig

Subject: Re: Loop devices under NTFS

On Tue, Aug 27, 2002 at 05:40:52AM -0700, Adam J. Richter wrote:
> There are only a few file systems that provide writable files
> without aops->{prepare,commit}_write. I think they are just tmpfs,
> ntfs and intermezzo. If all file systems that provided writable files
> could be expected to provide {prepare,commit}_write, I could eliminate
> the file_ops->{read,write} code from loop.c.

This is the wrong level of abstraction. There is no reason why a filesystem
has to use the pagecache at all.

Note that there is a more severe bug in loop.c: its abuse of
do_generic_file_read.

2002-08-27 13:19:38

by Adam J. Richter

Subject: Re: Loop devices under NTFS

>On Tue, Aug 27, 2002 at 05:40:52AM -0700, Adam J. Richter wrote:
>> There are only a few file systems that provide writable files
>> without aops->{prepare,commit}_write. I think they are just tmpfs,
>> ntfs and intermezzo. If all file systems that provided writable files
>> could be expected to provide {prepare,commit}_write, I could eliminate
>> the file_ops->{read,write} code from loop.c.

>This is the wrong level of abstraction. There is no reason why a filesystem
>has to use the pagecache at all.

Are you complaining about something in loop.c, or are you just
saying that you'd like to see some kind of
generic_file_{prepare,commit}_write routines that plain files in all
writable filesystems could use?


>Note that there is a more severe bug in loop.c: its abuse of
>do_generic_file_read.

Could you please elaborate on this and give an example where
it returns incorrect data, deadlocks, generates a kernel oops, etc.?

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-27 13:23:35

by Christoph Hellwig

Subject: Re: Loop devices under NTFS

On Tue, Aug 27, 2002 at 06:23:48AM -0700, Adam J. Richter wrote:
> Are you complaining about something in loop.c,

Yes. Anything but the filesystem itself and the generic read/write path
is not supposed to use address space operations directly.

> >Note that there is a more severe bug in loop.c: its abuse of
> >do_generic_file_read.
>
> Could you please elaborate on this and give an example where
> it returns incorrect data, deadlocks, generates a kernel oops, etc.?

Depending on the filesystem implementation _anything_ may happen.
With current intree filesystems the only real life problem is that
it doesn't work on certain filesystems. I think at least the network
filesystems might be oopsable with some preparation.

2002-08-27 13:49:14

by Adam J. Richter

Subject: Re: Loop devices under NTFS

On Tue, 27 Aug 2002, Christoph Hellwig:
>Yes. Anything but the filesystem itself and the generic read/write path
>is not supposed to use address space operations directly.

Why?

According to linux-2.5.31/Documentation/Locking,
"->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop)."

Using the page cache in loop.c saves a copy when there is a
data transformation (such as encryption) involved, and that can be
important for reducing the cost of privacy.
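Adam's copy-saving argument can be illustrated with a toy transform: a plain XOR standing in for a real cipher. All names below are hypothetical and are not the loop/cryptoloop API; the point is only that the page-cache path can transform in place, while the file_ops fallback has to stage the data through a scratch buffer first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer function, standing in for loop's transformation
 * hook (a one-byte-key XOR "cipher" for illustration only). */
static void xfer_xor(unsigned char *dst, const unsigned char *src,
                     size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] ^ key;
}

/* Page-cache path: loop writes the transformed data directly into the
 * page obtained from prepare_write, so dst == src and nothing is copied. */
void encrypt_in_page(unsigned char *page, size_t len, unsigned char key)
{
    xfer_xor(page, page, len, key);     /* one pass, zero extra copies */
}

/* file_ops fallback: data first lands in a private scratch buffer, is
 * transformed into the destination, and only then handed to
 * file_operations->write: one extra copy of every byte. */
void encrypt_via_scratch(unsigned char *out, const unsigned char *in,
                         size_t len, unsigned char key)
{
    unsigned char scratch[4096];
    memcpy(scratch, in, len);           /* the extra copy Adam objects to */
    xfer_xor(out, scratch, len, key);
}
```

Both paths produce the same bytes; the difference is purely the memcpy on the fallback path, which is what matters on a high-throughput encrypted device.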

>> >Note that there is a more severe bug in loop.c: its abuse of
>> >do_generic_file_read.
>>
>> Could you please elaborate on this and give an example where
>> it returns incorrect data, deadlocks, generates a kernel oops, etc.?

>Depending on the filesystem implementation _anything_ may happen.
>With current in-tree filesystems the only real-life problem is that
>it doesn't work on certain filesystems.

Sorry for repeating myself here: if you're referring to the
stock loop.c not working with tmpfs because tmpfs lacks
{prepare,commit}_write, my patch works around that (based on Jari's
patch before mine, and a patch by Andrew Morton as well). I have yet
to hear a clear reason why any writable plain file on any given file
system could not have {prepare,commit}_write operations available.

>I think at least the network
>filesystems might be oopsable with some preparation.

Please come up with a clear example. I'm not asking you for a
test case that can produce it, just some narrative of the problem
occurring.

I am aware that you can get races if someone mounts a loop
device while accessing the underlying file by some other mechanism,
but I believe that the only case where that would be done in practice
is to change the encryption of a device, and, because of the read and
write patterns involved in that, it should not be a problem.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-27 17:00:15

by Christoph Hellwig

Subject: Re: Loop devices under NTFS

On Tue, Aug 27, 2002 at 06:53:19AM -0700, Adam J. Richter wrote:
> Why?
>
> According to linux-2.5.31/Documentation/Locking,
> "->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
> may be called from the request handler (/dev/loop)."

Just because it's present in current code it doesn't mean it's right.
Calling aops directly from generic code is a layering violation and
it will not survive 2.5.

> Using the page cache in loop.c saves a copy when there is a
> data transformation (such as encryption) involved, and that can be
> important for reducing the cost of privacy.

Separating a stackable encryption block device from the loop driver is
a good idea. The current loop code is a horrible mess because it tries
to do the job of three drivers in one.

> >Depending on the filesystem implementation _anything_ may happen.
> >With current in-tree filesystems the only real-life problem is that
> >it doesn't work on certain filesystems.
>
> Sorry for repeating myself here: If you're referring to the
> stock loop.c not working with tmpfs because tmpfs lacks
> {prepare,commit}_write which my patch works around (based on Jari's
> patch before mine, and a patch by Andrew Morton as well). I have yet
> to hear a clear reason why any writable plain file on any given file
> system could not have {prepare,commit}_write operations available.

No, tmpfs also does not use generic_file_read but a slight variation;
calling do_generic_file_read on tmpfs inodes will not always work as
expected. Don't do it.

> Please come up with a clear example. I'm not asking you for a
> test case that can produce it, just some narrative of the problem
> occurring.

loop on nfs: do_generic_file_read is called without the needed
nfs_revalidate_inode, thus i_size is outdated, and loop might happily
read beyond the file size.
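A user-space miniature of the stale-size hazard Christoph describes might look like this (illustrative names only, not the NFS client API): loop binds the file, caches its size, and meanwhile the file shrinks server-side, so any read clamped against the cached size can run past the real end of file.

```c
#include <assert.h>
#include <stddef.h>

struct inode { size_t i_size; };

/* What nfs_revalidate_inode conceptually does: refresh the cached size
 * from the server before trusting it. */
static void revalidate(struct inode *inode, size_t server_size)
{
    inode->i_size = server_size;
}

/* Returns how many bytes may safely be read at 'pos': 0 if pos is at or
 * past EOF. If i_size is stale, this clamp is wrong, which is exactly
 * how loop could "happily read beyond the file size". */
size_t safe_read_len(const struct inode *inode, size_t pos, size_t want)
{
    if (pos >= inode->i_size)
        return 0;
    size_t avail = inode->i_size - pos;
    return want < avail ? want : avail;
}
```

If the server truncates the file from 100 to 50 bytes and loop never revalidates, the clamp at pos 90 still claims 10 readable bytes that no longer exist.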

> I am aware that you can get races if someone mounts a loop
> device while accessing the underlying file by some other mechanism,
> but I believe that the only case where that would be done in practice
> is to change the encryption of a device, and, because of the read and
> write patterns involved in that, it should not be a problem.

This is true for filesystems like nfs (above) that only revalidate and then
call generic_file_read. For totally different implementations anything can
happen. Even if it mostly works it's not the kind of design we want to have
in the kernel.

2002-08-27 17:22:25

by Jan Harkes

Subject: Re: Loop devices under NTFS

On Tue, Aug 27, 2002 at 06:53:19AM -0700, Adam J. Richter wrote:
> >Depending on the filesystem implementation _anything_ may happen.
> >With current in-tree filesystems the only real-life problem is that
> >it doesn't work on certain filesystems.
>
> Sorry for repeating myself here: if you're referring to the
> stock loop.c not working with tmpfs because tmpfs lacks
> {prepare,commit}_write, my patch works around that (based on Jari's
> patch before mine, and a patch by Andrew Morton as well). I have yet
> to hear a clear reason why any writable plain file on any given file
> system could not have {prepare,commit}_write operations available.
...
> Please come up with a clear example. I'm not asking you for a
> test case that can produce it, just some narrative of the problem
> occurring.

Not all filesystems use generic_read/generic_write. If they did we
wouldn't need those calls in the fops structure.

Of course the prepare_write/commit_write were introduced later on, and
perhaps it is possible to modify all filesystems to put all their
custom functionality in these functions. Then we can simply remove the
read and write (and mmap?) fops, i.e. force everyone to use the provided
generic read/write functions.
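Jan's idea, a single generic write loop layered on prepare_write/commit_write that every filesystem could share, might be sketched like this, with a flat buffer standing in for the page cache and a tiny "page" size so the chunking is visible. All names and the hooks' signatures are hypothetical, not the 2.5 VFS API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 8               /* toy page size, to exercise chunking */

struct aops_file {
    unsigned char data[64];     /* backing "pages" */
    size_t size;                /* committed file length */
};

/* The two hooks a filesystem would supply; here they just bound-check
 * and extend the committed length. */
static int prepare_write(struct aops_file *f, size_t pos, size_t len)
{
    return pos + len <= sizeof f->data ? 0 : -1;   /* -ENOSPC, roughly */
}

static int commit_write(struct aops_file *f, size_t pos, size_t len)
{
    if (pos + len > f->size)
        f->size = pos + len;
    return 0;
}

/* The generic loop: split the write on page boundaries, then
 * prepare / copy / commit each chunk. */
long generic_aops_write(struct aops_file *f, size_t pos,
                        const unsigned char *buf, size_t count)
{
    size_t done = 0;
    while (done < count) {
        size_t in_page = PAGE_SZ - (pos % PAGE_SZ);
        size_t chunk = count - done < in_page ? count - done : in_page;
        if (prepare_write(f, pos, chunk) != 0)
            break;                       /* partial write on error */
        memcpy(f->data + pos, buf + done, chunk);
        commit_write(f, pos, chunk);
        pos += chunk;
        done += chunk;
    }
    return (long)done;
}
```

A write of 12 bytes at offset 3 lands as a 5-byte chunk (to the first page boundary) followed by a 7-byte chunk, with the file size extended by the commit hook.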

But as long as a filesystem is allowed to provide its own functions,
any wrapper should never assume that generic operations will work.

The clear example you are looking for is probably Coda; most other
filesystems simply wrap the generic_ function plus some minor
functionality (like setting the ctime).

Jan

2002-08-27 23:38:44

by Adam J. Richter

Subject: Re: Loop devices under NTFS

>On Tue, Aug 27, 2002 at 06:53:19AM -0700, Adam J. Richter wrote:
>> Why?
>>
>> According to linux-2.5.31/Documentation/Locking,
>> "->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
>> may be called from the request handler (/dev/loop)."

>Just because it's present in current code it doesn't mean it's right.
>Calling aops directly from generic code is a layering violation and
>it will not survive 2.5.

Only according to your own proclamation. You are arguing
circular logic, and I am arguing a concrete benefit: we can avoid an
extra copy of all data in the input and output paths going through
an encrypted device. While I don't have a benchmark to show you (and
the burden of proof is upon you since you want a change), an extra
copy of all data going through a potentially high-throughput
service (like, say, all of your file systems if you're running an
encrypted disk) is likely to have a substantial performance impact.
There is a real-world benefit at stake here: making strong
encryption as "cheap" to use as possible.


>> Using the page cache in loop.c saves a copy when there is a
>> data transformation (such as encryption) involved, and that can be
>> important for reducing the cost of privacy.

>Separating a stackable encryption block device from the loop driver is
>a good idea. The current loop code is a horrible mess because it tries
>to do the job of three drivers in one.

Just saying "good idea" is no substitute for an argument about
real world benefits, like performance, code footprint, etc.

Granted, loop.c is more complex than I would like it to be,
but other changes elsewhere in the kernel that I'd like to see would
make the same or better simplifications in loop.c without introducing
extra data copy in the critical path:

1. If every writable plain file had aops->{prepare,commit}_write, I could delete the file_ops->{read,write} transfer mode from
loop.c.

2. At some point, lvm2 device mapper will hopefully be merged
into the kernel. This will allow pushing drivers/md,
drivers/ide/*raid*.[ch], and even disk partitioning out of
the kernel to raidtools and partx. Device Mapper supports
data transformations in order to do RAID parity sectors.
This duplication between loop.c and lvm2 could be
eliminated by making a cryptoapi lvm2 plugin, and porting
the code to the device mapper to map files to devices.
That would also allow users to do things like RAID across
files on different remote file systems (so you don't need
to convince a central administration to reallocate space on
file servers to run nbd).


>> >Depending on the filesystem implementation _anything_ may happen.
>> >With current in-tree filesystems the only real-life problem is that
>> >it doesn't work on certain filesystems.
>>
>> Sorry for repeating myself here: if you're referring to the
>> stock loop.c not working with tmpfs because tmpfs lacks
>> {prepare,commit}_write, my patch works around that (based on Jari's
>> patch before mine, and a patch by Andrew Morton as well). I have yet
>> to hear a clear reason why any writable plain file on any given file
>> system could not have {prepare,commit}_write operations available.

>No, tmpfs also does not use generic_file_read but a slight variation;
>calling do_generic_file_read on tmpfs inodes will not always work as
>expected. Don't do it.

Your first sentence is not a clear reason why tmpfs cannot
provide {prepare,commit}_write, and your second sentence ("Don't do
it.") is not a reason.


>> Please come up with a clear example. I'm not asking you for a
>> test case that can produce it, just some narrative of the problem
>> occurring.

>loop on nfs: do_generic_file_read is called without the needed
>nfs_revalidate_inode, thus i_size is outdated, and loop might happily
>read beyond the file size.

>> I am aware that you can get races if someone mounts a loop
>> device while accessing the underlying file by some other mechanism,
>> but I believe that the only case where that would be done in practice
>> is to change the encryption of a device, and, because of the read and
>> write patterns involved in that, it should not be a problem.

>This is true for filesystems like nfs (above) that only revalidate and then
>call generic_file_read. For totally different implementations anything can
>happen. Even if it mostly works it's not the kind of design we want to have
>in the kernel.

I have to say I haven't seen much documentation of
address_space_operations aside from the code, and a few pages about
the page cache in _Understanding The Linux Kernel_. However, if you
believe that loop.c is relying on some guarantee that aops does not
officially provide but all of its implementations currently abide by,
then simply documenting that guarantee as "official" would result in a
kernel that lives within its guarantees and yet has faster performance
for software-encrypted block devices.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-27 23:55:39

by Christoph Hellwig

Subject: Re: Loop devices under NTFS

On Tue, Aug 27, 2002 at 04:42:56PM -0700, Adam J. Richter wrote:
> >On Tue, Aug 27, 2002 at 06:53:19AM -0700, Adam J. Richter wrote:
> >> Why?
> >>
> >> According to linux-2.5.31/Documentation/Locking,
> >> "->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
> >> may be called from the request handler (/dev/loop)."
>
> >Just because it's present in current code it doesn't mean it's right.
> >Calling aops directly from generic code is a layering violation and
> >it will not survive 2.5.
>
> Only according to your own proclamation. You are arguing
> circular logic, and I am arguing a concrete benefit: we can avoid an
> extra copy of all data in the input and output paths going through
> an encrypted device.

I tell you that the address_spaces are an _optional_ abstraction. Thus
using them directly from generic code is a layering violation. This layer
of indirection was added intentionally in 2.3, and if you want to get rid
of it again please submit a patch to Al to merge the aops back into the
inode_operations vector. Otherwise I will clean up the last remaining
violation of that layering rule in 2.5.

> While I don't have a benchmark to show you (and
> the burden of proof is upon you since you want a change), an extra
> copy of all data going through a potentially high-throughput
> service (like, say, all of your file systems if you're running an
> encrypted disk) is likely to have a substantial performance impact.
> There is a real-world benefit at stake here: making strong
> encryption as "cheap" to use as possible.

I am currently reviewing a patch from Jari Ruusu that not only
gets rid of the layering violation but also provides certain advantages
for the loop-AES crypto addon he wrote/maintains. I doubt he would do
so if it killed performance for his application. Nevertheless I must say
I don't care at all for the performance of encrypted loop: it's not merged,
and mainline correctness has a _much_ higher priority for me than
performance of external code.

You argue for performance at the cost of correctness.

> >Separating a stackable encryption block device from the loop driver is
> >a good idea. The current loop code is a horrible mess because it tries
> >to do the job of three drivers in one.
>
> Just saying "good idea" is no substitute for an argument about
> real world benefits, like performance, code footprint, etc.

Correctness and cleanness.

> >No, tmpfs also does not use generic_file_read but a slight variation;
> >calling do_generic_file_read on tmpfs inodes will not always work as
> >expected. Don't do it.
>
> Your first sentence is not a clear reason why tmpfs cannot
> provide {prepare,commit}_write, and your second sentence ("Don't do
> it.") is not a reason.

It could provide them, just for the sake of letting loop.c's layering
violation exist for a longer time. Due to its abuse of do_generic_file_read
it would continue to have another problem.

> I have to say I haven't seen much documentation of
> address_space_operations aside from the code, and a few pages about
> the page cache in _Understanding The Linux Kernel_. However, if you
> believe that loop.c is relying on some guarantee that aops does not
> officially provide but all of its implementations currently abide by,
> then simply documenting that guarantee as "official" would result in a
> kernel that lives within its guarantees and yet has faster performance
> for software-encrypted block devices.

If you think that the guarantee that every filesystem should be pagecache
backed is worth documenting (and adapting everything to it), feel free to
submit a patch for review. I have stated above why it's not a good idea.

2002-08-28 01:02:22

by Adam J. Richter

Subject: Re: Loop devices under NTFS

On Wed, 28 Aug 2002 at 00:59:55AM +0100, Christoph Hellwig wrote:
>On Tue, Aug 27, 2002 at 04:42:56PM -0700, Adam J. Richter wrote:
>> >On Tue, Aug 27, 2002 at 06:53:19AM -0700, Adam J. Richter wrote:
>> >> Why?
>> >>
>> >> According to linux-2.5.31/Documentation/Locking,
>> >> "->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
>> >> may be called from the request handler (/dev/loop)."
>>
>> >Just because it's present in current code it doesn't mean it's right.
>> >Calling aops directly from generic code is a layering violation and
>> >it will not survive 2.5.
>>
>> Only according to your own proclamation. You are arguing
>> circular logic, and I am arguing a concrete benefit: we can avoid an
>> extra copy of all data in the input and output paths going through
>> an encrypted device.

>I tell you that the address_spaces are an _optional_ abstraction.
>Thus using them directly from generic code is a layering violation.

That does not follow from your previous sentence. It is
perfectly legitimate to check for the existence of an optional feature
and use it if it is there, which is what the stock 2.5.31 loop.c and
my version do.

>This layer
>of indirection was added intentionally in 2.3, and if you want to get rid
>of it again please submit a patch to Al to merge the aops back into the
>inode_operations vector.

Regardless of whether aops and inode_operations are merged,
you haven't shown a problem with using aops when they are present. It
is not necessary to remerge these data structures in order to use an
optional feature if it is present.

>Otherwise I will clean up the last remaining
>violation of that layering rule in 2.5.

You've failed to show real end user benefits against the
disadvantage of slower encrypted devices and you're going to go ahead
anyway? I am trying to keep an open mind here, but if you can't
provide real reasons against performance benefits and nobody else
convinces me otherwise, and you insist on trying to put through these
changes, then I think Linux will be better off if I ask Linus to
reject your loop.c patches and encourage others to do likewise.


>> While I don't have a benchmark to show you (and
>> the burden of proof is upon you since you want a change), an extra
>> copy of all data going through a potentially high-throughput
>> service (like, say, all of your file systems if you're running an
>> encrypted disk) is likely to have a substantial performance impact.
>> There is a real-world benefit at stake here: making strong
>> encryption as "cheap" to use as possible.

>I am currently reviewing a patch from Jari Ruusu that not only
>gets rid of the layering violation but also provides certain advantages
>for the loop-AES crypto addon he wrote/maintains.

Why don't you list those advantages or at least some of them?

I have nothing against Jari, and I adapted his file_operations
change in my loop.c cleanup, but I should tell you that the last time
I looked at his changes, they created a much bigger loop.c. In
comparison, my version fixes the serious bug of loop.c allocating
n*(n+1)/2 pages for an n-page bio, cleans up the locking dramatically,
eliminates the need to preallocate a fixed number of loop devices,
exports the DMA parameters of each underlying loop device to enable
handling of bigger bios, eliminates unnecessary data copying (via
bio_copy, not just the aops stuff we have been talking about), and
generally makes it a lot more readable. Before I put in the
file_ops->{read,write} stuff, my changes actually shrank loop.c.

>I doubt he would do
>so if it killed performance for his application.

There is a difference between killing performance and making
enough of a difference to warrant an engineering decision.
Differences that warrant such changes can be small enough that you
need to do benchmarks to be sure of them. People present lots of
papers on measurement results in Linux at conferences because of this.
In general, an extra copy of all data on the input and output data
paths is a big deal.

>Nevertheless I must say
>I don't care at all for the performance of encrypted loop: it's not merged,
>and mainline correctness has a _much_ higher priority for me than
>performance of external code.

Again, your only definition of "correctness" is by your own
proclamation. Linux isn't just for your personal interests. Anyone
who filters patches on that basis is abusing the trust placed in him
or her. If you care that little about a major use of loop.c, then
Linux will be better off if you stay out of the patch approval path
for it.

>You argue for performance at the cost of correctness.

Again, your definition of "correct" is only your own proclamation
against my real world benefits.

>> >Separating a stackable encryption block device from the loop driver is
>> >a good idea. The current loop code is a horrible mess because it tries
>> >to do the job of three drivers in one.
>>
>> Just saying "good idea" is no substitute for an argument about
>> real world benefits, like performance, code footprint, etc.

>Correctness and cleanness.

If we discount circular proclamation ("correct is what I
say"), I think my loop.c patch is by far the cleanest implementation
(granted, I still need to submit a patch to deal with bio_copy
failing, but that is now a very rare occurrence since I fixed the
n**2 pages bug, and the infrastructure for it is in my latest patch).

[...]
>If you think that the guarantee that every filesystem should be pagecache
>backed is worth documenting (and adapting everything to it), feel free to
>submit a patch for review.

In the meantime, it is not a layering violation to check for
an optional feature and use it if available.

Actually, if someone implements
tmpfs->aops->{prepare,commit}_write, then I would be happy to have
writable /dev/loop work only on file systems that provide
{prepare,commit}_write, as I think the only remaining one would be
intermezzo (which perhaps could also be fixed, but I believe has few
users and is likely to be replaced by Lustre).


>I have stated above why it's not a good idea.

No, you haven't. I have asked you repeatedly. Please provide
a narrative that shows where there can be a behavior that users will
dislike more than the performance penalty of your approach. Walk me
through the call graph. If the facts are on your side, why can't you
explain such a scenario clearly?

If I wanted to argue circular proclamations like you are
doing, I could complain about the kernel code basically pretending to
be a user process with set_fs(), make hokey claims it is a "layering
violation" to call blk_run_queues, etc., but I don't, because I know
that benefits to the user are the purpose that disciplines are
supposed to serve, and that it sometimes delivers more benefits to
question and adjust such practices accordingly. In this particular
case, by the way, the documentation (Documentation/Locking) is even on
my side, so it's really you who are arguing for a change in practice,
and all you can cite is a circular proclamation that boils down to
"because I say so."

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-28 01:45:06

by Adam J. Richter

[permalink] [raw]
Subject: Re: Loop devices under NTFS

On Tue, 27 Aug 2002 at 13:26:44 -0400, Jan Harkes wrote:
>Not all filesystems use generic_read/generic_write. If they did we
>wouldn't need those calls in the fops structure.

My loop.c patch supports files that do not provide
aops->{prepare,commit}_write (derived from changes by Jari Ruusu
and Andrew Morton).

Christoph was arguing that even if the file provides
aops->{prepare,commit}_write, that there could be a problem using it.
I am looking for a clear example of that. I don't see the problem
with using this facility if you first check that it is provided.
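The pattern being defended here — treat {prepare,commit}_write as an
optional capability and fall back to f_op->{read,write} when the backing
filesystem does not provide it — can be sketched in plain userspace C.
The struct layouts and names below are simplified stand-ins, not the
real 2.5 kernel definitions:

```c
#include <stddef.h>

/* Simplified stand-ins for the 2.5-era method vectors; the real
 * kernel structures carry many more operations than shown here. */
struct address_space_operations {
	int (*prepare_write)(void *file, void *page, unsigned from, unsigned to);
	int (*commit_write)(void *file, void *page, unsigned from, unsigned to);
};

struct backing_file {
	const struct address_space_operations *a_ops;
};

enum loop_io_path { LOOP_USE_AOPS, LOOP_USE_FOPS };

/* Use the page-cache path only when the backing filesystem provides
 * both methods; otherwise fall back to f_op->{read,write}. */
enum loop_io_path loop_pick_io_path(const struct backing_file *f)
{
	if (f->a_ops && f->a_ops->prepare_write && f->a_ops->commit_write)
		return LOOP_USE_AOPS;
	return LOOP_USE_FOPS;
}

/* Example filesystems: one that fills in the methods (ext2-like) and
 * one that leaves them NULL (like the file systems listed below). */
static int stub_rw(void *file, void *page, unsigned from, unsigned to)
{
	(void)file; (void)page; (void)from; (void)to;
	return 0;
}

const struct address_space_operations ext2_like_aops = {
	.prepare_write = stub_rw,
	.commit_write  = stub_rw,
};
const struct address_space_operations bare_aops = { 0 };
```

Nothing here dictates behavior on filesystems that omit the methods;
they simply get the slower file-operations path.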

However, thank you for correcting me about Coda. I missed
that in the list of file systems that do not appear to provide
{prepare,commit}_write for plain files. I actually grepped around and
discussed this by email with Andrew Morton and Hugh Dickins on August
16th. Looking back at that email now, the list of file systems that
my grepping around suggested lacked {prepare,commit}_write for
writable files was:

tmpfs
coda
intermezzo
ncpfs

Side note:
>Of course the prepare_write/commit_write were introduced later on and
>perhaps it is possible to modify all filesystems to put all their
>custom functionality in these functions. Then we can simply remove the
>read and write (and mmap?) fops [...]

You still need read and write methods at least for files that
are not seekable (e.g., serial devices, network sockets, pipes), but I
think you could conceivably have everything else use generic page
cache routines.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-28 08:33:43

by Urban Widmark

[permalink] [raw]
Subject: Re: Loop devices under NTFS

On Tue, 27 Aug 2002, Adam J. Richter wrote:

> On Tue, 27 Aug 2002 at 13:26:44 -0400, Jan Harkes wrote:
> >Not all filesystems use generic_read/generic_write. If they did we
> >wouldn't need those calls in the fops structure.
>
> My loop.c patch supports files that do not provide
> aops->{prepare,commit}_write (derived from changes by Jari Ruusu
> and Andrew Morton).
>
> Christoph was arguing that even if the file provides
> aops->{prepare,commit}_write, that there could be a problem using it.
> I am looking for a clear example of that. I don't see the problem
> with using this facility if you first check that it is provided.

smbfs has aops but when used with the current loop.c it corrupts the file
it is using. I can't say that the error is in loop.c but it is the only
way I can trigger the corruption and the smbfs aops (locking) aren't all
that different from the nfs ones.

Here is an example:

# dd if=/dev/zero of=/opt/src/smbfs/share/iozone.tmp bs=1024 count=200000
(/opt/src/smbfs/share is exported by a localhost samba as tmp)
# mount -t smbfs -o guest //localhost/tmp /mnt/smb
# losetup /dev/loop0 /mnt/smb/iozone.tmp
# mke2fs /dev/loop0
# mount /dev/loop0 /mnt/tmp
# cp -a ~puw/src/linux/linux-2.4.18/* /mnt/tmp
<something>: Input/Output error
# dmesg
EXT2-fs error (device loop(7,0)): read_block_bitmap: Cannot read block
bitmap - block_group = 0, block_bitmap = 3
EXT2-fs error (device loop(7,0)): read_block_bitmap: Cannot read block
bitmap - block_group = 21, block_bitmap = 172033

I can't say that I understand the problem, and maybe it can be explained
by a need for revalidate as Christoph said earlier in this thread. But
there should be no size changes and any revalidate shouldn't change
anything.

When I asked what a filesystem must do to support loop on linux-fsdevel
(May) AM suggested changing loop to use file->read/write (yes, he cleverly
avoided answering my question :).

I made an ugly patch that fixed the corruption (but broke encryption),
to see if anyone cared about loop. Jari does, so he took the idea and
included it with the other things he wants from loop.


Maybe this problem is caused by loop.c not using the aops correctly and
maybe it is an example of what a layering violation can do. I wish I
understood this well enough to make it a clear example.

/Urban

2002-08-28 09:13:08

by Adam J. Richter

[permalink] [raw]
Subject: Re: Loop devices under NTFS

On Wed, 28 Aug 2002 10:36:29 +0200, Urban Widmark wrote:
>smbfs has aops but when used with the current loop.c it corrupts the file
>it is using. I can't say that the error is in loop.c but it is the only
>way I can trigger the corruption and the smbfs aops (locking) aren't all
>that different from the nfs ones.

I think you're just exercising a bug in the stock
loop.c that I fixed in a recent patch for 2.5.31 loop.c.
It occurred when the file system block size was less than
the page size, which I believe is what mke2fs will default to
for a ~200MB file system, as in your example. The fix is item #3
discussed in my posting of an earlier version of the loop.c patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=102924362322080&w=2

although I posted a newer version of the loop.c patch
(which also has the fix but does not discuss it in the message text)
here:

http://marc.theaimsgroup.com/?l=linux-kernel&m=102941520919910&w=2

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-28 15:46:34

by Andre Bonin

[permalink] [raw]
Subject: Re: Loop devices under NTFS

Adam J. Richter wrote:

>On Wed, 28 Aug 2002 at 00:59:55AM +0100, Christoph Hellwig wrote:
>
>
>>On Tue, Aug 27, 2002 at 04:42:56PM -0700, Adam J. Richter wrote:
>>
>>
>>>>On Tue, Aug 27, 2002 at 06:53:19AM -0700, Adam J. Richter wrote:
>>>>
>>>>
>>>>> Why?
>>>>>
>>>>> According to linux-2.5.31/Documentation/Locking,
>>>>>"->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
>>>>>may be called from the request handler (/dev/loop)."
>>>>>
>>>>>
>>>>Just because it's present in current code it doesn't mean it's right.
>>>>Calling aops directly from generic code is a layering violation and
>>>>it will not survive 2.5.
>>>>
>>>>
>>> Only according to your own proclamation. You are arguing
>>>circular logic, and I am arguing a concrete benefit: we can avoid an
>>>extra copying of all data in the input and output paths going through
>>>an encrypted device.
>>>
I'm new to kernel development and I've never fooled around with drivers
before (I do have a course in it this September though, woohoo!).

Why are you saying it would copy the data? Couldn't you just make some
sort of shared memory system that would let you decrypt/decompress the
data without having to do a copy? The way I see it, you can read the
block and pass it through the necessary mods using the same data. You
could get a weird race condition if your decompress and your decrypt
work on the same data at the same time, but that can easily be avoided.
The way I see it, the NTFS driver should be able to read the file and
decompress it. The loop driver should have access to that block without
having to do a copy to present it to a third-party driver, which then
reads the data and presents it as a filesystem.

Maybe we could even do away with loop.c; there should really be no
loop.c. A normal mount (/dev/hda1 for example) is a first-level mount.
If you let /mnt/foo/myfile.iso come from /dev/hda1, then the chain
should be /dev/hda1 -> ISO9660 module -> presentation.

I think the filesystem drivers should be written in such a way that they
are totally pluggable with each other, so that it doesn't matter where
the blocks are coming from and going to. You could have a mount of
/dev/hda1 of an iso containing an ext2 image within it, etc.

But like I said, I'm new to kernel development, so I think I might have
the wrong perspective.


-----------------------------------
Andre Bonin
Student in Software Engineering
Lakehead University
Thunder Bay,
Canada
-----------------------------------


2002-08-28 16:37:28

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Loop devices under NTFS

On Tue, Aug 27, 2002 at 06:06:32PM -0700, Adam J. Richter wrote:
> >I tell you that the address_spaces are an _optional_ abstraction.
> >Thus using them directly from generic code is a layering violation.
>
> That does not follow from your previous sentence. It is
> perfectly legitimate to check for the existence of an optional feature
> and use it if it is there, which is what the stock 2.5.31 loop.c and
> my version do.

I didn't say it is an optional feature. The filesystem may choose to use
that abstraction in a totally different way than the generic code. An
example of such a filesystem is GFS. Thus OpenGFS doesn't support loop
devices at all, and Sistina has to provide workaround patches for their
proprietary product.

> >This layer
> >of indirection was added intentionally in 2.3, and if you want to get rid
> >of it again please submit a patch to Al to merge the aops back into the
> >inode_operations vector.
>
> Regardless of whether aops and inode_operations are merged,
> you haven't shown a problem with using aops when they are present. It
> is not necessary to remerge these data structures in order to use an
> optional feature if it is present.

See above. Except for ->writepage, which must be callable by the VM
writeout code, filesystems may have totally different assumptions. For a
worst case example look at the (Open-)GFS cluster filesystem. For a not
so bad example look at XFS, which does in theory require additional
locking, although this doesn't seem exploitable in practice.

> You've failed to show real end user benefits against the
> disadvantage of slower encrypted devices and you're going to go ahead
> anyway?

The end-user benefit is that a user can use the loop device on top
of any filesystem without the danger of those races actually being
exposed for some random reason.

> I looked at his changes, they created a much bigger loop.c.

Yes, loop.c does grow.

> In
> comparison, my version fixes the serious bug of loop.c allocating
> n*(n+1)/2 pages for an n page bio, cleans up the locking dramatically,
> eliminates the need to preallocate a fixed number of loop devices,
> exports the DMA parameters of each underlying loop device to enable
> handling of bigger bios, eliminates unnecessary data copying (via
> bio_copy, not just the aops stuff we have been talking about), and
> generally makes it a lot more readable. Before I put in the
> file_ops->{read,write} stuff, my changes actually shrunk loop.c.

I don't complain about your other changes. These bugs were there before
your changes, so I certainly do not blame you.

> There is a difference between killing performance and making
> enough of a difference to warrant an engineering decision.
> Differences that warrant such changes can be small enough that you
> need to do benchmarks to be sure of them. People present lots of
> papers on measurement results in Linux at conferences because of this.
> In general, an extra copy of all data on the input and output data
> paths is a big deal.

Blarg. If you care for performance encrypted loop is the last thing you want.

> If you care that little about a major use of loop.c, then
> Linux will be better off if you stay out of the patch approval path
> for it.

I don't care about loop.c, really. And I don't plan to approve or reject
loop-specific patches. I care about its interaction with the VFS and
lowlevel filesystems.

2002-08-29 10:56:21

by Adam J. Richter

[permalink] [raw]
Subject: Re: Loop devices under NTFS

>On Tue, Aug 27, 2002 at 06:06:32PM -0700, Adam J. Richter wrote:
>> >I tell you that the address_spaces are an _optional_ abstraction.
>> >Thus using them directly from generic code is a layering violation.
>>
>> That does not follow from your previous sentence. It is
>> perfectly legitimate to check for the existence of an optional feature
>> and use it if it is there, which is what the stock 2.5.31 loop.c and
>> my version do.

>I didn't say it is an optional feature. The filesystem may choose to use
>that abstraction in a totally different way than the generic code. An
>example of such a filesystem is GFS. Thus OpenGFS doesn't support loop
>devices at all, and Sistina has to provide workaround patches for their
>proprietary product.

To the best of my knowledge, OpenGFS is not available for 2.5,
and OpenGFS patches the 2.4 kernel to use {prepare,commit}_write the
way it does (referring to
opengfs-0.0.92/kernel_patches/2.4.17/generic_file_write.patch,
although you should be happy to know that that should not be necessary
if you port it to 2.5, since 2.5 provides generic_file_write_nolock).

If OpenGFS wants to change the aops requirements given in
Documentation/filesystems/Locking, let's have that "review", as you
have argued for in this thread when the shoe was on the other foot.

It is worth noting that the shoe also seems to be on the other
foot with respect to your doctrine of giving priority to "mainline"
kernel conventions. Documentation/filesystems/Locking and the existing
drivers/block/loop.c are part of the mainline kernel, and the reasons
for cryptoapi not being in the mainline kernel partly have to do with
concerns about distributability (although a much smaller issue in some
countries than under their past crypto laws). In comparison, OpenGFS
is not in the mainline kernel, and I haven't heard any legal grumbles
from Sistina about it, so its exclusion is probably due to a
lack of users and/or maintenance. The last OpenGFS release, 0.0.92,
was seven months ago. Opengfs-users has had three messages since
July. Opengfs-devel has had a total of three in the month of August,
and has been quiet for three weeks. So, your doctrine about the
mainline kernel having priority argues against accommodating OpenGFS.
Fortunately, I don't give much weight to that policy. My point is
just that you only make trouble for yourself with these proclamations.

In other words, doctrinaire ex cathedra proclamations like "I
didn't say an optional feature" will not identify the most beneficial
design trade-off as reliably, convincingly, or even as quickly as
analyzing end user benefits. For example, everything above this line
could have been skipped if we would avoid "proof by
proclamation." Can we please try? Thanks in advance.

---------------------------------------------------------------------------

If OpenGFS is being made to work under 2.5, we should be able
to arrange for /dev/loop to work on OpenGFS without making users of
encrypted loop files pay the cost of the extra copy (more on this cost
below). Even if OpenGFS or a variant is never going to be ported to
2.5, I am willing to look at it as an example of the potential
benefits of removing aops->{prepare,commit}_write calls from loop.c,
although it may then be reasonable to conclude that the code should
not be changed until needed.

Here are the three approaches that I can think of and their
major pros and cons:

1. Make loop.c never use {prepare,commit}_write.
Pro: Shrinks loop.c
Con: adds a copy operation to most encrypted loop files.

2. As you mention (but do not endorse) in your posting to gfs-devel,
modify loop.c so that it does not use {prepare,commit}_write
on OpenGFS, but does on other file systems (to avoid a data copy).
This kludge could, for example, be any of these ~5-line changes:
a. strcmp(lo->backing_file->file_dentry->d_inode->i_sb->s_type->name, "gfs")
b. address_space_operations.pagecache_unwritable
c. an ioctl option passed via losetup
Pro: Other encrypted loop files avoid a copy.
Con: Encrypted loop files on GFS do an extra copy;
it's a kludge; option c is not automatic.

3. Make OpenGFS (and potentially other future file systems)
export a {prepare,commit}_write that works with loop.c, as
documented in Documentation/filesystems/Locking.
Pro: Encrypted loop files on all file systems including GFS
avoid a copy operation. If tmpfs follows suit, then
maybe loop.c can shrink (remove file->{read,write} IO).
Con: (To be discussed.)
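For concreteness, option 2a might look something like the sketch below.
The struct is a stand-in for the real dentry/superblock pointer chain
named in 2a, and the function name is invented for illustration:

```c
#include <string.h>

/* Stand-in for lo->backing_file->...->i_sb->s_type->name (option 2a). */
struct loop_backing {
	const char *fs_type_name;	/* e.g. "ext2", "gfs" */
};

/* Nonzero if loop should avoid {prepare,commit}_write and fall back
 * to f_op->{read,write} for this backing filesystem. */
int loop_must_avoid_aops(const struct loop_backing *lo)
{
	return strcmp(lo->fs_type_name, "gfs") == 0;
}
```

The obvious drawback, as noted above, is that a name match is a kludge:
every future filesystem with the same problem needs another entry.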

The pros and cons basically boil down to two questions:

1. How beneficial is it to avoid the copy?
(high benefit opposes #1, slightly supports #3 over #2)
2. What are the cons for #3?


Let's start with the cost of the copy, which you address here:

>Blarg. If you care for performance encrypted loop is the last thing you want.

I think you've got it backwards. Ultimately, the only reason
that people care about performance is to apply it to some useful
purpose. For example, perhaps you want to create an encrypted loop
file to store and view some movies (say, because you want to protect
the author's copyright interests), or for a small confidential database.

Bruce Schneier's x86 implementation of twofish encrypts at 18
cycles per byte on Pentium 3, which should be about 55MB/sec on a 1GHz
P3. Here is a URL for someone who claims to have an x86 AES
implementation that does up to 58MB/sec.:
http://fp.gladman.plus.com/cryptography_technology/rijndael/ and one
that does 45MB/sec on an 800MHz Pentium III:
http://home.cyber.ee/helger/implementations/. I believe that is about
the sustained transfer rate of one top of the line hard disk, though
the file system will have a somewhat slower sustained transfer rate on
one disk, due at least to some seeking. So, depending on CPU speed and
what other computing there is to be done, it is possible that with
read-ahead and write-behind, encryption on one CPU can be fast
enough to keep up with the maximum throughput of the filesystem on a
single disk drive.

In this context, the cost of an extra memory copy becomes
nearly as important as it is without encryption, perhaps more
important because CPU and memory bandwidth are now more of a gating
factor. Poking around the web,
http://old.lwn.net/2001/0405/a/sched-tests.php3 has some lmbench
numbers for a 1GHz computer, which give bcopy a speed of up to about
270MB/sec, or 20% of the time used by highly optimized encryption. As
the ratio of processor core speed to memory bandwidth increases, bcopy
will account for a larger penalty. Also, the sizes of the data
transfers that we are talking about in a single bio are a substantial
fraction of the size of the level 2 data cache (the L1 data cache will
be completely used either way).

Anyhow, I would guess that, given optimized encryption, the
extra copy will turn out to make a difference of at least 5% in
sustained bandwidth. That's not the kind of difference that brings a
system to its knees, but it is the kind of difference that people
benchmark and feel justified in adding or keeping extra code.
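The arithmetic behind these estimates is easy to check. The helper
names below are mine, not anything in the kernel or lmbench:

```c
/* Convert a cipher's cycles-per-byte cost into throughput at a given
 * clock rate, and express a memory copy's cost as a fraction of the
 * cipher's cost (time per byte is the reciprocal of throughput). */
double cipher_mb_per_sec(double cycles_per_byte, double cpu_hz)
{
	return cpu_hz / cycles_per_byte / 1e6;	/* MB/s, MB = 10^6 bytes */
}

double copy_overhead_fraction(double cipher_mb_s, double bcopy_mb_s)
{
	return cipher_mb_s / bcopy_mb_s;
}

/* 18 cycles/byte at 1 GHz gives ~55.6 MB/s; against a 270 MB/s bcopy
 * the copy adds roughly 55.6/270, i.e. about a fifth, to the cipher's
 * CPU cost. */
```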

Now let's look at the costs involved in having OpenGFS
allow loop.c to use {prepare,commit}_write.

I have downloaded opengfs-0.0.92.tar.gz and looked at
gfs_{prepare,commit}_write and gfs_unstuff_dinode, but I don't yet see
the specific problem that you refer to in your message on
opengfs-devel about gfs_{prepare,commit}_write assuming that a certain
lock is held.

When I try to build 0.0.92 under linux-2.5.31, I get
compilation errors such as src/locking/modules/memexp/memxp.h needing
<linux/locks.h>, which does not exist in 2.5, and the sourceforge cvs
version appears to have this dependency too. The documentation in
0.0.92 only talks about 2.4, and neither 0.0.92 nor the sourceforge
cvs tree has a src/fs/arch_linux_2_5.

I am guessing that the basic problem is that OpenGFS wants
to do something like this:

down(&inode->i_sem);
Acquire file lock via pluggable lock manager (gfs_glock_i?)
some other gfs-specific initialization
Call __generic_file_write

generic_file_write calls gfs_prepare_write
generic_file_write calls gfs_commit_write

generic_file_write calls gfs_prepare_write
generic_file_write calls gfs_commit_write
.
.
__generic_file_write returns
some other gfs-specific tear-down
Release file lock via pluggable lock manager
up(&inode->i_sem);


If I understand correctly, the situation is that you could
do the initialization and tear-down in every gfs_prepare_write and
gfs_commit_write.

Would it be possible to add two more address_space operations,
along the following lines, to provide a general mechanism for this
optimization?

struct address_space_operations {
int before_io(struct file *file, int dir, unsigned start, unsigned len);
int after_io(struct file *file, int dir, unsigned start, unsigned len);
};

generic_file_write is pretty big to begin with, so checking
these hooks once on each call to generic_file_write should not be a
big deal.

I don't think that this would need to go into the kernel until
OpenGFS is ported to 2.5, so I think there is plenty of time to get
everyone's input on it. Could you make OpenGFS work approximately as
optimally given that abstraction?
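A rough userspace sketch of how a write path could use such hooks
follows. All names, the fixed 4096-byte chunking, and the counting
stubs are illustrative assumptions, not proposed kernel code:

```c
/* Toy model: bracket the per-page prepare/commit loop with the
 * proposed before_io/after_io hooks, so a filesystem like GFS could
 * take its cluster lock once per write instead of once per page. */
struct hooked_aops {
	int (*before_io)(void *file, int dir, unsigned start, unsigned len);
	int (*after_io)(void *file, int dir, unsigned start, unsigned len);
	int (*prepare_write)(void *file, unsigned from, unsigned to);
	int (*commit_write)(void *file, unsigned from, unsigned to);
};

#define SKETCH_WRITE 1
#define SKETCH_PAGE_SIZE 4096u

long sketch_generic_write(const struct hooked_aops *a, void *file,
			  unsigned start, unsigned len)
{
	unsigned done = 0;

	/* All hooks are optional; a fs that needs no setup omits them. */
	if (a->before_io && a->before_io(file, SKETCH_WRITE, start, len))
		return -1;

	while (done < len) {
		unsigned chunk = len - done;
		if (chunk > SKETCH_PAGE_SIZE)
			chunk = SKETCH_PAGE_SIZE;
		if (a->prepare_write)
			a->prepare_write(file, start + done, start + done + chunk);
		if (a->commit_write)
			a->commit_write(file, start + done, start + done + chunk);
		done += chunk;
	}

	if (a->after_io)
		a->after_io(file, SKETCH_WRITE, start, len);
	return (long)done;
}

/* A fake GFS-like fs that counts how often its hooks run. */
static int lock_calls, unlock_calls, page_ops;
static int count_before(void *f, int d, unsigned s, unsigned l)
{ (void)f; (void)d; (void)s; (void)l; return lock_calls++, 0; }
static int count_after(void *f, int d, unsigned s, unsigned l)
{ (void)f; (void)d; (void)s; (void)l; return unlock_calls++, 0; }
static int count_page(void *f, unsigned from, unsigned to)
{ (void)f; (void)from; (void)to; return page_ops++, 0; }
```

The point of the shape is visible in the counters: the expensive
setup/tear-down runs once per write call, while prepare/commit still
run once per page.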

In the meantime, I'm quite willing to put a kludge in loop.c
to make it use file_ops->{read,write} if the underlying file is on a
file system type named "gfs". I would like to see OpenGFS work
without problems. I think it's great that you guys are doing it
(although perhaps you have lately been waiting to see how real Lustre
turns out to be?).

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-08-29 11:23:17

by Anton Altaparmakov

[permalink] [raw]
Subject: Re: Loop devices under NTFS

On Thu, 29 Aug 2002, Adam J. Richter wrote:
[snip]
> Here are the three approaches that I can think of and their
> major pros and cons:
>
> 1. Make loop.c never use {prepare,commit}_write.
> 2. As you mention (but do not endorse) in your posting to gfs-devel,
> modify loop.c so that it does not use {prepare,commit}_write
> on OpenGFS, but does on other file systems (to avoid a data copy).
> 3. Make OpenGFS (and potentially other future file systems)
> export a {prepare,commit}_write that works with loop.c, as
> documented in Documentation/filesystems/Locking.

And why not 4., have a per-fs flag (say fs_{,set_,clear_}generic_aops())
(or a per-superblock flag or whatever, perhaps even a per-address-space
flag?) specifying whether the fs' aops support loop or not. loop.c then
simply does:

if (fs_generic_aops()/fs_aops_support_loop()/whatever...)
use aops ->readpage and ->{prepare,commit}_write
else
use fops ->read and ->write

I guess that is like point 2, just making it a simple generic mechanism so
that loop always works, yet users of address spaces are free to implement
their ->readpage and ->{prepare,commit}_write any way they want...
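A concrete sketch of that mechanism (the flag name, helpers, and the
superblock stand-in are all invented for illustration):

```c
/* Per-superblock capability bit: set by filesystems whose aops behave
 * the generic way, consulted by loop instead of peeking at method
 * pointers or hardcoding filesystem names. */
#define FS_GENERIC_AOPS 0x01ul

struct sketch_super_block {
	unsigned long s_flags;
};

void fs_set_generic_aops(struct sketch_super_block *sb)
{
	sb->s_flags |= FS_GENERIC_AOPS;
}

void fs_clear_generic_aops(struct sketch_super_block *sb)
{
	sb->s_flags &= ~FS_GENERIC_AOPS;
}

int fs_generic_aops(const struct sketch_super_block *sb)
{
	return (sb->s_flags & FS_GENERIC_AOPS) != 0;
}

enum loop_path { LOOP_PATH_AOPS, LOOP_PATH_FOPS };

/* loop.c's dispatch then reduces to a single flag test. */
enum loop_path loop_choose(const struct sketch_super_block *sb)
{
	return fs_generic_aops(sb) ? LOOP_PATH_AOPS : LOOP_PATH_FOPS;
}
```

Defaulting the flag to clear makes the fallback path the safe default
for filesystems that never opt in.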

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2002-08-29 15:12:06

by Jari Ruusu

[permalink] [raw]
Subject: Re: Loop devices under NTFS

"Adam J. Richter" wrote:
> Bruce Schneier's x86 implementation of twofish encrypts at 18
> cycles per byte on Pentium 3, which should be about 55MB/sec on a 1GHz
> P3. Here is a URL for someone who claims to have an x86 AES
> implementation that does up to 58MB/sec.:
> http://fp.gladman.plus.com/cryptography_technology/rijndael/ and one
> that does 45MB/sec on an 800MHz Pentium III:
> http://home.cyber.ee/helger/implementations/. I believe that is about
> the sustained transfer rate of one top of the line hard disk (although
> the file system means there will be some seeks slowing that down), and
> the file system will have a slower sustained transfer rate on one
> disk, due at least to some seeking. So, depending on CPU speed and
> other computing that there is to be done, it is possible that with
> read-ahead and write-behind, encryption on one CPU can be fast
> enough to keep up with the maximum throughput of the filesystem on a
> single disk drive.

Adam,

If you had followed what is going on with loop crypto, you would have known
that loop-AES' AES cipher is based on Dr Brian Gladman's implementation, and
as such is about twice as fast as the one in cryptoapi. Here is some data
on an AMD Duron 800 MHz:

key length 128 bits, encrypt speed 354.3 Mbits/sec
key length 128 bits, decrypt speed 359.3 Mbits/sec
key length 192 bits, encrypt speed 298.8 Mbits/sec
key length 192 bits, decrypt speed 297.7 Mbits/sec
key length 256 bits, encrypt speed 258.8 Mbits/sec
key length 256 bits, decrypt speed 260.6 Mbits/sec

So if you really cared about speed, you would not be using cryptoapi. Also,
your loop patch will deadlock when used to encrypt swap. My loop patch does
not.

Regards,
Jari Ruusu <[email protected]>