2006-03-24 22:25:14

by Michael Halcrow

Subject: eCryptfs Design Document

On Fri, Nov 18, 2005 at 10:16:59PM -0800, Andrew Morton wrote:
> If Linux is going to offer a feature like this then people have to
> be able to trust it to be quite secure. What we don't want to
> happen is to distribute it for six months and then be buried in
> reports of vulnerabilities from cryptography specialists. Even worse
> if those reports lead to exploits.
>
> So I guess what I'm asking is: has this code been reviewed by crypto
> experts? Bearing in mind that it'll be world-class crypto people
> who will try to poke holes in it.

We have been in a constant process of design and code review over the
last several months. Among others, the eCryptfs design has been
reviewed by colleagues in the security and cryptography groups at
IBM's TJ Watson Research Center, including Shai Halevi, Ron Perez,
Dave Safford, Reiner Sailer, and Wietse Venema. Suggestions and other
useful feedback from these formal and informal reviews have been
incorporated. The general consensus was that the design of eCryptfs
version 0.1 is sufficient to provide data confidentiality given the
threat model detailed in Section 2 of the design document. In short,
the threats considered are along the lines of theft or loss of the
physical media containing the data.

The PDF document is obtainable from the eCryptfs SourceForge file
download section:

http://sourceforge.net/project/showfiles.php?group_id=133988

I also have it posted on the eCryptfs web site:

http://ecryptfs.sourceforge.net/ecryptfs_design_doc_v0_1.pdf

I have included an inlined plaintext rendering of the design document
at the end of this message.

The source code itself has been undergoing scrutiny by various users;
we have also been running it through static code analyzers such as
Coverity.

I am submitting this document to the public for review by other
parties who are interested in reviewing the design and implementation
of eCryptfs. This document reflects the design of the eCryptfs
filesystem version 0.1, which the eCryptfs development team is
currently recommending for inclusion into the Linux kernel. If there
are no objections to the design at this point, the patch set for the
current eCryptfs version can be posted to the LKML in the near future.

Thanks,
Mike Halcrow
[email protected]
http://ecryptfs.sourceforge.net

---

eCryptfs v0.1 Design Document

Michael A. Halcrow

Table of Contents

1 Introduction
2 Threat Model
3 Functional Overview
  3.1 VFS Objects
  3.2 VFS Operations
    3.2.1 Mount
    3.2.2 File Open
    3.2.3 Page Read
    3.2.4 Page Write
    3.2.5 File Truncation
    3.2.6 File Close
4 Cryptographic Properties
  4.1 Key Management
  4.2 Cryptographic Confidentiality Enforcement
  4.3 File Format
    4.3.1 Marker
  4.4 Deployment Considerations
  4.5 Cryptographic Summary


1 Introduction

This document details the design for eCryptfs. eCryptfs is a
POSIX-compliant enterprise-class stacked cryptographic filesystem for
Linux. It is derived from Erez Zadok's Cryptfs, implemented through
the FiST framework for generating stacked filesystems. eCryptfs stores
cryptographic metadata in the header of each file written, so that
encrypted files can be copied between hosts; the file will be
decryptable with the proper key, and there is no need to keep track of
any additional information aside from what is already in the encrypted
file itself.

eCryptfs is a native Linux filesystem. It builds as a stand-alone
kernel module for the Linux kernel; there is no need to apply any
kernel patches.

The developers are implementing eCryptfs features on a staged
basis. The first stage (version 0.1) includes mount-wide passphrase
support and data confidentiality enforcement. The second stage
(version 0.2) will include mount-wide public key support and data
integrity enforcement. The third stage (version 0.3) will include
per-file policy support. This document provides a technical
description of the eCryptfs filesystem release version 0.1. eCryptfs
version 0.1 is now complete, and the developers are recommending that
eCryptfs be merged into the mainline Linux kernel.

Michael Halcrow has published two papers covering eCryptfs at the
Ottawa Linux Symposium (2004 and 2005). The eCryptfs paper is on page
209 of the first of the two halves of the proceedings document. These
papers provide a high-level overview of eCryptfs, along with extensive
discussion of various topics relating to filesystem security in Linux.


2 Threat Model

eCryptfs version 0.1 protects data confidentiality in the event that
an unauthorized agent gains access to the data in a context that is
outside the control of the host operating environment. A secret
passphrase predicates access to the unencrypted contents of each
individual file object. An agent without the passphrase secret
associated with any given file should not be able to discern any
strategic information about the contents of any given encrypted file,
aside from what can be deduced from the file name, the file size, or
other metadata associated with the file. It should be about as
difficult to attack an encrypted eCryptfs file as it is to attack a
file encrypted by GnuPG (using the same cipher, key, etc.).

No intermediate state of the file on disk should be more easily
attacked than the final state of the file on disk; in the event of a
system error or power failure during an eCryptfs operation, no
partially written content should weaken the file's
confidentiality. Attackers should not be able to detect via a
watermarking attack whether an eCryptfs user is storing any particular
plaintext. We assume that an attacker potentially has access to every
intermediate state of an encrypted file on secondary storage.

eCryptfs offers no additional access control functions other than what
is already implementable via standard POSIX file permissions,
Mandatory Access Control mechanisms (e.g., SELinux), and so
forth. Release 0.1 does not include integrity verification; that
feature will be included in a later release.


3 Functional Overview

eCryptfs is a stacked filesystem that is implemented natively in the
Linux kernel VFS. Since eCryptfs is stacked, it does not write
directly into a block device. Instead, it mounts on top of a directory
in a lower filesystem. Almost any POSIX-compliant filesystem can act as
a lower filesystem; EXT2, EXT3, and JFS are known to work with
eCryptfs. Objects in the eCryptfs filesystem, including inode, dentry,
and file objects, correspond on a one-to-one basis with the objects in
the lower filesystem.

eCryptfs encrypts and decrypts the contents of the file; release 0.1
passes through other attributes of the file unencrypted, such as the
file size, the file name, the access permissions, the timestamp, and
the extended attributes. Directory contents are also passed through
unencrypted.

eCryptfs is derived from Cryptfs[ref:cryptfs], which is part of the
FiST framework developed and maintained by Erez Zadok[ref:fist].

3.1 VFS Objects

eCryptfs maintains the reference between the objects in the eCryptfs
filesystem and the objects in the lower filesystem. The references to
the lower filesystem objects are maintained from eCryptfs via (1) the
file object's private_data pointer, (2) the inode object's
u.generic_ip pointer, (3) the dentry object's d_fsdata pointer, and
(4) the superblock object's s_fs_info pointer. The pointers for the
eCryptfs dentry, file, and superblock objects only reference the
corresponding lower filesystem objects.

The inode u.generic_ip pointer references a data structure that
contains state information for cryptographic operations and a
reference to the lower inode object. The ecryptfs_crypt_stat structure
is the inode cryptographic state structure. eCryptfs fills in the
ecryptfs_crypt_stat struct from information stored in the header
region of the lower file (for existing files) or from the mount-wide
policy (for newly created files).

3.2 VFS Operations

3.2.1 Mount

At mount time, a helper application generates an authentication token
for the passphrase specified by the user. eCryptfs uses the keyring
support in the Linux kernel to store the authentication token in the
user's session keyring. A mount parameter contains the identifier for
this authentication token. eCryptfs retrieves the authentication token
from the session keyring using this identifier. It then uses the
contents of the authentication token to set up the cryptographic
context for newly created files. It also uses the contents of the
authentication token to access the contents of previously created
files.

3.2.2 File Open

When an existing file is opened in eCryptfs, eCryptfs opens the lower
file and reads in the header. The existence of an eCryptfs marker is
verified, the flags are parsed, and then the packet set is parsed. The
key identifier contained in the header is matched against the
mount-wide key identifier specified at mount time. If eCryptfs cannot
match the key identifier with the one specified at mount time, the
open fails with a -EIO error code. eCryptfs generates a root
initialization vector by taking the MD5 sum of the file encryption
key; the root IV is the first N bytes of that MD5 sum, where N is the
number of bytes constituting an initialization vector for the cipher
being used for the file (it is worth noting that known plaintext
attacks against the MD5 hash algorithm do not affect the security of
eCryptfs, since eCryptfs only hashes secret values).
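
The root IV derivation described above can be sketched in a few lines
of userspace Python. This is an illustration only, not the kernel
code: the function name derive_root_iv and the default of 16 IV bytes
(the AES block size) are assumptions made for the example.

```python
import hashlib


def derive_root_iv(file_key: bytes, iv_bytes: int = 16) -> bytes:
    """Return the first iv_bytes bytes of MD5(file_key).

    Hypothetical helper; iv_bytes=16 assumes a cipher with a 16-byte
    IV such as AES.
    """
    if not 0 < iv_bytes <= hashlib.md5().digest_size:
        raise ValueError("iv_bytes must be between 1 and 16 for MD5")
    return hashlib.md5(file_key).digest()[:iv_bytes]
```

Because an MD5 digest is 16 bytes long, this construction can only
supply IVs of up to 16 bytes.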

While processing the header information, eCryptfs modifies the
ecryptfs_crypt_stat struct associated with the eCryptfs inode
object. The modifications to the ecryptfs_crypt_stat structure
include:

* Setting various flags, such as ECRYPTFS_ENCRYPTED.

* Writing the inode file encryption key.

* Writing the cipher name.

* Writing the root initialization vector.

* Filling in the array of authentication token
  signatures for the authentication tokens associated
  with the inode.

* Setting the number of header pages.

* Setting the extent size.

eCryptfs later uses this information when performing VFS operations.

When a file is opened that does not yet exist, the ecryptfs_crypt_stat
structure is initialized according to the mount-wide policy for
release 0.1. This information is used to generate and write the file
header prior to any further VFS operations:

* The file is encrypted.

* The cipher is AES-128.

* The root IV is derived from the MD5 hash of the file encryption key.

* The only authentication token associated with the
  file is the mount-wide passphrase specified at mount time.

* There is one header page.

* The extent size is equal to the kernel's configured
  page size.

Once the ecryptfs_crypt_stat structure is filled in, eCryptfs
initializes the kernel crypto API cryptographic context for the
inode. The cryptographic context is initialized in CBC mode and is
used in all subsequent page reads and writes.

3.2.3 Page Read

Reads can only occur on an open file, and a file can only be opened if
an applicable authentication token exists in the user's session
keyring at the time that the VFS syscall that effectively opens the
file takes place.

On a page read, the eCryptfs page index is interpolated into the
corresponding lower page index, taking into account the header page in
the file. eCryptfs derives the initialization vector for the given
page index by concatenating the ASCII text representation of the page
offset to the root initialization vector bytes for the inode and
taking the MD5 sum of that string.
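
As a rough sketch of the per-page IV derivation just described (an
illustration, not the kernel implementation): the name
derive_extent_iv is hypothetical, and the example assumes that the
index is rendered in decimal with no padding or separator and that
the resulting digest is truncated to the length of the root IV.

```python
import hashlib


def derive_extent_iv(root_iv: bytes, extent_index: int) -> bytes:
    """Return MD5(root_iv || ASCII(extent_index)), cut to the IV length.

    Assumptions: decimal ASCII rendering without padding, and
    truncation of the digest to len(root_iv) bytes.
    """
    if extent_index < 0:
        raise ValueError("extent_index must be non-negative")
    material = root_iv + str(extent_index).encode("ascii")
    return hashlib.md5(material).digest()[:len(root_iv)]
```

Since the root IV is secret and differs per file, two files holding
the same plaintext at the same page index still get different IVs.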

eCryptfs then reads in the encrypted page from the lower file and
decrypts the page. eCryptfs first sets up the cryptographic structures
to perform the decryption. It then makes the call to the kernel crypto
API to perform the decryption for the page (in release 0.1, an extent
is equivalent to a page). This decrypted page is what results from the
VFS page read syscall.

3.2.4 Page Write

On a page write, eCryptfs performs a set of operations similar to those
that occur on a page read, except that the data is encrypted rather than
decrypted. The lower index is interpolated, the initialization vector
is derived, the page is encrypted with the file encryption key via the
kernel crypto API, and the encrypted page is written out to the lower
file.

3.2.5 File Truncation

When a file is either truncated to a smaller size or extended to a
larger size, eCryptfs updates the filesize field (the first 8 bytes of
the lower file) accordingly. When seeking past the end of the file,
eCryptfs writes encrypted strings of zeros between the previous end
of the file and the new end of the file.

3.2.6 File Close

In eCryptfs release 0.1, the packet set in the header never changes
after the file is initially created. When a file is no longer being
accessed, the kernel VFS frees its associated file, dentry, and inode
objects according to the standard resource deallocation process in the
VFS; eCryptfs does not perform any further cryptographic operations on
the file.


4 Cryptographic Properties

4.1 Key Management

RFC 2440 (OpenPGP)[ref:rfc2440] heavily influences the design of
eCryptfs, although deviations from the RFC are necessary to support
random access in a filesystem. eCryptfs stores RFC 2440-compatible
packets in the header for each file. Packet types used include Tag 3
(passphrase) and Tag 11 (literal). Each file has a unique file
encryption key associated with it; the file encryption key acts as a
symmetric key to encrypt and decrypt the file contents and is
analogous to the session key referenced in RFC 2440. eCryptfs
generates that file encryption key via the Linux kernel
get_random_bytes() function call at the time that a file is
created. The length of the file encryption key is dependent upon the
cipher being used. By default, eCryptfs selects AES-128. Later
versions of eCryptfs will allow the user to select the cipher and key
length.

Active eCryptfs inodes contain cryptographic contexts, with one unique
context per unique inode. This context exists in a data structure that
contains such things as the file encryption key, the cipher name, the
root initialization vector, signatures of authentication tokens
associated with the inode, various flags indicating inode
cryptographic properties, pointers to crypto API structs, and so
forth. The ecryptfs_crypt_stat struct definition is in the
ecryptfs_kernel.h header file and is comprised of the elements in the
following table:

+------------------+--------------------+----------------------------------+
| Name             | Type               | Description                      |
+------------------+--------------------+----------------------------------+
| lock             | Mutex              | Mutex for crypt stat object      |
+------------------+--------------------+----------------------------------+
| root_iv          | Byte Array         | The root initialization vector   |
+------------------+--------------------+----------------------------------+
| iv               | Byte Array         | The current cached               |
|                  |                    | initialization vector            |
+------------------+--------------------+----------------------------------+
| key              | Byte Array         | The file encryption key          |
+------------------+--------------------+----------------------------------+
| cipher           | Byte Array         | Kernel crypto API cipher         |
|                  |                    | description string               |
+------------------+--------------------+----------------------------------+
| keysig           | Byte Array         | Signature for authentication     |
|                  |                    | token associated with the inode  |
+------------------+--------------------+----------------------------------+
| flags            | Bit vector         | Status flags (encrypted, etc.)   |
+------------------+--------------------+----------------------------------+
| iv_bytes         | Integer            | Length of IV                     |
+------------------+--------------------+----------------------------------+
| num_header_pages | Integer            | Number of header pages for       |
|                  |                    | lower file                       |
+------------------+--------------------+----------------------------------+
| extent_size      | Integer            | Number of bytes in an extent     |
+------------------+--------------------+----------------------------------+
| key_size_bits    | Integer            | Length of file encryption key    |
|                  |                    | in bits                          |
+------------------+--------------------+----------------------------------+
| tfm              | Crypto API Context | Bulk data crypto context         |
+------------------+--------------------+----------------------------------+
| md5_tfm          | Crypto API Context | MD5 crypto context               |
+------------------+--------------------+----------------------------------+

The file encryption key is encrypted and stored in the first extent of
the lower (encrypted) file. It is encrypted with a key-encrypting key
that is derived from the authentication token.
Authentication token types reflect the encryption mechanism. There is
one "global" passphrase authentication token that eCryptfs generates
at mount time from the user's specified passphrase. Conversion of a
passphrase into a key follows the S2K process as described in RFC
2440, in that the passphrase is concatenated with a salt; that data
block is then iteratively MD5-hashed 65,536 times to generate the key
that encrypts the file encryption key.
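
One literal reading of the passphrase-to-key step in the paragraph
above can be sketched as follows. The function name and several
details are assumptions made for illustration: that the salt precedes
the passphrase, that each round hashes the previous 16-byte digest,
and that the first hash counts toward the 65,536 rounds. Note that
the iterated and salted S2K specifier in RFC 2440 itself is defined
differently (it feeds a repeated salt-plus-passphrase stream of a
given octet count into a single hash), so this sketch follows the
wording of this document rather than the RFC.

```python
import hashlib

S2K_ITERATIONS = 65536  # round count stated in the design document


def passphrase_to_key(passphrase: bytes, salt: bytes,
                      iterations: int = S2K_ITERATIONS) -> bytes:
    """Derive a 16-byte key by repeated MD5 hashing.

    Assumptions: salt || passphrase ordering, and each later round
    hashing the previous digest.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    digest = hashlib.md5(salt + passphrase).digest()
    for _ in range(iterations - 1):
        digest = hashlib.md5(digest).digest()
    return digest
```

The 16-byte MD5 output matches the 128-bit key length of the default
AES-128 cipher.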

eCryptfs stores authentication tokens in the user's session keyring (a
component of the Linux kernel keyring service). Helper scripts place
the authentication token containing the mount-wide passphrase into the
user session keyring at mount time.

When eCryptfs opens an encrypted file, it attempts to match the
authentication token contained in the header of the file against the
instantiated authentication token for the mount point. If the
authentication token for the mount point matches the authentication
token in the header of the file, then it uses that instantiated
authentication token to decrypt the file encryption key that is used
to encrypt and decrypt the file contents on page write and read
operations.

4.2 Cryptographic Confidentiality Enforcement

eCryptfs enforces the confidentiality of the data that is outside the
control of the host operating environment by encrypting the contents
of the file objects containing the data. eCryptfs utilizes the Linux
kernel cryptographic API to perform the encryption and decryption of
the contents of its files over subregions known as extents.

In release 0.1, the length of each extent is fixed to the page size
(typically 4,096 bytes). Since each file encrypted by eCryptfs
contains a header page, the encrypted file in the lower filesystem
will always be one page larger than the unencrypted file delivered by
eCryptfs; eCryptfs transparently maps the page indices between the
eCryptfs file and the lower file on read and write operations. Each
extent is independently encrypted in CBC mode.
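
The index mapping for release 0.1 amounts to shifting every page by
the number of header pages. A minimal sketch, assuming a 4,096-byte
page and one header page; the constant and function names are
hypothetical:

```python
PAGE_SIZE = 4096       # assumed page size; the kernel's value may differ
NUM_HEADER_PAGES = 1   # release 0.1 uses a single header page


def lower_page_index(upper_index: int,
                     num_header_pages: int = NUM_HEADER_PAGES) -> int:
    """Map an eCryptfs page index to the page index in the lower file."""
    if upper_index < 0:
        raise ValueError("page index must be non-negative")
    return upper_index + num_header_pages


def lower_extent_offset(upper_index: int,
                        page_size: int = PAGE_SIZE,
                        num_header_pages: int = NUM_HEADER_PAGES) -> int:
    """Byte offset in the lower file where the encrypted extent starts."""
    return lower_page_index(upper_index, num_header_pages) * page_size
```

With these defaults, eCryptfs page 0 is stored at byte offset 4096 of
the lower file, directly after the header page.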

eCryptfs derives the initialization vector (IV) for each extent from a
root initialization vector that is unique for each file. The root IV
is a subset of the MD5 hash of the file encryption key for the
file. The extent IV derivation process entails taking the MD5 sum of
the secret root IV concatenated with the ASCII decimal characters
representing the extent index.

When a readpage() request comes through as the result of a VFS
syscall, eCryptfs will interpolate the page index to find the
corresponding extent in the lower (encrypted) file. eCryptfs reads
this extent in and then decrypts it; each extent is encrypted with
whatever cipher that eCryptfs selected for the file at the time the
file was created (in release 0.1, this defaults to the AES-128
cipher). Each extent region is independent of the other extent
regions; they are not chained in any way.

When a writepage() request comes through as a result of a VFS syscall,
eCryptfs will read the target extent from the lower file using the
process described in the prior paragraph. The data on that page is
modified according to the write request. The entire (modified) page is
re-encrypted (again, in CBC mode) with the same IV and key that were
used to originally encrypt the page; the newly encrypted page is then
written out to the lower file.

Future releases will include support for integrity verification.

4.3 File Format

This release only supports a mount-wide passphrase, and so the packet
set consists only of a single Tag 3 followed by a single Tag 11
packet. These packets store the encrypted file encryption key and
adhere to the specification given in RFC 2440.

The first 20 bytes consist of the file size, the eCryptfs marker, and
a set of status flags. From byte 20 on, only RFC 2440-compliant
packets are valid.

Page 0:
  Octets 0-7:   Unencrypted file size
  Octets 8-15:  eCryptfs special marker
  Octets 16-19: Flags
    Octet 16:     File format version number (between 0 and 255)
    Octets 17-18: Reserved
    Octet 19:     Bit 1 (lsb): Reserved
                  Bit 2:       Encrypted?
                  Bits 3-8:    Reserved
  Octet 20:     Begin RFC 2440 authentication token packet set
Page 1:
  Extent 0 (CBC encrypted)
Page 2:
  Extent 1 (CBC encrypted)
...
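
To make the layout concrete, here is a small userspace sketch that
splits the first 20 octets of page 0 into the fields listed above.
The byte order of the file size field is not stated in this document,
so big-endian is an assumption here, as are the function and constant
names.

```python
import struct

HEADER_FIXED_LEN = 20
FLAG_ENCRYPTED = 0x02  # "Bit 2" of octet 19, counting the lsb as bit 1


def parse_fixed_header(page0: bytes) -> dict:
    """Split the fixed 20-octet header prefix into its fields.

    Assumption: the 8-octet file size is stored big-endian.
    """
    if len(page0) < HEADER_FIXED_LEN:
        raise ValueError("header page is shorter than 20 octets")
    file_size, marker, version, _reserved, flags = struct.unpack(
        ">Q8sB2sB", page0[:HEADER_FIXED_LEN])
    return {
        "file_size": file_size,
        "marker": marker,
        "version": version,
        "encrypted": bool(flags & FLAG_ENCRYPTED),
        "packets_offset": HEADER_FIXED_LEN,  # RFC 2440 packets start here
    }
```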

In the RFC 2440 packet set, each Tag 3 (passphrase) packet is
immediately followed by a Tag 11 (literal) packet containing the
identifier for the passphrase in the Tag 3 packet. This identifier is
formed by hashing the key that is generated from the passphrase in the
String-to-Key (S2K) operation. Release 0.1 only supports one Tag 3/Tag
11 pair, which correlates with the mount-wide passphrase.

4.3.1 Marker

The eCryptfs marker for each file is formed by generating a 32-bit
random number (X) and writing it immediately after the 8-byte file
size at the head of the lower file. The hexadecimal value 0x3c81b7f5
is XOR'd with the random value (Y = 0x3c81b7f5 ^ X), and the
result is written immediately after the random number.
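
The marker construction and the matching check can be sketched as
below. The helper names are hypothetical, and big-endian encoding of
the two 32-bit values is an assumption, since the byte order is not
given here.

```python
import os
import struct

MARKER_MAGIC = 0x3c81b7f5


def make_marker(x=None) -> bytes:
    """Return the 8-octet marker X || Y, where Y = MARKER_MAGIC ^ X.

    Assumption: both 32-bit values are written big-endian.
    """
    if x is None:
        x = int.from_bytes(os.urandom(4), "big")
    y = MARKER_MAGIC ^ x
    return struct.pack(">II", x, y)


def marker_is_valid(marker: bytes) -> bool:
    """Check that the two halves XOR back to the magic value."""
    if len(marker) != 8:
        return False
    x, y = struct.unpack(">II", marker)
    return (x ^ y) == MARKER_MAGIC
```

Because X is random, the marker bytes differ from file to file, yet a
reader can still recognize an eCryptfs file by XOR-ing the two halves.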

4.4 Deployment Considerations

eCryptfs is concerned with protecting the confidentiality of data on
secondary storage that is outside the control of a trusted host
environment. eCryptfs operates on the VFS layer, and so it will not
encrypt data written to the swap secondary storage. It is recommended
that the user employ dm-crypt to encrypt the swap space on a machine
where sensitive data may be loaded into memory at some point.

Selection of a passphrase should follow standard strong passphrase
practices. eCryptfs ships with various helper applications in the
misc/ directory; use whatever tools are convenient for you to generate
a strong passphrase string. The user should store the string in a
secure place and use that as the passphrase when prompted.

4.5 Cryptographic Summary

The key design components for eCryptfs release 0.1 are:

* Header page contains plaintext file size, eCryptfs
  marker, version, flags, and RFC 2440 packets.

* A mount-wide passphrase is stored in the user session
  keyring in the form of an authentication token.

* Each file has a unique randomly generated file
  encryption key. The file encryption key is encrypted
  and stored in the file header as a Tag 3 packet as
  defined by RFC 2440.

* The authentication token identifier, which is stored
  in the Tag 11 packet following the Tag 3 packet, is
  formed by taking the hash of the key that encrypts
  the file encryption key.

* The key that encrypts the file encryption key is
  generated according to the S2K mechanism described
  in RFC 2440.

* Page-size extents are encrypted with the default
  cipher (AES-128) in CBC mode.

* Each file's root initialization vector is the MD5 sum
  of the file encryption key for the file.

* The initialization vector for each extent is
  generated by concatenating the root IV and the ASCII
  representation of the page index and taking the MD5
  sum of that string.


References

[ref:rfc2440]
J. Callas, L. Donnerhacke, H. Finney, R. Thayer, "OpenPGP
Message Format," RFC 2440, Internet Engineering Task
Force, Network Working Group, Nov. 1998, ; accessed
March 13, 2006.

[ref:cryptfs]
E. Zadok, I. Badulescu, and A. Shender. Cryptfs: A
Stackable Vnode Level Encryption File System. Technical
Report CUCS-021-98. Computer Science Department,
Columbia University, 1998.

[ref:fist]
E. Zadok and J. Nieh. FiST: A Language for Stackable
File Systems. To appear in USENIX Conf. Proc., June 2000.


2006-03-24 23:12:51

by James Morris

Subject: Re: eCryptfs Design Document

On Fri, 24 Mar 2006, Michael Halcrow wrote:

> initialization vector by taking the MD5 sum of the file encryption
> key; the root IV is the first N bytes of that MD5 sum, where N is the
> number of bytes constituting an initialization vector for the cipher
> being used for the file (it is worth noting that known plaintext
> attacks against the MD5 hash algorithm do not affect the security of
> eCryptfs, since eCryptfs only hashes secret values).

What about other attacks on MD5? Hard coding it into the system makes me
nervous, what about making this selectable?

> By default, eCryptfs selects AES-128. Later versions of eCryptfs will
> allow the user to select the cipher and key length.

Also, what about making the encryption mode selectable, to at least allow
for something like LRW support in addition to CBC?


- James
--
James Morris
<[email protected]>

2006-03-24 23:47:12

by Andrew Morton

Subject: Re: eCryptfs Design Document

Michael Halcrow <[email protected]> wrote:
>
> On Fri, Nov 18, 2005 at 10:16:59PM -0800, Andrew Morton wrote:
> > If Linux is going to offer a feature like this then people have to
> > be able to trust it to be quite secure. What we don't want to
> > happen is to distribute it for six months and then be buried in
> > reports of vulnerabilities from cryptography specialists. Even worse
> > if those reports lead to exploits.
> >
> > So I guess what I'm asking is: has this code been reviewed by crypto
> > experts? Bearing in mind that it'll be world-class crypto people
> > who will try to poke holes in it.
>
>
> ...
> The PDF document is obtainable from the eCryptfs SourceForge file
> download section:
>
> http://sourceforge.net/project/showfiles.php?group_id=133988
>
> I also have it posted on the eCryptfs web site:
>
> http://ecryptfs.sourceforge.net/ecryptfs_design_doc_v0_1.pdf

Helps, thanks.

> ...
>
> 3.2.3 Page Read
>
> ...
>
> On a page read, the eCryptfs page index is interpolated into the
> corresponding lower page index, taking into account the header page in
> the file.

I trust that PAGE_CACHE_SIZE is not implicitly encoded into the file layout?

> ...
> When a writepage() request comes through as a result of a VFS syscall,
> eCryptfs will read the target extent from the lower file using the
> process described in the prior paragraph. The data on that page is
> modified according to the write request. The entire (modified) page is
> re-encrypted (again, in CBC mode) with the same IV and key that were
> used to originally encrypt the page; the newly encrypted page is then
> written out to the lower file.

So ecryptfs files have their own plain-text pagecache, which is backed by
the underlying file's encrypted pagecache. Passing through things like
fsync() will be interesting. We get that wrong for loop at present.

hm. The above write() description doesn't sound right. The read+decrypt
from the underlying fs should happen at ->prepare_write(), not at
->writepage(). And it can be elided if ->prepare_write() is about to write
the whole page, and if the underlying fs's blocksize is less than or equal
to the ecryptfs's blocksize.

Or something like that. The way this document talks about a file's "page
size" is a worry. Files have block sizes, and they're <= PAGE_CACHE_SIZE,
so the files are portable between different PAGE_SIZE setups.

Anyway, I'll stop trying to review the code without the code.



One dutifully wonders whether all this functionality could be provided via
FUSE...

2006-03-25 00:14:00

by Michael Halcrow

Subject: Re: eCryptfs Design Document

On Fri, Mar 24, 2006 at 03:49:20PM -0800, Andrew Morton wrote:
> Michael Halcrow <[email protected]> wrote:
> > On a page read, the eCryptfs page index is interpolated into the
> > corresponding lower page index, taking into account the header page in
> > the file.
>
> I trust that PAGE_CACHE_SIZE is not implicitly encoded into the file
> layout?

For release 0.1, it is. Managing differing page sizes is one of the
not-so-trivial changes to eCryptfs that we have planned for the 0.2
release. For this release, we can easily include a flag setting in the
header that indicates the page size, so that at least eCryptfs will
return a -EIO when the file is moved between hosts with different page
sizes. We will make sure that this is in the code before it is
submitted.

Do you think that this is an acceptable approach for the initial
release of eCryptfs?

> hm. The above write() description doesn't sound right. The
> read+decrypt from the underlying fs should happen at
> ->prepare_write(), not at ->writepage(). And it can be elided if
> ->prepare_write() is about to write the whole page, and if the
> underlying fs's blocksize is less than or equal to the ecryptfs's
> blocksize.

Yes, the design document is unclear; what you describe here is what
the code actually does.
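
The rule quoted above for skipping the read and decrypt step can be
written as a small predicate. This is only a sketch of the condition
as stated in the quoted text; the function and parameter names are
hypothetical and do not come from the eCryptfs source.

```python
def needs_read_before_write(write_start: int, write_end: int,
                            page_size: int, lower_block_size: int,
                            ecryptfs_block_size: int) -> bool:
    """Return True if the existing page must be read and decrypted first.

    write_start and write_end are byte offsets within the page, with
    write_end exclusive. The read can be skipped only when the write
    covers the whole page and the lower filesystem's block size does
    not exceed the eCryptfs block size.
    """
    covers_whole_page = write_start == 0 and write_end == page_size
    blocks_compatible = lower_block_size <= ecryptfs_block_size
    return not (covers_whole_page and blocks_compatible)
```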

> Anyway, I'll stop trying to review the code without the code.

I would say that this design document is largely only good for
evaluating the security of the design. At this point, it looks like
the code is ready to go out for evaluation, so we will get to work on
cleaning it up and getting it into patches.

Speaking of which, is there any particular way of breaking the code
into patches that you would prefer for delivery of a new filesystem?
In the past, we have been breaking the code into one patch for
inode.c, another for dentry.c, and so forth.

> One dutifully wonders whether all this functionality could be
> provided via FUSE...

My main concern with FUSE has to do with shared memory mappings. My
next concern is with regard to performance impact of constant context
switching during page reads and writes.

Whether to implement in the kernel or in userspace was a fundamental
design decision that was the subject of debate early in the
development of eCryptfs. Given the in-kernel support for something
like a cryptographic filesystem, such as the kernel crypto API and the
keyring support, and given performance and shared memory mapping
concerns, it seemed rational to implement it directly in the kernel.

More heavy-weight cryptographic operations in later versions of
eCryptfs, such as policy evaluation and public key operations, will be
done mostly in userspace and only on file open/close events.

Thanks,
Mike

2006-03-25 00:31:46

by Andrew Morton

Subject: Re: eCryptfs Design Document

Michael Halcrow <[email protected]> wrote:
>
> On Fri, Mar 24, 2006 at 03:49:20PM -0800, Andrew Morton wrote:
> > Michael Halcrow <[email protected]> wrote:
> > > On a page read, the eCryptfs page index is interpolated into the
> > > corresponding lower page index, taking into account the header page in
> > > the file.
> >
> > I trust that PAGE_CACHE_SIZE is not implicitly encoded into the file
> > layout?
>
> For release 0.1, it is. Managing differing page sizes is one of the
> not-so-trivial changes to eCryptfs that we have planned for the 0.2
> release. For this release, we can easily include a flag setting in the
> header that indicates the page size, so that at least eCryptfs will
> return a -EIO when the file is moved between hosts with different page
> sizes. We will make sure that this is in the code before it is
> submitted.
>
> Do you think that this is an acceptable approach for the initial
> release of eCryptfs?

Well it's not good. Will ecryptfs-0.1 files be both-way compatible with
ecryptfs-0.2 files?

The basic unit of a pagecache page's backing store should be a
filesystem-determined blocksize, divorced from page sizes.

For your purposes we can abstract things out a bit and not have to worry
about the actual on-underlying-disk blocksize. Which is fortunate, because
you want an ecryptfs-on-ext3-on-1kblocksize file to work when copied to an
ecryptfs-on-ext2-on-2kblocksize filesystem.

I think it would be acceptable to design ecryptfs to assume that its
underlying store has a 4096-byte "blocksize". So all the crypto operates
on 4096-byte hunks and the header is 4096-bytes long and things are copied
to and from the underlying fs's pagecache in 4096-byte hunks.

That's because 4096 is, for practical purposes, the minimum Linux
PAGE_CACHE_SIZE. Globally available and all filesystems support it.

>
> ...
>
> Speaking of which, is there any particular way of breaking the code
> into patches that you would prefer for delivery of a new filesystem?
> In the past, we have been breaking the code into one patch for
> inode.c, another for dentry.c, and so forth.

That seems a reasonable way of doing it. It's all logically one patch, but
for review purposes we need some sort of splitup.

> > One dutifully wonders whether all this functionality could be
> > provided via FUSE...
>
> My main concern with FUSE has to do with shared memory mappings.

OK. But I'm sure Miklos would appreciate help with that ;)

> My
> next concern is with regard to performance impact of constant context
> switching during page reads and writes.

Maybe. One could estimate the cost of that by benchmarking an existing
(efficient) FUSE fs and then add fiddle factors. If the number of copies
is the same for in-kernel versus FUSE then one would expect the performance
to be similar. Especially if the encryption/decryption cost preponderates.

2006-03-25 07:39:40

by Miklos Szeredi

Subject: Re: eCryptfs Design Document

> > > One dutifully wonders whether all this functionality could be
> > > provided via FUSE...
> >
> > My main concern with FUSE has to do with shared memory mappings.
>
> OK. But I'm sure Miklos would appreciate help with that ;)

You bet.

> > My next concern is with regard to performance impact of constant
> > context switching during page reads and writes.
>
> Maybe. One could estimate the cost of that by benchmarking an existing
> (efficient) FUSE fs and then add fiddle factors. If the number of copies
> is the same for in-kernel versus FUSE then one would expect the performance
> to be similar. Especially if the encryption/decryption cost preponderates.

The main overhead of FUSE is not in copies, but in context switching.
For I/O that can be mitigated by doing it in big chunks, otherwise the
only solution is adding more processors.

Miklos

2006-03-25 19:29:17

by Phillip Susi

Subject: Re: eCryptfs Design Document

Michael Halcrow wrote:
> * A mount-wide passphrase is stored in the user session
> keyring in the form of an authentication token.

I'm a bit confused because you appear to be contradicting yourself. You
say several times that a mount-wide passphrase is used for the master
key. If that is the case, then it would be given at mount time and be
bound to the super block. You also then say that the master key is
stored in the kernel keyring. If that is the case, then you don't have
to know the key at mount time, rather the key is associated with a given
process or group of processes and will be required when such a process
attempts to open a file on that mount point. This would also allow
different users to use different keys.


So which is it? Is the master key bound to the superblock, or to the
session keyring? Or am I just confused about the meaning of the kernel
keyring?

> passphrase into a key follows the S2K process as described in RFC
> 2440, in that the passphrase is concatenated with a salt; that data
> block is then iteratively MD5-hashed 65,536 times to generate the key
> that encrypts the file encryption key.


Are you saying that you salt the passphrase, hash that, then hash the
hash, then hash that hash, and so on? What good does repeatedly hashing
the hash do? Simply hashing the salted passphrase should be sufficient
to obtain a key.


2006-03-25 19:55:25

by Michael Halcrow

Subject: Re: eCryptfs Design Document

On Sat, Mar 25, 2006 at 02:28:21PM -0500, Phillip Susi wrote:
> Michael Halcrow wrote:
> >* A mount-wide passphrase is stored in the user session
> > keyring in the form of an authentication token.
>
> You say several times that a mount-wide passphrase is used for the
> master key. If that is the case, then it would be given at mount
> time and be bound to the super block.

The mount-wide passphrase in the user session keyring is actually not
necessary to keep around after the mount process is finished in this
release, and we will likely alter the design and implementation for
the 0.1 release to just remove it once the file key encryption key is
associated with the eCryptfs superblock object on mount.

In future releases, we will be storing multiple passphrase and public
key authentication tokens in the user's eCryptfs keyring, and so the
use of the kernel keyring will make a lot more sense. We are trying to
make things as simple as possible for the 0.1 release so as to limit
the complexity involved in analysis and debugging.

For the record, if you mount with one passphrase, create a file,
unmount, mount with another passphrase, and create another file, then
you will have two files side-by-side that are only accessible with
their respective passphrases. To access the first file, you need to
mount with the passphrase used to create that file in the first place,
and to access the second file, you need to mount with the passphrase
used to create that file. In future releases, the idea is that the
user will have two authentication tokens in the keyring, one for each
passphrase, so that he will be able to access either file under the
same mount.

> >passphrase into a key follows the S2K process as described in RFC
> >2440, in that the passphrase is concatenated with a salt; that data
> >block is then iteratively MD5-hashed 65,536 times to generate the key
> >that encrypts the file encryption key.
>
> Are you saying that you salt the passphrase, hash that, then hash
> the hash, then hash that hash, and so on? What good does repeatedly
> hashing the hash do? Simply hashing the salted passphrase should be
> sufficient to obtain a key.

This approach is only used to help make dictionary attacks against the
passphrase a bit harder.

Mike

2006-03-26 17:11:36

by Phillip Susi

Subject: Re: eCryptfs Design Document

Michael Halcrow wrote:
> The mount-wide passphrase in the user session keyring is actually not
> necessary to keep around after the mount process is finished in this
> release, and we will likely alter the design and implementation for
> the 0.1 release to just remove it once the file key encryption key is
> associated with the eCryptfs superblock object on mount.
>

I see, so for now you import the key from the keyring into the
superblock at mount time, but in the future you will directly use the
key from the keyring as needed?

> In future releases, we will be storing multiple passphrase and public
> key authentication tokens in the user's eCryptfs keyring, and so the
> use of the kernel keyring will make a lot more sense. We are trying to
> make things as simple as possible for the 0.1 release so as to limit
> the complexity involved in analysis and debugging.
>
>> Are you saying that you salt the passphrase, hash that, then hash
>> the hash, then hash that hash, and so on? What good does repeatedly
>> hashing the hash do? Simply hashing the salted passphrase should be
>> sufficient to obtain a key.
>
> This approach is only used to help make dictionary attacks against the
> passphrase a bit harder.
>

Isn't that what adding the salt is for? You add 16 bits of salt so that
a pre hashed dictionary would require 65,536 different hashes per
passphrase permutation. That places more computation burden on
generating such a dictionary, but more importantly it places a large
storage burden on the dictionary.

Recursively hashing only places greater computation on the creation of
the dictionary, which is of no consequence as the dictionary only has to
be created once. If you want to fight dictionary attacks, you should
add a longer salt rather than recursively hash. Taking the salt from 16
bits to 32 bits also requires the attacker to compute 65,536 times more
hashes per passphrase permutation at dictionary creation time, but ALSO
requires that they store 65,536 times more hashed values in the dictionary.

Another thought that crossed my mind is that it is likely possible to
factor the recursive hash function and simplify it such that it can be
computed almost as quickly as the single hash rather than taking 65,536
times longer.

2006-03-26 18:10:26

by Michael Halcrow

Subject: Re: eCryptfs Design Document

On Sun, Mar 26, 2006 at 12:10:29PM -0500, Phillip Susi wrote:
> Michael Halcrow wrote:
> >The mount-wide passphrase in the user session keyring is actually
> >not necessary to keep around after the mount process is finished in
> >this release, and we will likely alter the design and
> >implementation for the 0.1 release to just remove it once the file
> >key encryption key is associated with the eCryptfs superblock
> >object on mount.
>
> I see, so for now you import the key from the keyring into the
> superblock at mount time, but in the future you will directly use
> the key from the keyring as needed?

Yes. Since that's going to be ``in the future,'' it makes sense just
to remove it from the keyring after the mount is complete for now.

> >>Are you saying that you salt the passphrase, hash that, then hash
> >>the hash, then hash that hash, and so on? What good does repeatedly
> >>hashing the hash do? Simply hashing the salted passphrase should be
> >>sufficient to obtain a key.
> >
> >This approach is only used to help make dictionary attacks against the
> >passphrase a bit harder.
>
> Isn't that what adding the salt is for? You add 16 bits of salt so that
> a pre hashed dictionary would require 65,536 different hashes per
> passphrase permutation.

The salt in eCryptfs is 64 bits (8 octets, per the spec of the
iterated and salted S2K; see Section 3.6.1.3 of RFC 2440). Without the
iterated hashing, the raw dictionary generation will require storage
on the scale of 2^64 multiplied by the size of the dictionary. I think
this is big enough to address any attacks that involve pre-computed
dictionaries, as you have already pointed out.

The dictionary attack that I have in mind that I would like to make a
``bit'' harder is the dedicated attack against one particular file. If
an attacker just wants to attack the passphrase on that file, then
without iterated hashing, the difficulty is roughly proportional (in
aggregate) to half the size of the dictionary. If the passphrase is weak,
then the file will likely be compromised anyway, but at least an
iterated hash multiplies the amount of work that the dictionary
attacker needs to do by the number of iterations.
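
As a rough illustration of the iterated, salted hashing described in
this thread (salt concatenated with the passphrase, then MD5 applied
repeatedly), here is a minimal Python sketch. The name
derive_key_sketch and the salt-first concatenation order are
assumptions made for illustration. It follows the prose description in
this thread rather than the exact octet-count procedure of RFC 2440,
and it is not the eCryptfs code.

```python
import hashlib
import os

def derive_key_sketch(passphrase: bytes, salt: bytes,
                      iterations: int = 65536) -> bytes:
    """Hash salt||passphrase once, then re-hash the digest until
    `iterations` MD5 evaluations have been performed in total."""
    digest = hashlib.md5(salt + passphrase).digest()
    for _ in range(iterations - 1):
        digest = hashlib.md5(digest).digest()
    return digest  # 16 bytes, the size of an AES-128 key

salt = os.urandom(8)  # 64-bit salt, as stated above
key = derive_key_sketch(b"example passphrase", salt)
```

Each guess in a dictionary attack against one file has to repeat the
whole loop, which is where the multiplied work factor comes from.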

The question then is whether the additional hashing effort on the part
of a legitimate user has a good cost/benefit security tradeoff. If the
passphrase is strong and if the user is having to do the iterated hash
on every file in a large directory, then the tradeoff is probably
bad. If the passphrase is weak, the user can tolerate the overhead of
iterated hashing on file opens, and the attacker has limited
computational resources, then the tradeoff might be good. Ultimately,
that is the sort of thing I would like to be configurable via policy
in later versions of eCryptfs.

Keep in mind that, for the current release of eCryptfs, the iterated
hashing only happens once per mount, not once per file, and so there
is very little cost to the user, and it has at least some security
benefit, so why not do it? Note that this will change in later
versions, as a different salt is generated for each file in the
system, but it will also be configurable in later versions.

Mike

2006-03-27 00:05:26

by Phillip Hellewell

Subject: Re: eCryptfs Design Document

> The salt in eCryptfs is 64 bits (8 octets, per the spec of the
> iterated and salted S2K; see Section 3.6.1.3 of RFC 2440). Without the
> iterated hashing, the raw dictionary generation will require storage
> on the scale of 2^64 multiplied by the size of the dictionary. I think
> this is big enough to address any attacks that involve pre-computed
> dictionaries, as you have already pointed out.

I concur with Mike. The salt is long enough to effectively thwart any
possibility of a pre-computed dictionary attack.

> The dictionary attack that I have in mind that I would like to make a
> ``bit'' harder is the dedicated attack against one particular file. If
> an attacker just wants to attack the passphrase on that file, then
> without iterated hashing, the difficulty is roughly proportional (in
> aggregate) to half the size of the dictionary. If the passphrase is weak,
> then the file will likely be compromised anyway, but at least an
> iterated hash multiplies the amount of work that the dictionary
> attacker needs to do by the number of iterations.

Again I concur with Mike. Iterative hashing is a very common technique,
and is very effective against this type of dictionary attack. If you
hash 1000 times, then an attack that normally could check 1 million
passwords per second would now only be able to check 1000 passwords per
second.

Without iterative hashing, as computers get faster, so would dictionary
attacks, and then people would have to keep using longer and longer
passwords to be as effective. Iterative hashing "levels the playing
field" in a way.
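
The arithmetic behind that claim can be written out in a few lines of
Python. This is only a back-of-the-envelope model: it assumes the
attacker's cost per guess is dominated by hash evaluations, and the
function name is made up for the example.

```python
def guesses_per_second(base_rate: float, iterations: int) -> float:
    """Guess rate once every guess costs `iterations` hashes instead of one."""
    return base_rate / iterations

print(guesses_per_second(1_000_000, 1000))   # 1000.0
print(guesses_per_second(1_000_000, 65536))  # 15.2587890625
```

With the 65,536 iterations used in the design document, the same
hypothetical attacker drops to roughly 15 guesses per second.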

> Keep in mind that, for the current release of eCryptfs, the iterated
> hashing only happens once per mount, not once per file, and so there
> is very little cost to the user, and it has at least some security

Again, I agree with Mike. The cost is extremely small, especially since
the hashing only happens once per mount. As long as we make sure that
on an average computer it takes less than a second, or make it
configurable so the user can keep it as small as they want, then I can't
see anything but good coming from an iterative hash.

Now what would be really cool is if we could auto-configure the number
of iterations so that it always ends up taking 1/10 of a second to
perform a hash. Then it will always hash a reasonable number of times
regardless of your computer speed.

Remember, a user is not going to notice a 1/10 of a second pause when
they type in their password, but an attacker will definitely notice that
they are only able to guess 10 passwords per second!
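
One way such auto-configuration might look, as a sketch only: time a
fixed number of chained MD5 calls on the local machine and scale to the
target duration. The function name, the probe size, and the 0.1 second
default are illustrative assumptions, not anything taken from
eCryptfs.

```python
import hashlib
import time

def calibrate_iterations(target_seconds: float = 0.1,
                         probe: int = 10_000) -> int:
    """Estimate how many chained MD5 evaluations fit in target_seconds."""
    digest = b"\x00" * 16
    start = time.perf_counter()
    for _ in range(probe):
        digest = hashlib.md5(digest).digest()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return probe  # timer too coarse to measure; fall back to the probe size
    per_hash = elapsed / probe
    return max(1, int(target_seconds / per_hash))

iterations = calibrate_iterations()
print(iterations)  # varies from machine to machine
```

Whatever count is chosen would have to be recorded alongside the salt,
since the same count is needed to re-derive the key later.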

Phillip

--
Phillip Hellewell <phillip AT hellewell.homeip.net>

2006-03-27 02:55:01

by Phillip Susi

Subject: Re: eCryptfs Design Document

Phillip Hellewell wrote:
> Again I concur with Mike. Iterative hashing is a very common technique,
> and is very effective against this type of dictionary attack. If you
> hash 1000 times, then an attack that normally could check 1 million
> passwords per second would now only be able to check 1000 passwords per
> second.
>
> Without iterative hashing, as computers get faster, so would dictionary
> attacks, and then people would have to keep using longer and longer
> passwords to be as effective. Iterative hashing "levels the playing
> field" in a way.
>


Except that I believe you can write code to compute the nth hash in O(1)
time rather than O(n) time, so that kind of defeats the purpose, though
I'm no expert so I could be wrong.


2006-03-27 16:10:07

by Michael Thompson

Subject: Re: eCryptfs Design Document

On 3/26/06, Phillip Susi <[email protected]> wrote:
> Phillip Hellewell wrote:
> > Again I concur with Mike. Iterative hashing is a very common technique,
> > and is very effective against this type of dictionary attack. If you
> > hash 1000 times, then an attack that normally could check 1 million
> > passwords per second would now only be able to check 1000 passwords per
> > second.
> >
> > Without iterative hashing, as computers get faster, so would dictionary
> > attacks, and then people would have to keep using longer and longer
> > passwords to be as effective. Iterative hashing "levels the playing
> > field" in a way.
> >
>
>
> Except that I believe you can write code to compute the nth hash in O(1)
> time rather than O(n) time, so that kind of defeats the purpose, though
> I'm no expert so I could be wrong.

I do not believe it is possible to compute the nth hash in O(1) time,
starting with no previously-computed hashes, since in order to
compute the nth hash, you need the (n-1)th hash as input. This
takes the form: h(n) = hash(h(n-1)), where h(n) is the result after n
iterations. In order to know h(n-1), you need to know h(n-2). This
chains down to your original hash. This argument holds as long as the
hash retains the standard properties of hashes: that is, it is
non-trivial to find an input which yields a given hash.

--
Michael C. Thompson <[email protected]>
Software-Engineer, IBM LTC Security

2006-03-27 16:17:53

by Michael Thompson

Subject: Re: eCryptfs Design Document

On 3/24/06, James Morris <[email protected]> wrote:
> On Fri, 24 Mar 2006, Michael Halcrow wrote:
>
> > initialization vector by taking the MD5 sum of the file encryption
> > key; the root IV is the first N bytes of that MD5 sum, where N is the
> > number of bytes constituting an initialization vector for the cipher
> > being used for the file (it is worth noting that known plaintext
> > attacks against the MD5 hash algorithm do not affect the security of
> > eCryptfs, since eCryptfs only hashes secret values).
>
> What about other attacks on MD5? Hard coding it into the system makes me
> nervous, what about making this selectable?
>
> > By default, eCryptfs selects AES-128. Later versions of eCryptfs will
> > allow the user to select the cipher and key length.
>
> Also, what about making the encryption mode selectable, to at least allow
> for like LRW support in addition to CBC?

These are part of the eCryptfs roadmap. I'm not sure when we are
planning to incorporate the functionality to select your hash and
cipher (I believe it's 0.2 or 0.3), but we have experimented with this
and have had success doing so. The code is not included in 0.1 due to
lack of testing and conflict with our mental model of the releases.
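
For reference, the root IV derivation quoted above (hash the file
encryption key, keep the first N bytes) with a selectable hash could be
sketched like this in Python. The function name root_iv_sketch and its
hash_name parameter are hypothetical; release 0.1 as described in this
thread uses MD5 only.

```python
import hashlib

def root_iv_sketch(file_key: bytes, iv_len: int,
                   hash_name: str = "md5") -> bytes:
    """Return the first iv_len bytes of the hash of the file key."""
    digest = hashlib.new(hash_name, file_key).digest()
    if iv_len > len(digest):
        raise ValueError("hash output is shorter than the requested IV length")
    return digest[:iv_len]

file_key = bytes(range(16))  # placeholder 128-bit key, not a real secret
iv_md5 = root_iv_sketch(file_key, 16)            # AES IV is 16 bytes: the whole MD5 sum
iv_sha = root_iv_sketch(file_key, 16, "sha256")  # same interface, different hash
```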

Should this functionality be highly desired / required, I see no reason
why it can't be added, but Mike Halcrow and Phillip need to weigh in
on this too :)

--
Michael C. Thompson <[email protected]>
Software-Engineer, IBM LTC Security

2006-03-27 16:52:02

by Michael Halcrow

Subject: Re: eCryptfs Design Document

On Fri, Mar 24, 2006 at 06:12:46PM -0500, James Morris wrote:
> What about other attacks on MD5?

The only attacks that I am aware of against MD5 require known prefix
values, and all of the prefix values in eCryptfs are secret.

> Hard coding it into the system makes me nervous, what about making
> this selectable?

This is yet another attribute that we would like to include in policy
support in future versions of eCryptfs. There are many things we could
have made parameterizable in release 0.1, but we decided to just lock
it down to a single hash algorithm, cipher/key size, chaining mode,
and so forth. Once eCryptfs has been weathered in its current state,
we would like to incrementally start allowing more flexibility in
these cryptographic attributes on a step-by-step basis.

Those who reviewed the design document did express concern over the
fact that we are using MD5, simply because of the known-prefix
attacks, but up to now, based on what is known about the cryptographic
properties of the hash algorithms, nobody has presented a reason why
using something like SHA-256 or RIPEMD-160 for either the S2K
operation or the root IV generation would make eCryptfs any more
secure than it currently is.

> > By default, eCryptfs selects AES-128. Later versions of eCryptfs
> > will allow the user to select the cipher and key length.
>
> Also, what about making the encryption mode selectable, to at least
> allow for like LRW support in addition to CBC?

That also is another feature that we would like to defer for a future
release. Changing the chaining mode may have security implications,
and so we would prefer to think through how that feature can be
intelligently offered to the user. For instance, we would not want a
user to just be able to blindly select ECB mode, which he might
naively do if he finds that it helps performance.

Thanks,
Mike

2006-03-27 23:31:11

by Michael Halcrow

Subject: Re: eCryptfs Design Document

On Fri, Mar 24, 2006 at 04:33:58PM -0800, Andrew Morton wrote:
> I think it would be acceptable to design ecryptfs to assume that its
> underlying store has a 4096-byte "blocksize". So all the crypto
> operates on 4096-byte hunks and the header is 4096-bytes long and
> things are copied to and from the underlying fs's pagecache in
> 4096-byte hunks.
>
> That's because 4096 is, for practical purposes, the minimum Linux
> PAGE_CACHE_SIZE. Globally available and all filesystems support it.

So let's say that locking eCryptfs files to only be accessible on
machines with the same page size as the machine on which the files
were created is unacceptable. eCryptfs will have to be modified a bit
to accommodate that. Now we have several issues to consider. My team
has discussed several potential solutions, but I would like to lay it
all out on the table to see if anyone out there has any suggestions on
how to proceed.

eCryptfs currently keeps the header information in the first page of
the file. This will not work when moving from a host with a page size
of 4K to a host with a page size of 8K (or vice versa). We will be
changing that so that eCryptfs works on extent-based regions of 4096
bytes, as Andrew Morton suggested.

In the current release, eCryptfs writes the header in the first page
of the file (which will soon be changed to the first 4k extent of the
file). This is nice because the header only needs to be generated and
written once (at file creation), and then it can be left alone from
that point forward.

In the current release, changing eCryptfs to operate in terms of
fixed-size (4096-byte) extents will cause page reads and writes in
eCryptfs to ``straddle'' pages in the lower filesystem if the first
extent contains the header. Consider 8K page sizes:

eCryptfs (unencrypted view):
+----------+----------+----------+----------+----------
| EXTENT_0 | EXTENT_1 | EXTENT_2 | EXTENT_3 | ...
+----------+----------+----------+----------+----------
|       PAGE_0        |       PAGE_1        | ...
+---------------------+---------------------+----------

Lower (encrypted form):
+----------+----------+----------+----------+----------
| HEADER | EXTENT_0 | EXTENT_1 | EXTENT_2 | ...
+----------+----------+----------+----------+----------
|       PAGE_0        |       PAGE_1        | ...
+---------------------+---------------------+----------

So, to read or write page 0 via eCryptfs, eCryptfs will have to read
or write extents 0 and 1, which will require accessing both page 0 and
page 1 in the lower filesystem. I do not think that this will be
acceptable in terms of performance, nor will it maintain the pattern
of one page operation in eCryptfs correlating with exactly one page
operation in the lower filesystem. For instance, if eCryptfs writes
page 0 out to disk, and then a crash occurs, then the data will be
left in an inconsistent state.
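
The straddling can be checked with a little index arithmetic. The
following Python sketch computes which lower-file pages back a given
eCryptfs page; the function name and the header_size parameter are
illustrative assumptions, with the header simply placed at the front
of the lower file as in the diagram.

```python
EXTENT_SIZE = 4096  # fixed extent size proposed in the thread

def lower_pages_for_upper_page(upper_page: int, page_size: int,
                               header_size: int = EXTENT_SIZE) -> list:
    """Return the lower-file page indices that back one eCryptfs page."""
    start = upper_page * page_size + header_size  # first lower byte offset
    end = start + page_size - 1                   # last lower byte offset
    return list(range(start // page_size, end // page_size + 1))

print(lower_pages_for_upper_page(0, 8192))        # [0, 1]
print(lower_pages_for_upper_page(0, 4096))        # [1]
print(lower_pages_for_upper_page(0, 8192, 8192))  # [1]
```

With 8K pages and a 4K header, page 0 spans lower pages 0 and 1. When
the header size is a multiple of the page size, each eCryptfs page maps
to exactly one lower page again.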

To achieve page alignment, one solution is to make the header consume
as many extents as will occupy some notion of a ``largest supported
page size.'' If we arbitrarily set that at, say, 64k, then every file
in eCryptfs on a system with a page size of 4k will automatically
consume at least 68k of space (64k header + 4k page), and eCryptfs
still will have to straddle pages for systems with 128k or 256k page
sizes (how many systems out there have page sizes >64k?).
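
To make the numbers concrete, here is a small Python sketch of the
space cost and alignment for a fixed 64k header across a few page
sizes. The helper names are made up for the example.

```python
KIB = 1024

def min_lower_size(header_size: int, page_size: int) -> int:
    """Smallest lower file holding the header plus one page of data."""
    return header_size + page_size

def data_is_page_aligned(header_size: int, page_size: int) -> bool:
    """True if the data after the header starts on a lower page boundary."""
    return header_size % page_size == 0

for page_kib in (4, 8, 64, 128, 256):
    page = page_kib * KIB
    print(page_kib,
          min_lower_size(64 * KIB, page) // KIB,
          data_is_page_aligned(64 * KIB, page))
```

For 4k pages this gives the 68k minimum mentioned above, and alignment
holds for every page size that divides 64k but fails for 128k and 256k.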

Another solution is to write the ``header'' at the tail 4k of the
file. Then we have to abandon the benefit of having an ``untouchable''
first 4k region of the lower file. Seeks past the end of the file to
truncate to a larger size or to append data will blow away the header
extent, and it will have to be re-written. When should that happen? On
each and every truncate? When the file is closed? If we choose the
latter, then it is easy to lose your file forever if there is a system
crash before the file is closed and the header can be re-written to
its new location.

To complicate matters, in future versions, the header will need to
take multiple extents, and so we have always been planning on
eventually appending some header information at the end of the file
anyway; it looks like we are having to confront some of the issues
involved in doing that right now. In later versions, the header will
contain multiple passphrase and public key packets, along with HMAC
values. The header will need to grow to consume an arbitrary number of
extents, depending on the file size and the number of authentication
token packets.

To guarantee that the header is always present in the file, when the
eCryptfs function ecryptfs_truncate() is called, it could add on as
many additional pages to the lower file as are necessary to write the
header and then write the header out prior to returning. The overhead
in maintaining several 4k header extents at the end of the file would
be substantial (e.g., for a log file that is constantly being
appended). Plus, if the header spans more than one page, then there
are additional steps necessary to maintain consistency in the event of
an incomplete header write operation (i.e., maintain a temporary
pointer to the prior header location until the header is completely
written out to the new location).

Another idea that we have kicked around involves keeping an eCryptfs
journal file. In this case, the header can be overwritten in the lower
file for a while before it is re-written, but all of the information
necessary to generate that header always exists in a hidden journal
file. If there is a system crash, on the next mount, ecryptfsck will
check the journal file to determine that the header needs to be
written out to the file, and then it can repair the file. Journaling
functionality is something that we gauge to be a fairly large
development effort, and we feel it really should be slated for a
future release (>0.2) of eCryptfs.

So we have several ways to proceed at this point, but before we run
off and implement one of them, does anyone else out there have any
insights?

Thanks,
Mike

2006-03-28 13:49:53

by Christian Cachin

Subject: Re: eCryptfs Design Document

Hi all,

I'm a cryptographer with an interest in encrypting stored data.

Mike had asked me to read the eCryptfs design and I can confirm the
security statements made there, and that the algorithm choices are
adequate. The current release does not support integrity protection, but
this feature is promised for the next release through a MAC.

I don't see the need for tweakable encryption modes (like LRW, CMC)
in the eCryptfs strategy because being a virtual file system, it can
afford to insert some extra space and is not bound to the block
boundaries like a block device, for which these were developed. And with
integrity protection coming in the next release, the little extra security
gained in the current release by the tweakable modes would be a wasted
effort.

cc

---
Christian Cachin                  email: [email protected]
IBM Zurich Research Laboratory    tel: +41-44-724-8989
Saumerstrasse 4 / Postfach        fax: +41-44-724-8953
CH-8803 Rueschlikon, Switzerland  http://www.zurich.ibm.com/~cca

2006-03-28 16:01:15

by Stephen C. Tweedie

Subject: Re: eCryptfs Design Document

Hi,

On Mon, 2006-03-27 at 17:31 -0600, Michael Halcrow wrote:

> So let's say that locking eCryptfs files to only be accessible on
> machines with the same page size as the machine on which the files
> were created is unacceptable.

Agreed.

> We will be
> changing that so that eCryptfs works on extent-based regions of 4096
> bytes, as Andrew Morton suggested.
...
> In the current release, changing eCryptfs to operate in terms of
> fixed-size (4096-byte) extents will cause page reads and writes in
> eCryptfs to ``straddle'' pages in the lower filesystem if the first
> extent contains the header.

Right, that's undesirable from a performance point of view.

> For instance, if eCryptfs writes
> page 0 out to disk, and then a crash occurs, then the data will be
> left in an inconsistent state.

Sorry, but you have to deal with that anyway. There is no guarantee
that a page write is atomic.

On an ext2/3 filesystem with 1k blocksize, for example, it can take 4
separate writes to 4 different, potentially discontiguous disk blocks to
perform that page write. Some storage may be doing transparent
bad-block relocation at the sector level. And your partition may be
oddly aligned with the tracks on disk, causing a 4k write to span
tracks. Many of these effects can be worked around with care, but the
default assumption must be that 4k writes are not guaranteed to be
atomic wrt crashes.

ext3 in its default "ordered" journaling mode does give you the
guarantee that newly-allocated page writes (writes to holes, or appends)
are atomic, but that's a special case not guaranteed at the VFS level in
general; and even there, overwrites are not necessarily atomic.

Restricting eCryptfs to filesystems with a 4k or larger blocksize is
possible, although many filesystems (eg. XFS) are fundamentally sector-
and extent-based, and cannot provide such an allocation unit guarantee.
(Though I think XFS will give you 4k extents anyway unless and until it
gets so fragmented and full that there are no >4k free extents left.)
With a 4k fs blocksize, there may be no absolute guarantee of page write
atomicity, but it should be pretty rare that a page write gets
fractured.

> To achieve page alignment, one solution is to make the header consume
> as many extents as will occupy some notion of a ``largest supported
> page size.'' If we arbitrarily set that at, say, 64k, then every file
> in eCryptfs on a system with a page size of 4k will automatically
> consume at least 68k of space (64k header + 4k page), and eCryptfs
> still will have to straddle pages for systems with 128k or 256k page
> sizes (how many systems out there have page sizes >64k?).

If you make the underlying file sparse, you could move the data offset
all the way out to 4MB or more without penalty. (It would make for
extra pain when doing backups with non-sparse-aware tools, though, or
when using non-sparse-capable filesystems; and it would bring down the
file size limit at which you get EFBIG on ext2/3 slightly.)
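
A userspace illustration of the sparse layout, as a Python sketch
only: write a small header at offset 0, seek to a distant data offset,
and write the payload there. The 4MB offset comes from the paragraph
above; the function and file names are placeholders, and whether the
gap is actually stored as a hole depends on the underlying filesystem.

```python
import os
import tempfile

DATA_OFFSET = 4 * 1024 * 1024  # 4MB, the figure mentioned above

def write_with_sparse_header(path: str, header: bytes, payload: bytes) -> None:
    with open(path, "wb") as f:
        f.write(header)      # small header at offset 0
        f.seek(DATA_OFFSET)  # skip ahead; the gap may become a hole
        f.write(payload)     # data begins at the fixed offset

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "lower_file")
    write_with_sparse_header(path, b"HDR", b"ciphertext")
    st = os.stat(path)
    apparent_size = st.st_size               # DATA_OFFSET + len(payload)
    blocks = getattr(st, "st_blocks", None)  # 512-byte units; absent on some platforms
    print(apparent_size, blocks)
```

On a sparse-capable filesystem the allocated block count is typically
far below the apparent size, which is the "without penalty" part.
Tools that copy the apparent size byte for byte will not see that
saving.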

> Another solution is to write the ``header'' at the tail 4k of the
> file. Then we have to abandon the benefit of having an ``untouchable''
> first 4k region of the lower file.

Would it be possible simply to shift metadata to a subdir, so file foo
has the header in .encfs/foo ? That may be a performance cost you don't
want to bear, especially for small files, of course. If the header can
be shrunk enough, xattrs might also be possible; although that has its
own problems, such as compatibility with NFS etc.

> Seeks past the end of the file to
> truncate to a larger size or to append data will blow away the header
> extent, and it will have to be re-written. When should that happen? On
> each and every truncate?

On every one; and it would have to be done atomically to avoid
corrupting things.

> When the file is closed? If we choose the
> latter, then it is easy to lose your file forever if there is a system
> crash before the file is closed and the header can be re-written to
> its new location.

Right; nasty.

> To complicate matters, in future versions, the header will need to
> take multiple extents, and so we have always been planning on
> eventually appending some header information at the end of the file
> anyway;

At that point, it definitely sounds attractive to start thinking about
an external file in a subdir for the metadata. You need to keep the two
in sync, but in principle syncing data between two different parts of
different files is actually no harder than doing it in the same file,
simply because even within a file there are no guarantees against
reordering of writes.

> So we have several ways to proceed at this point, but before we run
> off and implement one of them, does anyone else out there have any
> insights?

It sounds like simply reserving the head 64k (or so) of the file for the
header will get rid of the short-term alignment problems for now, so for
development that's probably the easy way out. But if you really
anticipate potentially unbounded growth of the key/layout metadata for a
file, then using a separate file for that may be easier than trying to
come up with a complex interleaved data/metadata format.

--Stephen


2006-03-29 20:14:46

by Michael Halcrow

Subject: Re: eCryptfs Design Document

On Tue, Mar 28, 2006 at 11:00:56AM -0500, Stephen C. Tweedie wrote:
> Sorry, but you have to deal with that anyway. There is no guarantee
> that a page write is atomic.

Since this is the case, we really do not lose any guarantees by
allowing situations where eCryptfs does two page reads or writes on
the lower filesystem for one page read or write on the eCryptfs
filesystem. If we allow for this under some circumstances, I think we
have a good solution for the page alignment problem, balancing
correctness and performance needs. The idea is to make the common case
fast while making every case correct.

The first step will be to make eCryptfs operate on 4K-block chunks
rather than on PAGE_SIZE chunks. This should not be a very difficult
modification, and it will even result in faster performance on hosts
with page sizes over 4K, since unnecessary encryption and decryption
will be avoided. At a minimum, this change is necessary to make
eCryptfs files movable between hosts with different page sizes.
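
To illustrate the extent arithmetic, here is a rough sketch. The macro
and function names are made up for this example and are not taken from
the eCryptfs source; the only facts assumed from the text are the 4K
extent size and a header that sits in front of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4K crypto extent, as proposed above; the name is hypothetical. */
#define SKETCH_EXTENT_SIZE 4096u

/* How many 4K extents fit in one host page. */
size_t extents_per_page(size_t page_size)
{
    /* Assumes the page size is a whole multiple of the extent size. */
    assert(page_size >= SKETCH_EXTENT_SIZE &&
           page_size % SKETCH_EXTENT_SIZE == 0);
    return page_size / SKETCH_EXTENT_SIZE;
}

/* Extents that need crypto work when only valid_bytes of a page hold
 * file data; the remainder of a large page can be skipped. */
size_t extents_to_process(size_t valid_bytes)
{
    return (valid_bytes + SKETCH_EXTENT_SIZE - 1) / SKETCH_EXTENT_SIZE;
}

/* Byte offset of an extent in the lower file. It depends only on the
 * header size and the extent index, not on the host page size, which
 * is what makes the file movable between hosts. */
uint64_t lower_offset_of_extent(uint64_t header_size, uint64_t extent_index)
{
    return header_size + extent_index * SKETCH_EXTENT_SIZE;
}
```

For example, a 5000-byte file on a host with 64K pages needs crypto
work on 2 extents rather than on all 16 extents of the page.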

Then, if the host has a page size of either 4K or 8K, we will make the
default header size 8K. This will cover, by default, all hosts where
the page size is either 4K or 8K (which is the vast majority); no page
straddling will occur, and so page reads and writes will be one-to-one
between the eCryptfs and the lower files. If the host's page size is
greater than 8K, then the header will, by default, be one host page in
size. This will take care of the page alignment problem for the
majority of use cases. The only time pages will not be
aligned is if a file is created on a host with a page size less than
or equal to 8K and then the file is transferred to a host with a page
size greater than 8K. In that case, two-to-one page reads and writes
will occur, causing a performance hit, but it will still work. In any
case, userspace tools can always convert the header to an appropriate
size for optimal performance if need be.
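
The default header size rule and the resulting alignment can be
sketched as follows. This is only an illustration under the
assumptions stated in this message (8K minimum header, header placed
at the start of the lower file); the names are invented for the
example and do not come from the eCryptfs code.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_EXTENT_SIZE      4096u  /* 4K extent (hypothetical name) */
#define HDR_MIN_DEFAULT_SIZE 8192u  /* 8K default header (hypothetical name) */

/* Default header size for a file created on a host with this page size:
 * 8K when the page size is 4K or 8K, otherwise one full host page. */
size_t default_header_size(size_t page_size)
{
    assert(page_size >= HDR_EXTENT_SIZE &&
           page_size % HDR_EXTENT_SIZE == 0);
    return page_size <= HDR_MIN_DEFAULT_SIZE ? HDR_MIN_DEFAULT_SIZE
                                             : page_size;
}

/* Lower-file pages touched by one eCryptfs page. eCryptfs page N starts
 * at header_size + N * page_size in the lower file, so it lines up with
 * a single lower page only when header_size is a multiple of page_size;
 * otherwise it straddles two lower pages. */
unsigned lower_pages_per_page(size_t header_size, size_t page_size)
{
    return (header_size % page_size == 0) ? 1u : 2u;
}
```

With these rules, a file created on a 4K-page host gets an 8K header
and stays one-to-one on 4K and 8K hosts; moved to a 16K-page host,
lower_pages_per_page(8192, 16384) gives 2, which is the two-to-one
case described above.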

In future versions of eCryptfs, where the header data can span more
than 8K, eCryptfs will spill the excess header information over onto
the end of the file, rewriting it on truncate events, as previously
discussed in this thread. This will incur a performance hit on every
truncate event. Again, userspace tools (i.e., ecryptfstune) can resize
the header at the beginning of the file to store all of the data in
order to improve performance if need be. In any case, we anticipate
that header extent spill-over will be a rare event.
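
As a rough sketch of the spill-over arithmetic: the function name is
hypothetical, and the code assumes that spilled header data is stored
in whole 4K extents at the end of the file. It only counts extents; it
does not describe an on-disk format.

```c
#include <assert.h>
#include <stddef.h>

#define SPILL_EXTENT_SIZE 4096u  /* 4K extent (hypothetical name) */

/* Number of 4K extents of header data that must live at the end of the
 * file when metadata_len bytes of header data do not fit in the
 * front_header_size bytes reserved at the start of the file. */
size_t tail_header_extents(size_t metadata_len, size_t front_header_size)
{
    /* Assumes the front header is itself a whole number of extents. */
    assert(front_header_size % SPILL_EXTENT_SIZE == 0);

    if (metadata_len <= front_header_size)
        return 0;  /* everything fits in front: no spill-over */

    size_t overflow = metadata_len - front_header_size;
    return (overflow + SPILL_EXTENT_SIZE - 1) / SPILL_EXTENT_SIZE;
}
```

A result of 0 is the common case; a userspace tool that enlarges the
front header until this returns 0 removes the per-truncate rewrite.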

When HMAC support is enabled in future versions, the space needed to
store the HMAC data may frequently be larger than 8K; we
can just make the header for HMAC-verified files something like 64K by
default then, and only very large files will require the header
information to spill over to the end of the file. The optimal header
size with HMAC turned on will need to be determined via analysis.

> Would it be possible simply to shift metadata to a subdir, so file
> foo has the header in .encfs/foo ? That may be a performance cost
> you don't want to bear, especially for small files, of course. If
> the header can be shrunk enough, xattrs might also be possible;
> although that has its own problems, such as compatibility with NFS
> etc.

Keeping the meta-data together with the data in the lower file has
always been a key feature of the eCryptfs filesystem. Divorcing the
crypto meta-data necessary to access the data for any given file from
the file itself should be an absolute last resort. In any case,
keeping groups of 4K extents of header data in the front or on the end
of the file does not impose a significant degree of design or
implementation complexity on eCryptfs.

Thanks,
Mike